Got Juice?
One thing that made me realize I’m old was the way I thought about block filesystems. When I set up my current Kubernetes cluster, I needed to replicate a hardware SAN, and the two stacks that were always trumpeted were Longhorn and Rook / Ceph. I’ve worked with Ceph in production on three separate occasions, and it makes my blood run cold even imagining troubleshooting Rook through Kubernetes in production.
That left me with Longhorn, and despite the early sharp edges that other people have experienced, by the time I started using it everything was humming along nicely. And why not? DRBD, Ganesha NFS, LVM, XFS: these are all rock-solid, stable components to build on. I’ve never had any serious problems with Longhorn. I export RWX volumes and understand the limitations of network filesystems, so I’ve intentionally avoided anything that isn’t lock safe over NFS: no WAL-mode SQLite, nothing that would complicate write locks. I have a dedicated Postgres cluster to handle those SQL situations, and S3 for apps that are object-storage aware.
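To give a concrete example of the kind of constraint I mean, here is a minimal Python sketch that forces SQLite back to its classic rollback journal when the database file sits on an NFS-backed RWX volume. WAL mode relies on a shared-memory sidecar file that NFS can’t coordinate across hosts, which is exactly the sort of thing I keep off these volumes. The path here is just a placeholder, not anything from my actual cluster.

```python
import sqlite3

# Hypothetical database living on an NFS-backed RWX volume.
conn = sqlite3.connect("/mnt/rwx-volume/app.db")

# WAL mode depends on shared-memory coordination that isn't safe over NFS,
# so force the classic rollback journal instead.
mode = conn.execute("PRAGMA journal_mode=DELETE").fetchone()[0]
print(f"journal_mode is now: {mode}")  # expect "delete"

conn.execute("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello from NFS",))
conn.commit()
conn.close()
```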
However, I had this nagging question in the back of my head: if S3, Redis, and SQL all know how to distribute and balance themselves, why am I having the block device layer try to do all of this underneath? Since I set my constraints to be lock-safe, filesystem-aware applications, I can do without the low-level replication altogether. Simple stateful storage is always the solution I strive for, so I started investigating JuiceFS as a way to replace Ganesha.
JuiceFS exists as a POSIX management layer on top of object storage, so my first task was choosing what to use for metadata. I do enjoy handling more infrastructure tasks with my PostgreSQL cluster, but I ended up choosing Redis. With Redis, I know performance will be good, operation will be simple, and I can persist back to S3 for backup/restore using a simpler mountpoint-s3 driver. I like the idea of keeping all of the filesystem state in S3, and backing up the same S3 source for both metadata and chunks.
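To make the split concrete, here is a rough sketch of wiring it together, written as a small Python wrapper around the juicefs CLI. It assumes the juicefs binary is installed, and the Redis URL, bucket, volume name, and credentials are all placeholders rather than my real setup.

```python
import subprocess

# Placeholder endpoints: Redis for metadata, an S3 bucket for chunk data.
META_URL = "redis://redis.storage.svc:6379/1"
BUCKET = "https://juicefs-chunks.s3.us-east-1.amazonaws.com"
VOLUME = "cluster-data"
MOUNTPOINT = "/mnt/jfs"

# One-time format: records the volume layout in Redis and points chunks at S3.
subprocess.run(
    [
        "juicefs", "format",
        "--storage", "s3",
        "--bucket", BUCKET,
        "--access-key", "AKIA...",  # placeholder credentials
        "--secret-key", "...",
        META_URL,
        VOLUME,
    ],
    check=True,
)

# Mount in the background; POSIX calls hit Redis for metadata and S3 for data.
subprocess.run(["juicefs", "mount", "-d", META_URL, MOUNTPOINT], check=True)
```

Once mounted, anything reading or writing under the mountpoint goes through Redis for metadata operations and S3 for the chunk data itself, which is exactly the state I want to be able to back up from a single S3 source.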