Recently I fooled around with GlusterFS, a distributed file system especially intended for use as backend storage. The first impression is quite good – as long as you use it for the right task.
GlusterFS is a distributed file system which is supposed to scale to large storage sizes. Besides file distribution it also offers “RAID”-like features: if you have two GlusterFS servers, you can either stripe the data across both of them or mirror them. Or, if you have more servers, you can even create more complex setups with a mixture of striping and mirroring. The client protocol is very similar to NFS, and thus clients can access GlusterFS servers either via GlusterFS-FUSE or directly via NFS.
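To give an idea of how simple such a setup is, here is a rough sketch of creating a two-server mirrored volume and mounting it on a client – hostnames, volume name and brick paths are of course placeholders:

```shell
# On server1: join the second server into the trusted pool
gluster peer probe server2.example.com

# Create a mirrored ("replica 2") volume with one brick per server
gluster volume create testvol replica 2 \
    server1.example.com:/data/brick1 \
    server2.example.com:/data/brick1

gluster volume start testvol

# On a client: mount via the GlusterFS FUSE client...
mount -t glusterfs server1.example.com:/testvol /mnt/gluster
# ...or directly via NFS
mount -t nfs server1.example.com:/testvol /mnt/gluster
```

Swapping `replica 2` for `stripe 2` would stripe the files across the bricks instead of mirroring them.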
The advantage of a distributed file system is obvious: if one server goes down, the clients fail over to another server, and you still have the data. Also, you can add more servers in case you need more storage. A distinct advantage of GlusterFS compared to, for example, AFS or Ceph is that you do not need to worry about setting up special servers dedicated to tasks like tracking where the data lives: each GlusterFS server has the same rights and tasks as the next (in simple setups; highly complex setups may vary).
GlusterFS faces competition first of all in its own field: distributed file systems as mentioned above (Ceph, AFS, etc.). But for simpler setups you can also achieve similar results with a shared block-level device like DRBD plus NFS, or even with cluster file systems like GFS and OCFS. And in the end GlusterFS steps up to fight against high-cost storage systems (SAN, …).
For my tests of GlusterFS I used a couple of CentOS 6 VMs in KVM on my laptop. I followed the well-written GlusterFS-CentOS-Howto by falco. However, one thing is probably not outlined enough in the howto: GlusterFS really depends on host name resolution. And “really depending” means that an entry in /etc/hosts on the GlusterFS machines is not enough – you should have a working DNS!
That said, the easiest way to accomplish this with a running KVM test setup is to just add the necessary entries to /etc/hosts on the host side (!). This information is forwarded to the VM guests and provides reliable name resolution.
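In practice that means entries along these lines in /etc/hosts on the KVM host – the names and addresses here are just examples; with the default libvirt NAT network, dnsmasq reads the host's /etc/hosts and answers DNS queries from the guests accordingly:

```shell
# /etc/hosts on the KVM host (not inside the guests!)
# libvirt's dnsmasq instance serves these entries to the VMs,
# giving the GlusterFS machines working name resolution.
192.168.122.11  gluster1.example.com  gluster1
192.168.122.12  gluster2.example.com  gluster2
```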
Tests and Results
I ran a couple of tests: shutting down a server, shutting down the interfaces, shutting down the GlusterFS services, and so on. The data on the machines was a couple of various text files, nothing fancy.
And all in all, I must say I was impressed with the features and capabilities: the setup was reliable, files were automatically distributed, servers which had been offline were resynced automatically, and so on. The collision detection is pretty neat, since it works on file level: even if a split brain occurs, I/O is only blocked for the files which are actually affected. This is – of course and by design – a huge advantage over shared block devices, which block access to the entire device once a split brain occurs.
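During such tests, the self-heal and split-brain state can be inspected from any of the servers – roughly like this, assuming GlusterFS 3.3 or newer and a volume named testvol:

```shell
# List files currently queued for (or needing) self-heal
gluster volume heal testvol info

# List only the files that are actually in split brain
gluster volume heal testvol info split-brain
```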
The only disadvantage I ran into occurs when a GlusterFS service goes down but the machine hosting the service is still up: you might run into a (configurable) network timeout of roughly 45 seconds until the client fails over to the other server. This is by design, but it might have a bad impact on apps that need close-to-instant access to files. For such cases it might make sense to bring in STONITH or to add an LVS or similar in front of the service.
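The timeout in question is the client-side ping timeout, which defaults to 42 seconds and can be tuned per volume. Something like the following should lower it – the value of 10 seconds is just an example, and setting it too low risks spurious disconnects:

```shell
# Lower the client ping timeout from the default 42 seconds
gluster volume set testvol network.ping-timeout 10

# Check the volume options, including the new timeout
gluster volume info testvol
```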
Unfortunately I wasn’t able to perform usable performance tests. This was a virtualized setup on a laptop after all, so I cannot really say how the performance of GlusterFS is under heavy load, with large amounts of files, etc.
GlusterFS is an impressive distributed file system with many features, and it is really easy to use: the setup is much simpler than with AFS or Ceph, and you don’t have to worry about the complex and fragile setups involving cluster file systems and an outdated DLM.
So GlusterFS seems to be about perfect for sharing files among many clients while also providing HA features in an environment where you are not able or willing to use a SAN.
What is left now is to figure out how well GlusterFS works as a backend for storing VM images: it is said that the performance of GlusterFS in version 3.3 was greatly improved with respect to the usual access pattern of virtual machine images, many reads and writes on a single, large file. This is also where the gap between GlusterFS and Ceph is often said to be largest, since Ceph can also export block devices.
But I haven’t tested that yet, that’s one for the next test run.