lenain.info • Scalable network filesystem with Gluster

A French version of this post is available.


Before starting to reconfigure my home infrastructure, I realized that I needed shared content between my two main servers. This may not seem obvious at first sight, but you will understand why in the following posts.

I wanted distributed, replicated data. Kinda like a RAID 1 network filesystem.

I had already heard about Gluster some time ago but never had the chance to use it. Let's give it a shot!

Gluster is a free and open source scalable network filesystem.

Let's review some terms that we'll use in this post:

The Trusted pool describes all the hosts in the Gluster cluster.
A Node or Peer is one of the servers of the cluster.
A Brick is a filesystem used by Gluster as storage.
A Volume is a collection of one or more Bricks.

We will set up a Gluster cluster made of three nodes, each hosting one brick. These bricks will be part of a single volume where data will be replicated.

Sample node 1 will be ryokan.onsen.lan, IP 192.168.0.1
Sample node 2 will be uchiyu.onsen.lan, IP 192.168.0.2
Sample node 3 will be buro.onsen.lan, IP 192.168.0.7
Sample Gluster Volume will be onsen-gv0

Install all the things

Our three nodes run Debian. Installing Gluster is easy:

$ sudo apt install glusterfs-server

Managing the Gluster service

We want the Gluster service to start automatically at server boot. Let's enable the service and start it right now.

On all servers:

$ sudo systemctl enable glusterd
$ sudo systemctl start glusterd

Managing the trusted pool

Gluster uses a pool of trusted nodes communicating with each other to share data. We'll make every server recognize its peers.

On all servers, mind the IP:

$ sudo gluster peer probe 192.168.0.2
Probe successful

We can check the peer status and view the peer list of the trusted pool:

$ sudo gluster peer status
Number of Peers: 2

Hostname: 192.168.0.2
Uuid: 98b75cb8-12d3-477d-b2e8-a54e1df926ed
State: Peer in Cluster (Connected)

Hostname: 192.168.0.7
Uuid: bfb6a644-21f2-440d-920d-317a0cbdd836
State: Peer in Cluster (Connected)
$ sudo gluster pool list
UUID                    Hostname    State
98b75cb8-12d3-477d-b2e8-a54e1df926ed    192.168.0.2 Connected
bfb6a644-21f2-440d-920d-317a0cbdd836    192.168.0.7 Connected
406cdbd7-f748-40f4-850d-2230aa1f5431    localhost   Connected
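
If you want to script that check rather than eyeball it, here is a minimal sketch. It assumes the three-column layout shown above with a single header line, and disconnected_peers is a name I made up for illustration, not a gluster command.

```shell
# Count pool members whose State column is not "Connected".
# Reads `gluster pool list` output on stdin; assumes the layout shown
# above (UUID, Hostname, State) with one header line.
disconnected_peers() {
    awk 'NR > 1 && NF >= 3 && $NF != "Connected" { n++ } END { print n + 0 }'
}

# Usage on a node (requires the gluster CLI):
#   sudo gluster pool list | disconnected_peers
```

The function prints 0 when every listed member is connected, so it is easy to use in a cron job or a monitoring probe.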

Setting up storage

Gluster works with bricks. Each node should have a brick so it can be used in a volume. Let's create the required filesystems for the bricks on all nodes.

Our three nodes have an LVM volume group with remaining unallocated space. We'll create one logical volume on each node and format it as a btrfs filesystem.

Gluster recommends XFS but any filesystem with extended attributes will do. We could also set up a thinly provisioned logical volume and then a thin volume, but I wanted to keep things simple at the moment.

We then add the created filesystem to each node's /etc/fstab, mount it and create a subdirectory to host the brick data. We'll follow Gluster's brick naming convention. Rather than using /data I decided to use /srv in the FHS because I consider this brick data to be site-specific data.

On all servers, mind the hostname:

$ sudo lvcreate -L 1G -n lvgluster vgryokan
$ sudo mkfs.btrfs /dev/mapper/vgryokan-lvgluster
$ sudo mkdir -p /srv/glusterfs
$ echo "/dev/mapper/vgryokan-lvgluster /srv/glusterfs btrfs defaults 0 0" | sudo tee -a /etc/fstab
$ sudo mount /srv/glusterfs
$ sudo mkdir -p /srv/glusterfs/onsen-gv0/brick1
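
One thing that can go wrong here: if the mount failed silently, the brick directory ends up on the root filesystem instead of the dedicated logical volume. Below is a small guard, as a sketch only. make_brick_dir is a hypothetical helper of mine, and it assumes mountpoint(1) from util-linux is installed.

```shell
# Create the brick directory only if its parent filesystem is mounted.
# Both arguments are illustrative; mountpoint(1) comes from util-linux.
make_brick_dir() {
    fs_mount=$1
    brick_dir=$2
    if ! mountpoint -q "$fs_mount"; then
        echo "refusing: $fs_mount is not a mount point" >&2
        return 1
    fi
    mkdir -p "$brick_dir"
}

# Usage (as root):
#   make_brick_dir /srv/glusterfs /srv/glusterfs/onsen-gv0/brick1
```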

Setting up volume

We create a replicated volume using the created bricks and start it. With three bricks and a replica count of 3, every brick holds a full copy of the data.

On a single node:

$ sudo gluster volume create onsen-gv0 replica 3 192.168.0.1:/srv/glusterfs/onsen-gv0/brick1 192.168.0.2:/srv/glusterfs/onsen-gv0/brick1 192.168.0.7:/srv/glusterfs/onsen-gv0/brick1
volume create: onsen-gv0: success: please start the volume to access data
$ sudo gluster volume start onsen-gv0
volume start: onsen-gv0: success
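
Typing three long brick arguments by hand is error-prone, so here is a sketch of how the list could be generated instead. brick_specs is a hypothetical helper, not part of the gluster CLI, and it assumes every node uses the same brick path, as in this post.

```shell
# Print one "host:path" brick argument per host, space separated.
# brick_specs is a hypothetical helper, not part of the gluster CLI.
brick_specs() {
    path=$1
    shift
    specs=""
    for host in "$@"; do
        specs="${specs:+$specs }$host:$path"
    done
    printf '%s\n' "$specs"
}

# Usage on a single node (requires the gluster CLI):
#   sudo gluster volume create onsen-gv0 replica 3 \
#       $(brick_specs /srv/glusterfs/onsen-gv0/brick1 192.168.0.1 192.168.0.2 192.168.0.7)
#   sudo gluster volume info onsen-gv0
```

The command substitution is left unquoted on purpose so that each brick becomes its own argument. Running gluster volume info afterwards shows the bricks the volume actually ended up with.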

Accessing the volume content

To access the volume content, we need to mount a filesystem of type glusterfs. We create a mount point directory and mount the volume on it.

On all servers:

$ sudo mkdir -p /srv/glusterfs/mnt
$ sudo mount -t glusterfs localhost:/onsen-gv0 /srv/glusterfs/mnt

We finally create a file in the newly mounted volume and check on other systems that the file is correctly replicated.

$ sudo touch /srv/glusterfs/mnt/test # Replicated on the other nodes !

Mounting the Gluster volume at boot time

As we want the volume data to be available as soon as possible, we add the following line to each node's /etc/fstab:

localhost:/onsen-gv0 /srv/glusterfs/mnt glusterfs defaults,_netdev 0 0

We use the following option:

_netdev : The filesystem resides on a device that requires network access (used to prevent the system from attempting to mount these filesystems until the network has been enabled on the system).
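
Since this is done on every node, and possibly more than once while experimenting, it helps to append the line only when it is missing. A minimal sketch follows; append_once is a name I picked for illustration, and taking the file as an argument lets you try it on a scratch copy before touching the real /etc/fstab.

```shell
# Append a line to a file only if that exact line is not already there.
# The file argument lets you try this on a scratch copy before /etc/fstab.
append_once() {
    line=$1
    file=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (as root):
#   append_once "localhost:/onsen-gv0 /srv/glusterfs/mnt glusterfs defaults,_netdev 0 0" /etc/fstab
#   mount /srv/glusterfs/mnt
```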

Rumble of thunder

Here are some operational issues I encountered, and how I dealt with them.

Brick is already part of a volume

When I tried to add a brick that was previously used in another volume, I got this error:

failed: /srv/glusterfs/onsen-gv0/brick1/ is already part of a volume

To reuse it, I had to remove Gluster specific attributes and metadata on the brick:

$ sudo setfattr -x trusted.glusterfs.volume-id /srv/glusterfs/onsen-gv0/brick1/
$ sudo setfattr -x trusted.gfid /srv/glusterfs/onsen-gv0/brick1/
$ sudo rm -Rf /srv/glusterfs/onsen-gv0/brick1/.glusterfs/
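
Because the last command is a recursive delete run as root, I find it safer to print the commands first and read them before running anything. The sketch below does only that. brick_reset_commands is a hypothetical helper, and it assumes the brick path contains no spaces or shell metacharacters.

```shell
# Print, without running them, the commands that strip Gluster metadata
# from a brick directory. brick_reset_commands is a hypothetical helper.
# Assumes the path has no spaces or shell metacharacters.
brick_reset_commands() {
    brick=${1:-}
    brick=${brick%/}    # drop one trailing slash, so "/" becomes empty
    if [ -z "$brick" ]; then
        echo "refusing: empty or root brick path" >&2
        return 1
    fi
    printf 'setfattr -x trusted.glusterfs.volume-id %s\n' "$brick"
    printf 'setfattr -x trusted.gfid %s\n' "$brick"
    printf 'rm -Rf %s/.glusterfs\n' "$brick"
}

# Usage: review the output first, then pipe it to a root shell.
#   brick_reset_commands /srv/glusterfs/onsen-gv0/brick1/
#   brick_reset_commands /srv/glusterfs/onsen-gv0/brick1/ | sudo sh
```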

Errors when trying to remove a brick from a replicated volume

After adding ofuro.onsen.lan as the 4th node, to replace buro.onsen.lan, I got these messages when I tried to remove the buro.onsen.lan brick.

$ sudo gluster volume remove-brick onsen-gv0 192.168.0.7:/srv/glusterfs/onsen-gv0/brick1 start
Running remove-brick with cluster.force-migration enabled can result in data corruption. It is safer to disable this option so that files that receive writes during migration are not migrated.
Files that are not migrated can then be manually copied after the remove-brick commit operation.
Do you want to continue with your current cluster.force-migration settings? (y/n) y
...

But my volume hasn't been set up with cluster.force-migration:

$ sudo gluster volume get onsen-gv0 force-migration
Option                                  Value
------                                  -----
cluster.force-migration                 off
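
To check the option from a script, for instance before answering that prompt, the value can be pulled out of the table. This is a sketch that assumes the two-column layout shown above and a value without spaces; option_value is a made-up name, not a gluster subcommand.

```shell
# Extract the value of one option from `gluster volume get` output on stdin.
# Assumes the two-column "Option  Value" layout shown above.
option_value() {
    awk -v opt="$1" '$1 == opt { print $2 }'
}

# Usage (requires the gluster CLI):
#   sudo gluster volume get onsen-gv0 force-migration | option_value cluster.force-migration
```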

An issue is open about this, and a rewording has been proposed. However, in my opinion, it would be more user-friendly to show this message only when the volume has the option set.

...

volume remove-brick start: failed: Removing bricks from replicate configuration is not allowed without reducing replica count explicitly.

Obviously, if we remove a brick from a 4-brick replicated configuration, we have to specify that we want to reduce the number of replicas too.

Let's try again:

$ sudo gluster volume remove-brick onsen-gv0 replica 3 192.168.0.7:/srv/glusterfs/onsen-gv0/brick1 start
...
volume remove-brick start: failed: Migration of data is not needed when reducing replica count. Use the 'force' option

Well... mmh okay? 🤷 Gluster, why don't you tell us these things first? Or, better, can't you just handle that yourself?

$ sudo gluster volume remove-brick onsen-gv0 replica 3 192.168.0.7:/srv/glusterfs/onsen-gv0/brick1 force
Remove-brick force will not migrate files from the removed bricks, so they will no longer be available on the volume.
Do you want to continue? (y/n) y
volume remove-brick commit force: success