A French version of this post is available.
Before starting to reconfigure my home infrastructure, I realized I needed shared content between my two main servers. This may not seem obvious at first sight, but you will understand why in the following posts.
I wanted distributed, replicated data. Kinda like a RAID 1 network filesystem.
I had already heard about Gluster some time ago but never had the chance to use it. Let's give it a shot!
Gluster is a free and open source scalable network filesystem
Let's review some terms that we'll use in this post:
- A Trusted pool describes all the hosts in the Gluster cluster.
- A Node or Peer is one of the servers of the cluster.
- A Brick is a filesystem used by Gluster as storage.
- A Volume is a collection of one or more Bricks.
We will set up a Gluster cluster made of three nodes, each of which will host one brick. These bricks will be part of a single volume where data will be replicated.
- Sample node 1 will be ryokan.onsen.lan, IP 192.168.0.1
- Sample node 2 will be uchiyu.onsen.lan, IP 192.168.0.2
- Sample node 3 will be buro.onsen.lan, IP 192.168.0.7
- Sample Gluster volume will be onsen-gv0
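If your local DNS does not already resolve these names, a minimal /etc/hosts sketch, replicated on each node and using the sample names and IPs above, could look like this:
# /etc/hosts (excerpt), same on every node
192.168.0.1 ryokan.onsen.lan ryokan
192.168.0.2 uchiyu.onsen.lan uchiyu
192.168.0.7 buro.onsen.lan buro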
Install all the things
Our three nodes run Debian as their operating system. Installing Gluster is easy:
$ sudo apt install glusterfs-server
Managing the Gluster service
We want the Gluster service to start automatically at server boot. Let's enable the service and start it right now.
On all servers:
$ sudo systemctl enable glusterd
$ sudo service glusterd start
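To make sure the daemon really came up, a quick sanity check (purely optional) could be:
$ sudo systemctl status glusterd
$ sudo gluster --version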
Managing the trusted pool
Gluster uses a pool of trusted nodes communicating with each other to share data. We'll make the servers recognize their peers.
On all servers, mind the IP:
$ sudo gluster peer probe 192.168.0.2
Probe successful
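Note that probing from a single node is enough; pool membership then propagates to the other peers. From ryokan, the full set of probes would roughly be:
$ sudo gluster peer probe 192.168.0.2
$ sudo gluster peer probe 192.168.0.7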
We can check the peer status and view the peer list of the trusted pool:
$ sudo gluster peer status
Number of Peers: 1
Hostname: 192.168.0.2
Uuid: 98b75cb8-12d3-477d-b2e8-a54e1df926ed
State: Peer in Cluster (Connected)
Hostname: 192.168.0.7
Uuid: bfb6a644-21f2-440d-920d-317a0cbdd836
State: Peer in Cluster (Connected)
$ sudo gluster pool list
UUID Hostname State
98b75cb8-12d3-477d-b2e8-a54e1df926ed 192.168.0.2 Connected
bfb6a644-21f2-440d-920d-317a0cbdd836 192.168.0.7 Connected
406cdbd7-f748-40f4-850d-2230aa1f5431 localhost Connected
Setting up storage
Gluster works with bricks. Each node should have a brick so it can be used in a volume. Let's create the required filesystems for the bricks on all nodes.
Our three nodes have an LVM volume group with remaining unallocated space. We'll create a logical volume on each node and format it as a btrfs filesystem. Gluster recommends XFS, but any filesystem with extended attributes will do. We could also have set up a thinly provisioned pool and then a thin volume, but I wanted to keep things simple at the moment.
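For reference, a rough sketch of that alternative, thin-provisioned XFS route; the pool name gluster_thinpool and the sizes are arbitrary, and vgryokan is the volume group used in the commands further down:
$ sudo lvcreate -L 5G -T vgryokan/gluster_thinpool # create a thin pool
$ sudo lvcreate -V 1G -T vgryokan/gluster_thinpool -n lvgluster # thin volume inside the pool
$ sudo mkfs.xfs -i size=512 /dev/mapper/vgryokan-lvgluster # XFS with a larger inode size, as the Gluster docs suggest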
We then add the created filesystems to each node's /etc/fstab, mount them, and create a subdirectory to host the brick data. We'll follow the Gluster brick naming convention. Rather than using /data, I decided to use /srv from the FHS, because I consider this brick data to be site-specific data.
On all servers, mind the hostname:
$ sudo lvcreate -L 1G -n lvgluster vgryokan
$ sudo mkfs.btrfs /dev/mapper/vgryokan-lvgluster
$ sudo mkdir -p /srv/glusterfs
$ echo "/dev/mapper/vgryokan-lvgluster /srv/glusterfs btrfs defaults 0 0" | sudo tee -a /etc/fstab
$ sudo mount /srv/glusterfs
$ sudo mkdir -p /srv/glusterfs/onsen-gv0/brick1
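A quick check that the brick filesystem is mounted where expected and that the fstab entry parses (nothing Gluster-specific here):
$ findmnt /srv/glusterfs
$ sudo mount -a # should be a silent no-op if the fstab line is correct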
Setting up volume
We create a replicated volume (replica 3) using the created bricks and start it.
On a single node:
$ sudo gluster volume create onsen-gv0 replica 3 192.168.0.1:/srv/glusterfs/onsen-gv0/brick1 192.168.0.2:/srv/glusterfs/onsen-gv0/brick1 192.168.0.7:/srv/glusterfs/onsen-gv0/brick1
volume create: onsen-gv0: success: please start the volume to access data
$ sudo gluster volume start onsen-gv0
volume start: onsen-gv0: success
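Optionally, the volume layout and brick status can be double-checked at this point (output omitted here):
$ sudo gluster volume info onsen-gv0
$ sudo gluster volume status onsen-gv0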
Accessing the volume content
To access the volume content, we need to mount a filesystem of type glusterfs. We create a mount point directory and mount the volume on it.
On all servers:
$ sudo mkdir -p /srv/glusterfs/mnt
$ sudo mount -t glusterfs localhost:onsen-gv0 /srv/glusterfs/mnt
We finally create a file in the newly mounted volume and check on other systems that the file is correctly replicated.
$ sudo touch /srv/glusterfs/mnt/test # Replicated on the other nodes!
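For example, on uchiyu the file should show up both through the Gluster mount and directly inside the brick directory (the brick should only ever be read directly, never written to):
$ ls -l /srv/glusterfs/mnt/test
$ ls -l /srv/glusterfs/onsen-gv0/brick1/test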
Mounting the Gluster volume at boot time
As we want the volume data to be available as soon as possible, we add the following line to each node's /etc/fstab:
localhost:/onsen-gv0 /srv/glusterfs/mnt glusterfs defaults,_netdev 0 0
We use the following options:
- _netdev: The filesystem resides on a device that requires network access (used to prevent the system from attempting to mount the filesystem until the network has been enabled on the system).
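If the mount ever races against glusterd at boot (the volume not being ready yet when fstab is processed), a variant worth trying, though I did not need it here, is to let systemd mount it on first access:
localhost:/onsen-gv0 /srv/glusterfs/mnt glusterfs defaults,_netdev,noauto,x-systemd.automount 0 0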
Rumble of thunder
Here are some operational things or issues I encountered, and how I dealt with them.
Brick is already part of a volume
When I tried to add a brick that was previously used in another volume, I got this error:
failed: /srv/glusterfs/onsen-gv0/brick1/ is already part of a volume
To reuse it, I had to remove Gluster specific attributes and metadata on the brick:
$ sudo setfattr -x trusted.glusterfs.volume-id /srv/glusterfs/onsen-gv0/brick1/
$ sudo setfattr -x trusted.gfid /srv/glusterfs/onsen-gv0/brick1/
$ sudo rm -Rf /srv/glusterfs/onsen-gv0/brick1/.glusterfs/
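Before wiping anything, the Gluster-specific extended attributes set on the brick can be inspected (getfattr comes from the attr package on Debian):
$ sudo getfattr -d -m . -e hex /srv/glusterfs/onsen-gv0/brick1/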
Errors when trying to remove a brick from a replicated volume
After adding ofuro.onsen.lan as the 4th node, to replace buro.onsen.lan, I got this message when I tried to remove the buro.onsen.lan brick.
$ sudo gluster volume remove-brick onsen-gv0 192.168.0.7:/srv/glusterfs/onsen-gv0/brick1 start
Running remove-brick with cluster.force-migration enabled can result in data corruption. It is safer to disable this option so that files that receive writes during migration are not migrated.
Files that are not migrated can then be manually copied after the remove-brick commit operation.
Do you want to continue with your current cluster.force-migration settings? (y/n) y
...
But my volume hasn't been set up with cluster.force-migration:
$ sudo gluster volume get onsen-gv0 force-migration
Option Value
------ -----
cluster.force-migration off
An issue is open here, and a rephrasing has been proposed here. However, in my opinion, it would be more user-friendly to show this message only when the volume actually has the option set.
...
volume remove-brick start: failed: Removing bricks from replicate configuration is not allowed without reducing replica count explicitly.
Obviously, if we remove a brick from a 4-brick replicated configuration, we have to specify that we want to reduce the number of replicas too.
Let's try again:
$ sudo gluster volume remove-brick onsen-gv0 replica 3 192.168.0.7:/srv/glusterfs/onsen-gv0/brick1 start
...
volume remove-brick start: failed: Migration of data is not needed when reducing replica count. Use the 'force' option
Well... mmh, okay? 🤷 Gluster, why don't you mention these things up front? Or, better, couldn't you just handle that yourself?
root@ryokan:~# gluster volume remove-brick onsen-gv0 replica 3 192.168.0.7:/srv/glusterfs/onsen-gv0/brick1 force
Remove-brick force will not migrate files from the removed bricks, so they will no longer be available on the volume.
Do you want to continue? (y/n) y
volume remove-brick commit force: success
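Once the brick is removed, the old node can also be dropped from the trusted pool; something along these lines should finish the cleanup:
$ sudo gluster peer detach 192.168.0.7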