The following is an excerpt from Chapter 8 of Ansible for DevOps, a book on Ansible by Jeff Geerling.
Modern infrastructure often involves some amount of horizontal scaling; instead of having one giant server, with one storage volume, one database, one application instance, etc., most apps use two, four, ten, or dozens of servers.
Many applications can be scaled horizontally with ease, but what happens when you need shared resources, like files, application code, or other transient data, to be shared on all the servers? And how do you have this data scale out with your infrastructure, in a fast but reliable way? There are many different approaches to synchronizing or distributing files across servers:
- Set up rsync either on cron or via inotify to synchronize smaller sets of files on a regular basis.
- Store everything in a code repository (e.g. Git, SVN, etc.) and deploy files to each server using Ansible.
- Have one large volume on a file server and mount it via NFS or some other file sharing protocol.
- Have one master SAN that's mounted on each of the servers.
- Use a distributed file system, like Gluster, Lustre, Fraunhofer, or Ceph.
Some options are easier to set up than others, and all have benefits—and drawbacks. Rsync, git, or NFS offer simple initial setup, and low impact on filesystem performance (in many scenarios). But if you need more flexibility and scalability, less network overhead, and greater fault tolerance, you will have to consider something that requires more configuration (e.g. a distributed file system) and/or more hardware (e.g. a SAN).
GlusterFS is licensed under the AGPL, has good documentation, and a fairly active support community (especially in the #gluster IRC channel). But for someone new to distributed file systems, it can be daunting to set it up the first time.
Configuring Gluster - Basic Overview
To get Gluster working on a basic two-server setup (so you can have one folder that's synchronized and replicated across the two servers—allowing one server to go down completely, and the other to still have access to the files), you need to do the following:
- Install Gluster server and client on each server, and start the server daemon.
- (On both servers) Create a 'brick' directory (where Gluster will store files for a given volume).
- (On both servers) Create a directory to be used as a mount point (a directory where you'll have Gluster mount the shared volume).
- (On both servers) Use gluster peer probe to have Gluster connect to the other server.
- (On one server) Use gluster volume create to create a new Gluster volume.
- (On one server) Use gluster volume start to start the new Gluster volume.
- (On both servers) Mount the Gluster volume (adding a record to /etc/fstab to make the mount permanent). (See the command sketch after this list.)
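If you were doing all of this by hand, the sequence would look roughly like the sketch below. It reuses the hostnames, brick path, and volume name from the Ansible example later in this excerpt; treat it as illustrative, since your own paths and names may differ:

# On both servers: create the brick and mount point directories.
$ sudo mkdir -p /srv/gluster/brick /mnt/gluster

# On gluster1: connect to the second server.
$ sudo gluster peer probe gluster2

# On gluster1: create and start a two-replica volume named 'gluster'
# ('force' permits bricks on the root filesystem).
$ sudo gluster volume create gluster replica 2 \
    gluster1:/srv/gluster/brick gluster2:/srv/gluster/brick force
$ sudo gluster volume start gluster

# On both servers: mount the volume (and add a matching /etc/fstab
# record to make the mount permanent).
$ sudo mount -t glusterfs gluster1:/gluster /mnt/gluster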
Additionally, you need to make sure you have the following ports open on both servers (so Gluster can communicate): TCP ports 111, 24007-24011, 49152-49153, and UDP port 111. (You need to add an additional TCP port in the 49xxx range for each extra server in your Gluster cluster.)
Configuring Gluster with Ansible
For demonstration purposes, we'll set up a simple two-server infrastructure using Vagrant, and create a shared volume between the two, with two replicas (meaning all files will be replicated on each server). As your infrastructure grows, you can set other options for data consistency and transport according to your needs.
To build the two-server infrastructure locally, create a folder gluster containing the following Vagrantfile:
# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant.configure("2") do |config|
  # Base VM OS configuration.
  config.vm.box = "geerlingguy/ubuntu1404"
  config.vm.synced_folder '.', '/vagrant', disabled: true
  config.ssh.insert_key = false

  config.vm.provider :virtualbox do |v|
    v.memory = 256
    v.cpus = 1
  end

  # Define two VMs with static private IP addresses.
  boxes = [
    { :name => "gluster1", :ip => "192.168.29.2" },
    { :name => "gluster2", :ip => "192.168.29.3" }
  ]

  # Provision each of the VMs.
  boxes.each do |opts|
    config.vm.define opts[:name] do |config|
      config.vm.hostname = opts[:name]
      config.vm.network :private_network, ip: opts[:ip]

      # Provision both VMs using Ansible after the last VM is booted.
      if opts[:name] == "gluster2"
        config.vm.provision "ansible" do |ansible|
          ansible.playbook = "playbooks/provision.yml"
          ansible.inventory_path = "inventory"
          ansible.limit = "all"
        end
      end
    end
  end
end
This configuration creates two servers, gluster1 and gluster2, and will run a playbook at playbooks/provision.yml on the servers defined in an inventory file in the same directory as the Vagrantfile.
Create the inventory file to help Ansible connect to the two servers:
[gluster]
192.168.29.2
192.168.29.3
[gluster:vars]
ansible_ssh_user=vagrant
ansible_ssh_private_key_file=~/.vagrant.d/insecure_private_key
Now, create a playbook named provision.yml inside a playbooks directory:
---
- hosts: gluster
  sudo: yes

  vars_files:
    - vars.yml

  roles:
    - geerlingguy.firewall
    - geerlingguy.glusterfs

  tasks:
    - name: Ensure Gluster brick and mount directories exist.
      file: "path={{ item }} state=directory mode=0775"
      with_items:
        - "{{ gluster_brick_dir }}"
        - "{{ gluster_mount_dir }}"

    - name: Configure Gluster volume.
      gluster_volume:
        state: present
        name: "{{ gluster_brick_name }}"
        brick: "{{ gluster_brick_dir }}"
        replicas: 2
        cluster: "{{ groups.gluster | join(',') }}"
        host: "{{ inventory_hostname }}"
        force: yes
      run_once: true

    - name: Ensure Gluster volume is mounted.
      mount:
        name: "{{ gluster_mount_dir }}"
        src: "{{ inventory_hostname }}:/{{ gluster_brick_name }}"
        fstype: glusterfs
        opts: "defaults,_netdev"
        state: mounted
This playbook uses two roles to set up a firewall and install the required packages for GlusterFS to work. You can manually install both of the required roles with the command ansible-galaxy install geerlingguy.firewall geerlingguy.glusterfs, or add them to a requirements.txt file and install them with ansible-galaxy install -r requirements.txt.
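For reference, older versions of ansible-galaxy accepted a plain-text requirements file listing one role per line, so a requirements.txt for this playbook would look something like this (newer Galaxy releases use a YAML requirements.yml format instead):

geerlingguy.firewall
geerlingguy.glusterfs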
Gluster requires a 'brick' directory to use as a virtual filesystem, and our servers also need a directory where the filesystem can be mounted, so the first file task ensures both directories exist (gluster_brick_dir and gluster_mount_dir). Since we need to use these directory paths more than once, we use variables which will be defined later, in vars.yml.
Ansible's gluster_volume module (added in Ansible 1.9) does all the hard work of probing peer servers, setting up the brick as a Gluster filesystem, and configuring the brick for replication. Some of the most important configuration parameters for the gluster_volume module include:
- state: Setting this to present makes sure the brick is present. By default it will also start the volume when it is first created, though this behavior can be overridden by the start_on_create option.
- name and brick give the Gluster brick a name and location on the server, respectively. In this example, the brick will be located on the boot volume, so we also have to add force: yes, or Gluster will complain about not having the brick on a separate volume.
- replicas tells Gluster how many replicas to ensure exist; this number can vary depending on how many servers you have in the brick's cluster, and how much tolerance you have for server outages. We won't get much into tuning GlusterFS for performance and resiliency, but most situations warrant a value of 2 or 3.
- cluster defines all the hosts which will contain the distributed filesystem. In this case, all the gluster servers in our Ansible inventory should be included, so we use a Jinja2 join filter to join all the addresses into a comma-separated string (here, 192.168.29.2,192.168.29.3).
- host sets the host for peer probing explicitly. If you don't set this, you can sometimes get errors on brick creation, depending on your network configuration.
We only need to run the gluster_volume module once for all the servers, so we add run_once: true.
The last task in the playbook uses Ansible's mount module to ensure the Gluster volume is mounted on each of the servers, in the gluster_mount_dir.
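With state: mounted, the mount module both mounts the volume immediately and persists it in /etc/fstab. As a sketch, the resulting fstab entry on the first server should look roughly like this (the source host matches each server's own inventory_hostname):

192.168.29.2:/gluster /mnt/gluster glusterfs defaults,_netdev 0 0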
After the playbook is created, we need to define all the variables used in the playbook. Create a vars.yml file inside the playbooks directory, with the following variables:
---
# Firewall configuration.
firewall_allowed_tcp_ports:
  - 22
  # For Gluster.
  - 111
  # Port-mapper for Gluster 3.4+.
  # - 2049
  # Gluster Daemon.
  - 24007
  # 24009+ for Gluster <= 3.3; 49152+ for Gluster 3.4+.
  - 24009
  - 24010
  # Gluster inline NFS server.
  - 38465
  - 38466
firewall_allowed_udp_ports:
  - 111

# Gluster configuration.
gluster_mount_dir: /mnt/gluster
gluster_brick_dir: /srv/gluster/brick
gluster_brick_name: gluster
This variables file should be pretty self-explanatory; all the ports required for Gluster are opened in the firewall, and the three Gluster-related variables we use in the playbook are defined.
Now that we have everything set up, the folder structure should look like this:
gluster/
  playbooks/
    provision.yml
    vars.yml
  inventory
  Vagrantfile
Change directory into the gluster directory, and run vagrant up. After a few minutes, provisioning should have completed successfully. To ensure Gluster is working properly, you can run the following two commands, which should give information about Gluster's peer connections and the configured gluster volume:
$ ansible gluster -i inventory -a "gluster peer status" -s
192.168.29.2 | success | rc=0 >>
Number of Peers: 1
Hostname: 192.168.29.3
Port: 24007
Uuid: 1340bcf1-1ae6-4e55-9716-2642268792a4
State: Peer in Cluster (Connected)
192.168.29.3 | success | rc=0 >>
Number of Peers: 1
Hostname: 192.168.29.2
Port: 24007
Uuid: 63d4a5c8-6b27-4747-8cc1-16af466e4e10
State: Peer in Cluster (Connected)
$ ansible gluster -i inventory -a "gluster volume info" -s
192.168.29.3 | success | rc=0 >>
Volume Name: gluster
Type: Replicate
Volume ID: b75e9e45-d39b-478b-a642-ccd16b7d89d8
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 192.168.29.2:/srv/gluster/brick
Brick2: 192.168.29.3:/srv/gluster/brick
192.168.29.2 | success | rc=0 >>
Volume Name: gluster
Type: Replicate
Volume ID: b75e9e45-d39b-478b-a642-ccd16b7d89d8
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 192.168.29.2:/srv/gluster/brick
Brick2: 192.168.29.3:/srv/gluster/brick
You can also do the following to confirm that files are being replicated/distributed correctly:
1. Log into the first server: vagrant ssh gluster1
2. Create a file in the mounted gluster volume: sudo touch /mnt/gluster/test
3. Log out of the first server: exit
4. Log into the second server: vagrant ssh gluster2
5. List the contents of the gluster directory: ls /mnt/gluster
You should see the test file you created in step 2; this means Gluster is working correctly!
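To check both servers in one step, you could also run an ad-hoc command like the following sketch (reusing the same inventory file); the test file should appear in the output for both hosts:

$ ansible gluster -i inventory -a "ls /mnt/gluster" -s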
Summary
Deploying distributed file systems like Gluster can seem challenging, but Ansible simplifies the process, and more importantly, does so idempotently; each time you run the playbook again, it will ensure everything stays configured as you've set it.
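For example, once the cluster is converged, re-running the provisioner should result in mostly 'ok' (unchanged) tasks rather than 'changed' ones:

$ vagrant provision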
This example Gluster configuration can be found in its entirety on GitHub, in the Gluster example in the Ansible Vagrant Examples project.
Read Ansible for DevOps, available on LeanPub.
Comments
It seems there are some problems copy-pasting from the example scripts; greater-than arrows are converted to their HTML equivalents:
syntax error, unexpected '=', expecting =>
{ :name => "gluster1", :ip => "192.168.29.2" },
It would be nice to update the example code.
Thanks for the tutorial!
Sorry about that! It was indeed a copy/paste issue. I've updated the examples in the blog post.
Hi, thanks for such useful content.
How can we add multiple bricks via the gluster_volume module?
Hi,
I'm getting an error during TASK [geerlingguy.glusterfs : Ensure PPA for GlusterFS is present.]:
Failed to fetch http://ppa.launchpad.net/gluster/glusterfs-3.13/ubuntu/dists/trusty/mai… 404 Not Found
I see that http://ppa.launchpad.net/gluster/glusterfs-3.13/ubuntu/dists is missing trusty, but contains the other releases. Any feedback on how to solve this would be appreciated.
Nevermind, just noticed the Git link with updated example!