Use Vagrant, Foreman, and Puppet to provision and configure HAProxy as a reverse proxy and load-balancer for a cluster of Apache web servers.
Introduction
In this post, we will use several technologies, including Vagrant, Foreman, and Puppet, to provision and configure a basic load-balanced web server environment. In this environment, a single node with HAProxy will act as a reverse proxy and load-balancer for two identical Apache web server nodes. All three nodes will be provisioned and bootstrapped using Vagrant, from a Linux CentOS 6.5 Vagrant Box. Afterwards, Foreman, with Puppet, will be used to install and configure the nodes with HAProxy and Apache, using a series of Puppet modules.
For this post, I will assume you already have running instances of Vagrant with the vagrant-hostmanager plugin, VirtualBox, and Foreman. If you are unfamiliar with Vagrant, the vagrant-hostmanager plugin, VirtualBox, Foreman, or Puppet, review my recent post, Installing Foreman and Puppet Agent on Multiple VMs Using Vagrant and VirtualBox. That post demonstrates how to install and configure Foreman. It also demonstrates how to provision and bootstrap virtual machines using Vagrant and VirtualBox. Basically, we will be repeating many of these same steps in this post, with the addition of HAProxy, Apache, and some custom configuration Puppet modules.
All code for this post is available on GitHub. However, it has been updated as of 8/23/2015. Changes were required to fix compatibility issues with the latest versions of Puppet 4.x and Foreman. Additionally, the version of CentOS on all VMs was updated from 6.6 to 7.1 and the version of Foreman was updated from 1.7 to 1.9.
Steps
Here is a high-level overview of our steps in this post:
- Provision and configure the three CentOS-based virtual machines (‘nodes’) using Vagrant and VirtualBox
- Install the HAProxy and Apache Puppet modules, from Puppet Forge, onto the Foreman server
- Install the custom HAProxy and Apache Puppet configuration modules, from GitHub, onto the Foreman server
- Import the four new modules’ classes to Foreman’s Puppet class library
- Add the three new virtual machines (‘hosts’) to Foreman
- Configure the new hosts in Foreman, assigning the appropriate Puppet classes
- Apply the Foreman Puppet configurations to the new hosts
- Test that HAProxy is working as a reverse proxy and load-balancer for the two Apache web server nodes
In this post, I will use the terms ‘virtual machine’, ‘machine’, ‘node’, ‘agent node’, and ‘host’ interchangeably, based on each software’s own nomenclature.
Provisioning
First, using the process described in the previous post, provision and bootstrap the three new virtual machines. The new machines’ Vagrant configuration is shown below. This should be added to the JSON configuration file. All code for the earlier post is available on GitHub.
    {
      "nodes": {
        "haproxy.example.com": {
          ":ip": "192.168.35.101",
          "ports": [],
          ":memory": 512,
          ":bootstrap": "bootstrap-node.sh"
        },
        "node01.example.com": {
          ":ip": "192.168.35.121",
          "ports": [],
          ":memory": 512,
          ":bootstrap": "bootstrap-node.sh"
        },
        "node02.example.com": {
          ":ip": "192.168.35.122",
          "ports": [],
          ":memory": 512,
          ":bootstrap": "bootstrap-node.sh"
        }
      }
    }
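Before running Vagrant, it can help to sanity-check the node definitions. The following is a minimal sketch in Python, not part of the post's repository: `summarize_nodes` is a hypothetical helper, and the JSON is pasted inline rather than read from the configuration file, whose name is not assumed here. It parses the definitions above, rejects malformed or duplicate IP addresses, and prints a short summary.

```python
import ipaddress
import json

# Node definitions copied from the Vagrant JSON configuration shown above.
NODES_JSON = """
{
  "nodes": {
    "haproxy.example.com": {
      ":ip": "192.168.35.101", "ports": [], ":memory": 512,
      ":bootstrap": "bootstrap-node.sh"
    },
    "node01.example.com": {
      ":ip": "192.168.35.121", "ports": [], ":memory": 512,
      ":bootstrap": "bootstrap-node.sh"
    },
    "node02.example.com": {
      ":ip": "192.168.35.122", "ports": [], ":memory": 512,
      ":bootstrap": "bootstrap-node.sh"
    }
  }
}
"""


def summarize_nodes(raw):
    """Return (hostname, ip, memory) tuples; raise on malformed or duplicate IPs."""
    nodes = json.loads(raw)["nodes"]
    seen = set()
    summary = []
    for hostname, settings in sorted(nodes.items()):
        # ip_address() raises ValueError if the string is not a valid address.
        ip = str(ipaddress.ip_address(settings[":ip"]))
        if ip in seen:
            raise ValueError(f"duplicate IP address: {ip}")
        seen.add(ip)
        summary.append((hostname, ip, settings[":memory"]))
    return summary


for hostname, ip, memory in summarize_nodes(NODES_JSON):
    print(f"{hostname:22} {ip:16} {memory} MB")
```

A duplicate address is an easy mistake to make when copying a node entry, and catching it here is cheaper than debugging a half-provisioned private network.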
After provisioning and bootstrapping, observe the three machines running in Oracle’s VM VirtualBox Manager.
Installing Puppet Forge Modules
The next task is to install the HAProxy and Apache Puppet modules on the Foreman server. This allows Foreman to have access to them. I chose the puppetlabs-haproxy HAProxy module and the puppetlabs-apache Apache module. Both modules were authored by Puppet Labs, and are available on Puppet Forge.
The exact commands to install the modules onto your Foreman server will depend on your Foreman environment configuration. In my case, I used the following two commands to install the two Puppet Forge modules into my ‘Production’ environment’s module directory.
    sudo puppet module install -i /etc/puppet/environments/production/modules puppetlabs-haproxy
    sudo puppet module install -i /etc/puppet/environments/production/modules puppetlabs-apache

    # confirm module installation
    puppet module list --modulepath /etc/puppet/environments/production/modules
Installing Configuration Modules
Next, install the HAProxy and Apache configuration Puppet modules on the Foreman server. Both modules are hosted in my GitHub repositories, and can be downloaded directly from GitHub and installed on the Foreman server from the command line. Again, the exact commands to install the modules onto your Foreman server will depend on your Foreman environment configuration. In my case, I used the following commands to install the two configuration modules into my ‘Production’ environment’s module directory. Also, notice I am downloading version 0.1.0 of both modules, the current version at the time of writing this post. Make sure to double-check for the latest versions of both modules before running the commands, and modify the commands if necessary.
    # run from your home directory, since the install commands read ~/v0.1.0.tar.gz
    cd ~

    # apache config module
    wget -N https://github.com/garystafford/garystafford-apache_example_config/archive/v0.1.0.tar.gz && \
      sudo puppet module install -i /etc/puppet/environments/production/modules ~/v0.1.0.tar.gz --force

    # haproxy config module
    wget -N https://github.com/garystafford/garystafford-haproxy_node_config/archive/v0.1.0.tar.gz && \
      sudo puppet module install -i /etc/puppet/environments/production/modules ~/v0.1.0.tar.gz --force

    # confirm module installation
    puppet module list --modulepath /etc/puppet/environments/production/modules
HAProxy Configuration
The HAProxy configuration module configures HAProxy’s /etc/haproxy/haproxy.cfg file. The single class in the module’s init.pp manifest is as follows:
    class haproxy_node_config () inherits haproxy {
      haproxy::listen { 'puppet00':
        collect_exported => false,
        ipaddress        => '*',
        ports            => '80',
        mode             => 'http',
        options          => {
          'option'  => ['httplog'],
          'balance' => 'roundrobin',
        },
      }

      Haproxy::Balancermember <<| listening_service == 'puppet00' |>>

      haproxy::balancermember { 'haproxy':
        listening_service => 'puppet00',
        server_names      => ['node01.example.com', 'node02.example.com'],
        ipaddresses       => ['192.168.35.121', '192.168.35.122'],
        ports             => '80',
        options           => 'check',
      }
    }
The resulting /etc/haproxy/haproxy.cfg file will have the following configuration added. It defines the two Apache web server nodes’ hostnames, IP addresses, and HTTP port. The configuration also defines the load-balancing method, ‘round-robin’ in our example. In this example, we are using layer 7 load-balancing (application layer, HTTP), as opposed to layer 4 load-balancing (transport layer, TCP). Either method will work for this example. The Puppet Labs HAProxy module’s documentation on Puppet Forge and HAProxy’s own documentation are both excellent starting points to understand how to configure HAProxy. We are barely scratching the surface of HAProxy’s capabilities in this brief example.
    listen puppet00
      bind *:80
      mode http
      balance roundrobin
      option httplog
      server node01.example.com 192.168.35.121:80 check
      server node02.example.com 192.168.35.122:80 check
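To confirm that Puppet produced the section you expected, you could pull out the balance method and backend servers programmatically. The following is a rough Python sketch, not part of either module: `parse_listen_block` is a hypothetical helper, and it assumes it is handed only the text of one `listen` section, like the one above, rather than the whole haproxy.cfg file.

```python
# The listen section from the resulting haproxy.cfg, as shown above.
HAPROXY_LISTEN_BLOCK = """\
listen puppet00
  bind *:80
  mode http
  balance roundrobin
  option httplog
  server node01.example.com 192.168.35.121:80 check
  server node02.example.com 192.168.35.122:80 check
"""


def parse_listen_block(text):
    """Collect the section name, balance method, and servers from one listen section."""
    parsed = {"name": None, "balance": None, "servers": {}}
    for line in text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue  # skip blank lines and comments
        keyword = words[0]
        if keyword == "listen":
            parsed["name"] = words[1]
        elif keyword == "balance":
            parsed["balance"] = words[1]
        elif keyword == "server":
            # Line shape: server <name> <address>:<port> [options...]
            parsed["servers"][words[1]] = {
                "address": words[2],
                "options": words[3:],
            }
    return parsed


config = parse_listen_block(HAPROXY_LISTEN_BLOCK)
print(config["name"], config["balance"])
for name, server in config["servers"].items():
    print(name, server["address"], server["options"])
```

A real haproxy.cfg also contains global and defaults sections, so this sketch would need the `listen` section extracted first. The `check` option on each server line is what enables HAProxy's health checks for that backend.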
Apache Configuration
The Apache configuration module creates a default web page in Apache’s docroot directory, /var/www/html/index.html. The single class in the module’s init.pp manifest is as follows:
The resulting /var/www/html/index.html file will look like the following. Observe that the Facter variables shown in the module manifest above have been replaced by the individual node’s hostname and IP address during application of the configuration by Puppet (i.e., ${fqdn} became node01.example.com).
Both of these Puppet modules were created specifically to configure HAProxy and Apache for this post. Unlike published modules on Puppet Forge, these two modules are very simple, and don’t necessarily represent the best practices and patterns for authoring Puppet Forge modules.
Importing into Foreman
After installing the new modules onto the Foreman server, we need to import them into Foreman. This is accomplished from the ‘Puppet classes’ tab, using the ‘Import from theforeman.example.com’ button. Once imported, the module classes are available to assign to host machines.
Add Host to Foreman
Next, add the three new hosts to Foreman. If you have questions on how to add the nodes to Foreman, start Puppet’s Certificate Signing Request (CSR) process on the hosts, sign the certificates, or perform other first-time tasks, refer to the previous post. That post explains this process in detail.
Configure the Hosts
Next, configure the HAProxy and Apache nodes with the necessary Puppet classes. In addition to the base module classes and configuration classes, I recommend adding git and ntp modules to each of the new nodes. These modules were explained in the previous post. Refer to the screen-grabs below for correct module classes to add, specific to HAProxy and Apache.
Agent Configuration and Testing the System
Once configurations are retrieved and applied by Puppet Agent on each node, we can test our reverse proxy load-balanced environment. To start, open a browser and load haproxy.example.com. You should see one of the two pages below. Refresh the page a few times. You should observe HAProxy routing you to one Apache web server node, and then the other, using HAProxy’s round-robin algorithm. You can differentiate the Apache web servers by the hostname and IP address displayed on the web page.
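If you would rather not refresh the browser by hand, the same check can be scripted. The following Python sketch is an illustration only: `fetch_page` and `tally_backends` are hypothetical helpers, and the approach assumes each node's page contains its fully qualified hostname, as described above. It requests the HAProxy URL repeatedly and counts which Apache node answered each time.

```python
from collections import Counter
from urllib.request import urlopen

URL = "http://haproxy.example.com/"
BACKENDS = ("node01.example.com", "node02.example.com")


def fetch_page(url):
    """Download a page and return its body as text."""
    with urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8", errors="replace")


def tally_backends(fetch, url=URL, requests=10, backends=BACKENDS):
    """Call fetch(url) repeatedly and count which backend's hostname appears."""
    counts = Counter()
    for _ in range(requests):
        body = fetch(url)
        # Assumption: the default page displays the serving node's hostname.
        served_by = next((name for name in backends if name in body), "unknown")
        counts[served_by] += 1
    return counts
```

From a machine that can resolve haproxy.example.com, calling `tally_backends(fetch_page, requests=20)` should report a roughly even split between the two nodes. Passing the fetch function in as an argument keeps the counting logic easy to try out with canned pages, without a running environment.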
After hitting HAProxy’s URL several times successfully, view HAProxy’s built-in Statistics Report page at http://haproxy.example.com/haproxy?stats. Note below that each of the two Apache nodes has been hit 44 times by HAProxy. This demonstrates the effectiveness of the reverse proxy and load-balancing features of HAProxy.
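HAProxy can also export these statistics as CSV, typically by appending `;csv` to the statistics URL, which makes the per-node counts easy to read from a script. The Python sketch below parses such an export. Treat it as an assumption-laden illustration: the sample text is made up and heavily abbreviated (a real export has many more columns), its values simply mirror the 44 hits described above, and `sessions_per_server` is a hypothetical helper.

```python
import csv
import io

# Made-up, abbreviated sample in the shape of HAProxy's CSV statistics export.
# 'stot' is the cumulative number of sessions handled by each row's entry.
SAMPLE_STATS_CSV = """\
# pxname,svname,scur,stot,status
puppet00,FRONTEND,0,88,OPEN
puppet00,node01.example.com,0,44,UP
puppet00,node02.example.com,0,44,UP
puppet00,BACKEND,0,88,UP
"""


def sessions_per_server(csv_text, proxy="puppet00"):
    """Return {server name: total sessions} for one proxy, skipping summary rows."""
    header, _, rows = csv_text.partition("\n")
    # The export's header line starts with '# ', which is not part of a column name.
    fieldnames = header.lstrip("# ").strip().split(",")
    reader = csv.DictReader(io.StringIO(rows), fieldnames=fieldnames)
    return {
        row["svname"]: int(row["stot"])
        for row in reader
        if row["pxname"] == proxy and row["svname"] not in ("FRONTEND", "BACKEND")
    }


for server, total in sessions_per_server(SAMPLE_STATS_CSV).items():
    print(f"{server}: {total} sessions")
```

Columns are looked up by name rather than position, so the same function should cope with the full set of columns in a real export.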
Accessing Apache Directly
If you are testing HAProxy from the same machine on which you created the virtual machines (VirtualBox host), you will likely be able to directly access either of the Apache web servers (e.g., node02.example.com). The VirtualBox host’s hosts file contains the IP addresses and hostnames of all three hosts. This hostname configuration was done automatically by the vagrant-hostmanager plugin. However, in an actual Production environment, only the HAProxy server’s hostname and IP address would be publicly accessible to a user. The two Apache nodes would sit behind a firewall, accessible only by the HAProxy server. HAProxy acts as a façade to the public side of the network.
Testing Apache Host Failure
The main reason you would likely use a load-balancer is high-availability. With HAProxy acting as a load-balancer, we should be able to impair one of the two Apache nodes, without noticeable disruption. HAProxy will continue to serve content from the remaining Apache web server node.
Log into node01.example.com using the following command, vagrant ssh node01.example.com. To simulate an impairment on ‘node01’, run the following command to stop Apache, sudo service httpd stop. Now, refresh the haproxy.example.com URL in your web browser. You should notice HAProxy is now sending all traffic to node02.example.com.
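The behavior you just observed can be modeled in a few lines. The following Python sketch is a simplified illustration of round-robin selection combined with health checks. It is not HAProxy's actual implementation, and `pick_servers` is a hypothetical name. Servers marked as down are skipped, so the remaining server receives every request.

```python
SERVERS = ["node01.example.com", "node02.example.com"]


def pick_servers(servers, healthy, request_count):
    """Rotate through servers in order, skipping any that are marked down."""
    picks = []
    index = 0
    for _ in range(request_count):
        # Try each server at most once per request, continuing the rotation.
        for _ in range(len(servers)):
            candidate = servers[index % len(servers)]
            index += 1
            if healthy[candidate]:
                picks.append(candidate)
                break
        else:
            # Simplification: a real proxy would answer with an error response.
            raise RuntimeError("no healthy servers available")
    return picks


both_up = {"node01.example.com": True, "node02.example.com": True}
node01_down = {"node01.example.com": False, "node02.example.com": True}

print(pick_servers(SERVERS, both_up, 4))      # alternates between the two nodes
print(pick_servers(SERVERS, node01_down, 4))  # only node02 is chosen
```

Starting Apache on ‘node01’ again corresponds to flipping its entry back to `True`, after which the rotation resumes across both nodes.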
Troubleshooting
While troubleshooting HAProxy configuration issues for this demonstration, I discovered logging is not configured by default on CentOS. No worries; I recommend HAProxy: Give me some logs on CentOS 6.5!, by Stephane Combaudon, to get logging running. Once logging is active, you can more easily troubleshoot HAProxy and Apache configuration issues. Here are some example commands you might find useful:
    # haproxy
    sudo more -f /var/log/haproxy.log
    sudo haproxy -f /etc/haproxy/haproxy.cfg -c # check/validate config file

    # apache
    sudo ls -1 /etc/httpd/logs/
    sudo tail -50 /etc/httpd/logs/error_log
    sudo less /etc/httpd/logs/access_log
Redundant Proxies
In this simple example, the system’s weakest point is obviously the single HAProxy instance. It represents a single-point-of-failure (SPOF) in our environment. In an actual production environment, you would likely have more than one instance of HAProxy. They may both be in a load-balanced pool, or one may be active and one on standby as a failover, should one instance become impaired. There are several techniques for building in proxy redundancy, often with the use of Virtual IP and Keepalived. Below is a list of articles that might help you take this post’s example to the next level.
- An Introduction to HAProxy and Load Balancing Concepts
- Install HAProxy and Keepalived (Virtual IP)
- Redundant Load Balancers – HAProxy and Keepalived
- Howto setup a haproxy as fault tolerant / high available load balancer for multiple caching web proxies on RHEL/Centos/SL
- Keepalived Module on Puppet Forge: arioch/keepalived, by Tom De Vylder