Automate the Provisioning and Configuration of HAProxy and an Apache Web Server Cluster Using Foreman

Use Vagrant, Foreman, and Puppet to provision and configure HAProxy as a reverse proxy, load-balancer for a cluster of Apache web servers.

Simple Load Balanced 2

Introduction

In this post, we will use several technologies, including VagrantForeman, and Puppet, to provision and configure a basic load-balanced web server environment. In this environment, a single node with HAProxy will act as a reverse proxy and load-balancer for two identical Apache web server nodes. All three nodes will be provisioned and bootstrapped using Vagrant, from a Linux CentOS 6.5 Vagrant Box. Afterwards, Foreman, with Puppet, will then be used to install and configure the nodes with HAProxy and Apache, using a series of Puppet modules.

For this post, I will assume you already have running instances of Vagrant with the vagrant-hostmanager plugin, VirtualBox, and Foreman. If you are unfamiliar with Vagrant, the vagrant-hostmanager plugin, VirtualBox, Foreman, or Puppet, review my recent post, Installing Foreman and Puppet Agent on Multiple VMs Using Vagrant and VirtualBox. This post demonstrates how to install and configure Foreman. In addition, the post also demonstrates how to provision and bootstrap virtual machines using Vagrant and VirtualBox. Basically, we will be repeating many of this same steps in this post, with the addition of HAProxy, Apache, and some custom configuration Puppet modules.

All code for this post is available on GitHub. However, it been updated as of 8/23/2015. Changes were required to fix compatibility issues with the latest versions of Puppet 4.x and Foreman. Additionally, the version of CentOS on all VMs was updated from 6.6 to 7.1 and the version of Foreman was updated from 1.7 to 1.9.

Steps

Here is a high-level overview of our steps in this post:

  1. Provision and configure the three CentOS-based virtual machines (‘nodes’) using Vagrant and VirtualBox
  2. Install the HAProxy and Apache Puppet modules, from Puppet Forge, onto the Foreman server
  3. Install the custom HAProxy and Apache Puppet configuration modules, from GitHub, onto the Foreman server
  4. Import the four new module’s classes to Foreman’s Puppet class library
  5. Add the three new virtual machines (‘hosts’) to Foreman
  6. Configure the new hosts in Foreman, assigning the appropriate Puppet classes
  7. Apply the Foreman Puppet configurations to the new hosts
  8. Test HAProxy is working as a reverse and proxy load-balancer for the two Apache web server nodes

In this post, I will use the terms ‘virtual machine’, ‘machine’, ‘node’, ‘agent node’, and ‘host’, interchangeable, based on each software’s own nomenclature.

Provisioning

First, using the process described in the previous post, provision and bootstrap the three new virtual machines. The new machine’s Vagrant configuration is shown below. This should be added to the JSON configuration file. All code for the earlier post is available on GitHub.

{
  "nodes": {
    "haproxy.example.com": {
      ":ip": "192.168.35.101",
      "ports": [],
      ":memory": 512,
      ":bootstrap": "bootstrap-node.sh"
    },
    "node01.example.com": {
      ":ip": "192.168.35.121",
      "ports": [],
      ":memory": 512,
      ":bootstrap": "bootstrap-node.sh"
    },
    "node02.example.com": {
      ":ip": "192.168.35.122",
      "ports": [],
      ":memory": 512,
      ":bootstrap": "bootstrap-node.sh"
    }
  }
}

After provisioning and bootstrapping, observe the three machines running in Oracle’s VM VirtualBox Manager.

Oracle VM VirtualBox Manager View of New Nodes

Oracle VM VirtualBox Manager View of New Nodes

Installing Puppet Forge Modules

The next task is to install the HAProxy and Apache Puppet modules on the Foreman server. This allows Foreman to have access to them. I chose the puppetlabs-haproxy HAProxy module and the puppetlabs-apache Apache modules. Both modules were authored by Puppet Labs, and are available on Puppet Forge.

The exact commands to install the modules onto your Foreman server will depend on your Foreman environment configuration. In my case, I used the following two commands to install the two Puppet Forge modules into my ‘Production’ environment’s module directory.

sudo puppet module install -i /etc/puppet/environments/production/modules puppetlabs-haproxy
sudo puppet module install -i /etc/puppet/environments/production/modules puppetlabs-apache

# confirm module installation
puppet module list --modulepath /etc/puppet/environments/production/modules

Installing Configuration Modules

Next, install the HAProxy and Apache configuration Puppet modules on the Foreman server. Both modules are hosted on my GitHub repository. Both modules can be downloaded directly from GitHub and installed on the Foreman server, from the command line. Again, the exact commands to install the modules onto your Foreman server will depend on your Foreman environment configuration. In my case, I used the following two commands to install the two Puppet Forge modules into my ‘Production’ environment’s module directory. Also, notice I am currently downloading version 0.1.0 of both modules at the time of writing this post. Make sure to double-check for the latest versions of both modules before running the commands. Modify the commands if necessary.

# apache config module
wget -N https://github.com/garystafford/garystafford-apache_example_config/archive/v0.1.0.tar.gz && \
sudo puppet module install -i /etc/puppet/environments/production/modules ~/v0.1.0.tar.gz --force

# haproxy config module
wget -N https://github.com/garystafford/garystafford-haproxy_node_config/archive/v0.1.0.tar.gz && \
sudo puppet module install -i /etc/puppet/environments/production/modules ~/v0.1.0.tar.gz --force

# confirm module installation
puppet module list --modulepath /etc/puppet/environments/production/modules
GitHub Repository for Apache Config Example

GitHub Repository for Apache Config Example

HAProxy Configuration
The HAProxy configuration module configures HAProxy’s /etc/haproxy/haproxy.cfg file. The single class in the module’s init.pp manifest is as follows:

class haproxy_node_config () inherits haproxy {
  haproxy::listen { 'puppet00':
    collect_exported => false,
    ipaddress        => '*',
    ports            => '80',
    mode             => 'http',
    options          => {
      'option'  => ['httplog'],
      'balance' => 'roundrobin',
    },
  }

  Haproxy::Balancermember <<| listening_service == 'puppet00' |>>

  haproxy::balancermember { 'haproxy':
    listening_service => 'puppet00',
    server_names      => ['node01.example.com', 'node02.example.com'],
    ipaddresses       => ['192.168.35.121', '192.168.35.122'],
    ports             => '80',
    options           => 'check',
  }
}

The resulting /etc/haproxy/haproxy.cfg file will have the following configuration added. It defines the two Apache web server node’s hostname, ip addresses, and http port. The configuration also defines the load-balancing method, ‘round-robin‘ in our example. In this example, we are using layer 7 load-balancing (application layer – http), as opposed to layer 4 load-balancing (transport layer – tcp). Either method will work for this example. The Puppet Labs’ HAProxy module’s documentation on Puppet Forge and HAProxy’s own documentation are both excellent starting points to understand how to configure HAProxy. We are barely scraping the surface of HAProxy’s capabilities in this brief example.

listen puppet00
  bind *:80
  mode  http
  balance  roundrobin
  option  httplog
  server node01.example.com 192.168.35.121:80 check
  server node02.example.com 192.168.35.122:80 check

Apache Configuration
The Apache configuration module creates default web page in Apache’s docroot directory, /var/www/html/index.html. The single class in the module’s init.pp manifest is as follows:
ApacheConfigClass
The resulting /var/www/html/index.html file will look like the following. Observe that the facter variables shown in the module manifest above have been replaced by the individual node’s hostname and ip address during application of the configuration by Puppet (ie. ${fqdn} became node01.example.com).

ApacheConfigClass

Both of these Puppet modules were created specifically to configure HAProxy and Apache for this post. Unlike published modules on Puppet Forge, these two modules are very simple, and don’t necessarily represent the best practices and patterns for authoring Puppet Forge modules.

Importing into Foreman

After installing the new modules onto the Foreman server, we need to import them into Foreman. This is accomplished from the ‘Puppet classes’ tab, using the ‘Import from theforeman.example.com’ button. Once imported, the module classes are available to assign to host machines.

Importing Puppet Classes into Foreman

Importing Puppet Classes into Foreman

Add Host to Foreman

Next, add the three new hosts to Foreman. If you have questions on how to add the nodes to Foreman, start Puppet’s Certificate Signing Request (CSR) process on the hosts, signing the certificates, or other first time tasks, refer to the previous post. That post explains this process in detail.

Foreman Hosts Tab Showing New Nodes

Foreman Hosts Tab Showing New Nodes

Configure the Hosts

Next, configure the HAProxy and Apache nodes with the necessary Puppet classes. In addition to the base module classes and configuration classes, I recommend adding git and ntp modules to each of the new nodes. These modules were explained in the previous post. Refer to the screen-grabs below for correct module classes to add, specific to HAProxy and Apache.

HAProxy Node Puppet Classes Tab

HAProxy Node Puppet Classes Tab

Apache Nodes Puppet Classes Tab

Apache Nodes Puppet Classes Tab

Agent Configuration and Testing the System

Once configurations are retrieved and applied by Puppet Agent on each node, we can test our reverse proxy load-balanced environment. To start, open a browser and load haproxy.paychex.com. You should see one of the two pages below. Refresh the page a few times. You should observe HAProxy re-directing you to one Apache web server node, and then the other, using HAProxy’s round-robin algorithm. You can differentiate the Apache web servers by the hostname and ip address displayed on the web page.

Load Balancer Directing Traffic to Node01

Load Balancer Directing Traffic to Node01

Load Balancer Directing Traffic to Node02

Load Balancer Directing Traffic to Node02

After hitting HAProxy’s URL several times successfully, view HAProxy’s built-in Statistics Report page at http://haproxy.example.com/haproxy?stats. Note below, each of the two Apache node has been hit 44 times each from HAProxy. This demonstrates the effectiveness of the reverse proxy and load-balancing features of HAProxy.

Statistics Report for HAProxy

Statistics Report for HAProxy

Accessing Apache Directly
If you are testing HAProxy from the same machine on which you created the virtual machines (VirtualBox host), you will likely be able to directly access either of the Apache web servers (ei. node02.example.com). The VirtualBox host file contains the ip addresses and hostnames of all three hosts. This DNS configuration was done automatically by the vagrant-hostmanager plugin. However, in an actual Production environment, only the HAProxy server’s hostname and ip address would be publicly accessible to a user. The two Apache nodes would sit behind a firewall, accessible only by the HAProxy server. HAProxy acts as a façade to public side of the network.

Testing Apache Host Failure
The main reason you would likely use a load-balancer is high-availability. With HAProxy acting as a load-balancer, we should be able to impair one of the two Apache nodes, without noticeable disruption. HAProxy will continue to serve content from the remaining Apache web server node.

Log into node01.example.com, using the following command, vagrant ssh node01.example.com. To simulate an impairment on ‘node01’, run the following command to stop Apache, sudo service httpd stop. Now, refresh the haproxy.example.com URL in your web browser. You should notice HAProxy is now redirecting all traffic to node02.example.com.

Troubleshooting

While troubleshooting HAProxy configuration issues for this demonstration, I discovered logging is not configured by default on CentOS. No worries, I recommend HAProxy: Give me some logs on CentOS 6.5!, by Stephane Combaudon, to get logging running. Once logging is active, you can more easily troubleshoot HAProxy and Apache configuration issues. Here are some example commands you might find useful:

# haproxy
sudo more -f /var/log/haproxy.log
sudo haproxy -f /etc/haproxy/haproxy.cfg -c # check/validate config file

# apache
sudo ls -1 /etc/httpd/logs/
sudo tail -50 /etc/httpd/logs/error_log
sudo less /etc/httpd/logs/access_log

Redundant Proxies

In this simple example, the system’s weakest point is obviously the single HAProxy instance. It represents a single-point-of-failure (SPOF) in our environment. In an actual production environment, you would likely have more than one instance of HAProxy. They may both be in a load-balanced pool, or one active and on standby as a failover, should one instance become impaired. There are several techniques for building in proxy redundancy, often with the use of Virtual IP and Keepalived. Below is a list of articles that might help you take this post’s example to the next level.

, , , , , , , , , , , , ,

  1. Leave a comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: