Provision and Deploy a Consul Cluster on AWS, using Terraform, Docker, and Jenkins


Introduction

Modern DevOps tools, such as HashiCorp’s Packer and Terraform, make it easier to provision and manage complex cloud architecture. Utilizing a CI/CD server, such as Jenkins, to securely automate the use of these DevOps tools ensures quick and consistent results.

In a recent post, Distributed Service Configuration with Consul, Spring Cloud, and Docker, we built a Consul cluster using Docker swarm mode, to host distributed configurations for a Spring Boot application. The cluster was built locally with VirtualBox. This architecture is fine for development and testing, but not for use in Production.

In this post, we will deploy a highly available three-node Consul cluster to AWS. We will use Terraform to provision a set of EC2 instances and accompanying infrastructure. The instances will be built from a hybrid AMI containing the new Docker Community Edition (CE). In a recent post, Baking AWS AMI with new Docker CE Using Packer, we provisioned an Ubuntu AMI with Docker CE, using Packer. We will deploy a Docker container, containing an instance of Consul server, to each EC2 host.

All source code can be found on GitHub.

Jenkins

I have chosen Jenkins to automate all of the post’s build, provisioning, and deployment tasks. However, none of the code is written specifically for Jenkins; you may run all of it from the command line.

For this post, I have built four projects in Jenkins, as follows:

  1. Provision Docker CE AMI: Builds Ubuntu AMI with Docker CE, using Packer
  2. Provision Consul Infra AWS: Provisions Consul infrastructure on AWS, using Terraform
  3. Deploy Consul Cluster AWS: Deploys Consul to AWS, using Docker
  4. Destroy Consul Infra AWS: Destroys Consul infrastructure on AWS, using Terraform

Jenkins UI

We will primarily be using the ‘Provision Consul Infra AWS’, ‘Deploy Consul Cluster AWS’, and ‘Destroy Consul Infra AWS’ Jenkins projects in this post. The fourth Jenkins project, ‘Provision Docker CE AMI’, automates the steps found in the recent post, Baking AWS AMI with new Docker CE Using Packer, to build the AMI used to provision the EC2 instances in this post.

Consul AWS Diagram 2

Terraform

Using Terraform, we will provision EC2 instances in three different Availability Zones within the US East 1 (N. Virginia) Region. Using Terraform’s Amazon Web Services (AWS) provider, we will create the following AWS resources:

  • (1) Virtual Private Cloud (VPC)
  • (1) Internet Gateway
  • (1) Key Pair
  • (3) Elastic Compute Cloud (EC2) Instances
  • (2) Security Groups
  • (3) Subnets
  • (1) Route
  • (3) Route Tables
  • (3) Route Table Associations

The final AWS architecture should resemble the following:

Consul AWS Diagram

Production Ready AWS

Although we have provisioned a fairly complete VPC for this post, it is far from being ready for Production. I have created two security groups, limiting the ingress and egress to the cluster. However, to further productionize the environment would require additional security hardening. At a minimum, you should consider adding public/private subnets, NAT gateways, network access control list rules (network ACLs), and the use of HTTPS for secure communications.

In production, applications would communicate with Consul through local Consul clients. Consul clients would take part in the LAN gossip pool from different subnets, Availability Zones, Regions, or VPCs using VPC peering. Communications would be tightly controlled by IAM, VPC, subnet, IP address, and port.
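
For example, a local Consul client could run as a Docker container alongside the application on each host. The following is only a minimal sketch, assuming a host with Docker installed and network access to the server nodes; the ${ec2_server1_private_ip} variable follows the naming used in the deploy scripts later in this post.

# Minimal sketch: run a Consul agent in client mode (no -server flag) and
# join it to the existing cluster over the LAN gossip protocol.
# Assumes ${ec2_server1_private_ip} holds the private IP of consul-server-1.
docker run -d \
  --net=host \
  --name consul-client \
  consul:latest \
  consul agent -client=0.0.0.0 \
  -retry-join="${ec2_server1_private_ip}" \
  -data-dir="/consul/data"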

Also, you would not have direct access to the Consul UI through a publicly exposed IP or DNS address. Access to the UI would be removed altogether or locked down to specific IP addresses, with access restricted to secure communication channels.

Consul

We will achieve high availability (HA) by clustering three Consul server nodes across the three Elastic Compute Cloud (EC2) instances. In this minimally sized, three-node cluster of Consul servers, we are protected from the loss of one Consul server node, one EC2 instance, or one Availability Zone (AZ). The cluster will still maintain a quorum of two nodes. An additional level of HA that Consul supports, multiple datacenters (multiple AWS Regions), is not demonstrated in this post.
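
With three servers, Raft requires a quorum of two (a majority of the three voting members) to elect a leader and commit writes, which is why the cluster tolerates exactly one failure. Once the cluster is up, you can confirm the member list and Raft peer set from one of the EC2 hosts (or over SSH, using the same pattern as the deploy scripts below); a quick check, assuming the container names used later in this post, might look like this:

# Sketch: list cluster members and the Raft peer set from one server node.
docker exec -i consul-server-1 consul members
docker exec -i consul-server-1 consul operator raft list-peers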

Docker

Having Docker CE already installed on each EC2 instance allows us to execute remote Docker commands over SSH from Jenkins. These commands will deploy and configure a Consul server node, within a Docker container, on each EC2 instance. The containers are built from HashiCorp’s latest Consul Docker image pulled from Docker Hub.

Getting Started

Preliminary Steps

If you have built infrastructure on AWS with Terraform, these steps should be familiar to you:

  1. First, you will need an AMI with Docker. I suggest reading Baking AWS AMI with new Docker CE Using Packer.
  2. You will need an AWS IAM User with the proper access to create the required infrastructure. For this post, I created a separate Jenkins IAM User with PowerUser level access.
  3. You will need an RSA public-private key pair, which can be used to SSH into the EC2 instances and install Consul.
  4. Ensure you have your AWS credentials set. I usually source mine from a .env file, as environment variables. Jenkins can securely manage credentials, using secret text or files. (A quick sketch of steps 3 and 4 follows this list.)
  5. Fork and/or clone the Consul cluster project from GitHub.
  6. Change the aws_key_name and public_key_path variable values to your own RSA key, in the variables.tf file.
  7. Change the aws_amis_base variable values to your own AMI ID (see step 1).
  8. If you do not want to use the US East 1 Region and its AZs, modify the variables.tf, network.tf, and instances.tf files.
  9. Disable Terraform’s remote state or modify the resource to match your remote state configuration, in the main.tf file. I am using an Amazon S3 bucket to store my Terraform remote state.
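
For reference, steps 3 and 4 might look like the following from the command line. This is only a sketch; the key path (~/.ssh/consul_aws_rsa) matches the key used by the deploy scripts later in the post, and the credential values are placeholders.

# Step 3 (sketch): generate an RSA key pair for SSH access to the EC2 instances.
ssh-keygen -t rsa -b 4096 -N '' -f ~/.ssh/consul_aws_rsa

# Step 4 (sketch): export AWS credentials as environment variables,
# for example by sourcing them from a .env file.
export AWS_ACCESS_KEY_ID=your_access_key_id
export AWS_SECRET_ACCESS_KEY=your_secret_access_key
export AWS_DEFAULT_REGION=us-east-1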

Building an AMI with Docker

If you have not already built an Amazon Machine Image (AMI) for use in this post, you can do so using the scripts provided in the previous post’s GitHub repository. To automate the AMI build task, I built the ‘Provision Docker CE AMI’ Jenkins project. Identical to the other three Jenkins projects in this post, this project has three main tasks: 1) SCM: clone the Packer AMI GitHub project, 2) Bindings: set up the AWS credentials, and 3) Build: run Packer.

The SCM and Bindings tasks are identical to the other projects (see below for details), except for the use of a different GitHub repository. The project’s Build step, which runs the packer_build_ami.sh script, looks as follows:

Jenkins Build Step

The resulting AMI ID will need to be manually placed in Terraform’s variables.tf file before provisioning the AWS infrastructure with Terraform. The new AMI ID will be displayed in Jenkins’ build output.

Jenkins Build Output

Provisioning with Terraform

Based on the modifications you made in the Preliminary Steps, execute the terraform validate command to confirm your changes. Then, run the terraform plan command to review the plan. Finally, assuming there were no errors, run the terraform apply command to provision the AWS infrastructure components.
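
From the command line, that sequence is roughly as follows (a quick sketch, assuming the tf_env_aws/ directory used by the provisioning script below; the Jenkins project runs the same commands via that script):

cd tf_env_aws/
terraform validate   # confirm the configuration is syntactically valid
terraform plan       # review the resources Terraform intends to create
terraform apply      # provision the AWS infrastructure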

In Jenkins, I have created the ‘Provision Consul Infra AWS’ project. This project has three tasks, which include: 1) SCM: clone the GitHub project, 2) Bindings: set up the AWS credentials, and 3) Build: run Terraform. Those tasks look as follows:

Jenkins Project Tasks

You will obviously need to use your modified GitHub project, incorporating the configuration changes detailed above, as the SCM source for Jenkins.

Jenkins Credentials

You will also need to configure your AWS credentials.

Jenkins AWS Credentials Binding

The provision_infra.sh script provisions the AWS infrastructure using Terraform. The script also updates Terraform’s remote state. Remember to update the remote state configuration in the script to match your personal settings.


cd tf_env_aws/
terraform remote config \
  -backend=s3 \
  -backend-config="bucket=your_bucket" \
  -backend-config="key=terraform_consul.tfstate" \
  -backend-config="region=your_region"
terraform plan
terraform apply
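
Note that the terraform remote config command applies to older Terraform releases; newer versions (0.9 and later) replaced it with backend configuration and terraform init. A rough equivalent, assuming the same S3 settings and a terraform backend "s3" block declared in the configuration, would be:

# Sketch only: initialize the S3 backend on Terraform 0.9 or later.
# Requires a 'terraform { backend "s3" {} }' block in the configuration.
cd tf_env_aws/
terraform init \
  -backend-config="bucket=your_bucket" \
  -backend-config="key=terraform_consul.tfstate" \
  -backend-config="region=your_region"
terraform plan
terraform apply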

The Jenkins build output should look similar to the following:

Jenkins Build Output

Although the build only takes about 90 seconds to complete, the EC2 instances could take a few extra minutes to complete their Status Checks and be completely ready. The final results in the AWS EC2 Management Console should look as follows:

EC2 Management Console

Note each EC2 instance is running in a different US East 1 Availability Zone.
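
You can confirm the placement from the command line as well; a quick check with the AWS CLI (a sketch, relying on the tf-instance-consul-server-* Name tags created by the Terraform project) might look like this:

# Sketch: list the three Consul EC2 instances and their Availability Zones.
aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=tf-instance-consul-server-*" \
            "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[].[InstanceId,Placement.AvailabilityZone]' \
  --output table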

Installing Consul

Once the AWS infrastructure is running and the EC2 instances have completed their Status Checks successfully, we are ready to deploy Consul. In Jenkins, I have created the ‘Deploy Consul Cluster AWS’ project. This project has three tasks, which include: 1) SCM: clone the GitHub project, 2) Bindings: set up the AWS credentials, and 3) Build: run an SSH remote Docker command on each EC2 instance to deploy Consul. The SCM and Bindings tasks are identical to the project above. The project’s Build step looks as follows:

Jenkins Build Step

First, the delete_containers.sh script deletes any previous instances of Consul containers. This is helpful if you need to re-deploy Consul. Next, the deploy_consul.sh script executes a series of SSH remote Docker commands to install and configure Consul on each EC2 instance.
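
Both scripts are included in the GitHub project. The cleanup performed by delete_containers.sh amounts to something like the following sketch, which removes any existing Consul container from each host before redeployment (the instance tags, key path, and container names match those used by the deploy commands below):

# Sketch only: remove any previous Consul container from each EC2 host.
for i in 1 2 3; do
  ec2_public_ip=$(aws ec2 describe-instances \
    --filters Name="tag:Name,Values=tf-instance-consul-server-${i}" \
    --output text --query 'Reservations[*].Instances[*].PublicIpAddress')
  ssh -oStrictHostKeyChecking=no -T \
    -i ~/.ssh/consul_aws_rsa \
    ubuntu@${ec2_public_ip} \
    "docker rm -f consul-server-${i} 2>/dev/null || true"
done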


# Advertised Consul IP
export ec2_server1_private_ip=$(aws ec2 describe-instances \
  --filters Name='tag:Name,Values=tf-instance-consul-server-1' \
  --output text --query 'Reservations[*].Instances[*].PrivateIpAddress')
echo "consul-server-1 private ip: ${ec2_server1_private_ip}"



# Deploy Consul Server 1
# Look up the instance's public IP, then start a Consul server container on it over SSH.
ec2_public_ip=$(aws ec2 describe-instances \
  --filters Name='tag:Name,Values=tf-instance-consul-server-1' \
  --output text --query 'Reservations[*].Instances[*].PublicIpAddress')
consul_server="consul-server-1"
ssh -oStrictHostKeyChecking=no -T \
  -i ~/.ssh/consul_aws_rsa \
  ubuntu@${ec2_public_ip} << EOSSH
# -bootstrap-expect=3 tells this server to wait for three servers before electing a leader
docker run -d \
  --net=host \
  --hostname ${consul_server} \
  --name ${consul_server} \
  --env "SERVICE_IGNORE=true" \
  --env "CONSUL_CLIENT_INTERFACE=eth0" \
  --env "CONSUL_BIND_INTERFACE=eth0" \
  --volume /home/ubuntu/consul/data:/consul/data \
  --publish 8500:8500 \
  consul:latest \
  consul agent -server -ui -client=0.0.0.0 \
  -bootstrap-expect=3 \
  -advertise='{{ GetInterfaceIP "eth0" }}' \
  -data-dir="/consul/data"
# Give the agent a moment to start, then show its logs and current member list
sleep 5
docker logs consul-server-1
docker exec -i consul-server-1 consul members
EOSSH



# Deploy Consul Server 2
# Servers 2 and 3 use -retry-join to join the cluster via server 1's private IP.
ec2_public_ip=$(aws ec2 describe-instances \
  --filters Name='tag:Name,Values=tf-instance-consul-server-2' \
  --output text --query 'Reservations[*].Instances[*].PublicIpAddress')
consul_server="consul-server-2"
ssh -oStrictHostKeyChecking=no -T \
  -i ~/.ssh/consul_aws_rsa \
  ubuntu@${ec2_public_ip} << EOSSH
docker run -d \
  --net=host \
  --hostname ${consul_server} \
  --name ${consul_server} \
  --env "SERVICE_IGNORE=true" \
  --env "CONSUL_CLIENT_INTERFACE=eth0" \
  --env "CONSUL_BIND_INTERFACE=eth0" \
  --volume /home/ubuntu/consul/data:/consul/data \
  --publish 8500:8500 \
  consul:latest \
  consul agent -server -ui -client=0.0.0.0 \
  -advertise='{{ GetInterfaceIP "eth0" }}' \
  -retry-join="${ec2_server1_private_ip}" \
  -data-dir="/consul/data"
sleep 5
docker logs consul-server-2
docker exec -i consul-server-2 consul members
EOSSH



# Deploy Consul Server 3
ec2_public_ip=$(aws ec2 describe-instances \
  --filters Name='tag:Name,Values=tf-instance-consul-server-3' \
  --output text --query 'Reservations[*].Instances[*].PublicIpAddress')
consul_server="consul-server-3"
ssh -oStrictHostKeyChecking=no -T \
  -i ~/.ssh/consul_aws_rsa \
  ubuntu@${ec2_public_ip} << EOSSH
docker run -d \
  --net=host \
  --hostname ${consul_server} \
  --name ${consul_server} \
  --env "SERVICE_IGNORE=true" \
  --env "CONSUL_CLIENT_INTERFACE=eth0" \
  --env "CONSUL_BIND_INTERFACE=eth0" \
  --volume /home/ubuntu/consul/data:/consul/data \
  --publish 8500:8500 \
  consul:latest \
  consul agent -server -ui -client=0.0.0.0 \
  -advertise='{{ GetInterfaceIP "eth0" }}' \
  -retry-join="${ec2_server1_private_ip}" \
  -data-dir="/consul/data"
sleep 5
docker logs consul-server-3
docker exec -i consul-server-3 consul members
EOSSH



# Output Consul Web UI URL
ec2_public_ip=$(aws ec2 describe-instances \
  --filters Name='tag:Name,Values=tf-instance-consul-server-1' \
  --output text --query 'Reservations[*].Instances[*].PublicIpAddress')
echo " "
echo "*** Consul UI: http://${ec2_public_ip}:8500/ui/ ***"


The entire Jenkins build process only takes about 30 seconds. Afterward, the output from a successful Jenkins build should show that all three Consul server instances are running, have formed a quorum, and have elected a Leader.

Jenkins Build Output

Persisting State

The Consul Docker image exposes VOLUME /consul/data, which is the path where Consul will place its persisted state. Using Terraform’s remote-exec provisioner, we create a directory on each EC2 instance, at /home/ubuntu/consul/data. The docker run command bind-mounts the container’s /consul/data path to the EC2 host’s /home/ubuntu/consul/data directory.

According to Consul, the Consul server container instance will ‘store the client information plus snapshots and data related to the consensus algorithm and other state, like Consul’s key/value store and catalog’ in the /consul/data directory. That container directory is now bind-mounted to the EC2 host, as demonstrated below.

Consul Data Directory on the EC2 Host
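
You can verify the bind mount and see the persisted files directly on a host; a quick check (a sketch, reusing the public IP lookup and SSH key from the deploy scripts above) might look like this:

# Sketch: confirm the bind mount and list Consul's persisted state on the host.
ec2_public_ip=$(aws ec2 describe-instances \
  --filters Name='tag:Name,Values=tf-instance-consul-server-1' \
  --output text --query 'Reservations[*].Instances[*].PublicIpAddress')
ssh -oStrictHostKeyChecking=no -T \
  -i ~/.ssh/consul_aws_rsa \
  ubuntu@${ec2_public_ip} << 'EOSSH'
docker inspect --format '{{ json .Mounts }}' consul-server-1
ls -R /home/ubuntu/consul/data
EOSSH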

Accessing Consul

Following a successful deployment, you should be able to use the public URL, displayed in the build output of the ‘Deploy Consul Cluster AWS’ project, to access the Consul UI. Clicking on the Nodes tab in the UI, you should see all three Consul server instances, one per EC2 instance, running and healthy.

Consul UI
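
The same information is available from Consul’s HTTP API, which is useful for scripted checks; for example (a sketch, reusing the public IP lookup from the deploy scripts):

# Sketch: query the Consul HTTP API instead of the web UI.
ec2_public_ip=$(aws ec2 describe-instances \
  --filters Name='tag:Name,Values=tf-instance-consul-server-1' \
  --output text --query 'Reservations[*].Instances[*].PublicIpAddress')
curl -s http://${ec2_public_ip}:8500/v1/status/leader   # address of the current Raft leader
curl -s http://${ec2_public_ip}:8500/v1/catalog/nodes   # all nodes registered in the catalog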

Destroying Infrastructure

When you are finished with the post, you may want to remove the running infrastructure, so you don’t continue to get billed by Amazon. The ‘Destroy Consul Infra AWS’ project destroys all the AWS infrastructure, provisioned as part of this post, in about 60 seconds. The project’s SCM and Bindings tasks are identical to both previous projects. The Build step calls the destroy_infra.sh script, which is included in the GitHub project. The script executes the terraform destroy -force command. It will delete all running infrastructure components associated with the post and update Terraform’s remote state.

Jenkins Build Output
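
The core of that script is the terraform destroy command; a minimal sketch, assuming the same remote state settings used by the provisioning script, looks like this:

# Sketch only: tear down the AWS infrastructure and update the remote state.
cd tf_env_aws/
terraform remote config \
  -backend=s3 \
  -backend-config="bucket=your_bucket" \
  -backend-config="key=terraform_consul.tfstate" \
  -backend-config="region=your_region"
terraform destroy -force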

Conclusion

This post has demonstrated how modern DevOps tooling, such as HashiCorp’s Packer and Terraform, makes it easy to build, provision, and manage complex cloud architecture. Using a CI/CD server, such as Jenkins, to securely automate the use of these tools ensures quick and consistent results.

All opinions in this post are my own and not necessarily the views of my current employer or their clients.


  1. #1 by nicolasmas on July 20, 2017 - 6:33 am

    Nice post! Quick question: Why would you use containers for each consul server and not directly have them on the EC2 instance?

  2. #3 by Alexander Land on October 13, 2017 - 8:21 am

    Thanks for the article – interesting and useful.

    I am also interested why Docker instead of direct installation of Consul binaries.

    At least one reason is obvious – it is easier to install and start / stop – but it comes with a price – I heard that it is a bad idea to use Docker containers in production as they crash from time to time without any visible reason.

    Thanks

  3. #4 by Kranthikumar Parupally on October 14, 2017 - 6:14 pm

    thanks man’

  4. #5 by Ron on July 5, 2018 - 8:59 am

    an extremely helpful example of manually starting a consul cluster. thanks!
