Posts Tagged DevOps

DevOps for DataOps: Building a CI/CD Pipeline for Apache Airflow DAGs

Build an effective CI/CD pipeline to test and deploy your Apache Airflow DAGs to Amazon MWAA using GitHub Actions

Introduction

In this post, we will learn how to use GitHub Actions to build an effective CI/CD workflow for our Apache Airflow DAGs. We will use the DevOps concepts of Continuous Integration and Continuous Delivery to automate the testing and deployment of Airflow DAGs to Amazon Managed Workflows for Apache Airflow (Amazon MWAA) on AWS.

Fork and pull model of collaborative Airflow development used in this post

Technologies

Apache Airflow

According to the documentation, Apache Airflow is an open-source platform to author, schedule, and monitor workflows programmatically. With Airflow, you author workflows as Directed Acyclic Graphs (DAGs) of tasks written in Python.
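For readers new to Airflow, below is a minimal sketch of such a DAG. The dag_id, schedule, and task are placeholders for illustration only and are not part of this post's project.

# Minimal, hypothetical DAG to illustrate the structure; names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def say_hello():
    print("Hello, Airflow!")


with DAG(
    dag_id="example__hello_airflow",  # placeholder name
    description="Minimal example DAG",
    start_date=datetime(2021, 12, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    hello_task = PythonOperator(
        task_id="say_hello",
        python_callable=say_hello,
    )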

Amazon Managed Workflows for Apache Airflow

According to AWS, Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a highly available, secure, and fully-managed workflow orchestration for Apache Airflow. MWAA automatically scales its workflow execution capacity to meet your needs and is integrated with AWS security services to help provide fast and secure access to data.

Example of Apache Airflow UI within Amazon MWAA Environment

GitHub Actions

According to GitHub, GitHub Actions makes it easy to automate software workflows with CI/CD. GitHub Actions allow you to build, test, and deploy code right from GitHub. GitHub Actions are workflows triggered by GitHub events like push, issue creation, or a new release. You can leverage GitHub Actions prebuilt and maintained by the community.

Example of GitHub Action workflow running in the GitHub repository used in this post

If you are new to GitHub Actions, I recommend my previous post, Continuous Integration and Deployment of Docker Images using GitHub Actions.

Terminology

DataOps

According to Wikipedia, DataOps is an automated, process-oriented methodology used by analytic and data teams to improve the quality and reduce the cycle time of data analytics. While DataOps began as a set of best practices, it has now matured to become a new approach to data analytics.

DataOps applies to the entire data lifecycle from data preparation to reporting and recognizes the interconnected nature of the data analytics team and IT operations. DataOps incorporates the Agile methodology to shorten the software development life cycle (SDLC) of analytics development.

DevOps

According to Wikipedia, DevOps is a set of practices that combines software development (Dev) and IT operations (Ops). It aims to shorten the systems development life cycle and provide continuous delivery with high software quality.

DevOps is a set of practices intended to reduce the time between committing a change to a system and the change being placed into normal production, while ensuring high quality. -Wikipedia

Fail Fast

According to Wikipedia, a fail-fast system is one that immediately reports any condition that is likely to indicate a failure. Using the DevOps concept of fail fast, we build steps into our workflows to uncover errors sooner in the SDLC. We shift testing as far to the left as possible (referring to the pipeline of steps moving from left to right) and test at multiple points along the way.

Source Code

All source code for this demonstration, including the GitHub Actions, Pytest unit tests, and Git Hooks, is open-sourced and located on GitHub.

Architecture

The diagram below represents the architecture for a recent blog post and video demonstration, Lakehouse Automation on AWS with Apache Airflow. The post and video show how to programmatically load and unload data between Amazon Redshift and an Amazon S3-based data lake using Apache Airflow.

Architecture for the post and video, Lakehouse Automation on AWS with Apache Airflow

In this post, we will review how the DAGs from the previous post were developed, tested, and deployed to MWAA using a variety of progressively more effective CI/CD workflows. The workflows demonstrated could also be easily applied to other Airflow resources in addition to DAGs, such as SQL scripts, configuration and data files, Python requirements files, and plugins.

Workflows

No DevOps

Below we see a minimally viable workflow for loading DAGs into Amazon MWAA, which does not use the principles of CI/CD. Changes are made in the local Airflow developer’s environment. The modified DAGs are copied directly to the Amazon S3 bucket and are then automatically synced to Amazon MWAA, barring any errors. Those changes are also (hopefully) pushed back to the centralized version control or source code management (SCM) system, which is GitHub in this post.

Error-prone, non-DevOps workflow for modifying and syncing DAGs to MWAA

There are at least two significant issues with this error-prone workflow. First, the DAGs can easily fall out of sync between the Amazon S3 bucket and GitHub. Copying or syncing the DAGs to S3 and pushing the DAGs to GitHub are two independent steps. A developer might continue making changes and pushing DAGs to S3 without pushing to GitHub or vice versa.

Second, the DevOps concept of fail fast is missing. The first time you know your DAG contains errors is likely when it is synced to MWAA and throws an Import Error. By then, the DAG has already been copied to S3, synced to MWAA, and possibly pushed to GitHub, which other developers could then pull.

Example of a typical DAG Import Error, easily caught with a simple test
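Such an error is easy to catch before deployment by loading the DAGs into a DagBag and asserting there are no import errors. A minimal sketch of that check is shown below; the full test suite used in this post appears later.

# Minimal sketch of a fail-fast import-error check for the dags/ folder.
from airflow.models import DagBag


def test_no_import_errors():
    dag_bag = DagBag(dag_folder="dags/", include_examples=False)
    # fails the build immediately if any DAG cannot be imported
    assert not dag_bag.import_errors, f"Import errors: {dag_bag.import_errors}"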

GitHub Actions

A significant step up from the previous workflow is using GitHub Actions to test and deploy your code after pushing it to GitHub. Although in this workflow, code is still ‘pushed straight to Trunk’ (the main branch in GitHub) and risks other developers in a collaborative environment pulling potentially erroneous code, you have far less chance of DAG errors making it to MWAA.

GitHub Actions allow you to fail faster and catch errors sooner

Using GitHub Actions, you also eliminate human error that could result in the changes to DAGs not being synced to Amazon S3. Lastly, using this workflow improves security by eliminating the need to provide direct access to the Airflow Amazon S3 bucket to Airflow Developers.

Fork and pull model of collaborative Airflow development used in this post (video only)

Types of Tests

The first GitHub Action, test_dags.yml, is triggered on a push to the dags directory in the main branch of the repository. It is also triggered whenever a pull request is made for the main branch. The first GitHub Action runs a battery of tests, including checking Python dependencies, code style, code quality, DAG import errors, and unit tests. The tests catch issues with DAGs before being synced to S3 by a second GitHub Action.

name: Test DAGs

on:
  push:
    paths:
      - 'dags/**'
  pull_request:
    branches:
      - main

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python
      uses: actions/setup-python@v2
      with:
        python-version: '3.7'
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements/requirements.txt
        pip check
    - name: Lint with Flake8
      run: |
        pip install flake8
        flake8 --ignore E501 dags --benchmark -v
    - name: Confirm Black code compliance (psf/black)
      run: |
        pip install pytest-black
        pytest dags --black -v
    - name: Test with Pytest
      run: |
        pip install pytest
        cd tests || exit
        pytest tests.py -v

Successful runs of the ‘Test DAGs’ GitHub Action, shown in the Actions Console

Python Dependencies

The first test installs the modules listed in the requirements.txt file used locally to develop the application. This test is designed to uncover any missing or conflicting modules.

- name: Install dependencies
  run: |
    python -m pip install --upgrade pip
    pip install -r requirements/requirements.txt
    pip check

It is essential to develop your DAGs against the same version of Python and with the same versions of the Python modules used in your Airflow environment. You can use the BashOperator to run shell commands to obtain the versions of Python and the Python modules installed in your Airflow environment:

python3 --version; python3 -m pip list
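For example, a simple utility DAG could wrap this command in a BashOperator and write the output to the task log. The sketch below is illustrative only; the dag_id, tags, and schedule are placeholders, not part of this post's project.

# Hypothetical utility DAG that logs the Python and pip module versions
# available inside the Airflow environment; names are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="data_lake__log_environment_versions",  # illustrative dag_id
    description="Logs Python and module versions available in MWAA",
    start_date=datetime(2021, 12, 1),
    schedule_interval=None,
    catchup=False,
    tags=["data lake demo"],
) as dag:
    log_versions = BashOperator(
        task_id="log_python_and_module_versions",
        bash_command="python3 --version; python3 -m pip list",
    )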

A snippet of log output from the DAG showing the Python version and Python modules available in MWAA 2.0.2:

Python version and Python modules available in MWAA 2.0.2

The latest stable release of Airflow is currently version 2.2.2, released 2021-11-15. However, as of December 2021, Amazon’s latest version of MWAA 2.x is version 2.0.2, released 2021-04-19. MWAA 2.0.2 currently runs Python3 version 3.7.10.

Available versions of Amazon MWAA as of December 2021

Flake8

Known as ‘your tool for style guide enforcement,’ Flake8 is described as the modular source code checker. It is a command-line utility for enforcing style consistency across Python projects. Flake8 is a wrapper around PyFlakes, pycodestyle, and Ned Batchelder’s McCabe script. The module, pycodestyle, is a tool to check your Python code against some of the style conventions in PEP 8.

Flake8 is highly configurable, with options to ignore specific rules if not required by your development team. For example, in this demonstration, I intentionally ignored rule E501 (‘line too long’), which by default limits lines to 79 characters.

- name: Lint with Flake8
  run: |
    pip install flake8
    flake8 --ignore E501 dags --benchmark -v

Black

Known as ‘the uncompromising code formatter,’ Python code formatted using Black (referred to as Blackened code) looks the same regardless of the project you’re reading. Formatting becomes transparent, allowing teams to focus on the content instead. Black makes code review faster by producing the smallest diffs possible, assuming all developers are using black to format their code.

The Airflow DAGs in this GitHub repository are automatically formatted with black using a pre-commit Git Hook before being committed and pushed to GitHub. The test confirms black code compliance.

- name: Confirm Black code compliance (psf/black)
  run: |
    pip install pytest-black
    pytest dags --black -v

Pytest

The pytest framework describes itself as a mature, fully-featured Python testing tool that helps you write better programs. The Pytest framework makes it easy to write small tests yet scales to support complex functional testing for applications and libraries.

The GitHub Action in the GitHub project, test_dags.yml, calls the tests.py file, also contained in the project.

- name: Test with Pytest
  run: |
    pip install pytest
    cd tests || exit
    pytest tests.py -v

The tests.py file contains several pytest unit tests. The tests are based on my project requirements; your tests will vary. These tests confirm that all DAGs:

  1. Do not contain DAG Import Errors (test catches 75% of my errors);
  2. Follow specific file naming conventions;
  3. Include a description and an owner other than ‘airflow’;
  4. Contain required project tags;
  5. Do not send emails (my projects use SNS or Slack for notifications);
  6. Do not retry more than three times.

import os
import sys

import pytest
from airflow.models import DagBag

sys.path.append(os.path.join(os.path.dirname(__file__), "../dags"))
sys.path.append(os.path.join(os.path.dirname(__file__), "../dags/utilities"))

# Airflow variables called from DAGs under test are stubbed out
os.environ["AIRFLOW_VAR_DATA_LAKE_BUCKET"] = "test_bucket"
os.environ["AIRFLOW_VAR_ATHENA_QUERY_RESULTS"] = "SELECT 1;"
os.environ["AIRFLOW_VAR_SNS_TOPIC"] = "test_topic"
os.environ["AIRFLOW_VAR_REDSHIFT_UNLOAD_IAM_ROLE"] = "test_role_1"
os.environ["AIRFLOW_VAR_GLUE_CRAWLER_IAM_ROLE"] = "test_role_2"


@pytest.fixture(params=["../dags/"])
def dag_bag(request):
    return DagBag(dag_folder=request.param, include_examples=False)


def test_no_import_errors(dag_bag):
    assert not dag_bag.import_errors


def test_requires_tags(dag_bag):
    for dag_id, dag in dag_bag.dags.items():
        assert dag.tags


def test_requires_specific_tag(dag_bag):
    for dag_id, dag in dag_bag.dags.items():
        try:
            assert dag.tags.index("data lake demo") >= 0
        except ValueError:
            assert dag.tags.index("redshift demo") >= 0


def test_desc_len_greater_than_fifteen(dag_bag):
    for dag_id, dag in dag_bag.dags.items():
        assert len(dag.description) > 15


def test_owner_len_greater_than_five(dag_bag):
    for dag_id, dag in dag_bag.dags.items():
        assert len(dag.owner) > 5


def test_owner_not_airflow(dag_bag):
    for dag_id, dag in dag_bag.dags.items():
        assert str.lower(dag.owner) != "airflow"


def test_no_emails_on_retry(dag_bag):
    for dag_id, dag in dag_bag.dags.items():
        assert not dag.default_args["email_on_retry"]


def test_no_emails_on_failure(dag_bag):
    for dag_id, dag in dag_bag.dags.items():
        assert not dag.default_args["email_on_failure"]


def test_three_or_less_retries(dag_bag):
    for dag_id, dag in dag_bag.dags.items():
        assert dag.default_args["retries"] <= 3


def test_dag_id_contains_prefix(dag_bag):
    for dag_id, dag in dag_bag.dags.items():
        assert str.lower(dag_id).find("__") != -1


def test_dag_id_requires_specific_prefix(dag_bag):
    for dag_id, dag in dag_bag.dags.items():
        assert str.lower(dag_id).startswith("data_lake__") \
            or str.lower(dag_id).startswith("redshift_demo__")

If you are building custom Airflow Operators, additional unit, functional, and integration tests are recommended.
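As a hypothetical example (neither the operator nor the tests below are part of this project), a custom operator's execute() method can often be unit-tested directly with pytest, without a scheduler or a running DAG:

# Illustrative only: a stand-in custom operator and two direct unit tests.
import pytest
from airflow.models import BaseOperator


class UpperCaseOperator(BaseOperator):
    """Hypothetical custom operator that upper-cases a string value."""

    def __init__(self, value, **kwargs):
        super().__init__(**kwargs)
        self.value = value

    def execute(self, context):
        return self.value.upper()


def test_upper_case_operator_executes():
    operator = UpperCaseOperator(task_id="upper_case_test", value="data lake")
    assert operator.execute(context={}) == "DATA LAKE"


def test_upper_case_operator_rejects_non_string():
    operator = UpperCaseOperator(task_id="upper_case_bad_input", value=42)
    with pytest.raises(AttributeError):
        operator.execute(context={})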

Fork and Pull

We can improve on the practice of pushing directly to Trunk by implementing one of two collaborative development models, recommended by GitHub:

  1. The Shared repository model: uses ‘topic’ branches, which are reviewed, approved, and merged into the main branch.
  2. Fork and pull model: a repo is forked, changes are made, a pull request is created, the request is reviewed, and if approved, merged into the main branch.

In the fork and pull model, we create a fork of the DAG repository where we make our changes. We then commit and push those changes back to the forked repository. When ready, we create a pull request. If the pull request is approved and passes all the tests, it is manually or automatically merged into the main branch. DAGs are then synced to S3 and, eventually, to MWAA. I usually prefer to trigger merges manually once all tests have passed.

The fork and pull model greatly reduces the chance that bad code is merged to the main branch before passing all tests.

Errors are caught early in the fork and pull model prior to merging code changes

Syncing DAGs to S3

The second GitHub Action in the GitHub project, sync_dags.yml, is triggered when the previous Action, test_dags.yml, completes successfully or, in the case of the fork and pull model, when the merge to the main branch completes successfully.

name: Sync DAGs

on:
  workflow_run:
    workflows:
      - 'Test DAGs'
    types:
      - completed
  pull_request:
    types:
      - closed

jobs:
  deploy:
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    steps:
      - uses: actions/checkout@master
      - uses: jakejarvis/s3-sync-action@master
        env:
          AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_REGION: 'us-east-1'
          SOURCE_DIR: 'dags'
          DEST_DIR: 'dags'

The GitHub Action, sync_dags.yml, requires three GitHub encrypted secrets, created in advance and associated with the GitHub repository. According to GitHub, secrets are encrypted environment variables you create in an organization, repository, or repository environment. Encrypted secrets allow you to store sensitive information, such as access tokens, in your repository. The secrets that you create are available to use in GitHub Actions workflows.

Encrypted repository secrets used by GitHub Action to sync with Amazon S3

The DAGs are synced to Amazon S3 and, eventually, automatically synced to MWAA.

GitHub Action syncs DAGs to Amazon S3 if tests are successful

Local Testing and Git Hooks

To further improve your CI/CD workflows, you should consider using Git Hooks. Using Git Hooks, we can ensure code is tested locally before committing and pushing changes to GitHub. Testing locally allows us to fail faster, catching errors during development instead of after code is pushed to GitHub.

Errors are caught even earlier using Git Hooks

According to the documentation, Git has a way to fire off custom scripts when certain important actions occur. There are two types of hooks: client-side and server-side. Client-side hooks are triggered by operations such as committing and merging, while server-side hooks run on network operations such as receiving pushed commits.

You can use these hooks for all sorts of reasons. I often use a client-side pre-commit hook to format DAGs using black. Using a client-side pre-push Git Hook, we will ensure that tests are run before pushing the DAGs to GitHub. According to Git, the pre-push hook runs when the git push command is executed, after the remote refs have been updated but before any objects have been transferred. You can use it to validate a set of ref updates before a push occurs. A non-zero exit code will abort the push. The tests could instead be run as part of the pre-commit hook if they are not too time-consuming.

To use the pre-push hook, create the following file within the local repository, .git/hooks/pre-push:

#!/bin/sh

# do nothing if there are no commits to push
if [ -z "$(git log @{u}..)" ]; then
    exit 0
fi

sh ./run_tests_locally.sh

Then, run the following chmod command to make the hook executable:

chmod 755 .git/hooks/pre-push

The pre-push hook runs the shell script, run_tests_locally.sh. The script executes nearly identical tests locally as the GitHub Action, test_dags.yml, does remotely on GitHub:

#!/bin/sh
echo "Starting Flake8 test..."
flake8 --ignore E501 dags --benchmark || exit 1
echo "Starting Black test..."
python3 -m pytest --cache-clear
python3 -m pytest dags/ --black -v || exit 1
echo "Starting Pytest tests..."
cd tests || exit
python3 -m pytest tests.py -v || exit 1
echo "All tests completed successfully! 🥳"

Complete CI/CD workflow including running tests locally using a Git Hook (video only)

References

Here are some additional references for testing and deploying Airflow DAGs and the use of GitHub Actions:

This blog represents my own viewpoints and not of my employer, Amazon Web Services (AWS). All product names, logos, and brands are the property of their respective owners.



Continuous Integration and Deployment of Docker Images using GitHub Actions

According to GitHub, GitHub Actions allows you to automate, customize, and execute your software development workflows right in your repository. You can discover, create, and share actions to perform any job you would like, including continuous integration (CI) and continuous deployment (CD), and combine actions in a completely customized workflow.

This brief post will examine a simple use case for GitHub Actions — automatically building and pushing a new Docker image to Docker Hub. A GitHub Actions workflow will be triggered every time a new Git tag is pushed to the GitHub project repository.

GitHub Actions Workflow running, based on the push of a new git tag

GitHub Project Repository

For the demonstration, we will be using the public NLP Client microservice GitHub project repository. The NLP Client, written in Go, is one of five microservices that comprise the Natural Language Processing (NLP) API. I developed this API to demonstrate architectural principles and DevOps practices. The API’s microservices are designed to be run as a distributed system using container orchestration platforms such as Docker Swarm, Red Hat OpenShift, Amazon ECS, and Kubernetes.

Public NLP Client GitHub project repository

Encrypted Secrets

To push new images to Docker Hub, the workflow must be logged in to your Docker Hub account. GitHub recommends storing your Docker Hub username and password as encrypted secrets, so they are not exposed in your workflow file. Encrypted secrets allow you to store sensitive information as encrypted environment variables in your organization, repository, or repository environment. The secrets that you create will be available to use in GitHub Actions workflows. To allow the workflow to log in to Docker Hub, I created two secrets, DOCKERHUB_USERNAME and DOCKERHUB_PASSWORD using my organization’s credentials, which I then reference in the workflow.

Actions Secrets shown in the GitHub project’s Secrets tab

GitHub Actions Workflow

According to GitHub, a workflow is a configurable automated process made up of one or more jobs. You must create a YAML file to define your workflow configuration. GitHub contains many searchable code examples you can use to bootstrap your workflow development. For this demonstration, I started with the example shown in the GitHub Actions Guide, Publishing Docker images, and modified it to meet my needs. Workflow files are checked into the project’s repository within the .github/workflows directory.

Workflow Development

Visual Studio Code (VS Code) is an excellent, full-featured, and free IDE for software development and writing Infrastructure as Code (IaC). VS Code has a large ecosystem of extensions, including extensions for GitHub Actions. Currently, I am using the GitHub Actions extension, cschleiden.vscode-github-actions, by Christopher Schleiden.

The extension features auto-complete, as shown below in the GitHub Actions workflow YAML file.

Auto-complete example using the GitHub Actions extension

Git Tags

The demonstration’s workflow is designed to be triggered when a new Git tag is pushed to the NLP Client project repository. With this trigger, normal pushes (git push) to the repository do not start the workflow. For example, you would not typically want to trigger a new image build and push when updating the project’s README file. Thus, we use the new Git tag as the workflow trigger.

Pushing a new tag to GitHub
Git tags as shown in the GitHub project repository

For consistency, I also designed the workflow to be triggered only when the format of the Git tag follows the common Semantic Versioning (SemVer) convention of version number MAJOR.MINOR.PATCH (v*.*.*).

on:
  push:
    tags:
      - 'v*.*.*'

Also, following common Docker conventions in the workflow, the Git tag (e.g., v1.2.3) is truncated to remove the letter ‘v’ and used as the tag for the Docker image (e.g., 1.2.3). In the workflow, the GITHUB_REF:11 portion of the command truncates the Git tag reference of refs/tags/v1.2.3 to just 1.2.3.

run: echo "RELEASE_VERSION=${GITHUB_REF:11}" >> $GITHUB_ENV
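As a quick worked example of the truncation, the prefix refs/tags/v is exactly 11 characters long, so dropping the first 11 characters of the Git reference leaves the bare version number:

# Demonstrates the substring logic behind ${GITHUB_REF:11} in the workflow.
github_ref = "refs/tags/v1.2.3"
release_version = github_ref[11:]  # "refs/tags/v" is 11 characters
assert release_version == "1.2.3"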

Workflow Run

Pushing the Git tag triggers the workflow to run automatically, as seen in the Actions tab.

GitHub Actions Workflow running, based on the push of a new git tag

Detailed logs show you how each step in the workflow was processed.

GitHub Actions Workflow running, based on the push of a new git tag

The example below shows that the workflow has successfully built and pushed a new Docker image to Docker Hub for the NLP Client microservice.

Completed GitHub Actions Workflow run

Failure Notifications

You can choose to receive a notification when a workflow fails. GitHub Actions notifications are a configurable option found in the GitHub account owner’s Settings tab.

Example email notification of workflow run failure

Status Badge

You can display a status badge in your repository to indicate the status of your workflows. The badge can be added as Markdown to your README file.

Public NLP Client GitHub project’s README displaying the status badge

Docker Hub

As a result of the successful completion of the workflow, we now have a new image tagged as 1.2.3 in the NLP Client Docker Hub repository: garystafford/nlp-client.

NLP Client Docker Hub repository showing new image tag

Conclusion

In this brief post, we saw a simple example of how GitHub Actions allows you to automate, customize, and execute your software development workflows right in your GitHub repository. We can easily extend this post’s GitHub Actions example to include updating the service’s Kubernetes Deployment resource file to the latest image tag in Docker Hub. Further, we can trigger a GitOps workflow with tools such as Weaveworks’ Flux or Argo CD to deploy the revised workload to a Kubernetes cluster.

Deployed NLP API as seen from Argo CD

This blog represents my own viewpoints and not of my employer, Amazon Web Services (AWS). All product names, logos, and brands are the property of their respective owners.



Managing AWS Infrastructure as Code using Ansible, CloudFormation, and CodeBuild

Introduction

When it comes to provisioning and configuring resources on the AWS cloud platform, there is a wide variety of services, tools, and workflows you could choose from. You could decide to exclusively use the cloud-based services provided by AWS, such as CodeBuild, CodePipeline, CodeStar, and OpsWorks. Alternatively, you could choose open-source software (OSS) for provisioning and configuring AWS resources, such as community editions of Jenkins, HashiCorp Terraform, Pulumi, Chef, and Puppet. You might also choose to use licensed products, such as Octopus Deploy, TeamCity, CloudBees Core, Travis CI Enterprise, and XebiaLabs XL Release. You might even decide to write your own custom tools or scripts in Python, Go, JavaScript, Bash, or other common languages.

In reality, most enterprise teams I have worked with integrate a combination of AWS services, open-source software, custom scripts, and occasionally licensed products to construct complete, end-to-end, infrastructure-as-code-based workflows for provisioning and configuring AWS resources. Choices are most often based on team experience, vendor relationships, and an enterprise’s specific business use cases.

In the following post, we will explore one such set of easily-integrated tools for provisioning and configuring AWS resources. The tool-stack is comprised of Red Hat Ansible, AWS CloudFormation, and AWS CodeBuild, along with several complementary AWS technologies. Using these tools, we will provision a relatively simple AWS environment, then deploy, configure, and test a highly-available set of Apache HTTP Servers. The demonstration is similar to the one featured in a previous post, Getting Started with Red Hat Ansible for Google Cloud Platform.

ansible-aws-stack2.png

Why Ansible?

With its simplicity, ease of use, and broad compatibility with most major cloud, database, network, storage, and identity providers, among other categories, Ansible has been a popular choice of engineering teams for configuration management since 2012. Given the wide variety of polyglot technologies used within modern enterprises and the growing predominance of multi-cloud and hybrid cloud architectures, Ansible provides a common platform for enabling mature DevOps and infrastructure as code practices. Ansible is easily integrated with higher-level orchestration systems, such as AWS CodeBuild, Jenkins, or Red Hat AWX and Tower.

Technologies

The primary technologies used in this post include the following.

Red Hat Ansible

Ansible, purchased by Red Hat in October 2015, seamlessly provides workflow orchestration with configuration management, provisioning, and application deployment in a single platform. Unlike similar tools, Ansible’s workflow automation is agentless, relying on Secure Shell (SSH) and Windows Remote Management (WinRM). If you are interested in learning more about the advantages of Ansible, they’ve published a whitepaper on The Benefits of Agentless Architecture.

According to G2 Crowd, Ansible is a clear leader in the Configuration Management Software category, ranked right behind GitLab. Competitors in the category include GitLab, AWS Config, Puppet, Chef, Codenvy, HashiCorp Terraform, Octopus Deploy, and JetBrains TeamCity.

AWS CloudFormation

Deployment__Management_copy_AWS_CloudFormation-512

According to AWS, CloudFormation provides a common language to describe and provision all the infrastructure resources within AWS-based cloud environments. CloudFormation allows you to use a JSON- or YAML-based template to model and provision, in an automated and secure manner, all the resources needed for your applications across all AWS regions and accounts.

Codifying your infrastructure, often referred to as ‘Infrastructure as Code,’ allows you to treat your infrastructure as just code. You can author it with any IDE, check it into a version control system, and review the files with team members before deploying it.

AWS CodeBuild

According to AWS, CodeBuild is a fully managed continuous integration service that compiles your source code, runs tests, and produces software packages that are ready to deploy. With CodeBuild, you don’t need to provision, manage, and scale your own build servers. CodeBuild scales continuously and processes multiple builds concurrently, so your builds are not left waiting in a queue.

CodeBuild integrates seamlessly with other AWS Developer tools, including CodeStar, CodeCommit, CodeDeploy, and CodePipeline.

According to G2 Crowd, the main competitors to AWS CodeBuild, in the Build Automation Software category, include Jenkins, CircleCI, CloudBees Core and CodeShip, Travis CI, JetBrains TeamCity, and Atlassian Bamboo.

Other Technologies

In addition to the major technologies noted above, we will also be leveraging the following services and tools to a lesser extent, in the demonstration:

  • AWS CodeCommit
  • AWS CodePipeline
  • AWS Systems Manager Parameter Store
  • Amazon Simple Storage Service (S3)
  • AWS Identity and Access Management (IAM)
  • AWS Command Line Interface (CLI)
  • CloudFormation Linter
  • Apache HTTP Server

Demonstration

Source Code

All source code for this post is contained in two GitHub repositories. The CloudFormation templates and associated files are in the ansible-aws-cfn GitHub repository. The Ansible Roles and related files are in the ansible-aws-roles GitHub repository. Both repositories may be cloned using the following commands.

git clone --branch master --single-branch --depth 1 --no-tags \
  https://github.com/garystafford/ansible-aws-cfn.git

git clone --branch master --single-branch --depth 1 --no-tags \
  https://github.com/garystafford/ansible-aws-roles.git

Development Process

The general process we will follow for provisioning and configuring resources in this demonstration is as follows:

  • Create an S3 bucket to store the validated CloudFormation templates
  • Create an Amazon EC2 Key Pair for Ansible
  • Create two AWS CodeCommit Repositories to store the project’s source code
  • Put parameters in Parameter Store
  • Write and test the CloudFormation templates
  • Configure Ansible and AWS Dynamic Inventory script
  • Write and test the Ansible Roles and Playbooks
  • Write the CodeBuild build specification files
  • Create an IAM Role for CodeBuild and CodePipeline
  • Create and test CodeBuild Projects and CodePipeline Pipelines
  • Provision, deploy, and configure the complete web platform to AWS
  • Test the final web platform

Prerequisites

For this demonstration, I will assume you already have an AWS account, the AWS CLI, Python, and Ansible installed locally, an S3 bucket to store the final CloudFormation templates and an Amazon EC2 Key Pair for Ansible to use for SSH.

Continuous Integration and Delivery Overview

In this demonstration, we will be building multiple CI/CD pipelines for provisioning and configuring our resources to AWS, using several AWS services. These services include CodeCommit, CodeBuild, CodePipeline, Systems Manager Parameter Store, and Amazon Simple Storage Service (S3). The diagram below shows the complete CI/CD workflow we will build using these AWS services, along with Ansible.

aws_devops

AWS CodeCommit

According to Amazon, AWS CodeCommit is a fully-managed source control service that makes it easy to host secure and highly scalable private Git repositories. CodeCommit eliminates the need to operate your own source control system or worry about scaling its infrastructure.

Start by creating two AWS CodeCommit repositories to hold the two GitHub projects you cloned earlier. Commit both projects to your own AWS CodeCommit repositories.

screen_shot_2019-07-26_at_9_02_54_pm

Configuration Management

We have several options for storing the configuration values necessary to provision and configure the resources on AWS. We could set configuration values as environment variables directly in CodeBuild. We could set configuration values from within our Ansible Roles. We could use AWS Systems Manager Parameter Store to store configuration values. For this demonstration, we will use a combination of all three options.

AWS Systems Manager Parameter Store

According to Amazon, AWS Systems Manager Parameter Store provides secure, hierarchical storage for configuration data management and secrets management. You can store data such as passwords, database strings, and license codes as parameter values, as either plain text or encrypted.

The demonstration uses two CloudFormation templates. The two templates have several parameters. A majority of those parameter values will be stored in Parameter Store, retrieved by CodeBuild, and injected into the CloudFormation template during provisioning.

screen_shot_2019-07-26_at_9_38_33_pm

The Ansible GitHub project includes a shell script, parameter_store_values.sh, to put the necessary parameters into Parameter Store. The script requires the AWS Command Line Interface (CLI) to be installed locally. You will need to change the KEY_PATH key value in the script (snippet shown below) to match the location of your private key, part of the Amazon EC2 Key Pair you created earlier for use by Ansible.

KEY_PATH="/path/to/private/key"

# put encrypted parameter to Parameter Store
aws ssm put-parameter \
  --name $PARAMETER_PATH/ansible_private_key \
  --type SecureString \
  --value "file://${KEY_PATH}" \
  --description "Ansible private key for EC2 instances" \
  --overwrite

SecureString

Whereas all other parameters are stored in Parameter Store as String datatypes, the private key is stored as a SecureString datatype. Parameter Store uses an AWS Key Management Service (KMS) customer master key (CMK) to encrypt the SecureString parameter value. The IAM Role used by CodeBuild (discussed later) will have the correct permissions to use the KMS key to retrieve and decrypt the private key SecureString parameter value.

screen_shot_2019-07-26_at_9_41_42_pm
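As an aside, a brief sketch of retrieving and decrypting such a SecureString parameter with boto3 is shown below. It is not part of the demonstration; the parameter path is illustrative, and the caller's IAM role needs permission to decrypt with the associated KMS key.

# Sketch: read a SecureString parameter (e.g., the Ansible private key) with boto3.
import boto3

ssm = boto3.client("ssm", region_name="us-east-1")

response = ssm.get_parameter(
    Name="/ansible_demo/ansible_private_key",  # illustrative parameter path
    WithDecryption=True,  # Parameter Store decrypts the SecureString value via KMS
)

private_key = response["Parameter"]["Value"]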

CloudFormation

The demonstration uses two CloudFormation templates. The first template, network-stack.template, contains the AWS network stack resources. The template includes one VPC, one Internet Gateway, two NAT Gateways, four Subnets, two Elastic IP Addresses, and associated Route Tables and Security Groups. The second template, compute-stack.template, contains the webserver compute stack resources. The template includes an Auto Scaling Group, Launch Configuration, Application Load Balancer (ALB), ALB Listener, ALB Target Group, and an Instance Security Group. Both templates originated from the AWS CloudFormation template sample library, and were modified for this demonstration.

The two templates are located in the cfn_templates directory of the CloudFormation project, as shown below in the tree view.

.
├── LICENSE.md
├── README.md
├── buildspec_files
│   ├── build.sh
│   └── buildspec.yml
├── cfn_templates
│   ├── compute-stack.template
│   └── network-stack.template
├── codebuild_projects
│   ├── build.sh
│   └── cfn-validate-s3.json
├── codepipeline_pipelines
│   ├── build.sh
│   └── cfn-validate-s3.json
└── requirements.txt

The templates require no modifications for the demonstration. All parameters are in Parameter Store or set by the Ansible Roles, and consumed by the Ansible Playbooks via CodeBuild.

Ansible

We will use Red Hat Ansible to provision the network and compute resources by interacting directly with CloudFormation, deploy and configure Apache HTTP Server, and finally, perform final integration tests of the system. In my opinion, the closest equivalent to Ansible on the AWS platform is AWS OpsWorks. OpsWorks lets you use Chef and Puppet (direct competitors to Ansible) to automate how servers are configured, deployed, and managed across Amazon EC2 instances or on-premises compute environments.

Ansible Config

To use Ansible with AWS and CloudFormation, you will first want to customize your project’s ansible.cfg file to enable the aws_ec2 inventory plugin. Below is part of my configuration file as a reference.

[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp
fact_caching_timeout = 300

host_key_checking = False
roles_path = roles
inventory = inventories/hosts
remote_user = ec2-user
private_key_file = ~/.ssh/ansible

[inventory]
enable_plugins = host_list, script, yaml, ini, auto, aws_ec2

Ansible Roles

According to Ansible, Roles are ways of automatically loading certain variable files, tasks, and handlers based on a known file structure. Grouping content by roles also allows easy sharing of roles with other users. For the demonstration, I have written four roles, located in the roles directory, as shown below in the project tree view. The default, common role is not used in this demonstration.

.
├── LICENSE.md
├── README.md
├── ansible.cfg
├── buildspec_files
│   ├── buildspec_compute.yml
│   ├── buildspec_integration_tests.yml
│   ├── buildspec_network.yml
│   └── buildspec_web_config.yml
├── codebuild_projects
│   ├── ansible-test.json
│   ├── ansible-web-config.json
│   ├── build.sh
│   ├── cfn-compute.json
│   ├── cfn-network.json
│   └── notes.md
├── filter_plugins
├── group_vars
├── host_vars
├── inventories
│   ├── aws_ec2.yml
│   ├── ec2.ini
│   ├── ec2.py
│   └── hosts
├── library
├── module_utils
├── notes.md
├── parameter_store_values.sh
├── playbooks
│   ├── 10_cfn_network.yml
│   ├── 20_cfn_compute.yml
│   ├── 30_web_config.yml
│   └── 40_integration_tests.yml
├── production
├── requirements.txt
├── roles
│   ├── cfn_compute
│   ├── cfn_network
│   ├── common
│   ├── httpd
│   └── integration_tests
├── site.yml
└── staging

The four roles include a role for provisioning the network (cfn_network), a role for provisioning the compute resources (cfn_compute), a role for deploying and configuring the Apache servers (httpd), and a role to perform final integration tests of the platform (integration_tests). The individual roles help separate the project’s major parts (network, compute, and middleware) into logical code files. Each role was initially built using Ansible Galaxy (ansible-galaxy init). They follow Galaxy’s standard file structure, as shown in the tree view below of the cfn_network role.

.
├── README.md
├── defaults
│   └── main.yml
├── files
├── handlers
│   └── main.yml
├── meta
│   └── main.yml
├── tasks
│   ├── create.yml
│   ├── delete.yml
│   └── main.yml
├── templates
├── tests
│   ├── inventory
│   └── test.yml
└── vars
    └── main.yml

Testing Ansible Roles

In addition to checking each role during development and on each code commit with Ansible Lint, each role contains a set of unit tests, in the tests directory, to confirm the success or failure of the role’s tasks. Below we see a basic set of tests for the cfn_compute role. First, we gather Facts about the deployed EC2 instances. Facts are information Ansible can automatically derive from your remote systems. We check the facts for expected properties of the running EC2 instances, including timezone, Operating System, major OS version, and the UserID. Note the use of the failed_when conditional. This Ansible playbook error handling conditional is used to confirm the success or failure of tasks.

---
- name: Test cfn_compute Ansible role
  gather_facts: True
  hosts: tag_Group_webservers

  pre_tasks:
  - name: List all ansible facts
    debug:
      msg: "{{ ansible_facts }}"

  tasks:
  - name: Check if EC2 instance's timezone is set to 'UTC'
    debug:
      msg: Timezone is UTC
    failed_when: ansible_facts['date_time']['tz'] != 'UTC'

  - name: Check if EC2 instance's OS is 'Amazon'
    debug:
      msg: OS is Amazon
    failed_when: ansible_facts['distribution_file_variety'] != 'Amazon'

  - name: Check if EC2 instance's OS major version is '2018'
    debug:
      msg: OS major version is 2018
    failed_when: ansible_facts['distribution_major_version'] != '2018'

  - name: Check if EC2 instance's UserID is 'ec2-user'
    debug:
      msg: UserID is ec2-user
    failed_when: ansible_facts['user_id'] != 'ec2-user'

If we were to run the test on their own, against the two correctly provisioned and configured EC2 web servers, we would see results similar to the following.

screen_shot_2019-07-26_at_6_55_04_pm

In the cfn_network role unit tests, below, note the use of the Ansible cloudformation_facts module. This module allows us to obtain facts about the successfully completed AWS CloudFormation stack. We can then use these facts to drive additional provisioning and configuration, or testing. In the task below, we get the network CloudFormation stack’s Outputs. These are the exact same values we would see in the stack’s Output tab, from the AWS CloudFormation management console.

---
- name: Test cfn_network Ansible role
  gather_facts: False
  hosts: localhost

  pre_tasks:
    - name: Get facts about the newly created cfn network stack
      cloudformation_facts:
        stack_name: "ansible-cfn-demo-network"
      register: cfn_network_stack_facts

    - name: List 'stack_outputs' from cached facts
      debug:
        msg: "{{ cloudformation['ansible-cfn-demo-network'].stack_outputs }}"

  tasks:
  - name: Check if the AWS Region of the VPC is {{ lookup('env','AWS_REGION') }}
    debug:
      msg: "AWS Region of the VPC is {{ lookup('env','AWS_REGION') }}"
    failed_when: cloudformation['ansible-cfn-demo-network'].stack_outputs['VpcRegion'] != lookup('env','AWS_REGION')

Similar to the CloudFormation templates, the Ansible roles require no modifications. Most of the project’s parameters are decoupled from the code and stored in Parameter Store or CodeBuild buildspec files (discussed next). The few parameters found in the roles, in the defaults/main.yml files, are neither account- nor environment-specific.

Ansible Playbooks

The roles will be called by our Ansible Playbooks. There is a create and a delete set of tasks for the cfn_network and cfn_compute roles. Either create or delete tasks are accessible through the role, using the main.yml file and referencing the create or delete Ansible Tags.

---
- import_tasks: create.yml
  tags:
    - create

- import_tasks: delete.yml
  tags:
    - delete

Below, we see the create tasks for the cfn_network role, create.yml, referenced above by main.yml. The use of the cloudformation module in the first task allows us to create or delete AWS CloudFormation stacks and demonstrates the real power of Ansible: the ability to execute complex AWS resource provisioning by extending its core functionality via a module. By switching the cloud module, we could just as easily provision resources on Google Cloud, Azure, AliCloud, OpenStack, or VMware, to name but a few.

---
- name: create a stack, pass in the template via an S3 URL
  cloudformation:
    stack_name: "{{ stack_name }}"
    state: present
    region: "{{ lookup('env','AWS_REGION') }}"
    disable_rollback: false
    template_url: "{{ lookup('env','TEMPLATE_URL') }}"
    template_parameters:
      VpcCIDR: "{{ lookup('env','VPC_CIDR') }}"
      PublicSubnet1CIDR: "{{ lookup('env','PUBLIC_SUBNET_1_CIDR') }}"
      PublicSubnet2CIDR: "{{ lookup('env','PUBLIC_SUBNET_2_CIDR') }}"
      PrivateSubnet1CIDR: "{{ lookup('env','PRIVATE_SUBNET_1_CIDR') }}"
      PrivateSubnet2CIDR: "{{ lookup('env','PRIVATE_SUBNET_2_CIDR') }}"
      TagEnv: "{{ lookup('env','TAG_ENVIRONMENT') }}"
    tags:
      Stack: "{{ stack_name }}"

The CloudFormation parameters in the above task are mainly derived from environment variables, whose values were retrieved from the Parameter Store by CodeBuild and set in the environment. We obtain these external values using Ansible’s Lookup Plugins. The stack_name variable’s value is derived from the role’s defaults/main.yml file. The task variables use the Python Jinja2 templating system style of encoding.

variables

The associated Ansible Playbooks, which call the tasks, are located in the playbooks directory, as shown previously in the tree view. The playbooks define a few required parameters, like where the list of hosts will be derived and calls the appropriate roles. For our simple demonstration, only a single role is called per playbook. Typically, in a larger project, you would call multiple roles from a single playbook. Below, we see the Network playbook, playbooks/10_cfn_network.yml, which calls the cfn_network role.

---
- name: Provision VPC and Subnets
  hosts: localhost
  connection: local
  gather_facts: False

  roles:
    - role: cfn_network

Dynamic Inventory

Another principal feature of Ansible is demonstrated in the Web Server Configuration playbook, playbooks/30_web_config.yml, shown below. Note the hosts to which we want to deploy and configure Apache HTTP Server is based on an AWS tag value, indicated by the reference to tag_Group_webservers. This indirectly refers to an AWS tag, named Group, with the value of webservers, which was applied to our EC2 hosts by CloudFormation. The ability to generate a Dynamic Inventory, using a dynamic external inventory system, is a key feature of Ansible.

---
- name: Configure Apache Web Servers
  hosts: tag_Group_webservers
  gather_facts: False
  become: yes
  become_method: sudo

  roles:
    - role: httpd

To generate a dynamic inventory of EC2 hosts, we are using the Ansible AWS EC2 Dynamic Inventory script, inventories/ec2.py and inventories/ec2.ini files. The script dynamically queries AWS for all the EC2 hosts containing specific AWS tags, belonging to a particular Security Group, Region, Availability Zone, and so forth.

I have customized the AWS EC2 Dynamic Inventory script’s configuration in the inventories/aws_ec2.yml file. Amongst other configuration items, the file defines keyed_groups. This instructs the script to inventory EC2 hosts according to their unique AWS tags and tag values.

plugin: aws_ec2
remote_user: ec2-user
private_key_file: ~/.ssh/ansible
regions:
  - us-east-1
keyed_groups:
  - key: tags.Name
    prefix: tag_Name_
    separator: ''
  - key: tags.Group
    prefix: tag_Group_
    separator: ''
hostnames:
  - dns-name
  - ip-address
  - private-dns-name
  - private-ip-address
compose:
  ansible_host: ip_address

Once you have built the CloudFormation compute stack, later in the demonstration, you would use the following command to build the dynamic EC2 inventory of hosts.

ansible-inventory -i inventories/aws_ec2.yml --graph

You would then see an inventory of all your EC2 hosts, resembling the following.

@all:
  |--@aws_ec2:
  |  |--ec2-18-234-137-73.compute-1.amazonaws.com
  |  |--ec2-3-95-215-112.compute-1.amazonaws.com
  |--@tag_Group_webservers:
  |  |--ec2-18-234-137-73.compute-1.amazonaws.com
  |  |--ec2-3-95-215-112.compute-1.amazonaws.com
  |--@tag_Name_Apache_Web_Server:
  |  |--ec2-18-234-137-73.compute-1.amazonaws.com
  |  |--ec2-3-95-215-112.compute-1.amazonaws.com
  |--@ungrouped:

Note the two EC2 web servers instances, listed under tag_Group_webservers. They represent the target inventory onto which we will install Apache HTTP Server. We could also use the tag, Name, with the value tag_Name_Apache_Web_Server.

AWS CodeBuild

Recalling our diagram, you will note the use of CodeBuild is a vital part of each of our five DevOps workflows. CodeBuild is used to 1) validate the CloudFormation templates, 2) provision the network resources, 3) provision the compute resources, 4) install and configure the web servers, and 5) run integration tests.

aws_devops

Splitting these processes into separate workflows, we can redeploy the web servers without impacting the compute resources or redeploy the compute resources without affecting the network resources. Often, different teams within a large enterprise are responsible for each of these resources categories—architecture, security (IAM), network, compute, web servers, and code deployments. Separating concerns makes a shared ownership model easier to manage.

Build Specifications

CodeBuild projects rely on a build specification, or buildspec, file for their configuration, as shown below. CodeBuild’s buildspec file is analogous to Jenkins’ Jenkinsfile. Each of our five workflows will use CodeBuild. Each CodeBuild project references a separate buildspec file, included in the two GitHub projects, which by now you have pushed to your two CodeCommit repositories.

screen_shot_2019-07-26_at_6_10_59_pm

Below we see an example of the buildspec file for the CodeBuild project that deploys our AWS network resources, buildspec_files/buildspec_network.yml.

version: 0.2

env:
  variables:
    TEMPLATE_URL: "https://s3.amazonaws.com/garystafford_cloud_formation/cf_demo/network-stack.template"
    AWS_REGION: "us-east-1"
    TAG_ENVIRONMENT: "ansible-cfn-demo"
  parameter-store:
    VPC_CIDR: "/ansible_demo/vpc_cidr"
    PUBLIC_SUBNET_1_CIDR: "/ansible_demo/public_subnet_1_cidr"
    PUBLIC_SUBNET_2_CIDR: "/ansible_demo/public_subnet_2_cidr"
    PRIVATE_SUBNET_1_CIDR: "/ansible_demo/private_subnet_1_cidr"
    PRIVATE_SUBNET_2_CIDR: "/ansible_demo/private_subnet_2_cidr"

phases:
  install:
    runtime-versions:
      python: 3.7
    commands:
      - pip install -r requirements.txt -q
  build:
    commands:
      - ansible-playbook -i inventories/aws_ec2.yml playbooks/10_cfn_network.yml --tags create  -v
  post_build:
    commands:
      - ansible-playbook -i inventories/aws_ec2.yml roles/cfn_network/tests/test.yml

There are several distinct sections to the buildspec file. First, in the variables section, we define our variables. They are a combination of three static variable values and five variable values retrieved from the Parameter Store. Any of these may be overwritten at build-time, using the AWS CLI, SDK, or from the CodeBuild management console. You will need to update some of the variables to match your particular environment, such as the TEMPLATE_URL to match your S3 bucket path.
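For example, a build could be started with overridden environment variable values using the AWS SDK for Python (boto3) rather than the CLI or console. The sketch below is illustrative only; the override value shown is hypothetical.

# Sketch: start a CodeBuild build while overriding a buildspec variable.
import boto3

codebuild = boto3.client("codebuild", region_name="us-east-1")

codebuild.start_build(
    projectName="cfn-network",
    environmentVariablesOverride=[
        {
            "name": "TAG_ENVIRONMENT",
            "value": "ansible-cfn-demo-dev",  # illustrative override value
            "type": "PLAINTEXT",
        },
    ],
)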

Next, the phases of our build. Again, if you are familiar with Jenkins, think of these as Stages with multiple Steps. The first phase, install, builds a Docker container, in which the build process is executed. Here we are using Python 3.7. We also run a pip command to install the required Python packages from our requirements.txt file. Next, we perform our build phase by executing an Ansible command.

 ansible-playbook \
  -i inventories/aws_ec2.yml \
  playbooks/10_cfn_network.yml --tags create -v

The command calls our playbook, playbooks/10_cfn_network.yml. The command references the create tag. This causes the playbook to run the cfn_network role’s create tasks (roles/cfn_network/tasks/create.yml), as defined in the main.yml file (roles/cfn_network/tasks/main.yml). Lastly, in our post_build phase, we execute our role’s unit tests (roles/cfn_network/tests/test.yml), using a second Ansible command.

CodeBuild Projects

Next, we need to create CodeBuild projects. You can do this using the AWS CLI or from the CodeBuild management console (shown below). I have included individual templates and a creation script in each project, in the codebuild_projects directory, which you could use to build the projects, using the AWS CLI. You would have to modify the JSON templates, replacing all references to my specific, unique AWS resources, with your own. For the demonstration, I suggest creating the five projects manually in the CodeBuild management console, using the supplied CodeBuild project templates as a guide.

screen_shot_2019-07-26_at_6_10_12_pm

CodeBuild IAM Role

To execute our CodeBuild projects, we need an IAM Role or Roles for CodeBuild, with permissions to resources such as CodeCommit, S3, and CloudWatch. For this demonstration, I chose to create a single IAM Role for all workflows. I then allowed CodeBuild to assign the required policies to the Role as needed, which is a feature of CodeBuild.

screen_shot_2019-07-26_at_6_52_23_pm

CodePipeline Pipeline

In addition to CodeBuild, we are using CodePipeline for our first of five workflows. CodePipeline validates the CloudFormation templates and pushes them to our S3 bucket. The pipeline calls the corresponding CodeBuild project to validate each template, then deploys the valid CloudFormation templates to S3.

codepipeline

In true CI/CD fashion, the pipeline is automatically executed every time source code from the CloudFormation project is committed to the CodeCommit repository.

screen_shot_2019-07-26_at_6_12_51_pm

CodePipeline calls CodeBuild, which performs a build based on its buildspec file. This particular CodeBuild buildspec file also demonstrates another ability of CodeBuild, executing an external script. When we have a complex build phase, we may choose to call an external script, such as a Bash or Python script, versus embedding the commands in the buildspec.

version: 0.2

phases:
  install:
    runtime-versions:
      python: 3.7
  pre_build:
    commands:
      - pip install -r requirements.txt -q
      - cfn-lint -v
  build:
    commands:
      - sh buildspec_files/build.sh

artifacts:
  files:
    - '**/*'
  base-directory: 'cfn_templates'
  discard-paths: yes

Below, we see the script that is called. Here we are using both the CloudFormation Linter, cfn-lint, and the cloudformation validate-template command to validate our templates for comparison. The two tools give slightly different, yet relevant, linting results.

#!/usr/bin/env bash

set -e

for filename in cfn_templates/*.*; do
    cfn-lint -t ${filename}
    aws cloudformation validate-template \
      --template-body file://${filename}
done

Similar to the CodeBuild project templates, I have included a CodePipeline template, in the codepipeline_pipelines directory, which you could modify and create using the AWS CLI. Alternatively, I suggest using the CodePipeline management console to create the pipeline for the demo, using the supplied CodePipeline template as a guide.

screen_shot_2019-07-26_at_6_11_51_pm

Below, the stage view of the final CodePipeline pipeline.

screen_shot_2019-07-26_at_6_12_26_pm

Build the Platform

With all the resources, code, and DevOps workflows in place, we should be ready to build our platform on AWS. The CodePipeline project comes first, to validate the CloudFormation templates and place them into your S3 bucket. Since you are probably not committing new code to the CloudFormation project’s CodeCommit repository, which would trigger the pipeline, you can start the pipeline using the AWS CLI (shown below) or via the management console.

# list names of pipelines
aws codepipeline list-pipelines

# execute the validation pipeline
aws codepipeline start-pipeline-execution --name cfn-validate-s3

screen_shot_2019-07-26_at_6_08_03_pm

The pipeline should complete within a few seconds.

screen_shot_2019-07-26_at_10_12_53_pm.png

Next, execute each of the four CodeBuild projects in the following order.

# list the names of the projects
aws codebuild list-projects

# execute the builds in order
aws codebuild start-build --project-name cfn-network
aws codebuild start-build --project-name cfn-compute

# ensure EC2 instance checks are complete before starting
# the ansible-web-config build!
aws codebuild start-build --project-name ansible-web-config
aws codebuild start-build --project-name ansible-test

As the code comment above states, be careful not to start the ansible-web-config build until you have confirmed the EC2 instance Status Checks have completed and have passed, as shown below. The previous, cfn-compute build will complete when CloudFormation finishes building the new compute stack. However, the fact CloudFormation finished does not indicate that the EC2 instances are fully up and running. Failure to wait will result in a failed build of the ansible-web-config CodeBuild project, which installs and configures the Apache HTTP Servers.

screen_shot_2019-07-26_at_6_27_52_pm
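If you would rather script this wait than watch the console, a sketch using boto3 is shown below. It is not part of the demonstration and assumes the Group=webservers tag applied to the EC2 instances by the compute stack.

# Sketch: wait for EC2 status checks to pass before starting the web-config build.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
codebuild = boto3.client("codebuild", region_name="us-east-1")

# find the instance IDs of the web servers by their Group tag
reservations = ec2.describe_instances(
    Filters=[{"Name": "tag:Group", "Values": ["webservers"]}]
)["Reservations"]
instance_ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]

# block until both the system and instance status checks report 'ok'
waiter = ec2.get_waiter("instance_status_ok")
waiter.wait(InstanceIds=instance_ids)

codebuild.start_build(projectName="ansible-web-config")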

Below, we see the cfn_network CodeBuild project first building a Python-based Docker container, within which to perform the build. Each build is executed in a fresh, separate Docker container, something that can trip you up if you are expecting a previous cache of Ansible Facts or previously defined environment variables, persisted across multiple builds.

screen_shot_2019-07-26_at_6_15_12_pm

Below, we see the two completed CloudFormation Stacks, a result of our CodeBuild projects and Ansible.

screen_shot_2019-07-26_at_6_44_43_pm

The fifth and final CodeBuild build tests our platform by attempting to hit the Apache HTTP Server’s default home page, using the Application Load Balancer’s public DNS name.

screen_shot_2019-07-26_at_6_32_09_pm

Below, we see an example of what happens when a build fails. In this case, one of the final integration tests failed to return the expected results from the ALB endpoint.

screen_shot_2019-07-26_at_6_40_37_pm

Below, with the bug fixed, we rerun the build, which re-executes the tests successfully.

screen_shot_2019-07-26_at_6_38_21_pm

We can manually confirm the platform is working by hitting the same public DNS name of the ALB that our tests used, in a browser. The ALB should load-balance our request to the default home page of one of the two running web servers. Normally, at this point, you would deploy your application to Apache, using a continuous deployment tool, such as Jenkins, CodeDeploy, Travis CI, TeamCity, or Bamboo.

screen_shot_2019-07-26_at_6_39_26_pm

Cleaning Up

To clean up the running AWS resources from the demonstration, first delete the CloudFormation compute stack, then delete the network stack. To do so, execute the following commands, one at a time. The commands call the same playbooks we used to create the stacks, except this time, we use the delete tag, as opposed to the create tag.

# first delete cfn compute stack
ansible-playbook \
  -i inventories/aws_ec2.yml \
  playbooks/20_cfn_compute.yml -t delete -v

# then delete cfn network stack
ansible-playbook \
  -i inventories/aws_ec2.yml \
  playbooks/10_cfn_network.yml -t delete -v

You should observe the following output, indicating both CloudFormation stacks have been deleted.

screen_shot_2019-07-26_at_7_12_38_pm

Confirm the stacks were deleted from the CloudFormation management console or from the AWS CLI.
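For example, one quick check from the AWS CLI is to list stacks currently in the DELETE_COMPLETE state:

# list recently deleted stacks
aws cloudformation list-stacks \
  --stack-status-filter DELETE_COMPLETE \
  --query 'StackSummaries[].[StackName,StackStatus]' \
  --output table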

 

All opinions expressed in this post are my own and not necessarily the views of my current or past employers or their clients.


Docker Enterprise Edition: Multi-Environment, Single Control Plane Architecture for AWS

Final_DockerEE_21 (1)

Designing a successful, cloud-based containerized application platform requires a balance of performance and security with cost, reliability, and manageability. Ensuring that a platform meets all functional and non-functional requirements, while remaining within budget and easy to maintain, can be challenging.

As Cloud Architect and DevOps Team Lead, I recently participated in the development of two architecturally similar, lightweight, cloud-based containerized application platforms. From the start, both platforms were architected to maximize security and performance, while minimizing cost and operational complexity. The latter platform was built on AWS with Docker Enterprise Edition.

Docker Enterprise Edition

Released in March of this year, Docker Enterprise Edition (Docker EE) is a secure, full-featured container-based management platform. There are currently eight versions of Docker EE, available for Windows Server, Azure, AWS, and multiple Linux distros, including RHEL, CentOS, Ubuntu, SUSE, and Oracle.

Docker EE is one of several production-grade container orchestration Platforms as a Service (PaaS); several other container platforms compete in this category.

Docker Community Edition (CE), Kubernetes, and Apache Mesos are free and open-source. Some providers, such as Rancher Labs, offer enterprise support for an additional fee. Cloud-based services, such as Red Hat Openshift Online, AWS, GCE, and ACS, charge the typical usage monthly fee. Docker EE, similar to Mesosphere Enterprise DC/OS and Red Hat OpenShift, is priced on a per node/per year annual subscription model.

Docker EE is currently offered in three subscription tiers, including Basic, Standard, and Advanced. Additionally, Docker offers Business Day and Business Critical support. Docker EE’s Advanced Tier adds several significant features, including secure multi-tenancy with node-based isolation, and image security scanning and continuous vulnerability scanning, as part of Docker EE’s Docker Trusted Registry.

Architecting for Affordability and Maintainability

An enterprise-scale application platform, built on public cloud infrastructure, such as AWS, and a licensed Containers-as-a-Service (CaaS) platform, such as Docker EE, can quickly become complex and costly to build and maintain. Based on current list pricing, the cost of a single Linux node ranges from USD 75 per month for basic support, up to USD 300 per month for Docker Enterprise Edition Advanced with Business Critical support. Although cost is relative to the value generated by the application platform, architects should nonetheless always strive to avoid unnecessary complexity and cost.

Recurring operational costs, such as licensed software subscriptions, support contracts, and monthly cloud-infrastructure charges, are often overlooked by project teams during the build phase. Accurately forecasting the recurring costs of a fully functional Production platform, under expected normal load, is essential. Teams often overlook how quickly Docker image registries, databases, data lakes, and data warehouses swell, inflating the monthly cloud-infrastructure charges required to maintain the platform. The need to control cloud costs has led to the growth of third-party cloud management solutions, such as CloudCheckr Cloud Management Platform (CMP).

Shared Docker Environment Model

Most software development projects require multiple environments in which to continuously develop, test, demonstrate, stage, and release code. Creating separate environments, replete with their own Docker EE Universal Control Plane (aka Control Plane or UCP), Docker Trusted Registry (DTR), AWS infrastructure, and third-party components, would guarantee a high level of isolation and performance. However, replicating all elements in each environment would add considerable build and run costs, as well as unnecessary complexity.

On both recent projects, we chose to create a single AWS Virtual Private Cloud (VPC), which contained all of the non-production environments required by our project teams. In parallel, we built an entirely separate Production VPC for the Production environment. I’ve seen this same pattern repeated with Red Hat OpenStack and Microsoft Azure.

Production

Isolating Production from the lower environments is essential to ensure security, and to eliminate non-production traffic from impacting the performance of Production. Corporate compliance and regulatory policies often dictate complete Production isolation. Having separate infrastructure, security appliances, role-based access controls (RBAC), configuration and secret management, and encryption keys and SSL certificates, are all required.

For complete separation of Production, different AWS accounts are frequently used. Separate AWS accounts provide separate billing, usage reporting, and AWS Identity and Access Management (IAM), amongst other advantages.

Performance and Staging

Unlike Production, there are few reasons to completely isolate the lower environments from one another. The exception I’ve encountered is Performance and Staging. These two environments are frequently separated from other environments to ensure the accuracy of performance testing and release staging activities. Performance testing, in particular, can generate enormous load on systems, which, if not isolated, will impair adjacent environments, applications, and monitoring systems.

On a few recent projects, to reduce cost and complexity, we repurposed the UAT environment for performance testing, once user-acceptance testing was complete. Performance testing was conducted during off-peak development and testing periods, with access to adjacent environments blocked.

The multi-purpose UAT environment further served as a Staging environment. Applications were deployed and released to the UAT and Performance environments, following a nearly-identical process used for Production. Hotfixes to Production were also tested in this environment.

Example of Shared Environments

To demonstrate how to architect a shared non-production Docker EE environment, which minimizes cost and complexity, let’s examine the example shown below. In the example, built on AWS with Docker EE, there are four typical non-production environments, CI/CD, Development, Test, and UAT, and one Production environment.

Docker_EE_AWS_Diagram_01

In the example, there are two separate VPCs, the Production VPC, and the Non-Production VPC. There is no reason to configure VPC Peering between the two VPCs, as there is no need for direct communication between the two. Within the Non-Production VPC, to the left in the diagram, there is a cluster of three Docker EE UCP Manager EC2 nodes, a cluster of three DTR Worker EC2 nodes, and the four environments, consisting of varying numbers of EC2 Worker nodes. Production, to the right of the diagram, has its own cluster of three UCP Manager EC2 nodes and a cluster of six EC2 Worker nodes.

Single Non-Production UCP

As a primary means of reducing cost and complexity, in the example, a single minimally-sized Docker EE UCP cluster of three Manager nodes orchestrates activities across all four non-production environments. The alternative would be to create a separate UCP cluster for each environment; that means nine more nodes to configure and maintain.

The UCP users, teams, organizations, access controls, Docker Secrets, overlay networks, and other UCP features, for all non-production environments, are managed through the single Control Plane. All deployments to all the non-production environments, from the CI/CD server, are performed through the single Control Plane. Each UCP Manager node is deployed to a different AWS Availability Zone (AZ) to ensure high-availability.

Shared DTR

As another means of reducing cost and complexity, in the example, a Docker EE DTR cluster of three Worker nodes contains all Docker image repositories. Both the non-production and the Production environments use this DTR as a secure source of all Docker images. Not having to replicate image repositories, access controls, and infrastructure, or figure out how to migrate images between two separate DTR clusters, is a significant time, cost, and complexity savings. Each DTR Worker node is also deployed to a different AZ to ensure high-availability.

Using a shared DTR between non-production and Production is an important decision your project team needs to weigh. A single DTR, shared between non-production and Production, comes with inherent availability and security risks, which should be understood in advance.

Separate Non-Production Worker Nodes

In the shared non-production environments example, each environment has dedicated AWS EC2 instances configured as Docker EE Worker nodes. The number of Worker nodes is determined by the requirements for each environment, as dictated by the project’s Development, Testing, Security, and DevOps teams. Like the UCP and DTR clusters, each Worker node, within an individual environment, is deployed to a different AZ to ensure high-availability and mimic the Production architecture.

Minimizing the number of Worker nodes in each environment, as well as the type and size of each EC2 node, offers a significant potential cost and administrative savings.

Separate Environment Ingress

In the example, the UCP, DTR, and each of the four environments are accessed through separate URLs, using AWS Hosted Zone CNAME records (subdomains). Encrypted HTTPS traffic is routed through a series of security appliances, depending on traffic type, to individual private AWS Elastic Load Balancers (ELB): one each for the two UCPs, the DTR, and each of the environments. Each ELB load-balances traffic to the Docker EE nodes associated with that specific traffic. All firewalls, ELBs, and the UCP and DTR are secured with a high-grade wildcard SSL certificate.

AWS_ELB
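For illustration, creating one of the environment subdomains with the AWS CLI might look like the following sketch; the hosted zone ID, domain name, and ELB DNS name are all hypothetical placeholders.

# hypothetical values shown; point a dev subdomain at that environment's internal ELB
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1EXAMPLE12345 \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "dev.example.com",
        "Type": "CNAME",
        "TTL": 300,
        "ResourceRecords": [{"Value": "internal-dev-elb-123456789.us-east-1.elb.amazonaws.com"}]
      }
    }]
  }'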

Separate Data Sources

In the shared non-production environments example, there is one Amazon Relational Database Service‎ (RDS) instance in non-Production and one in Production. Both RDS instances are replicated across multiple Availability Zones. Within the single shared non-production RDS instance, there are four separate databases, one per non-production environment. This architecture sacrifices the potential database performance of separate RDS instances in exchange for reduced cost and complexity.

Maintaining Environment Separation

Node Labels

To obtain sufficient environment separation while using a single UCP, each Docker EE Worker node is tagged with an environment node label. The node label indicates which environment the Worker node is associated with. For example, in the screenshot below, a Worker node is assigned to the Development environment by tagging it with the key of environment and the value of dev.

Node_Label
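Labels can also be applied from the command line through the UCP client bundle, using the standard Docker CLI. A minimal sketch, with a hypothetical node name:

# assign a Worker node to the Development environment
docker node update --label-add environment=dev worker-node-dev-1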

* The Docker EE screens shown here are from UCP 2.1.5, not the recently released 2.2.x, which has an updated UI appearance.

Each service’s Docker Compose file uses deployment placement constraints, which indicate where Docker should or should not deploy services. In the hello-world Docker Compose file example below, the node.labels.environment constraint is set to the ENVIRONMENT variable, which is set during container deployment by the CI/CD server. This constraint directs Docker to only deploy the hello-world service to nodes which contain the placement constraint of node.labels.environment, whose value matches the ENVIRONMENT variable value.


# Hello World Service Stack
# DTR_URL: Docker Trusted Registry URL
# IMAGE: Docker Image to deploy
# ENVIRONMENT: Environment to deploy into
version: '3.2'

services:
  hello-world:
    image: ${DTR_URL}/${IMAGE}
    deploy:
      placement:
        constraints:
          - node.role == worker
          - node.labels.environment == ${ENVIRONMENT}
      replicas: 4
      update_config:
        parallelism: 4
        delay: 10s
      restart_policy:
        condition: any
        max_attempts: 3
        delay: 10s
    logging:
      driver: fluentd
      options:
        tag: docker.{{.Name}}
        env: SERVICE_NAME,ENVIRONMENT
    environment:
      SERVICE_NAME: hello-world
      ENVIRONMENT: ${ENVIRONMENT}
    command: "java \
      -Dspring.profiles.active=${ENVIRONMENT} \
      -Djava.security.egd=file:/dev/./urandom \
      -jar hello-world.jar"
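Outside of the pipeline, the same stack can be deployed by exporting the variables first, for example when testing against the UCP client bundle. The DTR URL and image tag below are hypothetical placeholders; in the pipeline, they are supplied by the CI/CD server.

# hypothetical values; normally injected by the CI/CD server
export ENVIRONMENT=dev
export DTR_URL=dtr.example.com
export IMAGE=hello-world:59
docker stack deploy --compose-file=docker-compose.yml ${ENVIRONMENT}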

Deploying from CI/CD Server

The ENVIRONMENT value is set as an environment variable, which is then used by the CI/CD server, running a docker stack deploy or a docker service update command, within a deployment pipeline. Below is an example of how to use the environment variable as part of a Jenkins Pipeline-as-Code Jenkinsfile.


#!/usr/bin/env groovy

// Deploy Hello World Service Stack
node('java') {
  properties([parameters([
    choice(choices: ["ci", "dev", "test", "uat"].join("\n"),
           description: 'Environment', name: 'ENVIRONMENT')
  ])])

  stage('Git Checkout') {
    dir('service') {
      git branch: 'master',
          credentialsId: 'jenkins_github_credentials',
          url: 'ssh://git@garystafford/hello-world.git'
    }
    dir('credentials') {
      git branch: 'master',
          credentialsId: 'jenkins_github_credentials',
          url: 'ssh://git@garystafford/ucp-bundle-jenkins.git'
    }
  }

  dir('service') {
    stage('Build and Unit Test') {
      sh './gradlew clean cleanTest build'
    }

    withEnv(["IMAGE=hello-world:${BUILD_NUMBER}"]) {
      stage('Docker Build Image') {
        // assumes separate 'docker_password' and 'docker_username' string credentials
        withCredentials([[$class: 'StringBinding',
                          credentialsId: 'docker_password',
                          variable: 'DOCKER_PASSWORD'],
                         [$class: 'StringBinding',
                          credentialsId: 'docker_username',
                          variable: 'DOCKER_USERNAME']]) {
          sh "docker login -u ${DOCKER_USERNAME} -p ${DOCKER_PASSWORD} ${DTR_URL}"
        }
        sh "docker build --no-cache -t ${DTR_URL}/${IMAGE} ."
      }

      stage('Docker Push Image') {
        sh "docker push ${DTR_URL}/${IMAGE}"
      }

      withEnv(['DOCKER_TLS_VERIFY=1',
               "DOCKER_CERT_PATH=${WORKSPACE}/credentials/",
               "DOCKER_HOST=${DOCKER_HOST}"]) {
        stage('Docker Stack Deploy') {
          try {
            sh "docker service rm ${params.ENVIRONMENT}_hello-world"
            sh 'sleep 30s' // wait for the service to be completely removed, if it exists
          } catch (err) {
            echo "Error: ${err}" // catch and move on if the service does not already exist
          }
          sh "docker stack deploy \
              --compose-file=docker-compose.yml ${params.ENVIRONMENT}"
        }
      }
    }
  }
}


Centralized Logging and Metrics Collection

Centralized logging and metrics collection systems are used for application and infrastructure dashboards, monitoring, and alerting. In the shared non-production environment examples, the centralized logging and metrics collection systems are internal to each VPC, but reside on separate EC2 instances and are not registered with the Control Plane. In this way, the logging and metrics collection systems should not impact the reliability, performance, and security of the applications running within Docker EE. In the example, Worker nodes run a containerized copy of fluentd, which collects and pushes logs to ELK’s Elasticsearch.

Logging and metrics collection systems could also be supplied by external cloud-based SaaS providers, such as Loggly, Sysdig, and Datadog, or by the platform’s cloud provider, such as Amazon CloudWatch.

With four environments running multiple containerized copies of each service, figuring out which log entry came from which service instance, requires multiple data points. As shown in the example Kibana UI below, the environment value, along with the service name and container ID, as well as the git commit hash and branch, are added to each log entry for easier troubleshooting. To include the environment, the value of the ENVIRONMENT variable is passed to Docker’s fluentd log driver as an env option. This same labeling method is used to tag metrics.

ELK

Separate Docker Service Stacks

For further environment separation within the single Control Plane, each environment’s services are deployed as part of a distinct Docker service stack. Each service stack contains all services that comprise an application running within a single environment. Multiple stacks may be required to support multiple, distinct applications within the same environment.

For example, in the screenshot below, a hello-world service container, built with a Docker image, tagged with build 59 of the Jenkins continuous integration pipeline, is deployed as part of both the Development (dev) and Test service stacks. The CD and UAT service stacks each contain different versions of the hello-world service.

Hello-World-UCP

Separate Docker Overlay Networks

For additional environment separation within the single non-production UCP, all Docker service stacks associated with an environment, reside on the same Docker overlay network. Overlay networks manage communications among the Docker Worker nodes, enabling service-to-service communication for all services on the same overlay network while isolating services running on one network from services running on another network.

In the example screenshot below, the hello-world service, a member of the test service stack, is running on the test_default overlay network.

Network

Cleaning Up

Having distinct environment-centric Docker service stacks and overlay networks makes it easy to clean up an environment, without impacting adjacent environments. Both service stacks and overlay networks can be removed to clear an environment’s contents.
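For example, clearing out the test environment might look like the following; docker stack rm normally removes the stack’s default overlay network as well, so the second command is only needed if the network remains.

# remove the test environment's service stack, then its overlay network if it remains
docker stack rm test
docker network rm test_default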

Separate Performance Environment

In the alternative example below, a Performance environment has been added to the Non-Production VPC. To ensure a higher level of isolation, the Performance environment has its own UCP, RDS, and ELBs. The Performance environment shares the DTR, as well as the security, logging, and monitoring components, with the rest of the non-production environments.

Below, the Performance environment has half the number of Worker nodes as Production. Performance results can then be extrapolated to estimate expected Production performance on the larger node count. Alternatively, the number of nodes can be scaled up temporarily to match Production, then scaled back down to a minimum after testing is complete.

Docker_EE_AWS_Diagram_02

Shared DevOps Tooling

All environments leverage shared Development and DevOps resources, deployed to a separate VPC. Resources include Agile Application Lifecycle Management (ALM), such as JIRA or CA Agile Central, source control repository management (SCM), such as GitLab or Bitbucket, binary repository management, such as Artifactory or Nexus, and a CI/CD solution, such as Jenkins, TeamCity, or Bamboo.

From the DevOps VPC, Docker images are pushed to and pulled from the DTR in the Non-Production VPC. Deployments of container-based applications are executed from the DevOps VPC CI/CD server to the non-production, Performance, and Production UCPs. Separate DevOps CI/CD pipelines and access controls are essential in maintaining the separation of the non-production and Production environments.

Docker_EE_AWS_Diagram_03

Complete Platform

Several common components found in a Docker EE cloud-based AWS platform were discussed in this post. However, a complete AWS application platform has many more moving parts. Below is a comprehensive list of components, including DevOps tooling, organized into two categories: 1) common components that can potentially be shared across the non-production environments to save cost and complexity, and 2) components that should be replicated in each non-production environment for security and performance.

Shared Non-Production Components:

  • AWS
    • Virtual Private Cloud (VPC), Region, Availability Zones
    • Route Tables, Network ACLs, Internet Gateways
    • Subnets
    • Some Security Groups
    • IAM Groups, User, Roles, Policies (RBAC)
    • Relational Database Service‎ (RDS)
    • ElastiCache
    • API Gateway, Lambdas
    • S3 Buckets
    • Bastion Servers, NAT Gateways
    • Route 53 Hosted Zone (Registered Domain)
    • EC2 Key Pairs
    • Hardened Linux AMI
  • Docker EE
    • UCP and EC2 Manager Nodes
    • DTR and EC2 Worker Nodes
    • UCP and DTR Users, Teams, Organizations
    • DTR Image Repositories
    • Secret Management
  • Third-Party Components/Products
    • SSL Certificates
    • Security Components: Firewalls, Virus Scanning, VPN Servers
    • Container Security
    • End-User IAM
    • Directory Service
    • Log Aggregation
    • Metric Collection
    • Monitoring, Alerting
    • Configuration and Secret Management
  • DevOps
    • CI/CD Pipelines as Code
    • Infrastructure as Code
    • Source Code Repositories
    • Binary Artifact Repositories

Isolated Non-Production Components:

  • AWS
    • Route 53 Hosted Zones and Associated Records
    • Elastic Load Balancers (ELB)
    • Elastic Compute Cloud (EC2) Worker Nodes
    • Elastic IPs
    • ELB and EC2 Security Groups
    • RDS Databases (Single RDS Instance with Separate Databases)

All opinions in this post are my own and not necessarily the views of my current employer or their clients.


The Evolving Role of DevOps in Emerging Technologies

31970399_m

Growth of DevOps

The adoption of DevOps practices by global organizations has become mainstream, according to many recent industry studies. For instance, a late 2016 study of global enterprise organizations, conducted by IDG Research for Unisys Corporation, found 38 percent of respondents had already adopted DevOps, while another 29 percent were in the planning phase, and 17 percent were in the evaluation stage. Adoption rates were even higher, 49 percent versus 38 percent, for larger organizations with 500 or more developers.

Another recent 2017 study by Red Gate Software, The State of Database DevOps, based on 1,000 global organizations, found 47 percent of the respondents had already adopted DevOps practices, with another 33 percent planning on adopting DevOps practices within the next 24 months. Similar to the Unisys study, prior adoption rates were considerably higher, 59 percent versus 47 percent, for larger organizations with over 1,000 employees.

Emerging Technologies

Although DevOps originated to meet the needs of Agile software development to release more frequently, DevOps is no longer just continuous integration and continuous delivery. As more organizations undergo a digital transformation and adopt disruptive technologies to drive business success, the role of DevOps continues to evolve and expand.

Emerging technology trends, such as Machine Learning, Artificial Intelligence (AI), and Internet of Things (IoT/IIoT), serve to both influence DevOps practices, as well as create the need for the application of DevOps practices to these emerging technologies. Let’s examine the impact of some of these emerging technology trends on DevOps in this brief, two-part post.

Mobile

Although mobile application development is certainly not new, DevOps practices around mobile continue to evolve as mobile becomes the primary application platform for many organizations. Mobile applications have unique development and operational requirements. Take, for example, UI functional testing. Whereas web application developers often test against a relatively small matrix of popular web browsers and operating systems (Desktop Browser Market Share – Net Applications), mobile developers must test against a continuous outpouring of new mobile devices, both tablets and phones (Test on the right mobile devices – BrowserStack). The complexity of automating the testing of such a large number of mobile devices has resulted in the growth of specialized cloud-based testing platforms, such as BrowserStack and Sauce Labs.

Cloud

Similar to Mobile, the Cloud is certainly not new. However, as more firms move their IT operations to the Cloud, DevOps practices have had to adapt rapidly. The need to adjust is no more apparent than with Amazon Web Services. Currently, AWS lists no less than 18 categories of cloud offerings on their website, with each category containing several products and services. Categories include compute, storage, databases, networking, security, messaging, mobile, AI, IoT, and analytics.

In addition to products like compute, storage, and database, AWS now offers development, DevOps, and management tools, such as AWS OpsWorks and AWS CloudFormation. These products offer alternatives to traditional non-cloud CI/CD/RM workflows for deploying and managing complex application platforms on AWS. Learning the nuances of a growing list of AWS specific products and workflows, while simultaneously adapting your organization’s DevOps practices to them, has resulted in a whole new category of DevOps engineering specialization centered around AWS. Cloud-centric DevOps engineering specialization is also seen with other large cloud providers, such as Microsoft Azure and Google Cloud Platform.

Security

Call it DevSecOps, SecDevOps, SecOps, or Rugged DevOps, the intersection of DevOps and Security is bustling these days. As the complexity of modern application platforms grows, as well as the sophistication of threats from hackers and the requirements of government and industry compliance, security is no longer an afterthought or a process run in seeming isolation from software development and DevOps. In my recent experience, it is not uncommon to see IT security specialists actively participating in Agile development teams and embedded in DevOps and Platform teams.

Modern application platforms must be designed from day one to be bug-free, performant, compliant, and secure.

Security practices are now commonly part of the entire software development lifecycle, including enterprise architecture, software development, data governance, continuous testing, and infrastructure as code. Modern application platforms must be designed from day one to be bug-free, performant, compliant, and secure.

Take, for example, penetration (PEN) testing. Once a mostly manual process, done close to release time, evolving DevOps practices now allow testing for security vulnerabilities in applications and software-defined infrastructure to be done early and often in the software development lifecycle. Easily automatable and configurable cloud- and non-cloud-based tools like SonarQube, Veracode, Qualys, OWASP ZAP, and Chef Compliance, amongst others, are frequently incorporated into continuous integration workflows by development and DevOps teams. There is no longer an excuse for security vulnerabilities to be discovered just before release, or worse, in Production.

Modern Platforms

Along with the Cloud, modern application development trends, like the rise of the platform, microservices (or service-based architectures), containerization, NoSQL databases, and container orchestration, have likely provided the majority of fuel for the recent explosive growth of DevOps. Although innovative IT organizations have fostered these technologies for the past few years, their growth and relative maturity have risen sharply in the last 12 to 18 months.

No longer the stuff of Unicorns, platforms based on Evolutionary Architectures are being built and deployed by an increasing number of everyday organizations.

No longer the stuff of Unicorns, such as Amazon, Etsy, and Netflix, platforms based on Evolutionary Architectures are being built and deployed by an increasing number of everyday organizations. Although complexity continues to rise, the barrier to entry has been greatly reduced with technologies found across the SDLC, including  Node, Spring Boot, Docker, Consul, Terraform, and Kubernetes, amongst others.

As modern platforms become more commonplace, the DevOps practices around them continue to mature and become specialized. Imagine, with potentially hundreds of moving parts, building, testing, deploying, and actively managing a large-scale microservice-based application on a container orchestration platform requires highly-specialized knowledge. The ability to ‘do DevOps at scale’ is critical.

Legacy Systems

Legacy systems as an emerging technology trend in DevOps? As the race to build the ‘next generation’ of application platforms accelerates to meet the demands of the business and their customers, there is a growing need to support ‘last generation’ systems. Many IT organizations support multiple legacy systems, ranging in age from as short as five years old to more than 25 years old. These monolithic legacy systems, which often contain a company’s secret sauce, such as complex business algorithms and decision engines, are built on out-moded technology stacks, often lack vendor support, and require separate processes to build, test, deploy, and manage. Worse, the knowledge to maintain these systems is frequently only known to a shrinking group of IT resources. Who wants to work on the old system with so many bright and shiny toys being built?

As a cost-effective means to maintain these legacy systems, organizations are turning to modern DevOps practices. Although not possible to the same degree, depending on the legacy technology, practices include the use of source control, various types of automated testing, automated provisioning, deployment and configuration of system components, and infrastructure automation (DevOps for legacy systems – Infosys white paper).

Although not specifically a DevOps practice, organizations are also implementing content collaboration systems, like Atlassian Confluence and Microsoft SharePoint, to document legacy system architectures and manual processes, before the resources and their knowledge are lost.

To be Continued

In a future post, we will look at additional emerging technologies and their impact on DevOps, including:

  • Big Data
  • Internet of Things (IoT/IIoT)
  • Artificial Intelligence (AI)
  • Machine Learning
  • COTS/SaaS

All opinions in this post are my own and not necessarily the views of my current employer or their clients.

 

Illustration Copyright: Andreus / 123RF Stock Photo


Eventual Consistency: Decoupling Microservices with Spring AMQP and RabbitMQ

RabbitMQEnventCons.png

Introduction

In a recent post, Decoupling Microservices using Message-based RPC IPC, with Spring, RabbitMQ, and AMQP, we moved away from synchronous REST HTTP for inter-process communications (IPC) toward message-based IPC. Moving to asynchronous message-based communications allows us to decouple services from one another. It makes it easier to build, test, and release our individual services. In that post, we did not achieve fully asynchronous communications. We did, however, achieve a higher level of service decoupling using message-based Remote Procedure Call (RPC) IPC.

In this post, we will fully decouple our services using the distributed computing model of eventual consistency. More specifically, we will use a message-based, event-driven, loosely-coupled, eventually consistent architectural approach for communications between services.

What is eventual consistency? One of the best definitions of eventual consistency I have read was posted on microservices.io. To paraphrase: using an event-driven, eventually consistent approach, each service publishes an event whenever it updates its data; other services subscribe to events, and when an event is received, a service updates its own data.

Example of Eventual Consistency

Imagine, Service A, the Customer service, inserts a new customer record into its database. Based on that ‘customer created’ event, Service A publishes a message containing the new customer object, serialized to JSON, to the lightweight, persistent, New Customer message queue.

Service B, the Customer Onboarding service, a subscriber to the New Customer queue, consumes and deserializes Service A’s message. Service B may or may not perform a data transformation of the Customer object to its own Customer data model. Service B then inserts the new customer record into its own database.

In the above example, it can be said that the customer records in Service B’s database are eventually consistent with the customer records in Service A’s database. Service A makes a change and publishes a message in response to the event. Service B consumes the message and makes the same change. Eventually (likely within milliseconds), Service B’s customer records are consistent with Service A’s customer records.

Why Eventual Consistency?

So what does this apparent added complexity and duplication of data buy us? Consider the advantages. Service B, the Onboarding service, requires no knowledge of, or a dependency on, Service A, the Customer service. Still, Service B has a current record of all the customers that Service A maintains. Instead of making repeated and potentially costly RESTful HTTP calls or RPC message-based calls from Service B to Service A for new customers, Service B simply queries its own database for a list of customers.

The value of eventual consistency increases factorially as you scale a distributed system. Imagine dozens of distinct microservices, many requiring data from other microservices. Further, imagine multiple instances of each of those services all running in parallel. Decoupling services from one another, through asynchronous forms of IPC, messaging, and event-driven eventual consistency greatly simplifies the software development lifecycle and operations.

Demonstration

In this post, we could use a few different architectural patterns to demonstrate message passing with RabbitMQ and Spring AMQP, including Work Queues, Publish/Subscribe, Routing, or Topics. To keep things as simple as possible, we will have a single Producer publish messages to a single durable and persistent message queue, and a single Subscriber, a Consumer, consume the messages from that queue. We will focus on a single type of event message.

Sample Code

To demonstrate Spring AMQP-based messaging with RabbitMQ, we will use a reference set of three Spring Boot microservices. The Election Service, Candidate Service, and Voter Service are all backed by MongoDB. The services and MongoDB, along with RabbitMQ and the Voter API Gateway, are all part of the Voter API.

The Voter API Gateway, based on HAProxy, serves as a common entry point to all three services, as well as serving as a reverse proxy and load balancer. The API Gateway provides round-robin load-balanced access to multiple instances of each service.

Voter_API_Architecture

All the source code for this post’s example is available on GitHub, within a few different project repositories. The Voter Service repository contains the Voter service source code, along with the scripts and Docker Compose files required to deploy the project. The Election Service repository, Candidate Service repository, and Voter API Gateway repository are also available on GitHub. There is also a new AngularJS/Node.js Web Client, to demonstrate how to use the Voter API.

For this post, you only need to clone the Voter Service repository.

Deploying Voter API

All components, including the Spring Boot services, MongoDB, RabbitMQ, API Gateway, and the Web Client, are individually deployed using Docker. Each component is publicly available as a Docker Image, on Docker Hub. The Voter Service repository contains scripts to deploy the entire set of Dockerized components, locally. The repository also contains optional scripts to provision a Docker Swarm, using Docker’s newer swarm mode, and deploy the components. We will only deploy the services locally for this post.

To clone and deploy the components locally, including the Spring Boot services, MongoDB, RabbitMQ, and the API Gateway, execute the following commands. If this is your first time running the commands, it may take a few minutes for your system to download all the required Docker Images from Docker Hub.


git clone --depth 1 --branch rabbitmq \
  https://github.com/garystafford/voter-service.git
cd voter-service/scripts-services
sh ./stack_deploy_local.sh

If everything was deployed successfully, you should observe six running Docker containers, similar to the output, below.


CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
32d73282ff3d garystafford/voter-api-gateway:rabbitmq "/docker-entrypoin…" 8 seconds ago Up 5 seconds 0.0.0.0:8080->8080/tcp voterstack_voter-api-gateway_1
1ece438c5da4 garystafford/candidate-service:rabbitmq "java -Dspring.pro…" 10 seconds ago Up 7 seconds 0.0.0.0:8097->8080/tcp voterstack_candidate_1
30391faa3422 garystafford/voter-service:rabbitmq "java -Dspring.pro…" 10 seconds ago Up 7 seconds 0.0.0.0:8099->8080/tcp voterstack_voter_1
35063ccfe706 garystafford/election-service:rabbitmq "java -Dspring.pro…" 12 seconds ago Up 10 seconds 0.0.0.0:8095->8080/tcp voterstack_election_1
23eae86967a2 rabbitmq:management-alpine "docker-entrypoint…" 14 seconds ago Up 11 seconds 4369/tcp, 5671/tcp, 0.0.0.0:5672->5672/tcp, 15671/tcp, 25672/tcp, 0.0.0.0:15672->15672/tcp voterstack_rabbitmq_1
7e77ddecddbb mongo:latest "docker-entrypoint…" 24 seconds ago Up 21 seconds 0.0.0.0:27017->27017/tcp voterstack_mongodb_1

Using Voter API

The Voter Service, Election Service, and Candidate Service GitHub repositories each contain README files, which detail all the API endpoints each service exposes, and how to call them.

In addition to casting votes for candidates, the Voter service can simulate election results. Calling the /simulation endpoint, and indicating the desired election, prompts the Voter service to randomly generate a number of votes for each candidate in that election. This saves us the burden of casting votes for this demonstration. However, the Voter service has no knowledge of elections or candidates. The Voter service depends on the Candidate service to obtain a list of candidates.

The Candidate service manages electoral candidates, their political affiliation, and the election in which they are running. Like the Voter service, the Candidate service also has a /simulation endpoint. The service will create a list of candidates based on the 2012 and 2016 US Presidential Elections. The simulation capability of the service saves us the burden of inputting candidates for this demonstration.

The Election service manages elections, their polling dates, and the type of election (federal, state, or local). Like the other services, the Election service also has a /simulation endpoint, which will create a list of sample elections. The Election service will not be discussed in this post’s demonstration. We will examine communications between the Candidate and Voter services, only.

REST HTTP Endpoint

As you recall from our previous post, Decoupling Microservices using Message-based RPC IPC, with Spring, RabbitMQ, and AMQP, the Voter service exposes multiple, almost identical endpoints. Each endpoint uses a different means of IPC to retrieve candidates and generate random votes.

Calling the /voter/simulation/http/{election} endpoint and providing a specific election, prompts the Voter service to request a list of candidates from the Candidate service, based on the election parameter you input. This request is done using synchronous REST HTTP. The Voter service uses the HTTP GET method to request the data from the Candidate service. The Voter service then waits for a response.

The Candidate service receives the HTTP request. The Candidate service responds to the Voter service with a list of candidates in JSON format. The Voter service receives the response payload containing the list of candidates. The Voter service then proceeds to generate a random number of votes for each candidate in the list. Finally, each new vote object (MongoDB document) is written back to the vote collection in the Voter service’s voters  database.

Message-based RPC Endpoint

Similarly, calling the /voter/simulation/rpc/{election} endpoint and providing a specific election prompts the Voter service to request the same list of candidates. However, this time, the Voter service (the client) produces a request message and places it in RabbitMQ’s voter.rpc.requests queue. The Voter service then waits for a response. The Voter service has no direct dependency on the Candidate service; it only depends on a response to its request message. In this way, it is still a form of synchronous IPC, but the Voter service is now decoupled from the Candidate service.

The request message is consumed by the Candidate service (the server), who is listening to that queue. In response, the Candidate service produces a message containing the list of candidates serialized to JSON. The Candidate service (the server) sends a response back to the Voter service (the client) through RabbitMQ. This is done using the Direct reply-to feature of RabbitMQ or using a unique response queue, specified in the reply-to header of the request message, sent by the Voter Service.

The Voter service receives the message containing the list of candidates. The Voter service deserializes the JSON payload to candidate objects. The Voter service then proceeds to generate a random number of votes for each candidate in the list. Finally, identical to the previous example, each new vote object (MongoDB document) is written back to the vote collection in the Voter service’s voters database.

New Endpoint

Calling the new /voter/simulation/db/{election} endpoint and providing a specific election, prompts the Voter service to query its own MongoDB database for a list of candidates.
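For example, an invocation might look like the sketch below; this assumes the endpoint accepts a GET request and that the Voter service is reachable on port 8099, as shown in the earlier container listing.

# hypothetical call; the election name is URL-encoded
curl "http://localhost:8099/voter/simulation/db/2016%20Presidential%20Election"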

But wait, where did the candidates come from? The Voter service didn’t call the Candidate service? The answer is message-based eventual consistency. Whenever a new candidate is created, using a REST HTTP POST request to the Candidate service’s /candidate/candidates endpoint, a Spring Data Rest Repository Event Handler responds. Responding to the candidate created event, the event handler publishes a message, containing a serialized JSON representation of the new candidate object, to a durable and persistent RabbitMQ queue.
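For example, creating a candidate with HTTPie, matching the request seen in the service logs later in this post, is enough to trigger the event handler:

# POST a new candidate to the Candidate service (port 8097), firing the AfterCreate event
http POST localhost:8097/candidate/candidates \
  firstName="Hillary" lastName="Clinton" \
  politicalParty="Democratic Party" \
  election="2016 Presidential Election"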

The Voter service is listening to that queue. The Voter service consumes messages off the queue, deserializes the candidate object, and saves it to its own voters database, to the candidate collection. For this example, we are saving the incoming candidate object as is, with no transformations. The candidate object model for both services is identical.

When the /voter/simulation/db/{election} endpoint is called, the Voter service queries its voters database for a list of candidates. The Voter service then proceeds to generate a random number of votes for each candidate in the list. Finally, identical to the previous two examples, each new vote object (MongoDB document) is written back to the vote collection in the Voter service’s voters database.

Message_Queue_Diagram_Final3B

Exploring the Code

We will not review the REST HTTP or RPC IPC code in this post. It was covered in detail, in the previous post. Instead, we will explore the new code required for eventual consistency.

Spring Dependencies

To use AMQP with RabbitMQ, we need to add a project dependency on org.springframework.boot:spring-boot-starter-amqp. Below is a snippet from the Candidate service’s build.gradle file, showing project dependencies. The Voter service’s dependencies are identical.


dependencies {
    compile group: 'org.springframework.boot', name: 'spring-boot-actuator-docs'
    compile group: 'org.springframework.boot', name: 'spring-boot-starter-actuator'
    compile group: 'org.springframework.boot', name: 'spring-boot-starter-amqp'
    compile group: 'org.springframework.boot', name: 'spring-boot-starter-data-mongodb'
    compile group: 'org.springframework.boot', name: 'spring-boot-starter-data-rest'
    compile group: 'org.springframework.boot', name: 'spring-boot-starter-hateoas'
    compile group: 'org.springframework.boot', name: 'spring-boot-starter-logging'
    compile group: 'org.springframework.boot', name: 'spring-boot-starter-web'
    compile group: 'org.webjars', name: 'hal-browser'
    testCompile group: 'org.springframework.boot', name: 'spring-boot-starter-test'
}


AMQP Configuration

Next, we need to add a small amount of RabbitMQ AMQP configuration to both services. We accomplish this by using Spring’s @Configuration annotation on our configuration classes. Below is the abridged configuration class for the Voter service.


package com.voterapi.voter.configuration;

import org.springframework.amqp.core.DirectExchange;
import org.springframework.amqp.core.Queue;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class VoterConfig {
    @Bean
    public Queue candidateQueue() {
        return new Queue("candidates.queue");
    }
}
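Note that Spring AMQP’s single-argument Queue constructor declares a durable, non-exclusive, non-auto-delete queue by default, which is the behavior we rely on later in this post. The equivalent, explicit bean declaration would look like the following sketch:

@Bean
public Queue candidateQueue() {
    // durable = true, exclusive = false, autoDelete = false
    return new Queue("candidates.queue", true, false, false);
}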

And here, the abridged configuration class for the Candidate service.


package com.voterapi.candidate.configuration;

import org.springframework.amqp.core.Binding;
import org.springframework.amqp.core.BindingBuilder;
import org.springframework.amqp.core.DirectExchange;
import org.springframework.amqp.core.Queue;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class CandidateConfig {
    @Bean
    public Queue candidateQueue() {
        return new Queue("candidates.queue");
    }
}

Event Handler

With our dependencies and configuration in place, we will define the CandidateEventHandler class. This class is annotated with the Spring Data Rest @RepositoryEventHandler and Spring’s @Component. The @Component annotation ensures the event handler is registered.

The class contains the handleCandidateSave method, which is annotated with the Spring Data Rest @HandleAfterCreate. The event handler acts on the Candidate object, which is the first parameter in the method signature.

Responding to the candidate created event, the event handler publishes a message, containing a serialized JSON representation of the new candidate object, to the candidates.queue queue. This was the queue we configured earlier.


package com.voterapi.candidate.service;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.voterapi.candidate.domain.Candidate;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.amqp.core.Queue;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.rest.core.annotation.HandleAfterCreate;
import org.springframework.data.rest.core.annotation.RepositoryEventHandler;
import org.springframework.stereotype.Component;

@Component
@RepositoryEventHandler
public class CandidateEventHandler {
    private final Logger logger = LoggerFactory.getLogger(this.getClass());
    private RabbitTemplate rabbitTemplate;
    private Queue candidateQueue;

    @Autowired
    public CandidateEventHandler(RabbitTemplate rabbitTemplate, Queue candidateQueue) {
        this.rabbitTemplate = rabbitTemplate;
        this.candidateQueue = candidateQueue;
    }

    @HandleAfterCreate
    public void handleCandidateSave(Candidate candidate) {
        sendMessage(candidate);
    }

    private void sendMessage(Candidate candidate) {
        rabbitTemplate.convertAndSend(
                candidateQueue.getName(), serializeToJson(candidate));
    }

    private String serializeToJson(Candidate candidate) {
        ObjectMapper mapper = new ObjectMapper();
        String jsonInString = "";
        try {
            jsonInString = mapper.writeValueAsString(candidate);
        } catch (JsonProcessingException e) {
            logger.info(String.valueOf(e));
        }
        logger.debug("Serialized message payload: {}", jsonInString);
        return jsonInString;
    }
}

Consuming Messages

Next, let’s switch to the Voter service’s CandidateListService class. Below is an abridged version of the class with two new methods. First, the getCandidateMessage method listens to the candidates.queue queue. This was the queue we configured earlier. The method is annotated with the Spring AMQP Rabbit @RabbitListener annotation.

The getCandidateMessage method retrieves the new candidate object from the message, deserializes the message’s JSON payload, maps it to the candidate object model, and saves it to the Voter service’s database.

The second method, getCandidatesQueueDb, retrieves the candidates from the Voter service’s database. The method makes use of the Spring Data MongoDB Aggregation package to return a list of candidates from MongoDB.


/**
 * Consumes a new candidate message, deserializes, and save to MongoDB
 * @param candidateMessage
 */
@RabbitListener(queues = "#{candidateQueue.name}")
public void getCandidateMessage(String candidateMessage) {
    ObjectMapper objectMapper = new ObjectMapper();
    objectMapper.disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES);
    TypeReference<Candidate> mapType = new TypeReference<Candidate>() {};
    Candidate candidate = null;
    try {
        candidate = objectMapper.readValue(candidateMessage, mapType);
    } catch (IOException e) {
        logger.info(String.valueOf(e));
    }
    candidateRepository.save(candidate);
    logger.debug("Candidate {} saved to MongoDB", candidate.toString());
}

/**
 * Retrieves candidates from MongoDB and transforms to voter view
 * @param election
 * @return List of candidates
 */
public List<CandidateVoterView> getCandidatesQueueDb(String election) {
    Aggregation aggregation = Aggregation.newAggregation(
            Aggregation.match(Criteria.where("election").is(election)),
            project("firstName", "lastName", "politicalParty", "election")
                    .andExpression("concat(firstName,' ', lastName)")
                    .as("fullName"),
            sort(Sort.Direction.ASC, "lastName")
    );
    AggregationResults<CandidateVoterView> groupResults
            = mongoTemplate.aggregate(aggregation, Candidate.class, CandidateVoterView.class);
    return groupResults.getMappedResults();
}

RabbitMQ Management Console

The easiest way to observe what is happening with the messages is using the RabbitMQ Management Console. To access the console, point your web browser to localhost, on port 15672. The default login credentials for the console are guest/guest. As you successfully produce and consume messages with RabbitMQ, you should see activity on the Overview tab.
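If you prefer the command line, the same information is exposed by the management plugin’s HTTP API. A minimal sketch, using the default guest credentials:

# list queues and their message counts via the management HTTP API
curl -s -u guest:guest http://localhost:15672/api/queues | python -m json.tool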

RabbitMQ_EC_Durable3.png

Recall we said the queue, in this example, was durable. That means messages will survive the RabbitMQ broker stopping and starting. In the below view of the RabbitMQ Management Console, note the six messages persisted in memory. The Candidate service produced the messages in response to six new candidates being created. However, the Voter service was not running, and therefore, could not consume the messages. In addition, the RabbitMQ server was restarted, after receiving the candidate messages. The messages were persisted and still present in the queue after the successful reboot of RabbitMQ.

RabbitMQ_EC_Durable

Once RabbitMQ and the Voter service instance were back online, the Voter service successfully consumed the six waiting messages from the queue.

RabbitMQ_EC_Durable2.png

Service Logs

In addition to using the RabbitMQ Management Console, we may observe communications between the two services by looking at the Voter and Candidate services’ logs. I have grabbed a snippet of both services’ logs and added a few comments to show where different processes are being executed.

First the Candidate service logs. We observe a REST HTTP POST request containing a new candidate. We then observe the creation of the new candidate object in the Candidate service’s database, followed by the event handler publishing a message on the queue. Finally, we observe the response is returned in reply to the initial REST HTTP POST request.


# REST HTTP POST Request received
2017-05-11 22:43:46.667 DEBUG 18702 — [nio-8097-exec-5] o.a.coyote.http11.Http11InputBuffer : Received [POST /candidate/candidates HTTP/1.1
Host: localhost:8097
User-Agent: HTTPie/0.9.8
Accept-Encoding: gzip, deflate
Accept: application/json, */*
Connection: keep-alive
Content-Type: application/json
Content-Length: 127
{"firstName": "Hillary", "lastName": "Clinton", "politicalParty": "Democratic Party", "election": "2016 Presidential Election"}]
2017-05-11 22:43:46.667 DEBUG 18702 — [nio-8097-exec-5] o.a.c.authenticator.AuthenticatorBase : Security checking request POST /candidate/candidates
# Inserting new Candidate into database
2017-05-11 22:43:46.674 DEBUG 18702 — [nio-8097-exec-5] o.s.data.mongodb.core.MongoTemplate : Inserting DBObject containing fields: [_class, _id, firstName, lastName, politicalParty, election] in collection: candidate
2017-05-11 22:43:46.674 DEBUG 18702 — [nio-8097-exec-5] o.s.data.mongodb.core.MongoDbUtils : Getting Mongo Database name=[candidates]
2017-05-11 22:43:46.674 DEBUG 18702 — [nio-8097-exec-5] org.mongodb.driver.protocol.insert : Inserting 1 documents into namespace candidates.candidate on connection [connectionId{localValue:2, serverValue:147}] to server localhost:27017
2017-05-11 22:43:46.677 DEBUG 18702 — [nio-8097-exec-5] org.mongodb.driver.protocol.insert : Insert completed
# Publishing message on queue
2017-05-11 22:43:46.678 DEBUG 18702 — [nio-8097-exec-5] o.s.d.r.c.e.AnnotatedEventHandlerInvoker : Invoking AfterCreateEvent handler for Hillary Clinton (Democratic Party).
2017-05-11 22:43:46.679 DEBUG 18702 — [nio-8097-exec-5] c.v.c.service.CandidateEventHandler : Serialized message payload: {"id":"591521621162e1490eb0d537","firstName":"Hillary","lastName":"Clinton","politicalParty":"Democratic Party","election":"2016 Presidential Election","fullName":"Hillary Clinton"}
2017-05-11 22:43:46.679 DEBUG 18702 — [nio-8097-exec-5] o.s.amqp.rabbit.core.RabbitTemplate : Executing callback on RabbitMQ Channel: Cached Rabbit Channel: AMQChannel(amqp://guest@127.0.0.1:5672/,2), conn: Proxy@1bfb15f5 Shared Rabbit Connection: SimpleConnection@37d1ba14 [delegate=amqp://guest@127.0.0.1:5672/, localPort= 59422]
2017-05-11 22:43:46.679 DEBUG 18702 — [nio-8097-exec-5] o.s.amqp.rabbit.core.RabbitTemplate : Publishing message on exchange [], routingKey = [candidates.queue]
# Response to HTTP POST
2017-05-11 22:43:46.679 DEBUG 18702 — [nio-8097-exec-5] o.s.b.f.s.DefaultListableBeanFactory : Returning cached instance of singleton bean 'persistentEntities'
2017-05-11 22:43:46.681 DEBUG 18702 — [nio-8097-exec-5] o.s.b.f.s.DefaultListableBeanFactory : Returning cached instance of singleton bean 'org.springframework.boot.actuate.autoconfigure.EndpointWebMvcHypermediaManagementContextConfiguration$ActuatorEndpointLinksAdvice'
2017-05-11 22:43:46.682 DEBUG 18702 — [nio-8097-exec-5] s.d.r.w.j.PersistentEntityJackson2Module : Serializing PersistentEntity org.springframework.data.mongodb.core.mapping.BasicMongoPersistentEntity@1a4d1ab7.
2017-05-11 22:43:46.683 DEBUG 18702 — [nio-8097-exec-5] o.s.w.s.m.m.a.HttpEntityMethodProcessor : Written [Resource { content: Hillary Clinton (Democratic Party), links: [<http://localhost:8097/candidate/candidates/591521621162e1490eb0d537&gt;;rel="self", <http://localhost:8097/candidate/candidates/591521621162e1490eb0d537{?projection}>;rel="candidate"] }] as "application/json" using [org.springframework.data.rest.webmvc.config.RepositoryRestMvcConfiguration$ResourceSupportHttpMessageConverter@27329d2a]
2017-05-11 22:43:46.683 DEBUG 18702 — [nio-8097-exec-5] o.s.web.servlet.DispatcherServlet : Null ModelAndView returned to DispatcherServlet with name 'dispatcherServlet': assuming HandlerAdapter completed request handling
2017-05-11 22:43:46.683 DEBUG 18702 — [nio-8097-exec-5] o.s.web.servlet.DispatcherServlet : Successfully completed request

Now, the Voter service logs. Within the same second that the Candidate service published the message and returned its response, the Voter service consumes the message off the queue. The Voter service then deserializes the new candidate object and inserts it into its own database.


# Retrieving message from queue
2017-05-11 22:43:46.242 DEBUG 19001 — [cTaskExecutor-1] o.s.a.r.listener.BlockingQueueConsumer : Retrieving delivery for Consumer@78910096: tags=[{amq.ctag-WCLRWmQ6WRkGgxg-enVslA=candidates.queue}], channel=Cached Rabbit Channel: AMQChannel(amqp://guest@127.0.0.1:5672/,1), conn: Proxy@386143c0 Shared Rabbit Connection: SimpleConnection@2d187d86 [delegate=amqp://guest@127.0.0.1:5672/, localPort= 59586], acknowledgeMode=AUTO local queue size=0
2017-05-11 22:43:46.684 DEBUG 19001 — [pool-1-thread-9] o.s.a.r.listener.BlockingQueueConsumer : Storing delivery for Consumer@78910096: tags=[{amq.ctag-WCLRWmQ6WRkGgxg-enVslA=candidates.queue}], channel=Cached Rabbit Channel: AMQChannel(amqp://guest@127.0.0.1:5672/,1), conn: Proxy@386143c0 Shared Rabbit Connection: SimpleConnection@2d187d86 [delegate=amqp://guest@127.0.0.1:5672/, localPort= 59586], acknowledgeMode=AUTO local queue size=0
2017-05-11 22:43:46.685 DEBUG 19001 — [cTaskExecutor-1] o.s.a.r.listener.BlockingQueueConsumer : Received message: (Body:'{"id":"591521621162e1490eb0d537","firstName":"Hillary","lastName":"Clinton","politicalParty":"Democratic Party","election":"2016 Presidential Election","fullName":"Hillary Clinton"}' MessageProperties [headers={}, timestamp=null, messageId=null, userId=null, receivedUserId=null, appId=null, clusterId=null, type=null, correlationId=null, correlationIdString=null, replyTo=null, contentType=text/plain, contentEncoding=UTF-8, contentLength=0, deliveryMode=null, receivedDeliveryMode=PERSISTENT, expiration=null, priority=0, redelivered=false, receivedExchange=, receivedRoutingKey=candidates.queue, receivedDelay=null, deliveryTag=6, messageCount=0, consumerTag=amq.ctag-WCLRWmQ6WRkGgxg-enVslA, consumerQueue=candidates.queue])
2017-05-11 22:43:46.686 DEBUG 19001 — [cTaskExecutor-1] .a.r.l.a.MessagingMessageListenerAdapter : Processing [GenericMessage [payload={"id":"591521621162e1490eb0d537","firstName":"Hillary","lastName":"Clinton","politicalParty":"Democratic Party","election":"2016 Presidential Election","fullName":"Hillary Clinton"}, headers={amqp_receivedDeliveryMode=PERSISTENT, amqp_receivedRoutingKey=candidates.queue, amqp_contentEncoding=UTF-8, amqp_deliveryTag=6, amqp_consumerQueue=candidates.queue, amqp_redelivered=false, id=608b990a-919b-52c1-fb64-4af4be03b306, amqp_consumerTag=amq.ctag-WCLRWmQ6WRkGgxg-enVslA, contentType=text/plain, timestamp=1494557026686}]]
# Inserting new Candidate into database
2017-05-11 22:43:46.687 DEBUG 19001 — [cTaskExecutor-1] o.s.data.mongodb.core.MongoTemplate : Saving DBObject containing fields: [_class, _id, firstName, lastName, politicalParty, election]
2017-05-11 22:43:46.687 DEBUG 19001 — [cTaskExecutor-1] o.s.data.mongodb.core.MongoDbUtils : Getting Mongo Database name=[voters]
2017-05-11 22:43:46.688 DEBUG 19001 — [cTaskExecutor-1] org.mongodb.driver.protocol.update : Updating documents in namespace voters.candidate on connection [connectionId{localValue:2, serverValue:151}] to server localhost:27017
2017-05-11 22:43:46.703 DEBUG 19001 — [cTaskExecutor-1] org.mongodb.driver.protocol.update : Update completed
2017-05-11 22:43:46.703 DEBUG 19001 — [cTaskExecutor-1] c.v.voter.service.CandidateListService : Candidate Hillary Clinton (Democratic Party) saved to MongoDB

MongoDB

Using the mongo shell, we can observe six new 2016 Presidential Election candidates in the Candidate service’s database.


> show dbs
candidates 0.000GB
voters 0.000GB
> use candidates
switched to db candidates
> show collections
candidate
> db.candidate.find({})
{ "_id" : ObjectId("5915220e1162e14b2a42e65e"), "_class" : "com.voterapi.candidate.domain.Candidate", "firstName" : "Donald", "lastName" : "Trump", "politicalParty" : "Republican Party", "election" : "2016 Presidential Election" }
{ "_id" : ObjectId("5915220f1162e14b2a42e65f"), "_class" : "com.voterapi.candidate.domain.Candidate", "firstName" : "Chris", "lastName" : "Keniston", "politicalParty" : "Veterans Party", "election" : "2016 Presidential Election" }
{ "_id" : ObjectId("591522101162e14b2a42e660"), "_class" : "com.voterapi.candidate.domain.Candidate", "firstName" : "Jill", "lastName" : "Stein", "politicalParty" : "Green Party", "election" : "2016 Presidential Election" }
{ "_id" : ObjectId("591522101162e14b2a42e661"), "_class" : "com.voterapi.candidate.domain.Candidate", "firstName" : "Gary", "lastName" : "Johnson", "politicalParty" : "Libertarian Party", "election" : "2016 Presidential Election" }
{ "_id" : ObjectId("591522111162e14b2a42e662"), "_class" : "com.voterapi.candidate.domain.Candidate", "firstName" : "Darrell", "lastName" : "Castle", "politicalParty" : "Constitution Party", "election" : "2016 Presidential Election" }
{ "_id" : ObjectId("591522111162e14b2a42e663"), "_class" : "com.voterapi.candidate.domain.Candidate", "firstName" : "Hillary", "lastName" : "Clinton", "politicalParty" : "Democratic Party", "election" : "2016 Presidential Election" }

Now, looking at the Voter service’s database, we should find the same six 2016 Presidential Election candidates. Note the Object IDs are the same between the two services’ document sets, as are the rest of the fields (first name, last name, political party, and election). However, the class field is different between the two services’ records.


> show dbs
candidates 0.000GB
voters 0.000GB
> use voters
> db.candidate.find({})
{ "_id" : ObjectId("5915220e1162e14b2a42e65e"), "_class" : "com.voterapi.voter.domain.Candidate", "firstName" : "Donald", "lastName" : "Trump", "politicalParty" : "Republican Party", "election" : "2016 Presidential Election" }
{ "_id" : ObjectId("5915220f1162e14b2a42e65f"), "_class" : "com.voterapi.voter.domain.Candidate", "firstName" : "Chris", "lastName" : "Keniston", "politicalParty" : "Veterans Party", "election" : "2016 Presidential Election" }
{ "_id" : ObjectId("591522101162e14b2a42e660"), "_class" : "com.voterapi.voter.domain.Candidate", "firstName" : "Jill", "lastName" : "Stein", "politicalParty" : "Green Party", "election" : "2016 Presidential Election" }
{ "_id" : ObjectId("591522101162e14b2a42e661"), "_class" : "com.voterapi.voter.domain.Candidate", "firstName" : "Gary", "lastName" : "Johnson", "politicalParty" : "Libertarian Party", "election" : "2016 Presidential Election" }
{ "_id" : ObjectId("591522111162e14b2a42e662"), "_class" : "com.voterapi.voter.domain.Candidate", "firstName" : "Darrell", "lastName" : "Castle", "politicalParty" : "Constitution Party", "election" : "2016 Presidential Election" }
{ "_id" : ObjectId("591522111162e14b2a42e663"), "_class" : "com.voterapi.voter.domain.Candidate", "firstName" : "Hillary", "lastName" : "Clinton", "politicalParty" : "Democratic Party", "election" : "2016 Presidential Election" }

Production Considerations

The post demonstrated a simple example of message-based, event-driven eventual consistency. In an actual Production environment, there are a few things that must be considered.

  • We only addressed a ‘candidate created’ event. We would also have to code for other types of events, such as ‘candidate deleted’ and ‘candidate updated’ events (a minimal sketch of a delete-event handler follows this list).
  • If a candidate is added, deleted, then re-added, are the events published and consumed in the right order? What about with multiple instances of the Voter service running? Does this pattern guarantee event ordering?
  • How should the Candidate service react on startup if RabbitMQ is not available?
  • What if RabbitMQ fails after the Candidate services have started?
  • How should the Candidate service react if a new candidate record is added to the database, but a ‘candidate created’ event message cannot be published to RabbitMQ? The two actions are not wrapped in a single transaction.
  • In all of the above scenarios, what response should be returned to the API end user?
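
To illustrate the first point, a ‘candidate deleted’ event could be handled with another Spring Data REST event handler, mirroring the AfterCreateEvent handler seen in the Candidate service logs above. The sketch below is a minimal, hypothetical example only: the CandidateDeleteEventHandler class, the candidates.delete.queue queue name, and the payload shape are my assumptions, not part of the actual project.


package com.voterapi.candidate.service;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.voterapi.candidate.domain.Candidate;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.rest.core.annotation.HandleAfterDelete;
import org.springframework.data.rest.core.annotation.RepositoryEventHandler;
import org.springframework.stereotype.Component;

// Hypothetical handler: publishes a 'candidate deleted' event so the Voter
// service could remove its copy of the candidate (eventual consistency).
@Component
@RepositoryEventHandler
public class CandidateDeleteEventHandler {

    private static final Logger logger =
            LoggerFactory.getLogger(CandidateDeleteEventHandler.class);

    private final RabbitTemplate rabbitTemplate;

    @Autowired
    public CandidateDeleteEventHandler(RabbitTemplate rabbitTemplate) {
        this.rabbitTemplate = rabbitTemplate;
    }

    @HandleAfterDelete
    public void handleCandidateDeleted(Candidate candidate) {
        try {
            // Serialize the deleted candidate and publish it to an assumed
            // 'candidates.delete.queue' queue on the default exchange.
            String payload = new ObjectMapper().writeValueAsString(candidate);
            rabbitTemplate.convertAndSend("candidates.delete.queue", payload);
            logger.debug("Published 'candidate deleted' event: {}", payload);
        } catch (JsonProcessingException e) {
            logger.error("Failed to serialize deleted candidate", e);
        }
    }
}

The Voter service would then need a corresponding listener on that queue to remove, or mark as deleted, the candidate in its own database.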

Conclusion

In this post, using eventual consistency, we successfully decoupled our two microservices and achieved asynchronous inter-process communications. Adopting a message-based, event-driven, loosely-coupled architecture, wherever possible, in combination with REST HTTP when it makes sense, will improve the overall manageability and scalability of a microservices-based platform.

References

All opinions in this post are my own and not necessarily the views of my current employer or their clients.


Preparing for Your Organization’s DevOps Journey


Introduction

Recently, I was asked two questions regarding DevOps. The first, ‘How do you get started implementing DevOps in an organization?’ A question I get asked, and answer, fairly frequently. The second was a bit more challenging to answer, ‘How do you prepare your organization to implement DevOps?’

Getting Started

The first question, ‘How do you get started implementing DevOps in an organization?’, is a popular question many companies ask. The answer varies depending on who you ask, but the process is fairly well practiced and documented by a number of well-known and respected industry pundits. A successful DevOps implementation is a combination of strategic planning and effective execution.

A successful DevOps implementation is a combination of strategic planning and effective execution.

Most commonly, an organization starts with some form of a DevOps maturity assessment. The concept of a DevOps maturity model was introduced by Jez Humble and David Farley, in their ground-breaking book, Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation (Addison-Wesley Signature Series), circa 2011.

Humble and Farley presented their ‘Maturity Model for Configuration and Release Management’ (page 419). This model, which encompassed much more than just CM and RM, was created as a means of evaluating and improving an organization’s DevOps practices.

Although there are several variations, most maturity models provide some means of ranking the relative maturity of an organization’s DevOps practices. Less sophisticated models focus primarily on tooling and processes. More holistic models, such as Accenture’s DevOps Maturity Assessment, focus on tooling, processes, people, and culture.

Following the analysis, most industry experts recommend a strategic plan, followed by an implementation plan. The plans set milestones for reaching higher levels of maturity, according to the model. Experts will identify key performance indicators, such as release frequency, defect rates, production downtime, and mean time to recovery from failures, which are often used to measure DevOps success.

Preparing for the Journey

As I said, the second question, ‘How do you prepare your organization to implement DevOps?’, is a bit more challenging to answer. And, as any good consultant would respond, it depends.

The exact answer depends on many factors. How engaged is management in wanting to transform their organization? How mature is the organization’s current IT practices? Are the other parts of the organization, such as sales, marketing, training, product documentation, and customer support, aligned with IT? Is IT aligned with them?

Even the basics matter: the organization’s size, both physical and financial, as well as its age. What industry is it in? Is it a highly regulated industry? Is it a global organization with distributed IT resources? Has it tried DevOps before and failed? If so, why did it fail?

As overwhelming as those questions might seem, I managed to break down my answer to the question, “How do you prepare your organization to implement DevOps?”, into five key areas. In my experience, each of these is critical for any DevOps transformation to succeed. Before the journey starts, these are five areas an organization needs to consider:

  1. Have an Agile Mindset
  2. Break Down Silos
  3. Know Your Business
  4. Take the Long View
  5. Be Introspective

Have an Agile Mindset

It is commonly accepted that DevOps was born from the need of Agile software development to increase the frequency of releases. More releases required faster feedback loops, better quality control methods, and the increased use of automation, amongst other necessities. DevOps practices evolved to meet those challenges.

If an organization is considering DevOps, it should have already successfully embraced Agile, or be well along in its Agile transformation. An outgrowth of Agile software development, DevOps follows many Agile practices. Such Agile practices as cross-team collaboration, continuous and rapid feedback loops, continuous improvement, test-driven development, continuous integration, scheduling work in sprints, and breaking down business requirements into epics, stories, and tasks, are usually all part of a successful DevOps implementation.

If your organization cannot adopt Agile, it will likely fail to successfully embrace DevOps. Imagine a typical scenario in which DevOps enables an organization to release more frequently: monthly instead of quarterly, weekly instead of monthly. However, if the rest of the organization (sales, marketing, training, product documentation, and customer support) is still working in a non-Agile manner, it will not be able to match the improved cycle time DevOps would provide.

Break Down Silos

Closely associated with an Agile mindset is breaking down departmental silos. If your organization has already made an Agile transformation, then one should assume those ‘silos’, the physical or, more often, process-induced ‘walls’ between departments, have been torn down. Having embraced Agile, we assume that Development and Testing are working side-by-side as part of an Agile software development team.

Implementing DevOps requires closing the often wide gap between Development and Operations. If your organization cannot tear down the typically shorter wall between Development and Testing, then tearing down the larger walls between Development and Operations will be impossible.

Know Your Business

Before starting its DevOps journey, an organization needs to know itself. Most organizations establish business metrics, such as sales quotas, profit targets, employee retention objectives, and client acquisition goals. However, many organizations have not formalized their IT-related Key Performance Indicators (KPIs) or Service Level Agreements (SLAs).

DevOps is all about measurements: application response time; incident volume, severity, and impact; defect density; Mean Time To Recovery (MTTR); downtime; uptime; and so forth. Establishing meaningful and measurable metrics is one of the best ways to evaluate the continuous improvements achieved by a maturing DevOps practice.

To successfully implement DevOps, an organization should first identify its business-critical performance metrics and service level expectations. Additionally, an organization must accurately and honestly measure itself against those metrics, before beginning the DevOps journey.

Take the Long View

Rome was not built in a day, organizations don’t transform overnight, and DevOps is a journey, not a time-boxed task in a team’s backlog. Before an organization sets out on its journey, it must be willing to take the long view on DevOps. There is a reason DevOps maturity models exist. Like most engineering practices, cultural and organizational transformations, and skill-building exercises, DevOps takes time to become successfully entrenched in a company.

Rome was not built in a day, organizations don’t transform overnight, and DevOps is a journey, not a time-boxed task in a team’s backlog.

Organizations need to value quick, small wins, followed by more small wins. They should not expect a big bang with DevOps. Achieving high levels of DevOps performance is similar to the Agile practice of delivering small pieces of valuable functionality, in an incremental fashion.

Getting the ‘Hello World’ application successfully through a simple continuous integration pipeline might seem small, but think of all the barriers that were overcome to achieve that task: source control, a continuous integration server, unit testing, an artifact repository, and so on. Your next win: deploy that ‘Hello World’ application to your Test environment, automatically, through a continuous deployment pipeline…

This practice reminds me of an adage. Would you prefer a dollar, every day for the next week, or seven dollars at the end of the week? Most people prefer the immediacy of a dollar each day (small wins), as well as the satisfaction of seeing the value build consistently, day after day. Exercise the same philosophy with DevOps.

Be Introspective

As stated earlier, generally, the first step in creating a strategic plan for implementing DevOps is analyzing your organization’s current level of IT maturity. Individual departments must be willing to be open, honest, and objective when assessing their current state.

The inability of organizations to be transparent about their practices, challenges, and performance is a sign of an unhealthy corporate culture. Not only is an accurate perspective critical for a maturity analysis and strategic planning, but the existence of an unhealthy culture can also be fatal to most DevOps transformations. DevOps only thrives in an open, collaborative, and supportive culture.

Conclusion

As Alexander Graham Bell once famously said, ‘before anything else, preparation is the key to success.’ Although not a guarantee, properly preparing for a DevOps transformation by addressing these five key areas, should greatly improve an organization’s chances of success.

All opinions in this post are my own and not necessarily the views of my current employer or their clients.

Copyright: peshkova / 123RF Stock Photo


Decoupling Microservices using Message-based RPC IPC, with Spring, RabbitMQ, and AMQP

RabbitMQ_Screen_3

Introduction

There has been considerable growth in modern, highly scalable, distributed application platforms built around fine-grained RESTful microservices. Microservices generally use lightweight protocols to communicate with each other, such as HTTP, TCP, UDP, WebSockets, MQTT, and AMQP. Microservices commonly communicate with each other directly using REST-based HTTP, or indirectly, using messaging brokers.

There are several well-known, production-tested messaging queues, such as Apache Kafka, Apache ActiveMQ, Amazon Simple Queue Service (SQS), and Pivotal’s RabbitMQ. According to Pivotal, of these messaging brokers, RabbitMQ is the most widely deployed open source message broker.

RabbitMQ supports multiple messaging protocols. RabbitMQ’s primary protocol, the Advanced Message Queuing Protocol (AMQP), is an open standard wire-level protocol and semantic framework for high-performance enterprise messaging. According to Spring, ‘AMQP has exchanges, routes, and queues. Messages are first published to exchanges. Routes define on which queue(s) to pipe the message. Consumers subscribing to that queue then receive a copy of the message.’

Pivotal’s Spring AMQP project applies core Spring concepts to the development of AMQP-based messaging solutions. The project’s libraries facilitate management of AMQP resources while promoting the use of dependency injection and declarative configuration. The project provides a ‘template’ (RabbitTemplate) as a high-level abstraction for sending and receiving messages.

In this post, we will explore how to start moving Spring Boot Java services away from using synchronous REST HTTP for inter-process communications (IPC), and toward message-based IPC. Moving from synchronous IPC to messaging queues and asynchronous IPC decouples services from one another, allowing us to more easily build, test, and release individual microservices.

Message-Based RPC IPC

Decoupling services using asynchronous IPC is considered optimal by many enterprise software architects when developing modern distributed platforms. However, sometimes it is not easy or possible to get away from synchronous communications. Rightly or wrongly, oftentimes services are architected such that one service needs to retrieve data from another service, or services, in order to process its own requests. It can be said that such a service has a direct dependency on the other services. Many would argue that services, especially RESTful microservices, should not be coupled in this way.

There are several ways to break direct service-to-service dependencies using asynchronous IPC. We might implement request/async response REST HTTP-based IPC. We could also use publish/subscribe or publish/async response messaging queue-based IPC. These are all described by NGINX, in their article, Building Microservices: Inter-Process Communication in a Microservices Architecture; a must-read for anyone working with microservices. We might also implement an architecture which supports eventual consistency, eliminating the need for one service to obtain data from another service.

So what if we cannot implement asynchronous methods to break direct service dependencies, but we want to move toward message-based IPC? One answer is message-based Remote Procedure Call (RPC) IPC. I realize the mention of RPC might send cold shivers down the spine of many a seasoned architect. Traditional RPC has several challenges, many of which have been overcome with more modern architectural patterns.

According to Wikipedia, ‘in distributed computing, a remote procedure call (RPC) is when a computer program causes a procedure (subroutine) to execute in another address space (commonly on another computer on a shared network), which is coded as if it were a normal (local) procedure call, without the programmer explicitly coding the details for the remote interaction.’

Although still a form of RPC, and not asynchronous, it is possible to replace REST HTTP IPC with message-based RPC IPC. Using message-based RPC, services have no direct dependencies on other services. A service only depends on a response to a message request it places on a queue. The services are now decoupled from one another. The requestor service (the client) has no direct knowledge of the respondent service (the server).

RPC with RabbitMQ and AMQP

RabbitMQ has an excellent set of six tutorials, which cover the basics of creating messaging applications with RabbitMQ, applying different architectural patterns, in several different programming languages. The sixth and final tutorial covers using RabbitMQ for RPC-based IPC, with the request/reply architectural pattern.

Pivotal recently added Spring AMQP implementations to each RabbitMQ tutorial, based on their Spring AMQP project. If you recall, the Spring AMQP project applies core Spring concepts to the development of AMQP-based messaging solutions.

This post’s RPC IPC example is closely based on the architectural pattern found in the Spring AMQP RabbitMQ tutorial.

Sample Code

To demonstrate Spring AMQP-based RPC IPC messaging with RabbitMQ, we will use a pair of simple Spring Boot microservices. These services, the Voter and Candidate services, have been used in several previous posts, and for training and testing DevOps engineers. Both services are backed by MongoDB. The services and MongoDB, along with RabbitMQ, are all part of the Voter API project. The Voter API project also contains an HAProxy-based API Gateway, which provides indirect, load-balanced access to the two services.

All code necessary to build this post’s example is available on GitHub, within three projects. The Voter Service project repository contains the Voter service source code, along with the scripts and Docker Compose files required to deploy the project. The Candidate Service project repository and the Voter API Gateway project repository are also available on GitHub. For this post, you need only clone the Voter Service project repository.

Deploying Voter API

All components, including the two Spring services, MongoDB, RabbitMQ, and the API Gateway, are individually deployed using Docker. Each component is publicly available as a Docker Image, on Docker Hub.

The Voter Service repository contains scripts to deploy the entire set of Dockerized components, locally. The repository also contains optional scripts to provision a Docker Swarm, using Docker’s newer swarm mode, and deploy the components. We will only deploy the services locally for this post.

To clone and deploy the components locally, including the two Spring services, MongoDB, RabbitMQ, and the API Gateway, execute the following commands. If this is your first time running the commands, it may take a few minutes for your system to download all the required Docker Images from Docker Hub.


git clone --depth 1 --branch rabbitmq \
https://github.com/garystafford/voter-service.git
cd voter-service/scripts-services
sh ./stack_deploy_local.sh

If everything was deployed successfully, you should see the following output. You should observe five running Docker containers.


$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8ef4866984c3 garystafford/voter-api-gateway:rabbitmq "/docker-entrypoin…" 25 hours ago Up 25 hours 0.0.0.0:8080->8080/tcp voterstack_voter-api-gateway_1
cc28d084ab17 garystafford/candidate-service:rabbitmq "java -Dspring.pro…" 25 hours ago Up 25 hours 0.0.0.0:8097->8080/tcp voterstack_candidate_1
e4c22258b77b garystafford/voter-service:rabbitmq "java -Dspring.pro…" 25 hours ago Up 25 hours 0.0.0.0:8099->8080/tcp voterstack_voter_1
fdb4b9f58a53 rabbitmq:management-alpine "docker-entrypoint…" 25 hours ago Up 25 hours 4369/tcp, 5671/tcp, 0.0.0.0:5672->5672/tcp, 15671/tcp, 25672/tcp, 0.0.0.0:15672->15672/tcp voterstack_rabbitmq_1
1678227b143c mongo:latest "docker-entrypoint…" 25 hours ago Up 25 hours 0.0.0.0:27017->27017/tcp voterstack_mongodb_1

Using Voter API

The Voter Service and Candidate Service GitHub repositories both contain README files, which detail all the API endpoints each service exposes, and how to call them.

In addition to casting votes for candidates, the Voter service has the ability to simulate election results. By calling a /simulation endpoint, and indicating the desired election, the Voter service will randomly generate a number of votes for each candidate in that election. This will save us the burden of casting votes for this demonstration. However, the Voter service has no knowledge of elections or candidates. To obtain a list of candidates, the Voter service depends on the Candidate service.

The Candidate service manages electoral candidates, their political affiliation, and the election in which they are running. Like the Voter service, the Candidate service also has a /simulation endpoint. The service will create a list of candidates based on the 2012 and 2016 US Presidential Elections. The simulation capability of the service saves us the burden of inputting candidates for this demonstration.

REST HTTP Endpoint

The Voter service exposes two almost identical endpoints. Both endpoints generate random votes. However, below the covers, the two endpoints are very different. Calling the /voter/simulation/http/{election} endpoint, prompts the Voter service to request a list of candidates from the Candidate service, based on the election parameter you input. This request is done using synchronous REST HTTP. The Voter service uses the HTTP GET method to request the data from the Candidate service. The Voter service then waits for a response.

The HTTP request is received by the Candidate service. The Candidate service responds to the Voter service with a list of candidates, in JSON format. The Voter service receives the response containing the list of candidates. The Voter service then proceeds to generate a random number of votes for each candidate. Finally, each new vote object (MongoDB document) is written back to the vote collection in the Voter service’s voters database.
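
This post focuses on the RPC code rather than the REST HTTP IPC code (which, as noted later, relies on Spring Data Projections and a Spring Data MongoDB Repository). Purely for illustration, the synchronous call conceptually resembles the minimal RestTemplate sketch below; the gateway URL, the summary endpoint’s response shape, and the CandidateHttpClient class are my assumptions, not the project’s actual implementation.


import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.Map;

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.web.client.RestTemplate;

// Hypothetical illustration of synchronous REST HTTP IPC: the Voter service
// blocks on an HTTP GET to the Candidate service and deserializes the reply.
// The import for CandidateVoterView is omitted; its package is project-specific.
public class CandidateHttpClient {

    // Assumed to route through the HAProxy-based API Gateway on port 8080.
    private static final String CANDIDATES_URL =
            "http://localhost:8080/candidate/candidates/summary/{election}";

    private final RestTemplate restTemplate = new RestTemplate();
    private final ObjectMapper objectMapper = new ObjectMapper();

    public List<CandidateVoterView> getCandidatesHttp(String election) {
        // Blocking call: no response, no candidates, no simulation.
        String json = restTemplate.getForObject(CANDIDATES_URL, String.class, election);
        try {
            // Response shape assumed to match the {"candidates": [...]} payload
            // shown later in this post.
            Map<String, List<CandidateVoterView>> candidatesMap = objectMapper.readValue(
                    json, new TypeReference<Map<String, List<CandidateVoterView>>>() {});
            return candidatesMap.get("candidates");
        } catch (IOException e) {
            return Collections.emptyList();
        }
    }
}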

Message Queue Diagram 1D

Message-based RPC Endpoint

Similarly, calling the /voter/simulation/rpc/{election} endpoint with a specific election prompts the Voter service to request the same list of candidates. However, this time, the Voter service (the client) produces a request message and places it in RabbitMQ’s voter.rpc.requests queue. The Voter service then waits for a response. The Voter service has no direct dependency on the Candidate service; it only depends on a response to its message request. In this way, it is still a form of synchronous IPC, but the Voter service is now decoupled from the Candidate service.

The request message is consumed by the Candidate service (the server), who is listening to that queue. In response, the Candidate service produces a message containing the list of candidates, serialized to JSON. The Candidate service (the server) sends a response back to the Voter service (the client), through RabbitMQ. This is done using the Direct reply-to feature of RabbitMQ or using a unique response queue, specified in the reply-to header of the request message, sent by the Voter Service.

According to RabbitMQ, ‘the direct reply-to feature allows RPC clients to receive replies directly from their RPC server, without going through a reply queue. (“Directly” here still means going through AMQP and the RabbitMQ server; there is no separate network connection between RPC client and RPC server.)’

According to Spring, ‘starting with version 3.4.0, the RabbitMQ server now supports Direct reply-to; this eliminates the main reason for a fixed reply queue (to avoid the need to create a temporary queue for each request). Starting with Spring AMQP version 1.4.1 Direct reply-to will be used by default (if supported by the server) instead of creating temporary reply queues. When no replyQueue is provided (or it is set with the name amq.rabbitmq.reply-to), the RabbitTemplate will automatically detect whether Direct reply-to is supported and use it, or fall back to using a temporary reply queue. When using Direct reply-to, a reply-listener is not required and should not be configured.’ We are using the latest versions of both RabbitMQ and Spring AMQP, which should support Direct reply-to.

The Voter service receives the message containing the list of candidates. The Voter service deserializes the JSON payload to Candidate objects and proceeds to generate a random number of votes for each candidate in the list. Finally, each new vote object (MongoDB document) is written back to the vote collection in the Voter service’s voters database.

Message Queue Diagram 2D

Exploring the RPC Code

We will not examine the REST HTTP IPC code in this post. Instead, we will explore the RPC code. You are welcome to download the source code and explore the REST HTTP code pattern; it uses some advanced features of Spring Boot and Spring Data.

Spring Dependencies

In order to use RabbitMQ, we need to add a project dependency on org.springframework.boot:spring-boot-starter-amqp. Below is a snippet from the Candidate service’s build.gradle file, showing the project dependencies. The Voter service’s dependencies are identical.


dependencies {
    compile group: 'org.springframework.boot', name: 'spring-boot-actuator-docs'
    compile group: 'org.springframework.boot', name: 'spring-boot-starter-actuator'
    compile group: 'org.springframework.boot', name: 'spring-boot-starter-amqp'
    compile group: 'org.springframework.boot', name: 'spring-boot-starter-data-mongodb'
    compile group: 'org.springframework.boot', name: 'spring-boot-starter-data-rest'
    compile group: 'org.springframework.boot', name: 'spring-boot-starter-hateoas'
    compile group: 'org.springframework.boot', name: 'spring-boot-starter-logging'
    compile group: 'org.springframework.boot', name: 'spring-boot-starter-web'
    compile group: 'org.webjars', name: 'hal-browser'
    testCompile group: 'org.springframework.boot', name: 'spring-boot-starter-test'
}


AMQP Configuration

Next, we need to add a small amount of RabbitMQ AMQP configuration to both services. We accomplish this by using Spring’s @Configuration annotation on our configuration classes. Below is the configuration class for the Voter service.


package com.voterapi.voter.configuration;

import org.springframework.amqp.core.DirectExchange;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class VoterConfig {

    @Bean
    public DirectExchange directExchange() {
        return new DirectExchange("voter.rpc");
    }
}

And here, the configuration class for the Candidate service.


package com.voterapi.candidate.configuration;

import org.springframework.amqp.core.Binding;
import org.springframework.amqp.core.BindingBuilder;
import org.springframework.amqp.core.DirectExchange;
import org.springframework.amqp.core.Queue;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class CandidateConfig {

    @Bean
    public Queue queue() {
        return new Queue("voter.rpc.requests");
    }

    @Bean
    public DirectExchange exchange() {
        return new DirectExchange("voter.rpc");
    }

    @Bean
    public Binding binding(DirectExchange exchange, Queue queue) {
        return BindingBuilder.bind(queue).to(exchange).with("rpc");
    }
}

Voter Service Code

With the dependencies and configuration in place, we define the method in the Voter service that requests the candidates from the Candidate service, using RabbitMQ. Below is an abridged version of the Voter service’s CandidateListService class, containing the getCandidatesMessageRpc method. This method calls the rabbitTemplate.convertSendAndReceive method (shown in the snippet below).


public List<CandidateVoterView> getCandidatesMessageRpc(String election) {
    logger.debug("Sending RPC request message for list of candidates…");

    String requestMessage = election;
    String candidates = (String) rabbitTemplate.convertSendAndReceive(
            directExchange.getName(), "rpc", requestMessage);

    TypeReference<Map<String, List<CandidateVoterView>>> mapType =
            new TypeReference<Map<String, List<CandidateVoterView>>>() {};
    ObjectMapper objectMapper = new ObjectMapper();
    Map<String, List<CandidateVoterView>> candidatesMap = null;

    try {
        candidatesMap = objectMapper.readValue(candidates, mapType);
    } catch (IOException e) {
        logger.info(String.valueOf(e));
    }

    List<CandidateVoterView> candidatesList = candidatesMap.get("candidates");
    logger.debug("List of {} candidates received…", candidatesList.size());

    return candidatesList;
}

Candidate Service Code

Next, we define a method in the Candidate service that processes the Voter service’s request. Below is an abridged version of the CandidateController class, containing the getCandidatesMessageRpc method. This method is decorated with Spring’s @RabbitListener annotation (the first line of the snippet below). The annotation marks the method as the target of a Rabbit message listener on the voter.rpc.requests queue.

Also shown, are the getCandidatesMessageRpc method’s two helper methods, getByElection and serializeToJson. These methods query MongoDB for the list of candidates and serialize the list to JSON.


@RabbitListener(queues = "voter.rpc.requests")
private String getCandidatesMessageRpc(String requestMessage) {
    logger.debug("Request message: {}", requestMessage);
    logger.debug("Sending RPC response message with list of candidates…");

    List<CandidateVoterView> candidates = getByElection(requestMessage);
    return serializeToJson(candidates);
}

private List<CandidateVoterView> getByElection(String election) {
    Aggregation aggregation = Aggregation.newAggregation(
            Aggregation.match(Criteria.where("election").is(election)),
            project("firstName", "lastName", "politicalParty", "election")
                    .andExpression("concat(firstName,' ', lastName)")
                    .as("fullName"),
            sort(Sort.Direction.ASC, "lastName")
    );

    AggregationResults<CandidateVoterView> groupResults =
            mongoTemplate.aggregate(aggregation, Candidate.class, CandidateVoterView.class);

    return groupResults.getMappedResults();
}

private String serializeToJson(List<CandidateVoterView> candidates) {
    ObjectMapper mapper = new ObjectMapper();
    String jsonInString = "";

    final Map<String, List<CandidateVoterView>> dataMap = new HashMap<>();
    dataMap.put("candidates", candidates);

    try {
        jsonInString = mapper.writeValueAsString(dataMap);
    } catch (JsonProcessingException e) {
        logger.info(String.valueOf(e));
    }

    logger.debug(jsonInString);
    return jsonInString;
}

Demonstration

To demonstrate both the synchronous REST HTTP IPC code and the Spring AMQP-based RPC IPC code, we will make a few REST HTTP calls to the Voter API Gateway. For convenience, I have provided a shell script, demostrate_ipc.sh, which executes all the API calls necessary. I have added sleep commands to slow the output to the terminal down a bit, for easier analysis. The script requires HTTPie, a great time saver when working with RESTful services.

The demostrate_ipc.sh script does three things. First, it calls the Candidate service to generate a group of sample candidates. Next, the script calls the Voter service to simulate votes, using synchronous REST HTTP. Lastly, the script repeats the voter simulation, this time using message-based RPC IPC. All API calls are done through the Voter API Gateway on port 8080. To understand the API calls, examine the script, below.


#!/bin/sh
# Demostrate API calls for REST HTTP IPC and RPC IPC via API Gateway
# Requires HTTPie
# Requires all services are running
set -e
HOST=${1:-localhost:8080}
API_GATEWAY="http://${HOST}"
ELECTION="2016%20Presidential%20Election"
echo "Simulating candidates…"
http ${API_GATEWAY}/candidate/simulation && sleep 2
http ${API_GATEWAY}/candidate/candidates/summary/${ELECTION} && sleep 2
echo "Simulating voting using REST HTTP IPC…"
http ${API_GATEWAY}/voter/simulation/http/${ELECTION} && sleep 2
http ${API_GATEWAY}/voter/results && sleep 4
http ${API_GATEWAY}/voter/winners && sleep 2
echo "Simulating voting using message-based RPC IPC…"
http ${API_GATEWAY}/voter/simulation/rpc/${ELECTION} && sleep 2
http ${API_GATEWAY}/voter/results && sleep 4
http ${API_GATEWAY}/voter/winners && sleep 2
echo "Script completed…"

Below is the list of candidates for the 2016 Presidential Election, generated by the Candidate service. The JSON payload was retrieved using the Voter service’s /voter/candidates/rpc/{election} endpoint. This endpoint uses the same RPC IPC method as the Voter service’s /voter/simulation/rpc/{election} endpoint.


HTTP/1.1 200
Access-Control-Allow-Credentials: true
Access-Control-Allow-Headers: Content-Type, Accept, X-Requested-With, remember-me
Access-Control-Allow-Methods: POST, GET, OPTIONS, DELETE
Access-Control-Max-Age: 3600
Content-Type: application/json;charset=UTF-8
Date: Sun, 07 May 2017 19:10:22 GMT
Transfer-Encoding: chunked
X-Application-Context: Voter Service:docker-local:8099
{
"candidates": [
{
"election": "2016 Presidential Election",
"fullName": "Darrell Castle",
"politicalParty": "Constitution Party"
},
{
"election": "2016 Presidential Election",
"fullName": "Hillary Clinton",
"politicalParty": "Democratic Party"
},
{
"election": "2016 Presidential Election",
"fullName": "Gary Johnson",
"politicalParty": "Libertarian Party"
},
{
"election": "2016 Presidential Election",
"fullName": "Chris Keniston",
"politicalParty": "Veterans Party"
},
{
"election": "2016 Presidential Election",
"fullName": "Jill Stein",
"politicalParty": "Green Party"
},
{
"election": "2016 Presidential Election",
"fullName": "Donald Trump",
"politicalParty": "Republican Party"
}
]
}

Based on the list of candidates, below are the simulated election results. This JSON payload was retrieved using the Voter service’s /voter/results endpoint.


HTTP/1.1 200
Access-Control-Allow-Credentials: true
Access-Control-Allow-Headers: Content-Type, Accept, X-Requested-With, remember-me
Access-Control-Allow-Methods: POST, GET, OPTIONS, DELETE
Access-Control-Max-Age: 3600
Content-Type: application/json;charset=UTF-8
Date: Sun, 07 May 2017 19:42:42 GMT
Transfer-Encoding: chunked
X-Application-Context: Voter Service:docker-local:8099
{
"results": [
{
"candidate": "Jill Stein",
"votes": 19
},
{
"candidate": "Gary Johnson",
"votes": 16
},
{
"candidate": "Hillary Clinton",
"votes": 12
},
{
"candidate": "Donald Trump",
"votes": 12
},
{
"candidate": "Chris Keniston",
"votes": 10
},
{
"candidate": "Darrell Castle",
"votes": 9
}
]
}

RabbitMQ Management Console

The easiest way to observe what is happening with our messages is to use the RabbitMQ Management Console. To access the console, point your web browser to localhost on port 15672. The default login credentials for the console are guest/guest.

As you successfully send and receive messages between the services through RabbitMQ, you should see activity on the Overview tab. In addition, you should see a number of Connections, Channels, Exchanges, Queues, and Consumers.

RabbitMQ_Screen_3

In the Queues tab, you should find a single queue, the voter.rpc.requests queue. This queue was configured in the Candidate service’s configuration class, shown previously.

RabbitMQ_Screen_2

In the Exchanges tab, you should see one exchange, voter.rpc, which we configured in both the Voter and the Candidate service’s configuration classes (aka DirectExchange). Also, visible in the Exchanges tab, should be the routing key rpc, which we configured in the Candidate service’s configuration class (aka Binding).

The route binds the exchange to the voter.rpc.requests queue. If you recall Spring’s description, AMQP has exchanges (DirectExchange), routes (Binding), and queues (Queue). Messages are first published to exchanges. Routes define on which queue(s) to pipe the message. Consumers subscribing to that queue then receive a copy of the message.

RabbitMQ_Screen_1

In the Channels tab, you should note two connections, the single instances of the Voter and Candidate services. Likewise, there are two channels, one for each service. You can differentiate the channels by the presence of the consumer tag. The consumer tag, in this example, amq.ctag-Anv7GXs7ZWVoznO64euyjQ, uniquely identifies the consumer. In this example, the Voter service is the consumer. For a more complete explanation of the consumer tag, check out RabbitMQ’s AMQP documentation.

RabbitMQ_Screen_4.png

Message Structure

Messages cannot be viewed directly in the RabbitMQ Management Console. One way I have found to view messages is using your IDE’s debugger. Below, I have added a breakpoint on the Candidate service’s getCandidatesMessageRpc method, using IntelliJ IDEA. You can view the Voter service’s request message, as it is received by the Candidate service.

Debug_RPC_Message.png

Note the message payload, the requested election. Note the twelve message header elements. The headers include the AMQP exchange, queue, and binding. The message headers also include the consumer tag. The message also uniquely identifies the reply-to queue to use, if the server does not support Direct reply-to (see earlier explanation).
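
If you prefer not to attach a debugger, the same information can be logged. Below is a minimal, hypothetical variant of the Candidate service’s listener method that accepts the raw Spring AMQP Message, so the payload, reply-to queue, consumer tag, and headers appear in the service logs. It is shown only for illustration and is not part of the project’s actual code; it relies on the same logger field and helper methods as the earlier snippet.


import java.nio.charset.StandardCharsets;

import org.springframework.amqp.core.Message;
import org.springframework.amqp.core.MessageProperties;
import org.springframework.amqp.rabbit.annotation.RabbitListener;

// Hypothetical variant of getCandidatesMessageRpc that logs message metadata
// before delegating to the same query and serialization helpers shown earlier.
@RabbitListener(queues = "voter.rpc.requests")
private String getCandidatesMessageRpc(Message message) {
    MessageProperties properties = message.getMessageProperties();
    String election = new String(message.getBody(), StandardCharsets.UTF_8);

    logger.debug("Payload (election): {}", election);
    logger.debug("Reply-to queue: {}", properties.getReplyTo());
    logger.debug("Consumer tag: {}", properties.getConsumerTag());
    logger.debug("Headers: {}", properties.getHeaders());

    return serializeToJson(getByElection(election));
}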

Service Logs

In addition to the RabbitMQ Management Console, we may observe communications between the two services by looking at the Voter and Candidate services’ logs. I have grabbed a snippet of both services’ logs and added a few comments to show where different processes are being executed. First, the Voter service logs.


# API request is made
2017-05-03 21:10:32.947 DEBUG 1 — [nio-8099-exec-3] s.w.s.m.m.a.RequestMappingHandlerMapping : Looking up handler method for path /simulation/rpc/2016 Presidential Election
2017-05-03 21:10:32.962 DEBUG 1 — [nio-8099-exec-3] s.w.s.m.m.a.RequestMappingHandlerMapping : Returning handler method [public org.springframework.http.ResponseEntity<java.util.Map<java.lang.String, java.lang.String>> com.voter_api.voter.controller.VoterController.getSimulationRpc(java.lang.String)]
2017-05-03 21:10:32.967 DEBUG 1 — [nio-8099-exec-3] o.s.b.f.s.DefaultListableBeanFactory : Returning cached instance of singleton bean 'voterController'
2017-05-03 21:10:32.969 DEBUG 1 — [nio-8099-exec-3] o.s.web.servlet.DispatcherServlet : Last-Modified value for [/voter/simulation/rpc/2016%20Presidential%20Election] is: -1
# Clearing out previous MongoDB data
2017-05-03 21:10:32.977 DEBUG 1 — [nio-8099-exec-3] o.s.data.mongodb.core.MongoDbUtils : Getting Mongo Database name=[voter]
2017-05-03 21:10:32.980 DEBUG 1 — [nio-8099-exec-3] o.s.data.mongodb.core.MongoTemplate : Remove using query: { } in collection: vote.
2017-05-03 21:10:32.985 DEBUG 1 — [nio-8099-exec-3] org.mongodb.driver.protocol.delete : Deleting documents from namespace voter.vote on connection [connectionId{localValue:2, serverValue:4}] to server mongodb:27017
2017-05-03 21:10:32.990 DEBUG 1 — [nio-8099-exec-3] org.mongodb.driver.protocol.delete : Delete completed
# Publishing request message to queue
2017-05-03 21:10:32.999 DEBUG 1 — [nio-8099-exec-3] o.s.amqp.rabbit.core.RabbitTemplate : Executing callback on RabbitMQ Channel: Cached Rabbit Channel: AMQChannel(amqp://guest@172.20.0.2:5672/,1), conn: Proxy@247be51c Shared Rabbit Connection: SimpleConnection@61797757 [delegate=amqp://guest@172.20.0.2:5672/, localPort= 57908]
2017-05-03 21:10:33.018 DEBUG 1 — [nio-8099-exec-3] o.s.amqp.rabbit.core.RabbitTemplate : Publishing message on exchange [voter.rpc], routingKey = [rpc]
# Receiving response
2017-05-03 21:10:33.109 DEBUG 1 — [nio-8099-exec-3] o.s.amqp.rabbit.core.RabbitTemplate : Reply: (Body:'[{"fullName":"Darrell Castle","politicalParty":"Constitution Party","election":"2016 Presidential Election"},{"fullName":"Hillary Clinton","politicalParty":"Democratic Party","election":"2016 Presidential Election"},{"fullName":"Gary Johnson","politicalParty":"Libertarian Party","election":"2016 Presidential Election"},{"fullName":"Chris Keniston","politicalParty":"Veterans Party","election":"2016 Presidential Election"},{"fullName":"Jill Stein","politicalParty":"Green Party","election":"2016 Presidential Election"},{"fullName":"Donald Trump","politicalParty":"Republican Party","election":"2016 Presidential Election"}]' MessageProperties [headers={}, timestamp=null, messageId=null, userId=null, receivedUserId=null, appId=null, clusterId=null, type=null, correlationId=null, correlationIdString=null, replyTo=null, contentType=text/plain, contentEncoding=UTF-8, contentLength=0, deliveryMode=null, receivedDeliveryMode=PERSISTENT, expiration=null, priority=0, redelivered=false, receivedExchange=, receivedRoutingKey=amq.rabbitmq.reply-to.g2dkAA9yYWJiaXRAcmFiYml0bXEAAAH3AAAAAAI=.GREaYm1ow+4nMWzSClXlfQ==, receivedDelay=null, deliveryTag=1, messageCount=null, consumerTag=null, consumerQueue=null])
# Inserting simulation data into MongoDB
2017-05-03 21:10:33.154 DEBUG 1 — [nio-8099-exec-3] o.s.data.mongodb.core.MongoTemplate : Inserting list of DBObjects containing 34 items
2017-05-03 21:10:33.154 DEBUG 1 — [nio-8099-exec-3] o.s.data.mongodb.core.MongoDbUtils : Getting Mongo Database name=[voter]
2017-05-03 21:10:33.157 DEBUG 1 — [nio-8099-exec-3] org.mongodb.driver.protocol.insert : Inserting 34 documents into namespace voter.vote on connection [connectionId{localValue:2, serverValue:4}] to server mongodb:27017
2017-05-03 21:10:33.169 DEBUG 1 — [nio-8099-exec-3] org.mongodb.driver.protocol.insert : Insert completed
# Sending response to API call
2017-05-03 21:10:33.180 DEBUG 1 — [nio-8099-exec-3] o.s.b.f.s.DefaultListableBeanFactory : Returning cached instance of singleton bean 'org.springframework.boot.actuate.autoconfigure.EndpointWebMvcHypermediaManagementContextConfiguration$ActuatorEndpointLinksAdvice'
2017-05-03 21:10:33.182 DEBUG 1 — [nio-8099-exec-3] o.s.w.s.m.m.a.HttpEntityMethodProcessor : Written [{message=Simulation data created using RPC!}] as "application/json" using [org.springframework.http.converter.json.MappingJackson2HttpMessageConverter@387a8303]
2017-05-03 21:10:33.185 DEBUG 1 — [nio-8099-exec-3] o.s.web.servlet.DispatcherServlet : Null ModelAndView returned to DispatcherServlet with name 'dispatcherServlet': assuming HandlerAdapter completed request handling
2017-05-03 21:10:33.186 DEBUG 1 — [nio-8099-exec-3] o.s.web.servlet.DispatcherServlet : Successfully completed request


Next, the Candidate service logs.


# Listening for messages
2017-05-03 21:10:30.000 DEBUG 1 — [cTaskExecutor-1] o.s.a.r.listener.BlockingQueueConsumer : Retrieving delivery for Consumer@662706a7: tags=[{amq.ctag-Anv7GXs7ZWVoznO64euyjQ=voter.rpc.requests}], channel=Cached Rabbit Channel: AMQChannel(amqp://guest@172.20.0.2:5672/,1), conn: Proxy@6f92666f Shared Rabbit Connection: SimpleConnection@6badffc2 [delegate=amqp://guest@172.20.0.2:5672/, localPort= 33932], acknowledgeMode=AUTO local queue size=0
2017-05-03 21:10:31.001 DEBUG 1 — [cTaskExecutor-1] o.s.a.r.listener.BlockingQueueConsumer : Retrieving delivery for Consumer@662706a7: tags=[{amq.ctag-Anv7GXs7ZWVoznO64euyjQ=voter.rpc.requests}], channel=Cached Rabbit Channel: AMQChannel(amqp://guest@172.20.0.2:5672/,1), conn: Proxy@6f92666f Shared Rabbit Connection: SimpleConnection@6badffc2 [delegate=amqp://guest@172.20.0.2:5672/, localPort= 33932], acknowledgeMode=AUTO local queue size=0
2017-05-03 21:10:32.003 DEBUG 1 — [cTaskExecutor-1] o.s.a.r.listener.BlockingQueueConsumer : Retrieving delivery for Consumer@662706a7: tags=[{amq.ctag-Anv7GXs7ZWVoznO64euyjQ=voter.rpc.requests}], channel=Cached Rabbit Channel: AMQChannel(amqp://guest@172.20.0.2:5672/,1), conn: Proxy@6f92666f Shared Rabbit Connection: SimpleConnection@6badffc2 [delegate=amqp://guest@172.20.0.2:5672/, localPort= 33932], acknowledgeMode=AUTO local queue size=0
2017-05-03 21:10:33.005 DEBUG 1 — [cTaskExecutor-1] o.s.a.r.listener.BlockingQueueConsumer : Retrieving delivery for Consumer@662706a7: tags=[{amq.ctag-Anv7GXs7ZWVoznO64euyjQ=voter.rpc.requests}], channel=Cached Rabbit Channel: AMQChannel(amqp://guest@172.20.0.2:5672/,1), conn: Proxy@6f92666f Shared Rabbit Connection: SimpleConnection@6badffc2 [delegate=amqp://guest@172.20.0.2:5672/, localPort= 33932], acknowledgeMode=AUTO local queue size=0
# Retrieving message
2017-05-03 21:10:33.044 DEBUG 1 — [pool-1-thread-5] o.s.a.r.listener.BlockingQueueConsumer : Storing delivery for Consumer@662706a7: tags=[{amq.ctag-Anv7GXs7ZWVoznO64euyjQ=voter.rpc.requests}], channel=Cached Rabbit Channel: AMQChannel(amqp://guest@172.20.0.2:5672/,1), conn: Proxy@6f92666f Shared Rabbit Connection: SimpleConnection@6badffc2 [delegate=amqp://guest@172.20.0.2:5672/, localPort= 33932], acknowledgeMode=AUTO local queue size=0
2017-05-03 21:10:33.049 DEBUG 1 — [cTaskExecutor-1] o.s.a.r.listener.BlockingQueueConsumer : Received message: (Body:'2016 Presidential Election' MessageProperties [headers={}, timestamp=null, messageId=null, userId=null, receivedUserId=null, appId=null, clusterId=null, type=null, correlationId=null, correlationIdString=null, replyTo=amq.rabbitmq.reply-to.g2dkAA9yYWJiaXRAcmFiYml0bXEAAAH3AAAAAAI=.GREaYm1ow+4nMWzSClXlfQ==, contentType=text/plain, contentEncoding=UTF-8, contentLength=0, deliveryMode=null, receivedDeliveryMode=PERSISTENT, expiration=null, priority=0, redelivered=false, receivedExchange=voter.rpc, receivedRoutingKey=rpc, receivedDelay=null, deliveryTag=14, messageCount=0, consumerTag=amq.ctag-Anv7GXs7ZWVoznO64euyjQ, consumerQueue=voter.rpc.requests])
2017-05-03 21:10:33.054 DEBUG 1 — [cTaskExecutor-1] .a.r.l.a.MessagingMessageListenerAdapter : Processing [GenericMessage [payload=2016 Presidential Election, headers={amqp_receivedDeliveryMode=PERSISTENT, amqp_receivedRoutingKey=rpc, amqp_contentEncoding=UTF-8, amqp_receivedExchange=voter.rpc, amqp_deliveryTag=14, amqp_replyTo=amq.rabbitmq.reply-to.g2dkAA9yYWJiaXRAcmFiYml0bXEAAAH3AAAAAAI=.GREaYm1ow+4nMWzSClXlfQ==, amqp_consumerQueue=voter.rpc.requests, amqp_redelivered=false, id=bbd84286-fae6-36e2-f5e8-d8d9714cde6c, amqp_consumerTag=amq.ctag-Anv7GXs7ZWVoznO64euyjQ, contentType=text/plain, timestamp=1493845833053}]]
2017-05-03 21:10:33.057 DEBUG 1 — [cTaskExecutor-1] c.v.c.controller.CandidateController : Request message: 2016 Presidential Election
# Querying MongoDB for candidates
2017-05-03 21:10:33.063 DEBUG 1 — [cTaskExecutor-1] o.s.data.mongodb.core.MongoTemplate : Executing aggregation: { "aggregate" : "candidate" , "pipeline" : [ { "$match" : { "election" : "2016 Presidential Election"}} , { "$project" : { "firstName" : 1 , "lastName" : 1 , "politicalParty" : 1 , "election" : 1 , "fullName" : { "$concat" : [ "$firstName" , " " , "$lastName"]}}} , { "$sort" : { "lastName" : 1}}]}
2017-05-03 21:10:33.063 DEBUG 1 — [cTaskExecutor-1] o.s.data.mongodb.core.MongoDbUtils : Getting Mongo Database name=[candidates]
2017-05-03 21:10:33.064 DEBUG 1 — [cTaskExecutor-1] org.mongodb.driver.protocol.command : Sending command {aggregate : BsonString{value='candidate'}} to database candidates on connection [connectionId{localValue:2, serverValue:3}] to server mongodb:27017
2017-05-03 21:10:33.067 DEBUG 1 — [cTaskExecutor-1] org.mongodb.driver.protocol.command : Command execution completed
# Responding to queue with results
2017-05-03 21:10:33.094 DEBUG 1 — [cTaskExecutor-1] .a.r.l.a.MessagingMessageListenerAdapter : Listener method returned result [[{"fullName":"Darrell Castle","politicalParty":"Constitution Party","election":"2016 Presidential Election"},{"fullName":"Hillary Clinton","politicalParty":"Democratic Party","election":"2016 Presidential Election"},{"fullName":"Gary Johnson","politicalParty":"Libertarian Party","election":"2016 Presidential Election"},{"fullName":"Chris Keniston","politicalParty":"Veterans Party","election":"2016 Presidential Election"},{"fullName":"Jill Stein","politicalParty":"Green Party","election":"2016 Presidential Election"},{"fullName":"Donald Trump","politicalParty":"Republican Party","election":"2016 Presidential Election"}]] – generating response message for it
2017-05-03 21:10:33.096 DEBUG 1 — [cTaskExecutor-1] .a.r.l.a.MessagingMessageListenerAdapter : Publishing response to exchange = [], routingKey = [amq.rabbitmq.reply-to.g2dkAA9yYWJiaXRAcmFiYml0bXEAAAH3AAAAAAI=.GREaYm1ow+4nMWzSClXlfQ==]
# Returning to listening for messages
2017-05-03 21:10:33.123 DEBUG 1 — [cTaskExecutor-1] o.s.a.r.listener.BlockingQueueConsumer : Retrieving delivery for Consumer@662706a7: tags=[{amq.ctag-Anv7GXs7ZWVoznO64euyjQ=voter.rpc.requests}], channel=Cached Rabbit Channel: AMQChannel(amqp://guest@172.20.0.2:5672/,1), conn: Proxy@6f92666f Shared Rabbit Connection: SimpleConnection@6badffc2 [delegate=amqp://guest@172.20.0.2:5672/, localPort= 33932], acknowledgeMode=AUTO local queue size=0
2017-05-03 21:10:34.125 DEBUG 1 — [cTaskExecutor-1] o.s.a.r.listener.BlockingQueueConsumer : Retrieving delivery for Consumer@662706a7: tags=[{amq.ctag-Anv7GXs7ZWVoznO64euyjQ=voter.rpc.requests}], channel=Cached Rabbit Channel: AMQChannel(amqp://guest@172.20.0.2:5672/,1), conn: Proxy@6f92666f Shared Rabbit Connection: SimpleConnection@6badffc2 [delegate=amqp://guest@172.20.0.2:5672/, localPort= 33932], acknowledgeMode=AUTO local queue size=0
2017-05-03 21:10:35.126 DEBUG 1 — [cTaskExecutor-1] o.s.a.r.listener.BlockingQueueConsumer : Retrieving delivery for Consumer@662706a7: tags=[{amq.ctag-Anv7GXs7ZWVoznO64euyjQ=voter.rpc.requests}], channel=Cached Rabbit Channel: AMQChannel(amqp://guest@172.20.0.2:5672/,1), conn: Proxy@6f92666f Shared Rabbit Connection: SimpleConnection@6badffc2 [delegate=amqp://guest@172.20.0.2:5672/, localPort= 33932], acknowledgeMode=AUTO local queue size=0

Performance

What about the performance of Spring AMQP RPC IPC versus REST HTTP IPC? RabbitMQ has proven to be very performant, having been clocked at one million messages per second on GCE. I performed a series of fairly ‘unscientific’ performance tests, completing 250, 500, and then 1,000 requests. The tests were performed on a six-node Docker Swarm cluster with three instances of each service in a round-robin load-balanced configuration, and a single instance of RabbitMQ. The scripts to create the swarm cluster can be found in the Voter service GitHub project.

Based on consistent test results, the speed of the two methods was almost identical. Both methods performed at between 3.1 and 3.2 responses per second. For example, the Spring AMQP RPC IPC method successfully completed 1,000 requests in 5 minutes and 11 seconds, while the REST HTTP IPC method successfully completed 1,000 requests in 5 minutes and 18 seconds, 7 seconds slower than the RPC method.

RabbitMQ on Docker Swarm

There are many variables to consider that could dramatically impact IPC performance. For example, RabbitMQ was not clustered. Also, we did not use any type of caching, such as Varnish, Memcached, or Redis. Both of these could dramatically improve IPC performance.

There are also several notable differences between the two methods from a code perspective. The REST HTTP method relies on Spring Data Projection combined with a Spring Data MongoDB Repository to obtain the candidate list from MongoDB. Somewhat differently, the RPC method makes use of Spring Data MongoDB Aggregation to return a list of candidates. Therefore, the test results should be taken with a grain of salt.

Production Considerations

The post demonstrated a simple example of RPC communications between two services using Spring AMQP. In an actual Production environment, there are a few things that must be considered, as Pivotal points out:

  • How should either service react on startup if RabbitMQ is not available? What if RabbitMQ fails after the services have started?
  • How should the Voter server (the client) react if there are no Candidate service instances (the server) running?
  • Should the Voter service have a timeout for the RPC response to return? What should happen if the request times out?
  • If the Candidate service malfunctions and raises an exception, should it be forwarded to the Voter service?
  • How does the Voter service protect against invalid incoming messages (e.g., checking the bounds of the candidate list) before processing?
  • In all of the above scenarios, what, if any, response is returned to the API end user?

Conclusion

Although in this post we did not achieve asynchronous inter-process communications, we did achieve a higher level of service decoupling, using message-based RPC IPC. Adopting a message-based, loosely-coupled architecture, whether asynchronous or synchronous, wherever possible, will improve the overall functionality and deliverability of a microservices-based platform.

All opinions in this post are my own and not necessarily the views of my current employer or their clients.


Provision and Deploy a Consul Cluster on AWS, using Terraform, Docker, and Jenkins

Cover2

Introduction

Modern DevOps tools, such as HashiCorp’s Packer and Terraform, make it easier to provision and manage complex cloud architecture. Utilizing a CI/CD server, such as Jenkins, to securely automate the use of these DevOps tools, ensures quick and consistent results.

In a recent post, Distributed Service Configuration with Consul, Spring Cloud, and Docker, we built a Consul cluster using Docker swarm mode, to host distributed configurations for a Spring Boot application. The cluster was built locally with VirtualBox. This architecture is fine for development and testing, but not for use in Production.

In this post, we will deploy a highly available three-node Consul cluster to AWS. We will use Terraform to provision a set of EC2 instances and accompanying infrastructure. The instances will be built from a hybrid AMI containing the new Docker Community Edition (CE). In a recent post, Baking AWS AMI with new Docker CE Using Packer, we provisioned an Ubuntu AMI with Docker CE, using Packer. We will then deploy a Docker container, running a Consul server node, to each EC2 host.

All source code can be found on GitHub.

Jenkins

I have chosen Jenkins to automate all of the post’s build, provisioning, and deployment tasks. However, none of the code is written specifically for Jenkins; you may run all of it from the command line.

For this post, I have built four projects in Jenkins, as follows:

  1. Provision Docker CE AMI: Builds Ubuntu AMI with Docker CE, using Packer
  2. Provision Consul Infra AWS: Provisions Consul infrastructure on AWS, using Terraform
  3. Deploy Consul Cluster AWS: Deploys Consul to AWS, using Docker
  4. Destroy Consul Infra AWS: Destroys Consul infrastructure on AWS, using Terraform

Jenkins UI

We will primarily be using the ‘Provision Consul Infra AWS’, ‘Deploy Consul Cluster AWS’, and ‘Destroy Consul Infra AWS’ Jenkins projects in this post. The fourth Jenkins project, ‘Provision Docker CE AMI’, automates the steps found in the recent post, Baking AWS AMI with new Docker CE Using Packer, to build the AMI used to provision the EC2 instances in this post.

Consul AWS Diagram 2

Terraform

Using Terraform, we will provision EC2 instances in three different Availability Zones within the US East 1 (N. Virginia) Region. Using Terraform’s Amazon Web Services (AWS) provider, we will create the following AWS resources:

  • (1) Virtual Private Cloud (VPC)
  • (1) Internet Gateway
  • (1) Key Pair
  • (3) Elastic Cloud Compute (EC2) Instances
  • (2) Security Groups
  • (3) Subnets
  • (1) Route
  • (3) Route Tables
  • (3) Route Table Associations

The final AWS architecture should resemble the following:

Consul AWS Diagram

Production Ready AWS

Although we have provisioned a fairly complete VPC for this post, it is far from being ready for Production. I have created two security groups, limiting the ingress and egress to the cluster. However, to further productionize the environment would require additional security hardening. At a minimum, you should consider adding public/private subnets, NAT gateways, network access control list rules (network ACLs), and the use of HTTPS for secure communications.

In production, applications would communicate with Consul through local Consul clients. Consul clients would take part in the LAN gossip pool from different subnets, Availability Zones, Regions, or VPCs using VPC peering. Communications would be tightly controlled by IAM, VPC, subnet, IP address, and port.
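
For illustration only, a local Consul client agent on an application host might be started something like this; the join address is a placeholder private IP, and in a real environment the ports and ACLs would be locked down as described above:

docker run -d \
--net=host \
--name consul-client \
--env "CONSUL_BIND_INTERFACE=eth0" \
consul:latest \
consul agent -client=127.0.0.1 \
-retry-join="10.0.1.10" \
-data-dir="/consul/data"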

Also, you would not have direct access to the Consul UI through a publicly exposed IP or DNS address. Access to the UI would be removed altogether or locked down to specific IP addresses, and access restricted to secure communication channels.

Consul

We will achieve high availability (HA) by clustering three Consul server nodes across the three Elastic Cloud Compute (EC2) instances. In this minimally sized, three-node cluster of Consul servers, we are protected from the loss of one Consul server node, one EC2 instance, or one Availability Zone (AZ). The cluster will still maintain a quorum of two nodes. An additional level of HA that Consul supports, multiple datacenters (multiple AWS Regions), is not demonstrated in this post.

Docker

Having Docker CE already installed on each EC2 instance allows us to execute remote Docker commands over SSH from Jenkins. These commands will deploy and configure a Consul server node, within a Docker container, on each EC2 instance. The containers are built from HashiCorp’s latest Consul Docker image pulled from Docker Hub.

Getting Started

Preliminary Steps

If you have built infrastructure on AWS with Terraform, these steps should be familiar to you:

  1. First, you will need an AMI with Docker. I suggest reading Baking AWS AMI with new Docker CE Using Packer.
  2. You will need an AWS IAM User with the proper access to create the required infrastructure. For this post, I created a separate Jenkins IAM User with PowerUser level access.
  3. You will need to have an RSA public-private key pair, which can be used to SSH into the EC2 instances and install Consul.
  4. Ensure you have your AWS credentials set. I usually source mine from a .env file, as environment variables (see the sketch following this list). Jenkins can securely manage credentials, using secret text or files.
  5. Fork and/or clone the Consul cluster project from  GitHub.
  6. Change the aws_key_name and public_key_path variable values to your own RSA key, in the variables.tf file.
  7. Change the aws_amis_base variable values to your own AMI ID (see step 1).
  8. If you do not want to use the US East 1 Region and its AZs, modify the variables.tf, network.tf, and instances.tf files.
  9. Disable Terraform’s remote state or modify the resource to match your remote state configuration, in the main.tf file. I am using an Amazon S3 bucket to store my Terraform remote state.
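
A minimal sketch of the .env approach mentioned in step 4, using the standard variable names read by both the AWS CLI and Terraform's AWS provider (the values are placeholders):

# contents of .env (keep this file out of version control)
export AWS_ACCESS_KEY_ID=your_access_key_id
export AWS_SECRET_ACCESS_KEY=your_secret_access_key
export AWS_DEFAULT_REGION=us-east-1

Running source .env in your shell before invoking Terraform or the AWS CLI makes the credentials available as environment variables.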

Building an AMI with Docker

If you have not built an Amazon Machine Image (AMI) for use in this post already, you can do so using the scripts provided in the previous post’s GitHub repository. To automate the AMI build task, I built the ‘Provision Docker CE AMI’ Jenkins project. Identical to the other three Jenkins projects in this post, this project has three main tasks, which include: 1) SCM: clone the Packer AMI GitHub project, 2) Bindings: set up the AWS credentials, and 3) Build: run Packer.

The SCM and Bindings tasks are identical to the other projects (see below for details), except for the use of a different GitHub repository. The project’s Build step, which runs the packer_build_ami.sh script, looks as follows:

jenkins_13

The resulting AMI ID will need to be manually placed in Terraform’s variables.tf file before provisioning the AWS infrastructure with Terraform. The new AMI ID will be displayed in the Jenkins build output.

jenkins_14

Provisioning with Terraform

Based on the modifications you made in the Preliminary Steps, execute the terraform validate command to confirm your changes. Then, run the terraform plan command to review the plan. Assuming there were no errors, run the terraform apply command to provision the AWS infrastructure components.

In Jenkins, I have created the ‘Provision Consul Infra AWS’ project. This project has three tasks, which include: 1) SCM: clone the GitHub project, 2) Bindings: set up the AWS credentials, and 3) Build: run Terraform. Those tasks look as follows:

Jenkins_08.png

You will obviously need to use your modified GitHub project, incorporating the configuration changes detailed above, as the SCM source for Jenkins.

Jenkins Credentials

You will also need to configure your AWS credentials.

Jenkins_03.png

The provision_infra.sh script provisions the AWS infrastructure using Terraform. The script also updates Terraform’s remote state. Remember to update the remote state configuration in the script to match your personal settings.


cd tf_env_aws/
terraform remote config \
-backend=s3 \
-backend-config="bucket=your_bucket" \
-backend-config="key=terraform_consul.tfstate" \
-backend-config="region=your_region"
terraform plan
terraform apply

The Jenkins build output should look similar to the following:

jenkins_12.png

Although the build only takes about 90 seconds to complete, the EC2 instances could take a few extra minutes to complete their Status Checks and be completely ready. The final results in the AWS EC2 Management Console should look as follows:

EC2 Management Console

Note each EC2 instance is running in a different US East 1 Availability Zone.
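
Rather than polling the console, the AWS CLI can block until all three instances pass their status checks. A sketch, assuming the tf-instance-consul-server-* tag values used throughout this post:

instance_ids=$(aws ec2 describe-instances \
--filters Name='tag:Name,Values=tf-instance-consul-server-*' \
Name='instance-state-name,Values=running' \
--output text --query 'Reservations[*].Instances[*].InstanceId')

aws ec2 wait instance-status-ok --instance-ids ${instance_ids}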

Installing Consul

Once the AWS infrastructure is running and the EC2 instances have completed their Status Checks successfully, we are ready to deploy Consul. In Jenkins, I have created the ‘Deploy Consul Cluster AWS’ project. This project has three tasks, which include: 1) SCM: clone the GitHub project, 2) Bindings: set up the AWS credentials, and 3) Build: run an SSH remote Docker command on each EC2 instance to deploy Consul. The SCM and Bindings tasks are identical to the project above. The project’s Build step looks as follows:

Jenkins_04.png

First, the delete_containers.sh script deletes any previous instances of Consul containers. This is helpful if you need to re-deploy Consul. Next, the deploy_consul.sh script executes a series of SSH remote Docker commands to install and configure Consul on each EC2 instance.
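
For reference, a minimal sketch of the kind of cleanup pass delete_containers.sh performs might look like the following; the actual script is in the GitHub project, and the key path and tag values match the deployment commands below:

for i in 1 2 3; do
  ec2_public_ip=$(aws ec2 describe-instances \
    --filters Name="tag:Name,Values=tf-instance-consul-server-${i}" \
    --output text --query 'Reservations[*].Instances[*].PublicIpAddress')
  ssh -oStrictHostKeyChecking=no -i ~/.ssh/consul_aws_rsa \
    ubuntu@${ec2_public_ip} "docker rm -f consul-server-${i} || true"
done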


# Advertised Consul IP
export ec2_server1_private_ip=$(aws ec2 describe-instances \
--filters Name='tag:Name,Values=tf-instance-consul-server-1' \
--output text --query 'Reservations[*].Instances[*].PrivateIpAddress')
echo "consul-server-1 private ip: ${ec2_server1_private_ip}"

consul_01.sh


# Deploy Consul Server 1
ec2_public_ip=$(aws ec2 describe-instances \
--filters Name='tag:Name,Values=tf-instance-consul-server-1' \
--output text --query 'Reservations[*].Instances[*].PublicIpAddress')
consul_server="consul-server-1"
ssh -oStrictHostKeyChecking=no -T \
-i ~/.ssh/consul_aws_rsa \
ubuntu@${ec2_public_ip} << EOSSH
docker run -d \
--net=host \
--hostname ${consul_server} \
--name ${consul_server} \
--env "SERVICE_IGNORE=true" \
--env "CONSUL_CLIENT_INTERFACE=eth0" \
--env "CONSUL_BIND_INTERFACE=eth0" \
--volume /home/ubuntu/consul/data:/consul/data \
--publish 8500:8500 \
consul:latest \
consul agent -server -ui -client=0.0.0.0 \
-bootstrap-expect=3 \
-advertise='{{ GetInterfaceIP "eth0" }}' \
-data-dir="/consul/data"
sleep 5
docker logs consul-server-1
docker exec -i consul-server-1 consul members
EOSSH

consul_02.sh


# Deploy Consul Server 2
ec2_public_ip=$(aws ec2 describe-instances \
--filters Name='tag:Name,Values=tf-instance-consul-server-2' \
--output text --query 'Reservations[*].Instances[*].PublicIpAddress')
consul_server="consul-server-2"
ssh -oStrictHostKeyChecking=no -T \
-i ~/.ssh/consul_aws_rsa \
ubuntu@${ec2_public_ip} << EOSSH
docker run -d \
--net=host \
--hostname ${consul_server} \
--name ${consul_server} \
--env "SERVICE_IGNORE=true" \
--env "CONSUL_CLIENT_INTERFACE=eth0" \
--env "CONSUL_BIND_INTERFACE=eth0" \
--volume /home/ubuntu/consul/data:/consul/data \
--publish 8500:8500 \
consul:latest \
consul agent -server -ui -client=0.0.0.0 \
-advertise='{{ GetInterfaceIP "eth0" }}' \
-retry-join="${ec2_server1_private_ip}" \
-data-dir="/consul/data"
sleep 5
docker logs consul-server-2
docker exec -i consul-server-2 consul members
EOSSH

consul_03.sh


# Deploy Consul Server 3
ec2_public_ip=$(aws ec2 describe-instances \
--filters Name='tag:Name,Values=tf-instance-consul-server-3' \
--output text --query 'Reservations[*].Instances[*].PublicIpAddress')
consul_server="consul-server-3"
ssh -oStrictHostKeyChecking=no -T \
-i ~/.ssh/consul_aws_rsa \
ubuntu@${ec2_public_ip} << EOSSH
docker run -d \
--net=host \
--hostname ${consul_server} \
--name ${consul_server} \
--env "SERVICE_IGNORE=true" \
--env "CONSUL_CLIENT_INTERFACE=eth0" \
--env "CONSUL_BIND_INTERFACE=eth0" \
--volume /home/ubuntu/consul/data:/consul/data \
--publish 8500:8500 \
consul:latest \
consul agent -server -ui -client=0.0.0.0 \
-advertise='{{ GetInterfaceIP "eth0" }}' \
-retry-join="${ec2_server1_private_ip}" \
-data-dir="/consul/data"
sleep 5
docker logs consul-server-3
docker exec -i consul-server-3 consul members
EOSSH

consul_04.sh


# Output Consul Web UI URL
ec2_public_ip=$(aws ec2 describe-instances \
--filters Name='tag:Name,Values=tf-instance-consul-server-1' \
--output text --query 'Reservations[*].Instances[*].PublicIpAddress')
echo " "
echo "*** Consul UI: http://${ec2_public_ip}:8500/ui/ ***"

consul_05.sh

The entire Jenkins build process only takes about 30 seconds. Afterward, the output from a successful Jenkins build should show that all three Consul server instances are running, have formed a quorum, and have elected a Leader.

Jenkins_05.png

Persisting State

The Consul Docker image exposes VOLUME /consul/data, which is the path where Consul will place its persisted state. Using Terraform’s remote-exec provisioner, we create a consul directory on each EC2 instance under /home/ubuntu. The docker run command bind-mounts the container’s /consul/data path to the EC2 host’s /home/ubuntu/consul/data directory.

According to Consul, the Consul server container instance will ‘store the client information plus snapshots and data related to the consensus algorithm and other state, like Consul’s key/value store and catalog’ in the /consul/data directory. That container directory is now bind-mounted to the EC2 host, as demonstrated below.

jenkins_15
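
The same thing can be confirmed from the command line. A quick sketch, reusing the ec2_public_ip lookup for consul-server-1 from the deployment commands above:

ssh -oStrictHostKeyChecking=no -i ~/.ssh/consul_aws_rsa \
ubuntu@${ec2_public_ip} 'ls -alh /home/ubuntu/consul/data/'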

Accessing Consul

Following a successful deployment, you should be able to use the public URL, displayed in the build output of the ‘Deploy Consul Cluster AWS’ project, to access the Consul UI. Clicking on the Nodes tab in the UI, you should see all three Consul server instances, one per EC2 instance, running and healthy.

Consul UI
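
The same node information is available from Consul’s HTTP API, which is handy for scripted smoke tests. A sketch, using the same public IP printed in the build output; this only works here because the demo exposes port 8500 publicly:

curl -s http://${ec2_public_ip}:8500/v1/status/leader
curl -s http://${ec2_public_ip}:8500/v1/status/peers
curl -s http://${ec2_public_ip}:8500/v1/catalog/nodes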

Destroying Infrastructure

When you are finished with the post, you may want to remove the running infrastructure, so you don’t continue to get billed by Amazon. The ‘Destroy Consul Infra AWS’ project destroys all the AWS infrastructure, provisioned as part of this post, in about 60 seconds. The project’s SCM and Bindings tasks are identical to both previous projects. The Build step calls the destroy_infra.sh script, which is included in the GitHub project. The script executes the terraform destroy -force command. It will delete all running infrastructure components associated with the post and update Terraform’s remote state.
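
A minimal local equivalent of that Build step, assuming the same remote state configuration used by the provision_infra.sh script above, is simply:

cd tf_env_aws/
terraform remote config \
-backend=s3 \
-backend-config="bucket=your_bucket" \
-backend-config="key=terraform_consul.tfstate" \
-backend-config="region=your_region"
terraform destroy -force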

Jenkins_09

Conclusion

This post has demonstrated how modern DevOps tooling, such as HashiCorp’s Packer and Terraform, makes it easy to build, provision, and manage complex cloud architecture. Using a CI/CD server, such as Jenkins, to securely automate the use of these tools ensures quick and consistent results.

All opinions in this post are my own and not necessarily the views of my current employer or their clients.


Baking AWS AMI with new Docker CE Using Packer

AWS for Docker

Introduction

On March 2 (less than a week ago as of this post), Docker announced the release of Docker Enterprise Edition (EE), a new version of the Docker platform optimized for business-critical deployments. As part of the release, Docker also renamed the free Docker products to Docker Community Edition (CE). Docker is adopting a new time-based versioning scheme for both EE and CE. The initial release of Docker CE and EE, the 17.03 release, is the first to use the new scheme.

Along with the release, Docker delivered excellent documentation on installing, configuring, and troubleshooting the new Docker EE and CE. In this post, I will demonstrate how to partially bake an existing Amazon Machine Image (Amazon AMI) with the new Docker CE, preparing it as a base for the creation of Amazon Elastic Compute Cloud (Amazon EC2) compute instances.

Adding Docker and similar tooling to an AMI is referred to as partially baking the AMI; the result is often called a hybrid AMI. According to AWS, ‘hybrid AMIs provide a subset of the software needed to produce a fully functional instance, falling in between the fully baked and JeOS (just enough operating system) options on the AMI design spectrum.’

Installing Docker CE on an AWS AMI should not be confused with Docker’s recently announced Docker Community Edition (CE) for AWS. Docker for AWS offers multiple CloudFormation templates for Docker EE and CE. According to Docker, Docker for AWS ‘provides a Docker-native solution that avoids operational complexity and adding unneeded additional APIs to the Docker stack.’

Base AMI

Docker provides detailed directions for installing Docker CE and EE onto several major Linux distributions. For this post, we will choose a widely used Linux distro, Ubuntu. According to Docker, currently Docker CE and EE can be installed on three popular Ubuntu releases:

  • Yakkety 16.10
  • Xenial 16.04 (LTS)
  • Trusty 14.04 (LTS)

To provision a small EC2 instance in Amazon’s US East (N. Virginia) Region, I will choose Ubuntu 16.04.2 LTS (Xenial Xerus). According to Canonical’s Amazon EC2 AMI Locator website, a Xenial 16.04 LTS AMI, ami-09b3691f, is available in US East 1 for use with the t2.micro EC2 instance type.
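
If you prefer not to copy the ID from the website, the AWS CLI can look up Canonical’s most recent Xenial AMI directly; 099720109477 is Canonical’s AWS account ID, and the ID returned today will be newer than the one used in this post:

aws ec2 describe-images \
--owners 099720109477 \
--filters 'Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-*' \
--query 'sort_by(Images, &CreationDate)[-1].ImageId' \
--output text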

Packer

HashiCorp Packer will be used to partially bake the base Ubuntu Xenial 16.04 AMI with Docker CE 17.03. HashiCorp describes Packer as ‘a tool for creating machine and container images for multiple platforms from a single source configuration.’ The JSON-format Packer file is as follows:

{
  "variables": {
    "aws_access_key": "{{env `AWS_ACCESS_KEY_ID`}}",
    "aws_secret_key": "{{env `AWS_SECRET_ACCESS_KEY`}}",
    "us_east_1_ami": "ami-09b3691f",
    "name": "aws-docker-ce-base",
    "us_east_1_name": "ubuntu-xenial-docker-ce-base",
    "ssh_username": "ubuntu"
  },
  "builders": [
    {
      "name": "{{user `us_east_1_name`}}",
      "type": "amazon-ebs",
      "access_key": "{{user `aws_access_key`}}",
      "secret_key": "{{user `aws_secret_key`}}",
      "region": "us-east-1",
      "vpc_id": "",
      "subnet_id": "",
      "source_ami": "{{user `us_east_1_ami`}}",
      "instance_type": "t2.micro",
      "ssh_username": "{{user `ssh_username`}}",
      "ssh_timeout": "10m",
      "ami_name": "{{user `us_east_1_name`}} {{timestamp}}",
      "ami_description": "{{user `us_east_1_name`}} AMI",
      "run_tags": {
        "ami-create": "{{user `us_east_1_name`}}"
      },
      "tags": {
        "ami": "{{user `us_east_1_name`}}"
      },
      "ssh_private_ip": false,
      "associate_public_ip_address": true
    }
  ],
  "provisioners": [
    {
      "type": "file",
      "source": "bootstrap_docker_ce.sh",
      "destination": "/tmp/bootstrap_docker_ce.sh"
    },
    {
      "type": "file",
      "source": "cleanup.sh",
      "destination": "/tmp/cleanup.sh"
    },
    {
      "type": "shell",
      "execute_command": "echo 'packer' | sudo -S sh -c '{{ .Vars }} {{ .Path }}'",
      "inline": [
        "whoami",
        "cd /tmp",
        "chmod +x bootstrap_docker_ce.sh",
        "chmod +x cleanup.sh",
        "ls -alh /tmp",
        "./bootstrap_docker_ce.sh",
        "sleep 10",
        "./cleanup.sh"
      ]
    }
  ]
}

The Packer file uses Packer’s amazon-ebs builder type. This builder is used to create Amazon AMIs backed by Amazon Elastic Block Store (EBS) volumes, for use in EC2.

Bootstrap Script

To install Docker CE on the AMI, the Packer file executes a bootstrap shell script. The bootstrap script and subsequent cleanup script are uploaded with Packer’s file provisioner and executed using Packer’s remote shell provisioner. The bootstrap script is as follows:

#!/bin/sh

# remove any older Docker packages
sudo apt-get remove docker docker-engine

# prerequisites for using Docker's apt repository over HTTPS
sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common

# add Docker's official GPG key and verify the fingerprint
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88

# add the stable Docker CE repository for this Ubuntu release
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
sudo apt-get update
sudo apt-get -y upgrade
sudo apt-get install -y docker-ce

# allow the ubuntu user to run docker without sudo
sudo groupadd docker
sudo usermod -aG docker ubuntu

# start Docker on boot
sudo systemctl enable docker

This script closely follows the directions provided by Docker for installing Docker CE on Ubuntu. After removing any previous copies of Docker, the script installs Docker CE. To ensure sudo is not required to execute Docker commands on any EC2 instance provisioned from the resulting AMI, the script adds the ubuntu user to the docker group.

The bootstrap script also uses systemd to start the Docker daemon. Starting with Ubuntu 15.04, Systemd System and Service Manager is used by default instead of the previous init system, Upstart. Systemd ensures Docker will start on boot.
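
Two quick checks, run on any instance launched from the resulting AMI, confirm both behaviors; these are standard commands, not part of the post’s scripts. Note that docker group membership only takes effect on a fresh login session:

groups ubuntu                 # should include 'docker'
systemctl is-enabled docker   # should print 'enabled'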

Cleaning Up

It is good practice to clean up your activities after baking an AMI. I have included a basic cleanup script, which is as follows:

#!/bin/sh

set -e

echo 'Cleaning up after bootstrapping...'
sudo apt-get -y autoremove
sudo apt-get -y clean
sudo rm -rf /tmp/*
cat /dev/null > ~/.bash_history
history -c
exit

Partially Baking

Before running Packer to build the Docker CE AMI, I set both my AWS access key and AWS secret access key. The Packer file expects the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
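
For reference, exporting placeholder credentials and validating the template beforehand looks like this; the keys shown are AWS’s documentation examples, not real values, and packer validate is an inexpensive pre-check:

export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

packer validate docker_ami.json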

Running the packer build docker_ami.json command builds the AMI. The abridged output should look similar to the following:

$ packer build docker_ami.json
ubuntu-xenial-docker-ce-base output will be in this color.

==> ubuntu-xenial-docker-ce-base: Prevalidating AMI Name...
    ubuntu-xenial-docker-ce-base: Found Image ID: ami-09b3691f
==> ubuntu-xenial-docker-ce-base: Creating temporary keypair: packer_58bc7a49-9e66-7f76-ce8e-391a67d94987
==> ubuntu-xenial-docker-ce-base: Creating temporary security group for this instance...
==> ubuntu-xenial-docker-ce-base: Authorizing access to port 22 the temporary security group...
==> ubuntu-xenial-docker-ce-base: Launching a source AWS instance...
    ubuntu-xenial-docker-ce-base: Instance ID: i-0ca883ecba0c28baf
==> ubuntu-xenial-docker-ce-base: Waiting for instance (i-0ca883ecba0c28baf) to become ready...
==> ubuntu-xenial-docker-ce-base: Adding tags to source instance
==> ubuntu-xenial-docker-ce-base: Waiting for SSH to become available...
==> ubuntu-xenial-docker-ce-base: Connected to SSH!
==> ubuntu-xenial-docker-ce-base: Uploading bootstrap_docker_ce.sh => /tmp/bootstrap_docker_ce.sh
==> ubuntu-xenial-docker-ce-base: Uploading cleanup.sh => /tmp/cleanup.sh
==> ubuntu-xenial-docker-ce-base: Provisioning with shell script: /var/folders/kf/637b0qns7xb0wh9p8c4q0r_40000gn/T/packer-shell189662158
    ...
    ubuntu-xenial-docker-ce-base: Reading package lists...
    ubuntu-xenial-docker-ce-base: Building dependency tree...
    ubuntu-xenial-docker-ce-base: Reading state information...
    ubuntu-xenial-docker-ce-base: E: Unable to locate package docker-engine
    ubuntu-xenial-docker-ce-base: Reading package lists...
    ubuntu-xenial-docker-ce-base: Building dependency tree...
    ubuntu-xenial-docker-ce-base: Reading state information...
    ubuntu-xenial-docker-ce-base: ca-certificates is already the newest version (20160104ubuntu1).
    ubuntu-xenial-docker-ce-base: apt-transport-https is already the newest version (1.2.19).
    ubuntu-xenial-docker-ce-base: curl is already the newest version (7.47.0-1ubuntu2.2).
    ubuntu-xenial-docker-ce-base: software-properties-common is already the newest version (0.96.20.5).
    ubuntu-xenial-docker-ce-base: 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
    ubuntu-xenial-docker-ce-base: OK
    ubuntu-xenial-docker-ce-base: pub   4096R/0EBFCD88 2017-02-22
    ubuntu-xenial-docker-ce-base: Key fingerprint = 9DC8 5822 9FC7 DD38 854A  E2D8 8D81 803C 0EBF CD88
    ubuntu-xenial-docker-ce-base: uid                  Docker Release (CE deb) 
    ubuntu-xenial-docker-ce-base: sub   4096R/F273FCD8 2017-02-22
    ubuntu-xenial-docker-ce-base:
    ubuntu-xenial-docker-ce-base: Hit:1 http://us-east-1.ec2.archive.ubuntu.com/ubuntu xenial InRelease
    ubuntu-xenial-docker-ce-base: Get:2 http://us-east-1.ec2.archive.ubuntu.com/ubuntu xenial-updates InRelease [102 kB]
    ...
    ubuntu-xenial-docker-ce-base: Get:27 http://security.ubuntu.com/ubuntu xenial-security/universe amd64 Packages [89.5 kB]
    ubuntu-xenial-docker-ce-base: Fetched 10.6 MB in 2s (4,065 kB/s)
    ubuntu-xenial-docker-ce-base: Reading package lists...
    ubuntu-xenial-docker-ce-base: Reading package lists...
    ubuntu-xenial-docker-ce-base: Building dependency tree...
    ubuntu-xenial-docker-ce-base: Reading state information...
    ubuntu-xenial-docker-ce-base: Calculating upgrade...
    ubuntu-xenial-docker-ce-base: 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
    ubuntu-xenial-docker-ce-base: Reading package lists...
    ubuntu-xenial-docker-ce-base: Building dependency tree...
    ubuntu-xenial-docker-ce-base: Reading state information...
    ubuntu-xenial-docker-ce-base: The following additional packages will be installed:
    ubuntu-xenial-docker-ce-base: aufs-tools cgroupfs-mount libltdl7
    ubuntu-xenial-docker-ce-base: Suggested packages:
    ubuntu-xenial-docker-ce-base: mountall
    ubuntu-xenial-docker-ce-base: The following NEW packages will be installed:
    ubuntu-xenial-docker-ce-base: aufs-tools cgroupfs-mount docker-ce libltdl7
    ubuntu-xenial-docker-ce-base: 0 upgraded, 4 newly installed, 0 to remove and 0 not upgraded.
    ubuntu-xenial-docker-ce-base: Need to get 19.4 MB of archives.
    ubuntu-xenial-docker-ce-base: After this operation, 89.4 MB of additional disk space will be used.
    ubuntu-xenial-docker-ce-base: Get:1 http://us-east-1.ec2.archive.ubuntu.com/ubuntu xenial/universe amd64 aufs-tools amd64 1:3.2+20130722-1.1ubuntu1 [92.9 kB]
    ...
    ubuntu-xenial-docker-ce-base: Get:4 https://download.docker.com/linux/ubuntu xenial/stable amd64 docker-ce amd64 17.03.0~ce-0~ubuntu-xenial [19.3 MB]
    ubuntu-xenial-docker-ce-base: debconf: unable to initialize frontend: Dialog
    ubuntu-xenial-docker-ce-base: debconf: (Dialog frontend will not work on a dumb terminal, an emacs shell buffer, or without a controlling terminal.)
    ubuntu-xenial-docker-ce-base: debconf: falling back to frontend: Readline
    ubuntu-xenial-docker-ce-base: debconf: unable to initialize frontend: Readline
    ubuntu-xenial-docker-ce-base: debconf: (This frontend requires a controlling tty.)
    ubuntu-xenial-docker-ce-base: debconf: falling back to frontend: Teletype
    ubuntu-xenial-docker-ce-base: dpkg-preconfigure: unable to re-open stdin:
    ubuntu-xenial-docker-ce-base: Fetched 19.4 MB in 1s (17.8 MB/s)
    ubuntu-xenial-docker-ce-base: Selecting previously unselected package aufs-tools.
    ubuntu-xenial-docker-ce-base: (Reading database ... 53844 files and directories currently installed.)
    ubuntu-xenial-docker-ce-base: Preparing to unpack .../aufs-tools_1%3a3.2+20130722-1.1ubuntu1_amd64.deb ...
    ubuntu-xenial-docker-ce-base: Unpacking aufs-tools (1:3.2+20130722-1.1ubuntu1) ...
    ...
    ubuntu-xenial-docker-ce-base: Setting up docker-ce (17.03.0~ce-0~ubuntu-xenial) ...
    ubuntu-xenial-docker-ce-base: Processing triggers for libc-bin (2.23-0ubuntu5) ...
    ubuntu-xenial-docker-ce-base: Processing triggers for systemd (229-4ubuntu16) ...
    ubuntu-xenial-docker-ce-base: Processing triggers for ureadahead (0.100.0-19) ...
    ubuntu-xenial-docker-ce-base: groupadd: group 'docker' already exists
    ubuntu-xenial-docker-ce-base: Synchronizing state of docker.service with SysV init with /lib/systemd/systemd-sysv-install...
    ubuntu-xenial-docker-ce-base: Executing /lib/systemd/systemd-sysv-install enable docker
    ubuntu-xenial-docker-ce-base: Cleanup...
    ubuntu-xenial-docker-ce-base: Reading package lists...
    ubuntu-xenial-docker-ce-base: Building dependency tree...
    ubuntu-xenial-docker-ce-base: Reading state information...
    ubuntu-xenial-docker-ce-base: 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
==> ubuntu-xenial-docker-ce-base: Stopping the source instance...
==> ubuntu-xenial-docker-ce-base: Waiting for the instance to stop...
==> ubuntu-xenial-docker-ce-base: Creating the AMI: ubuntu-xenial-docker-ce-base 1288227081
    ubuntu-xenial-docker-ce-base: AMI: ami-e9ca6eff
==> ubuntu-xenial-docker-ce-base: Waiting for AMI to become ready...
==> ubuntu-xenial-docker-ce-base: Modifying attributes on AMI (ami-e9ca6eff)...
    ubuntu-xenial-docker-ce-base: Modifying: description
==> ubuntu-xenial-docker-ce-base: Modifying attributes on snapshot (snap-058a26c0250ee3217)...
==> ubuntu-xenial-docker-ce-base: Adding tags to AMI (ami-e9ca6eff)...
==> ubuntu-xenial-docker-ce-base: Tagging snapshot: snap-043a16c0154ee3217
==> ubuntu-xenial-docker-ce-base: Creating AMI tags
==> ubuntu-xenial-docker-ce-base: Creating snapshot tags
==> ubuntu-xenial-docker-ce-base: Terminating the source AWS instance...
==> ubuntu-xenial-docker-ce-base: Cleaning up any extra volumes...
==> ubuntu-xenial-docker-ce-base: No volumes to clean up, skipping
==> ubuntu-xenial-docker-ce-base: Deleting temporary security group...
==> ubuntu-xenial-docker-ce-base: Deleting temporary keypair...
Build 'ubuntu-xenial-docker-ce-base' finished.

==> Builds finished. The artifacts of successful builds are:
--> ubuntu-xenial-docker-ce-base: AMIs were created:

us-east-1: ami-e9ca6eff

Results

The result is an Ubuntu 16.04 AMI in US East 1 with Docker CE 17.03 installed. To confirm the new AMI is now available, I will use the AWS CLI to examine the resulting AMI:

aws ec2 describe-images \
  --filters Name=tag-key,Values=ami Name=tag-value,Values=ubuntu-xenial-docker-ce-base

Resulting output:

{
    "Images": [
        {
            "VirtualizationType": "hvm",
            "Name": "ubuntu-xenial-docker-ce-base 1488747081",
            "Tags": [
                {
                    "Value": "ubuntu-xenial-docker-ce-base",
                    "Key": "ami"
                }
            ],
            "Hypervisor": "xen",
            "SriovNetSupport": "simple",
            "ImageId": "ami-e9ca6eff",
            "State": "available",
            "BlockDeviceMappings": [
                {
                    "DeviceName": "/dev/sda1",
                    "Ebs": {
                        "DeleteOnTermination": true,
                        "SnapshotId": "snap-048a16c0250ee3227",
                        "VolumeSize": 8,
                        "VolumeType": "gp2",
                        "Encrypted": false
                    }
                },
                {
                    "DeviceName": "/dev/sdb",
                    "VirtualName": "ephemeral0"
                },
                {
                    "DeviceName": "/dev/sdc",
                    "VirtualName": "ephemeral1"
                }
            ],
            "Architecture": "x86_64",
            "ImageLocation": "931066906971/ubuntu-xenial-docker-ce-base 1488747081",
            "RootDeviceType": "ebs",
            "OwnerId": "931066906971",
            "RootDeviceName": "/dev/sda1",
            "CreationDate": "2017-03-05T20:53:41.000Z",
            "Public": false,
            "ImageType": "machine",
            "Description": "ubuntu-xenial-docker-ce-base AMI"
        }
    ]
}

Finally, here is the new AMI as seen in the AWS EC2 Management Console:

EC2 Management Console - AMI

Terraform

To confirm Docker CE is installed and running, I can provision a new EC2 instance, using HashiCorp Terraform. This post is too short to detail all the Terraform code required to stand up a complete environment; I’ve included the complete code in the GitHub repo for this post. Note, the Terraform code is intended only for testing. It does not configure proper security, such as locked-down security groups, public/private subnets, or a NAT server.

Below is a greatly abridged version of the Terraform code I used to provision a new EC2 instance, using Terraform’s aws_instance resource. The resulting EC2 instance should have Docker CE available.

# test-docker-ce instance
resource "aws_instance" "test-docker-ce" {
  connection {
    user        = "ubuntu"
    private_key = "${file("~/.ssh/test-docker-ce")}"
    timeout     = "${var.connection_timeout}"
  }

  ami               = "ami-e9ca6eff"
  instance_type     = "t2.nano"
  availability_zone = "us-east-1a"
  count             = "1"

  key_name               = "${aws_key_pair.auth.id}"
  vpc_security_group_ids = ["${aws_security_group.test-docker-ce.id}"]
  subnet_id              = "${aws_subnet.test-docker-ce.id}"

  tags {
    Owner       = "Gary A. Stafford"
    Terraform   = true
    Environment = "test-docker-ce"
    Name        = "tf-instance-test-docker-ce"
  }
}

By using the AWS CLI, once again, we can confirm the new EC2 instance was built using the correct AMI:

aws ec2 describe-instances \
  --filters Name='tag:Name,Values=tf-instance-test-docker-ce' \
  --output text --query 'Reservations[*].Instances[*].ImageId'

Resulting output looks good:

ami-e9ca6eff

Finally, here is the new EC2 as seen in the AWS EC2 Management Console:

EC2 Management Console - EC2

SSHing into the new EC2 instance, I should observe that the operating system is Ubuntu 16.04.2 LTS and that Docker version 17.03.0-ce is installed and running:

Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-64-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  Get cloud support with Ubuntu Advantage Cloud Guest:
    http://www.ubuntu.com/business/services/cloud

0 packages can be updated.
0 updates are security updates.

Last login: Sun Mar  5 22:06:01 2017 from 

ubuntu@ip-:~$ docker --version
Docker version 17.03.0-ce, build 3a232c8
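
For a slightly stronger check than the version string, running a throwaway container without sudo exercises both the Docker daemon and the ubuntu user’s docker group membership; the image pull requires outbound internet access from the instance:

systemctl is-active docker
docker run --rm hello-world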

Conclusion

Docker EE and CE represent a significant step forward in expanding Docker’s enterprise-grade toolkit. Replacing or installing Docker EE or CE on your AWS AMIs is easy, using Docker’s guide along with HashiCorp Packer.

All source code for this post can be found on GitHub.

All opinions in this post are my own and not necessarily the views of my current employer or their clients.
