
Managing Applications Across Multiple Kubernetes Environments with Istio: Part 2

In this two-part post, we are exploring the creation of a GKE cluster, replete with the latest version of Istio, often referred to as IoK (Istio on Kubernetes). We will then deploy, perform integration testing, and promote an application across multiple environments within the cluster.

Part Two

In part one of this post, we created a Kubernetes cluster on the Google Cloud Platform, installed Istio, provisioned a PostgreSQL database, and configured DNS for routing. Under the assumption that v1 of the election microservice had already been released to Production, we deployed v1 to each of the three namespaces.

In part two of this post, we will learn how to utilize the advanced API testing capabilities of Postman and Newman to ensure v2 is ready for UAT and release to Production. We will deploy and perform integration testing of a new v2 of the election microservice, locally on Kubernetes Minikube. Once confident v2 is functioning as intended, we will promote and test v2 across the dev, test, and uat namespaces.

Testing Locally with Minikube

Deploying to GKE, no matter how automated, takes time and resources, whether those resources are team members or just compute and system resources. Before deploying v2 of the election service to the non-prod GKE cluster, we should ensure that it has been thoroughly tested locally. Local testing should include the following criteria:

  1. Source code builds successfully;
  2. All unit-tests pass;
  3. A new Docker Image can be created from the build artifact;
  4. The Service can be deployed to Kubernetes (Minikube);
  5. The deployed instance can connect to the database and execute the Liquibase change sets;
  6. The deployed instance passes a minimal set of integration tests;

Minikube gives us the ability to quickly iterate and test an application, as well as the Kubernetes and Istio resources required for its operation, before promoting to GKE. These resources include the Kubernetes Namespace, Secret, Deployment, Service, Route Rule, and Istio Ingress. Since Minikube is just that, a miniature version of our GKE cluster, we should be able to have a nearly one-to-one parity between the Kubernetes resources we apply locally and those applied to GKE. This post assumes you have the latest version of Minikube installed, and are familiar with its operation.

This project includes a minikube sub-directory, containing all the Kubernetes resource files and scripts necessary to recreate the Minikube deployment example shown in this post. The three included scripts are designed to be easily adapted to a CI/CD DevOps workflow. You may need to modify the scripts to match your environment’s configuration. Note this Minikube-deployed version of the election service relies on the external Amazon RDS database instance.

Local Database Version

To eliminate AWS costs, I have included a second version of the minikube Kubernetes resource files, minikube_db_local. This version deploys an ephemeral PostgreSQL database instance to Minikube, as opposed to relying on the external Amazon RDS instance. Be aware, the database does not have persistent storage or an Istio sidecar proxy. The alternate version is not covered in this post.

istio_100.png

Minikube Cluster

If you do not have a running Minikube cluster, create one with the minikube start command.

istio_081

Minikube allows you to use normal kubectl CLI commands to interact with the Minikube cluster. Using the kubectl get nodes command, we should see a single Minikube node running the latest Kubernetes v1.10.0.
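
If you need to create the cluster first, a minimal sketch of the commands might look like the following; the resource sizes are only suggestions and should be adjusted to your local machine.

# Start a local Minikube cluster (sizes are illustrative)
minikube start --cpus 2 --memory 4096

# Confirm the single Minikube node is Ready and running Kubernetes v1.10.0
kubectl get nodes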

istio_082

Istio on Minikube

Next, install Istio following Istio’s online installation instructions. A basic Istio installation on Minikube, without the additional add-ons, should only require a single Istio install script.
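
As a rough sketch, assuming you have downloaded and extracted the Istio 0.7.1 release locally, the installation and verification steps might resemble the following.

cd istio-0.7.1

# Install the core Istio components without the optional add-ons
kubectl apply -f install/kubernetes/istio.yaml

# Confirm the Istio components are running in the istio-system namespace
kubectl get pods -n istio-system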

istio_083

If successful, you should observe a new istio-system namespace, containing the four main Istio components: istio-ca, istio-ingress, istio-mixer, and istio-pilot.

istio_084

Deploy v2 to Minikube

Next, create a Minikube Development environment, consisting of a dev Namespace, Istio Ingress, and Secret, using the part1-create-environment.sh script. Then, deploy v2 of the election service to the dev Namespace, along with an associated Route Rule, using the part2-deploy-v2.sh script. One v2 instance should be sufficient to satisfy the testing requirements.

istio_085

Access to v2 of the election service on Minikube is a bit different than with GKE. When routing external HTTP requests, there is no load balancer, no external public IP address, and no public DNS or subdomains. To access the single instance of v2 running on Minikube, we use the local IP address of the Minikube cluster, obtained with the minikube ip command. The access port required is the Node Port (nodePort) of the istio-ingress Service. The command is shown below (gist) and included in the part3-smoke-test.sh script.
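
A minimal sketch of those commands, assuming the istio-ingress Service exposes the HTTP port first in its port list, might look like the following.

# Local IP address of the Minikube cluster
MINIKUBE_IP=$(minikube ip)

# Node Port of the istio-ingress Service in the istio-system namespace
INGRESS_PORT=$(kubectl get svc istio-ingress -n istio-system \
  -o jsonpath='{.spec.ports[0].nodePort}')

# Smoke test: confirm v2 is running and addressable
curl -s "http://${MINIKUBE_IP}:${INGRESS_PORT}/v2/actuator/health"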

The second part of our HTTP request routing is the same as with GKE, relying on an Istio Route Rule. The /v2/ sub-collection resource in the HTTP request URL is rewritten and routed to the v2 election Pod by the Route Rule. To confirm v2 of the election service is running and addressable, curl the /v2/actuator/health endpoint. Spring Actuator’s /health endpoint is frequently used at the end of a CI/CD server’s deployment pipeline to confirm success. The Spring Boot application can take a few minutes to fully start up and become responsive to requests, depending on the speed of your local machine.

istio_093.png

Using the Kubernetes Dashboard, we should see our deployment of the single election service Pod is running successfully in Minikube’s dev namespace.

istio_087

Once deployed, we run a battery of integration tests to confirm that the new v2 functionality is working as intended before deploying to GKE. In the next section of this post, we will explore the process of creating and managing Postman Collections and Postman Environments, and how to automate those Collections of tests with Newman and Jenkins.

istio_088

Integration Testing

The typical reason an application is deployed to lower environments, prior to Production, is to perform application testing. Although definitions vary across organizations, testing commonly includes some or all of the following types: Integration Testing, Functional Testing, System Testing, Stress or Load Testing, Performance Testing, Security Testing, Usability Testing, Acceptance Testing, Regression Testing, Alpha and Beta Testing, and End-to-End Testing. Test teams may also refer to other testing forms, such as Whitebox (Glassbox), Blackbox Testing, Smoke, Validation, or Sanity Testing, and Happy Path Testing.

The site softwaretestinghelp.com defines integration testing as ‘testing of all integrated modules to verify the combined functionality after integration. Modules are typically code modules, individual applications, client and server applications on a network, etc. This type of testing is especially relevant to client/server and distributed systems.’

In this post, we are concerned that our integrated modules are functioning cohesively, primarily the election service, Amazon RDS database, DNS, Istio Ingress, Route Rules, and the Istio sidecar proxy. Unlike Unit Testing and Static Code Analysis (SCA), which are done pre-deployment, integration testing requires an application to be deployed and running in an environment.

Postman

I have chosen Postman, along with Newman, to execute a Collection of integration tests before promoting to the next environment. The integration tests confirm the deployed application’s name and version. The integration tests then perform a series of HTTP GET, POST, PUT, PATCH, and DELETE actions against the service’s resources. The integration tests verify a successful HTTP response code is returned, based on the type of request made.

istio_055

Postman tests are written in JavaScript, similar to other popular, modern testing frameworks. Postman offers advanced features such as test-chaining. Tests can be chained together through the use of environment variables to store response values and pass them on to other tests. Values shared between tests are also stored in the Postman Environments. Below, we store the ID of the new candidate, the result of an HTTP POST to the /candidates endpoint. We then use the stored candidate ID in subsequent HTTP GET, PUT, and PATCH test requests to the same /candidates endpoint.

istio_056

Environment-specific variables, such as the resource host, port, and environment sub-collection resource, are abstracted and stored as key/value pairs within Postman Environments, and called through variables in the request URL and within the tests. Thus, the same Postman Collection of tests may be run against multiple environments using different Postman Environments.

istio_057

Postman Runner allows us to run multiple iterations of our Collection. We also have the option to build in delays between tests. Lastly, Postman Runner can load external JSON and CSV formatted test data, which is beyond the scope of this post.

istio_058

Postman contains a simple Run Summary UI for viewing test results.

istio_060

Test Automation

To support running tests from the command line, Postman provides Newman. According to Postman, Newman is a command-line collection runner for Postman. Newman offers the same functionality as Postman’s Collection Runner, all part of the newman CLI. Newman is a Node.js module, installed globally as an npm package: npm install newman --global.

Typically, Development and Testing teams compose Postman Collections and define Postman Environments locally. Teams run their tests locally in Postman during their development cycle. Then, those same Postman Collections are executed from the command line, or more commonly as part of a CI/CD pipeline, such as with Jenkins.

Below, the same Collection of integration tests that was run in the Postman Runner UI is run from the command line, using Newman.
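
A hedged example of running a Collection with Newman from the command line follows; the Collection and Environment file names are placeholders for your own exported files.

# Install Newman globally (requires Node.js and npm)
npm install newman --global

# Run the exported Postman Collection against a specific Postman Environment
newman run election-service.postman_collection.json \
  --environment minikube.postman_environment.json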

istio_061

Jenkins

Below, the same Postman Collection of integration tests is executed as part of a CI/CD pipeline in Jenkins Blue Ocean.

istio_101B.png

In this pipeline, a set of smoke tests is run first to ensure the new deployment is running properly, and then the integration tests are executed.

istio_102

In this simple example, we have a three-stage CI/CD pipeline created from a Jenkinsfile (gist).

Test Results

Newman offers several options for displaying test results. For easy integration with Jenkins, Newman results can be delivered in a format that can be displayed as JUnit test reports. The JUnit test report format, XML, is a popular method of standardizing test results from different testing tools. Below is a truncated example of a test report file (gist).
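
Generating a JUnit-format report from the command line might look similar to the following sketch; the exact reporter options depend on your Newman version, and the file names are placeholders. Jenkins’ JUnit plugin can then be pointed at the exported XML file to produce the reports and trend graphs shown below.

newman run election-service.postman_collection.json \
  --environment test.postman_environment.json \
  --reporters cli,junit \
  --reporter-junit-export newman/test-results.xml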

Translating Newman test results to JUnit reports allows the percentage of test cases successfully executed to be tracked over multiple deployments, a universal testing metric. Below we see the JUnit Test Reports Test Result Trend graph for a series of test runs.

istio_103

Deploying to Development

Development environments typically have a rapid turnover of application versions. Many teams use their Development environment as a Continuous Integration (CI) environment, where every commit that successfully builds and passes all unit tests is deployed. The purpose of the deployment is to ensure build artifacts will successfully deploy through the CI/CD pipeline, start properly, and pass a basic set of smoke tests.

Other teams use their Development environments as an extension of their local environment. The Development environment will possess some or all of the required external integration points, which the Developer’s local environment may not.

External integration points, such as payment gateways, customer relationship management (CRM) systems, content management systems (CMS), or data analytics engines, are often stubbed-out in lower environments. Generally, third-party providers only offer a limited number of parallel non-Production integration environments. While an application may pass through several non-prod environments, testing against all external integration points will only occur in one or two of those environments.

The goal of the Development environment is to help Developers ensure their application is functioning correctly and is ready for the Test teams to evaluate.

With v2 of the election service ready for testing on GKE, we deploy it to the GKE cluster’s dev namespace using the part4a-deploy-v2-dev.sh script. We will also delete the previous v1 version of the election service. Similar to the v1 deployment script, the v2 scripts perform a kube-inject command, which manually injects the Istio sidecar proxy alongside the election service into each election v2 Pod. The deployment script also deploys an alternate Istio Route Rule, which routes requests to the api.dev.voter-demo.com/v2/* resources of v2 of the election service.
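
A sketch of the kind of commands such a deployment script might run is shown below; the resource file names, Deployment name, and IP ranges variable are assumptions for illustration.

# Manually inject the Istio sidecar proxy and deploy v2 to the dev namespace
istioctl kube-inject -f election-deployment-v2.yaml \
  --includeIPRanges=${ISTIO_INCLUDE_IP_RANGES} | \
  kubectl apply -n dev -f -

# Apply the alternate Route Rule for /v2/ traffic and remove the v1 Deployment
kubectl apply -n dev -f election-routerule-v2.yaml
kubectl delete -n dev deployment election-v1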

istio_054.png

Once deployed, we run our Postman Collection of integration tests with Newman or as part of a CI/CD pipeline. In the Development environment, we may choose to run a limited set of tests for the sake of expediency, or because not all external integration points are accessible.

Promotion to Test

With local Minikube and Development environment testing complete, we promote and deploy v2 of the election service to the Test environment, using the part4b-deploy-v2-test.sh script. In Test, we will not delete v1 of the election service.

istio_062

Often, an organization will maintain a running copy of all versions of an application currently deployed to Production, in a lower environment. Let’s look at two scenarios where this is common. First, v1 of the election service has an issue in Production, which needs to be confirmed and may require a hot-fix by the Development team. Validation of the v1 Production bug is often done in a lower environment. The second scenario for having both versions running in an environment is when v1 and v2 both need to co-exist in Production. Organizations frequently support multiple API versions. Cutting over an entire API user-base to a new API version is often completed over a series of releases, and requires careful coordination with API consumers.

Testing All Versions

An essential role of integration testing should be to confirm that both versions of the election service are functioning correctly, while simultaneously running in the same namespace. For example, we want to verify traffic is routed correctly, based on the HTTP request URL, to the correct version. Another common test scenario is database schema changes. Suppose we make what we believe are backward-compatible database changes to v2 of the election service. We should be able to prove, through testing, that both the old and new versions function correctly against the latest version of the database schema.

There are a few Postman-Newman-Jenkins automation strategies that we could employ to test multiple versions of an application without creating separate Collections and Environments. A simple solution would be to programmatically change the Postman Environment’s version variable, injecting its value from a pipeline parameter (abridged environment file shown below).
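
For example, one hedged approach, assuming a reasonably current version of Newman, is to override the Environment’s version variable at run time with a value supplied by a Jenkins pipeline parameter.

# VERSION is assumed to come from a pipeline parameter; file names are placeholders
newman run election-service.postman_collection.json \
  --environment test.postman_environment.json \
  --env-var "version=${VERSION}"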

istio_095.png

Once initial automated integration testing is complete, Test teams will typically execute additional forms of application testing if necessary, before signing off for UAT and Performance Testing to begin.

User-Acceptance Testing

With testing in the Test environment completed, we continue on to UAT. The term UAT suggests that a set of actual end-users (API consumers) of the election service will perform their own testing. Frequently, UAT is only done for a short, fixed period of time, often with a specialized team of Testers. Issues experienced during UAT can be expensive and can impact the ability to release an application to Production on time if sign-off is delayed.

After deploying v2 of the election service to UAT, and before opening it up to the UAT team, we would naturally want to repeat the same integration testing process we conducted in the previous Test environment. We must ensure that v2 is functioning as expected before our end-users begin their testing. This is where leveraging a tool like Jenkins makes automated integration testing more manageable and repeatable. One strategy would be to duplicate our existing Development and Test pipelines, and re-target the new pipeline to call v2 of the election service in UAT.

istio_104.png

Again, in a JUnit report format, we can examine individual results through the Jenkins Console.

istio_105.png

We can also examine individual results from each test run using a specific build’s Console Output.

istio_106.png

Testing and Instrumentation

To fully evaluate the integration test results, you must look beyond just the percentage of test cases executed successfully. It makes little sense to release a new version of an application if it passes all functional tests, but significantly increases client response times, unnecessarily increases memory consumption, wastes other compute resources, or is grossly inefficient in the number of calls it makes to the database or third-party dependencies. Oftentimes, integration testing uncovers potential performance bottlenecks that are incorporated into performance test plans.

Critical intelligence about the performance of the application can only be obtained through the use of logging and metrics collection and instrumentation. Istio provides this telemetry out-of-the-box with Zipkin, Jaeger, Service Graph, Fluentd, Prometheus, and Grafana. In the included Grafana Istio Dashboard below, we see the performance of v1 of the election service, under test, in the Test environment. We can compare request and response payload size and timing, as well as request and response times to external integration points, such as our Amazon RDS database. We are able to observe the impact of individual test requests on the application and all its integration points.

istio_067

As part of integration testing, we should monitor the Amazon RDS CloudWatch metrics. CloudWatch allows us to evaluate critical database performance metrics, such as the number of concurrent database connections, CPU utilization, read and write IOPS, Memory consumption, and disk storage requirements.
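
For example, a hedged sketch of pulling one such metric with the AWS CLI is shown below; the DB instance identifier and time window are placeholders.

aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name CPUUtilization \
  --dimensions Name=DBInstanceIdentifier,Value=election-postgresql \
  --statistics Average \
  --period 300 \
  --start-time 2018-04-13T00:00:00Z \
  --end-time 2018-04-13T01:00:00Z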

istio_043

A discussion of metrics starts moving us toward load and performance testing against Production service-level agreements (SLAs). Using a similar approach to integration testing, with load and performance testing, we should be able to accurately estimate the sizing requirements of our new application for Production. Load and Performance Testing helps answer questions such as what type and size of compute resources are required for our GKE Production cluster and for our Amazon RDS database, or how many compute nodes and instances (Pods) are necessary to support the expected user load.

All opinions expressed in this post are my own, and not necessarily the views of my current or past employers, or their clients.


Managing Applications Across Multiple Kubernetes Environments with Istio: Part 1

In the following two-part post, we will explore the creation of a GKE cluster, replete with the latest version of Istio, often referred to as IoK (Istio on Kubernetes). We will then deploy, perform integration testing, and promote an application across multiple environments within the cluster.

Application Environment Management

Container orchestration engines, such as Kubernetes, have revolutionized the deployment and management of microservice-based architectures. Combined with a Service Mesh, such as Istio, Kubernetes provides a secure, instrumented, enterprise-grade platform for modern, distributed applications.

One of many challenges with any platform, even one built on Kubernetes, is managing multiple application environments. Whether applications run on bare-metal, virtual machines, or within containers, deploying to and managing multiple application environments increases operational complexity.

As Agile software development practices continue to increase within organizations, the need for multiple, ephemeral, on-demand environments also grows. Traditional environments that were once only composed of Development, Test, and Production have expanded in enterprises to include a dozen or more environments, to support the many stages of the modern software development lifecycle. Current application environments often include Continuous Integration and Delivery (CI/CD), Sandbox, Development, Integration Testing (QA), User Acceptance Testing (UAT), Staging, Performance, Production, Disaster Recovery (DR), and Hotfix. Each environment requires its own compute, security, networking, and configuration, along with corresponding dependencies, such as databases and message queues.

Environments and Kubernetes

There are various infrastructure architectural patterns employed by Operations and DevOps teams to provide Kubernetes-based application environments to Development teams. One pattern consists of separate physical Kubernetes clusters. Separate clusters provide a high level of isolation. Isolation offers many advantages, including increased performance and security, the ability to tune each cluster’s compute resources to meet differing SLAs, and ensuring a reduced blast radius when things go terribly wrong. Conversely, separate clusters often result in increased infrastructure costs and operational overhead, and complex deployment strategies. This pattern is often seen in heavily regulated, compliance-driven organizations, where security, auditability, and separation of duties are paramount.

Kube Clusters Diagram F15

Namespaces

An alternative to separate physical Kubernetes clusters is virtual clusters. Virtual clusters are created using Kubernetes Namespaces. According to Kubernetes documentation, ‘Kubernetes supports multiple virtual clusters backed by the same physical cluster. These virtual clusters are called namespaces’.

In most enterprises, Operations and DevOps teams deliver a combination of both virtual and physical Kubernetes clusters. For example, lower environments, such as those used for Development, Test, and UAT, often reside on the same physical cluster, each in a separate virtual cluster (namespace). At the same time, environments such as Performance, Staging, Production, and DR, often require the level of isolation only achievable with physical Kubernetes clusters.

In the Cloud, physical clusters may be further isolated and secured using separate cloud accounts. For example, with AWS you might have a Non-Production AWS account and a Production AWS account, both managed by an AWS Organization.

Kube Clusters Diagram v2 F3

In a multi-environment scenario, a single physical cluster would contain multiple namespaces, into which separate versions of an application or applications are independently deployed, accessed, and tested. Below we see a simple example of a single Kubernetes non-prod cluster on the left, containing multiple versions of different microservices, deployed across three namespaces. You would likely see this type of deployment pattern as applications are deployed, tested, and promoted across lower environments, before being released to Production.

Kube Clusters Diagram v2 F5.png

Example Application

To demonstrate the promotion and testing of an application across multiple environments, we will use a simple election-themed microservice, developed for a previous post, Developing Cloud-Native Data-Centric Spring Boot Applications for Pivotal Cloud Foundry. The Spring Boot-based application allows API consumers to create, read, update, and delete candidates, elections, and votes, through an exposed set of resources, accessed via RESTful endpoints.

Source Code

All source code for this post can be found on GitHub. The project’s README file contains a list of the election microservice’s endpoints. To get started quickly, use one of the two following options (gist).

Code samples in this post are displayed as Gists, which may not display correctly on some mobile and social media browsers. Links to gists are also provided.

This project includes a kubernetes sub-directory, containing all the Kubernetes resource files and scripts necessary to recreate the example shown in the post. The scripts are designed to be easily adapted to a CI/CD DevOps workflow. You will need to modify the scripts’ variables to match your own environment’s configuration.

istio_107small

Database

The post’s Spring Boot application relies on a PostgreSQL database. In the previous post, ElephantSQL was used to host the PostgreSQL instance. This time, I have used Amazon RDS for PostgreSQL. Amazon RDS for PostgreSQL and ElephantSQL are equivalent choices. For simplicity, you might also consider a containerized version of PostgreSQL, managed as part of your Kubernetes environment.

Ideally, each environment should have a separate database instance. Separate database instances provide better isolation, fine-grained RBAC, easier test data lifecycle management, and improved performance. However, for this post, I suggest a single, shared, minimally-sized RDS instance.

The PostgreSQL database’s sensitive connection information, including the database URL, username, and password, is stored as Kubernetes Secrets, one secret for each namespace, and accessed by the Kubernetes Deployment controllers.

istio_043.png

Istio

Although not required, Istio makes the task of managing multiple virtual and physical clusters significantly easier. Following Istio’s online installation instructions, download and install Istio 0.7.1.

To create a Google Kubernetes Engine (GKE) cluster with Istio, you could use gcloud CLI’s container clusters create command, followed by installing Istio manually using Istio’s supplied Kubernetes resource files. This was the method used in the previous post, Deploying and Configuring Istio on Google Kubernetes Engine (GKE).

Alternatively, you could use Istio’s Google Cloud Platform (GCP) Deployment Manager files, along with the gcloud CLI’s deployment-manager deployments create command, to create a Kubernetes cluster, replete with Istio, in a single step. Although arguably simpler, the deployment-manager method does not provide the same level of fine-grained control over cluster configuration as the container clusters create method. For this post, the deployment-manager method will suffice.

istio_001

The latest version of the Google Kubernetes Engine, available at the time of this post, is 1.9.6-gke.0. However, installing this version of Kubernetes Engine using Istio’s supplied Deployment Manager Jinja template requires updating the hardcoded value in the istio-cluster.jinja file from 1.9.2-gke.1 to 1.9.6-gke.0. This has been updated in the next release of Istio.

istio_002

Another required change concerns the latest version of Istio offered as an option in the istio-cluster-jinja.schema file. Specifically, the installIstioRelease configuration variable only offers 0.6.0; the template does not include 0.7.1 as an option. Modify the istio-cluster-jinja.schema file to include the choice of 0.7.1. Optionally, I also set 0.7.1 as the default. This change should also be included in the next version of Istio.

istio_075.png

There are a limited number of GKE and Istio configuration defaults defined in the istio-cluster.yaml file, all of which can be overridden from the command line.

istio_002B.png

To optimize the cluster, and keep compute costs to a minimum, I have overridden several of the default configuration values using the properties flag with the gcloud CLI’s deployment-manager deployments create command. The README file provided by Istio explains how to use this feature. Configuration changes include the name of the cluster, the version of Istio (0.7.1), the number of nodes (2), the GCP zone (us-east1-b), and the node instance type (n1-standard-1). I also disabled automatic sidecar injection and chose not to install the Istio sample book application onto the cluster (gist).
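
A hedged sketch of the resulting command is shown below; the deployment name is arbitrary, and the property keys must match those defined in the istio-cluster-jinja.schema file, so treat the ones shown here as illustrative only.

gcloud deployment-manager deployments create istio-nonprod-cluster \
  --template=istio-cluster.jinja \
  --properties="zone:us-east1-b,initialNodeCount:2,installIstioRelease:0.7.1"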

Cluster Provisioning

To provision the GKE cluster and deploy Istio, first modify the variables in the part1-create-gke-cluster.sh file (shown above), then execute the script. The script also retrieves your cluster’s credentials, to enable command line interaction with the cluster using the kubectl CLI.

istio_002C.png

Once complete, validate the version of Istio by examining Istio’s Docker image versions, using the following command (gist).
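
A minimal sketch of one way to do this is shown below, listing the container images declared by the Pods in the istio-system namespace.

kubectl get pods -n istio-system \
  -o jsonpath="{..image}" | tr -s '[[:space:]]' '\n' | sort | uniq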

The result should be a list of Istio 0.7.1 Docker images.

istio_076.png

The new cluster should be running GKE version 1.9.6-gke.0. This can be confirmed using the following command (gist).
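
A hedged sketch, assuming the cluster zone used earlier, follows.

# MASTER_VERSION and NODE_VERSION should both read 1.9.6-gke.0
gcloud container clusters list --zone us-east1-b

# Alternatively, check the versions reported by the cluster itself
kubectl version --short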

Or, from the GCP Cloud Console.

istio_037

The new GKE cluster should be composed of (2) n1-standard-1 nodes, running in the us-east1-b zone.

istio_038

As part of the deployment, all of the separate Istio components should be running within the istio-system namespace.

istio_040

As part of the deployment, an external IP address and a load balancer were provisioned by GCP and associated with the Istio Ingress. GCP’s Deployment Manager should have also created the necessary firewall rules for cluster ingress and egress.

istio_010.png

Building the Environments

Next, we will create three namespaces, dev, test, and uat, which represent three non-production environments. Each environment consists of a Kubernetes Namespace, Istio Ingress, and Secret. The three environments are deployed using the part2-create-environments.sh script.
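
A rough sketch of what that script does for each environment is shown below; the Secret name, keys, and resource file names are assumptions for illustration.

for ns in dev test uat; do
  kubectl create namespace ${ns}
  kubectl create secret generic election-secrets -n ${ns} \
    --from-literal=spring.datasource.url="${DB_URL}" \
    --from-literal=spring.datasource.username="${DB_USER}" \
    --from-literal=spring.datasource.password="${DB_PASSWORD}"
  kubectl apply -n ${ns} -f ingress/election-ingress.yaml
done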

istio_048.png

Deploying v1

For this demonstration, we will assume v1 of the election service has been previously promoted, tested, and released to Production. Hence, we would expect v1 to be deployed to each of the lower environments. Additionally, a new v2 of the election service has been developed and tested locally using Minikube. It is ready for deployment to the three environments and will undergo integration testing (detailed in part 2 of the post).

If you recall from our GKE/Istio configuration, we chose manual sidecar injection of the Istio proxy. Therefore, all election deployment scripts perform a kube-inject command. To connect to our external Amazon RDS database, this kube-inject command requires the includeIPRanges flag, which contains two cluster configuration values, the cluster’s IPv4 CIDR (clusterIpv4Cidr) and the service’s IPv4 CIDR (servicesIpv4Cidr).

Before deployment, we export the includeIPRanges value as an environment variable, which will be used by the deployment scripts (gist).
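
A minimal sketch of exporting that value might look like the following; the cluster name and variable names are placeholders for your own GKE cluster.

CLUSTER=election-nonprod-cluster
ZONE=us-east1-b

CLUSTER_CIDR=$(gcloud container clusters describe ${CLUSTER} --zone ${ZONE} \
  --format="value(clusterIpv4Cidr)")
SERVICES_CIDR=$(gcloud container clusters describe ${CLUSTER} --zone ${ZONE} \
  --format="value(servicesIpv4Cidr)")

export ISTIO_INCLUDE_IP_RANGES="${CLUSTER_CIDR},${SERVICES_CIDR}"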

Using this method with manual sidecar injection is discussed in the previous post, Deploying and Configuring Istio on Google Kubernetes Engine (GKE).

To deploy v1 of the election service to all three namespaces, execute the part3-deploy-v1-all-envs.sh script.

istio_051.png

We should now have two instances of v1 of the election service, running in the dev, test, and uat namespaces, for a total of six election-v1 Kubernetes Pods.

istio_052

HTTP Request Routing

Before deploying additional versions of the election service in Part 2 of this post, we should understand how external HTTP requests will be routed to different versions of the election service, in multiple namespaces. In the post’s simple example, we have a matrix of three namespaces and two versions of the election service. That means we need a method to route external traffic to up to six different election service versions. There are multiple ways to solve this problem, each with its own pros and cons. For this post, I found a combination of DNS and HTTP request rewriting to be most effective.

DNS

First, to route external HTTP requests to the correct namespace, we will use subdomains. Using my current DNS management solution, Azure DNS, I create three new A records for my registered domain, voter-demo.com. There is one A record for each namespace, including api.dev, api.test, and api.uat.

istio_077.png

All three subdomains should resolve to the single external IP address assigned to the cluster’s load balancer.

istio_010.png

As part of the environment creation, the script deployed an Istio Ingress to each environment. The Ingress accepts traffic based on a match to the Request URL (gist).
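
A minimal sketch of what one such Ingress might look like for the dev namespace is shown below; the backend service name and port are assumptions.

cat <<'EOF' | kubectl apply -n dev -f -
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: election-ingress
  annotations:
    kubernetes.io/ingress.class: istio
spec:
  rules:
  - host: api.dev.voter-demo.com
    http:
      paths:
      - backend:
          serviceName: election
          servicePort: 8080
EOF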

The istio-ingress service load balancer, running in the istio-system namespace, routes inbound external traffic, based on the Request URL, to the Istio Ingress in the appropriate namespace.

istio_053.png

The Istio Ingress in the namespace then directs the traffic to one of the Kubernetes Pods, containing the election service and the Istio sidecar proxy.

istio_068.png

HTTP Rewrite

To direct the HTTP request to v1 or v2 of the election service, an Istio Route Rule is used. As part of the environment creation, along with the Namespace and Ingress resources, we also deployed an Istio Route Rule to each environment. This particular route rule examines the HTTP request URL for a /v1/ or /v2/ sub-collection resource. If it finds the sub-collection resource, it performs an HTTPRewrite, removing the sub-collection resource from the HTTP request. The Route Rule then directs the HTTP request to the appropriate version of the election service, v1 or v2 (gist).
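
A hedged sketch of such a Route Rule, for the v2 route in the dev namespace, is shown below; the rule name, destination service name, and Pod labels are assumptions.

cat <<'EOF' | kubectl apply -n dev -f -
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: election-v2
spec:
  destination:
    name: election
  precedence: 2
  match:
    request:
      headers:
        uri:
          prefix: /v2/
  rewrite:
    uri: /
  route:
  - labels:
      version: v2
EOF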

According to Istio, ‘if there are multiple registered instances with the specified tag(s), they will be routed to based on the load balancing policy (algorithm) configured for the service (round-robin by default).’ We are using the default load balancing algorithm to distribute requests across multiple copies of each election service.

The final external HTTP request routing for the election service in the Non-Production GKE cluster is shown on the left, in the diagram, below.

Kube Clusters Diagram F14

Below are some examples of HTTP GET requests that would be successfully routed to our election service, using the above-described routing strategy (gist).
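
For instance, with only v1 currently deployed, requests like the following should succeed; the /candidates resource comes from the election service’s API, and /v2/ paths will work the same way once v2 is deployed in Part 2.

curl http://api.dev.voter-demo.com/v1/candidates
curl http://api.test.voter-demo.com/v1/candidates
curl http://api.uat.voter-demo.com/v1/candidates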

Part Two

In part one of this post, we created the Kubernetes cluster on the Google Cloud Platform, installed Istio, provisioned a PostgreSQL database, and configured DNS for routing. Under the assumption that v1 of the election microservice had already been released to Production, we deployed v1 to each of the three namespaces.

In part two of this post, we will learn how to utilize the sophisticated API testing capabilities of Postman and Newman to ensure v2 is ready for UAT and release to Production. We will deploy and perform integration testing of a new v2 of the election microservice, locally on Kubernetes Minikube. Once we are confident v2 is functioning as intended, we will promote and test v2 across the dev, test, and uat namespaces.

All opinions expressed in this post are my own, and not necessarily the views of my current or past employers, or their clients.


Provision and Deploy a Consul Cluster on AWS, using Terraform, Docker, and Jenkins

Cover2

Introduction

Modern DevOps tools, such as HashiCorp’s Packer and Terraform, make it easier to provision and manage complex cloud architecture. Utilizing a CI/CD server, such as Jenkins, to securely automate the use of these DevOps tools ensures quick and consistent results.

In a recent post, Distributed Service Configuration with Consul, Spring Cloud, and Docker, we built a Consul cluster using Docker swarm mode, to host distributed configurations for a Spring Boot application. The cluster was built locally with VirtualBox. This architecture is fine for development and testing, but not for use in Production.

In this post, we will deploy a highly available three-node Consul cluster to AWS. We will use Terraform to provision a set of EC2 instances and accompanying infrastructure. The instances will be built from a hybrid AMI containing the new Docker Community Edition (CE). In a recent post, Baking AWS AMI with new Docker CE Using Packer, we provisioned an Ubuntu AMI with Docker CE, using Packer. We will deploy a Docker container to each EC2 host, each containing an instance of Consul server.

All source code can be found on GitHub.

Jenkins

I have chosen Jenkins to automate all of the post’s build, provisioning, and deployment tasks. However, none of the code is written specifically for Jenkins; you may run all of it from the command line.

For this post, I have built four projects in Jenkins, as follows:

  1. Provision Docker CE AMI: Builds Ubuntu AMI with Docker CE, using Packer
  2. Provision Consul Infra AWS: Provisions Consul infrastructure on AWS, using Terraform
  3. Deploy Consul Cluster AWS: Deploys Consul to AWS, using Docker
  4. Destroy Consul Infra AWS: Destroys Consul infrastructure on AWS, using Terraform

Jenkins UI

We will primarily be using the ‘Provision Consul Infra AWS’, ‘Deploy Consul Cluster AWS’, and ‘Destroy Consul Infra AWS’ Jenkins projects in this post. The fourth Jenkins project, ‘Provision Docker CE AMI’, automates the steps found in the recent post, Baking AWS AMI with new Docker CE Using Packer, to build the AMI used to provision the EC2 instances in this post.

Consul AWS Diagram 2

Terraform

Using Terraform, we will provision EC2 instances in three different Availability Zones within the US East 1 (N. Virginia) Region. Using Terraform’s Amazon Web Services (AWS) provider, we will create the following AWS resources:

  • (1) Virtual Private Cloud (VPC)
  • (1) Internet Gateway
  • (1) Key Pair
  • (3) Elastic Cloud Compute (EC2) Instances
  • (2) Security Groups
  • (3) Subnets
  • (1) Route
  • (3) Route Tables
  • (3) Route Table Associations

The final AWS architecture should resemble the following:

Consul AWS Diagram

Production Ready AWS

Although we have provisioned a fairly complete VPC for this post, it is far from being ready for Production. I have created two security groups, limiting the ingress and egress to the cluster. However, to further productionize the environment would require additional security hardening. At a minimum, you should consider adding public/private subnets, NAT gateways, network access control list rules (network ACLs), and the use of HTTPS for secure communications.

In production, applications would communicate with Consul through local Consul clients. Consul clients would take part in the LAN gossip pool from different subnets, Availability Zones, Regions, or VPCs using VPC peering. Communications would be tightly controlled by IAM, VPC, subnet, IP address, and port.

Also, you would not have direct access to the Consul UI through a publicly exposed IP or DNS address. Access to the UI would be removed altogether or locked down to specific IP addresses, with access restricted to secure communication channels.

Consul

We will achieve high availability (HA) by clustering three Consul server nodes across the three Elastic Cloud Compute (EC2) instances. In this minimally sized, three-node cluster of Consul servers, we are protected from the loss of one Consul server node, one EC2 instance, or one Availability Zone (AZ). The cluster will still maintain a quorum of two nodes. An additional level of HA that Consul supports, multiple datacenters (multiple AWS Regions), is not demonstrated in this post.

Docker

Having Docker CE already installed on each EC2 instance allows us to execute remote Docker commands over SSH from Jenkins. These commands will deploy and configure a Consul server node, within a Docker container, on each EC2 instance. The containers are built from HashiCorp’s latest Consul Docker image pulled from Docker Hub.

Getting Started

Preliminary Steps

If you have built infrastructure on AWS with Terraform, these steps should be familiar to you:

  1. First, you will need an AMI with Docker. I suggest reading Baking AWS AMI with new Docker CE Using Packer.
  2. You will need an AWS IAM User with the proper access to create the required infrastructure. For this post, I created a separate Jenkins IAM User with PowerUser level access.
  3. You will need to have an RSA public-private key pair, which can be used to SSH into the EC2 instances and install Consul.
  4. Ensure you have your AWS credentials set. I usually source mine from a .env file, as environment variables. Jenkins can securely manage credentials, using secret text or files.
  5. Fork and/or clone the Consul cluster project from  GitHub.
  6. Change the aws_key_name and public_key_path variable values to your own RSA key, in the variables.tf file.
  7. Change the aws_amis_base variable values to your own AMI ID (see step 1).
  8. If you do not want to use the US East 1 Region and its AZs, modify the variables.tf, network.tf, and instances.tf files.
  9. Disable Terraform’s remote state or modify the resource to match your remote state configuration, in the main.tf file. I am using an Amazon S3 bucket to store my Terraform remote state.

Building an AMI with Docker

If you have not built an Amazon Machine Image (AMI) for use in this post already, you can do so using the scripts provided in the previous post’s GitHub repository. To automate the AMI build task, I built the ‘Provision Docker CE AMI’ Jenkins project. Identical to the other three Jenkins projects in this post, this project has three main tasks, which include: 1) SCM: clone the Packer AMI GitHub project, 2) Bindings: set up the AWS credentials, and 3) Build: run Packer.

The SCM and Bindings tasks are identical to the other projects (see below for details), except for the use of a different GitHub repository. The project’s Build step, which runs the packer_build_ami.sh script looks as follows:

jenkins_13

The resulting AMI ID will need to be manually placed in Terraform’s variables.tf file, before provisioning the AWS infrastructure with Terraform. The new AMI ID will be displayed in the Jenkins build output.

jenkins_14

Provisioning with Terraform

Based on the modifications you made in the Preliminary Steps, execute the terraform validate command to confirm your changes. Then, run the terraform plan command to review the plan. Assuming there are no errors, finally run the terraform apply command to provision the AWS infrastructure components.
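
A minimal sketch of that workflow, assuming the providers and remote state have already been initialized with terraform init, is shown below.

terraform validate
terraform plan
terraform apply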

In Jenkins, I have created the ‘Provision Consul Infra AWS’ project. This project has three tasks, which include: 1) SCM: clone the GitHub project, 2) Bindings: set up the AWS credentials, and 3) Build: run Terraform. Those tasks look as follows:

Jenkins_08.png

You will obviously need to use your modified GitHub project, incorporating the configuration changes detailed above, as the SCM source for Jenkins.

Jenkins Credentials

You will also need to configure your AWS credentials.

Jenkins_03.png

The provision_infra.sh script provisions the AWS infrastructure using Terraform. The script also updates Terraform’s remote state. Remember to update the remote state configuration in the script to match your personal settings.

The Jenkins build output should look similar to the following:

jenkins_12.png

Although the build only takes about 90 seconds to complete, the EC2 instances could take a few extra minutes to complete their Status Checks and be completely ready. The final results in the AWS EC2 Management Console should look as follows:

EC2 Management Console

Note each EC2 instance is running in a different US East 1 Availability Zone.

Installing Consul

Once the AWS infrastructure is running and the EC2 instances have completed their Status Checks successfully, we are ready to deploy Consul. In Jenkins, I have created the ‘Deploy Consul Cluster AWS’ project. This project has three tasks, which include: 1) SCM: clone the GitHub project, 2) Bindings: set up the AWS credentials, and 3) Build: run an SSH remote Docker command on each EC2 instance to deploy Consul. The SCM and Bindings tasks are identical to the project above. The project’s Build step looks as follows:

Jenkins_04.png

First, the delete_containers.sh script deletes any previous instances of Consul containers. This is helpful if you need to re-deploy Consul. Next, the deploy_consul.sh script executes a series of SSH remote Docker commands to install and configure Consul on each EC2 instance.
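
A hedged sketch of the type of remote Docker command deploy_consul.sh issues against each host follows; the key path, container name, and IP address variables are placeholders.

ssh -i ~/.ssh/consul-aws.pem ubuntu@${EC2_PUBLIC_IP} \
  docker run -d --name consul-server --net=host \
    -v /home/ubuntu/consul/config:/consul/data \
    consul:latest agent -server -ui \
      -client=0.0.0.0 \
      -bind=${EC2_PRIVATE_IP} \
      -bootstrap-expect=3 \
      -retry-join=${OTHER_NODE_1_IP} \
      -retry-join=${OTHER_NODE_2_IP}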

The entire Jenkins build process only takes about 30 seconds. Afterward, the output from a successful Jenkins build should show that all three Consul server instances are running, have formed a quorum, and have elected a Leader.

Jenkins_05.png

Persisting State

The Consul Docker image exposes VOLUME /consul/data, which is a path where Consul will place its persisted state. Using Terraform’s remote-exec provisioner, we create a directory on each EC2 instance, at /home/ubuntu/consul/config. The docker run command bind-mounts the container’s /consul/data path to the EC2 host’s /home/ubuntu/consul/config directory.

According to Consul, the Consul server container instance will ‘store the client information plus snapshots and data related to the consensus algorithm and other state, like Consul’s key/value store and catalog’ in the /consul/data directory. That container directory is now bind-mounted to the EC2 host, as demonstrated below.

jenkins_15

Accessing Consul

Following a successful deployment, you should be able to use the public URL, displayed in the build output of the ‘Deploy Consul Cluster AWS’ project, to access the Consul UI. Clicking on the Nodes tab in the UI, you should see all three Consul server instances, one per EC2 instance, running and healthy.

Consul UI

Destroying Infrastructure

When you are finished with the post, you may want to remove the running infrastructure, so you don’t continue to get billed by Amazon. The ‘Destroy Consul Infra AWS’ project destroys all the AWS infrastructure, provisioned as part of this post, in about 60 seconds. The project’s SCM and Bindings tasks are identical to both previous projects. The Build step calls the destroy_infra.sh script, which is included in the GitHub project. The script executes the terraform destroy -force command, which deletes all running infrastructure components associated with the post and updates Terraform’s remote state.

Jenkins_09

Conclusion

This post has demonstrated how modern DevOps tooling, such as HashiCorp’s Packer and Terraform, makes it easy to build, provision, and manage complex cloud architecture. Using a CI/CD server, such as Jenkins, to securely automate the use of these tools ensures quick and consistent results.

All opinions in this post are my own and not necessarily the views of my current employer or their clients.


Baking AWS AMI with new Docker CE Using Packer

AWS for Docker

Introduction

On March 2 (less than a week ago as of this post), Docker announced the release of Docker Enterprise Edition (EE), a new version of the Docker platform optimized for business-critical deployments. As part of the release, Docker also renamed the free Docker products to Docker Community Edition (CE). Both products adopt a new time-based versioning scheme. The initial release of Docker CE and EE, the 17.03 release, is the first to use the new scheme.

Along with the release, Docker delivered excellent documentation on installing, configuring, and troubleshooting the new Docker EE and CE. In this post, I will demonstrate how to partially bake an existing Amazon Machine Image (Amazon AMI) with the new Docker CE, preparing it as a base for the creation of Amazon Elastic Compute Cloud (Amazon EC2) compute instances.

Adding Docker and similar tooling to an AMI is referred to as partially baking an AMI; such an image is often called a hybrid AMI. According to AWS, ‘hybrid AMIs provide a subset of the software needed to produce a fully functional instance, falling in between the fully baked and JeOS (just enough operating system) options on the AMI design spectrum.’

Installing Docker CE on an AWS AMI should not be confused with Docker’s also recently announced Docker Community Edition (CE) for AWS. Docker for AWS offers multiple CloudFormation templates for Docker EE and CE. According to Docker, Docker for AWS ‘provides a Docker-native solution that avoids operational complexity and adding unneeded additional APIs to the Docker stack.’

Base AMI

Docker provides detailed directions for installing Docker CE and EE onto several major Linux distributions. For this post, we will choose a widely used Linux distro, Ubuntu. According to Docker, currently Docker CE and EE can be installed on three popular Ubuntu releases:

  • Yakkety 16.10
  • Xenial 16.04 (LTS)
  • Trusty 14.04 (LTS)

To provision a small EC2 instance in Amazon’s US East (N. Virginia) Region, I will choose Ubuntu 16.04.2 LTS (Xenial Xerus). According to Canonical’s Amazon EC2 AMI Locator website, a Xenial 16.04 LTS AMI is available, ami-09b3691f, for US East 1, as a t2.micro EC2 instance type.

Packer

HashiCorp Packer will be used to partially bake the base Ubuntu Xenial 16.04 AMI with Docker CE 17.03. HashiCorp describes Packer as ‘a tool for creating machine and container images for multiple platforms from a single source configuration.’ The JSON-format Packer file is as follows:

{
  "variables": {
    "aws_access_key": "{{env `AWS_ACCESS_KEY_ID`}}",
    "aws_secret_key": "{{env `AWS_SECRET_ACCESS_KEY`}}",
    "us_east_1_ami": "ami-09b3691f",
    "name": "aws-docker-ce-base",
    "us_east_1_name": "ubuntu-xenial-docker-ce-base",
    "ssh_username": "ubuntu"
  },
  "builders": [
    {
      "name": "{{user `us_east_1_name`}}",
      "type": "amazon-ebs",
      "access_key": "{{user `aws_access_key`}}",
      "secret_key": "{{user `aws_secret_key`}}",
      "region": "us-east-1",
      "vpc_id": "",
      "subnet_id": "",
      "source_ami": "{{user `us_east_1_ami`}}",
      "instance_type": "t2.micro",
      "ssh_username": "{{user `ssh_username`}}",
      "ssh_timeout": "10m",
      "ami_name": "{{user `us_east_1_name`}} {{timestamp}}",
      "ami_description": "{{user `us_east_1_name`}} AMI",
      "run_tags": {
        "ami-create": "{{user `us_east_1_name`}}"
      },
      "tags": {
        "ami": "{{user `us_east_1_name`}}"
      },
      "ssh_private_ip": false,
      "associate_public_ip_address": true
    }
  ],
  "provisioners": [
    {
      "type": "file",
      "source": "bootstrap_docker_ce.sh",
      "destination": "/tmp/bootstrap_docker_ce.sh"
    },
    {
          "type": "file",
          "source": "cleanup.sh",
          "destination": "/tmp/cleanup.sh"
    },
    {
      "type": "shell",
      "execute_command": "echo 'packer' | sudo -S sh -c '{{ .Vars }} {{ .Path }}'",
      "inline": [
        "whoami",
        "cd /tmp",
        "chmod +x bootstrap_docker_ce.sh",
        "chmod +x cleanup.sh",
        "ls -alh /tmp",
        "./bootstrap_docker_ce.sh",
        "sleep 10",
        "./cleanup.sh"
      ]
    }
  ]
}

The Packer file uses Packer’s amazon-ebs builder type. This builder is used to create Amazon AMIs backed by Amazon Elastic Block Store (EBS) volumes, for use in EC2.

Bootstrap Script

To install Docker CE on the AMI, the Packer file executes a bootstrap shell script. The bootstrap script and subsequent cleanup script are executed using Packer’s remote shell provisioner. The bootstrap script is as follows:

#!/bin/sh

sudo apt-get remove docker docker-engine

sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88

sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
sudo apt-get update
sudo apt-get -y upgrade
sudo apt-get install -y docker-ce

sudo groupadd docker
sudo usermod -aG docker ubuntu

sudo systemctl enable docker

This script closely follows the directions provided by Docker for installing Docker CE on Ubuntu. After removing any previous copies of Docker, the script installs Docker CE. To ensure sudo is not required to execute Docker commands on any EC2 instance provisioned from the resulting AMI, the script adds the ubuntu user to the docker group.

The bootstrap script also uses systemd to start the Docker daemon. Starting with Ubuntu 15.04, Systemd System and Service Manager is used by default instead of the previous init system, Upstart. Systemd ensures Docker will start on boot.

Cleaning Up

It is good practice to clean up your activities after baking an AMI. I have included a basic cleanup script. The cleanup script is as follows:

#!/bin/sh

set -e

echo 'Cleaning up after bootstrapping...'
sudo apt-get -y autoremove
sudo apt-get -y clean
sudo rm -rf /tmp/*
cat /dev/null > ~/.bash_history
history -c
exit

Partially Baking

Before running Packer to build the Docker CE AMI, I set both my AWS access key and AWS secret access key. The Packer file expects the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.

Running the packer build ubuntu_docker_ce_ami.json command builds the AMI. The abridged output should look similar to the following:

$ packer build docker_ami.json
ubuntu-xenial-docker-ce-base output will be in this color.

==> ubuntu-xenial-docker-ce-base: Prevalidating AMI Name...
    ubuntu-xenial-docker-ce-base: Found Image ID: ami-09b3691f
==> ubuntu-xenial-docker-ce-base: Creating temporary keypair: packer_58bc7a49-9e66-7f76-ce8e-391a67d94987
==> ubuntu-xenial-docker-ce-base: Creating temporary security group for this instance...
==> ubuntu-xenial-docker-ce-base: Authorizing access to port 22 the temporary security group...
==> ubuntu-xenial-docker-ce-base: Launching a source AWS instance...
    ubuntu-xenial-docker-ce-base: Instance ID: i-0ca883ecba0c28baf
==> ubuntu-xenial-docker-ce-base: Waiting for instance (i-0ca883ecba0c28baf) to become ready...
==> ubuntu-xenial-docker-ce-base: Adding tags to source instance
==> ubuntu-xenial-docker-ce-base: Waiting for SSH to become available...
==> ubuntu-xenial-docker-ce-base: Connected to SSH!
==> ubuntu-xenial-docker-ce-base: Uploading bootstrap_docker_ce.sh => /tmp/bootstrap_docker_ce.sh
==> ubuntu-xenial-docker-ce-base: Uploading cleanup.sh => /tmp/cleanup.sh
==> ubuntu-xenial-docker-ce-base: Provisioning with shell script: /var/folders/kf/637b0qns7xb0wh9p8c4q0r_40000gn/T/packer-shell189662158
    ...
    ubuntu-xenial-docker-ce-base: Reading package lists...
    ubuntu-xenial-docker-ce-base: Building dependency tree...
    ubuntu-xenial-docker-ce-base: Reading state information...
    ubuntu-xenial-docker-ce-base: E: Unable to locate package docker-engine
    ubuntu-xenial-docker-ce-base: Reading package lists...
    ubuntu-xenial-docker-ce-base: Building dependency tree...
    ubuntu-xenial-docker-ce-base: Reading state information...
    ubuntu-xenial-docker-ce-base: ca-certificates is already the newest version (20160104ubuntu1).
    ubuntu-xenial-docker-ce-base: apt-transport-https is already the newest version (1.2.19).
    ubuntu-xenial-docker-ce-base: curl is already the newest version (7.47.0-1ubuntu2.2).
    ubuntu-xenial-docker-ce-base: software-properties-common is already the newest version (0.96.20.5).
    ubuntu-xenial-docker-ce-base: 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
    ubuntu-xenial-docker-ce-base: OK
    ubuntu-xenial-docker-ce-base: pub   4096R/0EBFCD88 2017-02-22
    ubuntu-xenial-docker-ce-base: Key fingerprint = 9DC8 5822 9FC7 DD38 854A  E2D8 8D81 803C 0EBF CD88
    ubuntu-xenial-docker-ce-base: uid                  Docker Release (CE deb) <docker@docker.com>
    ubuntu-xenial-docker-ce-base: sub   4096R/F273FCD8 2017-02-22
    ubuntu-xenial-docker-ce-base:
    ubuntu-xenial-docker-ce-base: Hit:1 http://us-east-1.ec2.archive.ubuntu.com/ubuntu xenial InRelease
    ubuntu-xenial-docker-ce-base: Get:2 http://us-east-1.ec2.archive.ubuntu.com/ubuntu xenial-updates InRelease [102 kB]
    ...
    ubuntu-xenial-docker-ce-base: Get:27 http://security.ubuntu.com/ubuntu xenial-security/universe amd64 Packages [89.5 kB]
    ubuntu-xenial-docker-ce-base: Fetched 10.6 MB in 2s (4,065 kB/s)
    ubuntu-xenial-docker-ce-base: Reading package lists...
    ubuntu-xenial-docker-ce-base: Reading package lists...
    ubuntu-xenial-docker-ce-base: Building dependency tree...
    ubuntu-xenial-docker-ce-base: Reading state information...
    ubuntu-xenial-docker-ce-base: Calculating upgrade...
    ubuntu-xenial-docker-ce-base: 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
    ubuntu-xenial-docker-ce-base: Reading package lists...
    ubuntu-xenial-docker-ce-base: Building dependency tree...
    ubuntu-xenial-docker-ce-base: Reading state information...
    ubuntu-xenial-docker-ce-base: The following additional packages will be installed:
    ubuntu-xenial-docker-ce-base: aufs-tools cgroupfs-mount libltdl7
    ubuntu-xenial-docker-ce-base: Suggested packages:
    ubuntu-xenial-docker-ce-base: mountall
    ubuntu-xenial-docker-ce-base: The following NEW packages will be installed:
    ubuntu-xenial-docker-ce-base: aufs-tools cgroupfs-mount docker-ce libltdl7
    ubuntu-xenial-docker-ce-base: 0 upgraded, 4 newly installed, 0 to remove and 0 not upgraded.
    ubuntu-xenial-docker-ce-base: Need to get 19.4 MB of archives.
    ubuntu-xenial-docker-ce-base: After this operation, 89.4 MB of additional disk space will be used.
    ubuntu-xenial-docker-ce-base: Get:1 http://us-east-1.ec2.archive.ubuntu.com/ubuntu xenial/universe amd64 aufs-tools amd64 1:3.2+20130722-1.1ubuntu1 [92.9 kB]
    ...
    ubuntu-xenial-docker-ce-base: Get:4 https://download.docker.com/linux/ubuntu xenial/stable amd64 docker-ce amd64 17.03.0~ce-0~ubuntu-xenial [19.3 MB]
    ubuntu-xenial-docker-ce-base: debconf: unable to initialize frontend: Dialog
    ubuntu-xenial-docker-ce-base: debconf: (Dialog frontend will not work on a dumb terminal, an emacs shell buffer, or without a controlling terminal.)
    ubuntu-xenial-docker-ce-base: debconf: falling back to frontend: Readline
    ubuntu-xenial-docker-ce-base: debconf: unable to initialize frontend: Readline
    ubuntu-xenial-docker-ce-base: debconf: (This frontend requires a controlling tty.)
    ubuntu-xenial-docker-ce-base: debconf: falling back to frontend: Teletype
    ubuntu-xenial-docker-ce-base: dpkg-preconfigure: unable to re-open stdin:
    ubuntu-xenial-docker-ce-base: Fetched 19.4 MB in 1s (17.8 MB/s)
    ubuntu-xenial-docker-ce-base: Selecting previously unselected package aufs-tools.
    ubuntu-xenial-docker-ce-base: (Reading database ... 53844 files and directories currently installed.)
    ubuntu-xenial-docker-ce-base: Preparing to unpack .../aufs-tools_1%3a3.2+20130722-1.1ubuntu1_amd64.deb ...
    ubuntu-xenial-docker-ce-base: Unpacking aufs-tools (1:3.2+20130722-1.1ubuntu1) ...
    ...
    ubuntu-xenial-docker-ce-base: Setting up docker-ce (17.03.0~ce-0~ubuntu-xenial) ...
    ubuntu-xenial-docker-ce-base: Processing triggers for libc-bin (2.23-0ubuntu5) ...
    ubuntu-xenial-docker-ce-base: Processing triggers for systemd (229-4ubuntu16) ...
    ubuntu-xenial-docker-ce-base: Processing triggers for ureadahead (0.100.0-19) ...
    ubuntu-xenial-docker-ce-base: groupadd: group 'docker' already exists
    ubuntu-xenial-docker-ce-base: Synchronizing state of docker.service with SysV init with /lib/systemd/systemd-sysv-install...
    ubuntu-xenial-docker-ce-base: Executing /lib/systemd/systemd-sysv-install enable docker
    ubuntu-xenial-docker-ce-base: Cleanup...
    ubuntu-xenial-docker-ce-base: Reading package lists...
    ubuntu-xenial-docker-ce-base: Building dependency tree...
    ubuntu-xenial-docker-ce-base: Reading state information...
    ubuntu-xenial-docker-ce-base: 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
==> ubuntu-xenial-docker-ce-base: Stopping the source instance...
==> ubuntu-xenial-docker-ce-base: Waiting for the instance to stop...
==> ubuntu-xenial-docker-ce-base: Creating the AMI: ubuntu-xenial-docker-ce-base 1288227081
    ubuntu-xenial-docker-ce-base: AMI: ami-e9ca6eff
==> ubuntu-xenial-docker-ce-base: Waiting for AMI to become ready...
==> ubuntu-xenial-docker-ce-base: Modifying attributes on AMI (ami-e9ca6eff)...
    ubuntu-xenial-docker-ce-base: Modifying: description
==> ubuntu-xenial-docker-ce-base: Modifying attributes on snapshot (snap-058a26c0250ee3217)...
==> ubuntu-xenial-docker-ce-base: Adding tags to AMI (ami-e9ca6eff)...
==> ubuntu-xenial-docker-ce-base: Tagging snapshot: snap-043a16c0154ee3217
==> ubuntu-xenial-docker-ce-base: Creating AMI tags
==> ubuntu-xenial-docker-ce-base: Creating snapshot tags
==> ubuntu-xenial-docker-ce-base: Terminating the source AWS instance...
==> ubuntu-xenial-docker-ce-base: Cleaning up any extra volumes...
==> ubuntu-xenial-docker-ce-base: No volumes to clean up, skipping
==> ubuntu-xenial-docker-ce-base: Deleting temporary security group...
==> ubuntu-xenial-docker-ce-base: Deleting temporary keypair...
Build 'ubuntu-xenial-docker-ce-base' finished.

==> Builds finished. The artifacts of successful builds are:
--> ubuntu-xenial-docker-ce-base: AMIs were created:

us-east-1: ami-e9ca6eff

Results

The result is an Ubuntu 16.04 AMI in US East 1 with Docker CE 17.03 installed. To confirm the new AMI is now available, I will use the AWS CLI to examine the resulting AMI:

aws ec2 describe-images \
  --filters Name=tag-key,Values=ami Name=tag-value,Values=ubuntu-xenial-docker-ce-base \
  --query 'Images[*].{ID:ImageId}'

Resulting output:

{
    "Images": [
        {
            "VirtualizationType": "hvm",
            "Name": "ubuntu-xenial-docker-ce-base 1488747081",
            "Tags": [
                {
                    "Value": "ubuntu-xenial-docker-ce-base",
                    "Key": "ami"
                }
            ],
            "Hypervisor": "xen",
            "SriovNetSupport": "simple",
            "ImageId": "ami-e9ca6eff",
            "State": "available",
            "BlockDeviceMappings": [
                {
                    "DeviceName": "/dev/sda1",
                    "Ebs": {
                        "DeleteOnTermination": true,
                        "SnapshotId": "snap-048a16c0250ee3227",
                        "VolumeSize": 8,
                        "VolumeType": "gp2",
                        "Encrypted": false
                    }
                },
                {
                    "DeviceName": "/dev/sdb",
                    "VirtualName": "ephemeral0"
                },
                {
                    "DeviceName": "/dev/sdc",
                    "VirtualName": "ephemeral1"
                }
            ],
            "Architecture": "x86_64",
            "ImageLocation": "931066906971/ubuntu-xenial-docker-ce-base 1488747081",
            "RootDeviceType": "ebs",
            "OwnerId": "931066906971",
            "RootDeviceName": "/dev/sda1",
            "CreationDate": "2017-03-05T20:53:41.000Z",
            "Public": false,
            "ImageType": "machine",
            "Description": "ubuntu-xenial-docker-ce-base AMI"
        }
    ]
}

Finally, here is the new AMI as seen in the AWS EC2 Management Console:

EC2 Management Console - AMI

Terraform

To confirm Docker CE is installed and running, I can provision a new EC2 instance, using HashiCorp Terraform. This post is too short to detail all the Terraform code required to stand up a complete environment; I have included the complete code in the GitHub repo for this post. Note, the Terraform code is used only for testing. No security, such as properly configured security groups, public/private subnets, and a NAT server, is configured.

Below is a greatly abridged version of the Terraform code I used to provision a new EC2 instance, using Terraform’s aws_instance resource. The resulting EC2 instance should have Docker CE available.

# test-docker-ce instance
resource "aws_instance" "test-docker-ce" {
  connection {
    user        = "ubuntu"
    private_key = "${file("~/.ssh/test-docker-ce")}"
    timeout     = "${var.connection_timeout}"
  }

  ami               = "ami-e9ca6eff"
  instance_type     = "t2.nano"
  availability_zone = "us-east-1a"
  count             = "1"

  key_name               = "${aws_key_pair.auth.id}"
  vpc_security_group_ids = ["${aws_security_group.test-docker-ce.id}"]
  subnet_id              = "${aws_subnet.test-docker-ce.id}"

  tags {
    Owner       = "Gary A. Stafford"
    Terraform   = true
    Environment = "test-docker-ce"
    Name        = "tf-instance-test-docker-ce"
  }
}
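
A brief usage sketch for the abridged configuration above, run from the directory containing the complete Terraform code:

# preview the planned changes, then provision the test instance
terraform plan
terraform apply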

Using the AWS CLI once again, we can confirm the new EC2 instance was built from the correct AMI:

aws ec2 describe-instances \
  --filters Name='tag:Name,Values=tf-instance-test-docker-ce' \
  --output text --query 'Reservations[*].Instances[*].ImageId'

Resulting output looks good:

ami-e9ca6eff

Finally, here is the new EC2 as seen in the AWS EC2 Management Console:

EC2 Management Console - EC2
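
For reference, the SSH connection might look similar to the following, assuming the key pair referenced in the Terraform connection block and the instance’s public IP address (placeholder shown):

ssh -i ~/.ssh/test-docker-ce ubuntu@<ec2_public_ip>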

SSHing into the new EC2 instance, I should observe that the operating system is Ubuntu 16.04.2 LTS and that Docker version 17.03.0-ce is installed and running:

Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-64-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  Get cloud support with Ubuntu Advantage Cloud Guest:
    http://www.ubuntu.com/business/services/cloud

0 packages can be updated.
0 updates are security updates.

Last login: Sun Mar  5 22:06:01 2017 from <my_ip_address>

ubuntu@ip-<ec2_local_ip>:~$ docker --version
Docker version 17.03.0-ce, build 3a232c8
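
As an additional, optional check that the Docker daemon is running, and not just installed, we can query systemd:

ubuntu@ip-<ec2_local_ip>:~$ sudo systemctl is-active docker
active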

Conclusion

Docker EE and CE represent a significant step forward in expanding Docker’s enterprise-grade toolkit. Replacing or installing Docker EE or CE on your AWS AMIs is easy, using Docker’s guide along with HashiCorp Packer.

All source code for this post can be found on GitHub.

All opinions in this post are my own and not necessarily the views of my current employer or their clients.


Distributed Service Configuration with Consul, Spring Cloud, and Docker

Consul UI - Nodes

Introduction

In this post, we will explore the use of HashiCorp Consul for distributed configuration of containerized Spring Boot services, deployed to a Docker swarm cluster. In the first half of the post, we will provision a series of virtual machines, build a Docker swarm on top of those VMs, and install Consul and Registrator on each swarm host. In the second half of the post, we will configure and deploy multiple instances of a containerized Spring Boot service, backed by MongoDB, to the swarm cluster, using Docker Compose.

The final objective of this post is to have all the deployed services registered with Consul, via Registrator, and the service’s runtime configuration provided dynamically by Consul.

Objectives

  1. Provision a series of virtual machine hosts, using Docker Machine and Oracle VirtualBox
  2. Provide distributed and highly available cluster management and service orchestration, using Docker swarm mode
  3. Provide distributed and highly available service discovery, health checking, and a hierarchical key/value store, using HashiCorp Consul
  4. Provide service discovery and automatic registration of containerized services to Consul, using Registrator, Glider Labs’ service registry bridge for Docker
  5. Provide distributed configuration for containerized Spring Boot services using Consul and Pivotal Spring Cloud Consul Config
  6. Deploy multiple instances of a Spring Boot service, backed by MongoDB, to the swarm cluster, using Docker Compose version 3.

Technologies

  • Docker
  • Docker Compose (v3)
  • Docker Hub
  • Docker Machine
  • Docker swarm mode
  • Docker Swarm Visualizer (Mano Marks)
  • Glider Labs Registrator
  • Gradle
  • HashiCorp Consul
  • Java
  • MongoDB
  • Oracle VirtualBox VM Manager
  • Spring Boot
  • Spring Cloud Consul Config
  • Travis CI

Code Sources

All code in this post exists in two GitHub repositories. I have labeled each code snippet in the post with the corresponding file in the repositories. The first repository, consul-docker-swarm-compose, contains all the code necessary for provisioning the VMs and building the Docker swarm and Consul clusters. The repo also contains code for deploying Swarm Visualizer and Registrator. Make sure you clone the swarm-mode branch.

git clone --depth 1 --branch swarm-mode \
  https://github.com/garystafford/microservice-docker-demo-consul.git
cd microservice-docker-demo-consul

The second repository, microservice-docker-demo-widget, contains all the code necessary for configuring Consul and deploying the Widget service stack. Make sure you clone the consul branch.

git clone --depth 1 --branch consul \
  https://github.com/garystafford/microservice-docker-demo-widget.git
cd microservice-docker-demo-widget

Docker Versions

With the Docker toolset evolving so quickly, features frequently change or become outmoded. At the time of this post, I am running the following versions on my Mac.

  • Docker Engine version 1.13.1
  • Boot2Docker version 1.13.1
  • Docker Compose version 1.11.2
  • Docker Machine version 0.10.0
  • HashiCorp Consul version 0.7.5

Provisioning VM Hosts

First, we will provision a series of six virtual machines (aka machines, VMs, or hosts), using Docker Machine and Oracle VirtualBox.

consul-post-1

By switching Docker Machine’s driver, you can easily switch from VirtualBox to other vendors, such as VMware, AWS, or GCE. I have explicitly set several VirtualBox driver options, using the default values, for better clarity into the VirtualBox VMs being created.

vms=( "manager1" "manager2" "manager3"
      "worker1" "worker2" "worker3" )

for vm in ${vms[@]}
do
  docker-machine create \
    --driver virtualbox \
    --virtualbox-memory "1024" \
    --virtualbox-cpu-count "1" \
    --virtualbox-disk-size "20000" \
    ${vm}
done

Using the docker-machine ls command, we should observe that the resulting series of VMs looks similar to the following. Note each of the six VMs has a machine name and an IP address in the 192.168.99.0/24 range.

$ docker-machine ls
NAME       ACTIVE   DRIVER       STATE     URL                         SWARM   DOCKER    ERRORS
manager1   -        virtualbox   Running   tcp://192.168.99.109:2376           v1.13.1
manager2   -        virtualbox   Running   tcp://192.168.99.110:2376           v1.13.1
manager3   -        virtualbox   Running   tcp://192.168.99.111:2376           v1.13.1
worker1    -        virtualbox   Running   tcp://192.168.99.112:2376           v1.13.1
worker2    -        virtualbox   Running   tcp://192.168.99.113:2376           v1.13.1
worker3    -        virtualbox   Running   tcp://192.168.99.114:2376           v1.13.1

We may also view the VMs using the Oracle VM VirtualBox Manager application.

Oracle VirtualBox VM Manager

Docker Swarm Mode

Next, we will provide, amongst other capabilities, cluster management and orchestration, using Docker swarm mode. It is important to understand that the relatively new Docker swarm mode is not the same as the legacy Docker Swarm. Legacy Docker Swarm was succeeded by Docker’s integrated swarm mode, with the release of Docker v1.12.0, in July 2016.

We will create a swarm (a cluster of Docker Engines or nodes), consisting of three Manager nodes and three Worker nodes, on the six VirtualBox VMs. Using this configuration, the swarm will be distributed and highly available, able to suffer the loss of one of the Manager nodes, without failing.

consul-post-2

Manager Nodes

First, we will create the initial Docker swarm Manager node.

SWARM_MANAGER_IP=$(docker-machine ip manager1)
echo ${SWARM_MANAGER_IP}

docker-machine ssh manager1 \
  "docker swarm init \
  --advertise-addr ${SWARM_MANAGER_IP}"

This initial Manager node advertises its IP address to future swarm members.
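
As a quick, optional sanity check (a sketch; the --format flag assumes Docker 1.13+), we can confirm swarm mode is now active on manager1:

docker-machine ssh manager1 "docker info --format '{{ .Swarm.LocalNodeState }}'"
# expected output: active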

Next, we will create two additional swarm Manager nodes, which will join the initial Manager node, by using the initial Manager node’s advertised IP address. The three Manager nodes will then elect a single Leader to conduct orchestration tasks. According to Docker, Manager nodes implement the Raft Consensus Algorithm to manage the global cluster state.

vms=( "manager1" "manager2" "manager3"
 "worker1" "worker2" "worker3" )

docker-machine env manager1
eval $(docker-machine env manager1)

MANAGER_SWARM_JOIN=$(docker-machine ssh ${vms[0]} "docker swarm join-token manager")
MANAGER_SWARM_JOIN=$(echo ${MANAGER_SWARM_JOIN} | grep -E "(docker).*(2377)" -o)
MANAGER_SWARM_JOIN=$(echo ${MANAGER_SWARM_JOIN//\\/''})
echo ${MANAGER_SWARM_JOIN}

for vm in ${vms[@]:1:2}
do
  docker-machine ssh ${vm} ${MANAGER_SWARM_JOIN}
done

A quick note on the string manipulation of the MANAGER_SWARM_JOIN variable, above. Running the docker swarm join-token manager command outputs something similar to the following.

To add a manager to this swarm, run the following command:

    docker swarm join \
    --token SWMTKN-1-1s53oajsj19lgniar3x1gtz3z9x0iwwumlew0h9ism5alt2iic-1qxo1nx24pyd0pg61hr6pp47t \
    192.168.99.109:2377

Using a bit of string manipulation, the resulting value of the MANAGER_SWARM_JOIN variable will be similar to the following command. We then ssh into each host and execute this command for the Manager nodes, and a similar command for the Worker nodes.

docker swarm join --token SWMTKN-1-1s53oajsj19lgniar3x1gtz3z9x0iwwumlew0h9ism5alt2iic-1qxo1nx24pyd0pg61hr6pp47t 192.168.99.109:2377

Worker Nodes

Next, we will create three swarm Worker nodes, using a similar method. The three Worker nodes will join the three swarm Manager nodes, as part of the swarm cluster. The main difference, according to Docker, is that a Worker node’s “sole purpose is to execute containers. Worker nodes don’t participate in the Raft distributed state, make scheduling decisions, or serve the swarm mode HTTP API.”

vms=( "manager1" "manager2" "manager3"
 "worker1" "worker2" "worker3" )

WORKER_SWARM_JOIN=$(docker-machine ssh manager1 "docker swarm join-token worker")
WORKER_SWARM_JOIN=$(echo ${WORKER_SWARM_JOIN} | grep -E "(docker).*(2377)" -o)
WORKER_SWARM_JOIN=$(echo ${WORKER_SWARM_JOIN//\\/''})
echo ${WORKER_SWARM_JOIN}

for vm in ${vms[@]:3:3}
do
  docker-machine ssh ${vm} ${WORKER_SWARM_JOIN}
done

Using the docker node ls command, we should observe the resulting Docker swarm cluster looks similar to the following:

$ docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
u75gflqvu7blmt5mcr2dcp4g5 *  manager1  Ready   Active        Leader
0b7pkek76dzeedtqvvjssdt2s    manager2  Ready   Active        Reachable
s4mmra8qdgwukjsfg007rvbge    manager3  Ready   Active        Reachable
n89upik5v2xrjjaeuy9d4jybl    worker1   Ready   Active
nsy55qzavzxv7xmanraijdw4i    worker2   Ready   Active
hhn1l3qhej0ajmj85gp8qhpai    worker3   Ready   Active

Note the three swarm Manager nodes, three swarm Worker nodes, and the Manager, which was elected Leader. The other two Manager nodes are marked as Reachable. The asterisk indicates manager1 is the active machine.

I have also deployed Mano Marks’ Docker Swarm Visualizer to each of the swarm cluster’s three Manager nodes. This tool is described as a visualizer for Docker Swarm Mode using the Docker Remote API, Node.JS, and D3. It provides a great visualization of the swarm cluster and its running components.

docker service create \
  --name swarm-visualizer \
  --publish 5001:8080/tcp \
  --constraint node.role==manager \
  --mode global \
  --mount type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock \
  manomarks/visualizer:latest

The Swarm Visualizer should be available on any of the Manager’s IP addresses, on port 5001. We constrained the Visualizer to the Manager nodes, by using the node.role==manager constraint.
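
Optionally, before opening it in a browser, we can confirm the Visualizer is responding on one of the Manager hosts:

# a HEAD request against the Visualizer's published port on manager1
curl -I "http://$(docker-machine ip manager1):5001"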

Docker Swarm Visualizer

HashiCorp Consul

Next, we will install HashiCorp Consul onto our swarm cluster of VirtualBox hosts. Consul will provide us with service discovery, health checking, and a hierarchical key/value store. Consul will be installed, such that we end up with six Consul Agents. Agents can run as Servers or Clients. Similar to Docker swarm mode, we will install three Consul servers and three Consul clients, all in one Consul datacenter (our set of six local VMs). This clustered configuration will ensure Consul is distributed and highly available, able to suffer the loss of one of the Consul server instances, without failing.

consul-post-3

Consul Servers

Again, similar to Docker swarm mode, we will install the initial Consul server. Both the Consul servers and clients run inside Docker containers, one per swarm host.

consul_server="consul-server1"
docker-machine env manager1
eval $(docker-machine env manager1)

docker run -d \
  --net=host \
  --hostname ${consul_server} \
  --name ${consul_server} \
  --env "SERVICE_IGNORE=true" \
  --env "CONSUL_CLIENT_INTERFACE=eth0" \
  --env "CONSUL_BIND_INTERFACE=eth1" \
  --volume consul_data:/consul/data \
  --publish 8500:8500 \
  consul:latest \
  consul agent -server -ui -bootstrap-expect=3 -client=0.0.0.0 -advertise=${SWARM_MANAGER_IP} -data-dir="/consul/data"

The first Consul server advertises itself on its host’s IP address. The -bootstrap-expect=3 option instructs Consul to wait until three Consul servers are available before bootstrapping the cluster. This option also allows an initial Leader to be elected automatically. The three Consul servers form a consensus quorum, using the Raft consensus algorithm.

All Consul server and Consul client Docker containers will have a data directory (/consul/data), mapped to a volume (consul_data) on their corresponding VM hosts.

Consul provides a basic browser-based user interface. By using the -ui option, each Consul server will have the UI available on port 8500.

Next, we will create two additional Consul Server instances, which will join the initial Consul server, by using the first Consul server’s advertised IP address.

vms=( "manager1" "manager2" "manager3"
 "worker1" "worker2" "worker3" )

consul_servers=( "consul-server2" "consul-server3" )

i=0
for vm in ${vms[@]:1:2}
do
  docker-machine env ${vm}
  eval $(docker-machine env ${vm})

  docker run -d \
    --net=host \
    --hostname ${consul_servers[i]} \
    --name ${consul_servers[i]} \
    --env "SERVICE_IGNORE=true" \
    --env "CONSUL_CLIENT_INTERFACE=eth0" \
    --env "CONSUL_BIND_INTERFACE=eth1" \
    --volume consul_data:/consul/data \
    --publish 8500:8500 \
    consul:latest \
    consul agent -server -ui -client=0.0.0.0 -advertise='{{ GetInterfaceIP "eth1" }}' -retry-join=${SWARM_MANAGER_IP} -data-dir="/consul/data"
  let "i++"
done
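
With all three servers running, an optional check against Consul’s HTTP API (a quick sketch, using the status endpoint) confirms a cluster Leader has been elected:

# returns the elected Leader's address, for example "192.168.99.109:8300"
curl "$(docker-machine ip manager1):8500/v1/status/leader"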

Consul Clients

Next, we will install Consul clients on the three swarm worker nodes, using a similar method to the servers. According to Consul, the client “is relatively stateless. The only background activity a client performs is taking part in the LAN gossip pool.” The lack of the -server option indicates Consul will install this agent as a client.

vms=( "manager1" "manager2" "manager3"
 "worker1" "worker2" "worker3" )

consul_clients=( "consul-client1" "consul-client2" "consul-client3" )

i=0
for vm in ${vms[@]:3:3}
do
  docker-machine env ${vm}
  eval $(docker-machine env ${vm})
  docker rm -f $(docker ps -a -q)

  docker run -d \
    --net=host \
    --hostname ${consul_clients[i]} \
    --name ${consul_clients[i]} \
    --env "SERVICE_IGNORE=true" \
    --env "CONSUL_CLIENT_INTERFACE=eth0" \
    --env "CONSUL_BIND_INTERFACE=eth1" \
    --volume consul_data:/consul/data \
    consul:latest \
    consul agent -client=0.0.0.0 -advertise='{{ GetInterfaceIP "eth1" }}' -retry-join=${SWARM_MANAGER_IP} -data-dir="/consul/data"
  let "i++"
done

From inside the consul-server1 Docker container, on the manager1 host, using the consul members command, we should observe a current list of cluster members along with their current state.

$ docker exec -it consul-server1 consul members
Node            Address              Status  Type    Build  Protocol  DC
consul-client1  192.168.99.112:8301  alive   client  0.7.5  2         dc1
consul-client2  192.168.99.113:8301  alive   client  0.7.5  2         dc1
consul-client3  192.168.99.114:8301  alive   client  0.7.5  2         dc1
consul-server1  192.168.99.109:8301  alive   server  0.7.5  2         dc1
consul-server2  192.168.99.110:8301  alive   server  0.7.5  2         dc1
consul-server3  192.168.99.111:8301  alive   server  0.7.5  2         dc1

Note the three Consul servers, the three Consul clients, and the single datacenter, dc1. Also, note the address of each agent matches the IP address of their swarm hosts.

To further validate Consul is installed correctly, access Consul’s web-based UI using the IP address of any of the three Consul servers’ swarm hosts, on port 8500. Shown below is the initial view of Consul, prior to services being registered. Note the Consul instances’ names. Also, notice the IP address of each Consul instance corresponds to the IP address of the swarm host on which it resides.

Consul UI - Nodes
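
Alternatively, the same node list is available from Consul’s HTTP API, without the UI:

# list the cluster's nodes via the Consul catalog API
curl "$(docker-machine ip manager1):8500/v1/catalog/nodes"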

Service Container Registration

Having provisioned the VMs, and built the Docker swarm cluster and Consul cluster, there is one final step to prepare our environment for the deployment of containerized services. We want to provide service discovery and registration for Consul, using Glider Labs’ Registrator.

Registrator will be installed on each host, within a Docker container. According to Glider Labs’ website, “Registrator will automatically register and deregister services for any Docker container, as the containers come online.”

We will install Registrator on only five of our six hosts. We will not install it on manager1. Since this host is already serving as both the Docker swarm Leader and Consul Leader, we will not be installing any additional service containers on that host. This is a personal choice, not a requirement.

vms=( "manager1" "manager2" "manager3"
      "worker1" "worker2" "worker3" )

for vm in ${vms[@]:1}
do
  docker-machine env ${vm}
  eval $(docker-machine env ${vm})

  HOST_IP=$(docker-machine ip ${vm})
  # echo ${HOST_IP}

  docker run -d \
    --name=registrator \
    --net=host \
    --volume=/var/run/docker.sock:/tmp/docker.sock \
    gliderlabs/registrator:latest \
      -internal consul://${HOST_IP:-localhost}:8500
done

Multiple Service Discovery Options

You might be wondering why we are worried about service discovery and registration with Consul, using Registrator, considering Docker swarm mode already has service discovery. Deploying our services using swarm mode, we are relying on swarm mode for service discovery. I chose to include Registrator in this example, to demonstrate an alternative method of service discovery, which can be used with other tools such as Consul Template, for dynamic load-balancer templating.

We actually have a third option for automatic service registration, Spring Cloud Consul Discovery. I chose not to use Spring Cloud Consul Discovery in this post to register the Widget service. Spring Cloud Consul Discovery would have automatically registered the Spring Boot service with Consul. The actual Widget service stack contains MongoDB, as well as other non-Spring Boot service components which I removed for this post, such as the NGINX load balancer. Using Registrator, all the containerized services, not only the Spring Boot services, are automatically registered.

You will note in the Widget source code, I commented out the @EnableDiscoveryClient annotation on the WidgetApplication class. If you want to use Spring Cloud Consul Discovery, simply uncomment this annotation.

Distributed Configuration

We are almost ready to deploy some services. For this post’s demonstration, we will deploy the Widget service stack. The Widget service stack is composed of a simple Spring Boot service, backed by MongoDB; it is easily deployed as a containerized application. I often use the service stack for testing and training.

Widgets represent inanimate objects that users purchase with points. Widgets have particular physical characteristics, such as product id, name, color, size, and price. The inventory of widgets is stored in the widgets MongoDB database.

Hierarchical Key/Value Store

Before we can deploy the Widget service, we need to store the Widget service’s Spring Profiles in Consul’s hierarchical key/value store. The Widget service’s profiles are sets of configuration the service needs to run in different environments, such as Development, QA, UAT, and Production. Profiles include configuration, such as port assignments, database connection information, and logging and security settings. The service’s active profile will be read by the service at startup, from Consul, and applied at runtime.

As opposed to the traditional Java properties key/value format, the Widget service uses YAML to specify its hierarchical configuration data. Using Spring Cloud Consul Config, an alternative to Spring Cloud Config Server and Client, we will store the service’s Spring Profiles in Consul as blobs of YAML.

There are multiple ways to store configuration in Consul. Storing the Widget service’s profiles as YAML was the quickest method of migrating the existing application.yml file’s multiple profiles to Consul. I am not insisting YAML is the most effective method of storing configuration in Consul’s k/v store; that depends on the application’s requirements.

Using the Consul KV HTTP API’s PUT method, we will place each profile, as a YAML blob, into the appropriate data key in Consul. For convenience, I have separated the Widget service’s three existing Spring profiles into individual YAML files. We will load each YAML file’s contents into Consul, using curl.

Below is the Widget service’s default Spring profile. The default profile is activated if no other profiles are explicitly active when the application context starts.

endpoints:
  enabled: true
  sensitive: false
info:
  java:
    source: "${java.version}"
    target: "${java.version}"
logging:
  level:
    root: DEBUG
management:
  security:
    enabled: false
  info:
    build:
      enabled: true
    git:
      mode: full
server:
  port: 8030
spring:
  data:
    mongodb:
      database: widgets
      host: localhost
      port: 27017

Below is the Widget service’s docker-local Spring profile. This is the profile we will use when we deploy the Widget service to our swarm cluster. The docker-local profile overrides two properties of the default profile — the name of our MongoDB host and the logging level. All other configuration will come from the default profile.

logging:
  level:
    root: INFO
spring:
  data:
    mongodb:
      host: mongodb

To load the profiles into Consul, from the root of the Widget local git repository, we will execute curl commands. We will use the Consul cluster Leader’s IP address for our HTTP PUT methods.

docker-machine env manager1
eval $(docker-machine env manager1)

CONSUL_SERVER=$(docker-machine ip $(docker node ls | grep Leader | awk '{print $3}'))

# default profile
KEY="config/widget-service/data"
VALUE="consul-configs/default.yaml"
curl -X PUT --data-binary @${VALUE} \
  -H "Content-type: text/x-yaml" \
  ${CONSUL_SERVER:-localhost}:8500/v1/kv/${KEY}

# docker-local profile
KEY="config/widget-service/docker-local/data"
VALUE="consul-configs/docker-local.yaml"
curl -X PUT --data-binary @${VALUE} \
  -H "Content-type: text/x-yaml" \
  ${CONSUL_SERVER:-localhost}:8500/v1/kv/${KEY}

# docker-production profile
KEY="config/widget-service/docker-production/data"
VALUE="consul-configs/docker-production.yaml"
curl -X PUT --data-binary @${VALUE} \
  -H "Content-type: text/x-yaml" \
  ${CONSUL_SERVER:-localhost}:8500/v1/kv/${KEY}

Returning to the Consul UI, we should now observe three Spring profiles, in the appropriate data keys, listed under the Key/Value tab. The default Spring profile YAML blob, the value, will be assigned to the config/widget-service/data key.

Consul UI - Docker Default Spring Profile

The docker-local profile will be assigned to the config/widget-service,docker-local/data key. The keys follow default spring.cloud.consul.config conventions.

Consul UI - Docker Local Spring Profile
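
As an alternative to the UI, we can also read a stored profile back directly, using the Consul KV HTTP API’s raw mode (a quick check, reusing the CONSUL_SERVER variable and the default profile’s key from the PUT commands above):

# returns the stored YAML blob for the default profile
curl "${CONSUL_SERVER:-localhost}:8500/v1/kv/config/widget-service/data?raw"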

Spring Cloud Consul Config

In order for our Spring Boot service to connect to Consul and load the requested active Spring Profile, we need to add a dependency on Spring Cloud Consul Config to the build.gradle file.

dependencies {
  compile group: 'org.springframework.cloud', name: 'spring-cloud-starter-consul-all';
  ...
}

Next, we need to configure the bootstrap.yml file, to connect and properly read the profile. We must properly set the CONSUL_SERVER environment variable. This value is the Consul server instance hostname or IP address, which the Widget service instances will contact to retrieve its configuration.

spring:
  application:
    name: widget-service
  cloud:
    consul:
      host: ${CONSUL_SERVER:localhost}
      port: 8500
      config:
        fail-fast: true
        format: yaml

Deployment Using Docker Compose

We are almost ready to deploy the Widget service instances and the MongoDB instance (also considered a ‘service’ by Docker), in Docker containers, to the Docker swarm cluster. The Docker Compose file is written using version 3 of the Docker Compose specification. It takes advantage of some of the specification’s new features.

version: '3.0'

services:
  widget:
    image: garystafford/microservice-docker-demo-widget:latest
    depends_on:
    - widget_stack_mongodb
    hostname: widget
    ports:
    - 8030:8030/tcp
    networks:
    - widget_overlay_net
    deploy:
      mode: global
      placement:
        constraints: [node.role == worker]
    environment:
    - "CONSUL_SERVER_URL=${CONSUL_SERVER}"
    - "SERVICE_NAME=widget-service"
    - "SERVICE_TAGS=service"
    command: "java -Dspring.profiles.active=${WIDGET_PROFILE} -Djava.security.egd=file:/dev/./urandom -jar widget/widget-service.jar"

  mongodb:
    image: mongo:latest
    command:
    - --smallfiles
    hostname: mongodb
    ports:
    - 27017:27017/tcp
    networks:
    - widget_overlay_net
    volumes:
    - widget_data_vol:/data/db
    deploy:
      replicas: 1
      placement:
        constraints: [node.role == worker]
    environment:
    - "SERVICE_NAME=mongodb"
    - "SERVICE_TAGS=database"

networks:
  widget_overlay_net:
    external: true

volumes:
  widget_data_vol:
    external: true

External Container Volumes and Network

Note the external volume, widget_data_vol, which will be mounted to the MongoDB container, to the /data/db directory. The volume must be created on each host in the swarm cluster, which may contain the MongoDB instance.

Also, note the external overlay network, widget_overlay_net, which will be used by all the service containers in the service stack to communicate with each other. These must be created before deploying our services.

vms=( "manager1" "manager2" "manager3"
 "worker1" "worker2" "worker3" )

for vm in ${vms[@]}
do
  docker-machine env ${vm}
  eval $(docker-machine env ${vm})
  docker volume create --name=widget_data_vol
  echo "Volume created: ${vm}..."
done

We should be able to view the new volume on one of the swarm Worker hosts, using the docker volume ls command.

$ docker volume ls
DRIVER              VOLUME NAME
local               widget_data_vol

The network is created once, for the whole swarm cluster.

docker-machine env manager1
eval $(docker-machine env manager1)

docker network create \
  --driver overlay \
  --subnet=10.0.0.0/16 \
  --ip-range=10.0.11.0/24 \
  --opt encrypted \
  --attachable=true \
  widget_overlay_net

We can view the new overlay network using the docker network ls command.

$ docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
03bcf76d3cc4        bridge              bridge              local
533285df0ce8        docker_gwbridge     bridge              local
1f627d848737        host                host                local
6wdtuhwpuy4f        ingress             overlay             swarm
b8a1f277067f        none                null                local
4hy77vlnpkkt        widget_overlay_net  overlay             swarm

Deploying the Services

With the profiles loaded into Consul, and the overlay network and data volumes created on the hosts, we can deploy the Widget service instances and MongoDB service instance. We will assign the Consul cluster Leader’s IP address as the CONSUL_SERVER variable. We will assign the Spring profile, docker-local, to the WIDGET_PROFILE variable. Both these variables are used by the Widget service’s Docker Compose file.

docker-machine env manager1
eval $(docker-machine env manager1)

export CONSUL_SERVER=$(docker-machine ip $(docker node ls | grep Leader | awk '{print $3}'))
export WIDGET_PROFILE=docker-local

docker stack deploy --compose-file=docker-compose.yml widget_stack

The Docker images, used to instantiate the service’s Docker containers, must be pulled from Docker Hub, by each swarm host. Consequently, the initial deployment process can take up to several minutes, depending on your Internet connection.

After deployment, using the docker stack ls command, we should observe that the widget_stack stack is deployed to the swarm cluster with two services.

$ docker stack ls
NAME          SERVICES
widget_stack  2

After deployment, using the docker stack ps widget_stack command, we should observe that all the expected services in the widget_stack stack are running. Note this command also shows us where the services are running.

$ docker stack ps widget_stack
ID            NAME                                           IMAGE                                                NODE     DESIRED STATE  CURRENT STATE          ERROR  PORTS
qic4j1pl6n4l  widget_stack_widget.hhn1l3qhej0ajmj85gp8qhpai  garystafford/microservice-docker-demo-widget:latest  worker3  Running        Running 4 minutes ago
zrbnyoikncja  widget_stack_widget.nsy55qzavzxv7xmanraijdw4i  garystafford/microservice-docker-demo-widget:latest  worker2  Running        Running 4 minutes ago
81z3ejqietqf  widget_stack_widget.n89upik5v2xrjjaeuy9d4jybl  garystafford/microservice-docker-demo-widget:latest  worker1  Running        Running 4 minutes ago
nx8dxlib3wyk  widget_stack_mongodb.1                         mongo:latest                                         worker1  Running        Running 4 minutes ago

Using the docker service ls command, we should observe that all the expected service instances (replicas) are running, including all the widget_stack’s services and the three instances of the swarm-visualizer service.

$ docker service ls
ID            NAME                  MODE        REPLICAS  IMAGE
almmuqqe9v55  swarm-visualizer      global      3/3       manomarks/visualizer:latest
i9my74tl536n  widget_stack_widget   global      0/3       garystafford/microservice-docker-demo-widget:latest
ju2t0yjv9ily  widget_stack_mongodb  replicated  1/1       mongo:latest

Since it may take a few moments for the widget_stack’s services to come up, you may need to re-run the command until you see all expected replicas running.
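
Alternatively, rather than manually re-running the command, a simple polling loop can be used (a sketch; it assumes the watch utility is available locally):

# poll the service list every five seconds until all replicas report as running
watch -n 5 docker service ls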

$ docker service ls
ID            NAME                  MODE        REPLICAS  IMAGE
almmuqqe9v55  swarm-visualizer      global      3/3       manomarks/visualizer:latest
i9my74tl536n  widget_stack_widget   global      3/3       garystafford/microservice-docker-demo-widget:latest
ju2t0yjv9ily  widget_stack_mongodb  replicated  1/1       mongo:latest

If you see 3 of 3 replicas of the Widget service running, this is a good sign everything is working! We can confirm this by checking back in with the Consul UI. We should see the three Widget services and the single instance of MongoDB; they were all registered with Registrator. For clarity, we purposely did not register the Swarm Visualizer instances with Consul, using Registrator’s "SERVICE_IGNORE=true" environment variable. Registrator will also not register itself with Consul.

Consul UI - Services

We can also view the Widget stack’s services, by revisiting the Swarm Visualizer, on any of the Docker swarm cluster Manager’s IP address, on port 5001.

Docker Swarm Visualizer - Deployed Services

Spring Boot Actuator

The most reliable way to confirm that the Widget service instances did indeed load their configuration from Consul, is to check the Spring Boot Actuator Environment endpoint for any of the Widget instances. As shown below, all configuration is loaded from the default profile, except for two configuration items, which override the default profile values and are loaded from the active docker-local Spring profile.

Spring Actuator Environment Endpoint
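
The same check can be scripted with curl against the Actuator /env endpoint (a sketch; it assumes the default server port of 8030 from the default profile, reaching a Widget instance through worker1’s published port):

# inspect the active profiles and property sources of a running Widget instance
curl "http://$(docker-machine ip worker1):8030/env"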

Consul Endpoint

There is yet another way to confirm Consul is working properly with our services, without accessing the Consul UI. This ability is provided by Spring Boot Actuator’s Consul endpoint. If the Spring Boot service has a dependency on Spring Cloud Consul Config, this endpoint will be available. It displays useful information about the Consul cluster and the services which have been registered with Consul.

Spring Actuator Consul Endpoint

Cleaning Up the Swarm

If you want to start over again, without destroying the Docker swarm cluster, you may use the following commands to delete all the stacks, services, stray containers, and unused images, networks, and volumes. The Docker swarm cluster and all Docker images pulled to the VMs will be left intact.

# remove all stacks and services
docker-machine env manager1
eval $(docker-machine env manager1)
for stack in $(docker stack ls | awk '{print $1}'); do docker stack rm ${stack}; done
for service in $(docker service ls | awk '{print $1}'); do docker service rm ${service}; done

# remove all containers, networks, and volumes
vms=( "manager1" "manager2" "manager3"
 "worker1" "worker2" "worker3" )

for vm in ${vms[@]}
do
  docker-machine env ${vm}
  eval $(docker-machine env ${vm})
  docker system prune -f
  docker stop $(docker ps -a -q)
  docker rm -f $(docker ps -a -q)
done

Conclusion

We have barely scratched the surface of the features and capabilities of Docker swarm mode or Consul. However, the post did demonstrate several key concepts, critical to configuring, deploying, and managing a modern, distributed, containerized service application platform. These concepts included cluster management, service discovery, service orchestration, distributed configuration, hierarchical key/value stores, and distributed and highly-available systems.

This post’s example was designed for demonstration purposes only and meant to simulate a typical Development or Test environment. Although the swarm cluster, Consul cluster, and the Widget service instances were deployed in a distributed and highly available configuration, this post’s example is far from being production-ready. Some things that would be considered, if you were to make this more production-like, include:

  • The MongoDB database is a single point of failure. It should be deployed in a sharded and clustered configuration.
  • The swarm nodes were deployed to a single datacenter. For redundancy, the nodes should be spread across multiple physical hypervisors, separate availability zones and/or geographically separate datacenters (regions).
  • The example needs centralized logging, monitoring, and alerting, to better understand and react to how the swarm, Docker containers, and services are performing.
  • Most importantly, we made no attempt to secure the services, containers, data, network, or hosts. Security is a critical component for moving this example to Production.

All opinions in this post are my own and not necessarily the views of my current employer or their clients.


Infrastructure as Code Maturity Model

Systematically Evolving an Organization’s Infrastructure

Infrastructure and software development teams are increasingly building and managing infrastructure using automated tools that have been described as “infrastructure as code.” – Kief Morris (Infrastructure as Code)

The process of managing and provisioning computing infrastructure and their configuration through machine-processable, declarative, definition files, rather than physical hardware configuration or the use of interactive configuration tools. – Wikipedia (abridged)

Convergence of CD, Cloud, and IaC

In 2011, co-authors Jez Humble, formerly of ThoughtWorks, and David Farley published their ground-breaking book, Continuous Delivery. Humble and Farley’s book set out, in their words, to automate the ‘painful, risky, and time-consuming process’ of the software ‘build, deployment, and testing process.’

cd_image_02

Over the next five years, Humble and Farley’s Continuous Delivery made a significant contribution to the modern phenomenon of DevOps. According to Wikipedia, DevOps is the ‘culture, movement or practice that emphasizes the collaboration and communication of both software developers and other information-technology (IT) professionals while automating the process of software delivery and infrastructure changes.’

In parallel with the growth of DevOps, Cloud Computing continued to grow at an explosive rate. Amazon pioneered modern cloud computing in 2006 with the launch of its Elastic Compute Cloud. Two years later, in 2008, Microsoft launched its cloud platform, Azure. In 2010, Rackspace launched OpenStack.

Today, there is a flock of ‘cloud’ providers. Their services fall into three primary service models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). Since we will be discussing infrastructure, we will focus on IaaS and PaaS. Leaders in this space include Google Cloud Platform, RedHat, Oracle Cloud, Pivotal Cloud Foundry, CenturyLink Cloud, Apprenda, IBM SmartCloud Enterprise, and Heroku, to mention just a few.

Finally, fast forward to June 2016, O’Reilly releases Infrastructure as Code: Managing Servers in the Cloud, by Kief Morris, ThoughtWorks. This crucial work bridges many of the concepts first introduced in Humble and Farley’s Continuous Delivery, with the evolving processes and practices to support cloud computing.

cd_image_03

This post examines how to apply the principles found in the Continuous Delivery Maturity Model, an analysis tool detailed in Humble and Farley’s Continuous Delivery, and discussed herein, to the best practices found in Morris’ Infrastructure as Code.

Infrastructure as Code

Before we continue, we need a shared understanding of infrastructure as code. Below are four examples of infrastructure as code, as Wikipedia defined them, ‘machine-processable, declarative, definition files.’ The code was written using four popular tools, including HashiCorp Packer, Docker, AWS CloudFormation, and HashiCorp Terraform. Executing the code provisions virtualized cloud infrastructure.

HashiCorp Packer

Packer definition of an AWS EBS-backed AMI, based on Ubuntu.

{
  "variables": {
    "aws_access_key": "",
    "aws_secret_key": ""
  },
  "builders": [{
    "type": "amazon-ebs",
    "access_key": "{{user `aws_access_key`}}",
    "secret_key": "{{user `aws_secret_key`}}",
    "region": "us-east-1",
    "source_ami": "ami-fce3c696",
    "instance_type": "t2.micro",
    "ssh_username": "ubuntu",
    "ami_name": "packer-example {{timestamp}}"
  }]
}
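
A brief, hedged usage sketch, assuming the definition above is saved as packer-example.json:

# validate the template, then build the AMI, supplying the two variables
packer validate packer-example.json
packer build \
  -var 'aws_access_key=YOUR_ACCESS_KEY' \
  -var 'aws_secret_key=YOUR_SECRET_KEY' \
  packer-example.json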

Docker

Dockerfile, used to create a Docker image, and subsequently a Docker container, running MongoDB.

FROM ubuntu:16.04
MAINTAINER Docker
RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv EA312927
RUN echo "deb http://repo.mongodb.org/apt/ubuntu \
$(cat /etc/lsb-release | grep DISTRIB_CODENAME | cut -d= -f2)/mongodb-org/3.2 multiverse" | \
tee /etc/apt/sources.list.d/mongodb-org-3.2.list
RUN apt-get update && apt-get install -y mongodb-org
RUN mkdir -p /data/db
EXPOSE 27017
ENTRYPOINT ["/usr/bin/mongod"]
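
A brief usage sketch, assuming the Dockerfile above sits in the current directory (the image and container names are arbitrary):

# build the image and run a detached MongoDB container from it
docker build -t mongodb-demo .
docker run -d --name mongodb -p 27017:27017 mongodb-demo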

AWS CloudFormation

AWS CloudFormation declaration for three services enabled on a running instance.

services:
  sysvinit:
    nginx:
      enabled: "true"
      ensureRunning: "true"
      files:
        - "/etc/nginx/nginx.conf"
      sources:
        - "/var/www/html"
    php-fastcgi:
      enabled: "true"
      ensureRunning: "true"
      packages:
        yum:
          - "php"
          - "spawn-fcgi"
    sendmail:
      enabled: "false"
      ensureRunning: "false"

HashiCorp Terraform

Terraform definition of an AWS m1.small EC2 instance, running NGINX on Ubuntu.

resource "aws_instance" "web" {
  connection {
    user = "ubuntu"
  }

  instance_type = "m1.small"
  ami           = "${lookup(var.aws_amis, var.aws_region)}"
  key_name      = "${aws_key_pair.auth.id}"

  vpc_security_group_ids = ["${aws_security_group.default.id}"]
  subnet_id              = "${aws_subnet.default.id}"

  provisioner "remote-exec" {
    inline = [
      "sudo apt-get -y update",
      "sudo apt-get -y install nginx",
      "sudo service nginx start",
    ]
  }
}

Cloud-based Infrastructure as a Service

The previous examples provide but the narrowest of views into the potential breadth of infrastructure as code. Leading cloud providers, such as Amazon and Microsoft, offer hundreds of unique offerings, most of which may be defined and manipulated through code — infrastructure as code.

cd_image_05

cd_image_04

What Infrastructure Can Be Defined as Code?

The question many ask is, what types of infrastructure can be defined as code? Although vendors and cloud providers have their unique names and descriptions, most infrastructure is divided into a few broad categories:

  • Compute
  • Databases, Caching, and Messaging
  • Storage, Backup, and Content Delivery
  • Networking
  • Security and Identity
  • Monitoring, Logging, and Analytics
  • Management Tooling

Continuous Delivery Maturity Model

We also need a common understanding of the Continuous Delivery Maturity Model. According to Humble and Farley, the Continuous Delivery Maturity Model was distilled as a model that ‘helps to identify where an organization stands in terms of the maturity of its processes and practices and defines a progression that an organization can work through to improve.’

The Continuous Delivery Maturity Model is a 5×6 matrix, consisting of six areas of practice and five levels of maturity. Each of the matrix’s 30 elements defines a required discipline an organization needs to follow, to be considered at that level of maturity within that practice.

Areas of Practice

The CD Maturity Model examines six broad areas of practice found in most enterprise software organizations:

  • Build Management and Continuous Integration
  • Environments and Deployment
  • Release Management and Compliance
  • Testing
  • Data Management
  • Configuration Management

Levels of Maturity

The CD Maturity Model defines five levels of increasing maturity, from a score of -1 to 3, from Regressive to Optimizing:

  • Level 3: Optimizing – Focus on process improvement
  • Level 2: Quantitatively Managed – Process measured and controlled
  • Level 1: Consistent – Automated processes applied across whole application lifecycle
  • Level 0: Repeatable – Process documented and partly automated
  • Level -1: Regressive – Processes unrepeatable, poorly controlled, and reactive

cd_image_06

Maturity Model Analysis

The CD Maturity Model is an analysis tool. In my experience, organizations use the maturity model in one of two ways. First, an organization completes an impartial evaluation of their existing levels of maturity across all areas of practice. Then, the organization focuses on improving the overall organization’s maturity, attempting to achieve a consistent level of maturity across all areas of practice. Alternately, the organization concentrates on a subset of the practices, which have the greatest business value, or given their relative immaturity, are a detriment to the other practices.

cd_image_01

* CD Maturity Model Analysis Tool available on GitHub.

Infrastructure as Code Maturity Levels

Although infrastructure as code is not explicitly called out as a practice in the CD Maturity Model, many of its best practices can be found in the maturity model. For example, the model prescribes automated environment provisioning, orchestrated deployments, and the use of metrics for continuous improvement.

Instead of trying to retrofit infrastructure as code into the existing CD Maturity Model, I believe it is more effective to independently apply the model’s five levels of maturity to infrastructure as code. To that end, I have selected many of the best practices from the book, Infrastructure as Code, as well as from my experiences. Those selected practices have been distributed across the model’s five levels of maturity.

The result is the first pass at an evolving Infrastructure as Code Maturity Model. This model may be applied alongside the broader CD Maturity Model, or independently, to evaluate and further develop an organization’s infrastructure practices.

IaC Level -1: Regressive

Processes unrepeatable, poorly controlled, and reactive

  • Limited infrastructure is provisioned and managed as code
  • Infrastructure provisioning still requires many manual processes
  • Infrastructure code is not written using industry-standard tooling and patterns
  • Infrastructure code not built, unit-tested, provisioned and managed, as part of a pipeline
  • Infrastructure code, processes, and procedures are inconsistently documented, and not available to all required parties

IaC Level 0: Repeatable

Processes documented and partly automated

  • All infrastructure code and configuration are stored in a centralized version control system
  • Testing, provisioning, and management of infrastructure are done as part of an automated pipeline
  • Infrastructure is deployable as individual components
  • Leverages programmatic interfaces into physical devices
  • Automated security inspection of components and dependencies
  • Self-service CLI or API, where internal customers provision their resources
  • All code, processes, and procedures documented and available
  • Immutable infrastructure and processes

IaC Level 1: Consistent

Automated processes applied across whole application lifecycle

  • Fully automated provisioning and management of infrastructure
  • Minimal use of unsupported, ‘home-grown’ infrastructure tooling
  • Unit-tests meet code-coverage requirements
  • Code is continuously tested upon every check-in to version control system
  • Continuously available infrastructure using zero-downtime provisioning
  • Uses configuration registries
  • Templatized configuration files (no awk/sed magic)
  • Secrets are securely managed
  • Auto-scaling based on user-defined load characteristics

IaC Level 2: Quantitatively Managed

Processes measured and controlled

  • Uses infrastructure definition files
  • Capable of automated rollbacks
  • Infrastructure and supporting systems are highly available and fault tolerant
  • Externalized configuration, no black box API to modify configuration
  • Fully monitored infrastructure with configurable alerting
  • Aggregated, auditable infrastructure logging
  • All code, processes, and procedures are well documented in a Knowledge Management System
  • Infrastructure code uses a declarative, versus imperative, programming model (maybe…)

IaC Level 3: Optimizing

Focus on process improvement

  • Self-healing, self-configurable, self-optimizing, infrastructure
  • Performance tested and monitored against business KPIs
  • Maximal infrastructure utilization and workload density
  • Adheres to Cloud Native and 12-Factor patterns
  • Cloud-agnostic code that minimizes cloud vendor lock-in

All opinions in this post are my own and not necessarily the views of my current employer or their clients.


Spring Music Revisited: Java-Spring-MongoDB Web App with Docker 1.12

Build, test, deploy, and monitor a multi-container, MongoDB-backed, Java Spring web application, using the new Docker 1.12.

Spring Music Infrastructure

Introduction

** This post and associated project code were updated 9/3/2016 to use Tomcat 8.5.4 with OpenJDK 8.**

This post and the post’s example project represent an update to a previous post, Build and Deploy a Java-Spring-MongoDB Application using Docker. This new post incorporates many improvements made in Docker 1.12, including the use of the new Docker Compose v2 YAML format. The post’s project was also updated to use Filebeat with ELK, as opposed to Logspout, which was used previously.

In this post, we will demonstrate how to build, test, deploy, and manage a Java Spring web application, hosted on Apache Tomcat, load-balanced by NGINX, monitored by ELK with Filebeat, and all containerized with Docker.

We will use a sample Java Spring application, Spring Music, available on GitHub from Cloud Foundry. The Spring Music sample record album collection application was originally designed to demonstrate the use of database services on Cloud Foundry, using the Spring Framework. Instead of Cloud Foundry, we will host the Spring Music application locally, using Docker on VirtualBox, and optionally on AWS.

All files necessary to build this project are stored on the docker_v2 branch of the garystafford/spring-music-docker repository on GitHub. The Spring Music source code is stored on the springmusic_v2 branch of the garystafford/spring-music repository, also on GitHub.

Spring Music Application

Application Architecture

The Java Spring Music application stack contains the following technologies: Java, Spring Framework, AngularJS, Bootstrap, jQuery, NGINX, Apache Tomcat, MongoDB, the ELK Stack, and Filebeat. Testing frameworks include the Spring MVC Test Framework, Mockito, Hamcrest, and JUnit.

A few changes were made to the original Spring Music application to make it work for this demonstration, including:

  • Move from Java 1.7 to 1.8 (including newer Tomcat version)
  • Add unit tests for Continuous Integration demonstration purposes
  • Modify MongoDB configuration class to work with non-local, containerized MongoDB instances
  • Add Gradle warNoStatic task to build WAR without static assets
  • Add Gradle zipStatic task to ZIP up the application’s static assets for deployment to NGINX
  • Add Gradle zipGetVersion task with a versioning scheme for build artifacts
  • Add context.xml file and MANIFEST.MF file to the WAR file
  • Add a Log4j RollingFileAppender to send log entries to Filebeat
  • Update versions of several dependencies, including Gradle, Spring, and Tomcat

We will use the following technologies to build, publish, deploy, and host the Java Spring Music application: Gradle, git, GitHub, Travis CI, Oracle VirtualBox, Docker, Docker Compose, Docker Machine, Docker Hub, and optionally, Amazon Web Services (AWS).

NGINX
To increase performance, the Spring Music web application’s static content will be hosted by NGINX. The application’s WAR file will be hosted by Apache Tomcat 8.5.4. Requests for non-static content will be proxied through NGINX on the front-end, to a set of three load-balanced Tomcat instances on the back-end. To further increase application performance, NGINX will also be configured for browser caching of the static content. In many enterprise environments, the use of a Java EE application server, like Tomcat, is still not uncommon.

Reverse proxying and caching are configured through NGINX’s default.conf file, in the server configuration section:
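
The project’s actual default.conf lives in the repository and is not reproduced here. Purely as an illustrative sketch (the appserver upstream name, asset paths, and custom headers are assumptions), the server section might resemble:

    server {
      listen 80;

      # Static assets are served directly by NGINX, with browser caching enabled
      location ~* \.(css|js|png|jpg|jpeg|gif|ico)$ {
        root    /usr/share/nginx/html;
        expires 30d;
        add_header Cache-Control "public";
      }

      # Everything else is proxied to the load-balanced Tomcat instances
      location / {
        proxy_pass http://appserver;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        add_header Upstream-Address $upstream_addr;
      }
    }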

The three Tomcat instances will be load-balanced using NGINX’s default round-robin algorithm, configured in the upstream section of the same default.conf file:
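
Again, only as a sketch (the server names assume the Compose project described later, in which the Tomcat service is scaled to three containers), the upstream section could look like this; round-robin is NGINX’s default, so no explicit balancing directive is required:

    upstream appserver {
      server music_app_1:8080;
      server music_app_2:8080;
      server music_app_3:8080;
    }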

Client requests are received through port 80 on the NGINX server. NGINX proxies requests that are not for static assets to one of the three Tomcat instances on port 8080.

MongoDB
The Spring Music application was designed to work with a number of data stores, including MySQL, Postgres, Oracle, MongoDB, Redis, and H2, an in-memory Java SQL database. Given the choice of both SQL and NoSQL databases, we will select MongoDB.

The Spring Music application, hosted by Tomcat, will store and modify record album data in a single instance of MongoDB. MongoDB will be populated with a collection of album data from a JSON file, when the Spring Music application first creates the MongoDB database instance.

ELK
Lastly, the ELK Stack, with Filebeat, will aggregate NGINX, Tomcat, and Java Log4j log entries, providing debugging and analytics for our demonstration. A similar method for aggregating logs, using Logspout instead of Filebeat, can be found in this previous post.

Kibana 4 Web Console

Continuous Integration

In this post’s example, two build artifacts, a WAR file for the application and ZIP file for the static web content, are built automatically by Travis CI, whenever source code changes are pushed to the springmusic_v2 branch of the garystafford/spring-music repository on GitHub.

Travis CI Output

Following a successful build and a small number of unit tests, Travis CI pushes the build artifacts to the build-artifacts branch on the same GitHub project. The build-artifacts branch acts as a pseudo binary repository for the project, much like JFrog’s Artifactory. These artifacts are used later by Docker to build the project’s immutable Docker images and containers.

Build Artifact Repository

Build Notifications
Travis CI pushes build notifications to a Slack channel, which eliminates the need to actively monitor Travis CI.

Travis CI Slack Notifications

Automation Scripting
The .travis.yml file, custom build.gradle Gradle tasks, and the deploy_travisci.sh script handle the Travis CI automation described above.

Travis CI .travis.yml file:
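
The actual file is in the repository root. A minimal sketch of such a configuration, assuming a JDK 8 Gradle build, the custom artifact tasks described earlier, and Slack notifications, might look like the following (account names and tokens are placeholders):

    language: java
    jdk: oraclejdk8

    script:
      # build, run the unit tests, and produce the WAR and static-asset ZIP artifacts
      - ./gradlew clean build warNoStatic zipStatic zipGetVersion

    after_success:
      # push the build artifacts to the build-artifacts branch
      - sh ./deploy_travisci.sh

    notifications:
      slack: your-team:your-slack-token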

Custom build.gradle tasks:
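
The real task definitions are in the project’s build file; the following is only a sketch of what the three custom tasks might look like, assuming the static assets live under src/main/webapp/assets and the Gradle war plugin is applied:

    // WAR file without the application's static assets (NGINX serves those instead)
    task warNoStatic(type: War) {
        baseName = 'spring-music'
        exclude 'assets/**'
    }

    // ZIP of the static assets, deployed to the NGINX web root
    task zipStatic(type: Zip) {
        baseName = 'spring-music-static'
        from 'src/main/webapp/assets'
    }

    // Write the project version to a file, so the deploy script can version artifacts
    task zipGetVersion {
        doLast {
            file("${buildDir}/version.txt").text = project.version
        }
    }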

The deploy_travisci.sh file:
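
The script itself is in the repository; conceptually, it clones the build-artifacts branch, copies in the freshly built WAR and ZIP, and pushes the commit back to GitHub. A rough sketch, assuming a GH_TOKEN environment variable holds a GitHub access token, might be:

    #!/bin/bash
    # Illustrative sketch: publish build artifacts to the build-artifacts branch
    set -e

    # shallow-clone only the artifact branch
    git clone --branch build-artifacts --depth 1 \
      "https://${GH_TOKEN}@github.com/garystafford/spring-music.git" artifacts

    # copy the WAR and static-asset ZIP produced by the Gradle build
    cp build/libs/*.war          artifacts/
    cp build/distributions/*.zip artifacts/

    # commit and push back to GitHub
    cd artifacts
    git config user.name  "travis-ci"
    git config user.email "travis-ci@example.com"
    git add .
    git commit -m "Artifacts from Travis CI build ${TRAVIS_BUILD_NUMBER}"
    git push origin build-artifacts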

You can easily replicate the project’s continuous integration automation using your choice of toolchains. GitHub or BitBucket are good choices for distributed version control. For continuous integration and deployment, I recommend Travis CI, Semaphore, Codeship, or Jenkins. Couple those with a good persistent chat application, such as Slack or Atlassian’s HipChat.

Building the Docker Environment

Make sure VirtualBox, Docker, Docker Compose, and Docker Machine are installed and running. At the time of this post, I have the following versions of software installed on my Mac:

  • Mac OS X 10.11.6
  • VirtualBox 5.0.26
  • Docker 1.12.1
  • Docker Compose 1.8.0
  • Docker Machine 0.8.1

To build the project’s VirtualBox VM, Docker images, and Docker containers, execute the build script, using the following command: sh ./build_project.sh. A build script is useful when working with CI/CD automation tools, such as Jenkins CI or ThoughtWorks GoCD. However, to understand the build process, I suggest first running the individual commands locally.

Deploying to AWS
By simply changing the Docker Machine driver from VirtualBox to AWS EC2, and providing your AWS credentials, the springmusic environment may also be built on AWS.
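
For example, assuming your AWS credentials are exported as environment variables, a hypothetical Docker Machine command to provision the host on EC2 instead of VirtualBox might be:

    # provision the springmusic host on AWS EC2 (region and instance type are examples)
    docker-machine create \
      --driver amazonec2 \
      --amazonec2-region us-east-1 \
      --amazonec2-instance-type t2.large \
      springmusic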

Build Process
Docker Machine provisions a single VirtualBox springmusic VM on which to host the project’s containers. VirtualBox provides a quick and easy solution that can be run locally for initial development and testing of the application.

Next, the script creates a Docker data volume and project-specific Docker bridge network.

Next, using the project’s individual Dockerfiles, Docker Compose pulls base Docker images from Docker Hub for NGINX, Tomcat, ELK, and MongoDB. Project-specific immutable Docker images are then built for NGINX, Tomcat, and MongoDB. While constructing the project-specific Docker images for NGINX and Tomcat, the latest Spring Music build artifacts are pulled and installed into the corresponding Docker images.

Docker Compose builds and deploys (6) containers onto the VirtualBox VM: (1) NGINX, (3) Tomcat, (1) MongoDB, and (1) ELK.
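
If you would rather step through the process manually than run build_project.sh, the sequence roughly follows the commands below; the volume, network, and service names are illustrative, not necessarily those used by the project:

    # 1. provision the VirtualBox host VM and point the Docker client at it
    docker-machine create --driver virtualbox springmusic
    eval $(docker-machine env springmusic)

    # 2. create the project's data volume and bridge network
    docker volume create --name music_data
    docker network create --driver bridge music_net

    # 3. build the images, then deploy and scale the containers
    docker-compose up --build -d
    docker-compose scale app=3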

The NGINX Dockerfile:
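
The project’s actual Dockerfile is in the repository. Conceptually, it extends the official NGINX image, adds the project’s default.conf, and installs the latest static-asset artifact into the web root; a hypothetical sketch (the artifact URL is illustrative):

    FROM nginx:latest

    # project-specific reverse proxy, load-balancing, and caching configuration
    COPY default.conf /etc/nginx/conf.d/default.conf

    # pull and unpack the latest static-asset artifact from the build-artifacts branch
    RUN apt-get update && apt-get install -y curl unzip \
     && curl -L -o /tmp/static.zip \
        https://github.com/garystafford/spring-music/raw/build-artifacts/spring-music-static.zip \
     && unzip -o /tmp/static.zip -d /usr/share/nginx/html \
     && rm /tmp/static.zip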

The Tomcat Dockerfile:
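
Similarly, the Tomcat Dockerfile in the repository extends an official Tomcat 8.5.4/OpenJDK 8 image and deploys the latest WAR artifact; a hypothetical sketch (the image tag and artifact URL are illustrative):

    FROM tomcat:8.5.4-jre8

    # remove the default web applications
    RUN rm -rf /usr/local/tomcat/webapps/*

    # deploy the latest Spring Music WAR artifact as the ROOT application
    ADD https://github.com/garystafford/spring-music/raw/build-artifacts/spring-music.war \
        /usr/local/tomcat/webapps/ROOT.war

    EXPOSE 8080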

Docker Compose v2 YAML
This post was recently updated for Docker 1.12, and to use Docker Compose v2 YAML file format. The post’s docker-compose.yml takes advantage of improvements in Docker 1.12 and Docker Compose v2 YAML. Improvements to the YAML file include eliminating the need to link containers and expose ports, and the addition of named networks and volumes.
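
The full docker-compose.yml is in the repository. As an abbreviated, illustrative sketch only (service, volume, and image names are assumptions), a v2-format file for this stack, using named networks and volumes instead of explicit links, might be structured like this:

    version: '2'

    services:
      proxy:
        build: nginx/
        ports:
          - "80:80"
        networks:
          - net
        depends_on:
          - app

      app:
        build: tomcat/
        ports:
          - "8080"            # host port assigned randomly by Docker
        networks:
          - net
        depends_on:
          - nosqldb

      nosqldb:
        build: mongo/
        volumes:
          - data:/data/db
        networks:
          - net

      elk:
        image: sebp/elk:latest
        ports:
          - "5601:5601"       # Kibana web console
        networks:
          - net

    volumes:
      data: {}

    networks:
      net:
        driver: bridge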

The Results

Spring Music Infrastructure

Below are the results of building the project.

Testing the Application

Below are partial results of the curl test, hitting the NGINX endpoint. Note the different IP addresses in the Upstream-Address field between requests. This test proves NGINX’s round-robin load-balancing is working across the three Tomcat application instances: music_app_1, music_app_2, and music_app_3.

Also, note the sharp decrease in Request-Time between the first three requests and the subsequent three. The Upstream-Response-Time from the Tomcat instances doesn’t change, yet the total Request-Time is much shorter, due to NGINX’s caching of the application’s static assets.
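
To reproduce the test, a loop along these lines, hitting the NGINX endpoint and filtering the response headers, should show the round-robin and caching behavior (the header names assume they are set in default.conf, and the IP is the springmusic VM’s address):

    # issue six requests and print only the headers of interest
    for i in {1..6}; do
      curl -s -o /dev/null -D - http://192.168.99.100/ \
        | grep -E 'Upstream-Address|Upstream-Response-Time|Request-Time'
    done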

Spring Music Application Links

Assuming the springmusic VM is running at 192.168.99.100, the following links can be used to access various project endpoints. Note the (3) Tomcat instances each map to randomly exposed host ports. These ports are not used by NGINX, which connects to each instance on container port 8080. The host port is only required if you want access to the Tomcat Web Console. The port shown below, 32771, is merely an example.

* The Tomcat user name is admin and the password is t0mcat53rv3r.

Helpful Links

TODOs

  • Automate the Docker image build and publish processes
  • Automate the Docker container build and deploy processes
  • Automate post-deployment verification testing of project infrastructure
  • Add Docker Swarm multi-host capabilities with overlay networking
  • Update Spring Music with latest CF project revisions
  • Include scripting example to stand-up project on AWS
  • Add Consul and Consul Template for NGINX configuration
