Posts Tagged docker

Docker Log Aggregation and Visualization Options with the ELK Stack

elk

As a Developer and DevOps Engineer, it wasn’t that long ago, I spent a lot of time requesting logs from Operations teams for applications running in Production. Many organizations I’ve worked with have created elaborate systems for requesting, granting, and revoking access to application logs. Requesting and obtaining access to logs typically took hours or days, or simply never got approved. Since most enterprise applications are composed of individual components running on multiple application and web servers, it was necessary to request multiple logs. What was often a simple problem to diagnose and fix, became an unnecessarily time-consuming ordeal.

Hopefully, you are still not in this situation. Given the average complexity of today’s modern, distributed, containerized application platforms, accessing individual logs is simply unrealistic and ineffective. The solution is log aggregation and visualization.

Log Aggregation and Visualization

In the context of this post, log aggregation and visualization is defined as the collection, centralized storage, and ability to simultaneously display application logs from multiple, dissimilar sources. Take a typical modern web application. The frontend UI might be built with Angular, React, or Node. The UI is likely backed by multiple RESTful services, possibly built in Java Spring Boot or Python Flask, and a database or databases, such as MongoDB or MySQL. To support the application, there are auxiliary components, such as API gateways, load-balancers, and messaging brokers. These components are likely deployed as multiple instances, for performance and availability. All instances generate application logs in varying formats.

When troubleshooting an application, such as the one described above, you must often trace a user’s transaction from UI through firewalls and gateways, to the web server, back through the API gateway, to multiple backend services via load-balancers, through message queues, to databases, possibly to external third-party APIs, and back to the client. This is why log aggregation and visualization is essential.

Logging Options

Log aggregation and visualization solutions typically come in three varieties: cloud-hosted by a SaaS provider, a service provided by your Cloud provider, and self-hosted, either on-premises or in the cloud. Cloud-hosted SaaS solutions include Loggly, Splunk, Logentries, and Sumo Logic. Some of these solutions, such as Splunk, are also available as a self-hosted service. Cloud-provider solutions include AWS CloudWatch and Azure Application Insights. Most hosted solutions have reoccurring pricing models based on the volume of logs or the number of server nodes being monitored.

Self-hosted solutions include Graylog 2, Nagios Log Server, Splunk Free, and elastic’s Elastic Stack. The ELK Stack (Elasticsearch, Logstash, and Kibana), as it was previously known, has been re-branded the Elastic Stack, which now includes Beats. Beats is elastic’s lightweight shipper that send data from edge machines to Logstash and Elasticsearch.

Often, you will see other components mentioned in the self-hosted space, such as Fluentd, syslog, and Kafka. These are examples of log aggregators or datastores for logs. They lack the combined abilities to collect, store, and display multiple logs. These components are generally part of a larger log aggregation and visualization solution.

This post will explore self-hosted log aggregation and visualization of a Dockerized application on AWS, using the ELK Stack. The post details three common variations of log collection and routing to ELK, using various Docker logging drivers, along with Logspout, Fluentd, and GELF (Graylog Extended Log Format).

Docker Swarm Cluster

The post’s example application is deployed to a Docker Swarm, built on AWS, using Docker CE for AWS. Docker has automated the creation of a Swarm on AWS using Docker Cloud, right from your desktop. Creating a Swarm is as easy as inputting a few options and clicking build. Docker uses an AWS CloudFormation script to provision all the necessary AWS resources for the Docker Swarm.

swam_mode

For this post’s logging example, I built a minimally configured Docker Swarm cluster, consisting of a single Manager Node and three Worker Nodes. The four Swarm nodes, all EC2 instances, are behind an AWS ELB, inside a new AWS VPC.

Logging Diagram AWS Diagram 3D

As seen with the docker node ls command, the Docker Swarm will look similar to the following.

Sample Application Components

Multiple containerized copies of a simple Java Spring Boot RESTful Hello-World service, available on GitHub, along with the associated logging aggregators, are deployed to Worker Node 1 and Worker Node 2. We will explore each of these application components later in the post. The containerized components consist of the following:

  1. Fluentd (garystafford/custom-fluentd)
  2. Logspout (garystafford/custom-logspout)
  3. NGINX (garystafford/custom-nginx)
  4. Hello-World Service using Docker’s default JSON file logging driver
  5. Hello-World Service using Docker’s GELF logging driver
  6. Hello-World Service using Docker’s Fluentd logging driver

NGINX is used as a simple frontend API gateway, which to routes HTTP requests to each of the three logging variations of the Hello-World service (garystafford/hello-world).

A single container, running the entire ELK Stack (garystafford/custom-elk) is deployed to Worker Node 3. This is to isolate the ELK Stack from the application. Typically, in a real environment, ELK would be running on separate infrastructure for performance and security, not alongside your application. Running a docker service ls, the deployed services appear as follows.

Portainer

A single instance of Portainer (Docker Hub: portainer/portainer) is deployed on the single Manager Node. Portainer, amongst other things, provides a detailed view of Docker Swarm, showing each Swarm Node and the service containers deployed to them.

portainer

In my opinion, Portainer provides a much better user experience than Docker Enterprise Edition’s most recent Universal Control Plane (UCP). In the past, I have also used Visualizer (dockersamples/visualizer), one of the first open source solutions in this space. However, since the Visualizer project moved to Docker, it seems like the development of new features has completely stalled out. A good list of container tools can be found on StackShare.

Deployment

All the Docker service containers are deployed to the AWS-based Docker Swarm using a single Docker Compose file. The order of service startup is critical. ELK should fully startup first, followed by Fluentd and Logspout, then the three sets of Hello-World instances, and finally NGINX.

To deploy and start all the Docker services correctly, there are two scripts in the GitHub repository. First, execute the following command, sh ./stack_deploy.sh. This will deploy the Docker service stack and create an overlay network, containing all the services as configured in the docker-compose.yml file. Then, to ensure the services start in the correct sequence, execute sh ./service_update.sh. This will restart each service in the correct order, with pauses between services to allow time for startup; a bit of a hack, but effective.

Collection and Routing Examples

Below is a diagram showing all the components comprising this post’s examples, and includes the protocols and ports on which they communicate. Following, we will look at three variations of self-hosted log collection and routing options for ELK.

Logging Diagram

Example 1: Fluentd

The first example of log aggregation and visualization uses Fluentd, a Cloud Native Computing Foundation (CNCF) hosted project. Fluentd is described as ‘an open source data collector for unified logging layer.’ A container running Fluentd with a custom configuration runs globally on each Worker Node where the applications are deployed, in this case, the hello-fluentd Docker service. Here is the custom Fluentd configuration file (fluent.conf):

The Hello-World service is configured through the Docker Compose file to use the Fluentd Docker logging driver. The log entries from the Hello-World containers on the Worker Nodes are diverted from being output to JSON files, using the default JSON file logging driver, to the Fluentd container instance on the same host as the Hello-World container. The Fluentd container is listening for TCP traffic on port 24224.

Fluentd then sends the individual log entries to Elasticsearch directly, bypassing Logstash. Fluentd log entries are sent via HTTP to port 9200, Elasticsearch’s JSON interface.

Logging Diagram Fluentd

Using Fluentd as a transport method, log entries appear as JSON documents in ELK, as shown below. This Elasticsearch JSON document is an example of a single line log entry. Note the primary field container identifier, when using Fluentd, is container_id. This field will vary depending on the Docker driver and log collector, as seen in the next two logging examples.

fluentd-log.png

The next example shows a Fluentd multiline log entry. Using the Fluentd Concat filter plugin (fluent-plugin-concat), the individual lines of a stack trace from a Java runtime exception, thrown by the hello-fluentd Docker service, have been recombined into a single Elasticsearch JSON document.

fluentd-multiline

In the above log entries, note the DEPLOY_ENV and SERVICE_NAME fields. These values were injected into the Docker Compose file, as environment variables, during deployment of the Hello-World service. The Fluentd Docker logging driver applies these as env options, as shown in the example Docker Compose snippet, below, lines 5-9.

Example 2: Logspout

The second example of log aggregation and visualization uses GliderLabs’ Logspout. Logspout is described by GliderLabs as ‘a log router for Docker containers that runs inside Docker. It attaches to all containers on a host, then routes their logs wherever you want. It also has an extensible module system.’ In the post’s example, a container running Logspout with a custom configuration runs globally on each Worker Node where the applications are deployed, identical to Fluentd.

The hello-logspout Docker service is configured through the Docker Compose file to use the default JSON file logging driver. According to Docker, ‘by default, Docker captures the standard output (and standard error) of all your containers and writes them in files using the JSON format. The JSON format annotates each line with its origin (stdout or stderr) and its timestamp. Each log file contains information about only one container.

Normally, it is not necessary to explicitly set the default Docker logging driver to JSON files. However, in this case, Docker CE for AWS automatically configured each Swarm Nodes Docker daemon default logging driver to Amazon CloudWatch Logs logging driver. The default drive may be seen by running the docker info command while attached to the Docker daemon. Note line 12 in the snippet below.

The hello-fluentd Docker service containers on the Worker Nodes send log entries to individual JSON files. The Fluentd container on each host then retrieves and routes those JSON log entries to Logstash, within the ELK container running on Worker Node 3, over UDP to port 5000. Logstash, which is explicitly listening for JSON via UDP on port 5000, then outputs those log entries to Elasticsearch, via HTTP to port 9200, Elasticsearch’s JSON interface.

Logging Diagram Logspout

Using Logspout as a transport method, log entries appear as JSON documents in ELK, as shown below. Note the field differences between the Fluentd log entry above and this entry. There are a number significant variations, making it difficult to use both methods, across the same distributed application. For example, the main body of the log entry is contained in the message field using Logspout, but in the log field using Fluentd. The name of the Docker container, which serves as the primary means of identifying the container instance, is the docker.name field with Logspout, but container.name for Fluentd.

Another helpful field, provided by Logspout, is the docker.image field. This is beneficial when associating code issues to a particular code release. In this example, the Hello-World service uses the latest Docker image tag, which is not considered best practice. However, in a real production environment, the Docker tags often represents the incremental build number from the CI/CD system, which is tied to a specific build of the code.

logspout-logThe other challenge I have had with Logspout is passing the env and tag options, such as DEPLOY_ENV and SERVICE_NAME, as seen previously with the Fluentd example. Note they are blank in the above sample. It is possible, but not as straightforward as with Fluentd, and requires interacting directly with the Docker daemon on each Worker node.

Example 3: Graylog Extended Format (GELF)

The third and final example of log aggregation and visualization uses the Docker Graylog Extended Format (GELF) logging driver. According to the GELF website, ‘the Graylog Extended Log Format (GELF) is a log format that avoids the shortcomings of classic plain syslog.’ These syslog shortcomings include a maximum length of 1024 bytes, no data types, multiple dialects making parsing difficult, and no compression.

The GELF format, designed to work with the Graylog Open Source Log Management Server, work equally as well with the ELK Stack. With the GELF logging driver, there is no intermediary logging collector and router, as with Fluentd and Logspout. The hello-gelf Docker service is configured through its Docker Compose file to use the GELF logging driver. The two hello-gelf Docker service containers on the Worker Nodes send log entries directly to Logstash, running within the ELK container, running on Worker Node 3, via UDP to port 12201.

Logstash, which is explicitly listening for UDP traffic on port 12201, then outputs those log entries to Elasticsearch, via HTTP to port 9200, Elasticsearch’s JSON interface.

Logging Diagram GELF

Using the Docker Graylog Extended Format (GELF) logging driver as a transport method, log entries appear as JSON documents in ELK, as shown below. They are the most verbose of the three formats.

gelf-logAgain, note the field differences between the Fluentd and Logspout log entries above, and this GELF entry. Both the field names of the main body of the log entry and the name of the Docker container are different from both previous examples.

Another bonus with GELF, each entry contains the command field, which stores the command used to start the container’s process. This can be helpful when troubleshooting application startup issues. Often, the exact container startup command might have been injected into the Docker Compose file at deploy time by the CI Server and contained variables, as is the case with the Hello-World service. Reviewing the log entry in Kibana for the command is much easier and safer than logging into the container and executing commands to check the running process for the startup command.

Unlike Logspout, and similar to Fluentd, note the DEPLOY_ENV and SERVICE_NAME fields are present in the GELF entry. These were injected into the Docker Compose file as environment variables during deployment of the Hello-World service. The GELF Docker logging driver applies these as env options. With GELF the entry also gets the optional tag, which was passed in the Docker Compose file’s service definition, tag: docker.{{.Name}}.

Unlike Fluentd, GELF and Logspout do not easily handle multiline logs. Below is an example of a multiline Java runtime exception thrown by the hello-gelf Docker service. The stack trace is not recombined into a single JSON document in Elasticsearch, like in the Fluentd example. The stack trace exists as multiple JSON documents, making troubleshooting much more difficult. Logspout entries will look similar to GELF.

gelf-multiline

Pros and Cons

In my opinion, and based on my level of experience with each of the self-hosted logging collection and routing options, the following some of their pros and cons.

Fluentd

  • Pros
    • Part of CNCF, Fluentd is becoming the defacto logging standard for cloud-native applications
    • Easily extensible via a large number of plugins
    • Easily containerized
    • Ability to easily handle multiline log entries (ie. Java stack trace)
    • Ability to use the Fluentd container’s service name as the Fluentd address, not an IP address or DNS resolvable hostname
  • Cons
    • Using Docker’s Fluentd logging driver, if the Fluentd container is not available on the container’s host, the container logging to Fluentd will fail (major con!)

Logspout

  • Pros
    • Doesn’t require a change to the default Docker JSON file logging driver, logs are still viewable via docker logs command (big plus!)
    • Easily to add and remove functionality via Golang modules
    • Easily containerized
  • Cons
    • Inability to easily handle multiline log entries (ie. Java stack trace)
    • Logspout containers must be restarted if ELK is restarted to restart logging
    • To reach Logstash, Logspout must use a DNS resolvable hostname or IP address, not the name of the ELK container on the same overlay network (big con!)

GELF

  • Pros
    • Application containers, using Docker GELF logging driver will not fail if downstream Logspout container is unavailable
    • Docker GELF logging driver allows compression of logs for shipment to Logspout
  • Cons
    • Inability to easily handle multiline log entries (ie. Java stack trace)

Conclusion

Of course, there are other self-hosted logging collection and routing options, including elastic’s Beats, journald, and various syslog servers. Each has their pros and cons, depending on your project’s needs. After building and maintaining several self-hosted mission-critical log aggregation and visualization solutions, it is easy to see the appeal of an off-the-shelf cloud-hosted SaaS solution such as Splunk or Cloud provider solutions such as Application Insights.

All opinions in this post are my own and not necessarily the views of my current employer or their clients.

, , , , , , , , , , ,

Leave a comment

Docker Enterprise Edition: Multi-Environment, Single Control Plane Architecture for AWS

Final_DockerEE_21 (1)

Designing a successful, cloud-based containerized application platform requires a balance of performance and security with cost, reliability, and manageability. Ensuring that a platform meets all functional and non-functional requirements, while remaining within budget and is easily maintainable, can be challenging.

As Cloud Architect and DevOps Team Lead, I recently participated in the development of two architecturally similar, lightweight, cloud-based containerized application platforms. From the start, both platforms were architected to maximize security and performance, while minimizing cost and operational complexity. The later platform was built on AWS with Docker Enterprise Edition.

Docker Enterprise Edition

Released in March of this year, Docker Enterprise Edition (Docker EE) is a secure, full-featured container-based management platform. There are currently eight versions of Docker EE, available for Windows Server, Azure, AWS, and multiple Linux distros, including RHEL, CentOS, Ubuntu, SUSE, and Oracle.

Docker EE is one of several production-grade container orchestration Platforms as a Service (PaaS). Some of the other container platforms in this category include:

Docker Community Edition (CE), Kubernetes, and Apache Mesos are free and open-source. Some providers, such as Rancher Labs, offer enterprise support for an additional fee. Cloud-based services, such as Red Hat Openshift Online, AWS, GCE, and ACS, charge the typical usage monthly fee. Docker EE, similar to Mesosphere Enterprise DC/OS and Red Hat OpenShift, is priced on a per node/per year annual subscription model.

Docker EE is currently offered in three subscription tiers, including Basic, Standard, and Advanced. Additionally, Docker offers Business Day and Business Critical support. Docker EE’s Advanced Tier adds several significant features, including secure multi-tenancy with node-based isolation, and image security scanning and continuous vulnerability scanning, as part of Docker EE’s Docker Trusted Registry.

Architecting for Affordability and Maintainability

Building an enterprise-scale application platform, using public cloud infrastructure, such as AWS, and a licensed Containers-as-a-Service (CaaS) platform, such as Docker EE, can quickly become complex and costly to build and maintain. Based on current list pricing, the cost of a single Linux node ranges from USD 75 per month for basic support, up to USD 300 per month for Docker Enterprise Edition Advanced with Business Critical support. Although cost is relative to the value generated by the application platform, none the less, architects should always strive to avoid unnecessary complexity and cost.

Reoccurring operational costs, such as licensed software subscriptions, support contracts, and monthly cloud-infrastructure charges, are often overlooked by project teams during the build phase. Accurately forecasting reoccurring costs of a fully functional Production platform, under expected normal load, is essential. Teams often overlook how Docker image registries, databases, data lakes, and data warehouses, quickly swell, inflating monthly cloud-infrastructure charges to maintain the platform. The need to control cloud costs have led to the growth of third-party cloud management solutions, such as CloudCheckr Cloud Management Platform (CMP).

Shared Docker Environment Model

Most software development projects require multiple environments in which to continuously develop, test, demonstrate, stage, and release code. Creating separate environments, replete with their own Docker EE Universal Control Plane (aka Control Plane or UCP), Docker Trusted Registry (DTR), AWS infrastructure, and third-party components, would guarantee a high-level of isolation and performance. However, replicating all elements in each environment would add considerable build and run costs, as well as unnecessary complexity.

On both recent projects, we choose to create a single AWS Virtual Private Cloud (VPC), which contained all of the non-production environments required by our project teams. In parallel, we built an entirely separate Production VPC for the Production environment. I’ve seen this same pattern repeated with Red Hat OpenStack and Microsoft Azure.

Production

Isolating Production from the lower environments is essential to ensure security, and to eliminate non-production traffic from impacting the performance of Production. Corporate compliance and regulatory policies often dictate complete Production isolation. Having separate infrastructure, security appliances, role-based access controls (RBAC), configuration and secret management, and encryption keys and SSL certificates, are all required.

For complete separation of Production, different AWS accounts are frequently used. Separate AWS accounts provide separate billing, usage reporting, and AWS Identity and Access Management (IAM), amongst other advantages.

Performance and Staging

Unlike Production, there are few reasons to completely isolate lower-environments from one another. The exception I’ve encountered is Performance and Staging. These two environments are frequently separated from other environments to ensure the accuracy of performance testing and release staging activities. Performance testing, in particular, can generate enormous load on systems, which if not isolated, will impair adjacent environments, applications, and monitoring systems.

On a few recent projects, to reduce cost and complexity, we repurposed the UAT environment for performance testing, once user-acceptance testing was complete. Performance testing was conducted during off-peak development and testing periods, with access to adjacent environments blocked.

The multi-purpose UAT environment further served as a Staging environment. Applications were deployed and released to the UAT and Performance environments, following a nearly-identical process used for Production. Hotfixes to Production were also tested in this environment.

Example of Shared Environments

To demonstrate how to architect a shared non-production Docker EE environment, which minimizes cost and complexity, let’s examine the example shown below. In the example, built on AWS with Docker EE, there are four typical non-production environments, CI/CD, Development, Test, and UAT, and one Production environment.

Final_DockerEE_17

In the example, there are two separate VPCs, the Production VPC, and the Non-Production VPC. There is no reason to configure VPC Peering between the two VPCs, as there is no need for direct communication between the two. Within the Non-Production VPC, to the left in the diagram, there is a cluster of three Docker EE UCP Manager EC2 nodes, a cluster of three DTR Worker EC2 nodes, and the four environments, consisting of varying numbers of EC2 Worker nodes. Production, to the right of the diagram, has its own cluster of three UCP Manager EC2 nodes and a cluster of six EC2 Worker nodes.

Single Non-Production UCP

As a primary means of reducing cost and complexity, in the example, a single minimally-sized Docker EE UCP cluster of three Manager nodes orchestrate activities across all four non-production environments. Alternately, you would have to create a UCP cluster for each environment; that means nine more Worker Nodes to configure and maintain.

The UCP users, teams, organizations, access controls, Docker Secrets, overlay networks, and other UCP features, for all non-production environments, are managed through the single Control Plane. All deployments to all the non-production environments, from the CI/CD server, are performed through the single Control Plane. Each UCP Manager node is deployed to a different AWS Availability Zone (AZ) to ensure high-availability.

Shared DTR

As another means of reducing cost and complexity, in the example, a Docker EE DTR cluster of three Worker nodes contain all Docker image repositories. Both the non-production and the Production environments use this DTR as a secure source of all Docker images. Not having to replicate image repositories, access controls, infrastructure, and figuring out how to migrate images between two separate DTR clusters, is a significant time, cost, and complexity savings. Each DTR Worker node is also deployed to a different AZ to ensure high-availability.

Using a shared DTR between non-production and Production is an important security consideration your project team needs to consider. A single DTR, shared between non-production and Production, comes with inherent availability and security risks, which should be understood in advance.

Separate Non-Production Worker Nodes

In the shared non-production environments example, each environment has dedicated AWS EC2 instances configured as Docker EE Worker nodes. The number of Worker nodes is determined by the requirements for each environment, as dictated by the project’s Development, Testing, Security, and DevOps teams. Like the UCP and DTR clusters, each Worker node, within an individual environment, is deployed to a different AZ to ensure high-availability and mimic the Production architecture.

Minimizing the number of Worker nodes in each environment, as well as the type and size of each EC2 node, offers a significant potential cost and administrative savings.

Separate Environment Ingress

In the example, the UCP, DTR, and each of the four environments is accessed through separate URLs, using AWS Hosted Zone CNAME records (subdomains).

Final_DockerEE_16

Encrypted HTTPS traffic is routed through a series of security appliances, depending on traffic type, to individual private AWS Elastic Load Balancers (ELB), one for both UCPs, the DTR, and each of the environments. Each ELB load-balances traffic to the Docker EE nodes associated the specific traffic. All firewalls, ELBs, and the UCP and DTR are secured with a high-grade wildcard SSL certificate.

AWS_ELB

Separate Data Sources

In the shared non-production environments example, there is one Amazon Relational Database Service‎ (RDS) instance in non-Production and one Production. Both RDS instances are replicated across multiple Availability Zones. Within the single shared non-production RDS instance, there are four separate databases, one per non-production environment. This architecture sacrifices the potential database performance of separate RDS instances for additional cost and complexity.

Maintaining Environment Separation

Node Labels

To obtain sufficient environment separation while using a single UCP, each Docker EE Worker node is tagged with an environment node label. The node label indicates which environment the Worker node is associated with. For example, in the screenshot below, a Worker node is assigned to the Development environment by tagging it with the key of environment and the value of dev.

Node_Label

* The Docker EE screens shown here are from UCP 2.1.5, not the recently released 2.2.x, which has an updated UI appearance.Each service’s Docker Compose file uses deployment placement constraints, which indicate where Docker should or should not deploy services. In the hello-world Docker Compose file example below, the node.labels.environment constraint is set to the ENVIRONMENT variable, which is set during container deployment by the CI/CD server. This constraint directs Docker to only deploy the hello-world service to nodes which contain the placement constraint of node.labels.environment, whose value matches the ENVIRONMENT variable value.

Deploying from CI/CD Server

The ENVIRONMENT value is set as an environment variable, which is then used by the CI/CD server, running a docker stack deploy or a docker service update command, within a deployment pipeline. Below is an example of how to use the environment variable as part of a Jenkins pipeline as code Jenkinsfile.

Centralized Logging and Metrics Collection

Centralized logging and metrics collection systems are used for application and infrastructure dashboards, monitoring, and alerting. In the shared non-production environment examples, the centralized logging and metrics collection systems are internal to each VPC, but reside on separate EC2 instances and are not registered with the Control Plane. In this way, the logging and metrics collection systems should not impact the reliability, performance, and security of the applications running within Docker EE. In the example, Worker nodes run a containerized copy of fluentd, which collects and pushes logs to ELK’s Elasticsearch.

Logging and metrics collection systems could also be supplied by external cloud-based SaaS providers, such as LogglySysdig and Datadog, or by the platform’s cloud-provider, such as Amazon CloudWatch.

With four environments running multiple containerized copies of each service, figuring out which log entry came from which service instance, requires multiple data points. As shown in the example Kibana UI below, the environment value, along with the service name and container ID, as well as the git commit hash and branch, are added to each log entry for easier troubleshooting. To include the environment, the value of the ENVIRONMENT variable is passed to Docker’s fluentd log driver as an env option. This same labeling method is used to tag metrics.

ELK

Separate Docker Service Stacks

For further environment separation within the single Control Plane, services are deployed as part of the same Docker service stack. Each service stack contains all services that comprise an application running within a single environment. Multiple stacks may be required to support multiple, distinct applications within the same environment.

For example, in the screenshot below, a hello-world service container, built with a Docker image, tagged with build 59 of the Jenkins continuous integration pipeline, is deployed as part of both the Development (dev) and Test service stacks. The CD and UAT service stacks each contain different versions of the hello-world service.

Hello-World-UCP

Separate Docker Overlay Networks

For additional environment separation within the single non-production UCP, all Docker service stacks associated with an environment, reside on the same Docker overlay network. Overlay networks manage communications among the Docker Worker nodes, enabling service-to-service communication for all services on the same overlay network while isolating services running on one network from services running on another network.

in the example screenshot below, the hello-world service, a member of the test service stack, is running on the test_default overlay network.

Network

Cleaning Up

Having distinct environment-centric Docker service stacks and overlay networks makes it easy to clean up an environment, without impacting adjacent environments. Both service stacks and overlay networks can be removed to clear an environment’s contents.

Separate Performance Environment

In the alternative example below, a Performance environment has been added to the Non-Production VPC. To ensure a higher level of isolation, the Performance environment has its own UPC, RDS, and ELBs. The Performance environment shares the DTR, as well as the security, logging, and monitoring components, with the rest of the non-production environments.

Below, the Performance environment has half the number of Worker nodes as Production. Performance results can be scaled for expected Production performance, given more nodes. Alternately, the number of nodes can be scaled up temporarily to match Production, then scaled back down to a minimum after testing is complete.

Final_DockerEE_20

Shared DevOps Tooling

All environments leverage shared Development and DevOps resources, deployed to a separate VPC. Resources include Agile Application Lifecycle Management (ALM), such as JIRA or CA Agile Central, source control repository management (SCM), such as GitLab or Bitbucket, binary repository management, such as Artifactory or Nexus, and a CI/CD solution, such as Jenkins, TeamCity, or Bamboo.

From the DevOps VPC, Docker images are pushed and pulled from the DTR in the Non-Production VPC. Deployments of container-based application are executed from the DevOps VPC CI/CD server to the non-production, Performance, and Production UCPs. Separate DevOps CI/CD pipelines and access controls are essential in maintaining the separation of the non-production and Production environments.

Final_DockerEE_22

Complete Platform

Several common components found in a Docker EE cloud-based AWS platform were discussed in the post. However, a complete AWS application platform has many more moving parts. Below is a comprehensive list of components, including DevOps tooling, organized into two categories: 1) common components that can be potentially shared across the non-production environments to save cost and complexity, and 2) components that should be replicated in each non-environment for security and performance.

Shared Non-Production Components:

  • AWS
    • Virtual Private Cloud (VPC), Region, Availability Zones
    • Route Tables, Network ACLs, Internet Gateways
    • Subnets
    • Some Security Groups
    • IAM Groups, User, Roles, Policies (RBAC)
    • Relational Database Service‎ (RDS)
    • ElastiCache
    • API Gateway, Lambdas
    • S3 Buckets
    • Bastion Servers, NAT Gateways
    • Route 53 Hosted Zone (Registered Domain)
    • EC2 Key Pairs
    • Hardened Linux AMI
  • Docker EE
    • UCP and EC2 Manager Nodes
    • DTR and EC2 Worker Nodes
    • UCP and DTR Users, Teams, Organizations
    • DTR Image Repositories
    • Secret Management
  • Third-Party Components/Products
    • SSL Certificates
    • Security Components: Firewalls, Virus Scanning, VPN Servers
    • Container Security
    • End-User IAM
    • Directory Service
    • Log Aggregation
    • Metric Collection
    • Monitoring, Alerting
    • Configuration and Secret Management
  • DevOps
    • CI/CD Pipelines as Code
    • Infrastructure as Code
    • Source Code Repositories
    • Binary Artifact Repositories

Isolated Non-Production Components:

  • AWS
    • Route 53 Hosted Zones and Associated Records
    • Elastic Load Balancers (ELB)
    • Elastic Compute Cloud (EC2) Worker Nodes
    • Elastic IPs
    • ELB and EC2 Security Groups
    • RDS Databases (Single RDS Instance with Separate Databases)

All opinions in this post are my own and not necessarily the views of my current employer or their clients.

, , , , , , , ,

Leave a comment

Eventual Consistency: Decoupling Microservices with Spring AMQP and RabbitMQ

RabbitMQEnventCons.png

Introduction

In a recent post, Decoupling Microservices using Message-based RPC IPC, with Spring, RabbitMQ, and AMPQ, we moved away from synchronous REST HTTP for inter-process communications (IPC) toward message-based IPC. Moving to asynchronous message-based communications allows us to decouple services from one another. It makes it easier to build, test, and release our individual services. In that post, we did not achieve fully asynchronous communications. Although, we did achieve a higher level of service decoupling using message-based Remote Procedure Call (RPC) IPC.

In this post, we will fully decouple our services using the distributed computing model of eventual consistency. More specifically, we will use a message-based, event-driven, loosely-coupled, eventually consistent architectural approach for communications between services.

What is eventual consistency? One of the best definitions of eventual consistency I have read was posted on microservices.io. To paraphrase, ‘using an event-driven, eventually consistent approach, each service publishes an event whenever it updates its data. Other services subscribe to events. When an event is received, a service updates its data.

Example of Eventual Consistency

Imagine Service A, the Customer service, inserts a new customer record into its database. Based on that ‘customer created’ event, Service A publishes a message containing the new customer object, serialized to JSON, to a lightweight persistent message queue.

Service B, the new customer Onboarding service, a subscriber to that queue, consumes Service A’s message. Service B then executes that same CRUD operation, inserting the same new customer record into its database.

In the above example, it can be said that the customer records in Service B’s database are eventually consistent with the customer records in Service A’s database. Service A makes a change and publishes a message in response to the event. Service B consumes the message and makes the same change. Eventually (likely milliseconds) Service B’s customer records are consistent with Service A’s customer records.

Why Eventual Consistency?

So what does this apparent added complexity and duplication of data buy us? Consider the advantages. Service B, the Onboarding service, requires no knowledge of, or a dependency on, Service A, the Customer service. Still, Service B has a current record of all the customers that Service A maintains. Instead of making repeated and potentially costly REST HTTP call or RPC message-based call to or from Service A to Service B for new customers, Service B queries its database for a list of customers.

The value of eventual consistency increases factorially as you scale a distributed system. Imagine dozens of distinct microservices, many requiring data from other microservices. Further, imagine multiple instances of each of those services all running in parallel. Decoupling services from one another, through asynchronous forms of IPC, messaging, and event-driven eventual consistency greatly simplifies the software development lifecycle and operations.

Demonstration

In this post, we could use a few different architectural patterns to demonstrate message passing with RabbitMQ and Spring AMQP. They including Work Queues, Publish/Subscribe, Routing, or Topics. To keep things as simple as possible, we will have a single Producer, publish messages to a single durable and persistent message queue. We will have a single Subscriber, a Consumer, consume the messages from that queue. We focus on a single type of event message.

Sample Code

To demonstrate Spring AMQP-based messaging with RabbitMQ, we will use a reference set of three Spring Boot microservices. The Election ServiceCandidate Service, and Voter Service are all backed by MongoDB. The services and MongoDB, along with RabbitMQ and Voter API Gateway, are all part of the Voter API.

The Voter API Gateway, based on HAProxy, serves as a common entry point to all three services, as well as serving as a reverse proxy and load balancer. The API Gateway provides round-robin load-balanced access to multiple instances of each service.

Voter_API_Architecture

All the source code found this post’s example is available on GitHub, within a few different project repositories. The Voter Service repository contains the Voter service source code, along with the scripts and Docker Compose files required to deploy the project. The Election Service repository, Candidate Service repository, and Voter API Gateway repository are also available on GitHub. There is also a new AngularJS/Node.js Web Client, to demonstrate how to use the Voter API.

For this post, you only need to clone the Voter Service repository.

Deploying Voter API

All components, including the Spring Boot services, MongoDB, RabbitMQ, API Gateway, and the Web Client, are individually deployed using Docker. Each component is publicly available as a Docker Image, on Docker Hub. The Voter Service repository contains scripts to deploy the entire set of Dockerized components, locally. The repository also contains optional scripts to provision a Docker Swarm, using Docker’s newer swarm mode, and deploy the components. We will only deploy the services locally for this post.

To clone and deploy the components locally, including the Spring Boot services, MongoDB, RabbitMQ, and the API Gateway, execute the following commands. If this is your first time running the commands, it may take a few minutes for your system to download all the required Docker Images from Docker Hub.

If everything was deployed successfully, you should observe six running Docker containers, similar to the output, below.

Using Voter API

The Voter Service, Election Service, and Candidate Service GitHub repositories each contain README files, which detail all the API endpoints each service exposes, and how to call them.

In addition to casting votes for candidates, the Voter service can simulate election results. Calling the /simulation endpoint, and indicating the desired election, the Voter service will randomly generate a number of votes for each candidate in that election. This will save us the burden of casting votes for this demonstration. However, the Voter service has no knowledge of elections or candidates. The Voter service depends on the Candidate service to obtain a list of candidates.

The Candidate service manages electoral candidates, their political affiliation, and the election in which they are running. Like the Voter service, the Candidate service also has a /simulation endpoint. The service will create a list of candidates based on the 2012 and 2016 US Presidential Elections. The simulation capability of the service saves us the burden of inputting candidates for this demonstration.

The Election service manages elections, their polling dates, and the type of election (federal, state, or local). Like the other services, the Election service also has a /simulation endpoint, which will create a list of sample elections. The Election service will not be discussed in this post’s demonstration. We will examine communications between the Candidate and Voter services, only.

REST HTTP Endpoint

As you recall from our previous post, Decoupling Microservices using Message-based RPC IPC, with Spring, RabbitMQ, and AMPQ, the Voter service exposes multiple, almost identical endpoints. Each endpoint uses a different means of IPC to retrieve candidates and generates random votes.

Calling the /voter/simulation/election/{election} endpoint and providing a specific election, prompts the Voter service to request a list of candidates from the Candidate service, based on the election parameter you input. This request is done using synchronous REST HTTP. The Voter service uses the HTTP GET method to request the data from the Candidate service. The Voter service then waits for a response.

The Candidate service receives the HTTP request. The Candidate service responds to the Voter service with a list of candidates in JSON format. The Voter service receives the response payload containing the list of candidates. The Voter service then proceeds to generate a random number of votes for each candidate in the list. Finally, each new vote object (MongoDB document) is written back to the vote collection in the Voter service’s voters  database.

Message-based RPC Endpoint

Similarly, calling the /voter/simulation/rpc/election/{election} endpoint and providing a specific election, prompts the Voter service to request the same list of candidates. However, this time, the Voter service (the client) produces a request message and places in RabbitMQ’s voter.rpc.requests queue. The Voter service then waits for a response. The Voter service has no direct dependency on the Candidate service; it only depends on a response to its request message. In this way, it is still a form of synchronous IPC, but the Voter service is now decoupled from the Candidate service.

The request message is consumed by the Candidate service (the server), who is listening to that queue. In response, the Candidate service produces a message containing the list of candidates serialized to JSON. The Candidate service (the server) sends a response back to the Voter service (the client) through RabbitMQ. This is done using the Direct reply-to feature of RabbitMQ or using a unique response queue, specified in the reply-to header of the request message, sent by the Voter Service.

The Voter service receives the message containing the list of candidates. The Voter service deserializes the JSON payload to candidate objects. The Voter service then proceeds to generate a random number of votes for each candidate in the list. Finally, identical to the previous example, each new vote object (MongoDB document) is written back to the vote collection in the Voter service’s voters database.

New Endpoint

Calling the new /voter/simulation/db/election/{election} endpoint and providing a specific election, prompts the Voter service to query its own MongoDB database for a list of candidates.

But wait, where did the candidates come from? The Voter service didn’t call the Candidate service? The answer is message-based eventual consistency. Whenever a new candidate is created, using a REST HTTP POST request to the Candidate service’s /candidate/candidates endpoint, a Spring Data Rest Repository Event Handler responds. Responding to the candidate created event, the event handler publishes a message, containing a serialized JSON representation of the new candidate object, to a durable and persistent RabbitMQ queue.

The Voter service is listening to that queue. The Voter service consumes messages off the queue, deserializes the candidate object, and saves it to its own voters database, to the candidate collection. For this example, we are saving the incoming candidate object as is, with no transformations. The candidate object model for both services is identical.

When /voter/simulation/db/election/{election} endpoint is called, the Voter service queries its voters database for a list of candidates. They Voter service then proceeds to generate a random number of votes for each candidate in the list. Finally, identical to the previous two examples, each new vote object (MongoDB document) is written back to the vote collection in the Voter service’s voters  database.

Message_Queue_Diagram_Final3B

Exploring the Code

We will not review the REST HTTP or RPC IPC code in this post. It was covered in detail, in the previous post. Instead, we will explore the new code required for eventual consistency.

Spring Dependencies

To use AMQP with RabbitMQ, we need to add a project dependency on org.springframework.boot.spring-boot-starter-amqp. Below is a snippet from the Candidate service’s build.gradle file, showing project dependencies. The Voter service’s dependencies are identical.

AMQP Configuration

Next, we need to add a small amount of RabbitMQ AMQP configuration to both services. We accomplish this by using Spring’s @Configuration annotation on our configuration classes. Below is the abridged configuration class for the Voter service.

And here, the abridged configuration class for the Candidate service.

Event Handler

With our dependencies and configuration in place, we will define the CandidateEventHandler class. This class is annotated with the Spring Data Rest @RepositoryEventHandler and Spring’s @Component. The @Component annotation ensures the event handler is registered.

The class contains the handleCandidateSave method, which is annotated with the Spring Data Rest @HandleAfterCreate. The event handler acts on the Candidate object, which is the first parameter in the method signature.

Responding to the candidate created event, the event handler publishes a message, containing a serialized JSON representation of the new candidate object, to the candidates.queue queue. This was the queue we configured earlier.

Consuming Messages

Next, we let’s switch to the Voter service’s CandidateListService class. Below is an abridged version of the class with two new methods. First, the getCandidateMessage method listens to the candidates.queue queue. This was the queue we configured earlier. The method is annotated with theSpring AMQP Rabbit @RabbitListener annotation.

The getCandidateMessage retrieves the new candidate object from the message, deserializes the message’s JSON payload, maps it to the candidate object model and saves it to the Voter service’s database.

The second method, getCandidatesQueueDb, retrieves the candidates from the Voter service’s database. The method makes use of the Spring Data MongoDB Aggregation package to return a list of candidates from MongoDB.

RabbitMQ Management Console

The easiest way to observe what is happening with the messages is using the RabbitMQ Management Console. To access the console, point your web browser to localhost, on port 15672. The default login credentials for the console are guest/guest. As you successfully produce and consume messages with RabbitMQ, you should see activity on the Overview tab.

RabbitMQ_EC_Durable3.png

Recall we said the queue, in this example, was durable. That means messages will survive the RabbitMQ broker stopping and starting. In the below view of the RabbitMQ Management Console, note the six messages persisted in memory. The Candidate service produced the messages in response to six new candidates being created. However, the Voter service was not running, and therefore, could not consume the messages. In addition, the RabbitMQ server was restarted, after receiving the candidate messages. The messages were persisted and still present in the queue after the successful reboot of RabbitMQ.

RabbitMQ_EC_Durable

Once RabbitMQ and the Voter service instance were back online, the Voter service successfully consumed the six waiting messages from the queue.

RabbitMQ_EC_Durable2.png

Service Logs

In addition to using the RabbitMQ Management Console, we may obverse communications between the two services by looking at the Voter and Candidate service’s logs. I have grabbed a snippet of both service’s logs and added a few comments to show where different processes are being executed.

First the Candidate service logs. We observe a REST HTTP POST request containing a new candidate. We then observe the creation of the new candidate object in the Candidate service’s database, followed by the event handler publishing a message on the queue. Finally, we observe the response is returned in reply to the initial REST HTTP POST request.

Now the Voter service logs. At the exact same second as the message and the response sent by the Candidate service, the Voter service consumes the message off the queue. The Voter service then deserializes the new candidate object and inserts it into its database.

MongoDB

Using the mongo Shell, we can observe six new 2016 Presidential Election candidates in the Candidate service’s database.

Now, looking at the Voter service’s database, we should find the same six 2016 Presidential Election candidates. Note the Object IDs are the same between the two service’s document sets, as are the rest of the fields (first name, last name, political party, and election). However, the class field is different between the two service’s records.

Production Considerations

The post demonstrated a simple example of message-based, event-driven eventual consistency. In an actual Production environment, there are a few things that must be considered.

  • We only addressed a ‘candidate created’ event. We would also have to code for other types of events, such as a ‘candidate deleted’ event and a ‘candidate updated’ event.
  • If a candidate is added, deleted, then re-added, are the events published and consumed in the right order? What about with multiple instances of the Voter service running? Does this pattern guarantee event ordering?
  • How should the Candidate service react on startup if RabbitMQ is not available
  • What if RabbitMQ fails after the Candidate services have started?
  • How should the Candidate service react if a new candidate record is added to the database, but a ‘candidate created’ event message cannot be published to RabbitMQ? The two actions are not wrapped in a single transaction.
  • In all of the above scenarios, what response should be returned to the API end user?

Conclusion

In this post, using eventual consistency, we successfully decoupled our two microservices and achieved asynchronous inter-process communications. Adopting a message-based, event-driven, loosely-coupled architecture, wherever possible, in combination with REST HTTP when it makes sense, will improve the overall manageability and scalability of a microservices-based platform.

References

All opinions in this post are my own and not necessarily the views of my current employer or their clients.

, , , , , , , , , , ,

1 Comment

Streaming Docker Logs to Elastic Stack (ELK) using Fluentd

Kibana

Introduction

Fluentd and Docker’s native logging driver for Fluentd makes it easy to stream Docker logs from multiple running containers to the Elastic Stack. In this post, we will use Fluentd to stream Docker logs from multiple instances of a Dockerized Spring Boot RESTful service and MongoDB, to the Elastic Stack (ELK).

log_message_flow_notype

In a recent post, Distributed Service Configuration with Consul, Spring Cloud, and Docker, we built a Consul cluster using Docker swarm mode, to host distributed configurations for a Spring Boot service. We will use the resulting swarm cluster from the previous post as a foundation for this post.

Fluentd

According to the Fluentd website, Fluentd is described as an open source data collector, which unifies data collection and consumption for a better use and understanding of data. Fluentd combines all facets of processing log data: collecting, filtering, buffering, and outputting logs across multiple sources and destinations. Fluentd structures data as JSON as much as possible.

Logging Drivers

Docker includes multiple logging mechanisms to get logs from running containers and services. These mechanisms are called logging drivers. Fluentd is one of the ten current Docker logging drivers. According to Docker, The fluentd logging driver sends container logs to the Fluentd collector as structured log data. Then, users can utilize any of the various output plugins, from Fluentd, to write these logs to various destinations.

Elastic Stack

The ELK Stack, now known as the Elastic Stack, is the combination of Elastic’s very popular products: Elasticsearch, Logstash, and Kibana. According to Elastic, the Elastic Stack provides real-time insights from almost any type of structured and unstructured data source.

Setup

All code for this post has been tested on both MacOS and Linux. For this post, I am provisioning and deploying to a Linux workstation, running the most recent release of Fedora and Oracle VirtualBox. If you want to use AWS or another infrastructure provider instead of VirtualBox to build your swarm, it is fairly easy to switch the Docker Machine driver and change a few configuration items in the vms_create.sh script (see Provisioning, below).

Required Software

If you want to follow along with this post, you will need the latest versions of git, Docker, Docker Machine, Docker Compose, and VirtualBox installed.

Source Code

All source code for this post is located in two GitHub repositories. The first repository contains scripts to provision the VMs, create an overlay network and persistent host-mounted volumes, build the Docker swarm, and deploy Consul, Registrator, Swarm Visualizer, Fluentd, and the Elastic Stack. The second repository contains scripts to deploy two instances of the Widget Spring Boot RESTful service and a single instance of MongoDB. You can execute all scripts manually, from the command-line, or from a CI/CD pipeline, using tools such as Jenkins.

Provisioning the Swarm

To start, clone the first repository, and execute the single run_all.sh script, or execute the seven individual scripts necessary to provision the VMs, create the overlay network and host volumes, build the swarm, and deploy Consul, Registrator, Swarm Visualizer, Fluentd, and the Elastic Stack. Follow the steps below to complete this part.

When the scripts have completed, the resulting swarm should be configured similarly to the diagram below. Consul, Registrator, Swarm Visualizer, Fluentd, and the Elastic Stack containers should be distributed across the three swarm manager nodes and the three swarm worker nodes (VirtualBox VMs).

swarm_fluentd_diagram

Deploying the Application

Next, clone the second repository, and execute the single run_all.sh script, or execute the four scripts necessary to deploy the Widget Spring Boot RESTful service and a single instance of MongoDB. Follow the steps below to complete this part.

When the scripts have completed, the Widget service and MongoDB containers should be distributed across two of the three swarm worker nodes (VirtualBox VMs).

swarm_fluentd_diagram_b

To confirm the final state of the swarm and the running container stacks, use the following Docker commands.

Open the Swarm Visualizer web UI, using any of the swarm manager node IPs, on port 5001, to confirm the swarm health, as well as the running container’s locations.

Visualizer

Lastly, open the Consul Web UI, using any of the swarm manager node IPs, on port 5601, to confirm the running container’s health, as well as their placement on the swarm nodes.

Consul_1

Streaming Logs

Elastic Stack

If you read the previous post, Distributed Service Configuration with Consul, Spring Cloud, and Docker, you will notice we deployed a few additional components this time. First, the Elastic Stack (aka ELK), is deployed to the worker3 swarm worker node, within a single container. I have increased the CPU count and RAM assigned to this VM, to minimally run the Elastic Stack. If you review the docker-compose.yml file, you will note I am using Sébastien Pujadas’ sebp/elk:latest Docker base image from Docker Hub to provision the Elastic Stack. At the time of the post, this was based on the 5.3.0 version of ELK.

Docker Logging Driver

The Widget stack’s docker-compose.yml file has been modified since the last post. The compose file now incorporates a Fluentd logging configuration section for each service. The logging configuration includes the address of the Fluentd instance, on the same swarm worker node. The logging configuration also includes a tag for each log message.

Fluentd

In addition to the Elastic Stack, we have deployed Fluentd to the worker1 and worker2 swarm nodes. This is also where the Widget and MongoDB containers are deployed. Again, looking at the docker-compose.yml file, you will note we are using a custom Fluentd Docker image, garystafford/custom-fluentd:latest, which I created. The custom image is available on Docker Hub.

The custom Fluentd Docker image is based on Fluentd’s official onbuild Docker image, fluent/fluentd:onbuild. Fluentd provides instructions for building your own custom images, from their onbuild base images.

There were two reasons I chose to create a custom Fluentd Docker image. First, I added the Uken Games’ Fluentd Elasticsearch Plugin, to the Docker Image. This highly configurable Fluentd Output Plugin allows us to push Docker logs, processed by Fluentd to the Elasticsearch. Adding additional plugins is a common reason for creating a custom Fluentd Docker image.

The second reason to create a custom Fluentd Docker image was configuration. Instead of bind-mounting host directories or volumes to the multiple Fluentd containers, to provide Fluentd’s configuration, I baked the configuration file into the immutable Docker image. The bare-bones, basicFluentd configuration file defines three processes, which are Input, Filter, and Output. These processes are accomplished using Fluentd plugins. Fluentd has 6 types of plugins: Input, Parser, Filter, Output, Formatter and Buffer. Fluentd is primarily written in Ruby, and its plugins are Ruby gems.

Fluentd listens for input on tcp port 24224, using the forward Input Plugin. Docker logs are streamed locally on each swarm node, from the Widget and MongoDB containers to the local Fluentd container, over tcp port 24224, using Docker’s fluentd logging driver, introduced earlier. Fluentd

Fluentd then filters all input using the stdout Filter Plugin. This plugin prints events to stdout, or logs if launched with daemon mode. This is the most basic method of filtering.

Lastly, Fluentd outputs the filtered input to two destinations, a local log file and Elasticsearch. First, the Docker logs are sent to a local Fluentd log file. This is only for demonstration purposes and debugging. Outputting log files is not recommended for production, nor does it meet the 12-factor application recommendations for logging. Second, Fluentd outputs the Docker logs to Elasticsearch, over tcp port 9200, using the Fluentd Elasticsearch Plugin, introduced above.

log_message_flow

Additional Metadata

In addition to the log message itself, in JSON format, the fluentd log driver sends the following metadata in the structured log message: container_id, container_name, and source. This is helpful in identifying and categorizing log messages from multiple sources. Below is a sample of log messages from the raw Fluentd log file, with the metadata tags highlighted in yellow. At the bottom of the output is a log message parsed with jq, for better readability.

fluentd_logs

Using Elastic Stack

Now that our two Docker stacks are up and running on our swarm, we should be streaming logs to Elasticsearch. To confirm this, open the Kibana web console, which should be available at the IP address of the worker3 swarm worker node, on port 5601.

Kibana

For the sake of this demonstration, I increased the verbosity of the Spring Boot Widget service’s log level, from INFO to DEBUG, in Consul. At this level of logging, the two Widget services and the single MongoDB instance were generating an average of 250-400 log messages every 30 seconds, according to Kibana.

If that seems like a lot, keep in mind, these are Docker logs, which are single-line log entries. We have not aggregated multi-line messages, such as Java exceptions and stack traces messages, into single entries. That is for another post. Also, the volume of debug-level log messages generated by the communications between the individual services and Consul is fairly verbose.

Kibana_3

Inspecting log entries in Kibana, we find the metadata tags contained in the raw Fluentd log output are now searchable fields: container_id, container_name, and source, as well as log. Also, note the _type field, with a value of ‘fluentd’. We injected this field in the output section of our Fluentd configuration, using the Fluentd Elasticsearch Plugin. The _type fiel allows us to differentiate these log entries from other potential data sources.

Kibana_2.png

References

All opinions in this post are my own and not necessarily the views of my current employer or their clients.

, , , , , , ,

2 Comments

Provision and Deploy a Consul Cluster on AWS, using Terraform, Docker, and Jenkins

Cover2

Introduction

Modern DevOps tools, such as HashiCorp’s Packer and Terraform, make it easier to provision and manage complex cloud architecture. Utilizing a CI/CD server, such as Jenkins, to securely automate the use of these DevOps tools, ensures quick and consistent results.

In a recent post, Distributed Service Configuration with Consul, Spring Cloud, and Docker, we built a Consul cluster using Docker swarm mode, to host distributed configurations for a Spring Boot application. The cluster was built locally with VirtualBox. This architecture is fine for development and testing, but not for use in Production.

In this post, we will deploy a highly available three-node Consul cluster to AWS. We will use Terraform to provision a set of EC2 instances and accompanying infrastructure. The instances will be built from a hybrid AMIs containing the new Docker Community Edition (CE). In a recent post, Baking AWS AMI with new Docker CE Using Packer, we provisioned an Ubuntu AMI with Docker CE, using Packer. We will deploy Docker containers to each EC2 host, containing an instance of Consul server.

All source code can be found on GitHub.

Jenkins

I have chosen Jenkins to automate all of the post’s build, provisioning, and deployment tasks. However, none of the code is written specific to Jenkins; you may run all of it from the command line.

For this post, I have built four projects in Jenkins, as follows:

  1. Provision Docker CE AMI: Builds Ubuntu AMI with Docker CE, using Packer
  2. Provision Consul Infra AWS: Provisions Consul infrastructure on AWS, using Terraform
  3. Deploy Consul Cluster AWS: Deploys Consul to AWS, using Docker
  4. Destroy Consul Infra AWS: Destroys Consul infrastructure on AWS, using Terraform

Jenkins UI

We will primarily be using the ‘Provision Consul Infra AWS’, ‘Deploy Consul Cluster AWS’, and ‘Destroy Consul Infra AWS’ Jenkins projects in this post. The fourth Jenkins project, ‘Provision Docker CE AMI’, automates the steps found in the recent post, Baking AWS AMI with new Docker CE Using Packer, to build the AMI used to provision the EC2 instances in this post.

Consul AWS Diagram 2

Terraform

Using Terraform, we will provision EC2 instances in three different Availability Zones within the US East 1 (N. Virginia) Region. Using Terraform’s Amazon Web Services (AWS) provider, we will create the following AWS resources:

  • (1) Virtual Private Cloud (VPC)
  • (1) Internet Gateway
  • (1) Key Pair
  • (3) Elastic Cloud Compute (EC2) Instances
  • (2) Security Groups
  • (3) Subnets
  • (1) Route
  • (3) Route Tables
  • (3) Route Table Associations

The final AWS architecture should resemble the following:

Consul AWS Diagram

Production Ready AWS

Although we have provisioned a fairly complete VPC for this post, it is far from being ready for Production. I have created two security groups, limiting the ingress and egress to the cluster. However, to further productionize the environment would require additional security hardening. At a minimum, you should consider adding public/private subnets, NAT gateways, network access control list rules (network ACLs), and the use of HTTPS for secure communications.

In production, applications would communicate with Consul through local Consul clients. Consul clients would take part in the LAN gossip pool from different subnets, Availability Zones, Regions, or VPCs using VPC peering. Communications would be tightly controlled by IAM, VPC, subnet, IP address, and port.

Also, you would not have direct access to the Consul UI through a publicly exposed IP or DNS address. Access to the UI would be removed altogether or locked down to specific IP addresses, and accessed restricted to secure communication channels.

Consul

We will achieve high availability (HA) by clustering three Consul server nodes across the three Elastic Cloud Compute (EC2) instances. In this minimally sized, three-node cluster of Consul servers, we are protected from the loss of one Consul server node, one EC2 instance, or one Availability Zone(AZ). The cluster will still maintain a quorum of two nodes. An additional level of HA that Consul supports, multiple datacenters (multiple AWS Regions), is not demonstrated in this post.

Docker

Having Docker CE already installed on each EC2 instance allows us to execute remote Docker commands over SSH from Jenkins. These commands will deploy and configure a Consul server node, within a Docker container, on each EC2 instance. The containers are built from HashiCorp’s latest Consul Docker image pulled from Docker Hub.

Getting Started

Preliminary Steps

If you have built infrastructure on AWS with Terraform, these steps should be familiar to you:

  1. First, you will need an AMI with Docker. I suggest reading Baking AWS AMI with new Docker CE Using Packer.
  2. You will need an AWS IAM User with the proper access to create the required infrastructure. For this post, I created a separate Jenkins IAM User with PowerUser level access.
  3. You will need to have an RSA public-private key pair, which can be used to SSH into the EC2 instances and install Consul.
  4. Ensure you have your AWS credentials set. I usually source mine from a .env file, as environment variables. Jenkins can securely manage credentials, using secret text or files.
  5. Fork and/or clone the Consul cluster project from  GitHub.
  6. Change the aws_key_name and public_key_path variable values to your own RSA key, in the variables.tf file
  7. Change the aws_amis_base variable values to your own AMI ID (see step 1)
  8. If you are do not want to use the US East 1 Region and its AZs, modify the variables.tf, network.tf, and instances.tf files.
  9. Disable Terraform’s remote state or modify the resource to match your remote state configuration, in the main.tf file. I am using an Amazon S3 bucket to store my Terraform remote state.

Building an AMI with Docker

If you have not built an Amazon Machine Image (AMI) for use in this post already, you can do so using the scripts provided in the previous post’s GitHub repository. To automate the AMI build task, I built the ‘Provision Docker CE AMI’ Jenkins project. Identical to the other three Jenkins projects in this post, this project has three main tasks, which include: 1) SCM: clone the Packer AMI GitHub project, 2) Bindings: set up the AWS credentials, and 3) Build: run Packer.

The SCM and Bindings tasks are identical to the other projects (see below for details), except for the use of a different GitHub repository. The project’s Build step, which runs the packer_build_ami.sh script looks as follows:

jenkins_13

The resulting AMI ID will need to be manually placed in Terraform’s variables.tf file, before provisioning the AWS infrastructure with Terraform. The new AMI ID will be displayed in Jenkin’s build output.

jenkins_14

Provisioning with Terraform

Based on the modifications you made in the Preliminary Steps, execute the terraform validate command to confirm your changes. Then, run the terraform plan command to review the plan. Assuming are were no errors, finally, run the terraform apply command to provision the AWS infrastructure components.

In Jenkins, I have created the ‘Provision Consul Infra AWS’ project. This project has three tasks, which include: 1) SCM: clone the GitHub project, 2) Bindings: set up the AWS credentials, and 3) Build: run Terraform. Those tasks look as follows:

Jenkins_08.png

You will obviously need to use your modified GitHub project, incorporating the configuration changes detailed above, as the SCM source for Jenkins.

Jenkins Credentials

You will also need to configure your AWS credentials.

Jenkins_03.png

The provision_infra.sh script provisions the AWS infrastructure using Terraform. The script also updates Terraform’s remote state. Remember to update the remote state configuration in the script to match your personal settings.

The Jenkins build output should look similar to the following:

jenkins_12.png

Although the build only takes about 90 seconds to complete, the EC2 instances could take a few extra minutes to complete their Status Checks and be completely ready. The final results in the AWS EC2 Management Console should look as follows:

EC2 Management Console

Note each EC2 instance is running in a different US East 1 Availability Zone.

Installing Consul

Once the AWS infrastructure is running and the EC2 instances have completed their Status Checks successfully, we are ready to deploy Consul. In Jenkins, I have created the ‘Deploy Consul Cluster AWS’ project. This project has three tasks, which include: 1) SCM: clone the GitHub project, 2) Bindings: set up the AWS credentials, and 3) Build: run an SSH remote Docker command on each EC2 instance to deploy Consul. The SCM and Bindings tasks are identical to the project above. The project’s Build step looks as follows:

Jenkins_04.png

First, the delete_containers.sh script deletes any previous instances of Consul containers. This is helpful if you need to re-deploy Consul. Next, the deploy_consul.sh script executes a series of SSH remote Docker commands to install and configure Consul on each EC2 instance.

The entire Jenkins build process only takes about 30 seconds. Afterward, the output from a successful Jenkins build should show that all three Consul server instances are running, have formed a quorum, and have elected a Leader.

Jenkins_05.png

Persisting State

The Consul Docker image exposes VOLUME /consul/data, which is a path were Consul will place its persisted state. Using Terraform’s remote-exec provisioner, we create a directory on each EC2 instance, at /home/ubuntu/consul/config. The docker run command bind-mounts the container’s /consul/data path to the EC2 host’s /home/ubuntu/consul/config directory.

According to Consul, the Consul server container instance will ‘store the client information plus snapshots and data related to the consensus algorithm and other state, like Consul’s key/value store and catalog’ in the /consul/data directory. That container directory is now bind-mounted to the EC2 host, as demonstrated below.

jenkins_15

Accessing Consul

Following a successful deployment, you should be able to use the public URL, displayed in the build output of the ‘Deploy Consul Cluster AWS’ project, to access the Consul UI. Clicking on the Nodes tab in the UI, you should see all three Consul server instances, one per EC2 instance, running and healthy.

Consul UI

Destroying Infrastructure

When you are finished with the post, you may want to remove the running infrastructure, so you don’t continue to get billed by Amazon. The ‘Destroy Consul Infra AWS’ project destroys all the AWS infrastructure, provisioned as part of this post, in about 60 seconds. The project’s SCM and Bindings tasks are identical to the both previous projects. The Build step calls the destroy_infra.sh script, which is included in the GitHub project. The script executes the terraform destroy -force command. It will delete all running infrastructure components associated with the post and update Terraform’s remote state.

Jenkins_09

Conclusion

This post has demonstrated how modern DevOps tooling, such as HashiCorp’s Packer and Terraform, make it easy to build, provision and manage complex cloud architecture. Using a CI/CD server, such as Jenkins, to securely automate the use of these tools, ensures quick and consistent results.

All opinions in this post are my own and not necessarily the views of my current employer or their clients.

, , , , , , , , , ,

3 Comments

Baking AWS AMI with new Docker CE Using Packer

AWS for Docker

Introduction

On March 2 (less than a week ago as of this post), Docker announced the release of Docker Enterprise Edition (EE), a new version of the Docker platform optimized for business-critical deployments. As part of the release, Docker also renamed the free Docker products to Docker Community Edition (CE). Both products are adopting a new time-based versioning scheme for both Docker EE and CE. The initial release of Docker CE and EE, the 17.03 release, is the first to use the new scheme.

Along with the release, Docker delivered excellent documentation on installing, configuring, and troubleshooting the new Docker EE and CE. In this post, I will demonstrate how to partially bake an existing Amazon Machine Image (Amazon AMI) with the new Docker CE, preparing it as a base for the creation of Amazon Elastic Compute Cloud (Amazon EC2) compute instances.

Adding Docker and similar tooling to an AMI is referred to as partially baking an AMI, often referred to as a hybrid AMI. According to AWS, ‘hybrid AMIs provide a subset of the software needed to produce a fully functional instance, falling in between the fully baked and JeOS (just enough operating system) options on the AMI design spectrum.

Installing Docker CE on an AWS AMI should not be confused with Docker’s also recently announced Docker Community Edition (CE) for AWS. Docker for AWS offers multiple CloudFormation templates for Docker EE and CE. According to Docker, Docker for AWS ‘provides a Docker-native solution that avoids operational complexity and adding unneeded additional APIs to the Docker stack.

Base AMI

Docker provides detailed directions for installing Docker CE and EE onto several major Linux distributions. For this post, we will choose a widely used Linux distro, Ubuntu. According to Docker, currently Docker CE and EE can be installed on three popular Ubuntu releases:

  • Yakkety 16.10
  • Xenial 16.04 (LTS)
  • Trusty 14.04 (LTS)

To provision a small EC2 instance in Amazon’s US East (N. Virginia) Region, I will choose Ubuntu 16.04.2 LTS Xenial Xerus . According to Canonical’s Amazon EC2 AMI Locator website, a Xenial 16.04 LTS AMI is available, ami-09b3691f, for US East 1, as a t2.micro EC2 instance type.

Packer

HashiCorp Packer will be used to partially bake the base Ubuntu Xenial 16.04 AMI with Docker CE 17.03. HashiCorp describes Packer as ‘a tool for creating machine and container images for multiple platforms from a single source configuration.’ The JSON-format Packer file is as follows:

{
  "variables": {
    "aws_access_key": "{{env `AWS_ACCESS_KEY_ID`}}",
    "aws_secret_key": "{{env `AWS_SECRET_ACCESS_KEY`}}",
    "us_east_1_ami": "ami-09b3691f",
    "name": "aws-docker-ce-base",
    "us_east_1_name": "ubuntu-xenial-docker-ce-base",
    "ssh_username": "ubuntu"
  },
  "builders": [
    {
      "name": "{{user `us_east_1_name`}}",
      "type": "amazon-ebs",
      "access_key": "{{user `aws_access_key`}}",
      "secret_key": "{{user `aws_secret_key`}}",
      "region": "us-east-1",
      "vpc_id": "",
      "subnet_id": "",
      "source_ami": "{{user `us_east_1_ami`}}",
      "instance_type": "t2.micro",
      "ssh_username": "{{user `ssh_username`}}",
      "ssh_timeout": "10m",
      "ami_name": "{{user `us_east_1_name`}} {{timestamp}}",
      "ami_description": "{{user `us_east_1_name`}} AMI",
      "run_tags": {
        "ami-create": "{{user `us_east_1_name`}}"
      },
      "tags": {
        "ami": "{{user `us_east_1_name`}}"
      },
      "ssh_private_ip": false,
      "associate_public_ip_address": true
    }
  ],
  "provisioners": [
    {
      "type": "file",
      "source": "bootstrap_docker_ce.sh",
      "destination": "/tmp/bootstrap_docker_ce.sh"
    },
    {
          "type": "file",
          "source": "cleanup.sh",
          "destination": "/tmp/cleanup.sh"
    },
    {
      "type": "shell",
      "execute_command": "echo 'packer' | sudo -S sh -c '{{ .Vars }} {{ .Path }}'",
      "inline": [
        "whoami",
        "cd /tmp",
        "chmod +x bootstrap_docker_ce.sh",
        "chmod +x cleanup.sh",
        "ls -alh /tmp",
        "./bootstrap_docker_ce.sh",
        "sleep 10",
        "./cleanup.sh"
      ]
    }
  ]
}

The Packer file uses Packer’s amazon-ebs builder type. This builder is used to create Amazon AMIs backed by Amazon Elastic Block Store (EBS) volumes, for use in EC2.

Bootstrap Script

To install Docker CE on the AMI, the Packer file executes a bootstrap shell script. The bootstrap script and subsequent cleanup script are executed using  Packer’s remote shell provisioner. The bootstrap is like the following:

#!/bin/sh

sudo apt-get remove docker docker-engine

sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88

sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
sudo apt-get update
sudo apt-get -y upgrade
sudo apt-get install -y docker-ce

sudo groupadd docker
sudo usermod -aG docker ubuntu

sudo systemctl enable docker

This script closely follows directions provided by Docker, for installing Docker CE on Ubuntu. After removing any previous copies of Docker, the script installs Docker CE. To ensure sudo is not required to execute Docker commands on any EC2 instance provisioned from resulting AMI, the script adds the ubuntu user to the docker group.

The bootstrap script also uses systemd to start the Docker daemon. Starting with Ubuntu 15.04, Systemd System and Service Manager is used by default instead of the previous init system, Upstart. Systemd ensures Docker will start on boot.

Cleaning Up

It is best good practice to clean up your activities after baking an AMI. I have included a basic clean up script. The cleanup script is as follows:

#!/bin/sh

set -e

echo 'Cleaning up after bootstrapping...'
sudo apt-get -y autoremove
sudo apt-get -y clean
sudo rm -rf /tmp/*
cat /dev/null > ~/.bash_history
history -c
exit

Partially Baking

Before running Packer to build the Docker CE AMI, I set both my AWS access key and AWS secret access key. The Packer file expects the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.

Running the packer build ubuntu_docker_ce_ami.json command builds the AMI. The abridged output should look similar to the following:

$ packer build docker_ami.json
ubuntu-xenial-docker-ce-base output will be in this color.

==> ubuntu-xenial-docker-ce-base: Prevalidating AMI Name...
    ubuntu-xenial-docker-ce-base: Found Image ID: ami-09b3691f
==> ubuntu-xenial-docker-ce-base: Creating temporary keypair: packer_58bc7a49-9e66-7f76-ce8e-391a67d94987
==> ubuntu-xenial-docker-ce-base: Creating temporary security group for this instance...
==> ubuntu-xenial-docker-ce-base: Authorizing access to port 22 the temporary security group...
==> ubuntu-xenial-docker-ce-base: Launching a source AWS instance...
    ubuntu-xenial-docker-ce-base: Instance ID: i-0ca883ecba0c28baf
==> ubuntu-xenial-docker-ce-base: Waiting for instance (i-0ca883ecba0c28baf) to become ready...
==> ubuntu-xenial-docker-ce-base: Adding tags to source instance
==> ubuntu-xenial-docker-ce-base: Waiting for SSH to become available...
==> ubuntu-xenial-docker-ce-base: Connected to SSH!
==> ubuntu-xenial-docker-ce-base: Uploading bootstrap_docker_ce.sh => /tmp/bootstrap_docker_ce.sh
==> ubuntu-xenial-docker-ce-base: Uploading cleanup.sh => /tmp/cleanup.sh
==> ubuntu-xenial-docker-ce-base: Provisioning with shell script: /var/folders/kf/637b0qns7xb0wh9p8c4q0r_40000gn/T/packer-shell189662158
    ...
    ubuntu-xenial-docker-ce-base: Reading package lists...
    ubuntu-xenial-docker-ce-base: Building dependency tree...
    ubuntu-xenial-docker-ce-base: Reading state information...
    ubuntu-xenial-docker-ce-base: E: Unable to locate package docker-engine
    ubuntu-xenial-docker-ce-base: Reading package lists...
    ubuntu-xenial-docker-ce-base: Building dependency tree...
    ubuntu-xenial-docker-ce-base: Reading state information...
    ubuntu-xenial-docker-ce-base: ca-certificates is already the newest version (20160104ubuntu1).
    ubuntu-xenial-docker-ce-base: apt-transport-https is already the newest version (1.2.19).
    ubuntu-xenial-docker-ce-base: curl is already the newest version (7.47.0-1ubuntu2.2).
    ubuntu-xenial-docker-ce-base: software-properties-common is already the newest version (0.96.20.5).
    ubuntu-xenial-docker-ce-base: 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
    ubuntu-xenial-docker-ce-base: OK
    ubuntu-xenial-docker-ce-base: pub   4096R/0EBFCD88 2017-02-22
    ubuntu-xenial-docker-ce-base: Key fingerprint = 9DC8 5822 9FC7 DD38 854A  E2D8 8D81 803C 0EBF CD88
    ubuntu-xenial-docker-ce-base: uid                  Docker Release (CE deb) <docker@docker.com>
    ubuntu-xenial-docker-ce-base: sub   4096R/F273FCD8 2017-02-22
    ubuntu-xenial-docker-ce-base:
    ubuntu-xenial-docker-ce-base: Hit:1 http://us-east-1.ec2.archive.ubuntu.com/ubuntu xenial InRelease
    ubuntu-xenial-docker-ce-base: Get:2 http://us-east-1.ec2.archive.ubuntu.com/ubuntu xenial-updates InRelease [102 kB]
    ...
    ubuntu-xenial-docker-ce-base: Get:27 http://security.ubuntu.com/ubuntu xenial-security/universe amd64 Packages [89.5 kB]
    ubuntu-xenial-docker-ce-base: Fetched 10.6 MB in 2s (4,065 kB/s)
    ubuntu-xenial-docker-ce-base: Reading package lists...
    ubuntu-xenial-docker-ce-base: Reading package lists...
    ubuntu-xenial-docker-ce-base: Building dependency tree...
    ubuntu-xenial-docker-ce-base: Reading state information...
    ubuntu-xenial-docker-ce-base: Calculating upgrade...
    ubuntu-xenial-docker-ce-base: 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
    ubuntu-xenial-docker-ce-base: Reading package lists...
    ubuntu-xenial-docker-ce-base: Building dependency tree...
    ubuntu-xenial-docker-ce-base: Reading state information...
    ubuntu-xenial-docker-ce-base: The following additional packages will be installed:
    ubuntu-xenial-docker-ce-base: aufs-tools cgroupfs-mount libltdl7
    ubuntu-xenial-docker-ce-base: Suggested packages:
    ubuntu-xenial-docker-ce-base: mountall
    ubuntu-xenial-docker-ce-base: The following NEW packages will be installed:
    ubuntu-xenial-docker-ce-base: aufs-tools cgroupfs-mount docker-ce libltdl7
    ubuntu-xenial-docker-ce-base: 0 upgraded, 4 newly installed, 0 to remove and 0 not upgraded.
    ubuntu-xenial-docker-ce-base: Need to get 19.4 MB of archives.
    ubuntu-xenial-docker-ce-base: After this operation, 89.4 MB of additional disk space will be used.
    ubuntu-xenial-docker-ce-base: Get:1 http://us-east-1.ec2.archive.ubuntu.com/ubuntu xenial/universe amd64 aufs-tools amd64 1:3.2+20130722-1.1ubuntu1 [92.9 kB]
    ...
    ubuntu-xenial-docker-ce-base: Get:4 https://download.docker.com/linux/ubuntu xenial/stable amd64 docker-ce amd64 17.03.0~ce-0~ubuntu-xenial [19.3 MB]
    ubuntu-xenial-docker-ce-base: debconf: unable to initialize frontend: Dialog
    ubuntu-xenial-docker-ce-base: debconf: (Dialog frontend will not work on a dumb terminal, an emacs shell buffer, or without a controlling terminal.)
    ubuntu-xenial-docker-ce-base: debconf: falling back to frontend: Readline
    ubuntu-xenial-docker-ce-base: debconf: unable to initialize frontend: Readline
    ubuntu-xenial-docker-ce-base: debconf: (This frontend requires a controlling tty.)
    ubuntu-xenial-docker-ce-base: debconf: falling back to frontend: Teletype
    ubuntu-xenial-docker-ce-base: dpkg-preconfigure: unable to re-open stdin:
    ubuntu-xenial-docker-ce-base: Fetched 19.4 MB in 1s (17.8 MB/s)
    ubuntu-xenial-docker-ce-base: Selecting previously unselected package aufs-tools.
    ubuntu-xenial-docker-ce-base: (Reading database ... 53844 files and directories currently installed.)
    ubuntu-xenial-docker-ce-base: Preparing to unpack .../aufs-tools_1%3a3.2+20130722-1.1ubuntu1_amd64.deb ...
    ubuntu-xenial-docker-ce-base: Unpacking aufs-tools (1:3.2+20130722-1.1ubuntu1) ...
    ...
    ubuntu-xenial-docker-ce-base: Setting up docker-ce (17.03.0~ce-0~ubuntu-xenial) ...
    ubuntu-xenial-docker-ce-base: Processing triggers for libc-bin (2.23-0ubuntu5) ...
    ubuntu-xenial-docker-ce-base: Processing triggers for systemd (229-4ubuntu16) ...
    ubuntu-xenial-docker-ce-base: Processing triggers for ureadahead (0.100.0-19) ...
    ubuntu-xenial-docker-ce-base: groupadd: group 'docker' already exists
    ubuntu-xenial-docker-ce-base: Synchronizing state of docker.service with SysV init with /lib/systemd/systemd-sysv-install...
    ubuntu-xenial-docker-ce-base: Executing /lib/systemd/systemd-sysv-install enable docker
    ubuntu-xenial-docker-ce-base: Cleanup...
    ubuntu-xenial-docker-ce-base: Reading package lists...
    ubuntu-xenial-docker-ce-base: Building dependency tree...
    ubuntu-xenial-docker-ce-base: Reading state information...
    ubuntu-xenial-docker-ce-base: 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
==> ubuntu-xenial-docker-ce-base: Stopping the source instance...
==> ubuntu-xenial-docker-ce-base: Waiting for the instance to stop...
==> ubuntu-xenial-docker-ce-base: Creating the AMI: ubuntu-xenial-docker-ce-base 1288227081
    ubuntu-xenial-docker-ce-base: AMI: ami-e9ca6eff
==> ubuntu-xenial-docker-ce-base: Waiting for AMI to become ready...
==> ubuntu-xenial-docker-ce-base: Modifying attributes on AMI (ami-e9ca6eff)...
    ubuntu-xenial-docker-ce-base: Modifying: description
==> ubuntu-xenial-docker-ce-base: Modifying attributes on snapshot (snap-058a26c0250ee3217)...
==> ubuntu-xenial-docker-ce-base: Adding tags to AMI (ami-e9ca6eff)...
==> ubuntu-xenial-docker-ce-base: Tagging snapshot: snap-043a16c0154ee3217
==> ubuntu-xenial-docker-ce-base: Creating AMI tags
==> ubuntu-xenial-docker-ce-base: Creating snapshot tags
==> ubuntu-xenial-docker-ce-base: Terminating the source AWS instance...
==> ubuntu-xenial-docker-ce-base: Cleaning up any extra volumes...
==> ubuntu-xenial-docker-ce-base: No volumes to clean up, skipping
==> ubuntu-xenial-docker-ce-base: Deleting temporary security group...
==> ubuntu-xenial-docker-ce-base: Deleting temporary keypair...
Build 'ubuntu-xenial-docker-ce-base' finished.

==> Builds finished. The artifacts of successful builds are:
--> ubuntu-xenial-docker-ce-base: AMIs were created:

us-east-1: ami-e9ca6eff

Results

The result is an Ubuntu 16.04 AMI in US East 1 with Docker CE 17.03 installed. To confirm the new AMI is now available, I will use the AWS CLI to examine the resulting AMI:

aws ec2 describe-images \
  --filters Name=tag-key,Values=ami Name=tag-value,Values=ubuntu-xenial-docker-ce-base \
  --query 'Images[*].{ID:ImageId}'

Resulting output:

{
    "Images": [
        {
            "VirtualizationType": "hvm",
            "Name": "ubuntu-xenial-docker-ce-base 1488747081",
            "Tags": [
                {
                    "Value": "ubuntu-xenial-docker-ce-base",
                    "Key": "ami"
                }
            ],
            "Hypervisor": "xen",
            "SriovNetSupport": "simple",
            "ImageId": "ami-e9ca6eff",
            "State": "available",
            "BlockDeviceMappings": [
                {
                    "DeviceName": "/dev/sda1",
                    "Ebs": {
                        "DeleteOnTermination": true,
                        "SnapshotId": "snap-048a16c0250ee3227",
                        "VolumeSize": 8,
                        "VolumeType": "gp2",
                        "Encrypted": false
                    }
                },
                {
                    "DeviceName": "/dev/sdb",
                    "VirtualName": "ephemeral0"
                },
                {
                    "DeviceName": "/dev/sdc",
                    "VirtualName": "ephemeral1"
                }
            ],
            "Architecture": "x86_64",
            "ImageLocation": "931066906971/ubuntu-xenial-docker-ce-base 1488747081",
            "RootDeviceType": "ebs",
            "OwnerId": "931066906971",
            "RootDeviceName": "/dev/sda1",
            "CreationDate": "2017-03-05T20:53:41.000Z",
            "Public": false,
            "ImageType": "machine",
            "Description": "ubuntu-xenial-docker-ce-base AMI"
        }
    ]
}

Finally, here is the new AMI as seen in the AWS EC2 Management Console:

EC2 Management Console - AMI

Terraform

To confirm Docker CE is installed and running, I can provision a new EC2 instance, using HashiCorp Terraform. This post is too short to detail all the Terraform code required to stand up a complete environment. I’ve included the complete code in the GitHub repo for this post. Not, the Terraform code is only used to testing. No security, including the use of a properly configured security groups, public/private subnets, and a NAT server, is configured.

Below is a greatly abridged version of the Terraform code I used to provision a new EC2 instance, using Terraform’s aws_instance resource. The resulting EC2 instance should have Docker CE available.

# test-docker-ce instance
resource "aws_instance" "test-docker-ce" {
  connection {
    user        = "ubuntu"
    private_key = "${file("~/.ssh/test-docker-ce")}"
    timeout     = "${connection_timeout}"
  }

  ami               = "ami-e9ca6eff"
  instance_type     = "t2.nano"
  availability_zone = "us-east-1a"
  count             = "1"

  key_name               = "${aws_key_pair.auth.id}"
  vpc_security_group_ids = ["${aws_security_group.test-docker-ce.id}"]
  subnet_id              = "${aws_subnet.test-docker-ce.id}"

  tags {
    Owner       = "Gary A. Stafford"
    Terraform   = true
    Environment = "test-docker-ce"
    Name        = "tf-instance-test-docker-ce"
  }
}

By using the AWS CLI, once again, we can confirm the new EC2 instance was built using the correct AMI:

aws ec2 describe-instances \
  --filters Name='tag:Name,Values=tf-instance-test-docker-ce' \
  --output text --query 'Reservations[*].Instances[*].ImageId'

Resulting output looks good:

ami-e9ca6eff

Finally, here is the new EC2 as seen in the AWS EC2 Management Console:

EC2 Management Console - EC2

SSHing into the new EC2 instance, I should observe that the operating system is Ubuntu 16.04.2 LTS and that Docker version 17.03.0-ce is installed and running:

Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-64-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  Get cloud support with Ubuntu Advantage Cloud Guest:
    http://www.ubuntu.com/business/services/cloud

0 packages can be updated.
0 updates are security updates.

Last login: Sun Mar  5 22:06:01 2017 from <my_ip_address>

ubuntu@ip-<ec2_local_ip>:~$ docker --version
Docker version 17.03.0-ce, build 3a232c8

Conclusion

Docker EE and CE represent a significant step forward in expanding Docker’s enterprise-grade toolkit. Replacing or installing Docker EE or CE on your AWS AMIs is easy, using Docker’s guide along with HashiCorp Packer.

All source code for this post can be found on GitHub.

All opinions in this post are my own and not necessarily the views of my current employer or their clients.

, , , , , , , , , , ,

2 Comments

Distributed Service Configuration with Consul, Spring Cloud, and Docker

Consul UI - Nodes

Introduction

In this post, we will explore the use of HashiCorp Consul for distributed configuration of containerized Spring Boot services, deployed to a Docker swarm cluster. In the first half of the post, we will provision a series of virtual machines, build a Docker swarm on top of those VMs, and install Consul and Registrator on each swarm host. In the second half of the post, we will configure and deploy multiple instances of a containerized Spring Boot service, backed by MongoDB, to the swarm cluster, using Docker Compose.

The final objective of this post is to have all the deployed services registered with Consul, via Registrator, and the service’s runtime configuration provided dynamically by Consul.

Objectives

  1. Provision a series of virtual machine hosts, using Docker Machine and Oracle VirtualBox
  2. Provide distributed and highly available cluster management and service orchestration, using Docker swarm mode
  3. Provide distributed and highly available service discovery, health checking, and a hierarchical key/value store, using HashiCorp Consul
  4. Provide service discovery and automatic registration of containerized services to Consul, using Registrator, Glider Labs’ service registry bridge for Docker
  5. Provide distributed configuration for containerized Spring Boot services using Consul and Pivotal Spring Cloud Consul Config
  6. Deploy multiple instances of a Spring Boot service, backed by MongoDB, to the swarm cluster, using Docker Compose version 3.

Technologies

  • Docker
  • Docker Compose (v3)
  • Docker Hub
  • Docker Machine
  • Docker swarm mode
  • Docker Swarm Visualizer (Mano Marks)
  • Glider Labs Registrator
  • Gradle
  • HashiCorp Consul
  • Java
  • MongoDB
  • Oracle VirtualBox VM Manager
  • Spring Boot
  • Spring Cloud Consul Config
  • Travis CI

Code Sources

All code in this post exists in two GitHub repositories. I have labeled each code snippet in the post with the corresponding file in the repositories. The first repository, consul-docker-swarm-compose, contains all the code necessary for provisioning the VMs and building the Docker swarm and Consul clusters. The repo also contains code for deploying Swarm Visualizer and Registrator. Make sure you clone the swarm-mode branch.

git clone --depth 1 --branch swarm-mode \
  https://github.com/garystafford/microservice-docker-demo-consul.git
cd microservice-docker-demo-consul

The second repository, microservice-docker-demo-widget, contains all the code necessary for configuring Consul and deploying the Widget service stack. Make sure you clone the consul branch.

git clone --depth 1 --branch consul \
  https://github.com/garystafford/microservice-docker-demo-widget.git
cd microservice-docker-demo-widget

Docker Versions

With the Docker toolset evolving so quickly, features frequently change or become outmoded. At the time of this post, I am running the following versions on my Mac.

  • Docker Engine version 1.13.1
  • Boot2Docker version 1.13.1
  • Docker Compose version 1.11.2
  • Docker Machine version 0.10.0
  • HashiCorp Consul version 0.7.5

Provisioning VM Hosts

First, we will provision a series of six virtual machines (aka machines, VMs, or hosts), using Docker Machine and Oracle VirtualBox.

consul-post-1

By switching Docker Machine’s driver, you can easily switch from VirtualBox to other vendors, such as VMware, AWS, GCE, or AWS. I have explicitly set several VirtualBox driver options, using the default values, for better clarification into the VirtualBox VMs being created.

vms=( "manager1" "manager2" "manager3"
      "worker1" "worker2" "worker3" )

for vm in ${vms[@]}
do
  docker-machine create \
    --driver virtualbox \
    --virtualbox-memory "1024" \
    --virtualbox-cpu-count "1" \
    --virtualbox-disk-size "20000" \
    ${vm}
done

Using the docker-machine ls command, we should observe the resulting series of VMs looks similar to the following. Note each of the six VMs has a machine name and an IP address in the range of 192.168.99.1/24.

$ docker-machine ls
NAME       ACTIVE   DRIVER       STATE     URL                         SWARM   DOCKER    ERRORS
manager1   -        virtualbox   Running   tcp://192.168.99.109:2376           v1.13.1
manager2   -        virtualbox   Running   tcp://192.168.99.110:2376           v1.13.1
manager3   -        virtualbox   Running   tcp://192.168.99.111:2376           v1.13.1
worker1    -        virtualbox   Running   tcp://192.168.99.112:2376           v1.13.1
worker2    -        virtualbox   Running   tcp://192.168.99.113:2376           v1.13.1
worker3    -        virtualbox   Running   tcp://192.168.99.114:2376           v1.13.1

We may also view the VMs using the Oracle VM VirtualBox Manager application.

Oracle VirtualBox VM Manager

Docker Swarm Mode

Next, we will provide, amongst other capabilities, cluster management and orchestration, using Docker swarm mode. It is important to understand that the relatively new Docker swarm mode is not the same the Docker Swarm. Legacy Docker Swarm was succeeded by Docker’s integrated swarm mode, with the release of Docker v1.12.0, in July 2016.

We will create a swarm (a cluster of Docker Engines or nodes), consisting of three Manager nodes and three Worker nodes, on the six VirtualBox VMs. Using this configuration, the swarm will be distributed and highly available, able to suffer the loss of one of the Manager nodes, without failing.

consul-post-2

Manager Nodes

First, we will create the initial Docker swarm Manager node.

SWARM_MANAGER_IP=$(docker-machine ip manager1)
echo ${SWARM_MANAGER_IP}

docker-machine ssh manager1 \
  "docker swarm init \
  --advertise-addr ${SWARM_MANAGER_IP}"

This initial Manager node advertises its IP address to future swarm members.

Next, we will create two additional swarm Manager nodes, which will join the initial Manager node, by using the initial Manager node’s advertised IP address. The three Manager nodes will then elect a single Leader to conduct orchestration tasks. According to Docker, Manager nodes implement the Raft Consensus Algorithm to manage the global cluster state.

vms=( "manager1" "manager2" "manager3"
 "worker1" "worker2" "worker3" )

docker-machine env manager1
eval $(docker-machine env manager1)

MANAGER_SWARM_JOIN=$(docker-machine ssh ${vms[0]} "docker swarm join-token manager")
MANAGER_SWARM_JOIN=$(echo ${MANAGER_SWARM_JOIN} | grep -E "(docker).*(2377)" -o)
MANAGER_SWARM_JOIN=$(echo ${MANAGER_SWARM_JOIN//\\/''})
echo ${MANAGER_SWARM_JOIN}

for vm in ${vms[@]:1:2}
do
  docker-machine ssh ${vm} ${MANAGER_SWARM_JOIN}
done

A quick note on the string manipulation of the MANAGER_SWARM_JOIN variable, above. Running the docker swarm join-token manager command, outputs something similar to the following.

To add a manager to this swarm, run the following command:

    docker swarm join \
    --token SWMTKN-1-1s53oajsj19lgniar3x1gtz3z9x0iwwumlew0h9ism5alt2iic-1qxo1nx24pyd0pg61hr6pp47t \
    192.168.99.109:2377

Using a bit of string manipulation, the resulting value of the MANAGER_SWARM_JOIN variable will be similar to the following command. We then ssh into each host and execute this command, one for the Manager nodes and another similar command, for the Worker nodes.

docker swarm join --token SWMTKN-1-1s53oajsj19lgniar3x1gtz3z9x0iwwumlew0h9ism5alt2iic-1qxo1nx24pyd0pg61hr6pp47t 192.168.99.109:2377

Worker Nodes

Next, we will create three swarm Worker nodes, using a similar method. The three Worker nodes will join the three swarm Manager nodes, as part of the swarm cluster. The main difference, according to Docker, Worker node’s “sole purpose is to execute containers. Worker nodes don’t participate in the Raft distributed state, make in scheduling decisions, or serve the swarm mode HTTP API.

vms=( "manager1" "manager2" "manager3"
 "worker1" "worker2" "worker3" )

WORKER_SWARM_JOIN=$(docker-machine ssh manager1 "docker swarm join-token worker")
WORKER_SWARM_JOIN=$(echo ${WORKER_SWARM_JOIN} | grep -E "(docker).*(2377)" -o)
WORKER_SWARM_JOIN=$(echo ${WORKER_SWARM_JOIN//\\/''})
echo ${WORKER_SWARM_JOIN}

for vm in ${vms[@]:3:3}
do
  docker-machine ssh ${vm} ${WORKER_SWARM_JOIN}
done

Using the docker node ls command, we should observe the resulting Docker swarm cluster looks similar to the following:

$ docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
u75gflqvu7blmt5mcr2dcp4g5 *  manager1  Ready   Active        Leader
0b7pkek76dzeedtqvvjssdt2s    manager2  Ready   Active        Reachable
s4mmra8qdgwukjsfg007rvbge    manager3  Ready   Active        Reachable
n89upik5v2xrjjaeuy9d4jybl    worker1   Ready   Active
nsy55qzavzxv7xmanraijdw4i    worker2   Ready   Active
hhn1l3qhej0ajmj85gp8qhpai    worker3   Ready   Active

Note the three swarm Manager nodes, three swarm Worker nodes, and the Manager, which was elected Leader. The other two Manager nodes are marked as Reachable. The asterisk indicates manager1 is the active machine.

I have also deployed Mano Marks’ Docker Swarm Visualizer to each of the swarm cluster’s three Manager nodes. This tool is described as a visualizer for Docker Swarm Mode using the Docker Remote API, Node.JS, and D3. It provides a great visualization the swarm cluster and its running components.

docker service create \
  --name swarm-visualizer \
  --publish 5001:8080/tcp \
  --constraint node.role==manager \
  --mode global \
  --mount type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock \
  manomarks/visualizer:latest

The Swarm Visualizer should be available on any of the Manager’s IP addresses, on port 5001. We constrained the Visualizer to the Manager nodes, by using the node.role==manager constraint.

Docker Swarm Visualizer

HashiCorp Consul

Next, we will install HashiCorp Consul onto our swarm cluster of VirtualBox hosts. Consul will provide us with service discovery, health checking, and a hierarchical key/value store. Consul will be installed, such that we end up with six Consul Agents. Agents can run as Servers or Clients. Similar to Docker swarm mode, we will install three Consul servers and three Consul clients, all in one Consul datacenter (our set of six local VMs). This clustered configuration will ensure Consul is distributed and highly available, able to suffer the loss of one of the Consul servers instances, without failing.

consul-post-3

Consul Servers

Again, similar to Docker swarm mode, we will install the initial Consul server. Both the Consul servers and clients run inside Docker containers, one per swarm host.

consul_server="consul-server1"
docker-machine env manager1
eval $(docker-machine env manager1)

docker run -d \
  --net=host \
  --hostname ${consul_server} \
  --name ${consul_server} \
  --env "SERVICE_IGNORE=true" \
  --env "CONSUL_CLIENT_INTERFACE=eth0" \
  --env "CONSUL_BIND_INTERFACE=eth1" \
  --volume consul_data:/consul/data \
  --publish 8500:8500 \
  consul:latest \
  consul agent -server -ui -bootstrap-expect=3 -client=0.0.0.0 -advertise=${SWARM_MANAGER_IP} -data-dir="/consul/data"

The first Consul server advertises itself on its host’s IP address. The bootstrap-expect=3option instructs Consul to wait until three Consul servers are available before bootstrapping the cluster. This option also allows an initial Leader to be elected automatically. The three Consul servers form a consensus quorum, using the Raft consensus algorithm.

All Consul server and Consul client Docker containers will have a data directory (/consul/data), mapped to a volume (consul_data) on their corresponding VM hosts.

Consul provides a basic browser-based user interface. By using the -ui option each Consul server will have the UI available on port 8500.

Next, we will create two additional Consul Server instances, which will join the initial Consul server, by using the first Consul server’s advertised IP address.

vms=( "manager1" "manager2" "manager3"
 "worker1" "worker2" "worker3" )

consul_servers=( "consul-server2" "consul-server3" )

i=0
for vm in ${vms[@]:1:2}
do
  docker-machine env ${vm}
  eval $(docker-machine env ${vm})

  docker run -d \
    --net=host \
    --hostname ${consul_servers[i]} \
    --name ${consul_servers[i]} \
    --env "SERVICE_IGNORE=true" \
    --env "CONSUL_CLIENT_INTERFACE=eth0" \
    --env "CONSUL_BIND_INTERFACE=eth1" \
    --volume consul_data:/consul/data \
    --publish 8500:8500 \
    consul:latest \
    consul agent -server -ui -client=0.0.0.0 -advertise='{{ GetInterfaceIP "eth1" }}' -retry-join=${SWARM_MANAGER_IP} -data-dir="/consul/data"
  let "i++"
done

Consul Clients

Next, we will install Consul clients on the three swarm worker nodes, using a similar method to the servers. According to Consul, The client is relatively stateless. The only background activity a client performs is taking part in the LAN gossip pool. The lack of the -server option, indicates Consul will install this agent as a client.

vms=( "manager1" "manager2" "manager3"
 "worker1" "worker2" "worker3" )

consul_clients=( "consul-client1" "consul-client2" "consul-client3" )

i=0
for vm in ${vms[@]:3:3}
do
  docker-machine env ${vm}
  eval $(docker-machine env ${vm})
  docker rm -f $(docker ps -a -q)

  docker run -d \
    --net=host \
    --hostname ${consul_clients[i]} \
    --name ${consul_clients[i]} \
    --env "SERVICE_IGNORE=true" \
    --env "CONSUL_CLIENT_INTERFACE=eth0" \
    --env "CONSUL_BIND_INTERFACE=eth1" \
    --volume consul_data:/consul/data \
    consul:latest \
    consul agent -client=0.0.0.0 -advertise='{{ GetInterfaceIP "eth1" }}' -retry-join=${SWARM_MANAGER_IP} -data-dir="/consul/data"
  let "i++"
done

From inside the consul-server1 Docker container, on the manager1 host, using the consul members command, we should observe a current list of cluster members along with their current state.

$ docker exec -it consul-server1 consul members
Node            Address              Status  Type    Build  Protocol  DC
consul-client1  192.168.99.112:8301  alive   client  0.7.5  2         dc1
consul-client2  192.168.99.113:8301  alive   client  0.7.5  2         dc1
consul-client3  192.168.99.114:8301  alive   client  0.7.5  2         dc1
consul-server1  192.168.99.109:8301  alive   server  0.7.5  2         dc1
consul-server2  192.168.99.110:8301  alive   server  0.7.5  2         dc1
consul-server3  192.168.99.111:8301  alive   server  0.7.5  2         dc1

Note the three Consul servers, the three Consul clients, and the single datacenter, dc1. Also, note the address of each agent matches the IP address of their swarm hosts.

To further validate Consul is installed correctly, access Consul’s web-based UI using the IP address of any of Consul’s three server’s swarm host, on port 8500. Shown below is the initial view of Consul prior to services being registered. Note the Consol instance’s names. Also, notice the IP address of each Consul instance corresponds to the swarm host’s IP address, on which it resides.

Consul UI - Nodes

Service Container Registration

Having provisioned the VMs, and built the Docker swarm cluster and Consul cluster, there is one final step to prepare our environment for the deployment of containerized services. We want to provide service discovery and registration for Consul, using Glider Labs’ Registrator.

Registrator will be installed on each host, within a Docker container. According to Glider Labs’ website, “Registrator will automatically register and deregister services for any Docker container, as the containers come online.

We will install Registrator on only five of our six hosts. We will not install it on manager1. Since this host is already serving as both the Docker swarm Leader and Consul Leader, we will not be installing any additional service containers on that host. This is a personal choice, not a requirement.

vms=( "manager1" "manager2" "manager3"
      "worker1" "worker2" "worker3" )

for vm in ${vms[@]:1}
do
  docker-machine env ${vm}
  eval $(docker-machine env ${vm})

  HOST_IP=$(docker-machine ip ${vm})
  # echo ${HOST_IP}

  docker run -d \
    --name=registrator \
    --net=host \
    --volume=/var/run/docker.sock:/tmp/docker.sock \
    gliderlabs/registrator:latest \
      -internal consul://${HOST_IP:localhost}:8500
done

Multiple Service Discovery Options

You might be wondering why we are worried about service discovery and registration with Consul, using Registrator, considering Docker swarm mode already has service discovery. Deploying our services using swarm mode, we are relying on swarm mode for service discovery. I chose to include Registrator in this example, to demonstrate an alternative method of service discovery, which can be used with other tools such as Consul Template, for dynamic load-balancer templating.

We actually have a third option for automatic service registration, Spring Cloud Consul Discovery. I chose not use Spring Cloud Consul Discovery in this post, to register the Widget service. Spring Cloud Consul Discovery would have automatically registered the Spring Boot service with Consul. The actual Widget service stack contains MongoDB, as well as other non-Spring Boot service components, which I removed for this post, such as the NGINX load balancer. Using Registrator, all the containerized services, not only the Spring Boot services, are automatically registered.

You will note in the Widget source code, I commented out the @EnableDiscoveryClient annotation on the WidgetApplication class. If you want to use Spring Cloud Consul Discovery, simply uncomment this annotation.

Distributed Configuration

We are almost ready to deploy some services. For this post’s demonstration, we will deploy the Widget service stack. The Widget service stack is composed of a simple Spring Boot service, backed by MongoDB; it is easily deployed as a containerized application. I often use the service stack for testing and training.

Widgets represent inanimate objects, users purchase with points. Widgets have particular physical characteristics, such as product id, name, color, size, and price. The inventory of widgets is stored in the widgets MongoDB database.

Hierarchical Key/Value Store

Before we can deploy the Widget service, we need to store the Widget service’s Spring Profiles in Consul’s hierarchical key/value store. The Widget service’s profiles are sets of configuration the service needs to run in different environments, such as Development, QA, UAT, and Production. Profiles include configuration, such as port assignments, database connection information, and logging and security settings. The service’s active profile will be read by the service at startup, from Consul, and applied at runtime.

As opposed to the traditional Java properties’ key/value format the Widget service uses YAML to specify it’s hierarchical configuration data. Using Spring Cloud Consul Config, an alternative to Spring Cloud Config Server and Client, we will store the service’s Spring Profile in Consul as blobs of YAML.

There are multi ways to store configuration in Consul. Storing the Widget service’s profiles as YAML was the quickest method of migrating the existing application.yml file’s multiple profiles to Consul. I am not insisting YAML is the most effective method of storing configuration in Consul’s k/v store; that depends on the application’s requirements.

Using the Consul KV HTTP API, using the HTTP PUT method, we will place each profile, as a YAML blob, into the appropriate data key, in Consul. For convenience, I have separated the Widget service’s three existing Spring profiles into individual YAML files. We will load each of the YAML file’s contents into Consul, using curl.

Below is the Widget service’s default Spring profile. The default profile is activated if no other profiles are explicitly active when the application context starts.

endpoints:
  enabled: true
  sensitive: false
info:
  java:
    source: "${java.version}"
    target: "${java.version}"
logging:
  level:
    root: DEBUG
management:
  security:
    enabled: false
  info:
    build:
      enabled: true
    git:
      mode: full
server:
  port: 8030
spring:
  data:
    mongodb:
      database: widgets
      host: localhost
      port: 27017

Below is the Widget service’s docker-local Spring profile. This is the profile we will use when we deploy the Widget service to our swarm cluster. The docker-local profile overrides two properties of the default profile — the name of our MongoDB host and the logging level. All other configuration will come from the default profile.

logging:
  level:
    root: INFO
spring:
  data:
    mongodb:
      host: mongodb

To load the profiles into Consul, from the root of the Widget local git repository, we will execute curl commands. We will use the Consul cluster Leader’s IP address for our HTTP PUT methods.

docker-machine env manager1
eval $(docker-machine env manager1)

CONSUL_SERVER=$(docker-machine ip $(docker node ls | grep Leader | awk '{print $3}'))

# default profile
KEY="config/widget-service/data"
VALUE="consul-configs/default.yaml"
curl -X PUT --data-binary @${VALUE} \
  -H "Content-type: text/x-yaml" \
  ${CONSUL_SERVER:localhost}:8500/v1/kv/${KEY}

# docker-local profile
KEY="config/widget-service/docker-local/data"
VALUE="consul-configs/docker-local.yaml"
curl -X PUT --data-binary @${VALUE} \
  -H "Content-type: text/x-yaml" \
  ${CONSUL_SERVER:localhost}:8500/v1/kv/${KEY}

# docker-production profile
KEY="config/widget-service/docker-production/data"
VALUE="consul-configs/docker-production.yaml"
curl -X PUT --data-binary @${VALUE} \
  -H "Content-type: text/x-yaml" \
  ${CONSUL_SERVER:localhost}:8500/v1/kv/${KEY}

Returning to the Consul UI, we should now observe three Spring profiles, in the appropriate data keys, listed under the Key/Value tab. The default Spring profile YAML blob, the value, will be assigned to the config/widget-service/data key.

Consul UI - Docker Default Spring Profile

The docker-local profile will be assigned to the config/widget-service,docker-local/data key. The keys follow default spring.cloud.consul.config conventions.

Consul UI - Docker Local Spring Profile

Spring Cloud Consul Config

In order for our Spring Boot service to connect to Consul and load the requested active Spring Profile, we need to add a dependency to the gradle.build file, on Spring Cloud Consul Config.

dependencies {
  compile group: 'org.springframework.cloud', name: 'spring-cloud-starter-consul-all';
  ...
}

Next, we need to configure the bootstrap.yml file, to connect and properly read the profile. We must properly set the CONSUL_SERVER environment variable. This value is the Consul server instance hostname or IP address, which the Widget service instances will contact to retrieve its configuration.

spring:
  application:
    name: widget-service
  cloud:
    consul:
      host: ${CONSUL_SERVER:localhost}
      port: 8500
      config:
        fail-fast: true
        format: yaml

Deployment Using Docker Compose

We are almost ready to deploy the Widget service instances and the MongoDB instance (also considered a ‘service’ by Docker), in Docker containers, to the Docker swarm cluster. The Docker Compose file is written using version 3 of the Docker Compose specification. It takes advantage of some of the specification’s new features.

version: '3.0'

services:
  widget:
    image: garystafford/microservice-docker-demo-widget:latest
    depends_on:
    - widget_stack_mongodb
    hostname: widget
    ports:
    - 8030:8030/tcp
    networks:
    - widget_overlay_net
    deploy:
      mode: global
      placement:
        constraints: [node.role == worker]
    environment:
    - "CONSUL_SERVER_URL=${CONSUL_SERVER}"
    - "SERVICE_NAME=widget-service"
    - "SERVICE_TAGS=service"
    command: "java -Dspring.profiles.active=${WIDGET_PROFILE} -Djava.security.egd=file:/dev/./urandom -jar widget/widget-service.jar"

  mongodb:
    image: mongo:latest
    command:
    - --smallfiles
    hostname: mongodb
    ports:
    - 27017:27017/tcp
    networks:
    - widget_overlay_net
    volumes:
    - widget_data_vol:/data/db
    deploy:
      replicas: 1
      placement:
        constraints: [node.role == worker]
    environment:
    - "SERVICE_NAME=mongodb"
    - "SERVICE_TAGS=database"

networks:
  widget_overlay_net:
    external: true

volumes:
  widget_data_vol:
    external: true

External Container Volumes and Network

Note the external volume, widget_data_vol, which will be mounted to the MongoDB container, to the /data/db directory. The volume must be created on each host in the swarm cluster, which may contain the MongoDB instance.

Also, note the external overlay network, widget_overlay_net, which will be used by all the service containers in the service stack to communicate with each other. These must be created before deploying our services.

vms=( "manager1" "manager2" "manager3"
 "worker1" "worker2" "worker3" )

for vm in ${vms[@]}
do
  docker-machine env ${vm}
  eval $(docker-machine env ${vm})
  docker volume create --name=widget_data_vol
  echo "Volume created: ${vm}..."
done

We should be able to view the new volume on one of the swarm Worker host, using the docker volume ls command.

$ docker volume ls
DRIVER              VOLUME NAME
local               widget_data_vol

The network is created once, for the whole swarm cluster.

docker-machine env manager1
eval $(docker-machine env manager1)

docker network create \
  --driver overlay \
  --subnet=10.0.0.0/16 \
  --ip-range=10.0.11.0/24 \
  --opt encrypted \
  --attachable=true \
  widget_overlay_net

We can view the new overlay network using the docker network ls command.

$ docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
03bcf76d3cc4        bridge              bridge              local
533285df0ce8        docker_gwbridge     bridge              local
1f627d848737        host                host                local
6wdtuhwpuy4f        ingress             overlay             swarm
b8a1f277067f        none                null                local
4hy77vlnpkkt        widget_overlay_net  overlay             swarm

Deploying the Services

With the profiles loaded into Consul, and the overlay network and data volumes created on the hosts, we can deploy the Widget service instances and MongoDB service instance. We will assign the Consul cluster Leader’s IP address as the CONSUL_SERVER variable. We will assign the Spring profile, docker-local, to the WIDGET_PROFILE variable. Both these variables are used by the Widget service’s Docker Compose file.

docker-machine env manager1
eval $(docker-machine env manager1)

export CONSUL_SERVER=$(docker-machine ip $(docker node ls | grep Leader | awk '{print $3}'))
export WIDGET_PROFILE=docker-local

docker stack deploy --compose-file=docker-compose.yml widget_stack

The Docker images, used to instantiate the service’s Docker containers, must be pulled from Docker Hub, by each swarm host. Consequently, the initial deployment process can take up to several minutes, depending on your Internet connection.

After deployment, using the docker stack ls command, we should observe that the widget_stack stack is deployed to the swarm cluster with two services.

$ docker stack ls
NAME          SERVICES
widget_stack  2

After deployment, using the docker stack ps widget_stack command, we should observe that all the expected services in the widget_stack stack are running. Note this command also shows us where the services are running.

$ docker service ls
20:37:24-gstafford:~/Documents/projects/widget-docker-demo/widget-service$ docker stack ps widget_stack
ID            NAME                                           IMAGE                                                NODE     DESIRED STATE  CURRENT STATE          ERROR  PORTS
qic4j1pl6n4l  widget_stack_widget.hhn1l3qhej0ajmj85gp8qhpai  garystafford/microservice-docker-demo-widget:latest  worker3  Running        Running 4 minutes ago
zrbnyoikncja  widget_stack_widget.nsy55qzavzxv7xmanraijdw4i  garystafford/microservice-docker-demo-widget:latest  worker2  Running        Running 4 minutes ago
81z3ejqietqf  widget_stack_widget.n89upik5v2xrjjaeuy9d4jybl  garystafford/microservice-docker-demo-widget:latest  worker1  Running        Running 4 minutes ago
nx8dxlib3wyk  widget_stack_mongodb.1                         mongo:latest                                         worker1  Running        Running 4 minutes ago

Using the docker service ls command, we should observe that all the expected service instances (replicas) running, including all the widget_stack’s services and the three instances of the swarm-visualizer service.

$ docker service ls
ID            NAME                  MODE        REPLICAS  IMAGE
almmuqqe9v55  swarm-visualizer      global      3/3       manomarks/visualizer:latest
i9my74tl536n  widget_stack_widget   global      0/3       garystafford/microservice-docker-demo-widget:latest
ju2t0yjv9ily  widget_stack_mongodb  replicated  1/1       mongo:latest

Since it may take a few moments for the widget_stack’s services to come up, you may need to re-run the command until you see all expected replicas running.

$ docker service ls
ID            NAME                  MODE        REPLICAS  IMAGE
almmuqqe9v55  swarm-visualizer      global      3/3       manomarks/visualizer:latest
i9my74tl536n  widget_stack_widget   global      3/3       garystafford/microservice-docker-demo-widget:latest
ju2t0yjv9ily  widget_stack_mongodb  replicated  1/1       mongo:latest

If you see 3 of 3 replicas of the Widget service running, this is a good sign everything is working! We can confirm this, by checking back in with the Consul UI. We should see the three Widget services and the single instance of MongoDB. They were all registered with Registrator. For clarity, we have purposefully did not register the Swarm Visualizer instances with Consul, using Registrator’s "SERVICE_IGNORE=true" environment variable. Registrator will also not register itself with Consul.

Consul UI - Services

We can also view the Widget stack’s services, by revisiting the Swarm Visualizer, on any of the Docker swarm cluster Manager’s IP address, on port 5001.

Docker Swarm Visualizer - Deployed Services

Spring Boot Actuator

The most reliable way to confirm that the Widget service instances did indeed load their configuration from Consul, is to check the Spring Boot Actuator Environment endpoint for any of the Widget instances. As shown below, all configuration is loaded from the default profile, except for two configuration items, which override the default profile values and are loaded from the active docker-local Spring profile.

Spring Actuator Environment Endpoint

Consul Endpoint

There is yet another way to confirm Consul is working properly with our services, without accessing the Consul UI. This ability is provided by Spring Boot Actuator’s Consul endpoint. If the Spring Boot service has a dependency on Spring Cloud Consul Config, this endpoint will be available. It displays useful information about the Consul cluster and the services which have been registered with Consul.

Spring Actuator Consul Endpoint

Cleaning Up the Swarm

If you want to start over again, without destroying the Docker swarm cluster, you may use the following commands to delete all the stacks, services, stray containers, and unused images, networks, and volumes. The Docker swarm cluster and all Docker images pulled to the VMs will be left intact.

# remove all stacks and services
docker-machine env manager1
eval $(docker-machine env manager1)
for stack in $(docker stack ls | awk '{print $1}'); do docker stack rm ${stack}; done
for service in $(docker service ls | awk '{print $1}'); do docker service rm ${service}; done

# remove all containers, networks, and volumes
vms=( "manager1" "manager2" "manager3"
 "worker1" "worker2" "worker3" )

for vm in ${vms[@]}
do
  docker-machine env ${vm}
  eval $(docker-machine env ${vm})
  docker system prune -f
  docker stop $(docker ps -a -q)
  docker rm -f $(docker ps -a -q)
done

Conclusion

We have barely scratched the surface of the features and capabilities of Docker swarm mode or Consul. However, the post did demonstrate several key concepts, critical to configuring, deploying, and managing a modern, distributed, containerized service application platform. These concepts included cluster management, service discovery, service orchestration, distributed configuration, hierarchical key/value stores, and distributed and highly-available systems.

This post’s example was designed for demonstration purposes only and meant to simulate a typical Development or Test environment. Although the swarm cluster, Consul cluster, and the Widget service instances were deployed in a distributed and highly available configuration, this post’s example is far from being production-ready. Some things that would be considered, if you were to make this more production-like, include:

  • MongoDB database is a single point of failure. It should be deployed in a sharded and clustered configuration.
  • The swarm nodes were deployed to a single datacenter. For redundancy, the nodes should be spread across multiple physical hypervisors, separate availability zones and/or geographically separate datacenters (regions).
  • The example needs centralized logging, monitoring, and alerting, to better understand and react to how the swarm, Docker containers, and services are performing.
  • Most importantly, we made was no attempt to secure the services, containers, data, network, or hosts. Security is a critical component for moving this example to Production.

All opinions in this post are my own and not necessarily the views of my current employer or their clients.

, , , , , , , ,

6 Comments