
Using the Google Cloud Dataproc WorkflowTemplates API to Automate Spark and Hadoop Workloads on GCP

In the previous post, Big Data Analytics with Java and Python, using Cloud Dataproc, Google’s Fully-Managed Spark and Hadoop Service, we explored Google Cloud Dataproc using the Google Cloud Console as well as the Google Cloud SDK and Cloud Dataproc API. We created clusters, uploaded and ran Spark and PySpark jobs, and then deleted the clusters, each as a discrete task. Although each task could be performed through the Dataproc API, and was therefore automatable, the tasks were independent, with no awareness of the previous task’s state.

Screen Shot 2018-12-15 at 11.39.26 PM.png

In this brief follow-up post, we will examine the Cloud Dataproc WorkflowTemplates API to more efficiently and effectively automate Spark and Hadoop workloads. According to Google, the Cloud Dataproc WorkflowTemplates API provides a flexible and easy-to-use mechanism for managing and executing Dataproc workflows. A Workflow Template is a reusable workflow configuration. It defines a graph of jobs with information on where to run those jobs. A Workflow is an operation that runs a Directed Acyclic Graph (DAG) of jobs on a cluster. Shown below is one of the Workflows that will be demonstrated in this post, displayed in the Spark History Server Web UI.

screen-shot-2018-12-16-at-11.07.29-am.png

Here we see a four-stage DAG of one of the three jobs in the workflow, displayed in the Spark History Server Web UI.

screen-shot-2018-12-16-at-11.18.45-am

Workflows are ideal for automating large batches of dynamic Spark and Hadoop jobs, and for long-running and unattended job execution, such as overnight.

Demonstration

Using the Python and Java projects from the previous post, we will first create workflow templates using just the WorkflowTemplates API. We will create the template, set a managed cluster, add jobs to the template, and instantiate the workflow. Next, we will further optimize and simplify our workflow by using a YAML-based workflow template file. The YAML-based template file eliminates the need to make API calls to set the template’s cluster and add the jobs to the template. Finally, to further enhance the workflow and promote re-use of the template, we will incorporate parameterization. Parameters will allow us to pass key/value pairs from the command line to the workflow template, and on to the Python script as input arguments.

It is not necessary to use the Google Cloud Console for this post. All steps will be done using Google Cloud SDK shell commands. This means all steps may be automated using CI/CD DevOps tools, like Jenkins and Spinnaker on GKE.

Source Code

All open-sourced code for this post can be found on GitHub within three repositories: dataproc-java-demo, dataproc-python-demo, and dataproc-workflow-templates. Source code samples are displayed as GitHub Gists, which may not display correctly on all mobile and social media browsers.

WorkflowTemplates API

Always start by ensuring you have the latest Google Cloud SDK updates and are working within the correct Google Cloud project.

gcloud components update

export PROJECT_ID=your-project-id
gcloud config set project $PROJECT_ID

Set the following variables based on your Google environment. The variables will be reused throughout the post for multiple commands.

export REGION=your-region
export ZONE=your-zone
export BUCKET_NAME=gs://your-bucket
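
Because every later command depends on these variables, it can help to fail fast when one of them is missing. The following is only a minimal sketch and is not part of the original post; require_vars is a hypothetical helper name.

```shell
# Hypothetical helper (an assumption, not from the original post):
# report every named variable that is unset or empty.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling require_vars PROJECT_ID REGION ZONE BUCKET_NAME || exit 1 at the top of a script would stop it before any gcloud command runs with an empty value.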

The post assumes you still have the Cloud Storage bucket we created in the previous post. In the bucket, you will need the two Kaggle IBRD CSV files, available on Kaggle, the compiled Java JAR file from the dataproc-java-demo project, and a new Python script, international_loans_dataproc.py, from the dataproc-python-demo project.

screen-shot-2018-12-16-at-12.03.51-pm

Use gsutil with the copy (cp) command to upload the four files to your Storage bucket.

gsutil cp data/ibrd-statement-of-loans-*.csv $BUCKET_NAME
gsutil cp build/libs/dataprocJavaDemo-1.0-SNAPSHOT.jar $BUCKET_NAME
gsutil cp international_loans_dataproc.py $BUCKET_NAME

Following Google’s suggested process, we create a workflow template using the workflow-templates create command.

export TEMPLATE_ID=template-demo-1
  
gcloud dataproc workflow-templates create \
  $TEMPLATE_ID --region $REGION

Adding a Cluster

Next, we need to set a cluster for the workflow to use, in order to run the jobs. Cloud Dataproc will either create and use a Managed Cluster for your workflow or use an existing cluster. If the workflow uses a managed cluster, it creates the cluster, runs the jobs, and then deletes the cluster when the jobs are finished. This means that, for many use cases, there is no need to maintain long-lived clusters; they become just an ephemeral part of the workflow.

We set a managed cluster for our Workflow using the workflow-templates set-managed-cluster command. We will re-use the same cluster specifications we used in the previous post: the Standard cluster type, with 1 master node and 2 worker nodes.

gcloud dataproc workflow-templates set-managed-cluster \
  $TEMPLATE_ID \
  --region $REGION \
  --zone $ZONE \
  --cluster-name three-node-cluster \
  --master-machine-type n1-standard-4 \
  --master-boot-disk-size 500 \
  --worker-machine-type n1-standard-4 \
  --worker-boot-disk-size 500 \
  --num-workers 2 \
  --image-version 1.3-deb9

Alternatively, if we already had an existing cluster, we would use the workflow-templates set-cluster-selector command, to associate that cluster with the workflow template.

gcloud dataproc workflow-templates set-cluster-selector \
  $TEMPLATE_ID \
  --region $REGION \
  --cluster-labels goog-dataproc-cluster-uuid=$CLUSTER_UUID

To get the existing cluster’s UUID label value, you could use a command similar to the following.

CLUSTER_UUID=$(gcloud dataproc clusters describe $CLUSTER_2 \
  --region $REGION \
  | grep 'goog-dataproc-cluster-uuid:' \
  | sed 's/.* //')

echo $CLUSTER_UUID

1c27efd2-f296-466e-b14e-c4263d0d7e19
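
As a variation on the grep and sed pipeline above, the label can also be picked out with a single awk filter that matches the label key exactly. This is a minimal sketch; extract_cluster_uuid is a hypothetical helper name, and it assumes the describe output is YAML with the label on its own line, as in the pipeline above.

```shell
# Hypothetical helper (an assumption, not from the original post).
# Reads `gcloud dataproc clusters describe` YAML on stdin and prints the
# value of the goog-dataproc-cluster-uuid label.
extract_cluster_uuid() {
  # awk ignores leading indentation, so $1 is the label key with its colon.
  awk '$1 == "goog-dataproc-cluster-uuid:" { print $2; exit }'
}
```

It would be used by piping the describe command into it, for example gcloud dataproc clusters describe $CLUSTER_2 --region $REGION | extract_cluster_uuid.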

Adding Jobs

Next, we add the jobs we want to run to the template. Each job is considered a step in the template, and each step requires a unique step id. We will add three jobs to the template: two Java-based Spark jobs from the previous post, and a new Python-based PySpark job.

First, we add the two Java-based Spark jobs, using the workflow-templates add-job spark command. This command’s flags are nearly identical to the dataproc jobs submit spark command, used in the previous post.

export STEP_ID=ibrd-small-spark
  
gcloud dataproc workflow-templates add-job spark \
  --region $REGION \
  --step-id $STEP_ID \
  --workflow-template $TEMPLATE_ID \
  --class org.example.dataproc.InternationalLoansAppDataprocSmall \
  --jars $BUCKET_NAME/dataprocJavaDemo-1.0-SNAPSHOT.jar

export STEP_ID=ibrd-large-spark
  
gcloud dataproc workflow-templates add-job spark \
  --region $REGION \
  --step-id $STEP_ID \
  --workflow-template $TEMPLATE_ID \
  --class org.example.dataproc.InternationalLoansAppDataprocLarge \
  --jars $BUCKET_NAME/dataprocJavaDemo-1.0-SNAPSHOT.jar
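
The two commands above differ only in the step id and the main class, so they could be folded into a small wrapper. This is a sketch under assumptions: add_spark_step is a hypothetical function name, and it relies on REGION, TEMPLATE_ID, and BUCKET_NAME being exported as shown earlier.

```shell
# Hypothetical wrapper (an assumption, not from the original post).
# Usage: add_spark_step <step-id> <main-class>
add_spark_step() {
  gcloud dataproc workflow-templates add-job spark \
    --region "$REGION" \
    --step-id "$1" \
    --workflow-template "$TEMPLATE_ID" \
    --class "$2" \
    --jars "$BUCKET_NAME/dataprocJavaDemo-1.0-SNAPSHOT.jar"
}
```

For example, add_spark_step ibrd-small-spark org.example.dataproc.InternationalLoansAppDataprocSmall would add the first of the two jobs.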

Next, we add the Python-based PySpark job, international_loans_dataproc.py, as the third job in the template. This Python script requires three input arguments, on lines 15–17: the bucket where the data is located and the results are placed, the name of the data file, and the directory in the bucket where the results will be placed (gist).

We pass the arguments to the Python script as part of the PySpark job, using the workflow-templates add-job pyspark command.

export STEP_ID=ibrd-large-pyspark
  
gcloud dataproc workflow-templates add-job pyspark \
  $BUCKET_NAME/international_loans_dataproc.py \
  --step-id $STEP_ID \
  --workflow-template $TEMPLATE_ID \
  --region $REGION \
  -- $BUCKET_NAME \
     ibrd-statement-of-loans-historical-data.csv \
     ibrd-summary-large-python

That’s it; we have created our first Cloud Dataproc Workflow Template using the Dataproc WorkflowTemplates API. To view our template we can use the following two commands. First, use the workflow-templates list command to display a list of available templates. The list command output displays the version of the workflow template and how many jobs are in the template.

gcloud dataproc workflow-templates list --region $REGION
  
ID               JOBS  UPDATE_TIME               VERSION
template-demo-1  3     2018-12-15T16:32:06.508Z  5

Then, we use the workflow-templates describe command to show the details of a specific template.

gcloud dataproc workflow-templates describe \
  $TEMPLATE_ID --region $REGION

Using the workflow-templates describe command, we should see output similar to the following (gist).

In the template description, notice the template’s id, the managed cluster in the placement section, and the three jobs, all of which we added using the above series of workflow-templates commands. Also, notice the creation and update timestamps and version number, which were automatically generated by Dataproc. Lastly, notice the name, which refers to the GCP project and region where this copy of the template is located. Had we used an existing cluster with our workflow, as opposed to a managed cluster, the placement section would have looked as follows.

placement:
  clusterSelector:
    clusterLabels:
      goog-dataproc-cluster-uuid: your_clusters_uuid_label_value

To instantiate the workflow, we use the workflow-templates instantiate command. This command will create the managed cluster, run all the steps (jobs), then delete the cluster. I have added the time command to see how long the workflow takes to complete.

time gcloud dataproc workflow-templates instantiate \
  $TEMPLATE_ID --region $REGION #--async

We can observe the progress from the Google Cloud Dataproc Console, or from the command line by omitting the --async flag. Below we see the three jobs completed successfully on the managed cluster.

Waiting on operation [projects/dataproc-demo-224523/regions/us-east1/operations/e720bb96-9c87-330e-b1cd-efa4612b3c57].
WorkflowTemplate [template-demo-1] RUNNING
Creating cluster: Operation ID [projects/dataproc-demo-224523/regions/us-east1/operations/e1fe53de-92f2-4f8c-8b3a-fda5e13829b6].
Created cluster: three-node-cluster-ugdo4ygpl52bo.
Job ID ibrd-small-spark-ugdo4ygpl52bo RUNNING
Job ID ibrd-large-spark-ugdo4ygpl52bo RUNNING
Job ID ibrd-large-pyspark-ugdo4ygpl52bo RUNNING
Job ID ibrd-small-spark-ugdo4ygpl52bo COMPLETED
Job ID ibrd-large-spark-ugdo4ygpl52bo COMPLETED
Job ID ibrd-large-pyspark-ugdo4ygpl52bo COMPLETED
Deleting cluster: Operation ID [projects/dataproc-demo-224523/regions/us-east1/operations/f2a40c33-3cdf-47f5-92d6-345463fbd404].
WorkflowTemplate [template-demo-1] DONE
Deleted cluster: three-node-cluster-ugdo4ygpl52bo.

1.02s user 0.35s system 0% cpu 5:03.55 total

In the output, you see the creation of the cluster, the three jobs running and completing successfully, and finally the cluster deletion. The entire workflow took approximately 5 minutes to complete. Below is the view of the workflow’s results from the Dataproc Clusters Console Jobs tab.

screen_shot_2018-12-15_at_11.42.44_am

Below we see the output from the PySpark job, run as part of the workflow template, shown in the Dataproc Clusters Console Output tab. Notice the three input arguments we passed to the Python script from the workflow template, listed in the output.

screen_shot_2018-12-15_at_11.43.56_am

We see the arguments passed to the job, from the Jobs Configuration tab.

screen_shot_2018-12-15_at_1.11.11_pm.png

Examining the Google Cloud Dataproc Jobs Console, we observe that the WorkflowTemplates API automatically adds a unique alphanumeric extension to the name of each managed cluster we create, as well as to the name of each job that is run. The extension on the cluster name matches the extension on the jobs run on that cluster.

screen_shot_2018-12-15_at_1.05.41_pm
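
Since the cluster and its jobs share the same extension, a job ID can be split back into its step id and workflow extension with plain parameter expansion. This is a minimal sketch; both function names are hypothetical, and it assumes the generated extension never contains a hyphen, as in the output shown earlier.

```shell
# Hypothetical helpers (assumptions, not from the original post).
# Both assume the generated extension contains no hyphen.

# Print the extension: everything after the last hyphen.
workflow_suffix() {
  printf '%s\n' "${1##*-}"
}

# Print the step id: everything before the last hyphen.
workflow_step_id() {
  printf '%s\n' "${1%-*}"
}
```

For the job ID ibrd-large-pyspark-ugdo4ygpl52bo, these print ugdo4ygpl52bo and ibrd-large-pyspark respectively, which makes it straightforward to group jobs by the cluster they ran on.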

YAML-based Workflow Template

The WorkflowTemplates API-based workflow above was certainly more convenient than using the individual Cloud Dataproc API commands. At a minimum, we don’t have to remember to delete our cluster when the jobs are complete, which I often forget to do. To further optimize the workflow, we will introduce a YAML-based workflow template. According to Google, you can define a workflow template in a YAML file, then instantiate the template to run the workflow. You can also import and export a workflow template YAML file to create and update a Cloud Dataproc workflow template resource.

We can export our first workflow template to create our YAML-based template file.

gcloud dataproc workflow-templates export template-demo-1 \
  --destination template-demo-2.yaml \
  --region $REGION

Below is our first YAML-based template, template-demo-2.yaml. You will need to replace the values in the template with your own values, based on your environment (gist).

Note that the template is nearly identical to the one we created previously using the WorkflowTemplates API. The YAML-based template requires the placement and jobs fields. All the available fields are detailed in Google’s documentation.

To run the template we use the workflow-templates instantiate-from-file command. Again, I will use the time command to measure performance.

time gcloud dataproc workflow-templates instantiate-from-file \
  --file template-demo-2.yaml \
  --region $REGION

Running the workflow-templates instantiate-from-file command will run a workflow, nearly identical to the workflow we ran in the previous example, with a similar timing. Below we see the three jobs completed successfully on the managed cluster, in approximately the same time as the previous workflow.

Waiting on operation [projects/dataproc-demo-224523/regions/us-east1/operations/7ba3c28e-ebfa-32e7-9dd6-d938a1cfe23b].
WorkflowTemplate RUNNING
Creating cluster: Operation ID [projects/dataproc-demo-224523/regions/us-east1/operations/8d05199f-ed36-4787-8a28-ae784c5bc8ae].
Created cluster: three-node-cluster-5k3bdmmvnna2y.
Job ID ibrd-small-spark-5k3bdmmvnna2y RUNNING
Job ID ibrd-large-spark-5k3bdmmvnna2y RUNNING
Job ID ibrd-large-pyspark-5k3bdmmvnna2y RUNNING
Job ID ibrd-small-spark-5k3bdmmvnna2y COMPLETED
Job ID ibrd-large-spark-5k3bdmmvnna2y COMPLETED
Job ID ibrd-large-pyspark-5k3bdmmvnna2y COMPLETED
Deleting cluster: Operation ID [projects/dataproc-demo-224523/regions/us-east1/operations/a436ae82-f171-4b0a-9b36-5e16406c75d5].
WorkflowTemplate DONE
Deleted cluster: three-node-cluster-5k3bdmmvnna2y.

1.16s user 0.44s system 0% cpu 4:48.84 total

Parameterization of Templates

To further optimize the workflow template process for re-use, we have the option of passing parameters to our template. Imagine you now receive new loan snapshot data files every night. Imagine you need to run the same data analysis on the financial transactions of thousands of your customers, nightly. Parameterizing a template makes it more flexible and reusable. By removing hard-coded values, such as Storage bucket paths and data file names, a single template may be re-used for multiple variations of the same job. Parameterization allows you to automate hundreds or thousands of Spark and Hadoop jobs in a workflow or workflows, each with different parameters, programmatically.

To demonstrate the parameterization of a workflow template, we create another YAML-based template with just the Python/PySpark job, template-demo-3.yaml. If you recall from our first example, the Python script, international_loans_dataproc.py, requires three input arguments: the bucket where the data is located and the results are placed, the name of the data file, and the directory in the bucket where the results will be placed.

We will replace four of the values in the template with parameters and inject those parameters’ values when we instantiate the workflow. Below is the new parameterized template. The template now has a parameters section on lines 26–46, which defines the parameters that replace the four values on lines 3–7 (gist).

Note the PySpark job’s three arguments and the location of the Python script have been parameterized. Parameters may include validation. As an example of validation, the template uses regex to validate the format of the Storage bucket path. The regex follows Google’s RE2 regular expression library syntax. If you need help with regex, the Regex Tester – Golang website is a convenient way to test your parameter’s regex validations.
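
The same kind of check can be run locally before instantiating the template, so a malformed bucket path is caught before any API call. This is only a sketch: is_bucket_path is a hypothetical helper, and the pattern is an assumed one (gs:// followed by at least one character), since the template’s actual regex is not reproduced here. Note that grep -E uses POSIX extended syntax rather than RE2, although a pattern this simple behaves the same in both.

```shell
# Hypothetical helper (an assumption, not from the original post).
# Assumed pattern: "gs://" followed by at least one character. The actual
# regex in template-demo-3.yaml may differ.
is_bucket_path() {
  printf '%s\n' "$1" | grep -Eq '^gs://.+'
}
```

A script could then guard the instantiate call with is_bucket_path "$BUCKET_NAME" || exit 1.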

First, we import the new parameterized YAML-based workflow template, using the workflow-templates import command. Then, we instantiate the template using the workflow-templates instantiate command. The workflow-templates instantiate command will run the single PySpark job, analyzing the smaller IBRD data file, and placing the resulting Parquet-format file in a directory within the Storage bucket. We pass the Python script location, bucket link, smaller IBRD data file name, and output directory, as parameters to the template, and therefore indirectly, three of these, as input arguments to the Python script.

export TEMPLATE_ID=template-demo-3

gcloud dataproc workflow-templates import $TEMPLATE_ID \
   --region $REGION --source template-demo-3.yaml
  
gcloud dataproc workflow-templates instantiate \
  $TEMPLATE_ID --region $REGION --async \
  --parameters MAIN_PYTHON_FILE="$BUCKET_NAME/international_loans_dataproc.py",STORAGE_BUCKET=$BUCKET_NAME,IBRD_DATA_FILE="ibrd-statement-of-loans-latest-available-snapshot.csv",RESULTS_DIRECTORY="ibrd-summary-small-python"

Next, we will analyze the larger historic data file, using the same parameterized YAML-based workflow template, but changing two of the four parameters we are passing to the template with the workflow-templates instantiate command. This will run a single PySpark job on the larger IBRD data file and place the resulting Parquet-format file in a different directory within the Storage bucket.

time gcloud dataproc workflow-templates instantiate \
  $TEMPLATE_ID --region $REGION \
  --parameters MAIN_PYTHON_FILE="$BUCKET_NAME/international_loans_dataproc.py",STORAGE_BUCKET=$BUCKET_NAME,IBRD_DATA_FILE="ibrd-statement-of-loans-historical-data.csv",RESULTS_DIRECTORY="ibrd-summary-large-python"

This is the power of parameterization—one workflow template and one job script, but two different datasets and two different results.
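
The --parameters value in the commands above is a single comma-separated string, which becomes hard to read as it grows. A small helper can assemble it from one key/value pair per line. This is a minimal sketch; join_params is a hypothetical name, and it assumes no value contains a comma.

```shell
# Hypothetical helper (an assumption, not from the original post).
# Joins KEY=VALUE arguments with commas; assumes no value contains a comma.
join_params() {
  # "$*" joins the arguments using the first character of IFS; the
  # subshell keeps the IFS change from leaking into the caller.
  ( IFS=,; printf '%s\n' "$*" )
}

# Build the parameter string for the larger dataset, one pair per line.
params=$(join_params \
  "MAIN_PYTHON_FILE=$BUCKET_NAME/international_loans_dataproc.py" \
  "STORAGE_BUCKET=$BUCKET_NAME" \
  "IBRD_DATA_FILE=ibrd-statement-of-loans-historical-data.csv" \
  "RESULTS_DIRECTORY=ibrd-summary-large-python")
```

The result can then be passed to the instantiate command as --parameters "$params".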

Below we see the single PySpark job run on the managed cluster.

Waiting on operation [projects/dataproc-demo-224523/regions/us-east1/operations/b3c5063f-e3cf-3833-b613-83db12b82f32].
WorkflowTemplate [template-demo-3] RUNNING
Creating cluster: Operation ID [projects/dataproc-demo-224523/regions/us-east1/operations/896b7922-da8e-49a9-bd80-b1ac3fda5105].
Created cluster: three-node-cluster-j6q2al2mkkqck.
Job ID ibrd-pyspark-j6q2al2mkkqck RUNNING
Job ID ibrd-pyspark-j6q2al2mkkqck COMPLETED
Deleting cluster: Operation ID [projects/dataproc-demo-224523/regions/us-east1/operations/fe4a263e-7c6d-466e-a6e2-52292cbbdc9b].
WorkflowTemplate [template-demo-3] DONE
Deleted cluster: three-node-cluster-j6q2al2mkkqck.

0.98s user 0.40s system 0% cpu 4:19.42 total

Running the workflow-templates list command again should display a list of two workflow templates.

gcloud dataproc workflow-templates list --region $REGION
  
ID               JOBS  UPDATE_TIME               VERSION
template-demo-3  1     2018-12-15T17:04:39.064Z  2
template-demo-1  3     2018-12-15T16:32:06.508Z  5

Looking within the Google Cloud Storage bucket, we should now see four different folders, the results of the workflows.

screen-shot-2018-12-16-at-11.58.32-am.png

Job Results and Testing

To check on the status of a job, we use the dataproc jobs wait command. This returns the standard output (stdout) and standard error (stderr) for that specific job.

export SET_ID=ibrd-large-dataset-pyspark-cxzzhr2ro3i54
  
gcloud dataproc jobs wait $SET_ID \
  --project $PROJECT_ID \
  --region $REGION

The dataproc jobs wait command is frequently used for automated testing of jobs, often within a CI/CD pipeline. Assume part of the expected job output indicates success, such as a string, boolean, or numeric value. We could use any number of test frameworks or other methods to confirm the existence of that expected value or values. Below is a simple example of using the grep command to check for the existence of the expected line ‘ state: FINISHED’ in the standard output of the dataproc jobs wait command.

# Capture stdout from the wait command; stderr is discarded.
command=$(gcloud dataproc jobs wait $SET_ID \
  --project $PROJECT_ID \
  --region $REGION 2>/dev/null)

# Quote the variable so the here-string keeps its line breaks.
if grep -Fqx "  state: FINISHED" <<< "$command"; then
  echo "Job Success!"
else
  echo "Job Failure?"
fi

# single line alternative
if grep -Fqx "  state: FINISHED" <<< "$command"; then echo "Job Success!"; else echo "Job Failure?"; fi

Job Success!
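
In a CI/CD pipeline, it is usually more convenient for a check like this to return an exit code than to print a message. The sketch below generalizes the grep test to any number of expected lines; assert_lines is a hypothetical helper name, not something provided by gcloud.

```shell
# Hypothetical helper (an assumption, not from the original post).
# Reads text on stdin and succeeds only if every argument appears in it
# as an exact, whole line.
assert_lines() {
  output=$(cat)
  for expected in "$@"; do
    if ! printf '%s\n' "$output" | grep -Fqx -- "$expected"; then
      echo "Expected line not found: $expected" >&2
      return 1
    fi
  done
}
```

It would be used by piping the job details into it, for example gcloud dataproc jobs wait $SET_ID --project $PROJECT_ID --region $REGION | assert_lines "  state: FINISHED", letting the pipeline step fail when the line is absent.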

Individual Operations

To view individual workflow operations, use the operations list and operations describe commands. The operations list command will list all operations.

Notice the three distinct series of operations within each workflow, shown with the operations list command: WORKFLOW, CREATE, and DELETE. In the example below, I’ve separated the operations by workflow, for better clarity.

gcloud dataproc operations list --region $REGION

NAME                                  TIMESTAMP                 TYPE      STATE  ERROR  WARNINGS
fe4a263e-7c6d-466e-a6e2-52292cbbdc9b  2018-12-15T17:11:45.178Z  DELETE    DONE
896b7922-da8e-49a9-bd80-b1ac3fda5105  2018-12-15T17:08:38.322Z  CREATE    DONE
b3c5063f-e3cf-3833-b613-83db12b82f32  2018-12-15T17:08:37.497Z  WORKFLOW  DONE
---
be0e5293-275f-46ad-b1f4-696ba44c222e  2018-12-15T17:07:26.305Z  DELETE    DONE
6784078c-cbe3-4c1e-a56e-217149f555a4  2018-12-15T17:04:40.613Z  CREATE    DONE
fcd8039e-a260-3ab3-ad31-01abc1a524b4  2018-12-15T17:04:40.007Z  WORKFLOW  DONE
---
b4b23ca6-9442-4ffb-8aaf-460bac144dd8  2018-12-15T17:02:16.744Z  DELETE    DONE
89ef9c7c-f3c9-4d01-9091-61ed9e1f085d  2018-12-15T17:01:45.514Z  CREATE    DONE
243fa7c1-502d-3d7a-aaee-b372fe317570  2018-12-15T17:01:44.895Z  WORKFLOW  DONE
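
To pull out only the operations of one type from this listing, the tabular output can be filtered on its TYPE column. This is a minimal sketch that works on the text shown above; operations_of_type is a hypothetical helper name.

```shell
# Hypothetical helper (an assumption, not from the original post).
# Reads `gcloud dataproc operations list` output on stdin and prints the
# NAME column of every row whose TYPE column equals the first argument.
operations_of_type() {
  awk -v type="$1" '$3 == type { print $1 }'
}
```

For example, gcloud dataproc operations list --region $REGION | operations_of_type CREATE prints just the cluster creation operation IDs, which can then be passed to the operations describe command.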

We use the results of the operations list command to execute the operations describe command to describe a specific operation.

gcloud dataproc operations describe \
  projects/$PROJECT_ID/regions/$REGION/operations/896b7922-da8e-49a9-bd80-b1ac3fda5105

Each type of operation contains different details. Note the fine-grained detail we get from Dataproc using the operations describe command for a CREATE operation (gist).

Conclusion

In this brief follow-up to the previous post, Big Data Analytics with Java and Python, using Cloud Dataproc, Google’s Fully-Managed Spark and Hadoop Service, we have seen how easy the WorkflowTemplates API and YAML-based workflow templates make automating our analytics jobs. This post only scratched the surface of the complete functionality of the WorkflowTemplates API and parameterization of templates.

In a future post, we will leverage the automation capabilities of the Google Cloud Platform, the WorkflowTemplates API, YAML-based workflow templates, and parameterization, to develop a fully-automated DevOps for Big Data workflow, capable of running hundreds of Spark and Hadoop jobs.

All opinions expressed in this post are my own and not necessarily the views of my current or past employers or their clients.


Streaming Docker Logs to the Elastic Stack (ELK Stack) using Fluentd

Kibana

Introduction

Fluentd and Docker’s native logging driver for Fluentd make it easy to stream Docker logs from multiple running containers to the Elastic Stack. In this post, we will use Fluentd to stream Docker logs from multiple instances of a Dockerized Spring Boot RESTful service and MongoDB, to the Elastic Stack (ELK).

log_message_flow_notype

In a recent post, Distributed Service Configuration with Consul, Spring Cloud, and Docker, we built a Consul cluster using Docker swarm mode, to host distributed configurations for a Spring Boot service. We will use the resulting swarm cluster from the previous post as a foundation for this post.

Fluentd

According to the Fluentd website, Fluentd is an open source data collector that unifies data collection and consumption for better use and understanding of data. Fluentd combines all facets of processing log data: collecting, filtering, buffering, and outputting logs across multiple sources and destinations. Fluentd structures data as JSON as much as possible.

Logging Drivers

Docker includes multiple logging mechanisms to get logs from running containers and services. These mechanisms are called logging drivers. Fluentd is one of the ten current Docker logging drivers. According to Docker, the fluentd logging driver sends container logs to the Fluentd collector as structured log data. Users can then use any of Fluentd’s various output plugins to write these logs to various destinations.

Elastic Stack

The ELK Stack, now known as the Elastic Stack, is the combination of Elastic’s very popular products: Elasticsearch, Logstash, and Kibana. According to Elastic, the Elastic Stack provides real-time insights from almost any type of structured and unstructured data source.

Setup

All code for this post has been tested on both MacOS and Linux. For this post, I am provisioning and deploying to a Linux workstation, running the most recent release of Fedora and Oracle VirtualBox. If you want to use AWS or another infrastructure provider instead of VirtualBox to build your swarm, it is fairly easy to switch the Docker Machine driver and change a few configuration items in the vms_create.sh script (see Provisioning, below).

Required Software

If you want to follow along with this post, you will need the latest versions of git, Docker, Docker Machine, Docker Compose, and VirtualBox installed.

Source Code

All source code for this post is located in two GitHub repositories. The first repository contains scripts to provision the VMs, create an overlay network and persistent host-mounted volumes, build the Docker swarm, and deploy Consul, Registrator, Swarm Visualizer, Fluentd, and the Elastic Stack. The second repository contains scripts to deploy two instances of the Widget Spring Boot RESTful service and a single instance of MongoDB. You can execute all scripts manually, from the command-line, or from a CI/CD pipeline, using tools such as Jenkins.

Provisioning the Swarm

To start, clone the first repository, and execute the single run_all.sh script, or execute the seven individual scripts necessary to provision the VMs, create the overlay network and host volumes, build the swarm, and deploy Consul, Registrator, Swarm Visualizer, Fluentd, and the Elastic Stack. Follow the steps below to complete this part.

When the scripts have completed, the resulting swarm should be configured similarly to the diagram below. Consul, Registrator, Swarm Visualizer, Fluentd, and the Elastic Stack containers should be distributed across the three swarm manager nodes and the three swarm worker nodes (VirtualBox VMs).

swarm_fluentd_diagram

Deploying the Application

Next, clone the second repository, and execute the single run_all.sh script, or execute the four scripts necessary to deploy the Widget Spring Boot RESTful service and a single instance of MongoDB. Follow the steps below to complete this part.

When the scripts have completed, the Widget service and MongoDB containers should be distributed across two of the three swarm worker nodes (VirtualBox VMs).

swarm_fluentd_diagram_b

To confirm the final state of the swarm and the running container stacks, use the following Docker commands.

Open the Swarm Visualizer web UI, using any of the swarm manager node IPs, on port 5001, to confirm the swarm health, as well as the running containers’ locations.

Visualizer

Lastly, open the Consul Web UI, using any of the swarm manager node IPs, on port 5601, to confirm the running containers’ health, as well as their placement on the swarm nodes.

Consul_1

Streaming Logs

Elastic Stack

If you read the previous post, Distributed Service Configuration with Consul, Spring Cloud, and Docker, you will notice we deployed a few additional components this time. First, the Elastic Stack (aka ELK) is deployed to the worker3 swarm worker node, within a single container. I have increased the CPU count and RAM assigned to this VM to the minimum needed to run the Elastic Stack. If you review the docker-compose.yml file, you will note I am using Sébastien Pujadas’ sebp/elk:latest Docker base image from Docker Hub to provision the Elastic Stack. At the time of the post, this was based on the 5.3.0 version of ELK.

Docker Logging Driver

The Widget stack’s docker-compose.yml file has been modified since the last post. The compose file now incorporates a Fluentd logging configuration section for each service. The logging configuration includes the address of the Fluentd instance, on the same swarm worker node. The logging configuration also includes a tag for each log message.
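
The compose file itself is not reproduced here, but the same logging options can be illustrated with the Docker CLI. This is a rough sketch only: run_with_fluentd is a hypothetical wrapper, and the tag value is an assumed example rather than the one used in the Widget stack. The --log-driver flag and the fluentd-address and tag log options are standard Docker options.

```shell
# Hypothetical wrapper (an assumption, not from the original post).
# Starts a container whose logs are sent to a Fluentd instance listening
# on the same host. The tag value below is an assumed example.
run_with_fluentd() {
  docker run -d \
    --log-driver fluentd \
    --log-opt fluentd-address=localhost:24224 \
    --log-opt tag="docker.{{.Name}}" \
    "$@"
}
```

Any arguments are passed through to docker run, so run_with_fluentd my-image:latest starts that image with the Fluentd logging configuration applied.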

Fluentd

In addition to the Elastic Stack, we have deployed Fluentd to the worker1 and worker2 swarm nodes. This is also where the Widget and MongoDB containers are deployed. Again, looking at the docker-compose.yml file, you will note we are using a custom Fluentd Docker image, garystafford/custom-fluentd:latest, which I created. The custom image is available on Docker Hub.

The custom Fluentd Docker image is based on Fluentd’s official onbuild Docker image, fluent/fluentd:onbuild. Fluentd provides instructions for building your own custom images, from their onbuild base images.

There were two reasons I chose to create a custom Fluentd Docker image. First, I added Uken Games’ Fluentd Elasticsearch Plugin to the Docker image. This highly configurable Fluentd Output Plugin allows us to push Docker logs, processed by Fluentd, to Elasticsearch. Adding additional plugins is a common reason for creating a custom Fluentd Docker image.

The second reason to create a custom Fluentd Docker image was configuration. Instead of bind-mounting host directories or volumes to the multiple Fluentd containers, to provide Fluentd’s configuration, I baked the configuration file into the immutable Docker image. The bare-bones, basic Fluentd configuration file defines three processes: Input, Filter, and Output. These processes are accomplished using Fluentd plugins. Fluentd has six types of plugins: Input, Parser, Filter, Output, Formatter, and Buffer. Fluentd is primarily written in Ruby, and its plugins are Ruby gems.

Fluentd listens for input on tcp port 24224, using the forward Input Plugin. Docker logs are streamed locally on each swarm node, from the Widget and MongoDB containers to the local Fluentd container, over tcp port 24224, using Docker’s fluentd logging driver, introduced earlier.

Fluentd then filters all input using the stdout Filter Plugin. This plugin prints events to stdout, or to the logs if Fluentd is launched in daemon mode. This is the most basic method of filtering.

Lastly, Fluentd outputs the filtered input to two destinations, a local log file and Elasticsearch. First, the Docker logs are sent to a local Fluentd log file. This is only for demonstration purposes and debugging. Outputting log files is not recommended for production, nor does it meet the 12-factor application recommendations for logging. Second, Fluentd outputs the Docker logs to Elasticsearch, over tcp port 9200, using the Fluentd Elasticsearch Plugin, introduced above.

log_message_flow

Additional Metadata

In addition to the log message itself, in JSON format, the fluentd log driver sends the following metadata in the structured log message: container_id, container_name, and source. This is helpful in identifying and categorizing log messages from multiple sources. Below is a sample of log messages from the raw Fluentd log file, with the metadata tags highlighted in yellow. At the bottom of the output is a log message parsed with jq, for better readability.

fluentd_logs

Using Elastic Stack

Now that our two Docker stacks are up and running on our swarm, we should be streaming logs to Elasticsearch. To confirm this, open the Kibana web console, which should be available at the IP address of the worker3 swarm worker node, on port 5601.

Kibana

For the sake of this demonstration, I increased the verbosity of the Spring Boot Widget service’s log level, from INFO to DEBUG, in Consul. At this level of logging, the two Widget services and the single MongoDB instance were generating an average of 250-400 log messages every 30 seconds, according to Kibana.

If that seems like a lot, keep in mind, these are Docker logs, which are single-line log entries. We have not aggregated multi-line messages, such as Java exceptions and stack traces messages, into single entries. That is for another post. Also, the volume of debug-level log messages generated by the communications between the individual services and Consul is fairly verbose.

Kibana_3

Inspecting log entries in Kibana, we find the metadata tags contained in the raw Fluentd log output are now searchable fields: container_id, container_name, and source, as well as log. Also, note the _type field, with a value of ‘fluentd’. We injected this field in the output section of our Fluentd configuration, using the Fluentd Elasticsearch Plugin. The _type field allows us to differentiate these log entries from other potential data sources.

Kibana_2.png

References

All opinions in this post are my own and not necessarily the views of my current employer or their clients.


Using Weave to Network a Docker Multi-Container Java Application

Use the latest version of Weaveworks’ Weave Net to network a multi-container, Dockerized Java Spring web application.

Introduction Weave Image

Introduction

The last post demonstrated how to build and deploy the Java Spring Music application to a VirtualBox, multi-container test environment. The environment contained (1) NGINX container, (2) load-balanced Tomcat containers, (1) MongoDB container, (1) ELK Stack container, and (1) Logspout container, all on one VM.

Spring Music

In that post, we used Docker’s links option. The links option, which modifies the container’s /etc/hosts file, allows two Docker containers to communicate with each other. For example, the NGINX container is linked to both Tomcat containers:

proxy:
  build: nginx/
  ports: "80:80"
  links:
   - app01
   - app02

Although container linking works, links are not very practical beyond a small number of static containers or a single container host. With linking, you must explicitly define each service-to-container relationship you want Docker to configure. Linking also cannot connect containers across multiple virtual machine container hosts with Docker Swarm. With Docker Networking in its early ‘experimental’ stages and the Swarm limitation, it is hard to foresee linking being used beyond limited development and test environments.

Weave Net

Weave Net, aka Weave, is one of a trio of products developed by Weaveworks. The other two members of the trio are Weave Run and Weave Scope. According to Weaveworks’ website, ‘Weave Net connects all your containers into a transparent, dynamic and resilient mesh. This is one of the easiest ways to set up clustered applications that run anywhere.’ Weave allows us to eliminate the dependency on the links option to connect our containers. Weave does all the linking of containers for us automatically.

Weave v1.1.0

If you worked with previous editions of Weave, you will appreciate that Weave versions v1.0.x and v1.1.0 are significant steps forward in the evolution of Weave. Weaveworks’ GitHub Weave Release page details the many improvements. I also suggest reading Weave ‘Gossip’ DNS, on Weaveworks’ blog, before continuing. The post details the improvements of Weave v1.1.0. Some of those key new features include:

  • Completely redesigned weaveDNS, dubbed ‘Gossip DNS’
  • Registrations are broadcast to all weaveDNS instances
  • Registered entries are stored in-memory and handle lookups locally
  • Weave router’s gossip implementation periodically synchronizes DNS mappings between peers
  • Ability to recover from network partitions and other transient failures
  • Each peer is aware of the hostnames and IP addresses of all containers in the Weave network
  • weave launch now launches all weave components, including the router, weaveDNS and the proxy, greatly simplifying setup
  • weaveDNS is now embedded in the Weave router

Weave-based Network

In this post, we will reuse the Java Spring Music application from the last post. However, we will replace the project’s static dependencies on Docker links with Weave. This post will demonstrate the most basic features of Weave, using a single cluster. In a future post, we will demonstrate how easily Weave also integrates with multiple clusters.

All files for this post can be found in the swarm-weave branch of the GitHub Repository. Instructions to clone are below.

Configuration

If you recall from the previous post, the Docker Compose YAML file (docker-compose.yml) looked similar to this:

proxy:
  build: nginx/
  ports: "80:80"
  links:
   - app01
   - app02
  hostname: "proxy"

app01:
  build: tomcat/
  expose: "8080"
  ports: "8180:8080"
  links:
   - nosqldb
   - elk
  hostname: "app01"

app02:
  build: tomcat/
  expose: "8080"
  ports: "8280:8080"
  links:
   - nosqldb
   - elk
  hostname: "app02"

nosqldb:
  build: mongo/
  hostname: "nosqldb"
  volumes: "/opt/mongodb:/data/db"

elk:
  build: elk/
  ports:
   - "8081:80"
   - "8082:9200"
  expose: "5000/udp"

logspout:
  build: logspout/
  volumes: "/var/run/docker.sock:/tmp/docker.sock"
  links: elk
  ports: "8083:80"
  environment: ROUTE_URIS=logstash://elk:5000

Implementing Weave simplifies the docker-compose.yml considerably. Below is the new Weave version of the docker-compose.yml. The links option has been removed from all containers. Additionally, the hostnames have been removed, as they serve no real purpose moving forward. The logspout service’s environment option has been modified to use the elk container’s full name as opposed to the hostname.

The only addition is the volumes_from option to the proxy service. We must ensure that the two Tomcat containers start before the NGINX container. The links option indirectly provided this functionality, previously.

proxy:
  build: nginx/
  ports:
   - "80:80"
  volumes_from:
   - app01
   - app02

app01:
  build: tomcat/
  expose:
   - "8080"
  ports:
   - "8180:8080"

app02:
  build: tomcat/
  expose:
   - "8080"
  ports:
   - "8280:8080"

nosqldb:
  build: mongo/
  volumes:
   - "/opt/mongodb:/data/db"

elk:
  build: elk/
  ports:
   - "8081:80"
   - "8082:9200"
  expose:
   - "5000/udp"

logspout:
  build: logspout/
  volumes:
   - "/var/run/docker.sock:/tmp/docker.sock"
  ports:
   - "8083:80"
  environment:
    - ROUTE_URIS=logstash://music_elk_1:5000

Next, we need to modify the NGINX configuration slightly. In the previous post, we referenced the Tomcat service names, as shown below.

upstream backend {
  server app01:8080;
  server app02:8080;
}

Weave will automatically add the two Tomcat container names to the NGINX container’s /etc/hosts file. We will add these Tomcat container names to NGINX’s configuration file.

upstream backend {
  server music_app01_1:8080;
  server music_app02_1:8080;
}

In an actual production environment, we would use a template, along with a service discovery tool, such as Consul, to automatically populate the container names, as containers are dynamically created or destroyed.
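As a rough illustration of the templating idea, the sketch below generates the upstream block from a list of service names. The render_upstream function is hypothetical and not part of the project. It assumes the <project>_<service>_1 container naming shown above and the Tomcat port 8080. A real setup would more likely feed the names from a service discovery tool such as Consul.

```shell
#!/bin/sh

# Hypothetical helper (not part of the project): print an NGINX upstream
# block for a Compose project name followed by one or more service names.
# Assumes containers are named <project>_<service>_1 and listen on 8080.
render_upstream() {
  project="$1"
  shift
  printf 'upstream backend {\n'
  for service in "$@"; do
    printf '  server %s_%s_1:8080;\n' "$project" "$service"
  done
  printf '}\n'
}

# Example: reproduce the upstream block used in this post
render_upstream music app01 app02
```

Redirecting the output to a file included by default.conf would keep the generated names separate from the hand-written configuration.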

Installing and Running Weave

After cloning this post’s GitHub repository, I recommend first installing and configuring Weave. Next, build the container host VM using Docker Machine. Lastly, build the containers using Docker Compose. The build_project.sh script below will take care of all the necessary steps.

#!/bin/sh

########################################################################
#
# title:          Build Complete Project
# author:         Gary A. Stafford (https://programmaticponderings.com)
# url:            https://github.com/garystafford/spring-music-docker
# description:    Clone and build complete Spring Music Docker project
#
# to run:         sh ./build_project.sh
#
########################################################################

# install latest weave
curl -L git.io/weave -o /usr/local/bin/weave && 
chmod a+x /usr/local/bin/weave && 
weave version

# clone project
git clone --single-branch --branch swarm-weave \
  https://github.com/garystafford/spring-music-docker.git &&
cd spring-music-docker

# build VM
docker-machine create --driver virtualbox springmusic --debug

# create directory to store mongo data on host
docker-machine ssh springmusic mkdir /opt/mongodb

# set new environment
docker-machine env springmusic && 
eval "$(docker-machine env springmusic)"

# launch weave and weaveproxy/weaveDNS containers
weave launch &&
tlsargs=$(docker-machine ssh springmusic \
  "cat /proc/\$(pgrep /usr/local/bin/docker)/cmdline | tr '\0' '\n' | grep ^--tls | tr '\n' ' '")
weave launch-proxy $tlsargs &&
eval "$(weave env)" &&

# test/confirm weave status
weave status &&
docker logs weaveproxy

# pull and build images and containers
# this step will take several minutes to pull images first time
docker-compose -f docker-compose.yml -p music up -d

# wait for container apps to fully start
sleep 15

# test weave (should list entries for all containers)
docker exec -it music_proxy_1 cat /etc/hosts 

# run quick test of Spring Music application
for i in $(seq 1 10)
do
  curl -I --url "$(docker-machine ip springmusic)"
done
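
The fixed sleep 15 in the script above is a guess at how long the applications take to start. One possible alternative, sketched here under the assumption that a successful HTTP response from NGINX means the stack is ready, is to poll until a command succeeds. The wait_for function is a hypothetical helper, not part of the project’s build_project.sh.

```shell
#!/bin/sh

# Hypothetical helper (not part of the project): run a command repeatedly
# until it succeeds or the attempt limit is reached.
# Usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    # Stop as soon as the command succeeds
    "$@" >/dev/null 2>&1 && return 0
    n=$((n + 1))
    sleep "$delay"
  done
  # The command never succeeded
  return 1
}

# Possible use in place of the fixed sleep (up to 30 tries, 2 seconds apart):
#   wait_for 30 2 curl -sf "http://$(docker-machine ip springmusic)"
```

Polling returns as soon as the application responds, and it fails with a non-zero status if the application never comes up, which a fixed sleep cannot detect.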

As one last test, to ensure that MongoDB is using the host’s volume and not storing data in the MongoDB container’s /data/db directory, execute the following command: docker-machine ssh springmusic ls -Alh /opt/mongodb. You should see MongoDB-related content stored there.

Testing Weave

Running the weave status command, we should observe that Weave returned a status similar to the example below:

gstafford@gstafford-X555LA:$ weave status

       Version: v1.1.0

       Service: router
      Protocol: weave 1..2
          Name: 6a:69:11:1b:b4:e3(springmusic)
    Encryption: disabled
 PeerDiscovery: enabled
       Targets: 0
   Connections: 0
         Peers: 1

       Service: ipam
     Consensus: achieved
         Range: [10.32.0.0-10.48.0.0)
 DefaultSubnet: 10.32.0.0/12

       Service: dns
        Domain: weave.local.
           TTL: 1
       Entries: 2

       Service: proxy
       Address: tcp://192.168.99.100:12375

Running the docker exec -it music_proxy_1 cat /etc/hosts command, we should observe that WeaveDNS has automatically added entries for all containers to the music_proxy_1 container’s /etc/hosts file. WeaveDNS will also remove the addresses of any containers that die. This offers a simple way to implement redundancy.

gstafford@gstafford-X555LA:$ docker exec -it music_proxy_1 cat /etc/hosts

# modified by weave
10.32.0.6       music_proxy_1
127.0.0.1       localhost

172.17.0.131    weave weave.bridge
172.17.0.133    music_elk_1 music_elk_1.bridge
172.17.0.134    music_nosqldb_1 music_nosqldb_1.bridge
172.17.0.138    music_app02_1 music_app02_1.bridge
172.17.0.139    music_logspout_1 music_logspout_1.bridge
172.17.0.140    music_app01_1 music_app01_1.bridge

::1             ip6-localhost ip6-loopback localhost
fe00::0         ip6-localnet
ff00::0         ip6-mcastprefix
ff02::1         ip6-allnodes
ff02::2         ip6-allrouters

Weave resolves each container’s name to its eth0 IP address, assigned by Docker’s docker0 Ethernet bridge. Each container can now communicate with all other containers in the cluster.
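
Checking the hosts file by eye works for six containers, but the check can also be scripted. The sketch below is a hypothetical helper, not part of the project: it reads hosts-file content on standard input and prints any expected container name that has no entry.

```shell
#!/bin/sh

# Hypothetical helper (not part of the project): read hosts-file content
# on stdin and print each expected name (given as arguments) that does
# not appear as a hostname or alias on any non-comment line.
missing_hosts() {
  hosts="$(cat)"
  for name in "$@"; do
    printf '%s\n' "$hosts" |
      awk -v name="$name" '
        /^[ \t]*#/ { next }                      # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }                      # status 0 only if found
      ' || printf '%s\n' "$name"
  done
}

# Possible use against the proxy container (no -t flag when piping):
#   docker exec music_proxy_1 cat /etc/hosts |
#     missing_hosts music_app01_1 music_app02_1 music_nosqldb_1 music_elk_1
```

No output means every expected container has an entry.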

Weave eth0 Network

Results

Resulting virtual machines, network, images, and containers:

gstafford@gstafford-X555LA:$ docker-machine ls
NAME            ACTIVE   DRIVER       STATE     URL                         SWARM
springmusic     *        virtualbox   Running   tcp://192.168.99.100:2376   


gstafford@gstafford-X555LA:$ docker images
REPOSITORY             TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
music_app02            latest              632c782010ac        3 days ago          370.4 MB
music_app01            latest              632c782010ac        3 days ago          370.4 MB
music_proxy            latest              171624a31920        3 days ago          144.5 MB
music_nosqldb          latest              2b3b46af5ef3        3 days ago          260.8 MB
music_elk              latest              5c18dae84b26        3 days ago          1.05 GB
weaveworks/weaveexec   v1.1.0              69c6bfa7934f        5 days ago          58.18 MB
weaveworks/weave       v1.1.0              5dccf0533147        5 days ago          17.53 MB
music_logspout         latest              fe64597ab0c4        8 days ago          24.36 MB
gliderlabs/logspout    master              40a52d6ca462        9 days ago          14.75 MB
willdurand/elk         latest              04cd7334eb5d        2 weeks ago         1.05 GB
tomcat                 latest              6fe1972e6b08        2 weeks ago         347.7 MB
mongo                  latest              5c9464760d54        2 weeks ago         260.8 MB
nginx                  latest              cd3cf76a61ee        2 weeks ago         132.9 MB


gstafford@gstafford-X555LA:$ weave ps
weave:expose 6a:69:11:1b:b4:e3
2bce66e3b33b fa:07:7e:85:37:1b 10.32.0.5/12
604dbbc4473f 6a:73:8d:54:cc:fe 10.32.0.4/12
ea64b42cf5a1 c2:69:73:84:67:69 10.32.0.3/12
85b1e8a9b8d0 aa:f7:12:cd:b7:13 10.32.0.6/12
81041fc97d1f 2e:1e:82:67:89:5d 10.32.0.2/12
e80c04bdbfaf 1e:95:a5:b2:9d:30 10.32.0.1/12
18c22e7f1c33 7e:43:54:db:8d:b8


gstafford@gstafford-X555LA:$ docker ps -a
CONTAINER ID        IMAGE                         COMMAND                  CREATED             STATUS              PORTS                                                                                            NAMES
2bce66e3b33b        music_app01                   "/w/w catalina.sh run"   3 days ago          Up 3 days           0.0.0.0:8180->8080/tcp                                                                           music_app01_1
604dbbc4473f        music_logspout                "/w/w /bin/logspout"     3 days ago          Up 3 days           8000/tcp, 0.0.0.0:8083->80/tcp                                                                   music_logspout_1
ea64b42cf5a1        music_app02                   "/w/w catalina.sh run"   3 days ago          Up 3 days           0.0.0.0:8280->8080/tcp                                                                           music_app02_1
85b1e8a9b8d0        music_proxy                   "/w/w nginx -g 'daemo"   3 days ago          Up 3 days           0.0.0.0:80->80/tcp, 443/tcp                                                                      music_proxy_1
81041fc97d1f        music_nosqldb                 "/w/w /entrypoint.sh "   3 days ago          Up 3 days           27017/tcp                                                                                        music_nosqldb_1
e80c04bdbfaf        music_elk                     "/w/w /usr/bin/superv"   3 days ago          Up 3 days           5000/0, 0.0.0.0:8081->80/tcp, 0.0.0.0:8082->9200/tcp                                             music_elk_1
8eafc6225fc1        weaveworks/weaveexec:v1.1.0   "/home/weave/weavepro"   3 days ago          Up 3 days                                                                                                            weaveproxy
18c22e7f1c33        weaveworks/weave:v1.1.0       "/home/weave/weaver -"   3 days ago          Up 3 days           172.17.42.1:53->53/udp, 0.0.0.0:6783->6783/tcp, 0.0.0.0:6783->6783/udp, 172.17.42.1:53->53/tcp   weave

Spring Music Application Links

Assuming the springmusic VM is running at 192.168.99.100, these are the accessible URLs for each of the environment’s major components:

* The Tomcat user name is admin and the password is t0mcat53rv3r.

Helpful Links


Build and Deploy a Java-Spring-MongoDB Application using Docker

Build a multi-container, MongoDB-backed, Java Spring web application, and deploy to a test environment using Docker.

Spring Music Diagram

Introduction
Application Architecture
Spring Music Environment
Building the Environment
Spring Music Application Links
Helpful Links

Introduction

In this post, we will demonstrate how to build, deploy, and host a multi-tier Java application using Docker. For the demonstration, we will use a sample Java Spring application, available on GitHub from Cloud Foundry. Cloud Foundry’s Spring Music sample record album collection application was originally designed to demonstrate the use of database services on Cloud Foundry and Spring Framework. Instead of Cloud Foundry, we will host the Spring Music application using Docker with VirtualBox and optionally, AWS.

All files required to build this post’s demonstration are located in the master branch of this GitHub repository. Instructions to clone the repository are below. The Java Spring Music application’s source code, used in this post’s demonstration, is located in the master branch of this GitHub repository.

Spring Music

A few changes were necessary to the original Spring Music application to make it work for this demonstration. At a high level, the changes included:

  • Modify MongoDB configuration class to work with non-local MongoDB instances
  • Add Gradle warNoStatic task to build WAR file without the static assets, which will be hosted separately in NGINX
  • Create new Gradle task, zipStatic, to ZIP up the application’s static assets for deployment to NGINX
  • Add versioning scheme for build artifacts
  • Add context.xml file and MANIFEST.MF file to the WAR file
  • Add log4j syslog appender to send log entries to Logstash
  • Update versions of several dependencies, including Gradle to 2.6

Application Architecture

The Java Spring Music application stack contains the following technologies:

The Spring Music web application’s static content will be hosted by NGINX for increased performance. The application’s WAR file will be hosted by Apache Tomcat. Requests for non-static content will be proxied through a single instance of NGINX on the front-end, to one of two load-balanced Tomcat instances on the back-end. NGINX will also be configured to allow for browser caching of the static content, to further increase application performance. Reverse proxying and caching are configured through NGINX’s default.conf file’s server configuration section:

server {
  listen        80;
  server_name   localhost;

  location ~* \/assets\/(css|images|js|template)\/* {
    root          /usr/share/nginx/;
    expires       max;
    add_header    Pragma public;
    add_header    Cache-Control "public, must-revalidate, proxy-revalidate";
    add_header    Vary Accept-Encoding;
    access_log    off;
  }

The two Tomcat instances will be configured on NGINX, in a load-balancing pool, using NGINX’s default round-robin load-balancing algorithm. This is configured through NGINX’s default.conf file’s upstream configuration section:

upstream backend {
  server app01:8080;
  server app02:8080;
}
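
To make the round-robin behavior concrete, here is a small sketch of the idea: request number N goes to server N modulo the number of servers. This is only an illustration of the concept with equally weighted servers, not NGINX’s actual implementation, and pick_backend is a hypothetical name.

```shell
#!/bin/sh

# Illustration only (not NGINX's implementation): choose a backend for
# request number N, counting from 0, by cycling through the servers in
# the order they are listed.
# Usage: pick_backend N SERVER [SERVER...]
pick_backend() {
  request="$1"
  shift
  # $# is now the number of servers; skip past "offset" of them
  offset=$((request % $#))
  shift "$offset"
  printf '%s\n' "$1"
}

# Four consecutive requests alternate between the two Tomcat instances
for i in 0 1 2 3; do
  pick_backend "$i" app01:8080 app02:8080
done
```

With two servers, even-numbered requests land on app01 and odd-numbered requests on app02, which matches the alternating upstream addresses we expect to see when testing.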

The Spring Music application can be run with MySQL, Postgres, Oracle, MongoDB, Redis, or H2, an in-memory Java SQL database. Given the choice of both SQL and NoSQL databases available for use with the Spring Music application, we will select MongoDB.

The Spring Music application, hosted by Tomcat, will store and modify record album data in a single instance of MongoDB. MongoDB will be populated with a collection of album data when the Spring Music application first creates the MongoDB database instance.

Lastly, the ELK Stack with Logspout will aggregate both Docker and Java Log4j log entries, providing debugging and analytics to our demonstration. I’ve used the same method for Docker and Java Log4j log entries, as detailed in this previous post.

Kibana Spring Music

Spring Music Environment

To build, deploy, and host the Java Spring Music application, we will use the following technologies:

All files necessary to build this project are stored in the garystafford/spring-music-docker repository on GitHub. The Spring Music source code and build artifacts are stored in a separate garystafford/spring-music repository, also on GitHub.

Build artifacts are automatically built by Travis CI when changes are checked into the garystafford/spring-music repository on GitHub. Travis CI then overwrites the build artifacts back to a build artifact branch of that same project. The build artifact branch acts as a pseudo binary repository for the project. The .travis.yml file, build.gradle file, and deploy.sh script handle these functions.

.travis.yml file:

language: java
jdk: oraclejdk7
before_install:
- chmod +x gradlew
before_deploy:
- chmod ugo+x deploy.sh
script:
- bash ./gradlew clean warNoStatic warCopy zipGetVersion zipStatic
- bash ./deploy.sh
env:
  global:
  - GH_REF: github.com/garystafford/spring-music.git
  - secure: <secure hash here>

build.gradle file snippet:

// new Gradle build tasks

task warNoStatic(type: War) {
  // omit the version from the war file name
  version = ''
  exclude '**/assets/**'
  manifest {
    attributes(
      'Manifest-Version': '1.0',
      'Created-By': currentJvm,
      'Gradle-Version': GradleVersion.current().getVersion(),
      'Implementation-Title': archivesBaseName + '.war',
      'Implementation-Version': artifact_version,
      'Implementation-Vendor': 'Gary A. Stafford'
    )
  }
}

task warCopy(type: Copy) {
  from 'build/libs'
  into 'build/distributions'
  include '**/*.war'
}

task zipGetVersion (type: Task) {
  ext.versionfile = 
    new File("${projectDir}/src/main/webapp/assets/buildinfo.properties")
  versionfile.text = 'build.version=' + artifact_version
}

task zipStatic(type: Zip) {
  from 'src/main/webapp/assets'
  appendix = 'static'
  version = ''
}

deploy.sh file:

#!/bin/bash

# reference: https://gist.github.com/domenic/ec8b0fc8ab45f39403dd

set -e # exit with nonzero exit code if anything fails

# go to the distributions directory and create a *new* Git repo
cd build/distributions && git init

# inside this git repo we'll pretend to be a new user
git config user.name "travis-ci"
git config user.email "auto-deploy@travis-ci.com"

# The first and only commit to this new Git repo contains all the
# files present with the commit message.
git add .
git commit -m "Deploy Travis CI build #${TRAVIS_BUILD_NUMBER} artifacts to GitHub"

# Force push from the current repo's master branch to the remote
# repo's build-artifacts branch. (All previous history on the
# build-artifacts branch will be lost, since we are overwriting it.)
# We redirect any output to /dev/null to hide any sensitive credential
# data that might otherwise be exposed. The environment variables are
# pre-configured on Travis CI.
git push --force --quiet "https://${GH_TOKEN}@${GH_REF}" master:build-artifacts > /dev/null 2>&1

Base Docker images, such as NGINX, Tomcat, and MongoDB, used to build the project’s images and subsequently the containers, are all pulled from Docker Hub.

The NGINX and Tomcat Dockerfiles pull the latest build artifacts down to build the project-specific versions of the NGINX and Tomcat Docker images used for this project. For example, the NGINX Dockerfile looks like:

# NGINX image with build artifact

FROM nginx:latest

MAINTAINER Gary A. Stafford <garystafford@rochester.rr.com>

ENV REFRESHED_AT 2015-09-20
ENV GITHUB_REPO https://github.com/garystafford/spring-music/raw/build-artifacts
ENV STATIC_FILE spring-music-static.zip

RUN apt-get update -y && \
  apt-get install wget unzip nano -y && \
  wget -O /tmp/${STATIC_FILE} ${GITHUB_REPO}/${STATIC_FILE} && \
  unzip /tmp/${STATIC_FILE} -d /usr/share/nginx/assets/

COPY default.conf /etc/nginx/conf.d/default.conf

Docker Machine builds a single VirtualBox VM. After building the VM, Docker Compose then builds and deploys (1) NGINX container, (2) load-balanced Tomcat containers, (1) MongoDB container, (1) ELK container, and (1) Logspout container, onto the VM. Docker Machine’s VirtualBox driver provides a basic solution that can be run locally for testing and development. The docker-compose.yml for the project is as follows:

proxy:
  build: nginx/
  ports: "80:80"
  links:
   - app01
   - app02
  hostname: "proxy"

app01:
  build: tomcat/
  expose: "8080"
  ports: "8180:8080"
  links:
   - nosqldb
   - elk
  hostname: "app01"

app02:
  build: tomcat/
  expose: "8080"
  ports: "8280:8080"
  links:
   - nosqldb
   - elk
  hostname: "app02"

nosqldb:
  build: mongo/
  hostname: "nosqldb"
  volumes: "/opt/mongodb:/data/db"

elk:
  build: elk/
  ports:
   - "8081:80"
   - "8082:9200"
  expose: "5000/udp"

logspout:
  build: logspout/
  volumes: "/var/run/docker.sock:/tmp/docker.sock"
  links: elk
  ports: "8083:80"
  environment: ROUTE_URIS=logstash://elk:5000

Building the Environment

Before continuing, ensure you have nothing running on ports 80, 8080, 8081, 8082, and 8083. Also, make sure VirtualBox, Docker, Docker Compose, Docker Machine, cURL, and git are all pre-installed and running.

docker --version && 
docker-compose --version && 
docker-machine --version && 
echo "VirtualBox $(vboxmanage --version)" && 
curl --version && git --version
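
The version commands above stop at the first missing tool. As an alternative, the sketch below reports every missing prerequisite in one pass. The missing_tools function is a hypothetical helper, not part of the project, and it only checks that each command is on the PATH, not that its version is suitable.

```shell
#!/bin/sh

# Hypothetical helper (not part of the project): print a line for every
# required command that is not found on the PATH. Returns non-zero if
# at least one command is missing.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf 'missing: %s\n' "$tool"
      status=1
    fi
  done
  return "$status"
}

# Possible use before building the environment:
#   missing_tools docker docker-compose docker-machine vboxmanage curl git
```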

All of the commands below may be executed with the following single command (sh ./build_project.sh). This is useful for working with Jenkins CI, ThoughtWorks Go, or similar CI tools. However, I suggest building the project step-by-step, as shown below, to better understand the process.

# clone project
git clone -b master \
  --single-branch https://github.com/garystafford/spring-music-docker.git &&
cd spring-music-docker

# build VM
docker-machine create --driver virtualbox springmusic --debug

# create directory to store mongo data on host
docker-machine ssh springmusic mkdir /opt/mongodb

# set new environment
docker-machine env springmusic && 
eval "$(docker-machine env springmusic)"

# build images and containers
docker-compose -f docker-compose.yml -p music up -d

# wait for container apps to start
sleep 15

# run quick test of project
for i in {1..10}
do
  curl -I --url $(docker-machine ip springmusic)
done

By simply changing the driver to AWS EC2 and providing your AWS credentials, the same environment can be built on AWS within a single EC2 instance. The ‘springmusic’ environment has been fully tested both locally with VirtualBox, as well as on AWS.

Results
Resulting Docker images and containers:

gstafford@gstafford-X555LA:$ docker images
REPOSITORY            TAG                 IMAGE ID            CREATED              VIRTUAL SIZE
music_proxy           latest              46af4c1ffee0        52 seconds ago       144.5 MB
music_logspout        latest              fe64597ab0c4        About a minute ago   24.36 MB
music_app02           latest              d935211139f6        2 minutes ago        370.1 MB
music_app01           latest              d935211139f6        2 minutes ago        370.1 MB
music_elk             latest              b03731595114        2 minutes ago        1.05 GB
gliderlabs/logspout   master              40a52d6ca462        14 hours ago         14.75 MB
willdurand/elk        latest              04cd7334eb5d        9 days ago           1.05 GB
tomcat                latest              6fe1972e6b08        10 days ago          347.7 MB
mongo                 latest              5c9464760d54        10 days ago          260.8 MB
nginx                 latest              cd3cf76a61ee        10 days ago          132.9 MB

gstafford@gstafford-X555LA:$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED              STATUS              PORTS                                                  NAMES
facb6eddfb96        music_proxy         "nginx -g 'daemon off"   46 seconds ago       Up 46 seconds       0.0.0.0:80->80/tcp, 443/tcp                            music_proxy_1
abf9bb0821e8        music_app01         "catalina.sh run"        About a minute ago   Up About a minute   0.0.0.0:8180->8080/tcp                                 music_app01_1
e4c43ed84bed        music_logspout      "/bin/logspout"          About a minute ago   Up About a minute   8000/tcp, 0.0.0.0:8083->80/tcp                         music_logspout_1
eca9a3cec52f        music_app02         "catalina.sh run"        2 minutes ago        Up 2 minutes        0.0.0.0:8280->8080/tcp                                 music_app02_1
b7a7fd54575f        mongo:latest        "/entrypoint.sh mongo"   2 minutes ago        Up 2 minutes        27017/tcp                                              music_nosqldb_1
cbfe43800f3e        music_elk           "/usr/bin/supervisord"   2 minutes ago        Up 2 minutes        5000/0, 0.0.0.0:8081->80/tcp, 0.0.0.0:8082->9200/tcp   music_elk_1

Partial result of the curl test, calling NGINX. Note the two different upstream addresses for Tomcat. Also, note the sharp decrease in request times, due to caching.

HTTP/1.1 200 OK
Server: nginx/1.9.4
Date: Mon, 07 Sep 2015 17:56:11 GMT
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 2090
Connection: keep-alive
Accept-Ranges: bytes
ETag: W/"2090-1441648256000"
Last-Modified: Mon, 07 Sep 2015 17:50:56 GMT
Content-Language: en
Request-Time: 0.521
Upstream-Address: 172.17.0.121:8080
Upstream-Response-Time: 1441648570.774

HTTP/1.1 200 OK
Server: nginx/1.9.4
Date: Mon, 07 Sep 2015 17:56:11 GMT
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 2090
Connection: keep-alive
Accept-Ranges: bytes
ETag: W/"2090-1441648256000"
Last-Modified: Mon, 07 Sep 2015 17:50:56 GMT
Content-Language: en
Request-Time: 0.326
Upstream-Address: 172.17.0.123:8080
Upstream-Response-Time: 1441648571.506

HTTP/1.1 200 OK
Server: nginx/1.9.4
Date: Mon, 07 Sep 2015 17:56:12 GMT
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 2090
Connection: keep-alive
Accept-Ranges: bytes
ETag: W/"2090-1441648256000"
Last-Modified: Mon, 07 Sep 2015 17:50:56 GMT
Content-Language: en
Request-Time: 0.006
Upstream-Address: 172.17.0.121:8080
Upstream-Response-Time: 1441648572.050

HTTP/1.1 200 OK
Server: nginx/1.9.4
Date: Mon, 07 Sep 2015 17:56:12 GMT
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 2090
Connection: keep-alive
Accept-Ranges: bytes
ETag: W/"2090-1441648256000"
Last-Modified: Mon, 07 Sep 2015 17:50:56 GMT
Content-Language: en
Request-Time: 0.006
Upstream-Address: 172.17.0.123:8080
Upstream-Response-Time: 1441648572.266
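
To compare request times across responses without reading each header block, the output can be condensed to one line per response. The sketch below assumes the Request-Time and Upstream-Address headers appear as shown above, with a blank line separating responses. The summarize_timings name is hypothetical and not part of the project.

```shell
#!/bin/sh

# Hypothetical helper (not part of the project): read HTTP response
# headers on stdin and print "<upstream address> <request time>" once
# per response. Responses are assumed to be separated by blank lines.
summarize_timings() {
  # HTTP headers end in CRLF, so strip carriage returns first
  tr -d '\r' | awk '
    /^Request-Time:/     { rt = $2 }
    /^Upstream-Address:/ { addr = $2 }
    /^$/ && addr != ""   { print addr, rt; addr = ""; rt = "" }
    END                  { if (addr != "") print addr, rt }
  '
}

# Example using values from the first two responses above
printf 'HTTP/1.1 200 OK\r\nRequest-Time: 0.521\r\nUpstream-Address: 172.17.0.121:8080\r\n\r\nHTTP/1.1 200 OK\r\nRequest-Time: 0.326\r\nUpstream-Address: 172.17.0.123:8080\r\n\r\n' |
  summarize_timings
```

The same function could be placed at the end of the curl loop’s output, for example by piping the whole for loop into summarize_timings.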

Assuming springmusic VM is running at 192.168.99.100:

* The Tomcat user name is admin and the password is t0mcat53rv3r.

Helpful Links


Continuous Integration and Delivery of Microservices using Jenkins CI, Docker Machine, and Docker Compose

Continuously integrate, deploy, and test a RestExpress microservices-based, multi-container, Java EE application in a virtual test environment, using Docker, Docker Hub, Docker Machine, Docker Compose, Jenkins CI, Maven, and VirtualBox.

Docker Machine with Ambassador

Introduction

In the last post, we learned how to use Jenkins CI, Maven, and Docker Compose to take a set of microservices all the way from source control on GitHub, to a fully tested and running set of integrated Docker containers. We built the microservices, Docker images, and Docker containers. We deployed the containers directly onto the Jenkins CI Server machine. Finally, we performed integration tests to ensure the services were functioning as expected, within the containers.

In a more mature continuous delivery model, we would have deployed the running containers to a fresh ‘production-like’ environment to be more accurately tested, not the Jenkins CI Server host machine. In this post, we will learn how to use the recently released Docker Machine to create a fresh test environment in which to build and host our project’s ten Docker containers. We will couple Docker Machine with Oracle’s VirtualBox, Jenkins CI, and Docker Compose to automatically build and test the services within their containers, within the virtual ‘test’ environment.

Update: All code for this post is available on GitHub, release version v2.1.0 on the ‘master’ branch (after running git clone …, run a ‘git checkout tags/v2.1.0’ command).

Docker Machine

As you may recall from the last post, after compiling and packaging the microservices, Jenkins was used to deploy the build artifacts to the Virtual-Vehicles Docker GitHub project, as shown below.

Build and Deploy Results

We then used Jenkins, with the Docker CLI and the Docker Compose CLI, to automatically build and test the images and containers. This step will not change. First, however, we will use Docker Machine to automatically build a test environment, in which we will build the Docker images and containers.

Docker Machine with Ambassador

I’ve copied and modified the second Jenkins job we used in the last post, as shown below. The new job is titled ‘Virtual-Vehicles_Docker_Machine’. It replaces the previous job, ‘Virtual-Vehicles_Docker_Compose’.

Jenkins CI Jobs Machine

The first step in the new Jenkins job is to clone the Virtual-Vehicles Docker GitHub repository.

Jenkins CI Machine Config 1

Next, Jenkins runs a bash script to automatically build the test VM with Docker Machine, build the Docker images and containers with Docker Compose within the new VM, and finally test the services.

Jenkins CI Machine Config 2

The bash script executed by Jenkins contains the following commands:

# optional: record current versions of docker apps with each build
docker -v && docker-compose -v && docker-machine -v

# set-up: clean up any previous machine failures
docker-machine stop test || echo "nothing to stop" && \
docker-machine rm test   || echo "nothing to remove"

# use docker-machine to create and configure 'test' environment
# add a -D (debug) if having issues
docker-machine create --driver virtualbox test
eval "$(docker-machine env test)"

# use docker-compose to pull and build new images and containers
docker-compose -p jenkins up -d

# optional: list machines, images, and containers
docker-machine ls && docker images && docker ps -a

# wait for containers to fully start before tests fire up
sleep 30

# test the services
sh tests.sh $(docker-machine ip test)

# tear down: stop and remove 'test' environment
docker-machine stop test && docker-machine rm test

As the above script shows, first Jenkins uses the Docker Machine CLI to build and activate the ‘test’ virtual machine, using the VirtualBox driver. As of docker-machine version 0.3.0, the VirtualBox driver requires at least VirtualBox 4.3.28 to be installed.

docker-machine create --driver virtualbox test
eval "$(docker-machine env test)"

Once this step is complete you will have the following VirtualBox VM created, running, and active.

NAME   ACTIVE   DRIVER       STATE     URL                         SWARM
test   *        virtualbox   Running   tcp://192.168.99.100:2376

Next, Jenkins uses the Docker Compose CLI to execute the project’s Docker Compose YAML file.

docker-compose -p jenkins up -d

The YAML file directs Docker Compose to pull and build the required Docker images, and to build and configure the Docker containers.

########################################################################
#
# title:       Docker Compose YAML file for Virtual-Vehicles Project
# author:      Gary A. Stafford (https://programmaticponderings.com)
# url:         https://github.com/garystafford/virtual-vehicles-docker  
# description: Pulls (5) images, builds (5) images, and builds (11) containers,
#              for the Virtual-Vehicles Java microservices example REST API
# to run:      docker-compose -p <your_project_name_here> up -d
#
########################################################################

graphite:
  image: hopsoft/graphite-statsd:latest
  ports:
   - "8500:80"

mongoAuthentication:
  image: mongo:latest

mongoValet:
  image: mongo:latest

mongoMaintenance:
  image: mongo:latest

mongoVehicle:
  image: mongo:latest

authentication:
  build: authentication/
  links:
   - graphite
   - mongoAuthentication
   - "ambassador:nginx"
  expose:
   - "8587"

valet:
  build: valet/
  links:
   - graphite
   - mongoValet
   - "ambassador:nginx"
  expose:
   - "8585"

maintenance:
  build: maintenance/
  links:
   - graphite
   - mongoMaintenance
   - "ambassador:nginx"
  expose:
   - "8583"

vehicle:
  build: vehicle/
  links:
   - graphite
   - mongoVehicle
   - "ambassador:nginx"
  expose:
   - "8581"

nginx:
  build: nginx/
  ports:
   - "80:80"
  links:
   - "ambassador:vehicle"
   - "ambassador:valet"
   - "ambassador:authentication"
   - "ambassador:maintenance"

ambassador:
  image: cpuguy83/docker-grand-ambassador
  volumes:
   - "/var/run/docker.sock:/var/run/docker.sock"
  command: "-name jenkins_nginx_1 -name jenkins_authentication_1 -name jenkins_maintenance_1 -name jenkins_valet_1 -name jenkins_vehicle_1"
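The ambassador’s -name arguments in the file above are tied to the ‘-p jenkins’ project name, since Docker Compose names each container in the form project_service_1 (for example, jenkins_nginx_1). As a small sketch of that coupling, the argument string could be generated instead of typed by hand. The ambassador_args helper below is hypothetical and is not part of the Virtual-Vehicles project.

```shell
#!/bin/sh
# Hypothetical helper (not from the project): build the ambassador's
# "-name" arguments from a Compose project name and a list of services.
# Assumes the <project>_<service>_1 container naming used in this post.
ambassador_args() {
    project="$1"
    shift
    args=""
    for service in "$@"; do
        # "${args:+ }" adds a separating space only after the first entry
        args="${args}${args:+ }-name ${project}_${service}_1"
    done
    printf '%s\n' "$args"
}

ambassador_args jenkins nginx authentication maintenance valet vehicle
```

If the project name passed with -p ever changes, the container names change with it, so the ambassador’s command line would need to be regenerated to match.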

Running the docker-compose.yaml file will pull these (5) Docker Hub images:

REPOSITORY                           TAG          IMAGE ID
==========                           ===          ========
java                                 8u45-jdk     1f80eb0f8128
nginx                                latest       319d2015d149
mongo                                latest       66b43e3cae49
hopsoft/graphite-statsd              latest       b03e373279e8
cpuguy83/docker-grand-ambassador     latest       c635b1699f78

And, build these (5) Docker images from Dockerfiles:

REPOSITORY                  TAG          IMAGE ID
==========                  ===          ========
jenkins_nginx               latest       0b53a9adb296
jenkins_vehicle             latest       d80f79e605f4
jenkins_valet               latest       cbe8bdf909b8
jenkins_maintenance         latest       15b8a94c00f4
jenkins_authentication      latest       ef0345369079

And, build these (11) Docker containers from the corresponding images:

CONTAINER ID     IMAGE                                NAME
============     =====                                ====
17992acc6542     jenkins_nginx                        jenkins_nginx_1
bcbb2a4b1a7d     jenkins_vehicle                      jenkins_vehicle_1
4ac1ac69f230     mongo:latest                         jenkins_mongoVehicle_1
bcc8b9454103     jenkins_valet                        jenkins_valet_1
7c1794ca7b8c     jenkins_maintenance                  jenkins_maintenance_1
2d0e117fa5fb     jenkins_authentication               jenkins_authentication_1
d9146a1b1d89     hopsoft/graphite-statsd:latest       jenkins_graphite_1
56b34cee9cf3     cpuguy83/docker-grand-ambassador     jenkins_ambassador_1
a72199d51851     mongo:latest                         jenkins_mongoAuthentication_1
307cb2c01cc4     mongo:latest                         jenkins_mongoMaintenance_1
4e0807431479     mongo:latest                         jenkins_mongoValet_1

Since we are connected to the brand new Docker Machine ‘test’ VM, there are no locally cached Docker images. All images required to build the containers must be pulled from Docker Hub. The build time will be 3-4x as long as the last post’s build, which used the cached Docker images on the Jenkins CI machine.
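The Jenkins script pauses for a fixed 30 seconds before the tests start. With a cold image cache, start-up times may vary, so one alternative is to poll until the services answer. The following is only a sketch: wait_for is a hypothetical helper, and the attempt count, the delay, and the use of the ping URL as a readiness signal are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# usage: wait_for <max_attempts> <delay_seconds> <command> [args...]
wait_for() {
    max_attempts="$1"
    delay="$2"
    shift 2
    attempt=1
    while :; do
        # Success: stop polling immediately
        if "$@" > /dev/null 2>&1; then
            return 0
        fi
        # Give up once the last allowed attempt has failed
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example with assumed values: up to 30 attempts, 2 seconds apart
# wait_for 30 2 curl --silent --fail "http://$(docker-machine ip test)/vehicles/utils/ping.json"
```

A fixed sleep is simpler to read, while polling tends to finish sooner on a fast start and tolerates a slow one.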

Integration Testing

As in the last post, once the containers are built and configured, we run a series of expanded integration tests to confirm the containers and services are working. One difference is that this time we pass a parameter to the test bash script file:

sh tests.sh $(docker-machine ip test)

The parameter is the hostname used in the test’s RESTful service calls. The parameter, $(docker-machine ip test), is translated to the IP address of the ‘test’ VM, which in our example is 192.168.99.100. If a parameter is not provided, the test script’s hostname variable falls back to the default value of localhost, using ‘hostname=${1-'localhost'}’.
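To make the default-value behavior concrete, here is a minimal sketch of the same parameter expansion. The resolve_hostname function is a hypothetical wrapper added only for illustration; tests.sh itself assigns the value directly to a variable.

```shell
#!/bin/sh
# Hypothetical wrapper around the expansion used in tests.sh.
resolve_hostname() {
    # "${1-localhost}" falls back to localhost only when $1 is unset.
    # "${1:-localhost}" (with a colon) would also cover an empty string.
    echo "${1-localhost}"
}

resolve_hostname                   # prints: localhost
resolve_hostname 192.168.99.100    # prints: 192.168.99.100
```

Note that an explicitly empty argument is kept as-is by this form of the expansion, so the fallback only applies when no argument is passed at all.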

Another change since the last post: the project now uses the open-source version of Nginx, a high-performance HTTP server and reverse proxy, as a pseudo-API gateway. Instead of calling each microservice directly on its individual port (e.g. port 8581 for the Vehicle microservice), all traffic is sent through Nginx on the default HTTP port 80, for example:

http://192.168.99.100/vehicles/utils/ping.json
http://192.168.99.100/jwts?apiKey=Z1nXG8JGKwvGlzQgPLwQdndW&secret=ODc4OGNiNjE5ZmI
http://192.168.99.100/vehicles/558f3042e4b0e562c03329ad

Internal traffic between the microservices and MongoDB, and between the microservices and Graphite is still direct, using Docker container linking. Traffic between the microservices and Nginx, in both directions, is handled by an ambassador container, a common pattern. Nginx acts as a reverse proxy for the microservices. Using Nginx brings us closer to a truer production-like experience for testing the services.

#!/bin/sh

########################################################################
#
# title:          Virtual-Vehicles Project Integration Tests
# author:         Gary A. Stafford (https://programmaticponderings.com)
# url:            https://github.com/garystafford/virtual-vehicles-docker  
# description:    Performs integration tests on the Virtual-Vehicles
#                 microservices
# to run:         sh tests.sh
# docker-machine: sh tests.sh $(docker-machine ip test)
#
########################################################################

echo --- Integration Tests ---
echo

### VARIABLES ###
hostname=${1-'localhost'} # use input param or default to localhost
application="Test API Client $(date +%s)" # randomized
secret="$(date +%s | sha256sum | base64 | head -c 15)" # randomized
make="Test"
model="Foo"

echo hostname: ${hostname}
echo application: ${application}
echo secret: ${secret}
echo make: ${make}
echo model: ${model}
echo


### TESTS ###
echo "TEST: GET request should return 'true' in the response body"
url="http://${hostname}/vehicles/utils/ping.json"
echo ${url}
curl -X GET -H 'Accept: application/json; charset=UTF-8' \
--url "${url}" \
| grep true > /dev/null
[ "$?" -ne 0 ] && echo "RESULT: fail" && exit 1
echo "RESULT: pass"
echo


echo "TEST: POST request should return a new client in the response body with an 'id'"
url="http://${hostname}/clients"
echo ${url}
curl -X POST -H "Cache-Control: no-cache" -d "{
    \"application\": \"${application}\",
    \"secret\": \"${secret}\"
}" --url "${url}" \
| grep '"id":"[a-zA-Z0-9]\{24\}"' > /dev/null
[ "$?" -ne 0 ] && echo "RESULT: fail" && exit 1
echo "RESULT: pass"
echo


echo "SETUP: Get the new client's apiKey for next test"
url="http://${hostname}/clients"
echo ${url}
apiKey=$(curl -X POST -H "Cache-Control: no-cache" -d "{
    \"application\": \"${application}\",
    \"secret\": \"${secret}\"
}" --url "${url}" \
| grep -o '"apiKey":"[a-zA-Z0-9]\{24\}"' \
| grep -o '[a-zA-Z0-9]\{24\}' \
| sed -e 's/^"//'  -e 's/"$//')
echo apiKey: ${apiKey}
echo


echo "TEST: GET request should return a new jwt in the response body"
url="http://${hostname}/jwts?apiKey=${apiKey}&secret=${secret}"
echo ${url}
curl -X GET -H "Cache-Control: no-cache" \
--url "${url}" \
| grep '[a-zA-Z0-9_-]\{1,\}\.[a-zA-Z0-9_-]\{1,\}\.[a-zA-Z0-9_-]\{1,\}' > /dev/null
[ "$?" -ne 0 ] && echo "RESULT: fail" && exit 1
echo "RESULT: pass"
echo


echo "SETUP: Get a new jwt using the new client for the next test"
url="http://${hostname}/jwts?apiKey=${apiKey}&secret=${secret}"
echo ${url}
jwt=$(curl -X GET -H "Cache-Control: no-cache" \
--url "${url}" \
| grep '[a-zA-Z0-9_-]\{1,\}\.[a-zA-Z0-9_-]\{1,\}\.[a-zA-Z0-9_-]\{1,\}' \
| sed -e 's/^"//'  -e 's/"$//')
echo jwt: ${jwt}
echo


echo "TEST: POST request should return a new vehicle in the response body with an 'id'"
url="http://${hostname}/vehicles"
echo ${url}
curl -X POST -H "Cache-Control: no-cache" \
-H "Authorization: Bearer ${jwt}" \
-d "{
    \"year\": 2015,
    \"make\": \"${make}\",
    \"model\": \"${model}\",
    \"color\": \"White\",
    \"type\": \"Sedan\",
    \"mileage\": 250
}" --url "${url}" \
| grep '"id":"[a-zA-Z0-9]\{24\}"' > /dev/null
[ "$?" -ne 0 ] && echo "RESULT: fail" && exit 1
echo "RESULT: pass"
echo


echo "SETUP: Get id from new vehicle for the next test"
url="http://${hostname}/vehicles?filter=make::${make}|model::${model}&limit=1"
echo ${url}
id=$(curl -X GET -H "Cache-Control: no-cache" \
-H "Authorization: Bearer ${jwt}" \
--url "${url}" \
| grep '"id":"[a-zA-Z0-9]\{24\}"' \
| grep -o '[a-zA-Z0-9]\{24\}' \
| tail -1 \
| sed -e 's/^"//'  -e 's/"$//')
echo vehicle id: ${id}
echo


echo "TEST: GET request should return a vehicle in the response body with the requested 'id'"
url="http://${hostname}/vehicles/${id}"
echo ${url}
curl -X GET -H "Cache-Control: no-cache" \
-H "Authorization: Bearer ${jwt}" \
--url "${url}" \
| grep '"id":"[a-zA-Z0-9]\{24\}"' > /dev/null
[ "$?" -ne 0 ] && echo "RESULT: fail" && exit 1
echo "RESULT: pass"
echo


echo "TEST: POST request should return a new maintenance record in the response body with an 'id'"
url="http://${hostname}/maintenances"
echo ${url}
curl -X POST -H "Cache-Control: no-cache" \
-H "Authorization: Bearer ${jwt}" \
-d "{
    \"vehicleId\": \"${id}\",
    \"serviceDateTime\": \"2015-27-00T15:00:00.400Z\",
    \"mileage\": 1000,
    \"type\": \"Test Maintenance\",
    \"notes\": \"This is a test notes.\"
}" --url "${url}" \
| grep '"id":"[a-zA-Z0-9]\{24\}"' > /dev/null
[ "$?" -ne 0 ] && echo "RESULT: fail" && exit 1
echo "RESULT: pass"
echo


echo "TEST: POST request should return a new valet transaction in the response body with an 'id'"
url="http://${hostname}/valets"
echo ${url}
curl -X POST -H "Cache-Control: no-cache" \
-H "Authorization: Bearer ${jwt}" \
-d "{
    \"vehicleId\": \"${id}\",
    \"dateTimeIn\": \"2015-27-00T15:00:00.400Z\",
    \"parkingLot\": \"Test Parking Ramp\",
    \"parkingSpot\": 10,
    \"notes\": \"This is a test notes.\"
}" --url "${url}" \
| grep '"id":"[a-zA-Z0-9]\{24\}"' > /dev/null
[ "$?" -ne 0 ] && echo "RESULT: fail" && exit 1
echo "RESULT: pass"
echo
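Each test above repeats the same curl, grep, and exit-on-failure steps. As a sketch only, that check could be factored into small helpers. The names assert_body_matches and extract_id are hypothetical, not functions from the project, and the example uses a canned response body rather than a live curl call.

```shell
#!/bin/sh
# Hypothetical helper: report pass/fail for a response body read from stdin.
# usage: <command producing body> | assert_body_matches '<grep pattern>'
assert_body_matches() {
    if grep -e "$1" > /dev/null; then
        echo "RESULT: pass"
    else
        echo "RESULT: fail"
        return 1
    fi
}

# Hypothetical helper: print the first 24-character id found in a JSON body
# on stdin, using the same grep patterns as the test script.
extract_id() {
    grep -o '"id":"[a-zA-Z0-9]\{24\}"' | grep -o '[a-zA-Z0-9]\{24\}' | head -1
}

# Example with a canned body instead of a live request
body='{"id":"55914a28e4b04658471dc03a","make":"Test","model":"Foo"}'
echo "$body" | assert_body_matches '"id":"[a-zA-Z0-9]\{24\}"'
echo "$body" | extract_id
```

In the test script, a call might then read ‘curl … | assert_body_matches '…' || exit 1’, which keeps the stop-on-first-failure behavior of the original.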

Tear Down

In true continuous integration fashion, once the integration tests have completed, we tear down the project by removing the VirtualBox ‘test’ VM. This also removes all images and containers.

docker-machine stop test && \
docker-machine rm test
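Because tests.sh exits with a non-zero status on the first failing test, and Jenkins runs the build script with ‘sh -xe’, the tear-down commands are skipped when a test fails. That is why the script begins by cleaning up after previous failures. One possible alternative, sketched here under assumptions, is an EXIT trap. The run_with_teardown wrapper and the DOCKER_MACHINE_BIN variable are hypothetical and exist only so the sketch can be dry-run.

```shell
#!/bin/sh
# Assumption: the binary name is overridable so the sketch can be dry-run,
# for example with DOCKER_MACHINE_BIN=echo.
DOCKER_MACHINE_BIN="${DOCKER_MACHINE_BIN:-docker-machine}"

# The same two commands as the tear-down step, wrapped in a function.
teardown() {
    "$DOCKER_MACHINE_BIN" stop test
    "$DOCKER_MACHINE_BIN" rm test
}

# Hypothetical wrapper: run a command in a subshell whose EXIT trap
# calls teardown whether the command succeeds or fails.
run_with_teardown() (
    trap teardown EXIT
    "$@"
)

# Example: run_with_teardown sh tests.sh "$(docker-machine ip test)"
```

With this approach the VM would be removed after a failed run as well, which is convenient for a clean CI host but leaves nothing behind to inspect when debugging a failure.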

Jenkins CI Console Output

Below is an abridged sample of what the Jenkins CI console output will look like from a successful ‘build’.

Started by user anonymous
Building in workspace /var/lib/jenkins/jobs/Virtual-Vehicles_Docker_Machine/workspace
> git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
> git config remote.origin.url https://github.com/garystafford/virtual-vehicles-docker.git # timeout=10
Fetching upstream changes from https://github.com/garystafford/virtual-vehicles-docker.git
> git --version # timeout=10
using GIT_SSH to set credentials
using .gitcredentials to set credentials
> git config --local credential.helper store --file=/tmp/git7588068314920923143.credentials # timeout=10
> git -c core.askpass=true fetch --tags --progress https://github.com/garystafford/virtual-vehicles-docker.git +refs/heads/*:refs/remotes/origin/*
> git config --local --remove-section credential # timeout=10
> git rev-parse refs/remotes/origin/master^{commit} # timeout=10
> git rev-parse refs/remotes/origin/origin/master^{commit} # timeout=10
Checking out Revision f473249f0f70290b75cb320909af1f57cdaf2aa5 (refs/remotes/origin/master)
> git config core.sparsecheckout # timeout=10
> git checkout -f f473249f0f70290b75cb320909af1f57cdaf2aa5
> git rev-list f473249f0f70290b75cb320909af1f57cdaf2aa5 # timeout=10
[workspace] $ /bin/sh -xe /tmp/hudson8587699987350884629.sh

+ docker -v
Docker version 1.7.0, build 0baf609
+ docker-compose -v
docker-compose version: 1.3.1
CPython version: 2.7.9
OpenSSL version: OpenSSL 1.0.1e 11 Feb 2013
+ docker-machine -v
docker-machine version 0.3.0 (0a251fe)

+ docker-machine stop test
+ docker-machine rm test
Successfully removed test

+ docker-machine create --driver virtualbox test
Creating VirtualBox VM...
Creating SSH key...
Starting VirtualBox VM...
Starting VM...
To see how to connect Docker to this machine, run: docker-machine env test
+ docker-machine env test
+ eval export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://192.168.99.100:2376"
export DOCKER_CERT_PATH="/var/lib/jenkins/.docker/machine/machines/test"
export DOCKER_MACHINE_NAME="test"
# Run this command to configure your shell:
# eval "$(docker-machine env test)"
+ export DOCKER_TLS_VERIFY=1
+ export DOCKER_HOST=tcp://192.168.99.100:2376
+ export DOCKER_CERT_PATH=/var/lib/jenkins/.docker/machine/machines/test
+ export DOCKER_MACHINE_NAME=test
+ docker-compose -p jenkins up -d
Pulling mongoValet (mongo:latest)...
latest: Pulling from mongo

...Abridged output...

+ docker-machine ls
NAME   ACTIVE   DRIVER       STATE     URL                         SWARM
test   *        virtualbox   Running   tcp://192.168.99.100:2376
+ docker images
REPOSITORY                         TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
jenkins_vehicle                    latest              fdd7f9d02ff7        2 seconds ago       837.1 MB
jenkins_valet                      latest              8a592e0fe69a        4 seconds ago       837.1 MB
jenkins_maintenance                latest              5a4a44e136e5        5 seconds ago       837.1 MB
jenkins_authentication             latest              e521e067a701        7 seconds ago       838.7 MB
jenkins_nginx                      latest              085d183df8b4        25 minutes ago      132.8 MB
java                               8u45-jdk            1f80eb0f8128        12 days ago         816.4 MB
nginx                              latest              319d2015d149        12 days ago         132.8 MB
mongo                              latest              66b43e3cae49        12 days ago         260.8 MB
hopsoft/graphite-statsd            latest              b03e373279e8        4 weeks ago         740 MB
cpuguy83/docker-grand-ambassador   latest              c635b1699f78        5 months ago        525.7 MB

+ docker ps -a
CONTAINER ID        IMAGE                              COMMAND                CREATED             STATUS              PORTS                                      NAMES
4ea39fa187bf        jenkins_vehicle                    "java -classpath .:c   2 seconds ago       Up 1 seconds        8581/tcp                                   jenkins_vehicle_1
b248a836546b        mongo:latest                       "/entrypoint.sh mong   3 seconds ago       Up 3 seconds        27017/tcp                                  jenkins_mongoVehicle_1
0c94e6409afc        jenkins_valet                      "java -classpath .:c   4 seconds ago       Up 3 seconds        8585/tcp                                   jenkins_valet_1
657f8432004b        jenkins_maintenance                "java -classpath .:c   5 seconds ago       Up 5 seconds        8583/tcp                                   jenkins_maintenance_1
8ff6de1208e3        jenkins_authentication             "java -classpath .:c   7 seconds ago       Up 6 seconds        8587/tcp                                   jenkins_authentication_1
c799d5f34a1c        hopsoft/graphite-statsd:latest     "/sbin/my_init"        12 minutes ago      Up 12 minutes       2003/tcp, 8125/udp, 0.0.0.0:8500->80/tcp   jenkins_graphite_1
040872881b25        jenkins_nginx                      "nginx -g 'daemon of   25 minutes ago      Up 25 minutes       0.0.0.0:80->80/tcp, 443/tcp                jenkins_nginx_1
c6a2dc726abc        mongo:latest                       "/entrypoint.sh mong   26 minutes ago      Up 26 minutes       27017/tcp                                  jenkins_mongoAuthentication_1
db22a44239f4        mongo:latest                       "/entrypoint.sh mong   26 minutes ago      Up 26 minutes       27017/tcp                                  jenkins_mongoMaintenance_1
d5fd655474ba        cpuguy83/docker-grand-ambassador   "/usr/bin/grand-amba   26 minutes ago      Up 26 minutes                                                  jenkins_ambassador_1
2b46bd6f8cfb        mongo:latest                       "/entrypoint.sh mong   31 minutes ago      Up 31 minutes       27017/tcp                                  jenkins_mongoValet_1

+ sleep 30

+ docker-machine ip test
+ sh tests.sh 192.168.99.100

--- Integration Tests ---

hostname: 192.168.99.100
application: Test API Client 1435585062
secret: NGM5OTI5ODAxMTZ
make: Test
model: Foo

TEST: GET request should return 'true' in the response body
http://192.168.99.100/vehicles/utils/ping.json
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Dload  Upload   Total   Spent    Left  Speed

0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100     4    0     4    0     0     26      0 --:--:-- --:--:-- --:--:--    25
100     4    0     4    0     0     26      0 --:--:-- --:--:-- --:--:--    25
RESULT: pass

TEST: POST request should return a new client in the response body with an 'id'
http://192.168.99.100/clients
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Dload  Upload   Total   Spent    Left  Speed

0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   399    0   315  100    84    847    225 --:--:-- --:--:-- --:--:--   849
RESULT: pass

SETUP: Get the new client's apiKey for next test
http://192.168.99.100/clients
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Dload  Upload   Total   Spent    Left  Speed

0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   399    0   315  100    84  20482   5461 --:--:-- --:--:-- --:--:-- 21000
apiKey: sv1CA9NdhmXh72NrGKBN3Abb

TEST: GET request should return a new jwt in the response body
http://192.168.99.100/jwts?apiKey=sv1CA9NdhmXh72NrGKBN3Abb&secret=NGM5OTI5ODAxMTZ
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Dload  Upload   Total   Spent    Left  Speed

0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   222    0   222    0     0    686      0 --:--:-- --:--:-- --:--:--   687
RESULT: pass

SETUP: Get a new jwt using the new client for the next test
http://192.168.99.100/jwts?apiKey=sv1CA9NdhmXh72NrGKBN3Abb&secret=NGM5OTI5ODAxMTZ
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Dload  Upload   Total   Spent    Left  Speed

0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   222    0   222    0     0  16843      0 --:--:-- --:--:-- --:--:-- 17076
jwt: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJhcGkudmlydHVhbC12ZWhpY2xlcy5jb20iLCJhcGlLZXkiOiJzdjFDQTlOZGhtWGg3Mk5yR0tCTjNBYmIiLCJleHAiOjE0MzU2MjEwNjMsImFpdCI6MTQzNTU4NTA2M30.WVlhIhUcTz6bt3iMVr6MWCPIDd6P0aDZHl_iUd6AgrM

TEST: POST request should return a new vehicle in the response body with an 'id'
http://192.168.99.100/vehicles
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Dload  Upload   Total   Spent    Left  Speed

0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   123    0     0  100   123      0    612 --:--:-- --:--:-- --:--:--   611
100   419    0   296  100   123    649    270 --:--:-- --:--:-- --:--:--   649
RESULT: pass

SETUP: Get id from new vehicle for the next test
http://192.168.99.100/vehicles?filter=make::Test|model::Foo&limit=1
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Dload  Upload   Total   Spent    Left  Speed

0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   377    0   377    0     0   5564      0 --:--:-- --:--:-- --:--:--  5626
vehicle id: 55914a28e4b04658471dc03a

TEST: GET request should return a vehicle in the response body with the requested 'id'
http://192.168.99.100/vehicles/55914a28e4b04658471dc03a
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Dload  Upload   Total   Spent    Left  Speed

0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   296    0   296    0     0   7051      0 --:--:-- --:--:-- --:--:--  7219
RESULT: pass

TEST: POST request should return a new maintenance record in the response body with an 'id'
http://192.168.99.100/maintenances
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Dload  Upload   Total   Spent    Left  Speed

0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   565    0   376  100   189    506    254 --:--:-- --:--:-- --:--:--   506
100   565    0   376  100   189    506    254 --:--:-- --:--:-- --:--:--   506
RESULT: pass

TEST: POST request should return a new valet transaction in the response body with an 'id'
http://192.168.99.100/valets
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Dload  Upload   Total   Spent    Left  Speed

0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   561    0   368  100   193    514    269 --:--:-- --:--:-- --:--:--   514
RESULT: pass

+ docker-machine stop test
+ docker-machine rm test
Successfully removed test

Finished: SUCCESS

Graphite and Statsd

If you’ve chosen to build the Virtual-Vehicles Docker project outside of Jenkins CI, then in addition to running the test script and using applications like Postman to test the Virtual-Vehicles RESTful API, you may also use Graphite and StatsD. RestExpress comes fully configured out of the box with Graphite integration, through the Metrics plugin. The Virtual-Vehicles RESTful API example is configured to use port 8500 to access the Graphite UI, and it uses the hopsoft/graphite-statsd Docker image to build the Graphite/StatsD Docker container.

Graphite Dashboard

The Complete Process

The diagram below shows the entire Virtual-Vehicles continuous integration and delivery process, start to finish, using Docker, Docker Hub, Docker Machine, Docker Compose, Jenkins CI, Maven, RestExpress, and VirtualBox.

Docker Machine Full Process


Continuous Integration and Delivery of Microservices using Jenkins CI, Maven, and Docker Compose

Continuously build, test, package and deploy a microservices-based, multi-container, Java EE application using Jenkins CI, Maven, Docker, and Docker Compose

IntroDockerCompose

Previous Posts

In the previous 3-part series, Building a Microservices-based REST API with RestExpress, Java EE, and MongoDB, we developed a set of Java EE-based microservices, which formed the Virtual-Vehicles REST API. In Part One of this series, we introduced the concepts of a RESTful API and microservices, using the vehicle-themed Virtual-Vehicles REST API example. In Part Two, we gained a basic understanding of how RestExpress works to build microservices, and discovered how to get the microservices example up and running. Lastly, in Part Three, we explored how to use tools such as Postman, along with the API documentation, to test our microservices.

Introduction

In this post, we will demonstrate how to use Jenkins CI, Maven, and Docker Compose to take our set of microservices all the way from source control on GitHub, to a fully tested and running set of integrated and orchestrated Docker containers. We will build and test the microservices, Docker images, and Docker containers. We will deploy the containers and perform integration tests to ensure the services are functioning as expected, within the containers. The milestones in our process will be:

  1. Continuous Integration: Using Jenkins CI and Maven, automatically compile, test, and package the individual microservices
  2. Deployment: Using Jenkins, automatically deploy the build artifacts to the new Virtual-Vehicles Docker project
  3. Containerization: Using Jenkins and Docker Compose, automatically build the Docker images and containers from the build artifacts and a set of Dockerfiles
  4. Integration Testing: Using Jenkins, perform automated integration tests on the containerized services
  5. Tear Down: Using Jenkins, automatically stop and remove the containers and images

For brevity, we will deploy the containers directly to the Jenkins CI Server, where they were built. In an upcoming post, I will demonstrate how to use the recently released Docker Machine to host the containers within an isolated VM.

Note: All code for this post is available on GitHub, release version v1.0.0 on the ‘master’ branch (after running git clone …, run a ‘git checkout tags/v1.0.0’ command).

Build the Microservices

In order to host the Virtual-Vehicles microservices, we must first compile the source code and produce build artifacts. In the case of the Virtual-Vehicles example, the build artifacts are a JAR file and at least one environment-specific properties file. In Part Two of our previous series, we compiled and produced JAR files for our microservices from the command line using Maven.

Build and Deploy

To automatically build our Maven-based microservices project in this post, we will use Jenkins CI and the Jenkins Maven Project Plugin. The Virtual-Vehicles microservices are bundled together into what Maven considers a multi-module project, which is defined by a parent POM referring to one or more sub-modules. Using the concept of project inheritance, Jenkins will compile each of the four microservices from the project’s single parent POM file. Note the four modules at the end of the pom.xml below, corresponding to each microservice.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <name>Virtual-Vehicles API</name>
    <description>Virtual-Vehicles API
        https://maven.apache.org/guides/introduction/introduction-to-the-pom.html#Example_3
    </description>
    <url>https://github.com/garystafford/virtual-vehicle-demo</url>
    <groupId>com.example</groupId>
    <artifactId>Virtual-Vehicles-API</artifactId>
    <version>1</version>
    <packaging>pom</packaging>

    <modules>
        <module>Maintenance</module>
        <module>Valet</module>
        <module>Vehicle</module>
        <module>Authentication</module>
    </modules>
</project>

Below is the view of the four individual Maven modules, within the single Jenkins Maven job.

Maven Modules In Jenkins

Each microservice module contains a Maven POM file. The POM files use the Apache Maven Compiler Plugin to compile code, and the Apache Maven Shade Plugin to create ‘uber-jars’ from the compiled code. The Shade plugin provides the capability to package the artifact in an uber-jar, including its dependencies. This will allow us to independently host the service in its own container, without external dependencies. Lastly, using the Apache Maven Resources Plugin, Maven will copy the environment properties files from the source directory to the ‘target’ directory, which contains the JAR file. To accomplish these Maven tasks, all Jenkins needs to do is run a series of Maven life-cycle goals: ‘clean install package validate’.

Once the code is compiled and packaged into uber-jars, Jenkins uses the Artifact Deployer Plugin to deploy the build artifacts from Jenkins’ workspace to a remote location. In our example, we will copy the artifacts to a second GitHub project, from which we will containerize our microservices.

Shown below are the two Jenkins jobs. The first one compiles, packages, and deploys the build artifacts. The second job containerizes the services, databases, and monitoring application.

Jenkins CI Main Page

Shown below are two screen grabs showing how we clone the Virtual-Vehicles GitHub repository and build the project using the main parent pom.xml file. Building the parent POM in turn builds all the microservice modules, using their POM files.

Build and Deploy Config 1

Build and Deploy Config 2

Deploy Build Artifacts

Once we have successfully compiled, tested (if we had unit tests with RestExpress), and packaged the build artifacts as uber-jars, we deploy each set of build artifacts to a subfolder within the Virtual-Vehicles Docker GitHub project, using Jenkins’ Artifact Deployer Plugin. Shown below is the deployment configuration for just the Vehicles microservice. This deployment pattern is repeated for each service, within the Jenkins job configuration.

Build and Deploy Config 3

Jenkins’ Artifact Deployer Plugin also provides the convenient ability to view and to redeploy the artifacts. Below, you see a list of the microservice artifacts deployed to the Docker project by Jenkins.

Build and Deploy Results

Build and Compose the Containers

IntroDockerCompose

The second Jenkins job clones the Virtual-Vehicles Docker GitHub repository.

Docker Compose Config 1

The second Jenkins job executes commands from the shell prompt. The first commands use the Docker CLI to remove any existing images and containers, which might have been left over from previous job failures. The second commands use the Docker Compose CLI to execute the project’s Docker Compose YAML file. The YAML file directs Docker Compose to pull and build the required Docker images, and to build and configure the Docker containers.

Docker Compose Config 2

# remove all images and containers from this build
docker ps -a --no-trunc  | grep 'jenkins' \
| awk '{print $1}' | xargs -r --no-run-if-empty docker stop && \
docker ps -a --no-trunc  | grep 'jenkins' \
| awk '{print $1}' | xargs -r --no-run-if-empty docker rm && \
docker images --no-trunc | grep 'jenkins' \
| awk '{print $3}' | xargs -r --no-run-if-empty docker rmi
# set DOCKER_HOST environment variable
export DOCKER_HOST=tcp://localhost:4243

# record installed version of Docker and Maven with each build
mvn --version && \
docker --version && \
docker-compose --version

# use docker-compose to build new images and containers
docker-compose -p jenkins up -d

# list virtual-vehicles related images
docker images | grep 'jenkins' | awk '{print $0}'

# list all containers
docker ps -a | grep 'jenkins\|mongo_\|graphite' | awk '{print $0}'

########################################################################
#
# title:       Docker Compose YAML file for Virtual-Vehicles Project
# author:      Gary A. Stafford (https://programmaticponderings.com)
# url:         https://github.com/garystafford/virtual-vehicles-docker  
# description: Builds (4) images, pulls (2) images, and builds (9) containers,
#              for the Virtual-Vehicles Java microservices example REST API
# to run:      docker-compose -p virtualvehicles up -d
#
########################################################################

graphite:
  image: hopsoft/graphite-statsd:latest
  ports:
   - "8481:80"

mongoAuthentication:
  image: mongo:latest

mongoValet:
  image: mongo:latest

mongoMaintenance:
  image: mongo:latest

mongoVehicle:
  image: mongo:latest

authentication:
  build: authentication/
  ports:
   - "8587:8587"
  links:
   - graphite
   - mongoAuthentication

valet:
  build: valet/
  ports:
   - "8585:8585"
  links:
   - graphite
   - mongoValet
   - authentication

maintenance:
  build: maintenance/
  ports:
   - "8583:8583"
  links:
   - graphite
   - mongoMaintenance
   - authentication

vehicle:
  build: vehicle/
  ports:
   - "8581:8581"
  links:
   - graphite
   - mongoVehicle
   - authentication

Running the docker-compose.yaml file produces the following images:

REPOSITORY                TAG        IMAGE ID
==========                ===        ========
jenkins_vehicle           latest     a6ea4dfe7cf5
jenkins_valet             latest     162d3102d43c
jenkins_maintenance       latest     0b6f530cc968
jenkins_authentication    latest     45b50487155e

And, these containers:

CONTAINER ID     IMAGE                              NAME
============     =====                              ====
2b4d5a918f1f     jenkins_vehicle                    jenkins_vehicle_1
492fbd88d267     mongo:latest                       jenkins_mongoVehicle_1
01f410bb1133     jenkins_valet                      jenkins_valet_1
6a63a664c335     jenkins_maintenance                jenkins_maintenance_1
00babf484cf7     jenkins_authentication             jenkins_authentication_1
548a31034c1e     hopsoft/graphite-statsd:latest     jenkins_graphite_1
cdc18bbb51b4     mongo:latest                       jenkins_mongoAuthentication_1
6be5c0558e92     mongo:latest                       jenkins_mongoMaintenance_1
8b71d50a4b4d     mongo:latest                       jenkins_mongoValet_1

Integration Testing

Once the containers have been successfully built and configured, we run a series of integration tests to confirm the services are up and running. We refer to these tests as integration tests because they test the interaction of multiple components. Integration tests were covered in the last post, Building a Microservices-based REST API with RestExpress, Java EE, and MongoDB: Part 3.

Note the short pause I have inserted before running the tests. Docker Compose does an excellent job of accounting for the required start-up order of the containers to avoid race conditions (see my previous post). However, depending on the speed of the host box, there is still a start-up period before the containers’ processes are up, running, and ready to receive traffic. Apache Log4j 2 and MongoDB startup, in particular, take extra time. I’ve seen the containers take as long as 1-2 minutes on a slow box to fully start. Without the pause, the tests fail with various errors, since the containers’ processes are not all running.

Docker Compose Config 3

sleep 15
sh tests.sh -v
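
A fixed pause is either too short on a slow box or wasted time on a fast one. As an alternative, here is a minimal sketch of a polling helper that retries a command until it succeeds or gives up. The function name wait_for and its arguments are my own invention for illustration, not part of the project; the ping URL in the commented example is the one used by the tests below.

```shell
#!/bin/sh
# wait_for MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds, or until
# MAX_ATTEMPTS tries have failed, sleeping DELAY_SECONDS between tries.
wait_for() {
    max_attempts="$1"
    delay="$2"
    shift 2
    attempt=1
    while :; do
        # discard the command's own output; only its exit status matters
        if "$@" > /dev/null 2>&1; then
            echo "ready after ${attempt} attempt(s)"
            return 0
        fi
        if [ "${attempt}" -ge "${max_attempts}" ]; then
            echo "not ready after ${attempt} attempt(s)"
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "${delay}"
    done
}

# Example (not run here): poll the Vehicle service for up to two minutes,
# then start the tests only if the service answered.
# wait_for 24 5 curl -sf "http://localhost:8581/vehicles/utils/ping.json" \
#     && sh tests.sh -v
```

With this approach, the build waits only as long as the slowest container actually needs, and fails with a clear message if the service never comes up.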

The bash-based tests below only scratch the surface of a complete set of integration tests. However, they demonstrate an effective multi-stage testing pattern for handling the complex nature of RESTful service request requirements. The tests build upon each other. After setting up some variables, the tests register a new API client. Then, they use the new client’s API key to obtain a JWT. The tests then use the JWT to authenticate themselves, and create a new vehicle. Finally, they use the new vehicle’s id and the JWT to verify the existence of the new vehicle.

Although some may consider using bash to test somewhat primitive, the script demonstrates how effectively bash, combined with curl, grep, sed, awk, and regular expressions, can test our RESTful services.

#!/bin/sh

########################################################################
#
# title:       Virtual-Vehicles Project Integration Tests
# author:      Gary A. Stafford (https://programmaticponderings.com)
# url:         https://github.com/garystafford/virtual-vehicles-docker  
# description: Performs integration tests on the Virtual-Vehicles
#              microservices
# to run:      sh tests.sh -v
#
########################################################################

echo --- Integration Tests ---

### VARIABLES ###
hostname="localhost"
application="Test API Client $(date +%s)" # randomized
secret="$(date +%s | sha256sum | base64 | head -c 15)" # randomized

echo hostname: ${hostname}
echo application: ${application}
echo secret: ${secret}


### TESTS ###
echo "TEST: GET request should return 'true' in the response body"
url="http://${hostname}:8581/vehicles/utils/ping.json"
echo ${url}
curl -X GET -H 'Accept: application/json; charset=UTF-8' \
--url "${url}" \
| grep true > /dev/null
[ "$?" -ne 0 ] && echo "RESULT: fail" && exit 1
echo "RESULT: pass"


echo "TEST: POST request should return a new client in the response body with an 'id'"
url="http://${hostname}:8587/clients"
echo ${url}
curl -X POST -H "Cache-Control: no-cache" -d "{
    \"application\": \"${application}\",
    \"secret\": \"${secret}\"
}" --url "${url}" \
| grep '"id":"[a-zA-Z0-9]\{24\}"' > /dev/null
[ "$?" -ne 0 ] && echo "RESULT: fail" && exit 1
echo "RESULT: pass"


echo "SETUP: Get the new client's apiKey for next test"
url="http://${hostname}:8587/clients"
echo ${url}
apiKey=$(curl -X POST -H "Cache-Control: no-cache" -d "{
    \"application\": \"${application}\",
    \"secret\": \"${secret}\"
}" --url "${url}" \
| grep -o '"apiKey":"[a-zA-Z0-9]\{24\}"' \
| grep -o '[a-zA-Z0-9]\{24\}' \
| sed -e 's/^"//'  -e 's/"$//')
echo apiKey: ${apiKey}
echo

echo "TEST: GET request should return a new jwt in the response body"
url="http://${hostname}:8587/jwts?apiKey=${apiKey}&secret=${secret}"
echo ${url}
curl -X GET -H "Cache-Control: no-cache" \
--url "${url}" \
| grep '[a-zA-Z0-9_-]\{1,\}\.[a-zA-Z0-9_-]\{1,\}\.[a-zA-Z0-9_-]\{1,\}' > /dev/null
[ "$?" -ne 0 ] && echo "RESULT: fail" && exit 1
echo "RESULT: pass"


echo "SETUP: Get a new jwt using the new client for the next test"
url="http://${hostname}:8587/jwts?apiKey=${apiKey}&secret=${secret}"
echo ${url}
jwt=$(curl -X GET -H "Cache-Control: no-cache" \
--url "${url}" \
| grep '[a-zA-Z0-9_-]\{1,\}\.[a-zA-Z0-9_-]\{1,\}\.[a-zA-Z0-9_-]\{1,\}' \
| sed -e 's/^"//'  -e 's/"$//')
echo jwt: ${jwt}


echo "TEST: POST request should return a new vehicle in the response body with an 'id'"
url="http://${hostname}:8581/vehicles"
echo ${url}
curl -X POST -H "Cache-Control: no-cache" \
-H "Authorization: Bearer ${jwt}" \
-d '{
    "year": 2015,
    "make": "Test",
    "model": "Foo",
    "color": "White",
    "type": "Sedan",
    "mileage": 250
}' --url "${url}" \
| grep '"id":"[a-zA-Z0-9]\{24\}"' > /dev/null
[ "$?" -ne 0 ] && echo "RESULT: fail" && exit 1
echo "RESULT: pass"


echo "SETUP: Get id from new vehicle for the next test"
url="http://${hostname}:8581/vehicles?filter=make::Test|model::Foo&limit=1"
echo ${url}
id=$(curl -X GET -H "Cache-Control: no-cache" \
-H "Authorization: Bearer ${jwt}" \
--url "${url}" \
| grep '"id":"[a-zA-Z0-9]\{24\}"' \
| grep -o '[a-zA-Z0-9]\{24\}' \
| tail -1 \
| sed -e 's/^"//'  -e 's/"$//')
echo vehicle id: ${id}


echo "TEST: GET request should return a vehicle in the response body with the requested 'id'"
url="http://${hostname}:8581/vehicles/${id}"
echo ${url}
curl -X GET -H "Cache-Control: no-cache" \
-H "Authorization: Bearer ${jwt}" \
--url "${url}" \
| grep '"id":"[a-zA-Z0-9]\{24\}"' > /dev/null
[ "$?" -ne 0 ] && echo "RESULT: fail" && exit 1
echo "RESULT: pass"
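
The script repeats the same grep and sed pipeline for each value it pulls out of a response. Those steps could be factored into small helpers, sketched below. The names extract_field and looks_like_jwt are hypothetical, and the regular expressions assume, as the tests above do, that the JSON has no whitespace between a key and its value. This is pattern matching, not a real JSON parser.

```shell
#!/bin/sh
# extract_field NAME
# Hypothetical helper: reads JSON text on stdin and prints the first
# string value found for the key NAME (assumes the form "NAME":"value").
extract_field() {
    grep -o "\"$1\":\"[^\"]*\"" \
    | head -n 1 \
    | sed -e "s/^\"$1\":\"//" -e 's/"$//'
}

# looks_like_jwt
# Hypothetical helper: succeeds if stdin contains three dot-separated
# base64url segments, the same shape check the JWT test above uses.
looks_like_jwt() {
    grep -q '[a-zA-Z0-9_-]\{1,\}\.[a-zA-Z0-9_-]\{1,\}\.[a-zA-Z0-9_-]\{1,\}'
}

# Example with a canned response instead of a live curl call
response='{"id":"54c83fb7c2e63d7d7a0b5c34","make":"Test","model":"Foo"}'
echo "vehicle id: $(printf '%s' "${response}" | extract_field id)"
```

In the real script, the output of curl would be piped into extract_field in place of the canned response.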

Since our tests are just a bash script, they can also be run separately from the command line, as in the screen grab below. The output, except for the colored text, is identical to what appears in the Jenkins console output.

Running Integration Tests

Tear Down

Once the integration tests have completed, we ‘tear down’ the project by removing the Virtual-Vehicle images and containers. We simply repeat the first commands we ran at the start of the Jenkins build phase. You could choose to remove the tear down step, and use this job as a way to simply build and start your multi-container application.

# remove all images and containers from this build
docker ps -a --no-trunc  | grep 'jenkins' \
| awk '{print $1}' | xargs -r --no-run-if-empty docker stop && \
docker ps -a --no-trunc  | grep 'jenkins' \
| awk '{print $1}' | xargs -r --no-run-if-empty docker rm && \
docker images --no-trunc | grep 'jenkins' \
| awk '{print $3}' | xargs -r --no-run-if-empty docker rmi
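
Since the same clean-up runs at both the start and the end of the job, it could live in one function. The sketch below is one way to do that, under the assumption that filtering on the Compose project name (‘jenkins’) is sufficient, as it is in the commands above. The names column_for and teardown are hypothetical.

```shell
#!/bin/sh
# column_for PATTERN COLUMN
# Hypothetical helper: reads table-like text on stdin and prints the given
# whitespace-separated column of every line that contains PATTERN.
column_for() {
    grep -- "$1" | awk -v col="$2" '{ print $col }'
}

# teardown PROJECT
# Stops and removes the project's containers, then removes its images.
# Mirrors the three pipelines above; xargs -r skips the docker command
# when there is nothing to act on.
teardown() {
    project="$1"
    docker ps -a --no-trunc  | column_for "${project}" 1 | xargs -r docker stop
    docker ps -a --no-trunc  | column_for "${project}" 1 | xargs -r docker rm
    docker images --no-trunc | column_for "${project}" 3 | xargs -r docker rmi
}

# Example (not run here): teardown jenkins
```

The Jenkins job would then call the function once before docker-compose and once after the tests.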

The Complete Process

The diagram below shows the entire process, start to finish.

Full Process


Automate the Provisioning and Configuration of HAProxy and an Apache Web Server Cluster Using Foreman

Use Vagrant, Foreman, and Puppet to provision and configure HAProxy as a reverse proxy and load-balancer for a cluster of Apache web servers.

Simple Load Balanced 2

Introduction

In this post, we will use several technologies, including Vagrant, Foreman, and Puppet, to provision and configure a basic load-balanced web server environment. In this environment, a single node with HAProxy will act as a reverse proxy and load-balancer for two identical Apache web server nodes. All three nodes will be provisioned and bootstrapped using Vagrant, from a Linux CentOS 6.5 Vagrant Box. Foreman, with Puppet, will then be used to install and configure the nodes with HAProxy and Apache, using a series of Puppet modules.

For this post, I will assume you already have running instances of Vagrant with the vagrant-hostmanager plugin, VirtualBox, and Foreman. If you are unfamiliar with Vagrant, the vagrant-hostmanager plugin, VirtualBox, Foreman, or Puppet, review my recent post, Installing Foreman and Puppet Agent on Multiple VMs Using Vagrant and VirtualBox. That post demonstrates how to install and configure Foreman, and how to provision and bootstrap virtual machines using Vagrant and VirtualBox. We will be repeating many of the same steps in this post, with the addition of HAProxy, Apache, and some custom configuration Puppet modules.

All code for this post is available on GitHub. However, it has been updated as of 8/23/2015. Changes were required to fix compatibility issues with the latest versions of Puppet 4.x and Foreman. Additionally, the version of CentOS on all VMs was updated from 6.6 to 7.1 and the version of Foreman was updated from 1.7 to 1.9.

Steps

Here is a high-level overview of our steps in this post:

  1. Provision and configure the three CentOS-based virtual machines (‘nodes’) using Vagrant and VirtualBox
  2. Install the HAProxy and Apache Puppet modules, from Puppet Forge, onto the Foreman server
  3. Install the custom HAProxy and Apache Puppet configuration modules, from GitHub, onto the Foreman server
  4. Import the four new modules’ classes to Foreman’s Puppet class library
  5. Add the three new virtual machines (‘hosts’) to Foreman
  6. Configure the new hosts in Foreman, assigning the appropriate Puppet classes
  7. Apply the Foreman Puppet configurations to the new hosts
  8. Test HAProxy is working as a reverse proxy and load-balancer for the two Apache web server nodes

In this post, I will use the terms ‘virtual machine’, ‘machine’, ‘node’, ‘agent node’, and ‘host’ interchangeably, based on each software’s own nomenclature.

Provisioning

First, using the process described in the previous post, provision and bootstrap the three new virtual machines. The new machines’ Vagrant configuration is shown below. This should be added to the JSON configuration file. All code for the earlier post is available on GitHub.

{
  "nodes": {
    "haproxy.example.com": {
      ":ip": "192.168.35.101",
      "ports": [],
      ":memory": 512,
      ":bootstrap": "bootstrap-node.sh"
    },
    "node01.example.com": {
      ":ip": "192.168.35.121",
      "ports": [],
      ":memory": 512,
      ":bootstrap": "bootstrap-node.sh"
    },
    "node02.example.com": {
      ":ip": "192.168.35.122",
      "ports": [],
      ":memory": 512,
      ":bootstrap": "bootstrap-node.sh"
    }
  }
}

After provisioning and bootstrapping, observe the three machines running in Oracle’s VM VirtualBox Manager.

Oracle VM VirtualBox Manager View of New Nodes

Installing Puppet Forge Modules

The next task is to install the HAProxy and Apache Puppet modules on the Foreman server. This allows Foreman to have access to them. I chose the puppetlabs-haproxy HAProxy module and the puppetlabs-apache Apache module. Both modules were authored by Puppet Labs, and are available on Puppet Forge.

The exact commands to install the modules onto your Foreman server will depend on your Foreman environment configuration. In my case, I used the following two commands to install the two Puppet Forge modules into my ‘Production’ environment’s module directory.

sudo puppet module install -i /etc/puppet/environments/production/modules puppetlabs-haproxy
sudo puppet module install -i /etc/puppet/environments/production/modules puppetlabs-apache

# confirm module installation
puppet module list --modulepath /etc/puppet/environments/production/modules

Installing Configuration Modules

Next, install the HAProxy and Apache configuration Puppet modules on the Foreman server. Both modules are hosted on GitHub, and can be downloaded directly and installed on the Foreman server from the command line. Again, the exact commands to install the modules onto your Foreman server will depend on your Foreman environment configuration. In my case, I used the following commands to install the two configuration modules into my ‘Production’ environment’s module directory. Also, notice I am downloading version 0.1.0 of both modules at the time of writing this post. Make sure to double-check for the latest versions of both modules before running the commands. Modify the commands if necessary.

# run from your home directory; the install commands expect ~/v0.1.0.tar.gz
# apache config module
wget -N https://github.com/garystafford/garystafford-apache_example_config/archive/v0.1.0.tar.gz && \
sudo puppet module install -i /etc/puppet/environments/production/modules ~/v0.1.0.tar.gz --force

# haproxy config module
wget -N https://github.com/garystafford/garystafford-haproxy_node_config/archive/v0.1.0.tar.gz && \
sudo puppet module install -i /etc/puppet/environments/production/modules ~/v0.1.0.tar.gz --force

# confirm module installation
puppet module list --modulepath /etc/puppet/environments/production/modules

GitHub Repository for Apache Config Example

HAProxy Configuration
The HAProxy configuration module configures HAProxy’s /etc/haproxy/haproxy.cfg file. The single class in the module’s init.pp manifest is as follows:

class haproxy_node_config () inherits haproxy {
  haproxy::listen { 'puppet00':
    collect_exported => false,
    ipaddress        => '*',
    ports            => '80',
    mode             => 'http',
    options          => {
      'option'  => ['httplog'],
      'balance' => 'roundrobin',
    },
  }

  Haproxy::Balancermember <<| listening_service == 'puppet00' |>>

  haproxy::balancermember { 'haproxy':
    listening_service => 'puppet00',
    server_names      => ['node01.example.com', 'node02.example.com'],
    ipaddresses       => ['192.168.35.121', '192.168.35.122'],
    ports             => '80',
    options           => 'check',
  }
}

The resulting /etc/haproxy/haproxy.cfg file will have the following configuration added. It defines the two Apache web server nodes’ hostnames, IP addresses, and HTTP port. The configuration also defines the load-balancing method, ‘round-robin‘ in our example. In this example, we are using layer 7 load-balancing (application layer – http), as opposed to layer 4 load-balancing (transport layer – tcp). Either method will work for this example. The Puppet Labs’ HAProxy module’s documentation on Puppet Forge and HAProxy’s own documentation are both excellent starting points to understand how to configure HAProxy. We are barely scratching the surface of HAProxy’s capabilities in this brief example.

listen puppet00
  bind *:80
  mode  http
  balance  roundrobin
  option  httplog
  server node01.example.com 192.168.35.121:80 check
  server node02.example.com 192.168.35.122:80 check
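
To make the ‘roundrobin’ balance method concrete, here is a toy illustration of the idea: each new request goes to the next server in the list, wrapping back to the first. This is only a sketch of the concept, using the two server names from the configuration above. It is not how HAProxy implements it, and HAProxy’s version also accounts for server weights and health checks. The function and variable names are my own.

```shell
#!/bin/sh
# Toy round-robin selector (illustration only, not HAProxy's implementation)
backends="node01.example.com node02.example.com"
request_count=0

# next_backend
# Sets $backend to the server that should receive the next request,
# cycling through $backends in order.
next_backend() {
    set -- ${backends}                      # split the list into $1, $2, ...
    shift $(( request_count % $# ))         # skip to this request's server
    backend="$1"
    request_count=$(( request_count + 1 ))
}

# Simulate four incoming requests
for i in 1 2 3 4; do
    next_backend
    echo "request ${i} -> ${backend}"
done
```

The four simulated requests alternate between node01 and node02, which is the behavior we will look for in the browser later in the post.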

Apache Configuration
The Apache configuration module creates a default web page in Apache’s docroot directory, /var/www/html/index.html. The single class in the module’s init.pp manifest is as follows:
ApacheConfigClass
The resulting /var/www/html/index.html file will look like the following. Observe that the facter variables shown in the module manifest above have been replaced by the individual node’s hostname and IP address during application of the configuration by Puppet (i.e., ${fqdn} became node01.example.com).

ApacheConfigClass

Both of these Puppet modules were created specifically to configure HAProxy and Apache for this post. Unlike published modules on Puppet Forge, these two modules are very simple, and don’t necessarily represent the best practices and patterns for authoring Puppet Forge modules.

Importing into Foreman

After installing the new modules onto the Foreman server, we need to import them into Foreman. This is accomplished from the ‘Puppet classes’ tab, using the ‘Import from theforeman.example.com’ button. Once imported, the module classes are available to assign to host machines.

Importing Puppet Classes into Foreman

Add Host to Foreman

Next, add the three new hosts to Foreman. If you have questions on how to add the nodes to Foreman, start Puppet’s Certificate Signing Request (CSR) process on the hosts, sign the certificates, or perform other first-time tasks, refer to the previous post. That post explains this process in detail.

Foreman Hosts Tab Showing New Nodes

Configure the Hosts

Next, configure the HAProxy and Apache nodes with the necessary Puppet classes. In addition to the base module classes and configuration classes, I recommend adding git and ntp modules to each of the new nodes. These modules were explained in the previous post. Refer to the screen-grabs below for correct module classes to add, specific to HAProxy and Apache.

HAProxy Node Puppet Classes Tab

Apache Nodes Puppet Classes Tab

Agent Configuration and Testing the System

Once configurations are retrieved and applied by Puppet Agent on each node, we can test our reverse proxy load-balanced environment. To start, open a browser and load haproxy.example.com. You should see one of the two pages below. Refresh the page a few times. You should observe HAProxy directing you to one Apache web server node, and then the other, using HAProxy’s round-robin algorithm. You can differentiate the Apache web servers by the hostname and IP address displayed on the web page.

Load Balancer Directing Traffic to Node01

Load Balancer Directing Traffic to Node02

After hitting HAProxy’s URL several times successfully, view HAProxy’s built-in Statistics Report page at http://haproxy.example.com/haproxy?stats. Note below, each of the two Apache nodes has been hit 44 times by HAProxy. This demonstrates the effectiveness of the reverse proxy and load-balancing features of HAProxy.

Statistics Report for HAProxy
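
The same even split can be checked from the command line, without the browser or the statistics page. The sketch below tallies how many responses came from each node. The function name tally_backends is hypothetical, and the commented curl loop assumes, as described above, that each node’s page displays that node’s hostname.

```shell
#!/bin/sh
# tally_backends
# Hypothetical helper: reads one backend hostname per line on stdin and
# prints "hostname: count" for each distinct hostname, sorted by hostname.
tally_backends() {
    sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Example (not run here): request the page ten times through HAProxy and
# count which node answered each time.
# for i in 1 2 3 4 5 6 7 8 9 10; do
#     curl -s http://haproxy.example.com/ \
#     | grep -o 'node0[12]\.example\.com' | head -n 1
# done | tally_backends
```

With round-robin balancing and both nodes healthy, the two counts should be equal, or differ by one for an odd number of requests.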

Accessing Apache Directly
If you are testing HAProxy from the same machine on which you created the virtual machines (the VirtualBox host), you will likely be able to directly access either of the Apache web servers (e.g., node02.example.com). The VirtualBox host’s hosts file contains the IP addresses and hostnames of all three nodes. This DNS configuration was done automatically by the vagrant-hostmanager plugin. However, in an actual Production environment, only the HAProxy server’s hostname and IP address would be publicly accessible to a user. The two Apache nodes would sit behind a firewall, accessible only by the HAProxy server. HAProxy acts as a façade to the public side of the network.

Testing Apache Host Failure
The main reason you would likely use a load-balancer is high-availability. With HAProxy acting as a load-balancer, we should be able to impair one of the two Apache nodes, without noticeable disruption. HAProxy will continue to serve content from the remaining Apache web server node.

Log into node01.example.com, using the following command, vagrant ssh node01.example.com. To simulate an impairment on ‘node01’, run the following command to stop Apache, sudo service httpd stop. Now, refresh the haproxy.example.com URL in your web browser. You should notice HAProxy is now redirecting all traffic to node02.example.com.

Troubleshooting

While troubleshooting HAProxy configuration issues for this demonstration, I discovered logging is not configured by default on CentOS. No worries, I recommend HAProxy: Give me some logs on CentOS 6.5!, by Stephane Combaudon, to get logging running. Once logging is active, you can more easily troubleshoot HAProxy and Apache configuration issues. Here are some example commands you might find useful:

# haproxy
sudo more -f /var/log/haproxy.log
sudo haproxy -f /etc/haproxy/haproxy.cfg -c # check/validate config file

# apache
sudo ls -1 /etc/httpd/logs/
sudo tail -50 /etc/httpd/logs/error_log
sudo less /etc/httpd/logs/access_log

Redundant Proxies

In this simple example, the system’s weakest point is obviously the single HAProxy instance. It represents a single-point-of-failure (SPOF) in our environment. In an actual production environment, you would likely have more than one instance of HAProxy. They may both be in a load-balanced pool, or one may be active and the other on standby as a failover, should one instance become impaired. There are several techniques for building in proxy redundancy, often with the use of Virtual IP and Keepalived. Below is a list of articles that might help you take this post’s example to the next level.
