Archive for category Continuous Delivery

Deploying Spring Boot Apps to AWS with Netflix Nebula and Spinnaker: Part 2 of 2

Part One of this post examined enterprise deployment tools and introduced two of Netflix’s open-source deployment tools, the Nebula Gradle plugins, and Spinnaker. In Part Two, we will deploy a production-ready Spring Boot application, the Election microservice, to multiple Amazon EC2 instances, behind an Elastic Load Balancer (ELB). We will use a fully automated DevOps workflow. The build, test, package, bake, deploy process will be handled by the Netflix Nebula Gradle Linux Packaging Plugin, Jenkins, and Spinnaker. The high-level process will involve the following steps:

  • Configure Gradle to build a production-ready fully executable application for Unix systems (executable JAR)
  • Using deb-s3 and GPG Suite, create a secure, signed APT (Debian) repository on Amazon S3
  • Using Jenkins and the Netflix Nebula plugin, build a Debian package, containing the executable JAR and configuration files
  • Using Jenkins and deb-s3, publish the package to the S3-based APT repository
  • Using Spinnaker (HashiCorp Packer under the covers), bake an Ubuntu Amazon Machine Image (AMI), replete with the executable JAR installed from the Debian package
  • Deploy an auto-scaling set of Amazon EC2 instances from the baked AMI, behind an ELB, running the Spring Boot application using both the Red/Black and Highlander deployment strategies
  • Be able to repeat the entire automated build, test, package, bake, deploy process, triggered by a new code push to GitHub

The overall build, test, package, bake, deploy process will look as follows.

DebianPackageWorkflow12.png

DevOps Architecture

Spinnaker’s modern architecture is comprised of several independent microservices. The codebase is written in Java and Groovy, and leverages the Spring Boot framework¹. Spinnaker’s configuration, startup, updates, and rollbacks are centrally managed by Halyard. Halyard provides a single point of contact for command line interaction with Spinnaker’s microservices.

Spinnaker can be installed on most private or public infrastructure, either containerized or virtualized. Spinnaker has links to a number of Quickstart installations on their website. For this demonstration, I deployed and configured Spinnaker on Azure, starting with one of the Azure Spinnaker quick-start ARM templates. The template provisions all the necessary Azure resources. For better performance, I chose upgraded the default VM to a larger Standard D4 v3, which contains 4 vCPUs and 16 GB of memory. I would recommend at least 2 vCPUs and 8 GB of memory at a minimum for Spinnaker.

Another Azure VM, in the same virtual network as the Spinnaker VM, already hosts Jenkins, SonarQube, and Nexus Repository OSS.

From Spinnaker on Azure, Debian Packages are uploaded to the APT package repository on AWS S3. Spinnaker also bakes Amazon Machine Images (AMI) on AWS. Spinnaker provisions the AWS resources, including EC2 instances, Load Balancers, Auto Scaling Groups, Launch Configurations, and Security Groups. The only resources you need on AWS to get started with Spinnaker are a VPC and Subnets. There are some minor, yet critical prerequisites for naming your VPC and Subnets.

Other external tools include GitHub for source control and Slack for notifications. I have built and managed everything from a Mac, however, all tools are platform agnostic. The Spring Boot application was developed in JetBrains IntelliJ.

Spinnaker Architecture 2.png

Source Code

All source code for this post can be found on GitHub. The project’s README file contains a list of the Election service’s endpoints.

Code samples in this post are displayed as Gists, which may not display correctly on some mobile and social media browsers. Links to gists are also provided.

APT Repository

After setting up Spinnaker on Azure, I created an APT repository on Amazon S3, using the instructions provided by Netflix, in their Code Lab, An Introduction to Spinnaker: Hello Deployment. The setup involves creating an Amazon S3 bucket to serve as an APT (Debian) repository, creating a GPG key for signing, and using deb-s3 to manage the repository. The Code Lab also uses Aptly, a great tool, which I skipped for brevity.

spin19

GPG Key

On the Mac, I used GPG Suite to create a GPG (GNU Privacy Guard or GnuPG) automatic signing key for my APT repository. The key is required by Spinnaker to verify the Debian packages in the repository, before installation.

The Ruby Gem, deb-s3, makes management of the Debian packages easy and automatable with Jenkins. Jenkins uploads the Debian packages, using a deb-s3 command, such as the following (gist). In this post, Jenkins calls the command from the shell script, upload-deb-package.sh, which is included in the GitHub project.

The Jenkins user requires access to the signing key, to build and upload the Debian packages. I created my GPG key on my Mac, securely copied the key to my Ubuntu-based Jenkins VM, and then imported the key for the Jenkins user. You could also create your key on Ubuntu, directly. Make sure you backup your private key in a secure location!

Nebula Packaging Plugin

Next, I set up a Gradle task in my build.gradle file to build my Debian packages using the Netflix Nebula Gradle Linux Packaging Plugin. Although Debian packaging tasks could become complex for larger application installations, this task for this post is pretty simple. I used many of the best-practices suggested by Spring for Production-grade deployments. The best-practices guide recommends file location, file modes, and file user and group ownership. I create the JAR as a fully executable JAR, meaning it is started like any other executable and does not have to be started with the standard java -jar command.

In the task, shown below (gist), the JAR and the external configuration file (optional) are copied to specific locations during the deployment and symlinked, as required. I used the older SysVInit system (init.d) to enable the application to automatically starts on boot. You should probably use systemctl for your services with Ubuntu 16.04.

You can use the ar (archive) command (i.e., ar -x spring-postgresql-demo_4.5.0_all.deb), to extract and inspect the structure of a Debian package. The data.tar.gz file, displayed below in Atom, shows the final package structure.

spin47.png

Base AMI

Next, I baked a base AMI for Spinnaker to use. This base AMI is used by Spinnaker to bake (re-bake) the final AMI(s) used for provisioning the EC2 instances, containing the Spring Boot Application. The Spinnaker base AMI is built from another base AMI, the official Ubuntu 16.04 LTS image. I installed the OpenJDK 8 package on the AMI, which is required to run the Java-based Election service. Lastly and critically, I added information about the location of my S3-based APT Debian package repository to the list of configured APT data sources, and the GPG key required for package verification. This information and key will be used later by Spinnaker to bake AMIʼs, using this base AMI. The set-up script, base_ubuntu_ami_setup.sh, which is included in the GitHub project.

Jenkins

This post uses a single Jenkins CI/CD pipeline. Using a Webhook, the pipeline is automatically triggered by every git push to the GitHub project. The pipeline pulls the source code, builds the application, and performs unit-tests and static code analysis with SonarQube. If the build succeeds and the tests pass, the build artifact (JAR file) is bundled into a Debian package using the Nebula Packaging plugin, uploaded to the S3 APT repository using s3-deb, and archived locally for Spinnaker to reference. Once the pipeline is completed, on success or on failure, a Slack notification is sent. The Jenkinsfile, used for this post is available in the project on Github.

Below is a traditional Jenkins view of the CI/CD pipeline, with links to unit test reports, SonarQube results, build artifacts, and GitHub source code.

spin01

Below is the same pipeline viewed using the Jenkins Blue Ocean plugin.

spin02

It is important to perform sufficient testing before building the Debian package. You donʼt want to bake an AMI and deploy EC2 instances, at a cost, before finding out the application has bugs.

spin03

Spinnaker Setup

First, I set up a new Spinnaker Slack channel and a custom bot user. Spinnaker details the Slack set up in their Notifications and Events Guide. You can configure what type of Spinnaker events trigger Slack notifications.

spin46.png

AWS Spinnaker User

Next, I added the required Spinnaker User, Policy, and Roles to AWS. Spinnaker uses this access to query and provision infrastructure on your behalf. The Spinnaker User requires Power User level access to perform all their necessary tasks. AWS IAM set up is detailed by Spinnaker in their Cloud Providers Setup for AWS. They also describe the setup of other cloud providers. You need to be reasonably familiar with AWS IAM, including the PassRole permission to set up this part. As part of the setup, you enable AWS for Spinnaker and add your AWS account using the Halyard interface.

spin45

Spinnaker Security Groups

Next, I set up two Spinnaker Security Groups, corresponding to two AWS Security Groups, one for the load balancer and one for the Election service. The load balancer security group exposes port 80, and the Election service security group exposes port 8080.

spin36

Spinnaker Load Balancer

Next, I created a Spinnaker Load Balancer, corresponding to an Amazon Classic Load Balancer. The Load Balancer will load-balance the Election service EC2 instances. Below you see a Load Balancer, balancing a pair of active EC2 instances, the result of a Red/Black deployment.

spin37

Spinnaker can currently create both AWS Classic Load Balancers as well as Application Load Balancers (ALB).

spin25

Spinnaker Pipeline

This post uses a single, basic Spinnaker Pipeline. The pipeline bakes a new AMI from the Debian package generated by the Jenkins pipeline. After a manual approval stage, Spinnaker deploys a set of EC2 instances, behind the Load Balancer, which contains the latest version of the Election service. Spinnaker finishes the pipeline by sending a Slack notification.

spin26

Jenkins Integration

The pipeline is triggered by the successful completion of the Jenkins pipeline. This is set in the Configuration stage of the pipeline. The integration with Jenkins is managed through Spinnaker’s Igor service.

spin22.png

Bake Stage

Next, in the Bake stage, Spinnaker bakes a new AMI, containing the Debian package generated by the Jenkins pipeline. The stageʼs configuration contains the package name to reference.

spin29

The stageʼs configuration also includes a reference to which Base AMI to use, to bake the new AMIs. Here I have used the AMI ID of the base Spinnaker AMI, I created previously.

spin27

Deploy Stage

Next, the Deploy stage deploys the Election service, running on EC2 instances, provisioned from the new AMI, which was baked in the last stage. To configure the Deploy stage, you define a Spinnaker Server Group. According to Spinnaker, the Server Group identifies the deployable artifact, VM image type, the number of instances, autoscaling policies, metadata, Load Balancer, and a Security Group.

spin32

The Server Group also defines the Deployment Strategy. Below, I chose the Red/Black Deployment Strategy (also referred to as Blue/Green). This strategy will disable, not terminate the active Server Group. If the new deployment fails, we can manually or automatically perform a Rollback to the previous, currently disabled Server Group.

spin11

Letʼs Start Baking!

With set up complete, letʼs kick off a git push, trigger and complete the Jenkins pipeline, and finally trigger the Spinnaker pipeline. Below we see the pipelineʼs Bake stage has been started. Spinnakerʼs UI lets us view the Bakery Details. The Bakery, provided by Spinnakerʼs Rosco service, bakes the AMIs. Rosco uses HashiCorp Packer to bake the AMIs, using standard Packer templates.

spin04

Below we see Spinnaker (Rosco/Packer) locating the Base Spinnaker AMI we configured in the Pipelineʼs Bake stage. Next, we see Spinnaker sshʼing into a new EC2 instance with a temporary keypair and Security Group and starting the Election service Debian package installation.

spin23

Continuing, we see the latest Debian package, derived from the Jenkins pipelineʼs archive, being pulled from the S3-based APT repo. The package is verified using the GPG key and then installed. Lastly, we see a new AMI is created, containing the deployed Election service, which was initially built and packaged by Jenkins. Note the AWS Resource Tags created by Spinnaker, as shown in the Bakery output.

spin24

The base Spinnaker AMI and the AMIs baked by Spinnaker are visible in the AWS Console. Note the naming conventions used by Spinnaker for the AMIs, the Source AMI used to build the new APIs, and the addition of the Tags, which we saw being applied in the Bakery output above. The use of Tags indirectly allows full traceability from the deployed EC2 instance all the way back to the original code commit to git by the Developer.

spin48.png

Red/Black Deployments

With the new AMI baked successfully, and a required manual approval, using a Manual Judgement type pipeline stage, we can now begin a Red/Black deployment to AWS.

spin07

Using the Server Group configuration in the Deploy stage, Spinnaker deploys two EC2 instances, behind the ELB.

spin08

Below, we see the successful results of the Red/Black deployment. The single Spinnaker Cluster contains two deployed Server Groups. One group, the previously active Server Group (RED), comprised of two EC2 instances, is disabled. The ‘RED’ EC2 instances are unregistered with the load balancer but still running. The new Server Group (BLACK), also comprised of two EC2 instances, is now active and registered with the Load Balancer. Spinnaker will spread EC2 instances evenly across all Availability Zones in the US East (N. Virginia) Region.

spin38

From the AWS Console, we can observe four running instances, though only two are registered with the load-balancer.

spin34

Here we see each deployed Server Group has a different Auto Scaling Group and Launch Configuration. Note the continued use of naming conventions by Spinnaker.

spin33

 There can be only one, Highlander!

Now, in the Deploy stage of the pipeline, we will switch the Server Groupʼs Strategy to Highlander. The Highlander strategy will, as you probably guessed by the name, destroy all other Server Groups in the Cluster. This is more typically used for lower environments, like Development or Test, where you are only interested in the next version of the application for testing. The Red/Black strategy is more applicable to Production, where you want the opportunity to quickly rollback to the previous deployment, if necessary.

spin12

Following a successful deployment, below, we now see the first two Server Groups have been terminated, and a third Server Group in the Cluster is active.

spin40.png

In the AWS Console, we can confirm the four previous EC2 instances have been successfully terminated as a result of the Highlander deployment strategy, and two new instances are running.

spin39

As well, the previous Auto Scaling Groups and Launch Configurations have been deleted from AWS by Spinnaker.

spin44.png

As expected, the Classic Load Balancer only contains the two most recent EC2 instances from the last Server Group deployed.

spin41

Confirming the Deployment

Using the DNS address of the load balancer, we can hit the Election service endpoints, on either of the EC2 instances. All API endpoints are listed in the Projectʼs README file. Below, from a web browser, we see the candidates resource returning candidate information, retrieved from the Electionʼs PostgreSQL RDS database Test instance.

spin42

Similarly, from Postman, we can hit the load balancer and get back election information from the elections resource, using an HTTP GET.

spin43.png

I intentionally left out a discussion of the service’s RDS database and how configuration management was handled with Spring Profiles and Spring Cloud Config. Both topics were out of scope for this post.

Conclusion

Although this was a brief, whirlwind overview of deployment tools, it shows the power of delivery tools like Spinnaker, when seamlessly combined with other tools, like Jenkins and the Nebula plugins. Together, these tools are capable of efficiently, repeatably, and securely deploying large numbers of containerized and non-containerized applications to a variety of private, public, and hybrid cloud infrastructure.

All opinions expressed in this post are my own and not necessarily the views of my current or past employers or their clients.

¹ Running Spinnaker on Compute Engine

, , , , , , , , , , , ,

1 Comment

Deploying Spring Boot Apps to AWS with Netflix Nebula and Spinnaker: Part 1 of 2

Listening to DevOps industry pundits, you might be convinced everyone is running containers in Production (or by now, serverless). Although containerization is growing at a phenomenal rate, several recent surveys¹ indicate less than 50% of enterprises are deploying containers in Production. Filter those results further with the fact, of those enterprises, only a small percentage of their total application portfolios are containerized, let alone in Production.

As a DevOps Consultant, I regularly work with corporations whose global portfolios are in the thousands of applications. Indeed, some percentage of their applications are containerized, with less running in Production. However, a majority of those applications, even those built on modern, light-weight, distributed architectures, are still being deployed to bare-metal and virtualized public cloud and private data center infrastructure, for a variety of reasons.

Enterprise Deployment

Due to the scale and complexity of application portfolios, many organizations have invested in enterprise deployment tools, either commercially available or developed in-house. The enterprise deployment tool’s primary objective is to standardize the process of securely, reliably, and repeatably packaging, publishing, and deploying both containerized and non-containerized applications to large fleets of virtual machines and bare-metal servers, across multiple, geographically dispersed data centers and cloud providers. Enterprise deployment tools are particularly common in tightly regulated and compliance-driven organizations, as well as organizations that have undertaken large amounts of M&A, resulting in vastly different application technology stacks.

Enterprise CI/CD/Release Workflow

Better-known examples of commercially available enterprise deployment tools include IBM UrbanCode Deploy (aka uDeploy), XebiaLabs XL Deploy, CA Automic Release Automation, Octopus Deploy, and Electric Cloud ElectricFlow. While commercial tools continue to gain market share³, many organizations are tightly coupled to their in-house solutions through years of use and fear of widespread process disruption, given current economic, security, compliance, and skills-gap sensitivities.

Deployment Tool Anatomy

Most Enterprise deployment tools are compatible with standard binary package types, including Debian (.deb) and Red Hat  (RPM) Package Manager (.rpm) packages for Linux, NuGet (.nupkg) packages for Windows, and Node Package Manager (.npm) and Bower for JavaScript. There are equivalent package types for other popular languages and formats, such as Go, Python, Ruby, SQL, Android, Objective-C, Swift, and Docker. Packages usually contain application metadata, a signature to ensure the integrity and/or authenticity², and a compressed payload.

Enterprise deployment tools are normally integrated with open-source packaging and publishing tools, such as Apache Maven, Apache Ivy/Ant, Gradle, NPMNuGet, BundlerPIP, and Docker.

Binary packages (and images), built with enterprise deployment tools, are typically stored in private, open-source or commercial binary (artifact) repositories, such as SpacewalkJFrog Artifactory, and Sonatype Nexus Repository. The latter two, Artifactory and Nexus, support a multitude of modern package types and repository structures, including Maven, NuGet, PyPI, NPM, Bower, Ruby Gems, CocoaPods, Puppet, Chef, and Docker.

Mature binary repositories provide many features in addition to package management, including role-based access control, vulnerability scanning, rich APIs, DevOps integration, and fault-tolerant, high-availability architectures.

Lastly, enterprise deployment tools generally rely on standard package management systems to retrieve and install cryptographically verifiable packages and images. These include YUM (Yellowdog Updater, Modified), APT (aptitude), APK (Alpine Linux), NuGet, Chocolatey, NPM, PIP, Bundler, and Docker. Packages are deployed directly to running infrastructure, or indirectly to intermediate deployable components as Amazon Machine Images (AMI), Google Compute Engine machine images, VMware machines, Docker Images, or CoreOS rkt.

Open-Source Alternative

One such enterprise with an extensive portfolio of both containerized and non-containerized applications is Netflix. To standardize their deployments to multiple types of cloud infrastructure, Netflix has developed several well-known open-source software (OSS) tools, including the Nebula Gradle plugins and Spinnaker. I discussed Spinnaker in my previous post, Managing Applications Across Multiple Kubernetes Environments with Istio, as an alternative to Jenkins for deploying container workloads to Kubernetes on Google (GKE).

As a leader in OSS, Netflix has documented their deployment process in several articles and presentations, including a post from 2016, ‘How We Build Code at Netflix.’ According to the article, the high-level process for deployment to Amazon EC2 instances involves the following steps:

  • Code is built and tested locally using Nebula
  • Changes are committed to a central git repository
  • Jenkins job executes Nebula, which builds, tests, and packages the application for deployment
  • Builds are “baked” into Amazon Machine Images (using Spinnaker)
  • Spinnaker pipelines are used to deploy and promote the code change

The Nebula plugins and Spinnaker leverage many underlying, open-source technologies, including Pivotal Spring, Java, Groovy, Gradle, Maven, Apache Commons, Redline RPM, HashiCorp Packer, Redis, HashiCorp Consul, Cassandra, and Apache Thrift.

Both the Nebula plugins and Spinnaker have been battle tested in Production by Netflix, as well as by many other industry leaders after Netflix open-sourced the tools in 2014 (Nebula) and 2015 (Spinnaker). Currently, there are approximately 20 Nebula Gradle plugins available on GitHub. Notable core-contributors in the development of Spinnaker include Google, Microsoft, Pivotal, Target, Veritas, and Oracle, to name a few. A sign of its success, Spinnaker currently has over 4,600 Stars on GitHub!

Part Two: Demonstration

In Part Two, we will deploy a production-ready Spring Boot application, the Election microservice, to multiple Amazon EC2 instances, behind an Elastic Load Balancer (ELB). We will use a fully automated DevOps workflow. The build, test, package, bake, deploy process will be handled by the Netflix Nebula Gradle Linux Packaging Plugin, Jenkins, and Spinnaker. The high-level process will involve the following steps:

  • Configure Gradle to build a production-ready fully executable application for Unix systems (executable JAR)
  • Using deb-s3 and GPG Suite, create a secure, signed APT (Debian) repository on Amazon S3
  • Using Jenkins and the Netflix Nebula plugin, build a Debian package, containing the executable JAR and configuration files
  • Using Jenkins and deb-s3, publish the package to the S3-based APT repository
  • Using Spinnaker (HashiCorp Packer under the covers), bake an Ubuntu Amazon Machine Image (AMI), replete with the executable JAR installed from the Debian package
  • Deploy an auto-scaling set of Amazon EC2 instances from the baked AMI, behind an ELB, running the Spring Boot application using both the Red/Black and Highlander deployment strategies
  • Be able to repeat the entire automated build, test, package, bake, deploy process, triggered by a new code push to GitHub

The overall build, test, package, bake, deploy process will look as follows.

DebianPackageWorkflow12

References

 

All opinions expressed in this post are my own and not necessarily the views of my current or past employers or their clients.

¹ Recent Surveys: ForresterPortworx,  Cloud Foundry Survey
² Courtesy Wikipedia – rpm
³ XebiaLabs Kicks Off 2017 with Triple-Digit Growth in Enterprise DevOps

, , , , , , , , , , , ,

1 Comment

Developing Cloud-Native Data-Centric Spring Boot Applications for Pivotal Cloud Foundry

In this post, we will explore the development of a cloud-native, data-centric Spring Boot 2.0 application, and its deployment to Pivotal Software’s hosted Pivotal Cloud Foundry service, Pivotal Web Services. We will add a few additional features, such as Spring Data, Lombok, and Swagger, to enhance our application.

According to Pivotal, Spring Boot makes it easy to create stand-alone, production-grade Spring-based Applications. Spring Boot takes an opinionated view of the Spring platform and third-party libraries. Spring Boot 2.0 just went GA on March 1, 2018. This is the first major revision of Spring Boot since 1.0 was released almost 4 years ago. It is also the first GA version of Spring Boot that provides support for Spring Framework 5.0.

Pivotal Web Services’ tagline is ‘The Agile Platform for the Agile Team Powered by Cloud Foundry’. According to Pivotal,  Pivotal Web Services (PWS) is a hosted environment of Pivotal Cloud Foundry (PCF). PWS is hosted on AWS in the US-East region. PWS utilizes two availability zones for redundancy. PWS provides developers a Spring-centric PaaS alternative to AWS Elastic Beanstalk, Elastic Container Service (Amazon ECS), and OpsWorks. With PWS, you get the reliability and security of AWS, combined with the rich-functionality and ease-of-use of PCF.

To demonstrate the feature-rich capabilities of the Spring ecosystem, the Spring Boot application shown in this post incorporates the following complimentary technologies:

  • Spring Boot Actuator: Sub-project of Spring Boot, adds several production grade services to Spring Boot applications with little developer effort
  • Spring Data JPA: Sub-project of Spring Data, easily implement JPA based repositories and data access layers
  • Spring Data REST: Sub-project of Spring Data, easily build hypermedia-driven REST web services on top of Spring Data repositories
  • Spring HATEOAS: Create REST representations that follow the HATEOAS principle from Spring-based applications
  • Springfox Swagger 2: We are using the Springfox implementation of the Swagger 2 specification, an automated JSON API documentation for API’s built with Spring
  • Lombok: The @Data annotation generates boilerplate code that is typically associated with simple POJOs (Plain Old Java Objects) and beans: @ToString, @EqualsAndHashCode, @Getter, @Setter, and @RequiredArgsConstructor

Source Code

All source code for this post can be found on GitHub. To get started quickly, use one of the two following commands (gist).

For this post, I have used JetBrains IntelliJ IDEA and Git Bash on Windows for development. However, all code should be compatible with most popular IDEs and development platforms. The project assumes you have Docker and the Cloud Foundry Command Line Interface (cf CLI) installed locally.

Code samples in this post are displayed as Gists, which may not display correctly on some mobile and social media browsers. Links to gists are also provided.

Demo Application

The Spring Boot application demonstrated in this post is a simple election-themed RESTful API. The app allows API consumers to create, read, update, and delete, candidates, elections, and votes, via its exposed RESTful HTTP-based resources.

The Spring Boot application consists of (7) JPA Entities that mirror the tables and views in the database, (7) corresponding Spring Data Repositories, (2) Spring REST Controller, (4) Liquibase change sets, and associated Spring, Liquibase, Swagger, and PCF configuration files. I have intentionally chosen to avoid the complexities of using Data Transfer Objects (DTOs) for brevity, albeit a security concern, and directly expose the entities as resources.

img022_Final_Project

Controller Resources

This application is a simple CRUD application. The application contains a few simple HTTP GET resources in each of the two controller classes, as an introduction to Spring REST controllers. For example, the CandidateController contains the /candidates/summary and /candidates/summary/{election} resources (shown below in Postman). Typically, you would expose your data to the end-user as controller resources, as opposed to exposing the entities directly. The ease of defining controller resources is one of the many powers of Spring Boot.

img025_CustomResource.PNG

Paging and Sorting

As an introduction to Spring Data’s paging and sorting features, both the VoteRepository and VotesElectionsViewRepository Repository Interfaces extend Spring Data’s PagingAndSortingRepository<T,ID> interface, instead of the default CrudRepository<T,ID> interface. With paging and sorting enabled, you may both sort and limit the amount of data returned in the response payload. For example, to reduce the size of your response payload, you might choose to page through the votes in blocks of 25 votes at a time. In that case, as shown below in Postman, if you needed to return just votes 26-50, you would append the /votes resource with ?page=1&size=25. Since paging starts at page 0 (zero), votes 26-50 will on page 1.

img024_Paging

Swagger

This project also includes the Springfox implementation of the Swagger 2 specification. Along with the Swagger 2 dependency, the project takes a dependency on io.springfox:springfox-swagger-ui. The Springfox Swagger UI dependency allows us to view and interactively test our controller resources through Swagger’s browser-based UI, as shown below.

img027B_Swagger

All Swagger configuration can be found in the project’s SwaggerConfig Spring Configuration class.

Gradle

This post’s Spring Boot application is built with Gradle, although it could easily be converted to Maven if desired. According to Gradle, Gradle is the modern tool used to build, automate and deliver everything from mobile apps to microservices.

Data

In real-life, most applications interact with one or more data sources. The Spring Boot application demonstrated in this post interacts with data from a PostgreSQL database. PostgreSQL, or simply Postgres, is the powerful, open-source object-relational database system, which has supported production-grade applications for 15 years. The application’s single elections database consists of (6) tables, (3) views, and (2) function, which are both used to generate random votes for this demonstration.

img020_Database_Diagram

Spring Data makes interacting with PostgreSQL easy. In addition to the features of Spring Data, we will use Liquibase. Liquibase is known as the source control for your database. With Liquibase, your database development lifecycle can mirror your Spring development lifecycle. Both DDL (Data Definition Language) and DML (Data Manipulation Language) changes are versioned controlled, alongside the Spring Boot application source code.

Locally, we will take advantage of Docker to host our development PostgreSQL database, using the official PostgreSQL Docker image. With Docker, there is no messy database installation and configuration of your local development environment. Recreating and deleting your PostgreSQL database is simple.

To support the data-tier in our hosted PWS environment, we will implement ElephantSQL, an offering from the Pivotal Services Marketplace. ElephantSQL is a hosted version of PostgreSQL, running on AWS. ElephantSQL is referred to as PostgreSQL as a Service, or more generally, a Database as a Service or DBaaS. As a Pivotal Marketplace service, we will get easy integration with our PWS-hosted Spring Boot application, with near-zero configuration.

Docker

First, set up your local PostgreSQL database using Docker and the official PostgreSQL Docker image. Since this is only a local database instance, we will not worry about securing our database credentials (gist).

Your running PostgreSQL container should resemble the output shown below.

img001_docker

Data Source

Most IDEs allow you to create and save data sources. Although this is not a requirement, it makes it easier to view the database’s resources and table data. Below, I have created a data source in IntelliJ from the running PostgreSQL container instance. The port, username, password, and database name were all taken from the above Docker command.

img002_IntelliJ_Data_Source

Liquibase

There are multiple strategies when it comes to managing changes to your database. With Liquibase, each set of changes are handled as change sets. Liquibase describes a change set as an atomic change that you want to apply to your database. Liquibase offers multiple formats for change set files, including XML, JSON, YAML, and SQL. For this post, I have chosen SQL, specifically PostgreSQL SQL dialect, which can be designated in the IntelliJ IDE. Below is an example of the first changeset, which creates four tables and two indexes.

img023_Change_Set

As shown below, change sets are stored in the db/changelog/changes sub-directory, as configured in the master change log file (db.changelog-master.yaml). Change set files follow an incremental naming convention.

img003C_IntelliJ_Liquibase_Changesets

The empty PostgreSQL database, before any Liquibase changes, should resemble the screengrab shown below.

img003_IntelliJ_Blank_Database_cropped

To automatically run Liquibase database migrations on startup, the org.liquibase:liquibase-core dependency must be added to the project’s build.gradle file. To apply the change sets to your local, empty PostgreSQL database, simply start the service locally with the gradle bootRun command. As the app starts after being compiled, any new Liquibase change sets will be applied.

img004_Gradle_bootRun

You might ask how does Liquibase know the change sets are new. During the initial startup of the Spring Boot application, in addition to any initial change sets, Liquibase creates two database tables to track changes, the databasechangelog and databasechangeloglock tables. Shown below are the two tables, along with the results of the four change sets included in the project, and applied by Liquibase to the local PostgreSQL elections database.

img005_IntelliJ_Initial_Database_cropped

Below we see the contents of the databasechangelog table, indicating that all four change sets were successfully applied to the database. Liquibase checks this table before applying change sets.

img006B_IntelliJ_Database_Change_Log

ElephantSQL

Before we can deploy our Spring Boot application to PWS, we need an accessible PostgreSQL instance in the Cloud; I have chosen ElephantSQL. Available through the Pivotal Services Marketplace, ElephantSQL currently offers one free and three paid service plans for their PostgreSQL as a Service. I purchased the Panda service plan as opposed to the free Turtle service plan. I found the free service plan was too limited in the maximum number of database connections for multiple service instances.

Previewing and purchasing an ElephantSQL service plan from the Pivotal Services Marketplace, assuming you have an existing PWS account, literally takes a single command (gist).

The output of the command should resemble the screengrab below. Note the total concurrent connections and total storage for each plan.

img007_PCF_ElephantSQL_Service_Purchase

To get details about the running ElephantSQL service, use the cf service elections command.

img007_PCF_ElephantSQL_Service_Info

From the ElephantSQL Console, we can obtain the connection information required to access our PostgreSQL elections database. You will need the default database name, username, password, and URL.

img012_PWS_ElephantSQL_Details

Service Binding

Once you have created the PostgreSQL database service, you need to bind the database service to the application. We will bind our application and the database, using the PCF deployment manifest file (manifest.yml), found in the project’s root directory. Binding is done using the services section (shown below).

The key/value pairs in the env section of the deployment manifest will become environment variables, local to the deployed Spring Boot service. These key/value pairs in the manifest will also override any configuration set in Spring’s external application properties file (application.yml). This file is located in the resources sub-directory. Note the SPRING_PROFILES_ACTIVE: test environment variable in the manifest.yml file. This variable designates which Spring Profile will be active from the multiple profiles defined in the application.yml file.

img008B_PCF_Manifest

Deployment to PWS

Next, we run gradle build followed by cf push to deploy one instance of the Spring Boot service to PWS and associate it with our ElephantSQL database instance. Below is the expected output from the cf push command.

img008_PCF_CF_Push

Note the route highlighted below. This is the URL where your Spring Boot service will be available.

img009_PCF_CF_Push2

To confirm your ElephantSQL database was populated by Liquibase when PWS started the deployed Spring application instance, we can check the ElephantSQL Console’s Stats tab. Note the database tables and rows in each table, signifying Liquibase ran successfully. Alternately, you could create another data source in your IDE, connected to ElephantSQL; this can be helpful for troubleshooting.

img013_Candidates

To access the running service and check that data is being returned, point your browser (or Postman) to the URL returned from the cf push command output (see second screengrab above) and hit the /candidates resource. Obviously, your URL, referred to as a route by PWS, will be different and unique. In the response payload, you should observe a JSON array of eight candidate objects. Each candidate was inserted into the Candidate table of the database, by Liquibase, when Liquibase executed the second of the four change sets on start-up.

img012_PWS_ElephantSQL

With Spring Boot Actuator and Spring Data REST, our simple Spring Boot application has dozens of resources exposed automatically, without extensive coding of resource controllers. Actuator exposes resources to help manage and troubleshoot the application, such as info, health, mappings (shown below), metrics, env, and configprops, among others. All Actuator resources are exposed explicitly, thus they can be disabled for Production deployments. With Spring Boot 2.0, all Actuator resources are now preceded with /actuator/ .

img029_Postman_Mappings

According to Pivotal, Spring Data REST builds on top of Spring Data repositories, analyzes an application’s domain model and exposes hypermedia-driven HTTP resources for aggregates contained in the model, such as our /candidates resource. A partial list of the application’s exposed resources are listed in the GitHub project’s README file.

In Spring’s approach to building RESTful web services, HTTP requests are handled by a controller. Spring Data REST automatically exposes CRUD resources for our entities. With Spring Data JPA, POJOs like our Candidate class are annotated with @Entity, indicating that it is a JPA entity. Lacking a @Table annotation, it is assumed that this entity will be mapped to a table named Candidate.

With Spring’s Data REST’s RESTful HTTP-based API, traditional database Create, Read, Update, and Delete commands for each PostgreSQL database table are automatically mapped to equivalent HTTP methods, including POST, GET, PUT, PATCH, and DELETE.

Below is an example, using Postman, to create a new Candidate using an HTTP POST method.

img029_Postman_Post

Below is an example, using Postman, to update a new Candidate using an HTTP PUT method.

img029_Postman_Put.PNG

With Spring Data REST, we can even retrieve data from read-only database Views, as shown below. This particular JSON response payload was returned from the candidates_by_elections database View, using the /election-candidates resource.

img028_Postman_View.PNG

Scaling Up

Once your application is deployed and you have tested its functionality, you can easily scale out or scale in the number instances, memory, and disk, with the cf scale command (gist).

Below is sample output from scaling up the Spring Boot application to two instances.

img016_Scale_Up2

Optionally, you can activate auto-scaling, which will allow the application to scale based on load.

img016_Autoscaling.PNG

Following the PCF architectural model, auto-scaling is actually another service from the Pivotal Services Marketplace, PCF App Autoscaler, as seen below, running alongside our ElephantSQL service.

img016_Autoscaling2.PNG

With PCF App Autoscaler, you set auto-scaling minimum and maximum instance limits and define scaling rules. Below, I have configured auto-scaling to scale out the number of application instances when the average CPU Utilization of all instances hits 80%. Conversely, the application will scale in when the average CPU Utilization recedes below 40%. In addition to CPU Utilization, PCF App Autoscaler also allows you to set scaling rules based on HTTP Throughput, HTTP Latency, RabbitMQ Depth (queue depth), and Memory Utilization.

Furthermore, I set the auto-scaling minimum number of instances to two and the maximum number of instances to four. No matter how much load is placed on the application, PWS will not scale above four instances. Conversely, PWS will maintain a minimum of two running instances at all times.

img016_Autoscaling3

Conclusion

This brief post demonstrates both the power and simplicity of Spring Boot to quickly develop highly-functional, data-centric RESTful API applications with minimal coding. Further, when coupled with Pivotal Cloud Foundry, Spring developers have a highly scalable, resilient cloud-native application hosting platform.

, , , , , , , , , , , ,

Leave a comment

Docker Enterprise Edition: Multi-Environment, Single Control Plane Architecture for AWS

Final_DockerEE_21 (1)

Designing a successful, cloud-based containerized application platform requires a balance of performance and security with cost, reliability, and manageability. Ensuring that a platform meets all functional and non-functional requirements, while remaining within budget and is easily maintainable, can be challenging.

As Cloud Architect and DevOps Team Lead, I recently participated in the development of two architecturally similar, lightweight, cloud-based containerized application platforms. From the start, both platforms were architected to maximize security and performance, while minimizing cost and operational complexity. The later platform was built on AWS with Docker Enterprise Edition.

Docker Enterprise Edition

Released in March of this year, Docker Enterprise Edition (Docker EE) is a secure, full-featured container-based management platform. There are currently eight versions of Docker EE, available for Windows Server, Azure, AWS, and multiple Linux distros, including RHEL, CentOS, Ubuntu, SUSE, and Oracle.

Docker EE is one of several production-grade container orchestration Platforms as a Service (PaaS). Some of the other container platforms in this category include:

Docker Community Edition (CE), Kubernetes, and Apache Mesos are free and open-source. Some providers, such as Rancher Labs, offer enterprise support for an additional fee. Cloud-based services, such as Red Hat Openshift Online, AWS, GCE, and ACS, charge the typical usage monthly fee. Docker EE, similar to Mesosphere Enterprise DC/OS and Red Hat OpenShift, is priced on a per node/per year annual subscription model.

Docker EE is currently offered in three subscription tiers, including Basic, Standard, and Advanced. Additionally, Docker offers Business Day and Business Critical support. Docker EE’s Advanced Tier adds several significant features, including secure multi-tenancy with node-based isolation, and image security scanning and continuous vulnerability scanning, as part of Docker EE’s Docker Trusted Registry.

Architecting for Affordability and Maintainability

Building an enterprise-scale application platform, using public cloud infrastructure, such as AWS, and a licensed Containers-as-a-Service (CaaS) platform, such as Docker EE, can quickly become complex and costly to build and maintain. Based on current list pricing, the cost of a single Linux node ranges from USD 75 per month for basic support, up to USD 300 per month for Docker Enterprise Edition Advanced with Business Critical support. Although cost is relative to the value generated by the application platform, none the less, architects should always strive to avoid unnecessary complexity and cost.

Reoccurring operational costs, such as licensed software subscriptions, support contracts, and monthly cloud-infrastructure charges, are often overlooked by project teams during the build phase. Accurately forecasting reoccurring costs of a fully functional Production platform, under expected normal load, is essential. Teams often overlook how Docker image registries, databases, data lakes, and data warehouses, quickly swell, inflating monthly cloud-infrastructure charges to maintain the platform. The need to control cloud costs have led to the growth of third-party cloud management solutions, such as CloudCheckr Cloud Management Platform (CMP).

Shared Docker Environment Model

Most software development projects require multiple environments in which to continuously develop, test, demonstrate, stage, and release code. Creating separate environments, replete with their own Docker EE Universal Control Plane (aka Control Plane or UCP), Docker Trusted Registry (DTR), AWS infrastructure, and third-party components, would guarantee a high-level of isolation and performance. However, replicating all elements in each environment would add considerable build and run costs, as well as unnecessary complexity.

On both recent projects, we choose to create a single AWS Virtual Private Cloud (VPC), which contained all of the non-production environments required by our project teams. In parallel, we built an entirely separate Production VPC for the Production environment. I’ve seen this same pattern repeated with Red Hat OpenStack and Microsoft Azure.

Production

Isolating Production from the lower environments is essential to ensure security, and to eliminate non-production traffic from impacting the performance of Production. Corporate compliance and regulatory policies often dictate complete Production isolation. Having separate infrastructure, security appliances, role-based access controls (RBAC), configuration and secret management, and encryption keys and SSL certificates, are all required.

For complete separation of Production, different AWS accounts are frequently used. Separate AWS accounts provide separate billing, usage reporting, and AWS Identity and Access Management (IAM), amongst other advantages.

Performance and Staging

Unlike Production, there are few reasons to completely isolate lower-environments from one another. The exception I’ve encountered is Performance and Staging. These two environments are frequently separated from other environments to ensure the accuracy of performance testing and release staging activities. Performance testing, in particular, can generate enormous load on systems, which if not isolated, will impair adjacent environments, applications, and monitoring systems.

On a few recent projects, to reduce cost and complexity, we repurposed the UAT environment for performance testing, once user-acceptance testing was complete. Performance testing was conducted during off-peak development and testing periods, with access to adjacent environments blocked.

The multi-purpose UAT environment further served as a Staging environment. Applications were deployed and released to the UAT and Performance environments, following a nearly-identical process used for Production. Hotfixes to Production were also tested in this environment.

Example of Shared Environments

To demonstrate how to architect a shared non-production Docker EE environment, which minimizes cost and complexity, let’s examine the example shown below. In the example, built on AWS with Docker EE, there are four typical non-production environments, CI/CD, Development, Test, and UAT, and one Production environment.

Docker_EE_AWS_Diagram_01

In the example, there are two separate VPCs, the Production VPC, and the Non-Production VPC. There is no reason to configure VPC Peering between the two VPCs, as there is no need for direct communication between the two. Within the Non-Production VPC, to the left in the diagram, there is a cluster of three Docker EE UCP Manager EC2 nodes, a cluster of three DTR Worker EC2 nodes, and the four environments, consisting of varying numbers of EC2 Worker nodes. Production, to the right of the diagram, has its own cluster of three UCP Manager EC2 nodes and a cluster of six EC2 Worker nodes.

Single Non-Production UCP

As a primary means of reducing cost and complexity, in the example, a single minimally-sized Docker EE UCP cluster of three Manager nodes orchestrate activities across all four non-production environments. Alternately, you would have to create a UCP cluster for each environment; that means nine more Worker Nodes to configure and maintain.

The UCP users, teams, organizations, access controls, Docker Secrets, overlay networks, and other UCP features, for all non-production environments, are managed through the single Control Plane. All deployments to all the non-production environments, from the CI/CD server, are performed through the single Control Plane. Each UCP Manager node is deployed to a different AWS Availability Zone (AZ) to ensure high-availability.

Shared DTR

As another means of reducing cost and complexity, in the example, a Docker EE DTR cluster of three Worker nodes contain all Docker image repositories. Both the non-production and the Production environments use this DTR as a secure source of all Docker images. Not having to replicate image repositories, access controls, infrastructure, and figuring out how to migrate images between two separate DTR clusters, is a significant time, cost, and complexity savings. Each DTR Worker node is also deployed to a different AZ to ensure high-availability.

Using a shared DTR between non-production and Production is an important security consideration your project team needs to consider. A single DTR, shared between non-production and Production, comes with inherent availability and security risks, which should be understood in advance.

Separate Non-Production Worker Nodes

In the shared non-production environments example, each environment has dedicated AWS EC2 instances configured as Docker EE Worker nodes. The number of Worker nodes is determined by the requirements for each environment, as dictated by the project’s Development, Testing, Security, and DevOps teams. Like the UCP and DTR clusters, each Worker node, within an individual environment, is deployed to a different AZ to ensure high-availability and mimic the Production architecture.

Minimizing the number of Worker nodes in each environment, as well as the type and size of each EC2 node, offers a significant potential cost and administrative savings.

Separate Environment Ingress

In the example, the UCP, DTR, and each of the four environments are accessed through separate URLs, using AWS Hosted Zone CNAME records (subdomains). Encrypted HTTPS traffic is routed through a series of security appliances, depending on traffic type, to individual private AWS Elastic Load Balancers (ELB), one for both UCPs, the DTR, and each of the environments. Each ELB load-balances traffic to the Docker EE nodes associated the specific traffic. All firewalls, ELBs, and the UCP and DTR are secured with a high-grade wildcard SSL certificate.

AWS_ELB

Separate Data Sources

In the shared non-production environments example, there is one Amazon Relational Database Service‎ (RDS) instance in non-Production and one Production. Both RDS instances are replicated across multiple Availability Zones. Within the single shared non-production RDS instance, there are four separate databases, one per non-production environment. This architecture sacrifices the potential database performance of separate RDS instances for additional cost and complexity.

Maintaining Environment Separation

Node Labels

To obtain sufficient environment separation while using a single UCP, each Docker EE Worker node is tagged with an environment node label. The node label indicates which environment the Worker node is associated with. For example, in the screenshot below, a Worker node is assigned to the Development environment by tagging it with the key of environment and the value of dev.

Node_Label

* The Docker EE screens shown here are from UCP 2.1.5, not the recently released 2.2.x, which has an updated UI appearance.Each service’s Docker Compose file uses deployment placement constraints, which indicate where Docker should or should not deploy services. In the hello-world Docker Compose file example below, the node.labels.environment constraint is set to the ENVIRONMENT variable, which is set during container deployment by the CI/CD server. This constraint directs Docker to only deploy the hello-world service to nodes which contain the placement constraint of node.labels.environment, whose value matches the ENVIRONMENT variable value.

Deploying from CI/CD Server

The ENVIRONMENT value is set as an environment variable, which is then used by the CI/CD server, running a docker stack deploy or a docker service update command, within a deployment pipeline. Below is an example of how to use the environment variable as part of a Jenkins pipeline as code Jenkinsfile.

Centralized Logging and Metrics Collection

Centralized logging and metrics collection systems are used for application and infrastructure dashboards, monitoring, and alerting. In the shared non-production environment examples, the centralized logging and metrics collection systems are internal to each VPC, but reside on separate EC2 instances and are not registered with the Control Plane. In this way, the logging and metrics collection systems should not impact the reliability, performance, and security of the applications running within Docker EE. In the example, Worker nodes run a containerized copy of fluentd, which collects and pushes logs to ELK’s Elasticsearch.

Logging and metrics collection systems could also be supplied by external cloud-based SaaS providers, such as LogglySysdig and Datadog, or by the platform’s cloud-provider, such as Amazon CloudWatch.

With four environments running multiple containerized copies of each service, figuring out which log entry came from which service instance, requires multiple data points. As shown in the example Kibana UI below, the environment value, along with the service name and container ID, as well as the git commit hash and branch, are added to each log entry for easier troubleshooting. To include the environment, the value of the ENVIRONMENT variable is passed to Docker’s fluentd log driver as an env option. This same labeling method is used to tag metrics.

ELK

Separate Docker Service Stacks

For further environment separation within the single Control Plane, services are deployed as part of the same Docker service stack. Each service stack contains all services that comprise an application running within a single environment. Multiple stacks may be required to support multiple, distinct applications within the same environment.

For example, in the screenshot below, a hello-world service container, built with a Docker image, tagged with build 59 of the Jenkins continuous integration pipeline, is deployed as part of both the Development (dev) and Test service stacks. The CD and UAT service stacks each contain different versions of the hello-world service.

Hello-World-UCP

Separate Docker Overlay Networks

For additional environment separation within the single non-production UCP, all Docker service stacks associated with an environment, reside on the same Docker overlay network. Overlay networks manage communications among the Docker Worker nodes, enabling service-to-service communication for all services on the same overlay network while isolating services running on one network from services running on another network.

in the example screenshot below, the hello-world service, a member of the test service stack, is running on the test_default overlay network.

Network

Cleaning Up

Having distinct environment-centric Docker service stacks and overlay networks makes it easy to clean up an environment, without impacting adjacent environments. Both service stacks and overlay networks can be removed to clear an environment’s contents.

Separate Performance Environment

In the alternative example below, a Performance environment has been added to the Non-Production VPC. To ensure a higher level of isolation, the Performance environment has its own UPC, RDS, and ELBs. The Performance environment shares the DTR, as well as the security, logging, and monitoring components, with the rest of the non-production environments.

Below, the Performance environment has half the number of Worker nodes as Production. Performance results can be scaled for expected Production performance, given more nodes. Alternately, the number of nodes can be scaled up temporarily to match Production, then scaled back down to a minimum after testing is complete.

Docker_EE_AWS_Diagram_02

Shared DevOps Tooling

All environments leverage shared Development and DevOps resources, deployed to a separate VPC. Resources include Agile Application Lifecycle Management (ALM), such as JIRA or CA Agile Central, source control repository management (SCM), such as GitLab or Bitbucket, binary repository management, such as Artifactory or Nexus, and a CI/CD solution, such as Jenkins, TeamCity, or Bamboo.

From the DevOps VPC, Docker images are pushed and pulled from the DTR in the Non-Production VPC. Deployments of container-based application are executed from the DevOps VPC CI/CD server to the non-production, Performance, and Production UCPs. Separate DevOps CI/CD pipelines and access controls are essential in maintaining the separation of the non-production and Production environments.

Docker_EE_AWS_Diagram_03

Complete Platform

Several common components found in a Docker EE cloud-based AWS platform were discussed in the post. However, a complete AWS application platform has many more moving parts. Below is a comprehensive list of components, including DevOps tooling, organized into two categories: 1) common components that can be potentially shared across the non-production environments to save cost and complexity, and 2) components that should be replicated in each non-environment for security and performance.

Shared Non-Production Components:

  • AWS
    • Virtual Private Cloud (VPC), Region, Availability Zones
    • Route Tables, Network ACLs, Internet Gateways
    • Subnets
    • Some Security Groups
    • IAM Groups, User, Roles, Policies (RBAC)
    • Relational Database Service‎ (RDS)
    • ElastiCache
    • API Gateway, Lambdas
    • S3 Buckets
    • Bastion Servers, NAT Gateways
    • Route 53 Hosted Zone (Registered Domain)
    • EC2 Key Pairs
    • Hardened Linux AMI
  • Docker EE
    • UCP and EC2 Manager Nodes
    • DTR and EC2 Worker Nodes
    • UCP and DTR Users, Teams, Organizations
    • DTR Image Repositories
    • Secret Management
  • Third-Party Components/Products
    • SSL Certificates
    • Security Components: Firewalls, Virus Scanning, VPN Servers
    • Container Security
    • End-User IAM
    • Directory Service
    • Log Aggregation
    • Metric Collection
    • Monitoring, Alerting
    • Configuration and Secret Management
  • DevOps
    • CI/CD Pipelines as Code
    • Infrastructure as Code
    • Source Code Repositories
    • Binary Artifact Repositories

Isolated Non-Production Components:

  • AWS
    • Route 53 Hosted Zones and Associated Records
    • Elastic Load Balancers (ELB)
    • Elastic Compute Cloud (EC2) Worker Nodes
    • Elastic IPs
    • ELB and EC2 Security Groups
    • RDS Databases (Single RDS Instance with Separate Databases)

All opinions in this post are my own and not necessarily the views of my current employer or their clients.

, , , , , , , ,

2 Comments

Provision and Deploy a Consul Cluster on AWS, using Terraform, Docker, and Jenkins

Cover2

Introduction

Modern DevOps tools, such as HashiCorp’s Packer and Terraform, make it easier to provision and manage complex cloud architecture. Utilizing a CI/CD server, such as Jenkins, to securely automate the use of these DevOps tools, ensures quick and consistent results.

In a recent post, Distributed Service Configuration with Consul, Spring Cloud, and Docker, we built a Consul cluster using Docker swarm mode, to host distributed configurations for a Spring Boot application. The cluster was built locally with VirtualBox. This architecture is fine for development and testing, but not for use in Production.

In this post, we will deploy a highly available three-node Consul cluster to AWS. We will use Terraform to provision a set of EC2 instances and accompanying infrastructure. The instances will be built from a hybrid AMIs containing the new Docker Community Edition (CE). In a recent post, Baking AWS AMI with new Docker CE Using Packer, we provisioned an Ubuntu AMI with Docker CE, using Packer. We will deploy Docker containers to each EC2 host, containing an instance of Consul server.

All source code can be found on GitHub.

Jenkins

I have chosen Jenkins to automate all of the post’s build, provisioning, and deployment tasks. However, none of the code is written specific to Jenkins; you may run all of it from the command line.

For this post, I have built four projects in Jenkins, as follows:

  1. Provision Docker CE AMI: Builds Ubuntu AMI with Docker CE, using Packer
  2. Provision Consul Infra AWS: Provisions Consul infrastructure on AWS, using Terraform
  3. Deploy Consul Cluster AWS: Deploys Consul to AWS, using Docker
  4. Destroy Consul Infra AWS: Destroys Consul infrastructure on AWS, using Terraform

Jenkins UI

We will primarily be using the ‘Provision Consul Infra AWS’, ‘Deploy Consul Cluster AWS’, and ‘Destroy Consul Infra AWS’ Jenkins projects in this post. The fourth Jenkins project, ‘Provision Docker CE AMI’, automates the steps found in the recent post, Baking AWS AMI with new Docker CE Using Packer, to build the AMI used to provision the EC2 instances in this post.

Consul AWS Diagram 2

Terraform

Using Terraform, we will provision EC2 instances in three different Availability Zones within the US East 1 (N. Virginia) Region. Using Terraform’s Amazon Web Services (AWS) provider, we will create the following AWS resources:

  • (1) Virtual Private Cloud (VPC)
  • (1) Internet Gateway
  • (1) Key Pair
  • (3) Elastic Cloud Compute (EC2) Instances
  • (2) Security Groups
  • (3) Subnets
  • (1) Route
  • (3) Route Tables
  • (3) Route Table Associations

The final AWS architecture should resemble the following:

Consul AWS Diagram

Production Ready AWS

Although we have provisioned a fairly complete VPC for this post, it is far from being ready for Production. I have created two security groups, limiting the ingress and egress to the cluster. However, to further productionize the environment would require additional security hardening. At a minimum, you should consider adding public/private subnets, NAT gateways, network access control list rules (network ACLs), and the use of HTTPS for secure communications.

In production, applications would communicate with Consul through local Consul clients. Consul clients would take part in the LAN gossip pool from different subnets, Availability Zones, Regions, or VPCs using VPC peering. Communications would be tightly controlled by IAM, VPC, subnet, IP address, and port.

Also, you would not have direct access to the Consul UI through a publicly exposed IP or DNS address. Access to the UI would be removed altogether or locked down to specific IP addresses, and accessed restricted to secure communication channels.

Consul

We will achieve high availability (HA) by clustering three Consul server nodes across the three Elastic Cloud Compute (EC2) instances. In this minimally sized, three-node cluster of Consul servers, we are protected from the loss of one Consul server node, one EC2 instance, or one Availability Zone(AZ). The cluster will still maintain a quorum of two nodes. An additional level of HA that Consul supports, multiple datacenters (multiple AWS Regions), is not demonstrated in this post.

Docker

Having Docker CE already installed on each EC2 instance allows us to execute remote Docker commands over SSH from Jenkins. These commands will deploy and configure a Consul server node, within a Docker container, on each EC2 instance. The containers are built from HashiCorp’s latest Consul Docker image pulled from Docker Hub.

Getting Started

Preliminary Steps

If you have built infrastructure on AWS with Terraform, these steps should be familiar to you:

  1. First, you will need an AMI with Docker. I suggest reading Baking AWS AMI with new Docker CE Using Packer.
  2. You will need an AWS IAM User with the proper access to create the required infrastructure. For this post, I created a separate Jenkins IAM User with PowerUser level access.
  3. You will need to have an RSA public-private key pair, which can be used to SSH into the EC2 instances and install Consul.
  4. Ensure you have your AWS credentials set. I usually source mine from a .env file, as environment variables. Jenkins can securely manage credentials, using secret text or files.
  5. Fork and/or clone the Consul cluster project from  GitHub.
  6. Change the aws_key_name and public_key_path variable values to your own RSA key, in the variables.tf file
  7. Change the aws_amis_base variable values to your own AMI ID (see step 1)
  8. If you are do not want to use the US East 1 Region and its AZs, modify the variables.tf, network.tf, and instances.tf files.
  9. Disable Terraform’s remote state or modify the resource to match your remote state configuration, in the main.tf file. I am using an Amazon S3 bucket to store my Terraform remote state.

Building an AMI with Docker

If you have not built an Amazon Machine Image (AMI) for use in this post already, you can do so using the scripts provided in the previous post’s GitHub repository. To automate the AMI build task, I built the ‘Provision Docker CE AMI’ Jenkins project. Identical to the other three Jenkins projects in this post, this project has three main tasks, which include: 1) SCM: clone the Packer AMI GitHub project, 2) Bindings: set up the AWS credentials, and 3) Build: run Packer.

The SCM and Bindings tasks are identical to the other projects (see below for details), except for the use of a different GitHub repository. The project’s Build step, which runs the packer_build_ami.sh script looks as follows:

jenkins_13

The resulting AMI ID will need to be manually placed in Terraform’s variables.tf file, before provisioning the AWS infrastructure with Terraform. The new AMI ID will be displayed in Jenkin’s build output.

jenkins_14

Provisioning with Terraform

Based on the modifications you made in the Preliminary Steps, execute the terraform validate command to confirm your changes. Then, run the terraform plan command to review the plan. Assuming are were no errors, finally, run the terraform apply command to provision the AWS infrastructure components.

In Jenkins, I have created the ‘Provision Consul Infra AWS’ project. This project has three tasks, which include: 1) SCM: clone the GitHub project, 2) Bindings: set up the AWS credentials, and 3) Build: run Terraform. Those tasks look as follows:

Jenkins_08.png

You will obviously need to use your modified GitHub project, incorporating the configuration changes detailed above, as the SCM source for Jenkins.

Jenkins Credentials

You will also need to configure your AWS credentials.

Jenkins_03.png

The provision_infra.sh script provisions the AWS infrastructure using Terraform. The script also updates Terraform’s remote state. Remember to update the remote state configuration in the script to match your personal settings.

The Jenkins build output should look similar to the following:

jenkins_12.png

Although the build only takes about 90 seconds to complete, the EC2 instances could take a few extra minutes to complete their Status Checks and be completely ready. The final results in the AWS EC2 Management Console should look as follows:

EC2 Management Console

Note each EC2 instance is running in a different US East 1 Availability Zone.

Installing Consul

Once the AWS infrastructure is running and the EC2 instances have completed their Status Checks successfully, we are ready to deploy Consul. In Jenkins, I have created the ‘Deploy Consul Cluster AWS’ project. This project has three tasks, which include: 1) SCM: clone the GitHub project, 2) Bindings: set up the AWS credentials, and 3) Build: run an SSH remote Docker command on each EC2 instance to deploy Consul. The SCM and Bindings tasks are identical to the project above. The project’s Build step looks as follows:

Jenkins_04.png

First, the delete_containers.sh script deletes any previous instances of Consul containers. This is helpful if you need to re-deploy Consul. Next, the deploy_consul.sh script executes a series of SSH remote Docker commands to install and configure Consul on each EC2 instance.

The entire Jenkins build process only takes about 30 seconds. Afterward, the output from a successful Jenkins build should show that all three Consul server instances are running, have formed a quorum, and have elected a Leader.

Jenkins_05.png

Persisting State

The Consul Docker image exposes VOLUME /consul/data, which is a path were Consul will place its persisted state. Using Terraform’s remote-exec provisioner, we create a directory on each EC2 instance, at /home/ubuntu/consul/config. The docker run command bind-mounts the container’s /consul/data path to the EC2 host’s /home/ubuntu/consul/config directory.

According to Consul, the Consul server container instance will ‘store the client information plus snapshots and data related to the consensus algorithm and other state, like Consul’s key/value store and catalog’ in the /consul/data directory. That container directory is now bind-mounted to the EC2 host, as demonstrated below.

jenkins_15

Accessing Consul

Following a successful deployment, you should be able to use the public URL, displayed in the build output of the ‘Deploy Consul Cluster AWS’ project, to access the Consul UI. Clicking on the Nodes tab in the UI, you should see all three Consul server instances, one per EC2 instance, running and healthy.

Consul UI

Destroying Infrastructure

When you are finished with the post, you may want to remove the running infrastructure, so you don’t continue to get billed by Amazon. The ‘Destroy Consul Infra AWS’ project destroys all the AWS infrastructure, provisioned as part of this post, in about 60 seconds. The project’s SCM and Bindings tasks are identical to the both previous projects. The Build step calls the destroy_infra.sh script, which is included in the GitHub project. The script executes the terraform destroy -force command. It will delete all running infrastructure components associated with the post and update Terraform’s remote state.

Jenkins_09

Conclusion

This post has demonstrated how modern DevOps tooling, such as HashiCorp’s Packer and Terraform, make it easy to build, provision and manage complex cloud architecture. Using a CI/CD server, such as Jenkins, to securely automate the use of these tools, ensures quick and consistent results.

All opinions in this post are my own and not necessarily the views of my current employer or their clients.

, , , , , , , , , ,

3 Comments

Infrastructure as Code Maturity Model

Systematically Evolving an Organization’s Infrastructure

Infrastructure and software development teams are increasingly building and managing infrastructure using automated tools that have been described as “infrastructure as code.” – Kief Morris (Infrastructure as Code)

The process of managing and provisioning computing infrastructure and their configuration through machine-processable, declarative, definition files, rather than physical hardware configuration or the use of interactive configuration tools. – Wikipedia (abridged)

Convergence of CD, Cloud, and IaC

In 2011, co-authors Jez Humble, formerly of ThoughtWorks, and David Farley, published their ground-breaking book, Continuous Delivery. Humble and Farley’s book set out, in their words, to automate the ‘painful, risky, and time-consuming process’ of the software ‘build, deployment, and testing process.

cd_image_02

Over the next five years, Humble and Farley’s Continuous Delivery made a significant contribution to the modern phenomena of DevOps. According to Wikipedia, DevOps is the ‘culture, movement or practice that emphasizes the collaboration and communication of both software developers and other information-technology (IT) professionals while automating the process of software delivery and infrastructure changes.

In parallel with the growth of DevOps, Cloud Computing continued to grow at an explosive rate. Amazon pioneered modern cloud computing in 2006 with the launch of its Elastic Compute Cloud. Two years later, in 2008, Microsoft launched its cloud platform, Azure. In 2010, Rackspace launched OpenStack.

Today, there is a flock of ‘cloud’ providers. Their services fall into three primary service models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). Since we will be discussing infrastructure, we will focus on IaaS and PaaS. Leaders in this space include Google Cloud Platform, RedHat, Oracle Cloud, Pivotal Cloud Foundry, CenturyLink Cloud, Apprenda, IBM SmartCloud Enterprise, and Heroku, to mention just a few.

Finally, fast forward to June 2016, O’Reilly releases Infrastructure as Code
Managing Servers in the Cloud
, by Kief Morris, ThoughtWorks. This crucial work bridges many of the concepts first introduced in Humble and Farley’s Continuous Delivery, with the evolving processes and practices to support cloud computing.

cd_image_03

This post examines how to apply the principles found in the Continuous Delivery Maturity Model, an analysis tool detailed in Humble and Farley’s Continuous Delivery, and discussed herein, to the best practices found in Morris’ Infrastructure as Code.

Infrastructure as Code

Before we continue, we need a shared understanding of infrastructure as code. Below are four examples of infrastructure as code, as Wikipedia defined them, ‘machine-processable, declarative, definition files.’ The code was written using four popular tools, including HashiCorp Packer, Docker, AWS CloudFormation, and HashiCorp Terraform. Executing the code provisions virtualized cloud infrastructure.

HashiCorp Packer

Packer definition of an AWS EBS-backed AMI, based on Ubuntu.

{
  "variables": {
    "aws_access_key": "",
    "aws_secret_key": ""
  },
  "builders": [{
    "type": "amazon-ebs",
    "access_key": "{{user `aws_access_key`}}",
    "secret_key": "{{user `aws_secret_key`}}",
    "region": "us-east-1",
    "source_ami": "ami-fce3c696",
    "instance_type": "t2.micro",
    "ssh_username": "ubuntu",
    "ami_name": "packer-example {{timestamp}}"
  }]
}

Docker

Dockerfile, used to create a Docker image, and subsequently a Docker container, running MongoDB.

FROM ubuntu:16.04
MAINTAINER Docker
RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv EA312927
RUN echo "deb http://repo.mongodb.org/apt/ubuntu" \
$(cat /etc/lsb-release | grep DISTRIB_CODENAME | cut -d= -f2)/mongodb-org/3.2 multiverse" | \
tee /etc/apt/sources.list.d/mongodb-org-3.2.list
RUN apt-get update && apt-get install -y mongodb-org
RUN mkdir -p /data/db
EXPOSE 27017
ENTRYPOINT ["/usr/bin/mongod"]

AWS CloudFormation

AWS CloudFormation declaration for three services enabled on a running instance.

services:
  sysvinit:
    nginx:
      enabled: "true"
      ensureRunning: "true"
      files:
        - "/etc/nginx/nginx.conf"
      sources:
        - "/var/www/html"
    php-fastcgi:
      enabled: "true"
      ensureRunning: "true"
      packages:
        yum:
          - "php"
          - "spawn-fcgi"
    sendmail:
      enabled: "false"
      ensureRunning: "false"

HashiCorp Terraform

Terraform definition of an AWS m1.small EC2 instance, running NGINX on Ubuntu.

resource "aws_instance" "web" {
  connection { user = "ubuntu" }
instance_type = "m1.small"
Ami = "${lookup(var.aws_amis, var.aws_region)}"
Key_name = "${aws_key_pair.auth.id}"
vpc_security_group_ids = ["${aws_security_group.default.id}"]
Subnet_id = "${aws_subnet.default.id}"
provisioner "remote-exec" {
  inline = [
    "sudo apt-get -y update",
    "sudo apt-get -y install nginx",
    "sudo service nginx start",
  ]
 }
}

Cloud-based Infrastructure as a Service

The previous examples provide but the narrowest of views into the potential breadth of infrastructure as code. Leading cloud providers, such as Amazon and Microsoft, offer hundreds of unique offerings, most of which may be defined and manipulated through code — infrastructure as code.

cd_image_05

cd_image_04

What Infrastructure as Code?

The question many ask is, what types of infrastructure can be defined as code? Although vendors and cloud providers have their unique names and descriptions, most infrastructure is divided into a few broad categories:

  • Compute
  • Databases, Caching, and Messaging
  • Storage, Backup, and Content Delivery
  • Networking
  • Security and Identity
  • Monitoring, Logging, and Analytics
  • Management Tooling

Continuous Delivery Maturity Model

We also need a common understanding of the Continuous Delivery Maturity Model. According to Humble and Farley, the Continuous Delivery Maturity Model was distilled as a model that ‘helps to identify where an organization stands in terms of the maturity of its processes and practices and defines a progression that an organization can work through to improve.

The Continuous Delivery Maturity Model is a 5×6 matrix, consisting of six areas of practice and five levels of maturity. Each of the matrix’s 30 elements defines a required discipline an organization needs to follow, to be considered at that level of maturity within that practice.

Areas of Practice

The CD Maturity Model examines six broad areas of practice found in most enterprise software organizations:

  • Build Management and Continuous Integration
  • Environments and Deployment
  • Release Management and Compliance
  • Testing
  • Data Management
  • Configuration Management

Levels of Maturity

The CD Maturity Model defines five level of increasing maturity, from a score of -1 to 3, from Regressive to Optimizing:

  • Level 3: Optimizing – Focus on process improvement
  • Level 2: Quantitatively Managed – Process measured and controlled
  • Level 1: Consistent – Automated processes applied across whole application lifecycle
  • Level 0: Repeatable – Process documented and partly automated
  • Level -1: Regressive – Processes unrepeatable, poorly controlled, and reactive

cd_image_06

Maturity Model Analysis

The CD Maturity Model is an analysis tool. In my experience, organizations use the maturity model in one of two ways. First, an organization completes an impartial evaluation of their existing levels of maturity across all areas of practice. Then, the organization focuses on improving the overall organization’s maturity, attempting to achieve a consistent level of maturity across all areas of practice. Alternately, the organization concentrates on a subset of the practices, which have the greatest business value, or given their relative immaturity, are a detriment to the other practices.

cd_image_01

* CD Maturity Model Analysis Tool available on GitHub.

Infrastructure as Code Maturity Levels

Although infrastructure as code is not explicitly called out as a practice in the CD Maturity Model, many of it’s best practices can be found in the maturity model. For example, the model prescribes automated environment provisioning, orchestrated deployments, and the use of metrics for continuous improvement.

Instead of trying to retrofit infrastructure as code into the existing CD Maturity Model, I believe it is more effective to independently apply the model’s five levels of maturity to infrastructure as code. To that end, I have selected many of the best practices from the book, Infrastructure as Code, as well as from my experiences. Those selected practices have been distributed across the model’s five levels of maturity.

The result is the first pass at an evolving Infrastructure as Code Maturity Model. This model may be applied alongside the broader CD Maturity Model, or independently, to evaluate and further develop an organization’s infrastructure practices.

IaC Level -1: Regressive

Processes unrepeatable, poorly controlled, and reactive

  • Limited infrastructure is provisioned and managed as code
  • Infrastructure provisioning still requires many manual processes
  • Infrastructure code is not written using industry-standard tooling and patterns
  • Infrastructure code not built, unit-tested, provisioned and managed, as part of a pipeline
  • Infrastructure code, processes, and procedures are inconsistently documented, and not available to all required parties

IaC Level 0: Repeatable

Processes documented and partly automated

  • All infrastructure code and configuration are stored in a centralized version control system
  • Testing, provisioning, and management of infrastructure are done as part of automated pipeline
  • Infrastructure is deployable as individual components
  • Leverages programmatic interfaces into physical devices
  • Automated security inspection of components and dependencies
  • Self-service CLI or API, where internal customers provision their resources
  • All code, processes, and procedures documented and available
  • Immutable infrastructure and processes

IaC Level 1: Consistent

Automated processes applied across whole application lifecycle

  • Fully automated provisioning and management of infrastructure
  • Minimal use of unsupported, ‘home-grown’ infrastructure tooling
  • Unit-tests meet code-coverage requirements
  • Code is continuously tested upon every check-in to version control system
  • Continuously available infrastructure using zero-downtime provisioning
  • Uses configuration registries
  • Templatized configuration files (no awk/sed magic)
  • Secrets are securely management
  • Auto-scaling based on user-defined load characteristics

IaC Level 2: Quantitatively Managed

Processes measured and controlled

  • Uses infrastructure definition files
  • Capable of automated rollbacks
  • Infrastructure and supporting systems are highly available and fault tolerant
  • Externalized configuration, no black box API to modify configuration
  • Fully monitored infrastructure with configurable alerting
  • Aggregated, auditable infrastructure logging
  • All code, processes, and procedures are well documented in a Knowledge Management System
  • Infrastructure code uses declarative versus imperative programming model, maybe…

IaC Level 3: Optimizing

Focus on process improvement

  • Self-healing, self-configurable, self-optimizing, infrastructure
  • Performance tested and monitored against business KPIs
  • Maximal infrastructure utilization and workload density
  • Adheres to Cloud Native and 12-Factor patterns
  • Cloud-agnostic code that minimizes cloud vendor lock-in

All opinions in this post are my own and not necessarily the views of my current employer or their clients.

, , , , , , ,

4 Comments

Operational Readiness Analysis

“Analysis: a detailed examination of the elements or structure of something, typically as a basis for discussion or interpretation.” — Google

Introduction

Recently, I had the opportunity, along with a colleague, to perform an operational readiness analysis of a client’s new application platform. The platform was in late-stage development, only a few weeks from going live. Being relatively new to the project team, our objective was to determine the amount of work remaining to make the platform operational, from a technical perspective. As well, we wanted to ensure there were no gaps in the project’s scope, as it related to technical operational readiness.

There were various approaches we could have taken to establish the amount of unfinished work and any existing gaps. We could have reviewed the state of the project in the client’s Agile project management system. We could have discussed the project’s state with the Product Owner, Project Manager, Team Lead, and Business Analyst. Neither of these methods on their own would have provided us with a complete picture.

The approach we ultimately took was what we’ve coined, an Operational Readiness Analysis, or our trendier title, the ‘State of the DevOps’. This approach is particularly effective and highly interactive. The analysis may be conducted at any stage of the Software Development Lifecycle (SDLC), to review a project’s operational state of readiness. The analysis involves five simple steps:

  1. Categorize
  2. Itemize
  3. Organize
  4. Prioritize
  5. Document

Grab Your Post-It Notes

We began, where all good ThoughtWorkers tend to begin, with a meeting room, whiteboard, dry-erase markers, and lots of brightly colored Post-It notes. ThoughtWorkers do love their Post-It notes. We gathered a small subset of the project team, including the Product Owner, Project Manager, and the team’s DevOps Engineer.

Step 1: Categorize

To start, we broadly categorized the operational requirements into buckets of work. In my experience, categorization typically takes one of two forms, either the use of role-based descriptors, like ‘Security’ and ‘Monitoring’, or the use of the ‘ilities’, such as ‘Scalability’ and ‘Maintainability’.

The ‘ilities’ are often referenced in regards to software architecture and the functional and nonfunctional requirements (NFRs) of a project. Although there are many ‘ilities’, project requirements frequently include Usability, Scalability, Maintainability, Reliability, Testability, Availability, and Serviceability. Although I enjoy referring to the ‘ilities’ when drafting project requirements, I find a broad audience more readily understands the high-level category descriptions.

We collectively agreed on eight categories, in addition to a catch-all ‘Other’ category, and distributed them on the whiteboard.

whiteboard-step-1

Role-based categories tend to include the following:

  • Security and Compliance
  • Monitoring and Alerting
  • Logging
  • Continuous Integration and Delivery
  • Release Management
  • Configuration
  • Environment Management
  • Infrastructure
  • Networking
  • Data Management
  • High Availability and Fault Tolerance
  • Performance
  • Backup and Restoration
  • Support and Maintenance
  • Documentation and Training
  • Technical Debt
  • Other

Step 2: Itemize

Next, each person placed as many Post-It notes as they wanted, below the categories on the whiteboard. Each Post-It represented an item someone felt needed to be addressed as part the analysis. Items might represent requirements currently in progress, or requirements still untouched in the project’s backlog. A Post-It note might contain a new requirement. Maybe there is a question about an aspect of the project’s relative importance, or there were concerns about how an aspect of the project had been implemented.

Participants placed an average of five to ten Post-It notes on the whiteboard. The end product was a collective mind-map of the project’s operational state (the following is only an example and doesn’t represent the actual results of any real client analysis).

whiteboard-step-2

Whereas categories tend to be generic in nature, the items are usually particular to the project, and it’s technology choices. The ‘Monitoring/Logging’ category might include individual items such as ‘Log Rotation Policy’ or ‘Splunk vs. ELK?’. The ‘Security’ category might include individual items such as ‘Need Password Rotation’ or ‘SSL Certificate Management’.

Participants often use one or two words to summarize more complex thoughts. For example, ‘Rollback’ might be stating, ‘we still don’t have a good rollback strategy for the database’. ‘Release Frequency’ may be questioning, ‘why can’t we release more frequently?’. Make sure to discuss and capture the participant’s full thoughts behind each Post-It, the during the Organize, Prioritize, and Document stages.

Don’t let participants give up too soon placing Post-It notes on the board. Often, one participant’s Post-It will spark additional thoughts other team members. Don’t be afraid to suggest a few items, if the group needs some initial motivation. Lastly, fear not, it’s never too late to go back and add missing items, uncovered in later stages of analysis.

Step 3: Organize

Quickly and non-judgmentally, review all items on the whiteboard. Group duplicates together. Ask for clarification on items where necessary. Move miscategorized items, if the author and team agree there is a more appropriate category.

Examine the ‘Other’ category. If there are items in the ‘Other’ category that are similar, consider adding a new category to capture them. Maybe the team missed identifying a key category, earlier. Conversely, there is nothing wrong with leaving a few stragglers in the ‘Other’ category; don’t overthink the exercise.

Participants should leave this stage with a general understanding of each item on the whiteboard.

Step 4: Prioritize

With all items identified, organized, and understood, discuss each item’s status. Does the item represent incomplete requirements? Does everyone agree an item’s requirements are complete? Team members may not agree on the completeness of particular items. Inconsistencies may reflect unclear requirements. More often, I find, disagreements are a result of an inconsistent or incomplete implementation of requirements across environments — Development, QA, Performance, Staging, and Production.

Why are operational requirements frequently implemented inconsistently? Often, a story is played early in the development stage, prior to all environments being available. For example, a story is played to add monitoring to Development and QA. The story is completed, tested, and closed. Afterward, the Performance environment is created. Later yet, the Staging and Production environments are created. But, there are no new stories to address monitoring the new environments.

One way to avoid this common issue with operational requirements is an effective DevOps automation strategy. In this example, an effective strategy would mean all new environments would get monitoring, automatically. The story should broadly address monitoring, and not, short-sightedly, monitoring of specific environments.

Often, only one or two project team members are responsible for ‘all the DevOps stuff’. They alone possess an accurate perspective on an item’s current status.

Gain agreement from the Product Owner, Project Manager, and key team members on the priority of incomplete requirements. Are they ‘Day One’ must-haves, or ‘Day Two’ and can be completed after the application goes live? Flag all high-priority items.

If the status of an item is unknown, such as ‘why is the QA environment always down?!’, note it as an open question and move on. If the item simply requires a quick answer to resolve, like ‘do we backup the database?’, answer it (‘the database is replicated and backed up daily’) and move on.

whiteboard-step-4

Step 5: Document

Snap a picture of the whiteboard and document its contents. We chose the client’s knowledge management system as a vehicle to share the whiteboard’s results. We created a table with columns for Category, Item, Description, Priority, Owner, and Notes. Also, it’s helpful to have a column for the project’s corresponding Story ID or Defect ID. Along with the whiteboard, capture key discussion points, questions, and concerns, raised by the participants.

Step 5: Document

Snap a picture of the whiteboard and document its contents. We chose Confluence to share the whiteboard results, using a table with columns for Category, Item, Description, Priority, Owner, and Notes. We also bulleted all key discussion points, questions, and concerns, raised by the participants.

operational-readiness-spreadsheet

Make the analysis results available to all team members. Let the team ask questions and poke holes in the results. Adjust the results if necessary.

Next Steps

The actions you take, given the results of the analysis, are up to you. In our case, we ensured all high-priority items were called out on the team’s Agile board. Any new items were captured in the client’s Agile project management system. Finally, we ensured open questions and concerns were addressed in a timely fashion. We continue to track each item’s status weekly, throughout the launch and post-launch periods.

We anticipate conducting a follow-up analysis, thirty days after launch. The goal will be to evaluate the effectiveness of the first analysis and identify additional operational needs as the application enters a business-as-usual (BAU) application lifecycle phase.


Postscript: the ‘ilities

The ‘ilities’, courtesy of codesqueeze.com and en.wikipedia.org

  • Scalability
    The capability of a system, network, or process to handle a growing amount of work or its potential to be enlarged to accommodate that growth.
  • Testability
    The practical feasibility of observing a reproducible series of such counterexamples if they do exist.
  • Reliability (Resilience)
    The ability of a system or component to perform its required functions under stated conditions for a specified time.
  • Usability (Performance)
    The degree to which software can be used by specified consumers to achieve quantified objectives with effectiveness, efficiency, and satisfaction in a quantified context of use.
  • Serviceability (Supportability)
    The ability of technical support personnel to install, configure, and monitor computer products, identify exceptions or faults, debug or isolate faults to root cause analysis, and provide hardware or software maintenance in pursuit of solving a problem and restoring the product into service.
  • Availability
    The proportion of time a system is in a functioning condition. Defined in the service-level agreement (SLA).
  • Maintainability
    The ease with which a product can be maintained. Isolate and correct defects or their cause, prevent unexpected breakdowns, maximize a product’s useful life, maximize efficiency, reliability, and safety, meet new requirements, make future maintenance easier, and cope with a changed environment.

All opinions in this post are my own and not necessarily the views of my current employer or their clients.

, , , , ,

Leave a comment