Archive for category DevOps

Building and Integrating LUIS-enabled Chatbots with Slack, using Azure Bot Service, Bot Builder SDK, and Cosmos DB

Introduction

In this post, we will explore the development of a machine learning-based LUIS-enabled chatbot using the Azure Bot Service and the BotBuilder SDK. We will enhance the chatbot’s functionality with Azure’s Cloud services, including Cosmos DB and Blob Storage. Once built, we will integrate our chatbot across multiple channels, including Web Chat and Slack.

If you want to compare Azure’s current chatbot technologies with those of AWS and Google, in addition to this post, please read my previous two posts in this series, Building Serverless Actions for Google Assistant with Google Cloud Functions, Cloud Datastore, and Cloud Storage and Building Asynchronous, Serverless Alexa Skills with AWS Lambda, DynamoDB, S3, and Node.js. All three of the article’s demonstrations are written in Node.js, all three leverage their cloud platform’s machine learning-based Natural Language Understanding services, and all three take advantage of NoSQL database and storage services available on their respective cloud platforms.

Technology Stack

Here is a brief overview of the key Microsoft technologies we will incorporate into our bot’s architecture.

LUIS

The machine learning-based Language Understanding Intelligent Service (LUIS) is part of Azure’s Cognitive Services, used to build Natural Language Understanding (NLU) into apps, bots, and IoT devices. According to Microsoft, LUIS allows you to quickly create enterprise-ready, custom machine learning models that continuously improve.

Designed to identify valuable information in conversations, Language Understanding interprets user goals (intents) and distills valuable information from sentences (entities), for a high quality, nuanced language model. Language Understanding integrates seamlessly with the Speech service for instant Speech-to-Intent processing, and with the Azure Bot Service, making it easy to create a sophisticated bot. A LUIS bot contains a domain-specific natural language model, which you design.

Azure Bot Service

The Azure Bot Service provides an integrated environment that is purpose-built for bot development, enabling you to build, connect, test, deploy, and manage intelligent bots, all from one place. Bot Service leverages the Bot Builder SDK.

Bot Builder SDK

The Bot Builder SDK allows you to build, connect, deploy and manage bots, which interact with users, across multiple channels, from your app or website to Facebook, Messenger, Kik, Skype, Slack, Microsoft Teams, Telegram, SMS, Twilio, Cortana, and Skype. Currently, the SDK is available for C# and Node.js. For this post, we will use the current Bot Builder Node.js SDK v3 release to write our chatbot.

Cosmos DB

According to Microsoft, Cosmos DB is a globally distributed, multi-model database-as-a-service, designed for low latency and scalable applications anywhere in the world. Cosmos DB supports multiple data models, including document, columnar, and graph. Cosmos also supports numerous database SDKs, including MongoDB, Cassandra, and Gremlin DB. We will use the MongoDB SDK to store our documents in Cosmos DB, used by our chatbot.

Azure Blob Storage

According to Microsoft, Azure’s storage-as-a-service, Blob Storage, provides massively scalable object storage for any type of unstructured data, images, videos, audio, documents, and more. We will be using Blob Storage to store publically-accessible images, used by our chatbot.

Azure Application Insights

According to Microsoft, Azure’s Application Insights provides comprehensive, actionable insights through application performance management (APM) and instant analytics. Quickly analyze application telemetry, allowing the detection of anomalies, application failure, performance changes. Application Insights will enable us to monitor our chatbot’s key metrics.

High-Level Architecture

A chatbot user interacts with the chatbot through a number of available channels, such as the Web, Slack, and Skype. The channels communicate with the Web App Bot, part of Azure Bot Service, and running on Azure’s App Service, the fully-managed platform for cloud apps. LUIS integration allows the chatbot to learn and understand the user’s intent based on our own domain-specific natural language model.

Through Azure’s App Service platform, our chatbot is able to retrieve data from Cosmos DB and images from Blob Storage. Our chatbot’s telemetry is available through Azure’s Application Insights.

Azure Chatbot Diagram

Azure Resources

Another way to understand our chatbot architecture is by examining the Azure resources necessary to build the chatbot. Below is an example of all the Azure resources that will be created as a result of building a LUIS-enabled bot, which has been integrated with Cosmos DB, Blob Storage, and Application Insights.

chatbot-10-resource-group

Chatbot Demonstration

As a demonstration, we will build an informational chatbot, the Azure Tech Facts Chatbot. The bot will respond to the user with interesting facts about Azure, Microsoft’s Cloud computing platform. Note this is not intended to be an official Microsoft bot and is only used for demonstration purposes.

Source Code

All open-sourced code for this post can be found on GitHub. The code samples in this post are displayed as GitHub Gists, which may not display correctly on some mobile and social media browsers. Links to the gists are also provided.

Development Process

This post will focus on the development and integration of a chatbot with the LUIS, Azure platform services, and channels, such as Web Chat and Slack. The post is not intended to be a general how-to article on developing Azure chatbots or the use of the Azure Cloud Platform.

Building the chatbot will involve the following steps.

  • Design the chatbot’s conversation flow;
  • Provision a Cosmos DB instance and import the Azure Facts documents;
  • Provision Azure Storage and upload the images as blobs into Azure Storage;
  • Create the new LUIS-enabled Web App Bot with Azure’s Bot Service;
  • Define the chatbot’s Intents, Entities, and Utterances with LUIS;
  • Train and publish the LUIS app;
  • Deploy and test the chatbot;
  • Integrate the chatbot with Web Chat and Slack Channels;

The post assumes you have an existing Azure account and a working knowledge of Azure. Let’s explore each step in more detail.

Cost of Azure Bots!

Be aware, you will be charged for Azure Cloud services when building this bot. Unlike an Alexa Custom Skill or an Action for Google Assistant, an Azure chatbot is not a serverless application. A common feature of serverless platforms, you only pay for the compute time you consume. There typically is no charge when your code is not running. This means, unlike AWS and Google Cloud Platform, you will pay for Azure resources you provision, whether or not you use them.

Developing this demo’s chatbot on the Azure platform, with little or no activity most of the time, cost me about $5/day. On AWS or GCP, a similar project would cost pennies per day or less (like, $0). Currently, in my opinion, Azure does not have a very competitive model for building bots, or for serverless computing in general, beyond Azure Functions, when compared to Google and AWS.

Conversational Flow

The first step in developing a chatbot is designing the conversation flow of the between the user and the bot. Defining the conversation flow is essential to developing the bot’s programmatic logic and training the domain-specific natural language model for the machine learning-based services the bot is integrated with, in this case, LUIS. What are all the ways the user might explicitly invoke our chatbot? What are all the ways the user might implicitly invoke our chatbot and provide intent to the bot? Taking the time to map out the possible conversational interactions is essential.

With more advanced bots, like Alexa, Actions for Google Assistant, and Azure Bots, we also have to consider the visual design of the conversational interfaces. In addition to simple voice and text responses, these bots are capable of responding with a rich array of UX elements, including what are generically known as ‘Cards’. Cards come in varying complexity and may contain elements such as text, title, sub-titles, text, video, audio, buttons, and links. Azure Bot Service offers several different cards for specific use cases.

Channel Design

Another layer of complexity with bots is designing for channels into which they integrate. There is a substantial visual difference in a conversational exchange displayed on Facebook Messanger, as compared to Slack, or Skype, Microsoft Teams, GroupMe, or within a web browser. Producing an effective conversational flow presentation across multiple channels a design challenge.

We will be focusing on two channels for delivery of our bot, the Web Chat and Slack channels. We will want to design the conversational flow and visual elements to be effective and engaging across both channels. The added complexity with both channels, they both have mobile and web-based interfaces. We will ensure our design works with the compact real-estate of an average-sized mobile device screen, as well as average-sized laptop’s screen.

Web Chat Channel Design

Below are two views of our chatbot, delivered through the Web Chat channel. To the left,  is an example of the bot responding with ThumbnailCard UX elements. The ThumbnailCards contain a title, sub-title, text, small image, and a button with a link. Below and to the right is an example of the bot responding with a HeroCard.  The HeroCard contains the same elements as the ThumbnailCard but takes up about twice the space with a significantly larger image.

conversational Model 3

Slack Channel Design

Below are three views of our chatbot, delivered through the Slack channel, in this case, the mobile iOS version of the Slack app. Even here on a larger iPhone 8s, there is not a lot of real estate. On the right is the same HeroCard as we saw above in the Web Chat channel. In the middle are the same ThumbnailCards. On the right is a simple text-only response. Although the text-only bot responses are not as rich as the cards, you are able to display more of the conversational flow on a single mobile screen.

Mobile Skype3

Lastly, below we see our chatbot delivered through the Slack for Mac desktop app. Within the single view, we see an example of a HeroCard (top), ThumbnailCard (center), and a text-only response (bottom). Notice how the larger UI of the desktop Slack app changes the look and feel of the chatbot conversational flow.

chatbot-51-slack-bot

In my opinion, the ThumbnailCards work well in the Web Chat channel and Slack channel’s desktop app, while the text-only responses seem to work best with the smaller footprint of the Slack channel’s mobile client. To work across a number of channels, our final bot will contain a mix of ThumbnailCards and text-only responses.

Cosmos DB

As an alternative to Microsoft’s Cognitive Service, QnA Maker, we will use Cosmos DB to house the responses to user’s requests for facts about Azure. When a user asks our informational chatbot for a fact about Azure, the bot will query Cosmos DB, passing a single unique string value, the fact the user is requesting. In response, Cosmos DB will return a JSON Document, containing field-and-value pairs with the fact’s title, image name, and textual information, as shown below.

chatbot-30-cosmos-db

There are a few ways to create the new Cosmos DB database and collection, which will hold our documents, we will use the Azure CLI. According to Microsoft, the Azure CLI 2.0 is Microsoft’s cross-platform command line interface (CLI) for managing Azure resources. You can use it in your browser with Azure Cloud Shell, or install it on macOS, Linux, or Windows, and run it from the command line. (gist).

There are a few ways for us to get our Azure facts documents into Cosmos DB. Since we are writing our chatbot in Node.js, I also chose to write a Cosmos DB facts import script in Node.js, cosmos-db-data.js. Since we are using Cosmos DB as a MongoDB datastore, all the script requires is the official MongoDB driver for Node.js. Using the MongoDB driver’s db.collection.insertMany() method, we can upload an entire array of Azure fact document objects with one call. For security, we have set the Cosmos DB connection string as an environment variable, which the script expects to find at runtime (gist).

Azure Blob Storage

When a user asks our informational chatbot for a fact about Azure, the bot will query Cosmos DB. One of the values returned is an image name. The image itself is stored on Azure Blob Storage.

chatbot-20-blob-storage

The image, actually an Azure icon available from Microsoft, is then displayed in the ThumbnailCard or HeroCard shown earlier.

chatbot-21-blob-storage

According to Microsoft, an Azure storage account provides a unique namespace in the cloud to store and access your data objects in Azure Storage. A storage account contains any blobs, files, queues, tables, and disks that you create under that account. A container organizes a set of blobs, similar to a folder in a file system. All blobs reside within a container. Similar to Cosmos DB, there are a few ways to create a new Azure Storage account and a blob storage container, which will hold our images. Once again, we will use the Azure CLI (gist).

Once the storage account and container are created using the Azure CLI, to upload the images, included with the GitHub project, by using the Azure CLI’s storage blob upload-batch command (gist).

Web App Chatbot

To create the LUIS-enabled chatbot, we can use the Azure Bot Service, available on the Azure Portal. A Web App Bot is one of a variety of bots available from Azure’s Bot Service, which is part of Azure’s larger and quickly growing suite of AI and Machine Learning Cognitive Services. A Web App Bot is an Azure Bot Service Bot deployed to an Azure App Service Web App. An App Service Web App is a fully managed platform that lets you build, deploy, and scale enterprise-grade web apps.

chatbot-84-create-chatbot.png

To create a LUIS-enabled chatbot, choose the Language Understanding Bot template, from the Node.js SDK Language options. This will provide a complete project and boilerplate bot template, written in Node.js, for you to start developing with. I chose to use the SDK v3, as v4 is still in preview and subject to change.

chatbot-80-create-chatbot

Azure Resource Manager

A great DevOps features of the Azure Platform is Azure’s ability to generate Azure Resource Manager (ARM) templates and the associated automation scripts in PowerShell, .NET, Ruby, and the CLI. This allows engineers to programmatically build and provision services on the Azure platform, without having to write the code themselves.

chatbot-83-create-chatbot

To build our chatbot, you can continue from the Azure Portal as I did, or download the ARM template and scripts, and run them locally. Once you have created the chatbot, you will have the option to download the source code as a ZIP file from the Bot Management Build console. I prefer to use the JetBrains WebStorm IDE to develop my Node.js-based bots, and GitHub to store my source code.

chatbot-14-gitflow

Application Settings

As part of developing the chatbot, you will need to add two additional application settings to the Azure App Service the chatbot is running within. The Cosmos DB connection string (COSMOS_DB_CONN_STR) and the URL of your blob storage container (ICON_STORAGE_URL) will both be referenced from within our bot, as an environment variable. You can manually add the key/value pairs from the Azure Portal (shown below), or programmatically.

chatbot-12-app-settings

The chatbot’s code, in the app.js file, is divided into three sections: Constants and Global Variables, Intent Handlers, and Helper Functions. Let’s look at each section and its functionality.

Constants

Below is the Constants used by the chatbot. I have preserved Azure’s boilerplate template comments in the app.js file. The comments are helpful in understanding the detailed inner-workings of the chatbot code (gist).

Notice that using the LUIS-enabled Language Understanding template, Azure has provisioned a LUIS.ai app and integrated it with our chatbot. More about LUIS, next.

Intent Handlers

The next part of our chatbot’s code handles intents. Our chatbot’s intents include Greeting, Help, Cancel, and AzureFacts. The Greeting intent handler defines how the bot handles greeting a new user when they make an explicit invocation of the chatbot (without intent). The Help intent handler defines how the chatbot handles a request for help. The Cancel intent handler defines how the bot handles a user’s desire to quit, or if an unknown error occurs with our bot. The AzureFact intent handler, handles implicit invocations of the chatbot (with intent), by returning the requested Azure fact. We will use LUIS to train the AzureFacts intent in the next part of this post.

Each intent handler can return a different type of response to the user. For this demo, we will have the Greeting, Help, and AzureFacts handlers return a ThumbnailCard, while the Cancel handler simply returns a text message (gist).

Helper Functions

The last part of our chatbot’s code are the helper functions the intent handlers call. The functions include a function to return a random fact if the user requests one, selectRandomFact(). There are two functions, which return a ThumbnailCard or a HeroCard, depending on the request, createHeroCard(session, botResponse) and createThumbnailCard(session, botResponse).

The buildFactResponse(factToQuery, callback) function is called by the AzureFacts intent handler. This function passes the fact from the user (i.e. certifications) and a callback to the findFact(factToQuery, callback) function. The findFact function handles calling Cosmos DB, using MongoDB Node.JS Driver’s db.collection().findOne method. The function also returns a callback (gist).

LUIS

We will use LUIS to add a perceived degree of intelligence to our chatbot, helping it understand the domain-specific natural language model of our bot. If you have built Alexa Skills or Actions for Google Assitant, LUIS apps work almost identically. The concepts of Intents, Intent Handlers, Entities, and Utterances are universal to all three platforms.

Intents are how LUIS determines what a user wants to do. LUIS will parse user utterances, understand the user’s intent, and pass that intent onto our chatbot, to be handled by the proper intent handler. The bot will then respond accordingly to that intent — with a greeting, with the requested fact, or by providing help.

chatbot-01-intents

Entities

LUIS describes an entity as being like a variable, used to capture and pass important information. We will start by defining our AzureFacts Intent’s Facts Entities. The Facts entities represent each type of fact a user might request. The requested fact is extracted from the user’s utterances and passed to the chatbot. LUIS allows us to import entities as JSON. I have included a set of Facts entities to import, in the azure-facts-entities.json file, within the project on GitHub (gist).

Each entity includes a primary canonical form, as well as possible synonyms the user may utter in their invocation. If the user utters a synonym, LUIS will understand the intent and pass the canonical form of the entity to the chatbot. For example, if we said ‘tell me about Azure AKS,’ LUIS understands the phrasing, identifies the correct intent, AzureFacts intent, substitutes the synonym, ‘AKS’, with the canonical form of the Facts entity, ‘kubernetes’, and passes the top scoring intent to be handled and the value ‘kubernetes’ to our bot. The bot would then query for the document associated with ‘kubernetes’ in Cosmos DB, and return a response.

chatbot-04-entities

Utterances

Once we have created and associated our Facts entities with our AzureFacts intent, we need to input a few possible phrases a user may utter to invoke our chatbot. Microsoft actually recommends not coding too many utterance examples, as part of their best practices. Below you see an elementary list of possible phrases associated with the AzureFacts intent. You also see the blue highlighted word, ‘Facts’ in each utterance, a reference to the Facts entities. LUIS understands that this position in the phrasing represents a Facts entity value.

chatbot-02-azure-intent

Patterns

Patterns, according to Microsoft, are designed to improve the accuracy of LUIS, when several utterances are very similar. By providing a pattern for the utterances, LUIS can have higher confidence in the predictions.

chatbot-03-patterns

The topic of training your LUIS app is well beyond the scope of this post. Microsoft has an excellent series of articles, I strongly suggest reading. They will greatly assist in improving the accuracy of your LUIS chatbot.

chatbot-05-training

Once you have completed building and training your intents, entities, phrases, and other items, you must publish your LUIS app for it to be accessed from your chatbot. Publish allows LUIS to be queried from an HTTP endpoint. The LUIS interface will enable you to publish both a Staging and a Production copy of the app. For brevity, I published directly to Production. If you recall our chatbot’s application settings, earlier, the settings include a luisAppIdluisAppKey, and a luisAppIdHostName. Together these form the HTTP endpoint, LuisModelUrl, through which the chatbot will access the LUIS app.

chatbot-06-publish

Using the LUIS API endpoint, we can test our LUIS app, independent of our chatbot. This can be very useful for troubleshooting bot issues. Below, we see an example utterance of ‘tell me about Azure Functions.’ LUIS has correctly deduced the user intent, assigning the AzureFacts intent with the highest prediction score. LUIS also identified the correct Entity, ‘functions,’ which it would typically return to the chatbot.

chatbot-07-luis-endpoint.png

Deployment

With our chatbot developed and our LUIS app built, trained, and published, we can deploy our bot to the Azure platform. There are a few ways to deploy our chatbot, depending on your platform and language choice. I chose to use the DevOps practice of Continuous Deployment, offered in the Azure Bot Management console.

chatbot-14-gitflow

With Continuous Deployment, every time we commit a change to GitHub, a webhook fires, and my chatbot is deployed to the Azure platform. If you have high confidence in your changes through testing, you could choose to commit and deploy directly.

chatbot-08-publish.png

Alternately, you might choose a safer approach, using feature branch or PR requests. In which case your chatbot will be deployed upon a successful merge of the feature branch or PR request to master.

Manual Testing

Azure provides the ability to test your bot, from the Azure portal. Using the Bot Management Test in Web Chat console, you can test your bot using the Web Chat channel. We will talk about different kinds of channel later in the post. This is an easy and quick way to manually test your chatbot.

chatbot-17-testing

For more sophisticated, automated testing of your chatbot, there are a handful of frameworks available, including bot-tester, which integrates with mocha and chai. Stuart Harrison published an excellent article on testing with the bot-tester framework, Building a test-driven chatbot for the Microsoft Bot Framework in Node.js.

Log Streaming

As part of testing and troubleshooting our chatbot in Production, we will need to review application logs occasionally. Azure offers their Log streaming feature. To access log streams, you must turn on application logging and chose a log level, in the Azure Monitoring Diagnostic logs console, which is off by default.

chatbot-18-log-stream.png

Once Application logging is active, you may review logs in the Monitoring Log stream console. Log streaming will automatically be turned off in 12 hours and times our after 30 minutes of inactivity. I personally find application logging and access to logs, more difficult and far less intuitive on the Azure platform, than on AWS or Google Cloud.

chatbot-13-log-stream-debugging

Metrics

As part of testing and eventually monitoring our chatbot in Production, the Azure App Service Overview console provides basic telemetry about the App Service, on which the bot is running. With Azure Application Insights, we can drill down into finer-grain application telemetry.

chatbot-11-app-service-overview

Web Integration

With our chatbot built, trained, deployed and tested, we can integrate it with multiple channels. Channels represent all how our users might interact with our chatbot, Web Chat, Slack, Skype, Facebook Messager, and so forth. Microsoft describes channels as a connection between the Bot Framework and communication apps. For this post, we will look at two channels, Web Chat and Slack.

chatbot-40-channels

Enabling Web Chat is probably the easiest of the channels. When you create a bot with Bot Service, the Web Chat channel is automatically configured for you. You used it to test your bot, earlier in the post. Displaying your chatbot through the Web Chat channel, embedded in your website, requires a secret, for which, you have two options. Option one is to keep your secret hidden, exchange your secret for a token, and generate the secret. Option two, according to Microsoft, is to embed the web chat control in your website using the secret. This method will allow other developers to easily embed your bot into their websites.

chatbot-61-web.png

Embedding Web Chat in your website allows your website users to interact directly with your chatbot. Shown below, I have embedded our chatbot’s Web Chat channel in my website. It will enable a user to interact with the bot, independent of the website’s primary content. Here, a user could ask for more information on a particular topic they found of interest in the article, such as Azure Kubernetes Service (AKS). The Web Chat window is collapsible when not in use.

chatbot-60-web

The Web Chat is embedded using an HTML iframe tag. The HTML code to embed the Web Chat channel is included in the Web Chat Channel Configuration console, shown above. I found an effective piece of JavaScript code by Anthony Cu, in his post, Embed the Bot Framework Web Chat as a Collapsible Window. I modified Anthony’s code to fit my own website, moving the collapsable Web Chat iframe to the bottom right corner and adjusting the dimensions of the frame, as not to intrude on the site’s central content area. I’ve included the code and a simulated HTML page in the GitHub project.

Slack Integration

To integrate your chatbot with the Slack channel, I will assume you have an existing Slack Workspace with sufficient admin rights to create and deploy Slack Apps to that Workspace.

To integrate our chatbot with Slack, we need to create a new Bot App for Slack. The Slack App will be configured to interact with our deployed bot on Azure securely. Microsoft has an excellent set of easy-to-follow documentation on connecting a bot to Slack. I am only going to summarize the steps here.

chatbot-53-slack-bot.png

Once your Slack App is created, the Slack App contains a set of credentials.

chatbot-49-slack-app.png

Those Slack App credentials are then shared with the chatbot, in the Azure Slack Channel integration console. This ensures secure communication between your chatbot and your Slack App.

chatbot-48-slack-app

Part of creating your Slack App, is configuring Event Subscriptions. The Microsoft documentation outlines six bot events that need to be configured. By subscribing to bot events, your app will be notified of user activities at the URL you specify.

chatbot-46-slack-app

You also configure a Bot User. Adding a Bot User allows you to assign a username for your bot and choose whether it is always shown as online. This is the username you will see within your Slack app.

chatbot-43-slack-app

Once the Slack App is built, credentials are exchanged, events are configured, and the Bot User is created, you finish by authorizing your Slack App to interact with users within your Slack Workspace. Below I am authorizing the Azure Tech Facts Bot Slack App to interact with users in my Programmatic Ponderings Slack Workspace.

chatbot-45-slack-app.png

Below we see the Azure Tech Facts Bot Slack App deployed to my Programmatic Ponderings Slack Workspace. The Slack App is shown in the Slack for Mac desktop app.

chatbot-50-slack-bot

Similarly, we see the same Azure Tech Facts Bot Slack App being used in the Slack iOS App.

Mobile Skype3.png

Conclusion

In this brief post, we saw how to develop a Machine learning-based LUIS-enabled chatbot using the Azure Bot Service and the BotBuilder SDK. We enhanced the bot’s functionality with two of Azure’s Cloud services, Cosmos DB and Blob Storage. Once built, we integrated our chatbot with the Web Chat and Slack channels.

This was a very simple demonstration of a LUIS chatbot. The true power of intelligent bots comes from further integrating bots with Azure AI and Machine Learning Services, such as Azure’s Cognitive Services. Additionally, Azure cloud platform offers other more traditional cloud services, in addition to Cosmos DB and Blob Storage, to extend the feature and functionally of bots, such as messaging, IoT, Azure serverless Functions, and AKS-based microservices.

Azure is a trademark of Microsoft
The image in the background of Azure icon copyright: kran77 / 123RF Stock Photo

All opinions expressed in this post are my own and not necessarily the views of my current or past employers, their clients, or Microsoft.

, , , , , , , , ,

3 Comments

Deploying Spring Boot Apps to AWS with Netflix Nebula and Spinnaker: Part 2 of 2

In Part One of this post, we examined enterprise deployment tools and introduced two of Netflix’s open-source deployment tools, the Nebula Gradle plugins, and Spinnaker. In Part Two, we will deploy a production-ready Spring Boot application, the Election microservice, to multiple Amazon EC2 instances, behind an Elastic Load Balancer (ELB). We will use a fully automated DevOps workflow. The build, test, package, bake, deploy process will be handled by the Netflix Nebula Gradle Linux Packaging Plugin, Jenkins, and Spinnaker. The high-level process will involve the following steps:

  • Configure Gradle to build a production-ready fully executable application for Unix systems (executable JAR)
  • Using deb-s3 and GPG Suite, create a secure, signed APT (Debian) repository on Amazon S3
  • Using Jenkins and the Netflix Nebula plugin, build a Debian package, containing the executable JAR and configuration files
  • Using Jenkins and deb-s3, publish the package to the S3-based APT repository
  • Using Spinnaker (HashiCorp Packer under the covers), bake an Ubuntu Amazon Machine Image (AMI), replete with the executable JAR installed from the Debian package
  • Deploy an auto-scaling set of Amazon EC2 instances from the baked AMI, behind an ELB, running the Spring Boot application using both the Red/Black and Highlander deployment strategies
  • Be able to repeat the entire automated build, test, package, bake, deploy process, triggered by a new code push to GitHub

The overall build, test, package, bake, deploy process will look as follows.

DebianPackageWorkflow12.png

DevOps Architecture

Spinnaker’s modern architecture is comprised of several independent microservices. The codebase is written in Java and Groovy. It leverages the Spring Boot framework¹. Spinnaker’s configuration, startup, updates, and rollbacks are centrally managed by Halyard. Halyard provides a single point of contact for command line interaction with Spinnaker’s microservices.

Spinnaker can be installed on most private or public infrastructure, either containerized or virtualized. Spinnaker has links to a number of Quickstart installations on their website. For this demonstration, I deployed and configured Spinnaker on Azure, starting with one of the Azure Spinnaker quick-start ARM templates. The template provisions all the necessary Azure resources. For better performance, I chose upgraded the default VM to a larger Standard D4 v3, which contains 4 vCPUs and 16 GB of memory. I would recommend at least 2 vCPUs and 8 GB of memory at a minimum for Spinnaker.

Another Azure VM, in the same virtual network as the Spinnaker VM, already hosts Jenkins, SonarQube, and Nexus Repository OSS.

From Spinnaker on Azure, Debian Packages are uploaded to the APT package repository on AWS S3. Spinnaker also bakes Amazon Machine Images (AMI) on AWS. Spinnaker provisions the AWS resources, including EC2 instances, Load Balancers, Auto Scaling Groups, Launch Configurations, and Security Groups. The only resources you need on AWS to get started with Spinnaker are a VPC and Subnets. There are some minor, yet critical prerequisites for naming your VPC and Subnets.

Other external tools include GitHub for source control and Slack for notifications. I have built and managed everything from a Mac, however, all tools are platform agnostic. The Spring Boot application was developed in JetBrains IntelliJ.

Spinnaker Architecture 2.png

Source Code

All source code for this post can be found on GitHub. The project’s README file contains a list of the Election service’s endpoints.

Code samples in this post are displayed as Gists, which may not display correctly on some mobile and social media browsers. Links to gists are also provided.

APT Repository

After setting up Spinnaker on Azure, I created an APT repository on Amazon S3, using the instructions provided by Netflix, in their Code Lab, An Introduction to Spinnaker: Hello Deployment. The setup involves creating an Amazon S3 bucket to serve as an APT (Debian) repository, creating a GPG key for signing, and using deb-s3 to manage the repository. The Code Lab also uses Aptly, a great tool, which I skipped for brevity.

spin19

GPG Key

On the Mac, I used GPG Suite to create a GPG (GNU Privacy Guard or GnuPG) automatic signing key for my APT repository. The key is required by Spinnaker to verify the Debian packages in the repository, before installation.

The Ruby Gem, deb-s3, makes management of the Debian packages easy and automatable with Jenkins. Jenkins uploads the Debian packages, using a deb-s3 command, such as the following (gist). In this post, Jenkins calls the command from the shell script, upload-deb-package.sh, which is included in the GitHub project.

The Jenkins user requires access to the signing key, to build and upload the Debian packages. I created my GPG key on my Mac, securely copied the key to my Ubuntu-based Jenkins VM, and then imported the key for the Jenkins user. You could also create your key on Ubuntu, directly. Make sure you backup your private key in a secure location!

Nebula Packaging Plugin

Next, I set up a Gradle task in my build.gradle file to build my Debian packages using the Netflix Nebula Gradle Linux Packaging Plugin. Although Debian packaging tasks could become complex for larger application installations, this task for this post is pretty simple. I used many of the best-practices suggested by Spring for Production-grade deployments. The best-practices guide recommends file location, file modes, and file user and group ownership. I create the JAR as a fully executable JAR, meaning it is started like any other executable and does not have to be started with the standard java -jar command.

In the task, shown below (gist), the JAR and the external configuration file (optional) are copied to specific locations during the deployment and symlinked, as required. I used the older SysVInit system (init.d) to enable the application to automatically starts on boot. You should probably use systemctl for your services with Ubuntu 16.04.

You can use the ar (archive) command (i.e., ar -x spring-postgresql-demo_4.5.0_all.deb), to extract and inspect the structure of a Debian package. The data.tar.gz file, displayed below in Atom, shows the final package structure.

spin47.png

Base AMI

Next, I baked a base AMI for Spinnaker to use. This base AMI is used by Spinnaker to bake (re-bake) the final AMI(s) used for provisioning the EC2 instances, containing the Spring Boot Application. The Spinnaker base AMI is built from another base AMI, the official Ubuntu 16.04 LTS image. I installed the OpenJDK 8 package on the AMI, which is required to run the Java-based Election service. Lastly and critically, I added information about the location of my S3-based APT Debian package repository to the list of configured APT data sources, and the GPG key required for package verification. This information and key will be used later by Spinnaker to bake AMIʼs, using this base AMI. The set-up script, base_ubuntu_ami_setup.sh, which is included in the GitHub project.

Jenkins

This post uses a single Jenkins CI/CD pipeline. Using a Webhook, the pipeline is automatically triggered by every git push to the GitHub project. The pipeline pulls the source code, builds the application, and performs unit-tests and static code analysis with SonarQube. If the build succeeds and the tests pass, the build artifact (JAR file) is bundled into a Debian package using the Nebula Packaging plugin, uploaded to the S3 APT repository using s3-deb, and archived locally for Spinnaker to reference. Once the pipeline is completed, on success or on failure, a Slack notification is sent. The Jenkinsfile, used for this post is available in the project on Github.

Below is a traditional Jenkins view of the CI/CD pipeline, with links to unit test reports, SonarQube results, build artifacts, and GitHub source code.

spin01

Below is the same pipeline viewed using the Jenkins Blue Ocean plugin.

spin02

It is important to perform sufficient testing before building the Debian package. You donʼt want to bake an AMI and deploy EC2 instances, at a cost, before finding out the application has bugs.

spin03

Spinnaker Setup

First, I set up a new Spinnaker Slack channel and a custom bot user. Spinnaker details the Slack set up in their Notifications and Events Guide. You can configure what type of Spinnaker events trigger Slack notifications.

spin46.png

AWS Spinnaker User

Next, I added the required Spinnaker User, Policy, and Roles to AWS. Spinnaker uses this access to query and provision infrastructure on your behalf. The Spinnaker User requires Power User level access to perform all their necessary tasks. AWS IAM set up is detailed by Spinnaker in their Cloud Providers Setup for AWS. They also describe the setup of other cloud providers. You need to be reasonably familiar with AWS IAM, including the PassRole permission to set up this part. As part of the setup, you enable AWS for Spinnaker and add your AWS account using the Halyard interface.

spin45

Spinnaker Security Groups

Next, I set up two Spinnaker Security Groups, corresponding to two AWS Security Groups, one for the load balancer and one for the Election service. The load balancer security group exposes port 80, and the Election service security group exposes port 8080.

spin36

Spinnaker Load Balancer

Next, I created a Spinnaker Load Balancer, corresponding to an Amazon Classic Load Balancer. The Load Balancer will load-balance the Election service EC2 instances. Below you see a Load Balancer, balancing a pair of active EC2 instances, the result of a Red/Black deployment.

spin37

Spinnaker can currently create both AWS Classic Load Balancers as well as Application Load Balancers (ALB).

spin25

Spinnaker Pipeline

This post uses a single, basic Spinnaker Pipeline. The pipeline bakes a new AMI from the Debian package generated by the Jenkins pipeline. After a manual approval stage, Spinnaker deploys a set of EC2 instances, behind the Load Balancer, which contains the latest version of the Election service. Spinnaker finishes the pipeline by sending a Slack notification.

spin26

Jenkins Integration

The pipeline is triggered by the successful completion of the Jenkins pipeline. This is set in the Configuration stage of the pipeline. The integration with Jenkins is managed through Spinnaker’s Igor service.

spin22.png

Bake Stage

Next, in the Bake stage, Spinnaker bakes a new AMI, containing the Debian package generated by the Jenkins pipeline. The stageʼs configuration contains the package name to reference.

spin29

The stageʼs configuration also includes a reference to which Base AMI to use, to bake the new AMIs. Here I have used the AMI ID of the base Spinnaker AMI, I created previously.

spin27

Deploy Stage

Next, the Deploy stage deploys the Election service, running on EC2 instances, provisioned from the new AMI, which was baked in the last stage. To configure the Deploy stage, you define a Spinnaker Server Group. According to Spinnaker, the Server Group identifies the deployable artifact, VM image type, the number of instances, autoscaling policies, metadata, Load Balancer, and a Security Group.

spin32

The Server Group also defines the Deployment Strategy. Below, I chose the Red/Black Deployment Strategy (also referred to as Blue/Green). This strategy will disable, not terminate the active Server Group. If the new deployment fails, we can manually or automatically perform a Rollback to the previous, currently disabled Server Group.

spin11

Letʼs Start Baking!

With set up complete, letʼs kick off a git push, trigger and complete the Jenkins pipeline, and finally trigger the Spinnaker pipeline. Below we see the pipelineʼs Bake stage has been started. Spinnakerʼs UI lets us view the Bakery Details. The Bakery, provided by Spinnakerʼs Rosco service, bakes the AMIs. Rosco uses HashiCorp Packer to bake the AMIs, using standard Packer templates.

spin04

Below we see Spinnaker (Rosco/Packer) locating the Base Spinnaker AMI we configured in the Pipelineʼs Bake stage. Next, we see Spinnaker sshʼing into a new EC2 instance with a temporary keypair and Security Group and starting the Election service Debian package installation.

spin23

Continuing, we see the latest Debian package, derived from the Jenkins pipelineʼs archive, being pulled from the S3-based APT repo. The package is verified using the GPG key and then installed. Lastly, we see a new AMI is created, containing the deployed Election service, which was initially built and packaged by Jenkins. Note the AWS Resource Tags created by Spinnaker, as shown in the Bakery output.

spin24

The base Spinnaker AMI and the AMIs baked by Spinnaker are visible in the AWS Console. Note the naming conventions used by Spinnaker for the AMIs, the Source AMI used to build the new APIs, and the addition of the Tags, which we saw being applied in the Bakery output above. The use of Tags indirectly allows full traceability from the deployed EC2 instance all the way back to the original code commit to git by the Developer.

spin48.png

Red/Black Deployments

With the new AMI baked successfully, and a required manual approval, using a Manual Judgement type pipeline stage, we can now begin a Red/Black deployment to AWS.

spin07

Using the Server Group configuration in the Deploy stage, Spinnaker deploys two EC2 instances, behind the ELB.

spin08

Below, we see the successful results of the Red/Black deployment. The single Spinnaker Cluster contains two deployed Server Groups. One group, the previously active Server Group (RED), comprised of two EC2 instances, is disabled. The ‘RED’ EC2 instances are unregistered with the load balancer but still running. The new Server Group (BLACK), also comprised of two EC2 instances, is now active and registered with the Load Balancer. Spinnaker will spread EC2 instances evenly across all Availability Zones in the US East (N. Virginia) Region.

spin38

From the AWS Console, we can observe four running instances, though only two are registered with the load-balancer.

spin34

Here we see each deployed Server Group has a different Auto Scaling Group and Launch Configuration. Note the continued use of naming conventions by Spinnaker.

spin33

 There can be only one, Highlander!

Now, in the Deploy stage of the pipeline, we will switch the Server Groupʼs Strategy to Highlander. The Highlander strategy will, as you probably guessed by the name, destroy all other Server Groups in the Cluster. This is more typically used for lower environments, like Development or Test, where you are only interested in the next version of the application for testing. The Red/Black strategy is more applicable to Production, where you want the opportunity to quickly rollback to the previous deployment, if necessary.

spin12

Following a successful deployment, below, we now see the first two Server Groups have been terminated, and a third Server Group in the Cluster is active.

spin40.png

In the AWS Console, we can confirm the four previous EC2 instances have been successfully terminated as a result of the Highlander deployment strategy, and two new instances are running.

spin39

As well, the previous Auto Scaling Groups and Launch Configurations have been deleted from AWS by Spinnaker.

spin44.png

As expected, the Classic Load Balancer only contains the two most recent EC2 instances from the last Server Group deployed.

spin41

Confirming the Deployment

Using the DNS address of the load balancer, we can hit the Election service endpoints, on either of the EC2 instances. All API endpoints are listed in the Projectʼs README file. Below, from a web browser, we see the candidates resource returning candidate information, retrieved from the Electionʼs PostgreSQL RDS database Test instance.

spin42

Similarly, from Postman, we can hit the load balancer and get back election information from the elections resource, using an HTTP GET.

spin43.png

I intentionally left out a discussion of the service’s RDS database and how configuration management was handled with Spring Profiles and Spring Cloud Config. Both topics were out of scope for this post.

Conclusion

Although this was a brief, whirlwind overview of deployment tools, it shows the power of delivery tools like Spinnaker, when seamlessly combined with other tools, like Jenkins and the Nebula plugins. Together, these tools are capable of efficiently, repeatably, and securely deploying large numbers of containerized and non-containerized applications to a variety of private, public, and hybrid cloud infrastructure.

All opinions expressed in this post are my own and not necessarily the views of my current or past employers or their clients.

¹ Running Spinnaker on Compute Engine

, , , , , , , , , , , ,

1 Comment

Deploying Spring Boot Apps to AWS with Netflix Nebula and Spinnaker: Part 1 of 2

Listening to DevOps industry pundits, you might be convinced everyone is running containers in Production (or by now, serverless). Although containerization is growing at a phenomenal rate, several recent surveys¹ indicate less than 50% of enterprises are deploying containers in Production. Filter those results further with the fact, of those enterprises, only a small percentage of their total application portfolios are containerized, let alone in Production.

As a DevOps Consultant, I regularly work with corporations whose global portfolios are in the thousands of applications. Indeed, some percentage of their applications are containerized, with less running in Production. However, a majority of those applications, even those built on modern, light-weight, distributed architectures, are still being deployed to bare-metal and virtualized public cloud and private data center infrastructure, for a variety of reasons.

Enterprise Deployment

Due to the scale and complexity of application portfolios, many organizations have invested in enterprise deployment tools, either commercially available or developed in-house. The enterprise deployment tool’s primary objective is to standardize the process of securely, reliably, and repeatably packaging, publishing, and deploying both containerized and non-containerized applications to large fleets of virtual machines and bare-metal servers, across multiple, geographically dispersed data centers and cloud providers. Enterprise deployment tools are particularly common in tightly regulated and compliance-driven organizations, as well as organizations that have undertaken large amounts of M&A, resulting in vastly different application technology stacks.

Enterprise CI/CD/Release Workflow

Better-known examples of commercially available enterprise deployment tools include IBM UrbanCode Deploy (aka uDeploy), XebiaLabs XL Deploy, CA Automic Release Automation, Octopus Deploy, and Electric Cloud ElectricFlow. While commercial tools continue to gain market share³, many organizations are tightly coupled to their in-house solutions through years of use and fear of widespread process disruption, given current economic, security, compliance, and skills-gap sensitivities.

Deployment Tool Anatomy

Most Enterprise deployment tools are compatible with standard binary package types, including Debian (.deb) and Red Hat  (RPM) Package Manager (.rpm) packages for Linux, NuGet (.nupkg) packages for Windows, and Node Package Manager (.npm) and Bower for JavaScript. There are equivalent package types for other popular languages and formats, such as Go, Python, Ruby, SQL, Android, Objective-C, Swift, and Docker. Packages usually contain application metadata, a signature to ensure the integrity and/or authenticity², and a compressed payload.

Enterprise deployment tools are normally integrated with open-source packaging and publishing tools, such as Apache Maven, Apache Ivy/Ant, Gradle, NPMNuGet, BundlerPIP, and Docker.

Binary packages (and images), built with enterprise deployment tools, are typically stored in private, open-source or commercial binary (artifact) repositories, such as SpacewalkJFrog Artifactory, and Sonatype Nexus Repository. The latter two, Artifactory and Nexus, support a multitude of modern package types and repository structures, including Maven, NuGet, PyPI, NPM, Bower, Ruby Gems, CocoaPods, Puppet, Chef, and Docker.

Mature binary repositories provide many features in addition to package management, including role-based access control, vulnerability scanning, rich APIs, DevOps integration, and fault-tolerant, high-availability architectures.

Lastly, enterprise deployment tools generally rely on standard package management systems to retrieve and install cryptographically verifiable packages and images. These include YUM (Yellowdog Updater, Modified), APT (aptitude), APK (Alpine Linux), NuGet, Chocolatey, NPM, PIP, Bundler, and Docker. Packages are deployed directly to running infrastructure, or indirectly to intermediate deployable components as Amazon Machine Images (AMI), Google Compute Engine machine images, VMware machines, Docker Images, or CoreOS rkt.

Open-Source Alternative

One such enterprise with an extensive portfolio of both containerized and non-containerized applications is Netflix. To standardize their deployments to multiple types of cloud infrastructure, Netflix has developed several well-known open-source software (OSS) tools, including the Nebula Gradle plugins and Spinnaker. I discussed Spinnaker in my previous post, Managing Applications Across Multiple Kubernetes Environments with Istio, as an alternative to Jenkins for deploying container workloads to Kubernetes on Google (GKE).

As a leader in OSS, Netflix has documented their deployment process in several articles and presentations, including a post from 2016, ‘How We Build Code at Netflix.’ According to the article, the high-level process for deployment to Amazon EC2 instances involves the following steps:

  • Code is built and tested locally using Nebula
  • Changes are committed to a central git repository
  • Jenkins job executes Nebula, which builds, tests, and packages the application for deployment
  • Builds are “baked” into Amazon Machine Images (using Spinnaker)
  • Spinnaker pipelines are used to deploy and promote the code change

The Nebula plugins and Spinnaker leverage many underlying, open-source technologies, including Pivotal Spring, Java, Groovy, Gradle, Maven, Apache Commons, Redline RPM, HashiCorp Packer, Redis, HashiCorp Consul, Cassandra, and Apache Thrift.

Both the Nebula plugins and Spinnaker have been battle tested in Production by Netflix, as well as by many other industry leaders after Netflix open-sourced the tools in 2014 (Nebula) and 2015 (Spinnaker). Currently, there are approximately 20 Nebula Gradle plugins available on GitHub. Notable core-contributors in the development of Spinnaker include Google, Microsoft, Pivotal, Target, Veritas, and Oracle, to name a few. A sign of its success, Spinnaker currently has over 4,600 Stars on GitHub!

Part Two: Demonstration

In Part Two, we will deploy a production-ready Spring Boot application, the Election microservice, to multiple Amazon EC2 instances, behind an Elastic Load Balancer (ELB). We will use a fully automated DevOps workflow. The build, test, package, bake, deploy process will be handled by the Netflix Nebula Gradle Linux Packaging Plugin, Jenkins, and Spinnaker. The high-level process will involve the following steps:

  • Configure Gradle to build a production-ready fully executable application for Unix systems (executable JAR)
  • Using deb-s3 and GPG Suite, create a secure, signed APT (Debian) repository on Amazon S3
  • Using Jenkins and the Netflix Nebula plugin, build a Debian package, containing the executable JAR and configuration files
  • Using Jenkins and deb-s3, publish the package to the S3-based APT repository
  • Using Spinnaker (HashiCorp Packer under the covers), bake an Ubuntu Amazon Machine Image (AMI), replete with the executable JAR installed from the Debian package
  • Deploy an auto-scaling set of Amazon EC2 instances from the baked AMI, behind an ELB, running the Spring Boot application using both the Red/Black and Highlander deployment strategies
  • Be able to repeat the entire automated build, test, package, bake, deploy process, triggered by a new code push to GitHub

The overall build, test, package, bake, deploy process will look as follows.

DebianPackageWorkflow12

References

 

All opinions expressed in this post are my own and not necessarily the views of my current or past employers or their clients.

¹ Recent Surveys: ForresterPortworx,  Cloud Foundry Survey
² Courtesy Wikipedia – rpm
³ XebiaLabs Kicks Off 2017 with Triple-Digit Growth in Enterprise DevOps

, , , , , , , , , , , ,

1 Comment

Updating and Maintaining Gradle Project Dependencies

As a DevOps Consultant, I often encounter codebases that have not been properly kept up-to-date. Likewise, I’ve authored many open-source projects on GitHub, which I use for training, presentations, and articles. Those projects often sit dormant for months at a time, #myabandonware.

Poorly maintained and dormant projects often become brittle or break, as their dependencies and indirect dependencies continue to be updated. However, blindly updating project dependencies is often the quickest way to break, or further break an application. Ask me, I’ve given in to temptation and broken my fair share of applications as a result. Nonetheless, it is helpful to be able to quickly analyze a project’s dependencies and discover available updates. Defects, performance issues, and most importantly, security vulnerabilities, are often fixed with dependency updates.

For Node.js projects, I prefer David to discover dependency updates. I have other favorites for Ruby, .NET, and Python, including OWASP Dependency-Check, great for vulnerabilities. In a similar vein, for Gradle-based Java Spring projects, I recently discovered Ben Manes’ Gradle Versions Plugin, gradle-versions-plugin. The plugin is described as a ‘Gradle plugin to discover dependency updates’. The plugin’s GitHub project has over 1,350 stars! According to the plugin project’s README file, this plugin is similar to the Versions Maven Plugin. The project further indicates there are similar Gradle plugins available, including gradle-use-latest-versionsgradle-libraries-plugin, and gradle-update-notifier.

To try the Gradle Versions Plugin, I chose a recent Gradle-based Java Spring Boot API project. I added the plugin to the gradle.build file with a single line of code.

plugins {
  id 'com.github.ben-manes.versions' version '0.17.0'
}

By executing the single Gradle task, dependencyUpdates, the plugin generates a report detailing the status of all project’s dependencies, including plugins. The plugin includes a revision task property, which controls the resolution strategy of determining what constitutes the latest version of a dependency. The property supports three strategies: release, milestone (default), and integration (i.e. SNAPSHOT), which are detailed in the plugin project’s README file.

As expected, the plugin will properly resolve any variables. Using a variable is an efficient practice for setting the Spring Boot versions for multiple dependencies (i.e. springBootVersion).

ext {
    springBootVersion = '2.0.1.RELEASE'
}

dependencies {
    compile('com.h2database:h2:1.4.197')
    compile("io.springfox:springfox-swagger-ui:2.8.0")
    compile("io.springfox:springfox-swagger2:2.8.0")
    compile("org.liquibase:liquibase-core:3.5.5")
    compile("org.sonarsource.scanner.gradle:sonarqube-gradle-plugin:2.6.2")
    compile("org.springframework.boot:spring-boot-starter-actuator:${springBootVersion}")
    compile("org.springframework.boot:spring-boot-starter-data-jpa:${springBootVersion}")
    compile("org.springframework.boot:spring-boot-starter-data-rest:${springBootVersion}")
    compile("org.springframework.boot:spring-boot-starter-hateoas:${springBootVersion}")
    compile("org.springframework.boot:spring-boot-starter-web:${springBootVersion}")
    compileOnly('org.projectlombok:lombok:1.16.20')
    runtime("org.postgresql:postgresql:42.2.2")
    testCompile("org.springframework.boot:spring-boot-starter-test:${springBootVersion}")
}

My first run, using the default revision level, resulted in the following output. The report indicated three of my project’s dependencies were slightly out of date:

> Configure project :
Inferred project: spring-postgresql-demo, version: 4.3.0-dev.2.uncommitted+929c56e

> Task :dependencyUpdates
Failed to resolve ::apiElements
Failed to resolve ::implementation
Failed to resolve ::runtimeElements
Failed to resolve ::runtimeOnly
Failed to resolve ::testImplementation
Failed to resolve ::testRuntimeOnly

------------------------------------------------------------
: Project Dependency Updates (report to plain text file)
------------------------------------------------------------

The following dependencies are using the latest milestone version:
- com.github.ben-manes.versions:com.github.ben-manes.versions.gradle.plugin:0.17.0
- com.netflix.nebula:gradle-ospackage-plugin:4.9.0-rc.1
- com.h2database:h2:1.4.197
- io.spring.dependency-management:io.spring.dependency-management.gradle.plugin:1.0.5.RELEASE
- org.projectlombok:lombok:1.16.20
- com.netflix.nebula:nebula-release-plugin:6.3.3
- org.sonarqube:org.sonarqube.gradle.plugin:2.6.2
- org.springframework.boot:org.springframework.boot.gradle.plugin:2.0.1.RELEASE
- org.postgresql:postgresql:42.2.2
- org.sonarsource.scanner.gradle:sonarqube-gradle-plugin:2.6.2
- org.springframework.boot:spring-boot-starter-actuator:2.0.1.RELEASE
- org.springframework.boot:spring-boot-starter-data-jpa:2.0.1.RELEASE
- org.springframework.boot:spring-boot-starter-data-rest:2.0.1.RELEASE
- org.springframework.boot:spring-boot-starter-hateoas:2.0.1.RELEASE
- org.springframework.boot:spring-boot-starter-test:2.0.1.RELEASE
- org.springframework.boot:spring-boot-starter-web:2.0.1.RELEASE

The following dependencies have later milestone versions:
- org.liquibase:liquibase-core [3.5.5 -> 3.6.1]
- io.springfox:springfox-swagger-ui [2.8.0 -> 2.9.0]
- io.springfox:springfox-swagger2 [2.8.0 -> 2.9.0]

Generated report file build/dependencyUpdates/report.txt

After reading the release notes for the three available updates, and confident I had sufficient unit, smoke, and integration tests to validate any project changes, I manually updated the dependencies. Re-running the Gradle task generated the following abridged output.

------------------------------------------------------------
: Project Dependency Updates (report to plain text file)
------------------------------------------------------------

The following dependencies are using the latest milestone version:
- com.github.ben-manes.versions:com.github.ben-manes.versions.gradle.plugin:0.17.0
- com.netflix.nebula:gradle-ospackage-plugin:4.9.0-rc.1
- com.h2database:h2:1.4.197
- io.spring.dependency-management:io.spring.dependency-management.gradle.plugin:1.0.5.RELEASE
- org.liquibase:liquibase-core:3.6.1
- org.projectlombok:lombok:1.16.20
- com.netflix.nebula:nebula-release-plugin:6.3.3
- org.sonarqube:org.sonarqube.gradle.plugin:2.6.2
- org.springframework.boot:org.springframework.boot.gradle.plugin:2.0.1.RELEASE
- org.postgresql:postgresql:42.2.2
- org.sonarsource.scanner.gradle:sonarqube-gradle-plugin:2.6.2
- org.springframework.boot:spring-boot-starter-actuator:2.0.1.RELEASE
- org.springframework.boot:spring-boot-starter-data-jpa:2.0.1.RELEASE
- org.springframework.boot:spring-boot-starter-data-rest:2.0.1.RELEASE
- org.springframework.boot:spring-boot-starter-hateoas:2.0.1.RELEASE
- org.springframework.boot:spring-boot-starter-test:2.0.1.RELEASE
- org.springframework.boot:spring-boot-starter-web:2.0.1.RELEASE
- io.springfox:springfox-swagger-ui:2.9.0
- io.springfox:springfox-swagger2:2.9.0

Generated report file build/dependencyUpdates/report.txt

BUILD SUCCESSFUL in 3s
1 actionable task: 1 executed

After running a series of automated unit, smoke, and integration tests, to confirm no conflicts with the updates, I committed my changes to GitHub. The Gradle Versions Plugin is a simple and effective solution to Gradle dependency management.

All opinions expressed in this post are my own, and not necessarily the views of my current or past employers, or their clients.

Gradle logo courtesy Gradle.org, © Gradle Inc. 

, , , , , , ,

Leave a comment

Managing Applications Across Multiple Kubernetes Environments with Istio: Part 2

In this two-part post, we are exploring the creation of a GKE cluster, replete with the latest version of Istio, often referred to as IoK (Istio on Kubernetes). We will then deploy, perform integration testing, and promote an application across multiple environments within the cluster.

Part Two

In Part One of this post, we created a Kubernetes cluster on the Google Cloud Platform, installed Istio, provisioned a PostgreSQL database, and configured DNS for routing. Under the assumption that v1 of the Election microservice had already been released to Production, we deployed v1 to each of the three namespaces.

In Part Two of this post, we will learn how to utilize the advanced API testing capabilities of Postman and Newman to ensure v2 is ready for UAT and release to Production. We will deploy and perform integration testing of a new v2 of the Election microservice, locally on Kubernetes Minikube. Once confident v2 is functioning as intended, we will promote and test v2 across the dev, test, and uat namespaces.

Source Code

As a reminder, all source code for this post can be found on GitHub. The project’s README file contains a list of the Election microservice’s endpoints. To get started quickly, use one of the two following options (gist).

Code samples in this post are displayed as Gists, which may not display correctly on some mobile and social media browsers. Links to gists are also provided.

This project includes a kubernetes sub-directory, containing all the Kubernetes resource files and scripts necessary to recreate the example shown in the post.

Testing Locally with Minikube

Deploying to GKE, no matter how automated, takes time and resources, whether those resources are team members or just compute and system resources. Before deploying v2 of the Election service to the non-prod GKE cluster, we should ensure that it has been thoroughly tested locally. Local testing should include the following test criteria:

  1. Source code builds successfully
  2. All unit-tests pass
  3. A new Docker Image can be created from the build artifact
  4. The Service can be deployed to Kubernetes (Minikube)
  5. The deployed instance can connect to the database and execute the Liquibase changesets
  6. The deployed instance passes a minimal set of integration tests

Minikube gives us the ability to quickly iterate and test an application, as well as the Kubernetes and Istio resources required for its operation, before promoting to GKE. These resources include Kubernetes Namespaces, Secrets, Deployments, Services, Route Rules, and Istio Ingresses. Since Minikube is just that, a miniature version of our GKE cluster, we should be able to have a nearly one-to-one parity between the Kubernetes resources we apply locally and those applied to GKE. This post assumes you have the latest version of Minikube installed, and are familiar with its operation.

This project includes a minikube sub-directory, containing all the Kubernetes resource files and scripts necessary to recreate the Minikube deployment example shown in this post. The three included scripts are designed to be easily adapted to a CI/CD DevOps workflow. You may need to modify the scripts to match your environment’s configuration. Note this Minikube-deployed version of the Election service relies on the external Amazon RDS database instance.

Local Database Version

To eliminate the AWS costs, I have included a second, alternate version of the Minikube Kubernetes resource files, minikube_db_local This version deploys a single containerized PostgreSQL database instance to Minikube, as opposed to relying on the external Amazon RDS instance. Be aware, the database does not have persistent storage or an Istio sidecar proxy.

istio_100.png

Minikube Cluster

If you do not have a running Minikube cluster, create one with the minikube start command.

istio_081

Minikube allows you to use normal kubectl CLI commands to interact with the Minikube cluster. Using the kubectl get nodes command, we should see a single Minikube node running the latest Kubernetes v1.10.0.

istio_082

Istio on Minikube

Next, install Istio following Istio’s online installation instructions. A basic Istio installation on Minikube, without the additional add-ons, should only require a single Istio install script.

istio_083

If successful, you should observe a new istio-system namespace, containing the four main Istio components: istio-ca, istio-ingress, istio-mixer, and istio-pilot.

istio_084

Deploy v2 to Minikube

Next, create a Minikube Development environment, consisting of a dev Namespace, Istio Ingress, and Secret, using the part1-create-environment.sh script. Next, deploy v2 of the Election service to thedev Namespace, along with an associated Route Rule, using the part2-deploy-v2.sh script. One v2 instance should be sufficient to satisfy the testing requirements.

istio_085

Access to v2 of the Election service on Minikube is a bit different than with GKE. When routing external HTTP requests, there is no load balancer, no external public IP address, and no public DNS or subdomains. To access the single instance of v2 running on Minikube, we use the local IP address of the Minikube cluster, obtained with the minikube ip command. The access port required is the Node Port (nodePort) of the istio-ingress Service. The command is shown below (gist) and included in the part3-smoke-test.sh script.

The second part of our HTTP request routing is the same as with GKE, relying on an Istio Route Rules. The /v2/ sub-collection resource in the HTTP request URL is rewritten and routed to the v2 election Pod by the Route Rule. To confirm v2 of the Election service is running and addressable, curl the /v2/actuator/health endpoint. Spring Actuator’s /health endpoint is frequently used at the end of a CI/CD server’s deployment pipeline to confirm success. The Spring Boot application can take a few minutes to fully start up and be responsive to requests, depending on the speed of your local machine.

istio_093.png

Using the Kubernetes Dashboard, we should see our deployment of the single Election service Pod is running successfully in Minikube’s dev namespace.

istio_087

Once deployed, we run a battery of integration tests to confirm that the new v2 functionality is working as intended before deploying to GKE. In the next section of this post, we will explore the process creating and managing Postman Collections and Postman Environments, and how to automate those Collections of tests with Newman and Jenkins.

istio_088

Integration Testing

The typical reason an application is deployed to lower environments, prior to Production, is to perform application testing. Although definitions vary across organizations, testing commonly includes some or all of the following types: Integration Testing, Functional Testing, System Testing, Stress or Load Testing, Performance Testing, Security Testing, Usability Testing, Acceptance Testing, Regression Testing, Alpha and Beta Testing, and End-to-End Testing. Test teams may also refer to other testing forms, such as Whitebox (Glassbox), Blackbox Testing, Smoke, Validation, or Sanity Testing, and Happy Path Testing.

The site, softwaretestinghelp.com, defines integration testing as, ‘testing of all integrated modules to verify the combined functionality after integration is termed so. Modules are typically code modules, individual applications, client and server applications on a network, etc. This type of testing is especially relevant to client/server and distributed systems.

In this post, we are concerned that our integrated modules are functioning cohesively, primarily the Election service, Amazon RDS database, DNS, Istio Ingress, Route Rules, and the Istio sidecar Proxy. Unlike Unit Testing and Static Code Analysis (SCA), which is done pre-deployment, integration testing requires an application to be deployed and running in an environment.

Postman

I have chosen Postman, along with Newman, to execute a Collection of integration tests before promoting to the next environment. The integration tests confirm the deployed application’s name and version. The integration tests then perform a series of HTTP GET, POST, PUT, PATCH, and DELETE actions against the service’s resources. The integration tests verify a successful HTTP response code is returned, based on the type of request made.

istio_055

Postman tests are written in JavaScript, similar to other popular, modern testing frameworks. Postman offers advanced features such as test-chaining. Tests can be chained together through the use of environment variables to store response values and pass them onto to other tests. Values shared between tests are also stored in the Postman Environments. Below, we store the ID of the new candidate, the result of an HTTP POST to the /candidates endpoint. We then use the stored candidate ID in proceeding HTTP GET, PUT, and PATCH test requests to the same /candidates endpoint.

istio_056

Environment-specific variables, such as the resource host, port, and environment sub-collection resource, are abstracted and stored as key/value pairs within Postman Environments, and called through variables in the request URL and within the tests. Thus, the same Postman Collection of tests may be run against multiple environments using different Postman Environments.

istio_057

Postman Runner allows us to run multiple iterations of our Collection. We also have the option to build in delays between tests. Lastly, Postman Runner can load external JSON and CSV formatted test data, which is beyond the scope of this post.

istio_058

Postman contains a simple Run Summary UI for viewing test results.

istio_060

Test Automation

To support running tests from the command line, Postman provides Newman. According to Postman, Newman is a command-line collection runner for Postman. Newman offers the same functionality as Postman’s Collection Runner, all part of the newman CLI. Newman is Node.js module, installed globally as an npm package, npm install newman --global.

Typically, Development and Testing teams compose Postman Collections and define Postman Environments, locally. Teams run their tests locally in Postman, during their development cycle. Then, those same Postman Collections are executed from the command line, or more commonly as part of a CI/CD pipeline, such as with Jenkins.

Below, the same Collection of integration tests ran in the Postman Runner UI, are run from the command line, using Newman.

istio_061

Jenkins

Without a doubt, Jenkins is the leading open-source CI/CD automation server. The building, testing, publishing, and deployment of microservices to Kubernetes is relatively easy with Jenkins. Generally, you would build, unit-test, push a new Docker image, and then deploy your application to Kubernetes using a series of CI/CD pipelines. Below, we see examples of these pipelines using Jenkins Blue Ocean, starting with a continuous integration pipeline, which includes unit-testing and Static Code Analysis (SCA) with SonarQube.

istio_108

Followed by a pipeline to build the Docker Image, using the build artifact from the above pipeline, and pushes the Image to Docker Hub.

istio_109

The third pipeline that demonstrates building the three Kubernetes environments and deploying v1 of the Election service to the dev namespace. This pipeline is just for demonstration purposes; typically, you would separate these functions.

istio_110

Spinnaker

An alternative to Jenkins for the deployment of microservices is Spinnaker, created by Netflix. According to Netflix, ‘Spinnaker is an open source, multi-cloud continuous delivery platform for releasing software changes with high velocity and confidence.’ Spinnaker is designed to integrate easily with Jenkins, dividing responsibilities for continuous integration and delivery, with deployment. Below, Spinnaker two sample deployment pipelines, similar to Jenkins, for deploying v1 and v2 of the Election service to the non-prod GKE cluster.

spin_07

Below, Spinnaker has deployed v2 of the Election service to dev using a Highlander deployment strategy. Subsequently, Spinnaker has deployed v2 to test using a Red/Black deployment strategy, leaving the previously released v1 Server Group in place, in case a rollback is required.

spin_08

Once Spinnaker is has completed the deployment tasks, the Postman Collections of smoke and integration tests are executed by Newman, as part of another Jenkins CI/CD pipeline.

istio_101B.png

In this pipeline, a set of basic smoke tests is run first to ensure the new deployment is running properly, and then the integration tests are executed.

istio_102

In this simple example, we have a three-stage pipeline created from a Jenkinsfile (gist).

Test Results

Newman offers several options for displaying test results. For easy integration with Jenkins, Newman results can be delivered in a format that can be displayed as JUnit test reports. The JUnit test report format, XML, is a popular method of standardizing test results from different testing tools. Below is a truncated example of a test report file (gist).

Translating Newman test results to JUnit reports allows the percentage of test cases successfully executed, to be tracked over multiple deployments, a universal testing metric. Below we see the JUnit Test Reports Test Result Trend graph for a series of test runs.

istio_103

Deploying to Development

Development environments typically have a rapid turnover of application versions. Many teams use their Development environment as a continuous integration environment, where every commit that successfully builds and passes all unit tests, is deployed. The purpose of the CI deployments is to ensure build artifacts will successfully deploy through the CI/CD pipeline, start properly, and pass a basic set of smoke tests.

Other teams use the Development environments as an extension of their local Minikube environment. The Development environment will possess some or all of the required external integration points, which the Developer’s local Minikube environment may not. The goal of the Development environment is to help Developers ensure their application is functioning correctly and is ready for the Test teams to evaluate, prior to promotion to the Test environment.

Some external integration points, such as external payment gateways, customer relationship management (CRM) systems, content management systems (CMS), or data analytics engines, are often stubbed-out in lower environments. Generally, third-party providers only offer a limited number of parallel non-Production integration environments. While an application may pass through several non-prod environments, testing against all external integration points will only occur in one or two of those environments.

With v2 of the Election service ready for testing on GKE, we deploy it to the GKE cluster’s dev namespace using the part4a-deploy-v2-dev.sh script. We will also delete the previous v1 version of the Election service. Similar to the v1 deployment script, the v2 scripts perform a kube-inject command, which manually injects the Istio sidecar proxy alongside the Election service, into each election v2 Pod. The deployment script also deploys an alternate Istio Route Rule, which routes requests to api.dev.voter-demo.com/v2/* resource of v2 of the Election service.

istio_054.png

Once deployed, we run our Postman Collection of integration tests with Newman or as part of a CI/CD pipeline. In the Development environment, we may choose to run a limited set of tests for the sake of expediency, or because not all external integration points are accessible.

Promotion to Test

With local Minikube and Development environment testing complete, we promote and deploy v2 of the Election service to the Test environment, using the part4b-deploy-v2-test.sh script. In Test, we will not delete v1 of the Election service.

istio_062

Often, an organization will maintain a running copy of all versions of an application currently deployed to Production, in a lower environment. Let’s look at two scenarios where this is common. First, v1 of the Election service has an issue in Production, which needs to be confirmed and may require a hot-fix by the Development team. Validation of the v1 Production bug is often done in a lower environment. The second scenario for having both versions running in an environment is when v1 and v2 both need to co-exist in Production. Organizations frequently support multiple API versions. Cutting over an entire API user-base to a new API version is often completed over a series of releases, and requires careful coordination with API consumers.

Testing All Versions

An essential role of integration testing should be to confirm that both versions of the Election service are functioning correctly, while simultaneously running in the same namespace. For example, we want to verify traffic is routed correctly, based on the HTTP request URL, to the correct version. Another common test scenario is database schema changes. Suppose we make what we believe are backward-compatible database changes to v2 of the Election service. We should be able to prove, through testing, that both the old and new versions function correctly against the latest version of the database schema.

There are different automation strategies that could be employed to test multiple versions of an application without creating separate Collections and Environments. A simple solution would be to templatize the Environments file, and then programmatically change the Postman Environment’s version variable injected from a pipeline parameter (abridged environment file shown below).

istio_095.png

Once initial automated integration testing is complete, Test teams will typically execute additional forms of application testing if necessary, before signing off for UAT and Performance Testing to begin.

User-Acceptance Testing

With testing in the Test environments completed, we continue onto UAT. The term UAT suggest that a set of actual end-users (API consumers) of the Election service will perform their own testing. Frequently, UAT is only done for a short, fixed period of time, often with a specialized team of Testers. Issues experienced during UAT can be expensive and impact the ability to release an application to Production on-time if sign-off is delayed.

After deploying v2 of the Election service to UAT, and before opening it up to the UAT team, we would naturally want to repeat the same integration testing process we conducted in the previous Test environment. We must ensure that v2 is functioning as expected before our end-users begin their testing. This is where leveraging a tool like Jenkins makes automated integration testing more manageable and repeatable. One strategy would be to duplicate our existing Development and Test pipelines, and re-target the new pipeline to call v2 of the Election service in UAT.

istio_104.png

Again, in a JUnit report format, we can examine individual results through the Jenkins Console.

istio_105.png

We can also examine individual results from each test run using a specific build’s Console Output.

istio_106.png

Testing and Instrumentation

To fully evaluate the integration test results, you must look beyond just the percentage of test cases executed successfully. It makes little sense to release a new version of an application if it passes all functional tests, but significantly increases client response times, unnecessarily increases memory consumption or wastes other compute resources, or is grossly inefficient in the number of calls it makes to the database or third-party dependencies. Often times, integration testing uncovers potential performance bottlenecks that are incorporated into performance test plans.

Critical intelligence about the performance of the application can only be obtained through the use of logging and metrics collection and instrumentation. Istio provides this telemetry out-of-the-box with Zipkin, Jaeger, Service Graph, Fluentd, Prometheus, and Grafana. In the included Grafana Istio Dashboard below, we see the performance of v1 of the Election service, under test, in the Test environment. We can compare request and response payload size and timing, as well as request and response times to external integration points, such as our Amazon RDS database. We are able to observe the impact of individual test requests on the application and all its integration points.

istio_067

As part of integration testing, we should monitor the Amazon RDS CloudWatch metrics. CloudWatch allows us to evaluate critical database performance metrics, such as the number of concurrent database connections, CPU utilization, read and write IOPS, Memory consumption, and disk storage requirements.

istio_043

A discussion of metrics starts moving us toward load and performance testing against Production service-level agreements (SLAs). Using a similar approach to integration testing, with load and performance testing, we should be able to accurately estimate the sizing requirements our new application for Production. Load and Performance Testing helps answer questions like the type and size of compute resources are required for our GKE Production cluster and for our Amazon RDS database, or how many compute nodes and number of instances (Pods) are necessary to support the expected user-load.

All opinions expressed in this post are my own, and not necessarily the views of my current or past employers, or their clients.

, , , , , , , , , , , , , ,

2 Comments

Managing Applications Across Multiple Kubernetes Environments with Istio: Part 1

In the following two-part post, we will explore the creation of a GKE cluster, replete with the latest version of Istio, often referred to as IoK (Istio on Kubernetes). We will then deploy, perform integration testing, and promote an application across multiple environments within the cluster.

Application Environment Management

Container orchestration engines, such as Kubernetes, have revolutionized the deployment and management of microservice-based architectures. Combined with a Service Mesh, such as Istio, Kubernetes provides a secure, instrumented, enterprise-grade platform for modern, distributed applications.

One of many challenges with any platform, even one built on Kubernetes, is managing multiple application environments. Whether applications run on bare-metal, virtual machines, or within containers, deploying to and managing multiple application environments increases operational complexity.

As Agile software development practices continue to increase within organizations, the need for multiple, ephemeral, on-demand environments also grows. Traditional environments that were once only composed of Development, Test, and Production, have expanded in enterprises to include a dozen or more environments, to support the many stages of the modern software development lifecycle. Current application environments often include Continous Integration and Delivery (CI), Sandbox, Development, Integration Testing (QA), User Acceptance Testing (UAT), Staging, Performance, Production, Disaster Recovery (DR), and Hotfix. Each environment requiring its own compute, security, networking, configuration, and corresponding dependencies, such as databases and message queues.

Environments and Kubernetes

There are various infrastructure architectural patterns employed by Operations and DevOps teams to provide Kubernetes-based application environments to Development teams. One pattern consists of separate physical Kubernetes clusters. Separate clusters provide a high level of isolation. Isolation offers many advantages, including increased performance and security, the ability to tune each cluster’s compute resources to meet differing SLAs, and ensuring a reduced blast radius when things go terribly wrong. Conversely, separate clusters often result in increased infrastructure costs and operational overhead, and complex deployment strategies. This pattern is often seen in heavily regulated, compliance-driven organizations, where security, auditability, and separation of duties are paramount.

Kube Clusters Diagram F15

Namespaces

An alternative to separate physical Kubernetes clusters is virtual clusters. Virtual clusters are created using Kubernetes Namespaces. According to Kubernetes documentation, ‘Kubernetes supports multiple virtual clusters backed by the same physical cluster. These virtual clusters are called namespaces’.

In most enterprises, Operations and DevOps teams deliver a combination of both virtual and physical Kubernetes clusters. For example, lower environments, such as those used for Development, Test, and UAT, often reside on the same physical cluster, each in a separate virtual cluster (namespace). At the same time, environments such as Performance, Staging, Production, and DR, often require the level of isolation only achievable with physical Kubernetes clusters.

In the Cloud, physical clusters may be further isolated and secured using separate cloud accounts. For example, with AWS you might have a Non-Production AWS account and a Production AWS account, both managed by an AWS Organization.

Kube Clusters Diagram v2 F3

In a multi-environment scenario, a single physical cluster would contain multiple namespaces, into which separate versions of an application or applications are independently deployed, accessed, and tested. Below we see a simple example of a single Kubernetes non-prod cluster on the left, containing multiple versions of different microservices, deployed across three namespaces. You would likely see this type of deployment pattern as applications are deployed, tested, and promoted across lower environments, before being released to Production.

Kube Clusters Diagram v2 F5.png

Example Application

To demonstrate the promotion and testing of an application across multiple environments, we will use a simple election-themed microservice, developed for a previous post, Developing Cloud-Native Data-Centric Spring Boot Applications for Pivotal Cloud Foundry. The Spring Boot-based application allows API consumers to create, read, update, and delete, candidates, elections, and votes, through an exposed set of resources, accessed via RESTful endpoints.

Source Code

All source code for this post can be found on GitHub. The project’s README file contains a list of the Election microservice’s endpoints. To get started quickly, use one of the two following options (gist).

Code samples in this post are displayed as Gists, which may not display correctly on some mobile and social media browsers. Links to gists are also provided.

This project includes a kubernetes sub-directory, containing all the Kubernetes resource files and scripts necessary to recreate the example shown in the post. The scripts are designed to be easily adapted to a CI/CD DevOps workflow. You will need to modify the script’s variables to match your own environment’s configuration.

istio_107small

Database

The post’s Spring Boot application relies on a PostgreSQL database. In the previous post, ElephantSQL was used to host the PostgreSQL instance. This time, I have used Amazon RDS for PostgreSQL. Amazon RDS for PostgreSQL and ElephantSQL are equivalent choices. For simplicity, you might also consider a containerized version of PostgreSQL, managed as part of your Kubernetes environment.

Ideally, each environment should have a separate database instance. Separate database instances provide better isolation, fine-grained RBAC, easier test data lifecycle management, and improved performance. Although, for this post, I suggest a single, shared, minimally-sized RDS instance.

The PostgreSQL database’s sensitive connection information, including database URL, username, and password, are stored as Kubernetes Secrets, one secret for each namespace, and accessed by the Kubernetes Deployment controllers.

istio_043.png

Istio

Although not required, Istio makes the task of managing multiple virtual and physical clusters significantly easier. Following Istio’s online installation instructions, download and install Istio 0.7.1.

To create a Google Kubernetes Engine (GKE) cluster with Istio, you could use gcloud CLI’s container clusters create command, followed by installing Istio manually using Istio’s supplied Kubernetes resource files. This was the method used in the previous post, Deploying and Configuring Istio on Google Kubernetes Engine (GKE).

Alternatively, you could use Istio’s Google Cloud Platform (GCP) Deployment Manager files, along with the gcloud CLI’s deployment-manager deployments create command to create a Kubernetes cluster, replete with Istio, in a single step. Although arguably simpler, the deployment-manager method does not provide the same level of fine-grain control over cluster configuration as the container clusters create method. For this post, the deployment-manager method will suffice.

istio_001

The latest version of the Google Kubernetes Engine, available at the time of this post, is 1.9.6-gke.0. However, to install this version of Kubernetes Engine using the Istio’s supplied deployment Manager Jinja template requires updating the hardcoded value in the istio-cluster.jinja file from 1.9.2-gke.1. This has been updated in the next release of Istio.

istio_002

Another change, the latest version of Istio offered as an option in the istio-cluster-jinja.schema file. Specifically, the installIstioRelease configuration variable is only 0.6.0. The template does not include 0.7.1 as an option. Modify the istio-cluster-jinja.schema file to include the choice of 0.7.1. Optionally, I also set 0.7.1 as the default. This change should also be included in the next version of Istio.

istio_075.png

There are a limited number of GKE and Istio configuration defaults defined in the istio-cluster.yaml file, all of which can be overridden from the command line.

istio_002B.png

To optimize the cluster, and keep compute costs to a minimum, I have overridden several of the default configuration values using the properties flag with the gcloud CLI’s deployment-manager deployments create command. The README file provided by Istio explains how to use this feature. Configuration changes include the name of the cluster, the version of Istio (0.7.1), the number of nodes (2), the GCP zone (us-east1-b), and the node instance type (n1-standard-1). I also disabled automatic sidecar injection and chose not to install the Istio sample book application onto the cluster (gist).

Cluster Provisioning

To provision the GKE cluster and deploy Istio, first modify the variables in the part1-create-gke-cluster.sh file (shown above), then execute the script. The script also retrieves your cluster’s credentials, to enable command line interaction with the cluster using the kubectl CLI.

istio_002C.png

Once complete, validate the version of Istio by examining Istio’s Docker image versions, using the following command (gist).

The result should be a list of Istio 0.7.1 Docker images.

istio_076.png

The new cluster should be running GKE version 1.9.6.gke.0. This can be confirmed using the following command (gist).

Or, from the GCP Cloud Console.

istio_037

The new GKE cluster should be composed of (2) n1-standard-1 nodes, running in the us-east-1b zone.

istio_038

As part of the deployment, all of the separate Istio components should be running within the istio-system namespace.

istio_040

As part of the deployment, an external IP address and a load balancer were provisioned by GCP and associated with the Istio Ingress. GCP’s Deployment Manager should have also created the necessary firewall rules for cluster ingress and egress.

istio_010.png

Building the Environments

Next, we will create three namespaces,dev, test, and uat, which represent three non-production environments. Each environment consists of a Kubernetes Namespace, Istio Ingress, and Secret. The three environments are deployed using the part2-create-environments.sh script.

istio_048.png

Deploying Election v1

For this demonstration, we will assume v1 of the Election service has been previously promoted, tested, and released to Production. Hence, we would expect v1 to be deployed to each of the lower environments. Additionally, a new v2 of the Election service has been developed and tested locally using Minikube. It is ready for deployment to the three environments and will undergo integration testing (detailed in Part Two of the post).

If you recall from our GKE/Istio configuration, we chose manual sidecar injection of the Istio proxy. Therefore, all election deployment scripts perform a kube-inject command. To connect to our external Amazon RDS database, this kube-inject command requires the includeIPRanges flag, which contains two cluster configuration values, the cluster’s IPv4 CIDR (clusterIpv4Cidr) and the service’s IPv4 CIDR (servicesIpv4Cidr).

Before deployment, we export the includeIPRanges value as an environment variable, which will be used by the deployment scripts, using the following command, export IP_RANGES=$(sh ./get-cluster-ip-ranges.sh). The get-cluster-ip-ranges.sh script is shown below (gist).

Using this method with manual sidecar injection is discussed in the previous post, Deploying and Configuring Istio on Google Kubernetes Engine (GKE).

To deploy v1 of the Election service to all three namespaces, execute the part3-deploy-v1-all-envs.sh script.

istio_051.png

We should now have two instances of v1 of the Election service, running in the dev, test, and uat namespaces, for a total of six election-v1 Kubernetes Pods.

istio_052

HTTP Request Routing

Before deploying additional versions of the Election service in Part Two of this post, we should understand how external HTTP requests will be routed to different versions of the Election service, in multiple namespaces. In the post’s simple example, we have a matrix of three namespaces and two versions of the Election service. That means we need a method to route external traffic to up to six different election versions. There multiple ways to solve this problem, each with their own pros and cons. For this post, I found a combination of DNS and HTTP request rewriting is most effective.

DNS

First, to route external HTTP requests to the correct namespace, we will use subdomains. Using my current DNS management solution, Azure DNS, I create three new A records for my registered domain, voter-demo.com. There is one A record for each namespace, including api.dev, api.test, and api.uat.

istio_077.png

All three subdomains should resolve to the single external IP address assigned to the cluster’s load balancer.

istio_010.png

As part of the environments creation, the script deployed an Istio Ingress, one to each environment. The ingress accepts traffic based on a match to the Request URL (gist).

The istio-ingress service load balancer, running in the istio-system namespace, routes inbound external traffic, based on the Request URL, to the Istio Ingress in the appropriate namespace.

istio_053.png

The Istio Ingress in the namespace then directs the traffic to one of the Kubernetes Pods, containing the Election service and the Istio sidecar proxy.

istio_068.png

HTTP Rewrite

To direct the HTTP request to v1 or v2 of the Election service, an Istio Route Rule is used. As part of the environment creation, along with a Namespace and Ingress resources, we also deployed an Istio Route Rule to each environment. This particular route rule examines the HTTP request URL for a /v1/ or /v2/ sub-collection resource. If it finds the sub-collection resource, it performs a HTTPRewrite, removing the sub-collection resource from the HTTP request. The Route Rule then directs the HTTP request to the appropriate version of the Election service, v1 or v2 (gist).

According to Istio, ‘if there are multiple registered instances with the specified tag(s), they will be routed to based on the load balancing policy (algorithm) configured for the service (round-robin by default).’ We are using the default load balancing algorithm to distribute requests across multiple copies of each Election service.

The final external HTTP request routing for the Election service in the Non-Production GKE cluster is shown on the left, in the diagram, below. Every Election service Pod also contains an Istio sidecar proxy instance.

Kube Clusters Diagram F14

Below are some examples of HTTP GET requests that would be successfully routed to our Election service, using the above-described routing strategy (gist).

Part Two

In Part One of this post, we created the Kubernetes cluster on the Google Cloud Platform, installed Istio, provisioned a PostgreSQL database, and configured DNS for routing. Under the assumption that v1 of the Election microservice had already been released to Production, we deployed v1 to each of the three namespaces.

In Part Two of this post, we will learn how to utilize the sophisticated API testing capabilities of Postman and Newman to ensure v2 is ready for UAT and release to Production. We will deploy and perform integration testing of a new, v2 of the Election microservice, locally, on Kubernetes Minikube. Once we are confident v2 is functioning as intended, we will promote and test v2, across the dev, test, and uat namespaces.

All opinions expressed in this post are my own, and not necessarily the views of my current or past employers, or their clients.

, , , , , , , , , , ,

1 Comment

Developing Cloud-Native Data-Centric Spring Boot Applications for Pivotal Cloud Foundry

In this post, we will explore the development of a cloud-native, data-centric Spring Boot 2.0 application, and its deployment to Pivotal Software’s hosted Pivotal Cloud Foundry service, Pivotal Web Services. We will add a few additional features, such as Spring Data, Lombok, and Swagger, to enhance our application.

According to Pivotal, Spring Boot makes it easy to create stand-alone, production-grade Spring-based Applications. Spring Boot takes an opinionated view of the Spring platform and third-party libraries. Spring Boot 2.0 just went GA on March 1, 2018. This is the first major revision of Spring Boot since 1.0 was released almost 4 years ago. It is also the first GA version of Spring Boot that provides support for Spring Framework 5.0.

Pivotal Web Services’ tagline is ‘The Agile Platform for the Agile Team Powered by Cloud Foundry’. According to Pivotal,  Pivotal Web Services (PWS) is a hosted environment of Pivotal Cloud Foundry (PCF). PWS is hosted on AWS in the US-East region. PWS utilizes two availability zones for redundancy. PWS provides developers a Spring-centric PaaS alternative to AWS Elastic Beanstalk, Elastic Container Service (Amazon ECS), and OpsWorks. With PWS, you get the reliability and security of AWS, combined with the rich-functionality and ease-of-use of PCF.

To demonstrate the feature-rich capabilities of the Spring ecosystem, the Spring Boot application shown in this post incorporates the following complimentary technologies:

  • Spring Boot Actuator: Sub-project of Spring Boot, adds several production grade services to Spring Boot applications with little developer effort
  • Spring Data JPA: Sub-project of Spring Data, easily implement JPA based repositories and data access layers
  • Spring Data REST: Sub-project of Spring Data, easily build hypermedia-driven REST web services on top of Spring Data repositories
  • Spring HATEOAS: Create REST representations that follow the HATEOAS principle from Spring-based applications
  • Springfox Swagger 2: We are using the Springfox implementation of the Swagger 2 specification, an automated JSON API documentation for API’s built with Spring
  • Lombok: The @Data annotation generates boilerplate code that is typically associated with simple POJOs (Plain Old Java Objects) and beans: @ToString, @EqualsAndHashCode, @Getter, @Setter, and @RequiredArgsConstructor

Source Code

All source code for this post can be found on GitHub. To get started quickly, use one of the two following commands (gist).

For this post, I have used JetBrains IntelliJ IDEA and Git Bash on Windows for development. However, all code should be compatible with most popular IDEs and development platforms. The project assumes you have Docker and the Cloud Foundry Command Line Interface (cf CLI) installed locally.

Code samples in this post are displayed as Gists, which may not display correctly on some mobile and social media browsers. Links to gists are also provided.

Demo Application

The Spring Boot application demonstrated in this post is a simple election-themed RESTful API. The app allows API consumers to create, read, update, and delete, candidates, elections, and votes, via its exposed RESTful HTTP-based resources.

The Spring Boot application consists of (7) JPA Entities that mirror the tables and views in the database, (7) corresponding Spring Data Repositories, (2) Spring REST Controller, (4) Liquibase change sets, and associated Spring, Liquibase, Swagger, and PCF configuration files. I have intentionally chosen to avoid the complexities of using Data Transfer Objects (DTOs) for brevity, albeit a security concern, and directly expose the entities as resources.

img022_Final_Project

Controller Resources

This application is a simple CRUD application. The application contains a few simple HTTP GET resources in each of the two controller classes, as an introduction to Spring REST controllers. For example, the CandidateController contains the /candidates/summary and /candidates/summary/{election} resources (shown below in Postman). Typically, you would expose your data to the end-user as controller resources, as opposed to exposing the entities directly. The ease of defining controller resources is one of the many powers of Spring Boot.

img025_CustomResource.PNG

Paging and Sorting

As an introduction to Spring Data’s paging and sorting features, both the VoteRepository and VotesElectionsViewRepository Repository Interfaces extend Spring Data’s PagingAndSortingRepository<T,ID> interface, instead of the default CrudRepository<T,ID> interface. With paging and sorting enabled, you may both sort and limit the amount of data returned in the response payload. For example, to reduce the size of your response payload, you might choose to page through the votes in blocks of 25 votes at a time. In that case, as shown below in Postman, if you needed to return just votes 26-50, you would append the /votes resource with ?page=1&size=25. Since paging starts on page 0 (zero), votes 26-50 will on page 1.

img024_Paging

Swagger

This project also includes the Springfox implementation of the Swagger 2 specification. Along with the Swagger 2 dependency, the project takes a dependency on io.springfox:springfox-swagger-ui. The Springfox Swagger UI dependency allows us to view and interactively test our controller resources through Swagger’s browser-based UI, as shown below.

img027B_Swagger

All Swagger configuration can be found in the project’s SwaggerConfig Spring Configuration class.

Gradle

This post’s Spring Boot application is built with Gradle, although it could easily be converted to Maven if desired. According to Gradle, Gradle is the modern tool used to build, automate and deliver everything from mobile apps to microservices.

Data

In real life, most applications interact with one or more data sources. The Spring Boot application demonstrated in this post interacts with data from a PostgreSQL database. PostgreSQL, or simply Postgres, is the powerful, open-source object-relational database system, which has supported production-grade applications for 15 years. The application’s single elections database consists of (6) tables, (3) views, and (2) function, which are both used to generate random votes for this demonstration.

img020_Database_Diagram

Spring Data makes interacting with PostgreSQL easy. In addition to the features of Spring Data, we will use Liquibase. Liquibase is known as the source control for your database. With Liquibase, your database development lifecycle can mirror your Spring development lifecycle. Both DDL (Data Definition Language) and DML (Data Manipulation Language) changes are versioned controlled, alongside the Spring Boot application source code.

Locally, we will take advantage of Docker to host our development PostgreSQL database, using the official PostgreSQL Docker image. With Docker, there is no messy database installation and configuration of your local development environment. Recreating and deleting your PostgreSQL database is simple.

To support the data-tier in our hosted PWS environment, we will implement ElephantSQL, an offering from the Pivotal Services Marketplace. ElephantSQL is a hosted version of PostgreSQL, running on AWS. ElephantSQL is referred to as PostgreSQL as a Service, or more generally, a Database as a Service or DBaaS. As a Pivotal Marketplace service, we will get easy integration with our PWS-hosted Spring Boot application, with near-zero configuration.

Docker

First, set up your local PostgreSQL database using Docker and the official PostgreSQL Docker image. Since this is only a local database instance, we will not worry about securing our database credentials (gist).

Your running PostgreSQL container should resemble the output shown below.

img001_docker

Data Source

Most IDEs allow you to create and save data sources. Although this is not a requirement, it makes it easier to view the database’s resources and table data. Below, I have created a data source in IntelliJ from the running PostgreSQL container instance. The port, username, password, and database name were all taken from the above Docker command.

img002_IntelliJ_Data_Source

Liquibase

There are multiple strategies when it comes to managing changes to your database. With Liquibase, each set of changes are handled as change sets. Liquibase describes a change set as an atomic change that you want to apply to your database. Liquibase offers multiple formats for change set files, including XML, JSON, YAML, and SQL. For this post, I have chosen SQL, specifically PostgreSQL SQL dialect, which can be designated in the IntelliJ IDE. Below is an example of the first changeset, which creates four tables and two indexes.

img023_Change_Set

As shown below, change sets are stored in the db/changelog/changes sub-directory, as configured in the master change log file (db.changelog-master.yaml). Change set files follow an incremental naming convention.

img003C_IntelliJ_Liquibase_Changesets

The empty PostgreSQL database, before any Liquibase changes, should resemble the screengrab shown below.

img003_IntelliJ_Blank_Database_cropped

To automatically run Liquibase database migrations on startup, the org.liquibase:liquibase-core dependency must be added to the project’s build.gradle file. To apply the change sets to your local, empty PostgreSQL database, simply start the service locally with the gradle bootRun command. As the app starts after being compiled, any new Liquibase change sets will be applied.

img004_Gradle_bootRun

You might ask how does Liquibase know the change sets are new. During the initial startup of the Spring Boot application, in addition to any initial change sets, Liquibase creates two database tables to track changes, the databasechangelog and databasechangeloglock tables. Shown below are the two tables, along with the results of the four change sets included in the project, and applied by Liquibase to the local PostgreSQL elections database.

img005_IntelliJ_Initial_Database_cropped

Below we see the contents of the databasechangelog table, indicating that all four change sets were successfully applied to the database. Liquibase checks this table before applying change sets.

img006B_IntelliJ_Database_Change_Log

ElephantSQL

Before we can deploy our Spring Boot application to PWS, we need an accessible PostgreSQL instance in the Cloud; I have chosen ElephantSQL. Available through the Pivotal Services Marketplace, ElephantSQL currently offers one free and three paid service plans for their PostgreSQL as a Service. I purchased the Panda service plan as opposed to the free Turtle service plan. I found the free service plan was too limited in the maximum number of database connections for multiple service instances.

Previewing and purchasing an ElephantSQL service plan from the Pivotal Services Marketplace, assuming you have an existing PWS account, literally takes a single command (gist).

The output of the command should resemble the screengrab below. Note the total concurrent connections and total storage for each plan.

img007_PCF_ElephantSQL_Service_Purchase

To get details about the running ElephantSQL service, use the cf service elections command.

img007_PCF_ElephantSQL_Service_Info

From the ElephantSQL Console, we can obtain the connection information required to access our PostgreSQL elections database. You will need the default database name, username, password, and URL.

img012_PWS_ElephantSQL_Details

Service Binding

Once you have created the PostgreSQL database service, you need to bind the database service to the application. We will bind our application and the database, using the PCF deployment manifest file (manifest.yml), found in the project’s root directory. Binding is done using the services section (shown below).

The key/value pairs in the env section of the deployment manifest will become environment variables, local to the deployed Spring Boot service. These key/value pairs in the manifest will also override any configuration set in Spring’s external application properties file (application.yml). This file is located in the resources sub-directory. Note the SPRING_PROFILES_ACTIVE: test environment variable in the manifest.yml file. This variable designates which Spring Profile will be active from the multiple profiles defined in the application.yml file.

img008B_PCF_Manifest

Deployment to PWS

Next, we run gradle build followed by cf push to deploy one instance of the Spring Boot service to PWS and associate it with our ElephantSQL database instance. Below is the expected output from the cf push command.

img008_PCF_CF_Push

Note the route highlighted below. This is the URL where your Spring Boot service will be available.

img009_PCF_CF_Push2

To confirm your ElephantSQL database was populated by Liquibase when PWS started the deployed Spring application instance, we can check the ElephantSQL Console’s Stats tab. Note the database tables and rows in each table, signifying Liquibase ran successfully. Alternately, you could create another data source in your IDE, connected to ElephantSQL; this can be helpful for troubleshooting.

img013_Candidates

To access the running service and check that data is being returned, point your browser (or Postman) to the URL returned from the cf push command output (see second screengrab above) and hit the /candidates resource. Obviously, your URL, referred to as a route by PWS, will be different and unique. In the response payload, you should observe a JSON array of eight candidate objects. Each candidate was inserted into the Candidate table of the database, by Liquibase, when Liquibase executed the second of the four change sets on start-up.

img012_PWS_ElephantSQL

With Spring Boot Actuator and Spring Data REST, our simple Spring Boot application has dozens of resources exposed automatically, without extensive coding of resource controllers. Actuator exposes resources to help manage and troubleshoot the application, such as info, health, mappings (shown below), metrics, env, and configprops, among others. All Actuator resources are exposed explicitly, thus they can be disabled for Production deployments. With Spring Boot 2.0, all Actuator resources are now preceded with /actuator/ .

img029_Postman_Mappings

According to Pivotal, Spring Data REST builds on top of Spring Data repositories, analyzes an application’s domain model and exposes hypermedia-driven HTTP resources for aggregates contained in the model, such as our /candidates resource. A partial list of the application’s exposed resources are listed in the GitHub project’s README file.

In Spring’s approach to building RESTful web services, HTTP requests are handled by a controller. Spring Data REST automatically exposes CRUD resources for our entities. With Spring Data JPA, POJOs like our Candidate class are annotated with @Entity, indicating that it is a JPA entity. Lacking a @Table annotation, it is assumed that this entity will be mapped to a table named Candidate.

With Spring’s Data REST’s RESTful HTTP-based API, traditional database Create, Read, Update, and Delete commands for each PostgreSQL database table are automatically mapped to equivalent HTTP methods, including POST, GET, PUT, PATCH, and DELETE.

Below is an example, using Postman, to create a new Candidate using an HTTP POST method.

img029_Postman_Post

Below is an example, using Postman, to update a new Candidate using an HTTP PUT method.

img029_Postman_Put.PNG

With Spring Data REST, we can even retrieve data from read-only database Views, as shown below. This particular JSON response payload was returned from the candidates_by_elections database View, using the /election-candidates resource.

img028_Postman_View.PNG

Scaling Up

Once your application is deployed and you have tested its functionality, you can easily scale out or scale in the number instances, memory, and disk, with the cf scale command (gist).

Below is sample output from scaling up the Spring Boot application to two instances.

img016_Scale_Up2

Optionally, you can activate auto-scaling, which will allow the application to scale based on load.

img016_Autoscaling.PNG

Following the PCF architectural model, auto-scaling is actually another service from the Pivotal Services Marketplace, PCF App Autoscaler, as seen below, running alongside our ElephantSQL service.

img016_Autoscaling2.PNG

With PCF App Autoscaler, you set auto-scaling minimum and maximum instance limits and define scaling rules. Below, I have configured auto-scaling to scale out the number of application instances when the average CPU Utilization of all instances hits 80%. Conversely, the application will scale in when the average CPU Utilization recedes below 40%. In addition to CPU Utilization, PCF App Autoscaler also allows you to set scaling rules based on HTTP Throughput, HTTP Latency, RabbitMQ Depth (queue depth), and Memory Utilization.

Furthermore, I set the auto-scaling minimum number of instances to two and the maximum number of instances to four. No matter how much load is placed on the application, PWS will not scale above four instances. Conversely, PWS will maintain a minimum of two running instances at all times.

img016_Autoscaling3

Conclusion

This brief post demonstrates both the power and simplicity of Spring Boot to quickly develop highly-functional, data-centric RESTful API applications with minimal coding. Further, when coupled with Pivotal Cloud Foundry, Spring developers have a highly scalable, resilient cloud-native application hosting platform.

, , , , , , , , , , , ,

Leave a comment