Posts Tagged Google Kubernetes Engine
Building a Microservices Platform with Confluent Cloud, MongoDB Atlas, Istio, and Google Kubernetes Engine
Leading SaaS providers have sufficiently matured the integration capabilities of their product offerings to a point where it is now reasonable for enterprises to architect multi-vendor, single- and multi-cloud Production platforms, without re-engineering existing cloud-native applications. In previous posts, we have integrated other SaaS products, including as MongoDB Atlas fully-managed MongoDB-as-a-service, ElephantSQL fully-manage PostgreSQL-as-a-service, and CloudAMQP RabbitMQ-as-a-service, into cloud-native applications on Azure, AWS, GCP, and PCF.
In this post, we will build and deploy an existing, Spring Framework, microservice-based, cloud-native API to Google Kubernetes Engine (GKE), replete with Istio 1.0, on Google Cloud Platform (GCP). The API will rely on Confluent Cloud to provide a fully-managed, Kafka-based messaging-as-a-service (MaaS). Similarly, the API will rely on MongoDB Atlas to provide a fully-managed, MongoDB-based Database-as-a-service (DBaaS).
In a previous two-part post, Using Eventual Consistency and Spring for Kafka to Manage a Distributed Data Model: Part 1 and Part 2, we examined the role of Apache Kafka in an event-driven, eventually consistent, distributed system architecture. The system, an online storefront RESTful API simulation, was composed of multiple, Java Spring Boot microservices, each with their own MongoDB database. The microservices used a publish/subscribe model to communicate with each other using Kafka-based messaging. The Spring services were built using the Spring for Apache Kafka and Spring Data MongoDB projects.
Given the use case of placing an order through the Storefront API, we examined the interactions of three microservices, the Accounts, Fulfillment, and Orders service. We examined how the three services used Kafka to communicate state changes to each other, in a fully-decoupled manner.
The Storefront API’s microservices were managed behind an API Gateway, Netflix’s Zuul. Service discovery and load balancing were handled by Netflix’s Eureka. Both Zuul and Eureka are part of the Spring Cloud Netflix project. In that post, the entire containerized system was deployed to Docker Swarm.
Developing the services, not operationalizing the platform, was the primary objective of the previous post.
The following technologies are featured prominently in this post.
In May 2018, Google announced a partnership with Confluence to provide Confluent Cloud on GCP, a managed Apache Kafka solution for the Google Cloud Platform. Confluent, founded by the creators of Kafka, Jay Kreps, Neha Narkhede, and Jun Rao, is known for their commercial, Kafka-based streaming platform for the Enterprise.
Confluent Cloud is a fully-managed, cloud-based streaming service based on Apache Kafka. Confluent Cloud delivers a low-latency, resilient, scalable streaming service, deployable in minutes. Confluent deploys, upgrades, and maintains your Kafka clusters. Confluent Cloud is currently available on both AWS and GCP.
Confluent Cloud offers two plans, Professional and Enterprise. The Professional plan is optimized for projects under development, and for smaller organizations and applications. Professional plan rates for Confluent Cloud start at $0.55/hour. The Enterprise plan adds full enterprise capabilities such as service-level agreements (SLAs) with a 99.95% uptime and virtual private cloud (VPC) peering. The limitations and supported features of both plans are detailed, here.
Similar to Confluent Cloud, MongoDB Atlas is a fully-managed MongoDB-as-a-Service, available on AWS, Azure, and GCP. Atlas, a mature SaaS product, offers high-availability, uptime SLAs, elastic scalability, cross-region replication, enterprise-grade security, LDAP integration, BI Connector, and much more.
MongoDB Atlas currently offers four pricing plans, Free, Basic, Pro, and Enterprise. Plans range from the smallest, M0-sized MongoDB cluster, with shared RAM and 512 MB storage, up to the massive M400 MongoDB cluster, with 488 GB of RAM and 3 TB of storage.
MongoDB Atlas has been featured in several past posts, including Deploying and Configuring Istio on Google Kubernetes Engine (GKE) and Developing Applications for the Cloud with Azure App Services and MongoDB Atlas.
According to Google, Google Kubernetes Engine (GKE) provides a fully-managed, production-ready Kubernetes environment for deploying, managing, and scaling your containerized applications using Google infrastructure. GKE consists of multiple Google Compute Engine instances, grouped together to form a cluster.
A forerunner to other managed Kubernetes platforms, like EKS (AWS), AKS (Azure), PKS (Pivotal), and IBM Cloud Kubernetes Service, GKE launched publicly in 2015. GKE was built on Google’s experience of running hyper-scale services like Gmail and YouTube in containers for over 12 years.
GKE’s pricing is based on a pay-as-you-go, per-second-billing plan, with no up-front or termination fees, similar to Confluent Cloud and MongoDB Atlas. Cluster sizes range from 1 – 1,000 nodes. Node machine types may be optimized for standard workloads, CPU, memory, GPU, or high-availability. Compute power ranges from 1 – 96 vCPUs and memory from 1 – 624 GB of RAM.
In this post, we will deploy the three Storefront API microservices to a GKE cluster on GCP. Confluent Cloud on GCP will replace the previous Docker-based Kafka implementation. Similarly, MongoDB Atlas will replace the previous Docker-based MongoDB implementation.
Kubernetes and Istio 1.0 will replace Netflix’s Zuul and Eureka for API management, load-balancing, routing, and service discovery. Google Stackdriver will provide logging and monitoring. Docker Images for the services will be stored in Google Container Registry. Although not fully operationalized, the Storefront API will be closer to a Production-like platform, than previously demonstrated on Docker Swarm.
For brevity, we will not enable standard API security features like HTTPS, OAuth for authentication, and request quotas and throttling, all of which are essential in Production. Nor, will we integrate a full lifecycle API management tool, like Google Apigee.
The source code for this demonstration is contained in four separate GitHub repositories, storefront-kafka-docker, storefront-demo-accounts, storefront-demo-orders, and, storefront-demo-fulfillment. However, since the Docker Images for the three storefront services are available on Docker Hub, it is only necessary to clone the storefront-kafka-docker project. This project contains all the code to deploy and configure the GKE cluster and Kubernetes resources (gist).
Source code samples in this post are displayed as GitHub Gists, which may not display correctly on all mobile and social media browsers.
The setup of the Storefront API platform is divided into a few logical steps:
- Create the MongoDB Atlas cluster;
- Create the Confluent Cloud Kafka cluster;
- Create Kafka topics;
- Modify the Kubernetes resources;
- Modify the microservices to support Confluent Cloud configuration;
- Create the GKE cluster with Istio on GCP;
- Apply the Kubernetes resources to the GKE cluster;
- Test the Storefront API, Kafka, and MongoDB are functioning properly;
MongoDB Atlas Cluster
This post assumes you already have a MongoDB Atlas account and an existing project created. MongoDB Atlas accounts are free to set up if you do not already have one. Account creation does require the use of a Credit Card.
For minimal latency, we will be creating the MongoDB Atlas, Confluent Cloud Kafka, and GKE clusters, all on the Google Cloud Platform’s us-central1 Region. Available GCP Regions and Zones for MongoDB Atlas, Confluent Cloud, and GKE, vary, based on multiple factors.
For this demo, I suggest creating a free, M0-sized MongoDB cluster. The M0-sized 3-data node cluster, with shared RAM and 512 MB of storage, and currently running MongoDB 4.0.4, is fine for individual development. The us-central1 Region is the only available US Region for the free-tier M0-cluster on GCP. An M0-sized Atlas cluster may take between 7-10 minutes to provision.
MongoDB Atlas’ Web-based management console provides convenient links to cluster details, metrics, alerts, and documentation.
Once the cluster is ready, you can review details about the cluster and each individual cluster node.
In addition to the account owner, create a
demo_user account. This account will be used to authenticate and connect with the MongoDB databases from the storefront services. For this demo, we will use the same, single user account for all three services. In Production, you would most likely have individual users for each service.
Again, for security purposes, Atlas requires you to whitelist the IP address or CIDR block from which the storefront services will connect to the cluster. For now, open the access to your specific IP address using whatsmyip.com, or much less-securely, to all IP addresses (
0.0.0.0/0). Once the GKE cluster and external static IP addresses are created, make sure to come back and update this value; do not leave this wide open to the Internet.
The Java Spring Boot storefront services use a Spring Profile,
gke. According to Spring, Spring Profiles provide a way to segregate parts of your application configuration and make it available only in certain environments. The
gke Spring Profile’s configuration values may be set in a number of ways. For this demo, the majority of the values will be set using Kubernetes Deployment, ConfigMap and Secret resources, shown later.
The first two Spring configuration values will need are the MongoDB Atlas cluster’s connection string and the
demo_user account password. Note these both for later use.
Confluent Cloud Kafka Cluster
Similar to MongoDB Atlas, this post assumes you already have a Confluent Cloud account and an existing project. It is free to set up a Professional account and a new project if you do not already have one. Atlas account creation does require the use of a Credit Card.
The Confluent Cloud web-based management console is shown below. Experienced users of other SaaS platforms may find the Confluent Cloud web-based console a bit sparse on features. In my opinion, the console lacks some necessary features, like cluster observability, individual Kafka topic management, detailed billing history (always says $0?), and persistent history of cluster activities, which survives cluster deletion. It seems like Confluent prefers users to download and configure their Confluent Control Center to get the functionality you might normally expect from a web-based Saas management tool.
As explained earlier, for minimal latency, I suggest creating the MongoDB Atlas cluster, Confluent Cloud Kafka cluster, and the GKE cluster, all on the Google Cloud Platform’s us-central1 Region. For this demo, choose the smallest cluster size available on GCP, in the us-central1 Region, with 1 MB/s R/W throughput and 500 MB of storage. As shown below, the cost will be approximately $0.55/hour. Don’t forget to delete this cluster when you are done with the demonstration, or you will continue to be charged.
Cluster creation of the minimally-sized Confluent Cloud cluster is pretty quick.
Once the cluster is ready, Confluent provides instructions on how to interact with the cluster via the Confluent Cloud CLI. Install the Confluent Cloud CLI, locally, for use later.
As explained earlier, the Java Spring Boot storefront services use a Spring Profile,
gke. Like MongoDB Atlas, the Confluent Cloud Kafka cluster configuration values will be set using Kubernetes ConfigMap and Secret resources, shown later. There are several Confluent Cloud Java configuration values shown in the Client Config Java tab; we will need these for later use.
SASL and JAAS
Some users may not be familiar with the terms, SASL and JAAS. According to Wikipedia, Simple Authentication and Security Layer (SASL) is a framework for authentication and data security in Internet protocols. According to Confluent, Kafka brokers support client authentication via SASL. SASL authentication can be enabled concurrently with SSL encryption (SSL client authentication will be disabled).
There are numerous SASL mechanisms. The PLAIN SASL mechanism (SASL/PLAIN), used by Confluent, is a simple username/password authentication mechanism that is typically used with TLS for encryption to implement secure authentication. Kafka supports a default implementation for SASL/PLAIN which can be extended for production use. The SASL/PLAIN mechanism should only be used with SSL as a transport layer to ensure that clear passwords are not transmitted on the wire without encryption.
According to Wikipedia, Java Authentication and Authorization Service (JAAS) is the Java implementation of the standard Pluggable Authentication Module (PAM) information security framework. According to Confluent, Kafka uses the JAAS for SASL configuration. You must provide JAAS configurations for all SASL authentication mechanisms.
Similar to MongoDB Atlas, we need to authenticate with the Confluent Cloud cluster from the storefront services. The authentication to Confluent Cloud is done with an API Key. Create a new API Key, and note the Key and Secret; these two additional pieces of configuration will be needed later.
Confluent Cloud API Keys can be created and deleted as necessary. For security in Production, API Keys should be created for each service and regularly rotated.
With the cluster created, create the storefront service’s three Kafka topics manually, using the Confluent Cloud’s
ccloud CLI tool. First, configure the Confluent Cloud CLI using the
ccloud init command, using your new cluster’s Bootstrap Servers address, API Key, and API Secret. The instructions are shown above Clusters Client Config tab of the Confluent Cloud web-based management interface.
Create the storefront service’s three Kafka topics using the
ccloud topic create command. Use the
list command to confirm they are created.
# manually create kafka topics ccloud topic create accounts.customer.change ccloud topic create fulfillment.order.change ccloud topic create orders.order.fulfill # list kafka topics ccloud topic list accounts.customer.change fulfillment.order.change orders.order.fulfill
topic describe, displays topic replication details. The new topics will have a replication factor of 3 and a partition count of 12.
--verbose flag to the command,
ccloud --verbose topic describe, displays low-level topic and cluster configuration details, as well as a log of all topic-related activities.
The deployment of the three storefront microservices to the
dev Namespace will minimally require the following Kubernetes configuration resources.
- (1) Kubernetes Namespace;
- (3) Kubernetes Deployments;
- (3) Kubernetes Services;
- (1) Kubernetes ConfigMap;
- (2) Kubernetes Secrets;
- (1) Istio 1.0 Gateway;
- (1) Istio 1.0 VirtualService;
- (2) Istio 1.0 ServiceEntry;
v1alpha3 API introduced the last three configuration resources in the list, to control traffic routing into, within, and out of the mesh. There are a total of four new io
v1alpha3 API routing resources: Gateway, VirtualService, DestinationRule, and ServiceEntry.
Creating and managing such a large number of resources is a common complaint regarding the complexity of Kubernetes. Imagine the resource sprawl when you have dozens of microservices replicated across several namespaces. Fortunately, all resource files for this post are included in the storefront-kafka-docker project’s gke directory.
To follow along with the demo, you will need to make minor modifications to a few of these resources, including the Istio Gateway, Istio VirtualService, two Istio ServiceEntry resources, and two Kubernetes Secret resources.
Istio Gateway & VirtualService
Both the Istio Gateway and VirtualService configuration resources are contained in a single file, istio-gateway.yaml. For the demo, I am using a personal domain,
storefront-demo.com, along with the sub-domain,
api.dev, to host the Storefront API. The domain’s primary A record (‘@’) and sub-domain A record are both associated with the external IP address on the frontend of the load balancer. In the file, this host is configured for the Gateway and VirtualService resources. You can choose to replace the host with your own domain, or simply remove the host block altogether on lines 13–14 and 21–22. Removing the host blocks, you would then use the external IP address on the frontend of the load balancer (explained later in the post) to access the Storefront API (gist).
There are two Istio ServiceEntry configuration resources. Both ServiceEntry resources control egress traffic from the Storefront API services, both of their ServiceEntry Location items are set to
MESH_INTERNAL. The first ServiceEntry, mongodb-atlas-external-mesh.yaml, defines MongoDB Atlas cluster egress traffic from the Storefront API (gist).
Both need to have their
host items replaced with the appropriate Atlas and Confluent URLs.
Inspecting Istio Resources
The easiest way to view Istio resources is from the command line using the
kubectl CLI tools.
istioctl get gateway istioctl get virtualservices istioctl get serviceentry kubectl describe gateway kubectl describe virtualservices kubectl describe serviceentry
In this demo, we are only deploying to a single Kubernetes Namespace,
dev. However, Istio will also support routing traffic to multiple namespaces. For example, a typical non-prod Kubernetes cluster might support
uat, each associated with a different sub-domain. One way to support multiple Namespaces with Istio 1.0 is to add each host to the Istio Gateway (lines 14–16, below), then create a separate Istio VirtualService for each Namespace. All the VirtualServices are associated with the single Gateway. In the VirtualService, each service’s host address is the fully qualified domain name (FQDN) of the service. Part of the FQDN is the Namespace, which we change for each for each VirtualService (gist).
MongoDB Atlas Secret
There is one Kubernetes Secret for the sensitive MongoDB configuration and one Secret for the sensitive Confluent Cloud configuration. The Kubernetes Secret object type is intended to hold sensitive information, such as passwords, OAuth tokens, and SSH keys.
Kubernetes Secrets are Base64 encoded. The easiest way to encode the secret values is using the Linux
base64 program. The
base64 program encodes and decodes Base64 data, as specified in RFC 4648. Pass each MongoDB URI string to the
base64 program using
MONGODB_URI=mongodb+srv://demo_user:your_password@your_cluster_address/accounts?retryWrites=true echo -n $MONGODB_URI | base64 bW9uZ29kYitzcnY6Ly9kZW1vX3VzZXI6eW91cl9wYXNzd29yZEB5b3VyX2NsdXN0ZXJfYWRkcmVzcy9hY2NvdW50cz9yZXRyeVdyaXRlcz10cnVl
Repeat this process for the three MongoDB connection strings.
Confluent Cloud Secret
The confluent-cloud-kafka-secret.yaml file contains two data fields in the Secret’s data map,
sasl.jaas.config. These configuration items were both listed in the Client Config Java tab of the Confluent Cloud web-based management console, as shown previously. The
sasl.jaas.config data field requires the Confluent Cloud cluster API Key and Secret you created earlier. Again, use the base64 encoding process for these two data fields (gist).
Confluent Cloud ConfigMap
Accounts Deployment Resource
To see how the services consume the ConfigMap and Secret values, review the Accounts Deployment resource, shown below. Note the environment variables section, on lines 44–90, are a mix of hard-coded values and values referenced from the ConfigMap and two Secrets, shown above (gist).
Modify Microservices for Confluent Cloud
As explained earlier, Confluent Cloud’s Kafka cluster requires some very specific configuration, based largely on the security features of Confluent Cloud. Connecting to Confluent Cloud requires some minor modifications to the existing storefront service source code. The changes are identical for all three services. To understand the service’s code, I suggest reviewing the previous post, Using Eventual Consistency and Spring for Kafka to Manage a Distributed Data Model: Part 1. Note the following changes are already made to the source code in the
gke git branch, and not necessary for this demo.
The previous Kafka
ReceiverConfig Java classes have been converted to Java interfaces. There are four new
ReceiverConfigNonConfluent classes, which implement one of the new interfaces. The new classes contain the Spring Boot Profile class-level annotation. One set of Sender and Receiver classes are assigned the
@Profile("gke") annotation, and the others, the
@Profile("!gke") annotation. When the services start, one of the two class implementations are is loaded, depending on the Active Spring Profile,
gke or not
gke. To understand the changes better, examine the Account service’s SenderConfigConfluent.java file (gist).
Line 20: Designates this class as belonging to the
gke Spring Profile.
Line 23: The class now implements an interface.
Lines 25–44: Reference the Confluent Cloud Kafka cluster configuration. The values for these variables will come from the Kubernetes ConfigMap and Secret, described previously, when the services are deployed to GKE.
Lines 55–59: Additional properties that have been added to the Kafka Sender configuration properties, specifically for Confluent Cloud.
Once code changes were completed and tested, the Docker Image for each service was rebuilt and uploaded to Docker Hub for public access. When recreating the images, the version of the Java Docker base image was upgraded from the previous post to Alpine OpenJDK 12 (
Google Kubernetes Engine (GKE) with Istio
Having created the MongoDB Atlas and Confluent Cloud clusters, built the Kubernetes and Istio resources, modified the service’s source code, and pushed the new Docker Images to Docker Hub, the GKE cluster may now be built.
For the sake of brevity, we will manually create the cluster and deploy the resources, using the Google Cloud SDK gcloud and Kubernetes kubectl CLI tools, as opposed to automating with CI/CD tools, like Jenkins or Spinnaker. For this demonstration, I suggest a minimally-sized two-node GKE cluster using n1-standard-2 machine-type instances. The latest available release of Kubernetes on GKE at the time of this post was 1.11.5-gke.5 and Istio 1.03 (Istio on GKE still considered beta). Note Kubernetes and Istio are evolving rapidly, thus the configuration flags often change with newer versions. Check the GKE Clusters tab for the latest
clusters create command format (gist).
Executing these commands successfully will build the cluster and the
dev Namespace, into which all the resources will be deployed. The two-node cluster creation process takes about three minutes on average.
We can also observe the new GKE cluster from the GKE Clusters Details tab.
Creating the GKE cluster also creates several other GCP resources, including a TCP load balancer and three external IP addresses. Shown below in the VPC network External IP addresses tab, there is one IP address associated with each of the two GKE cluster’s VM instances, and one IP address associated with the frontend of the load balancer.
While the TCP load balancer’s frontend is associated with the external IP address, the load balancer’s backend is a target pool, containing the two GKE cluster node machine instances.
A forwarding rule associates the load balancer’s frontend IP address with the backend target pool. External requests to the frontend IP address will be routed to the GKE cluster. From there, requests will be routed by Kubernetes and Istio to the individual storefront service Pods, and through the Istio sidecar (Envoy) proxies. There is an Istio sidecar proxy deployed to each Storefront service Pod.
Below, we see the details of the load balancer’s target pool, containing the two GKE cluster’s VMs.
As shown at the start of the post, a simplified view of the GCP/GKE network routing looks as follows. For brevity, firewall rules and routes are not illustrated in the diagram.
Apply Kubernetes Resources
Again, using kubectl, deploy the three services and associated Kubernetes and Istio resources. Note the Istio Gateway and VirtualService(s) are not deployed to the
dev Namespace since their role is to control ingress and route traffic to the
dev Namespace and the services within it (gist).
Once these commands complete successfully, on the Workloads tab, we should observe two Pods of each of the three storefront service Kubernetes Deployments deployed to the
dev Namespace, all six Pods with a Status of ‘OK’. A Deployment controller provides declarative updates for Pods and ReplicaSets.
On the Services tab, we should observe the three storefront service’s Kubernetes Services. A Service in Kubernetes is a REST object.
On the Configuration Tab, we should observe the Kubernetes ConfigMap and two Secrets also deployed to the dev Environment.
Below, we see the confluent-cloud-kafka ConfigMap resource with its data map of Confluent Cloud configuration.
Below, we see the confluent-cloud-kafka Secret with its data map of sensitive Confluent Cloud configuration.
Test the Storefront API
If you recall from part two of the previous post, there are a set of seven Storefront API endpoints that can be called to create sample data and test the API. The HTTP GET Requests hit each service, generate test data, populate the three MongoDB databases, and produce and consume Kafka messages across all three topics. Making these requests is the easiest way to confirm the Storefront API is working properly.
- Sample Customer: accounts/customers/sample
- Sample Orders: orders/customers/sample/orders
- Sample Fulfillment Requests: orders/customers/sample/fulfill
- Sample Processed Order Event: fulfillment/fulfillment/sample/process
- Sample Shipped Order Event: fulfillment/fulfillment/sample/ship
- Sample In-Transit Order Event: fulfillment/fulfillment/sample/in-transit
- Sample Received Order Event: fulfillment/fulfillment/sample/receive
Thee are a wide variety of tools to interact with the Storefront API. The project includes a simple Python script, sample_data.py, which will make HTTP GET requests to each of the above endpoints, after confirming their health, and return a success message.
Postman, my personal favorite, is also an excellent tool to explore the Storefront API resources. I have the above set of the HTTP GET requests saved in a Postman Collection. Using Postman, below, we see the response from an HTTP GET request to the
Postman also allows us to create integration tests and run Collections of Requests in batches using Postman’s Collection Runner. To test the Storefront API, below, I used Collection Runner to run a single series of integration tests, intended to confirm the API’s functionality, by checking for expected HTTP response codes and expected values in the response payloads. Postman also shows the response times from the Storefront API. Since this platform was not built to meet Production SLAs, measuring response times is less critical in the Development environment.
If you recall, the GKE cluster had the Stackdriver Kubernetes option enabled, which gives us, amongst other observability features, access to all cluster, node, pod, and container logs. To confirm data is flowing to the MongoDB databases and Kafka topics, we can check the logs from any of the containers. Below we see the logs from the two Accounts Pod containers. Observe the
AfterSaveListener handler firing on an
onAfterSave event, which sends a
CustomerChangeEvent payload to the
accounts.customer.change Kafka topic, without error. These entries confirm that both Atlas and Confluent Cloud are reachable by the GKE-based workloads, and appear to be functioning properly.
MongoDB Atlas Collection View
Review the MongoDB Atlas Clusters Collections tab. In this Development environment, the MongoDB databases and collections are created the first time a service tries to connects to them. In Production, the databases would be created and secured in advance of deploying resources. Once the sample data requests are completed successfully, you should now observe the three Storefront API databases, each with collections of documents.
In addition to the Atlas web-based management console, MongoDB Compass is an excellent desktop tool to explore and manage MongoDB databases. Compass is available for Mac, Linux, and Windows. One of the many great features of Compass is the ability to visualize collection schemas and interactively filter documents. Below we see the
fulfillment.requests collection schema.
Confluent Control Center
Confluent Control Center is a downloadable, web browser-based tool for managing and monitoring Apache Kafka, including your Confluent Cloud clusters. Confluent Control Center provides rich functionality for building and monitoring production data pipelines and streaming applications. Confluent offers a free 30-day trial of Confluent Control Center. Since the Control Center is provided at an additional fee, and I found difficult to configure for Confluent Cloud clusters based on Confluent’s documentation, I chose not to cover it in detail, for this post.
Tear Down Cluster
Delete your Confluent Cloud and MongoDB clusters using their web-based management consoles. To delete the GKE cluster and all deployed Kubernetes resources, use the
cluster delete command. Also, double-check that the external IP addresses and load balancer, associated with the cluster, were also deleted as part of the cluster deletion (gist).
In this post, we have seen how easy it is to integrate Cloud-based DBaaS and MaaS products with the managed Kubernetes services from GCP, AWS, and Azure. As this post demonstrated, leading SaaS providers have sufficiently matured the integration capabilities of their product offerings to a point where it is now reasonable for enterprises to architect multi-vendor, single- and multi-cloud Production platforms, without re-engineering existing cloud-native applications.
In future posts, we will revisit this Storefront API example, further demonstrating how to enable HTTPS (Securing Your Istio Ingress Gateway with HTTPS) and end-user authentication (Istio End-User Authentication for Kubernetes using JSON Web Tokens (JWT) and Auth0)
All opinions expressed in this post are my own and not necessarily the views of my current or past employers or their clients.
In this two-part post, we are exploring the creation of a GKE cluster, replete with the latest version of Istio, often referred to as IoK (Istio on Kubernetes). We will then deploy, perform integration testing, and promote an application across multiple environments within the cluster.
In Part One of this post, we created a Kubernetes cluster on the Google Cloud Platform, installed Istio, provisioned a PostgreSQL database, and configured DNS for routing. Under the assumption that v1 of the Election microservice had already been released to Production, we deployed v1 to each of the three namespaces.
In Part Two of this post, we will learn how to utilize the advanced API testing capabilities of Postman and Newman to ensure v2 is ready for UAT and release to Production. We will deploy and perform integration testing of a new v2 of the Election microservice, locally on Kubernetes Minikube. Once confident v2 is functioning as intended, we will promote and test v2 across the
As a reminder, all source code for this post can be found on GitHub. The project’s README file contains a list of the Election microservice’s endpoints. To get started quickly, use one of the two following options (gist).
Code samples in this post are displayed as Gists, which may not display correctly on some mobile and social media browsers. Links to gists are also provided.
This project includes a kubernetes sub-directory, containing all the Kubernetes resource files and scripts necessary to recreate the example shown in the post.
Testing Locally with Minikube
Deploying to GKE, no matter how automated, takes time and resources, whether those resources are team members or just compute and system resources. Before deploying v2 of the Election service to the non-prod GKE cluster, we should ensure that it has been thoroughly tested locally. Local testing should include the following test criteria:
- Source code builds successfully
- All unit-tests pass
- A new Docker Image can be created from the build artifact
- The Service can be deployed to Kubernetes (Minikube)
- The deployed instance can connect to the database and execute the Liquibase changesets
- The deployed instance passes a minimal set of integration tests
Minikube gives us the ability to quickly iterate and test an application, as well as the Kubernetes and Istio resources required for its operation, before promoting to GKE. These resources include Kubernetes Namespaces, Secrets, Deployments, Services, Route Rules, and Istio Ingresses. Since Minikube is just that, a miniature version of our GKE cluster, we should be able to have a nearly one-to-one parity between the Kubernetes resources we apply locally and those applied to GKE. This post assumes you have the latest version of Minikube installed, and are familiar with its operation.
This project includes a minikube sub-directory, containing all the Kubernetes resource files and scripts necessary to recreate the Minikube deployment example shown in this post. The three included scripts are designed to be easily adapted to a CI/CD DevOps workflow. You may need to modify the scripts to match your environment’s configuration. Note this Minikube-deployed version of the Election service relies on the external Amazon RDS database instance.
Local Database Version
To eliminate the AWS costs, I have included a second, alternate version of the Minikube Kubernetes resource files, minikube_db_local This version deploys a single containerized PostgreSQL database instance to Minikube, as opposed to relying on the external Amazon RDS instance. Be aware, the database does not have persistent storage or an Istio sidecar proxy.
If you do not have a running Minikube cluster, create one with the
minikube start command.
Minikube allows you to use normal
kubectl CLI commands to interact with the Minikube cluster. Using the
kubectl get nodes command, we should see a single Minikube node running the latest Kubernetes v1.10.0.
Istio on Minikube
Next, install Istio following Istio’s online installation instructions. A basic Istio installation on Minikube, without the additional add-ons, should only require a single Istio install script.
If successful, you should observe a new
istio-system namespace, containing the four main Istio components:
Deploy v2 to Minikube
Next, create a Minikube Development environment, consisting of a
dev Namespace, Istio Ingress, and Secret, using the
part1-create-environment.sh script. Next, deploy v2 of the Election service to the
dev Namespace, along with an associated Route Rule, using the
part2-deploy-v2.sh script. One v2 instance should be sufficient to satisfy the testing requirements.
Access to v2 of the Election service on Minikube is a bit different than with GKE. When routing external HTTP requests, there is no load balancer, no external public IP address, and no public DNS or subdomains. To access the single instance of v2 running on Minikube, we use the local IP address of the Minikube cluster, obtained with the
minikube ip command. The access port required is the Node Port (
nodePort) of the
istio-ingress Service. The command is shown below (gist) and included in the
The second part of our HTTP request routing is the same as with GKE, relying on an Istio Route Rules. The
/v2/ sub-collection resource in the HTTP request URL is rewritten and routed to the v2 election Pod by the Route Rule. To confirm v2 of the Election service is running and addressable, curl the
/v2/actuator/health endpoint. Spring Actuator’s
/health endpoint is frequently used at the end of a CI/CD server’s deployment pipeline to confirm success. The Spring Boot application can take a few minutes to fully start up and be responsive to requests, depending on the speed of your local machine.
Using the Kubernetes Dashboard, we should see our deployment of the single Election service Pod is running successfully in Minikube’s
Once deployed, we run a battery of integration tests to confirm that the new v2 functionality is working as intended before deploying to GKE. In the next section of this post, we will explore the process creating and managing Postman Collections and Postman Environments, and how to automate those Collections of tests with Newman and Jenkins.
The typical reason an application is deployed to lower environments, prior to Production, is to perform application testing. Although definitions vary across organizations, testing commonly includes some or all of the following types: Integration Testing, Functional Testing, System Testing, Stress or Load Testing, Performance Testing, Security Testing, Usability Testing, Acceptance Testing, Regression Testing, Alpha and Beta Testing, and End-to-End Testing. Test teams may also refer to other testing forms, such as Whitebox (Glassbox), Blackbox Testing, Smoke, Validation, or Sanity Testing, and Happy Path Testing.
The site, softwaretestinghelp.com, defines integration testing as, ‘testing of all integrated modules to verify the combined functionality after integration is termed so. Modules are typically code modules, individual applications, client and server applications on a network, etc. This type of testing is especially relevant to client/server and distributed systems.’
In this post, we are concerned that our integrated modules are functioning cohesively, primarily the Election service, Amazon RDS database, DNS, Istio Ingress, Route Rules, and the Istio sidecar Proxy. Unlike Unit Testing and Static Code Analysis (SCA), which is done pre-deployment, integration testing requires an application to be deployed and running in an environment.
I have chosen Postman, along with Newman, to execute a Collection of integration tests before promoting to the next environment. The integration tests confirm the deployed application’s name and version. The integration tests then perform a series of HTTP GET, POST, PUT, PATCH, and DELETE actions against the service’s resources. The integration tests verify a successful HTTP response code is returned, based on the type of request made.
/candidates endpoint. We then use the stored candidate ID in proceeding HTTP GET, PUT, and PATCH test requests to the same
Environment-specific variables, such as the resource host, port, and environment sub-collection resource, are abstracted and stored as key/value pairs within Postman Environments, and called through variables in the request URL and within the tests. Thus, the same Postman Collection of tests may be run against multiple environments using different Postman Environments.
Postman Runner allows us to run multiple iterations of our Collection. We also have the option to build in delays between tests. Lastly, Postman Runner can load external JSON and CSV formatted test data, which is beyond the scope of this post.
Postman contains a simple Run Summary UI for viewing test results.
To support running tests from the command line, Postman provides Newman. According to Postman, Newman is a command-line collection runner for Postman. Newman offers the same functionality as Postman’s Collection Runner, all part of the
newman CLI. Newman is Node.js module, installed globally as an npm package,
npm install newman --global.
Typically, Development and Testing teams compose Postman Collections and define Postman Environments, locally. Teams run their tests locally in Postman, during their development cycle. Then, those same Postman Collections are executed from the command line, or more commonly as part of a CI/CD pipeline, such as with Jenkins.
Below, the same Collection of integration tests ran in the Postman Runner UI, are run from the command line, using Newman.
Without a doubt, Jenkins is the leading open-source CI/CD automation server. The building, testing, publishing, and deployment of microservices to Kubernetes is relatively easy with Jenkins. Generally, you would build, unit-test, push a new Docker image, and then deploy your application to Kubernetes using a series of CI/CD pipelines. Below, we see examples of these pipelines using Jenkins Blue Ocean, starting with a continuous integration pipeline, which includes unit-testing and Static Code Analysis (SCA) with SonarQube.
Followed by a pipeline to build the Docker Image, using the build artifact from the above pipeline, and pushes the Image to Docker Hub.
The third pipeline that demonstrates building the three Kubernetes environments and deploying v1 of the Election service to the dev namespace. This pipeline is just for demonstration purposes; typically, you would separate these functions.
An alternative to Jenkins for the deployment of microservices is Spinnaker, created by Netflix. According to Netflix, ‘Spinnaker is an open source, multi-cloud continuous delivery platform for releasing software changes with high velocity and confidence.’ Spinnaker is designed to integrate easily with Jenkins, dividing responsibilities for continuous integration and delivery, with deployment. Below, Spinnaker two sample deployment pipelines, similar to Jenkins, for deploying v1 and v2 of the Election service to the non-prod GKE cluster.
Below, Spinnaker has deployed v2 of the Election service to dev using a Highlander deployment strategy. Subsequently, Spinnaker has deployed v2 to test using a Red/Black deployment strategy, leaving the previously released v1 Server Group in place, in case a rollback is required.
Once Spinnaker is has completed the deployment tasks, the Postman Collections of smoke and integration tests are executed by Newman, as part of another Jenkins CI/CD pipeline.
In this pipeline, a set of basic smoke tests is run first to ensure the new deployment is running properly, and then the integration tests are executed.
Newman offers several options for displaying test results. For easy integration with Jenkins, Newman results can be delivered in a format that can be displayed as JUnit test reports. The JUnit test report format, XML, is a popular method of standardizing test results from different testing tools. Below is a truncated example of a test report file (gist).
Translating Newman test results to JUnit reports allows the percentage of test cases successfully executed, to be tracked over multiple deployments, a universal testing metric. Below we see the JUnit Test Reports Test Result Trend graph for a series of test runs.
Deploying to Development
Development environments typically have a rapid turnover of application versions. Many teams use their Development environment as a continuous integration environment, where every commit that successfully builds and passes all unit tests, is deployed. The purpose of the CI deployments is to ensure build artifacts will successfully deploy through the CI/CD pipeline, start properly, and pass a basic set of smoke tests.
Other teams use the Development environments as an extension of their local Minikube environment. The Development environment will possess some or all of the required external integration points, which the Developer’s local Minikube environment may not. The goal of the Development environment is to help Developers ensure their application is functioning correctly and is ready for the Test teams to evaluate, prior to promotion to the Test environment.
Some external integration points, such as external payment gateways, customer relationship management (CRM) systems, content management systems (CMS), or data analytics engines, are often stubbed-out in lower environments. Generally, third-party providers only offer a limited number of parallel non-Production integration environments. While an application may pass through several non-prod environments, testing against all external integration points will only occur in one or two of those environments.
With v2 of the Election service ready for testing on GKE, we deploy it to the GKE cluster’s dev namespace using the
part4a-deploy-v2-dev.sh script. We will also delete the previous v1 version of the Election service. Similar to the v1 deployment script, the v2 scripts perform a
kube-inject command, which manually injects the Istio sidecar proxy alongside the Election service, into each election v2 Pod. The deployment script also deploys an alternate Istio Route Rule, which routes requests to
api.dev.voter-demo.com/v2/* resource of v2 of the Election service.
Once deployed, we run our Postman Collection of integration tests with Newman or as part of a CI/CD pipeline. In the Development environment, we may choose to run a limited set of tests for the sake of expediency, or because not all external integration points are accessible.
Promotion to Test
With local Minikube and Development environment testing complete, we promote and deploy v2 of the Election service to the Test environment, using the
part4b-deploy-v2-test.sh script. In Test, we will not delete v1 of the Election service.
Often, an organization will maintain a running copy of all versions of an application currently deployed to Production, in a lower environment. Let’s look at two scenarios where this is common. First, v1 of the Election service has an issue in Production, which needs to be confirmed and may require a hot-fix by the Development team. Validation of the v1 Production bug is often done in a lower environment. The second scenario for having both versions running in an environment is when v1 and v2 both need to co-exist in Production. Organizations frequently support multiple API versions. Cutting over an entire API user-base to a new API version is often completed over a series of releases, and requires careful coordination with API consumers.
Testing All Versions
An essential role of integration testing should be to confirm that both versions of the Election service are functioning correctly, while simultaneously running in the same namespace. For example, we want to verify traffic is routed correctly, based on the HTTP request URL, to the correct version. Another common test scenario is database schema changes. Suppose we make what we believe are backward-compatible database changes to v2 of the Election service. We should be able to prove, through testing, that both the old and new versions function correctly against the latest version of the database schema.
There are different automation strategies that could be employed to test multiple versions of an application without creating separate Collections and Environments. A simple solution would be to templatize the Environments file, and then programmatically change the Postman Environment’s
version variable injected from a pipeline parameter (abridged environment file shown below).
Once initial automated integration testing is complete, Test teams will typically execute additional forms of application testing if necessary, before signing off for UAT and Performance Testing to begin.
With testing in the Test environments completed, we continue onto UAT. The term UAT suggest that a set of actual end-users (API consumers) of the Election service will perform their own testing. Frequently, UAT is only done for a short, fixed period of time, often with a specialized team of Testers. Issues experienced during UAT can be expensive and impact the ability to release an application to Production on-time if sign-off is delayed.
After deploying v2 of the Election service to UAT, and before opening it up to the UAT team, we would naturally want to repeat the same integration testing process we conducted in the previous Test environment. We must ensure that v2 is functioning as expected before our end-users begin their testing. This is where leveraging a tool like Jenkins makes automated integration testing more manageable and repeatable. One strategy would be to duplicate our existing Development and Test pipelines, and re-target the new pipeline to call v2 of the Election service in UAT.
Again, in a JUnit report format, we can examine individual results through the Jenkins Console.
We can also examine individual results from each test run using a specific build’s Console Output.
Testing and Instrumentation
To fully evaluate the integration test results, you must look beyond just the percentage of test cases executed successfully. It makes little sense to release a new version of an application if it passes all functional tests, but significantly increases client response times, unnecessarily increases memory consumption or wastes other compute resources, or is grossly inefficient in the number of calls it makes to the database or third-party dependencies. Often times, integration testing uncovers potential performance bottlenecks that are incorporated into performance test plans.
Critical intelligence about the performance of the application can only be obtained through the use of logging and metrics collection and instrumentation. Istio provides this telemetry out-of-the-box with Zipkin, Jaeger, Service Graph, Fluentd, Prometheus, and Grafana. In the included Grafana Istio Dashboard below, we see the performance of v1 of the Election service, under test, in the Test environment. We can compare request and response payload size and timing, as well as request and response times to external integration points, such as our Amazon RDS database. We are able to observe the impact of individual test requests on the application and all its integration points.
As part of integration testing, we should monitor the Amazon RDS CloudWatch metrics. CloudWatch allows us to evaluate critical database performance metrics, such as the number of concurrent database connections, CPU utilization, read and write IOPS, Memory consumption, and disk storage requirements.
A discussion of metrics starts moving us toward load and performance testing against Production service-level agreements (SLAs). Using a similar approach to integration testing, with load and performance testing, we should be able to accurately estimate the sizing requirements our new application for Production. Load and Performance Testing helps answer questions like the type and size of compute resources are required for our GKE Production cluster and for our Amazon RDS database, or how many compute nodes and number of instances (Pods) are necessary to support the expected user-load.
All opinions expressed in this post are my own, and not necessarily the views of my current or past employers, or their clients.
In the following two-part post, we will explore the creation of a GKE cluster, replete with the latest version of Istio, often referred to as IoK (Istio on Kubernetes). We will then deploy, perform integration testing, and promote an application across multiple environments within the cluster.
Application Environment Management
Container orchestration engines, such as Kubernetes, have revolutionized the deployment and management of microservice-based architectures. Combined with a Service Mesh, such as Istio, Kubernetes provides a secure, instrumented, enterprise-grade platform for modern, distributed applications.
One of many challenges with any platform, even one built on Kubernetes, is managing multiple application environments. Whether applications run on bare-metal, virtual machines, or within containers, deploying to and managing multiple application environments increases operational complexity.
As Agile software development practices continue to increase within organizations, the need for multiple, ephemeral, on-demand environments also grows. Traditional environments that were once only composed of Development, Test, and Production, have expanded in enterprises to include a dozen or more environments, to support the many stages of the modern software development lifecycle. Current application environments often include Continous Integration and Delivery (CI), Sandbox, Development, Integration Testing (QA), User Acceptance Testing (UAT), Staging, Performance, Production, Disaster Recovery (DR), and Hotfix. Each environment requiring its own compute, security, networking, configuration, and corresponding dependencies, such as databases and message queues.
Environments and Kubernetes
There are various infrastructure architectural patterns employed by Operations and DevOps teams to provide Kubernetes-based application environments to Development teams. One pattern consists of separate physical Kubernetes clusters. Separate clusters provide a high level of isolation. Isolation offers many advantages, including increased performance and security, the ability to tune each cluster’s compute resources to meet differing SLAs, and ensuring a reduced blast radius when things go terribly wrong. Conversely, separate clusters often result in increased infrastructure costs and operational overhead, and complex deployment strategies. This pattern is often seen in heavily regulated, compliance-driven organizations, where security, auditability, and separation of duties are paramount.
An alternative to separate physical Kubernetes clusters is virtual clusters. Virtual clusters are created using Kubernetes Namespaces. According to Kubernetes documentation, ‘Kubernetes supports multiple virtual clusters backed by the same physical cluster. These virtual clusters are called namespaces’.
In most enterprises, Operations and DevOps teams deliver a combination of both virtual and physical Kubernetes clusters. For example, lower environments, such as those used for Development, Test, and UAT, often reside on the same physical cluster, each in a separate virtual cluster (namespace). At the same time, environments such as Performance, Staging, Production, and DR, often require the level of isolation only achievable with physical Kubernetes clusters.
In the Cloud, physical clusters may be further isolated and secured using separate cloud accounts. For example, with AWS you might have a Non-Production AWS account and a Production AWS account, both managed by an AWS Organization.
In a multi-environment scenario, a single physical cluster would contain multiple namespaces, into which separate versions of an application or applications are independently deployed, accessed, and tested. Below we see a simple example of a single Kubernetes non-prod cluster on the left, containing multiple versions of different microservices, deployed across three namespaces. You would likely see this type of deployment pattern as applications are deployed, tested, and promoted across lower environments, before being released to Production.
To demonstrate the promotion and testing of an application across multiple environments, we will use a simple election-themed microservice, developed for a previous post, Developing Cloud-Native Data-Centric Spring Boot Applications for Pivotal Cloud Foundry. The Spring Boot-based application allows API consumers to create, read, update, and delete, candidates, elections, and votes, through an exposed set of resources, accessed via RESTful endpoints.
All source code for this post can be found on GitHub. The project’s README file contains a list of the Election microservice’s endpoints. To get started quickly, use one of the two following options (gist).
Code samples in this post are displayed as Gists, which may not display correctly on some mobile and social media browsers. Links to gists are also provided.
This project includes a kubernetes sub-directory, containing all the Kubernetes resource files and scripts necessary to recreate the example shown in the post. The scripts are designed to be easily adapted to a CI/CD DevOps workflow. You will need to modify the script’s variables to match your own environment’s configuration.
The post’s Spring Boot application relies on a PostgreSQL database. In the previous post, ElephantSQL was used to host the PostgreSQL instance. This time, I have used Amazon RDS for PostgreSQL. Amazon RDS for PostgreSQL and ElephantSQL are equivalent choices. For simplicity, you might also consider a containerized version of PostgreSQL, managed as part of your Kubernetes environment.
Ideally, each environment should have a separate database instance. Separate database instances provide better isolation, fine-grained RBAC, easier test data lifecycle management, and improved performance. Although, for this post, I suggest a single, shared, minimally-sized RDS instance.
The PostgreSQL database’s sensitive connection information, including database URL, username, and password, are stored as Kubernetes Secrets, one secret for each namespace, and accessed by the Kubernetes Deployment controllers.
Although not required, Istio makes the task of managing multiple virtual and physical clusters significantly easier. Following Istio’s online installation instructions, download and install Istio 0.7.1.
To create a Google Kubernetes Engine (GKE) cluster with Istio, you could use
container clusters create command, followed by installing Istio manually using Istio’s supplied Kubernetes resource files. This was the method used in the previous post, Deploying and Configuring Istio on Google Kubernetes Engine (GKE).
Alternatively, you could use Istio’s Google Cloud Platform (GCP) Deployment Manager files, along with the
deployment-manager deployments create command to create a Kubernetes cluster, replete with Istio, in a single step. Although arguably simpler, the
deployment-manager method does not provide the same level of fine-grain control over cluster configuration as the container clusters create method. For this post, the
deployment-manager method will suffice.
The latest version of the Google Kubernetes Engine, available at the time of this post, is 1.9.6-gke.0. However, to install this version of Kubernetes Engine using the Istio’s supplied deployment Manager Jinja template requires updating the hardcoded value in the
istio-cluster.jinja file from 1.9.2-gke.1. This has been updated in the next release of Istio.
Another change, the latest version of Istio offered as an option in the istio-cluster-jinja.schema file. Specifically, the
installIstioRelease configuration variable is only 0.6.0. The template does not include 0.7.1 as an option. Modify the
istio-cluster-jinja.schema file to include the choice of 0.7.1. Optionally, I also set 0.7.1 as the default. This change should also be included in the next version of Istio.
There are a limited number of GKE and Istio configuration defaults defined in the
istio-cluster.yaml file, all of which can be overridden from the command line.
To optimize the cluster, and keep compute costs to a minimum, I have overridden several of the default configuration values using the properties flag with the gcloud CLI’s
deployment-manager deployments create command. The README file provided by Istio explains how to use this feature. Configuration changes include the name of the cluster, the version of Istio (0.7.1), the number of nodes (2), the GCP zone (us-east1-b), and the node instance type (n1-standard-1). I also disabled automatic sidecar injection and chose not to install the Istio sample book application onto the cluster (gist).
To provision the GKE cluster and deploy Istio, first modify the variables in the
part1-create-gke-cluster.sh file (shown above), then execute the script. The script also retrieves your cluster’s credentials, to enable command line interaction with the cluster using the
Once complete, validate the version of Istio by examining Istio’s Docker image versions, using the following command (gist).
The result should be a list of Istio 0.7.1 Docker images.
The new cluster should be running GKE version 1.9.6.gke.0. This can be confirmed using the following command (gist).
Or, from the GCP Cloud Console.
The new GKE cluster should be composed of (2) n1-standard-1 nodes, running in the us-east-1b zone.
As part of the deployment, all of the separate Istio components should be running within the
As part of the deployment, an external IP address and a load balancer were provisioned by GCP and associated with the Istio Ingress. GCP’s Deployment Manager should have also created the necessary firewall rules for cluster ingress and egress.
Building the Environments
Next, we will create three namespaces,
uat, which represent three non-production environments. Each environment consists of a Kubernetes Namespace, Istio Ingress, and Secret. The three environments are deployed using the
Deploying Election v1
For this demonstration, we will assume v1 of the Election service has been previously promoted, tested, and released to Production. Hence, we would expect v1 to be deployed to each of the lower environments. Additionally, a new v2 of the Election service has been developed and tested locally using Minikube. It is ready for deployment to the three environments and will undergo integration testing (detailed in Part Two of the post).
If you recall from our GKE/Istio configuration, we chose manual sidecar injection of the Istio proxy. Therefore, all election deployment scripts perform a
kube-inject command. To connect to our external Amazon RDS database, this
kube-inject command requires the
includeIPRanges flag, which contains two cluster configuration values, the cluster’s IPv4 CIDR (
clusterIpv4Cidr) and the service’s IPv4 CIDR (
Before deployment, we export the
includeIPRanges value as an environment variable, which will be used by the deployment scripts, using the following command,
export IP_RANGES=$(sh ./get-cluster-ip-ranges.sh). The
get-cluster-ip-ranges.sh script is shown below (gist).
Using this method with manual sidecar injection is discussed in the previous post, Deploying and Configuring Istio on Google Kubernetes Engine (GKE).
To deploy v1 of the Election service to all three namespaces, execute the
We should now have two instances of v1 of the Election service, running in the
uat namespaces, for a total of six election-v1 Kubernetes Pods.
HTTP Request Routing
Before deploying additional versions of the Election service in Part Two of this post, we should understand how external HTTP requests will be routed to different versions of the Election service, in multiple namespaces. In the post’s simple example, we have a matrix of three namespaces and two versions of the Election service. That means we need a method to route external traffic to up to six different election versions. There multiple ways to solve this problem, each with their own pros and cons. For this post, I found a combination of DNS and HTTP request rewriting is most effective.
First, to route external HTTP requests to the correct namespace, we will use subdomains. Using my current DNS management solution, Azure DNS, I create three new A records for my registered domain,
voter-demo.com. There is one A record for each namespace, including
All three subdomains should resolve to the single external IP address assigned to the cluster’s load balancer.
istio-ingress service load balancer, running in the
istio-system namespace, routes inbound external traffic, based on the Request URL, to the Istio Ingress in the appropriate namespace.
The Istio Ingress in the namespace then directs the traffic to one of the Kubernetes Pods, containing the Election service and the Istio sidecar proxy.
To direct the HTTP request to v1 or v2 of the Election service, an Istio Route Rule is used. As part of the environment creation, along with a Namespace and Ingress resources, we also deployed an Istio Route Rule to each environment. This particular route rule examines the HTTP request URL for a
/v2/ sub-collection resource. If it finds the sub-collection resource, it performs a HTTPRewrite, removing the sub-collection resource from the HTTP request. The Route Rule then directs the HTTP request to the appropriate version of the Election service, v1 or v2 (gist).
According to Istio, ‘if there are multiple registered instances with the specified tag(s), they will be routed to based on the load balancing policy (algorithm) configured for the service (round-robin by default).’ We are using the default load balancing algorithm to distribute requests across multiple copies of each Election service.
The final external HTTP request routing for the Election service in the Non-Production GKE cluster is shown on the left, in the diagram, below. Every Election service Pod also contains an Istio sidecar proxy instance.
Below are some examples of HTTP GET requests that would be successfully routed to our Election service, using the above-described routing strategy (gist).
In Part One of this post, we created the Kubernetes cluster on the Google Cloud Platform, installed Istio, provisioned a PostgreSQL database, and configured DNS for routing. Under the assumption that v1 of the Election microservice had already been released to Production, we deployed v1 to each of the three namespaces.
In Part Two of this post, we will learn how to utilize the sophisticated API testing capabilities of Postman and Newman to ensure v2 is ready for UAT and release to Production. We will deploy and perform integration testing of a new, v2 of the Election microservice, locally, on Kubernetes Minikube. Once we are confident v2 is functioning as intended, we will promote and test v2, across the
All opinions expressed in this post are my own, and not necessarily the views of my current or past employers, or their clients.
Unquestionably, Kubernetes has quickly become the leading Container-as-a-Service (CaaS) platform. In late September 2017, Rancher Labs announced the release of Rancher 2.0, based on Kubernetes. In mid-October, at DockerCon Europe 2017, Docker announced they were integrating Kubernetes into the Docker platform. In late October, Microsoft released the public preview of Managed Kubernetes for Azure Container Service (AKS). In November, Google officially renamed its Google Container Engine to Google Kubernetes Engine. Most recently, at AWS re:Invent 2017, Amazon announced its own manged version of Kubernetes, Amazon Elastic Container Service for Kubernetes (Amazon EKS).
The recent abundance of Kuberentes-based CaaS offerings makes deploying, scaling, and managing modern distributed applications increasingly easier. However, as Craig McLuckie, CEO of Heptio, recently stated, “…it doesn’t matter who is delivering Kubernetes, what matters is how it runs.” Making Kubernetes run better is the goal of a new generation of tools, such as Istio, Envoy, Project Calico, Helm, and Ambassador.
What is Istio?
One of those new tools and the subject of this post is Istio. Released in Alpha by Google, IBM and Lyft, in May 2017, Istio is an open platform to connect, manage, and secure microservices. Istio describes itself as, “…an easy way to create a network of deployed services with load balancing, service-to-service authentication, monitoring, and more, without requiring any changes in service code. You add Istio support to services by deploying a special sidecar proxy throughout your environment that intercepts all network communication between microservices, configured and managed using Istio’s control plane functionality.”
Istio contains several components, split between the data plane and a control plane. The data plane includes the Istio Proxy (an extended version of Envoy proxy). The control plane includes the Istio Mixer, Istio Pilot, and Istio-Auth. The Istio components work together to provide behavioral insights and operational control over a microservice-based service mesh. Istio describes a service mesh as a “transparently injected layer of infrastructure between a service and the network that gives operators the controls they need while freeing developers from having to bake solutions to distributed system problems into their code.”
In this post, we will deploy the latest version of Istio, v0.4.0, on Google Cloud Platform, using the latest version of Google Kubernetes Engine (GKE), 1.8.4-gke.1. Both versions were just released in mid-December, as this post is being written. Google, as you probably know, was the creator of Kubernetes, now an open-source CNCF project. Google was the first Cloud Service Provider (CSP) to offer managed Kubernetes in the Cloud, starting in 2014, with Google Container Engine (GKE), which used Kubernetes. This post will outline the installation of Istio on GKE, as well as the deployment of a sample application, integrated with Istio, to demonstrate Istio’s observability features.
The scripts used in this post are as follows, in order of execution (gist).
Code samples in this post are displayed as Gists, which may not display correctly on some mobile and social media browsers. Links to gists are also provided.
Creating GKE Cluster
First, we create the Google Kubernetes Engine (GKE) cluster. The GKE cluster creation is highly-configurable from either the GCP Cloud Console or from the command line, using the Google Cloud Platform gcloud CLI. The CLI will be used throughout the post. I have chosen to create a highly-available, 3-node cluster (1 node/zone) in GCP’s South Carolina us-east1 region (gist).
Once built, we need to retrieve the cluster’s credentials.
The resulting GKE cluster will have the following characteristics (gist).
With the GKE cluster created, we can now deploy Istio. There are at least two options for deploying Istio on GCP. You may choose to manually install and configure Istio in a GKE cluster, as I will do in this post, following these instructions. Alternatively, you may choose to use the Istio GKE Deployment Manager. This all-in-one GCP service will create your GKE cluster, and install and configure Istio and the Istio add-ons, including their Book Info sample application.
There were a few reasons I chose not to use the Istio GKE Deployment Manager option. First, until very recently, you could not install the latest versions of Istio with this option (as of 12/21 you can now deploy v0.3.0 and v0.4.0). Secondly, currently, you only have the choice of GKE version 1.7.8-gke.0. I wanted to test the latest v1.8.4 release with a stable GA version of RBAC. Thirdly, at least three out of four of my initial attempts to use the Istio GKE Deployment Manager failed during provisioning for unknown reasons. Lastly, you will learn more about GKE, Kubernetes, and Istio by doing it yourself, at least the first time.
Istio Code Changes
Before installing Istio, I had to make several minor code changes to my existing Kubernetes resource files. The requirements are detailed in Istio’s Pod Spec Requirements. These changes are minor, but if missed, cause errors during deployment, which can be hard to identify and resolve.
First, you need to name your Service ports in your Service resource files. More specifically, you need to name your service ports,
http, as shown in the Candidate microservice’s Service resource file, below (note line 10) (gist).
Second, an app label is required for Istio. I added an app label to each Deployment and Service resource file, as shown below in the Candidate microservice’s Deployment resource files (note lines 5 and 6) (gist).
The next set of code changes were to my existing Ingress resource file. The requirements for an Ingress resource using Istio are explained here. The first change, Istio ignores all annotations other than
kubernetes.io/ingress.class: istio (note line 7, below). The next change, if using HTTPS, the secret containing your TLS/SSL certificate and private key must be called
istio-ingress-certs; all other names will be ignored (note line 10, below). Related and critically important, that secret must be deployed to the
istio-system namespace, not the application’s namespace. The last change, for my particular my prefix match routing rules, I needed to change the rules from
/.* is a special Istio notation that is used to indicate a prefix match (note lines 14, 18, and 22, below) (gist).
To install Istio, you first will need to download and uncompress the correct distribution of Istio for your OS. Istio provides instructions for installation on various platforms.
install-istio.sh script contains a variable,
ISTIO_HOME, which should point to the root of your local Istio directory. We will also deploy all the current Istio add-ons, including Prometheus, Grafana, Zipkin, Service Graph, and Zipkin-to-Stackdriver (gist).
Once installed, from the GCP Cloud Console, an alternative to the native Kubernetes Dashboard, we should see the following Istio resources deployed and running. Below, note the three nodes are distributed across three zones within the GCP us-east-1 region, the correct version of GKE is employed, Stackdriver logging and monitoring are enabled, and the Alpha Clusters features are also enabled.
And here, we see the nodes that comprise the GKE cluster.
Below, note the four components that comprise Istio:
istio-pilot. Additionally, note the five components that comprise the Istio add-ons.
Below, observe the Istio Ingress has automatically been assigned a public IP address by GCP, accessible on ports 80 and 443. This IP address is how we will communicate with applications running on our GKE cluster, behind the Istio Ingress Load Balancer. Later, we will see how the Istio Ingress Load Balancer knows how to route incoming traffic to those application endpoints, using the Voter API’s Ingress configuration.
Istio makes ample use of Kubernetes Config Maps and Secrets, to store configuration, and to store certificates for mutual TLS.
Creation of the GKE cluster and deployed Istio to the cluster is complete. Following, I will demonstrate the deployment of the Voter API to the cluster. This will be used to demonstrate the capabilities of Istio on GKE.
In addition to the GCP Cloud Console, the native Kubernetes Dashboard is also available. To open, use the
kubectl proxy command and connect to the Kubernetes Dashboard at
https://127.0.0.1:8001/ui. You should now be able to view and edit all resources, from within the Kubernetes Dashboard.
To demonstrate the functionality of Istio and GKE, I will deploy the Voter API. I have used variations of the sample Voter API application in several previous posts, including Architecting Cloud-Optimized Apps with AKS (Azure’s Managed Kubernetes), Azure Service Bus, and Cosmos DB and Eventual Consistency: Decoupling Microservices with Spring AMQP and RabbitMQ. I suggest reading these two post to better understand the Voter API’s design.
For this post, I have reconfigured the Voter API to use MongoDB’s Atlas Database-as-a-Service (DBaaS) as the NoSQL data-source for each microservice. The Voter API is connected to a MongoDB Atlas 3-node M10 instance cluster in GCP’s us-east1 (South Carolina) region. With Atlas, you have the choice of deploying clusters to GCP or AWS.
The Voter API will use CloudAMQP’s RabbitMQ-as-a-Service for its decoupled, eventually consistent, message-based architecture. For this post, the Voter API is configured to use a RabbitMQ cluster in GCP’s us-east1 (South Carolina) region; I chose a minimally-configured free version of RabbitMQ. CloudAMQP allows you to provide a much more robust multi-node clusters for Production, on GCP or AWS.
CloudAMQP provides access to their own Management UI, in addition to access to RabbitMQ’s Management UI.
With the Voter API running and taking traffic, we can see each Voter API microservice instance, nine replicas in total, connected to RabbitMQ. They are each publishing and consuming messages off the two queues.
The GKE, MongoDB Atlas, and RabbitMQ clusters are all running in the same GCP Region. Optimizing the Voter API cloud architecture on GCP, within a single Region, greatly reduces network latency, increases API performance, and improves end-to-end application and infrastructure observability and traceability.
Installing the Voter API
For simplicity, I have divided the Voter API deployment into three parts. First, we create the new
voter-api Kubernetes Namespace, followed by creating a series of Voter API Kuberentes Secrets (gist).
There are a total of five secrets, one secret for each of the three microservice’s MongoDB databases, one secret for the RabbitMQ connection string (shown below), and one secret containing a Let’s Encrypt SSL/TLS certificate chain and private key for the Voter API’s domain,
api.voter-demo.com (shown below).
Next, we create the microservice pods, using the Kubernetes Deployment files, create three ClusterIP-type Kubernetes Services, and a Kubernetes Ingress. The Ingress contains the service endpoint configuration, which Istio Ingress will use to correctly route incoming external API traffic (gist).
Three Kubernetes Pods for each of the three microservice should be created, for a total of nine pods. In the GCP Cloud UI’s Workloads (Kubernetes Deployments), you should see the following three resources. Note each Workload has three pods, each containing one replica of the microservice.
In the GCP Cloud UI’s Discovery and Load Balancing tab, you should observe the following four resources. Note the Voter API Ingress endpoints for the three microservices, which are used by the Istio Proxy, discussed below.
Examining the Voter API deployment more closely, you will observe that each of the nine Voter API microservice pods have two containers running within them (gist).
Along with the microservice container, there is an Istio Proxy container, commonly referred to as a sidecar container. Istio Proxy is an extended version of the Envoy proxy, Lyfts well-known, highly performant edge and service proxy. The proxy sidecar container is injected automatically when the Voter API pods are created. This is possible because we deployed the Istio Initializer (
istio-initializer.yaml). The Istio Initializer guarantees that Istio Proxy will be automatically injected into every microservice Pod. This is referred to as automatic sidecar injection. Below we see an example of one of three Candidate pods running the istio-proxy sidecar.
In the example above, all traffic to and from the Candidate microservice now passes through the Istio Proxy sidecar. With Istio Proxy, we gain several enterprise-grade features, including enhanced observability, service discovery and load balancing, credential injection, and connection management.
Manual Sidecar Injection
What if we have application components we do not want automatically managed with Istio Proxy. In that case, manual sidecar injection might be preferable to automatic sidecar injection with Istio Initializer. For manual sidecar injection, we execute a
istioctl kube-inject command for each of the Kubernetes Deployments. The command manually injects the Istio Proxy container configuration into the Deployment resource file, alongside each Voter API microservice container. On Mac and Linux, this command is similar to the following. Proxy injection is discussed in detail, here (gist).
External Service Egress
Whether you choose automatic or manual sidecar injection of the Istio Proxy, Istio’s egress rules currently only support HTTP and HTTPS requests. The Voter API makes external calls to its backend services, using two alternate protocols, MongoDB Wire Protocol (
mongodb://) and RabbitMQ AMQP (
amqps://). Since we cannot use an Istio egress rule for either protocol, we will use the
includeIPRanges option with the
istioctl kube-inject command to open egress to the two backend services. This will completely bypass Istio for a specific IP range. You can read more about calling external services directly, on Istio’s website.
You will need to modify the
includeIPRanges argument within the
create-voter-api-part3.sh script, adding your own GKE cluster’s IP ranges to the
IP_RANGES variable. The two IP ranges can be found using the following GCP CLI command (gist).
create-voter-api-part3.sh script also contains a modified version the
istioctl kube-inject command for each Voter API Deployment. Using the modified command, the original Deployment files are not altered, instead, a temporary copy of the Deployment file is created into which Istio injects the required modifications. The temporary Deployment file is then used for the deployment, and then immediately deleted (gist).
Some would argue not having the actual deployed version of the file checked into in source code control is an anti-pattern; in this case, I would disagree. If I need to redeploy, I would just run the
istioctl kube-inject command again. You can always view, edit, and import the deployed YAML file, from the GCP CLI or GKE Management UI.
The amount of Istio configuration injected into each microservice Pod’s Deployment resource file is considerable. The Candidate Deployment file swelled from 68 lines to 276 lines of code! This hints at the power, as well as the complexity of Istio. Shown below is a snippet of the Candidate Deployment YAML, after Istio injection.
Confirming Voter API
Installation of the Voter API is now complete. We can validate the Voter API is working, and that traffic is being routed through Istio, using Postman. Below, we see a list of candidates successfully returned from the Voter microservice, through the Voter API. This means, not only us the API running, but that messages have been successfully passed between the services, using RabbitMQ, and saved to the microservice’s corresponding MongoDB databases.
Below, note the
x-envoy-upstream-service-time response headers. They both confirm the Voter API HTTPS traffic is being managed by Istio.
Observability is certainly one of the primary advantages of implementing Istio. For anyone like myself, who has spent many long and often frustrating hours installing, configuring, and managing monitoring systems for distributed platforms, Istio’s observability features are most welcome. Istio provides Prometheus, Grafana, Zipkin, Service Graph, and Zipkin-to-Stackdriver add-ons. Combined with the monitoring capabilities of Backend-as-a-Service providers, such as MongoDB Altas and CloudAMQP RabvbitMQ, you get considerable visibility into your application, out-of-the-box.
First, we will look at Prometheus, a leading open-source monitoring solution. The easiest way to access the Prometheus UI, or any of the other add-ons, including Prometheus, is using port-forwarding. For example with Prometheus, we use the following command (gist).
Alternatively, you could securely expose any of the Istio add-ons through the Istio Ingress, similar to how the Voter API microservice endpoints are exposed.
Prometheus collects time series metrics from both the Istio and Voter API components. Below we see two examples of typical metrics being collected; they include the 201 responses generated by the Candidate microservice, and the outflow of bytes by the Election microservice, over a given period of time.
Although Prometheus is an excellent monitoring solution, Grafana, the leading open source software for time series analytics, provides a much easier way to visualize the metrics collected by Prometheus. Conveniently, Istio provides a dynamically-configured Grafana Dashboard, which will automatically display metrics for components deployed to GKE.
Below, note the metrics collected for the Candidate and Election microservice replicas. Out-of-the-box, Grafana displays common HTTP KPIs, such as request rate, success rate, response codes, response time, and response size. Based on the version label included in the Deployment resource files, we can delineate metrics collected by the version of the Voter API microservices, in this case, v1 of the Candidate and Election microservices.
Next, we have Zipkin, a leading distributed tracing system.
Since the Voter API application uses RabbitMQ to decouple communications between services, versus direct HTTP-based IPC, we won’t see any complex multi-segment traces. We will only see traces representing traffic to and from the microservices, which passes through the Istio Ingress.
Similar to Zipkin, Service Graph is not as valuable with the Voter API application as it could be with more complex applications. Below is a Service Graph view of the Voter API showing microservice version and requests/second to each microservice.
One last tool we have to monitor our GKE cluster is Stackdriver. Stackdriver provides fine-grain monitoring, logging, and diagnostics. If you recall, we enabled Stackdriver logging and monitoring when we first provisioned the GKE cluster. Stackdrive allows us to examine the performance of the GKE cluster’s resources, review logs, and set alerts.
When we installed Istio, we also installed the Zipkin-to-Stackdriver add-on. The Stackdriver Trace Zipkin Collector is a drop-in replacement for the standard Zipkin HTTP collector that writes to Google’s free Stackdriver Trace distributed tracing service. To use Stackdriver for traces originating from Zipkin, there is additional configuration required, which is commented out of the current version of the
zipkin-to-stackdriver.yaml file (gist).
Instructions to configure the Zipkin-to-Stackdriver feature can be found here. Below is an example of how you might add the necessary configuration using a Kubernetes ConfigMap to inject the required user credentials JSON file (
zipkin-to-stackdriver-creds.json) into the
zipkin-to-stackdriver container. The new configuration can be seen on lines 27-44 (gist).
Istio provides a significant amount of fine-grained management control to Kubernetes. Managed Kubernetes CaaS offerings like GKE, coupled with tools like Istio, will soon make running reliable and secure containerized applications in Production, commonplace.
- Introducing Istio: A robust service mesh for microservices
- IBM Code: What is a Service Mesh and how Istio fits in
- Microservices Patterns With Envoy Sidecar Proxy: The series
- OpenShift: Microservices Patterns with Envoy Sidecar Proxy, Part I: Circuit Breaking
- OpenShift: Microservices Patterns with Envoy Proxy, Part II: Timeouts and Retries
- OpenShift: Microservices Patterns With Envoy Proxy, Part III: Distributed Tracing
- Using Kubernetes Configmap with configuration files…
All opinions in this post are my own, and not necessarily the views of my current or past employers or their clients.