Archive for category Kubernetes

Kubernetes-based Microservice Observability with Istio Service Mesh: Part 1

In this two-part post, we will explore the set of observability tools which are part of the Istio Service Mesh. These tools include Jaeger, Kiali, Prometheus, and Grafana. To assist in our exploration, we will deploy a Go-based, microservices reference platform to Google Kubernetes Engine, on the Google Cloud Platform.

Golang Service Diagram with Proxy v2

What is Observability?

Similar to blockchain, serverless, AI and ML, chatbots, cybersecurity, and service meshes, Observability is a hot buzz word in the IT industry right now. According to Wikipedia, observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs. Logs, metrics, and traces are often known as the three pillars of observability. These are the external outputs of the system, which we may observe.

The O’Reilly book, Distributed Systems Observability, by Cindy Sridharan, does an excellent job of detailing ‘The Three Pillars of Observability’, in Chapter 4. I recommend reading this free online excerpt, before continuing. A second great resource for information on observability is honeycomb.io, a developer of observability tools for production systems, led by well-known industry thought-leader, Charity Majors. The honeycomb.io site includes articles, blog posts, whitepapers, and podcasts on observability.

As modern distributed systems grow ever more complex, the ability to observe those systems demands equally modern tooling that was designed with this level of complexity in mind. Traditional logging and monitoring systems often struggle with today’s hybrid and multi-cloud, polyglot language-based, event-driven, container-based and serverless, infinitely-scalable, ephemeral-compute platforms.

Tools like Istio Service Mesh attempt to solve the observability challenge by offering native integrations with several best-of-breed, open-source telemetry tools. Istio’s integrations include Jaeger for distributed tracing, Kiali for Istio service mesh-based microservice visualization, and Prometheus and Grafana for metric collection, monitoring, and alerting. Combined with cloud platform-native monitoring and logging services, such as Stackdriver for Google Kubernetes Engine (GKE) on Google Cloud Platform (GCP), we have a complete observability platform for modern, distributed applications.

A Reference Microservices Platform

To demonstrate the observability tools integrated with the latest version of Istio Service Mesh, we will deploy a reference microservices platform, written in Go, to GKE on GCP. I developed the reference platform to demonstrate concepts such as API management, Service Meshes, Observability, DevOps, and Chaos Engineering. The platform is comprised of (14) components, including (8) Go-based microservices, labeled generically as Service A – Service H, (1) Angular 7, TypeScript-based front-end, (4) MongoDB databases, and (1) RabbitMQ queue for event queue-based communications. The platform and all its source code is free and open source.

The reference platform is designed to generate HTTP-based service-to-service, TCP-based service-to-database (MongoDB), and TCP-based service-to-queue-to-service (RabbitMQ) IPC (inter-process communication). Service A calls Service B and Service C, Service B calls Service D and Service E, Service D produces a message on a RabbitMQ queue that Service F consumes and writes to MongoDB, and so on. These distributed communications can be observed using Istio’s observability tools when the system is deployed to a Kubernetes cluster running the Istio service mesh.

Service Responses

On the reference platform, each upstream service responds to requests from downstream services by returning a small informational JSON payload (termed a greeting in the source code).

Golang Service Diagram with Proxy v2 res

The responses are aggregated across the service call chain, resulting in an array of service responses being returned to the edge service and on to the Angular-based UI, running in the end user’s web browser. The response aggregation feature is simply used to confirm that the service-to-service communications, Istio components, and the telemetry tools are working properly.

screen_shot_2019-03-19_at_8_43_10_pm

Each Go microservice contains a /ping and /health endpoint. The /health endpoint can be used to configure Kubernetes Liveness and Readiness Probes. Additionally, the edge service, Service A, is configured for Cross-Origin Resource Sharing (CORS) using the access-control-allow-origin: * response header. CORS allows the Angular UI, running in end user’s web browser, to call the Service A /ping endpoint, which resides in a different subdomain from UI. Shown below is the Go source code for Service A.

For this demonstration, the MongoDB databases will be hosted, external to the services on GCP, on MongoDB Atlas, a MongoDB-as-a-Service, cloud-based platform. Similarly, the RabbitMQ queues will be hosted on CloudAMQP, a RabbitMQ-as-a-Service, cloud-based platform. I have used both of these SaaS providers in several previous posts. Using external services will help us understand how Istio and its observability tools collect telemetry for communications between the Kubernetes cluster and external systems.

Shown below is the Go source code for Service F, This service consumers messages from the RabbitMQ queue, placed there by Service D, and writes the messages to MongoDB.

Source Code

All source code for this post is available on GitHub in two projects. The Go-based microservices source code, all Kubernetes resources, and all deployment scripts are located in the k8s-istio-observe-backend project repository. The Angular UI TypeScript-based source code is located in the k8s-istio-observe-frontend project repository.

Docker images referenced in the Kubernetes Deployment resource files, for the Go services and UI, are all available on Docker Hub. The Go microservice Docker images were built using the official Golang Alpine base image on DockerHub, containing Go version 1.12.0. Using the Alpine image to compile the Go source code ensures the containers will be as small as possible and contain a minimal attack surface.

Note several services were updated to release v1.4.0 of the project on 3/19/2019. Please make sure you pull the latest project code from both repositories. There were changes to work with Istio 1.1.0 and enable distributed tracing with Jaeger.

System Requirements

To follow along with the post, you will need the latest version of gcloud CLI (min. ver. 239.0.0), part of the Google Cloud SDK, Helm, and the just releases Istio 1.1.0, installed and configured locally or on your build machine.

screen_shot_2019-03-19_at_9_23_17_pm.png

Set-up and Installation

To deploy the microservices platform to GKE, we will proceed in the following order.

  1. Create the MongoDB Atlas database cluster;
  2. Create the CloudAMQP RabbitMQ cluster;
  3. Modify the Kubernetes resources and scripts for your own environments;
  4. Create the GKE cluster on GCP;
  5. Deploy Istio 1.1.0 to the GKE cluster, using Helm;
  6. Create DNS records for the platform’s exposed resources;
  7. Deploy the Go-based microservices, Angular UI, and associated resources to GKE;
  8. Test and troubleshoot the platform;
  9. Observe the results in part two!

MongoDB Atlas Cluster

MongoDB Atlas is a fully-managed MongoDB-as-a-Service, available on AWS, Azure, and GCP. Atlas, a mature SaaS product, offers high-availability, guaranteed uptime SLAs, elastic scalability, cross-region replication, enterprise-grade security, LDAP integration, a BI Connector, and much more.

MongoDB Atlas currently offers four pricing plans, Free, Basic, Pro, and Enterprise. Plans range from the smallest, M0-sized MongoDB cluster, with shared RAM and 512 MB storage, up to the massive M400 MongoDB cluster, with 488 GB of RAM and 3 TB of storage.

For this post, I have created an M2-sized MongoDB cluster in GCP’s us-central1 (Iowa) region, with a single user database account for this demo. The account will be used to connect from four of the eight microservices, running on GKE.

screen_shot_2019-03-09_at_7_48_00_pm

Originally, I started with an M0-sized cluster, but the compute resources were insufficient to support the volume of calls from the Go-based microservices. I suggest at least an M2-sized cluster or larger.

CloudAMQP RabbitMQ Cluster

CloudAMQP provides full-managed RabbitMQ clusters on all major cloud and application platforms. RabbitMQ will support a decoupled, eventually consistent, message-based architecture for a portion of our Go-based microservices. For this post, I have created a RabbitMQ cluster in GCP’s us-central1 (Iowa) region, the same as our GKE cluster and MongoDB Atlas cluster. I chose a minimally-configured free version of RabbitMQ. CloudAMQP also offers robust, multi-node RabbitMQ clusters for Production use.

Modify Configurations

There are a few configuration settings you will need to change in the GitHub project’s Kubernetes resource files and Bash deployment scripts.

Istio ServiceEntry for MongoDB Atlas

Modify the Istio ServiceEntry, external-mesh-mongodb-atlas.yaml file, adding you MongoDB Atlas host address. This file allows egress traffic from four of the microservices on GKE to the external MongoDB Atlas cluster.

apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: mongodb-atlas-external-mesh
spec:
  hosts:
  - {{ your_host_goes_here }}
  ports:
  - name: mongo
    number: 27017
    protocol: MONGO
  location: MESH_EXTERNAL
  resolution: NONE

Istio ServiceEntry for CloudAMQP RabbitMQ

Modify the Istio ServiceEntry, external-mesh-cloudamqp.yaml file, adding you CloudAMQP host address. This file allows egress traffic from two of the microservices to the CloudAMQP cluster.

apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: cloudamqp-external-mesh
spec:
  hosts:
  - {{ your_host_goes_here }}
  ports:
  - name: rabbitmq
    number: 5672
    protocol: TCP
  location: MESH_EXTERNAL
  resolution: NONE

Istio Gateway and VirtualService Resources

There are numerous strategies you may use to route traffic into the GKE cluster, via Istio. I am using a single domain for the post, example-api.com, and four subdomains. One set of subdomains is for the Angular UI, in the dev Namespace (ui.dev.example-api.com) and the test Namespace (ui.test.example-api.com). The other set of subdomains is for the edge API microservice, Service A, which the UI calls (api.dev.example-api.com and api.test.example-api.com). Traffic is routed to specific Kubernetes Service resources, based on the URL.

According to Istio, the Gateway describes a load balancer operating at the edge of the mesh, receiving incoming or outgoing HTTP/TCP connections. Modify the Istio ingress Gateway,  inserting your own domains or subdomains in the hosts section. These are the hosts on port 80 that will be allowed into the mesh.

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: demo-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - ui.dev.example-api.com
    - ui.test.example-api.com
    - api.dev.example-api.com
    - api.test.example-api.com

According to Istio, a VirtualService defines a set of traffic routing rules to apply when a host is addressed. A VirtualService is bound to a Gateway to control the forwarding of traffic arriving at a particular host and port. Modify the project’s four Istio VirtualServices, inserting your own domains or subdomains. Here is an example of one of the four VirtualServices, in the istio-gateway.yaml file.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: angular-ui-dev
spec:
  hosts:
  - ui.dev.example-api.com
  gateways:
  - demo-gateway
  http:
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        port:
          number: 80
        host: angular-ui.dev.svc.cluster.local

Kubernetes Secret

The project contains a Kubernetes Secret, go-srv-demo.yaml, with two values. One is for the MongoDB Atlas connection string and one is for the CloudAMQP connections string. Remember Kubernetes Secret values need to be base64 encoded.

apiVersion: v1
kind: Secret
metadata:
  name: go-srv-config
type: Opaque
data:
  mongodb.conn: {{ your_base64_encoded_secret }}
  rabbitmq.conn: {{ your_base64_encoded_secret }}

On Linux and Mac, you can use the base64 program to encode the connection strings.

> echo -n "mongodb+srv://username:password@atlas-cluster.gcp.mongodb.net/test?retryWrites=true" | base64
bW9uZ29kYitzcnY6Ly91c2VybmFtZTpwYXNzd29yZEBhdGxhcy1jbHVzdGVyLmdjcC5tb25nb2RiLm5ldC90ZXN0P3JldHJ5V3JpdGVzPXRydWU=

> echo -n "amqp://username:password@rmq.cloudamqp.com/cluster" | base64
YW1xcDovL3VzZXJuYW1lOnBhc3N3b3JkQHJtcS5jbG91ZGFtcXAuY29tL2NsdXN0ZXI=

Bash Scripts Variables

The bash script, part3_create_gke_cluster.sh, contains a series of environment variables. At a minimum, you will need to change the PROJECT variable in all scripts to match your GCP project name.

# Constants - CHANGE ME!
readonly PROJECT='{{ your_gcp_project_goes_here }}'
readonly CLUSTER='go-srv-demo-cluster'
readonly REGION='us-central1'
readonly MASTER_AUTH_NETS='72.231.208.0/24'
readonly NAMESPACE='dev'
readonly GKE_VERSION='1.12.5-gke.5'
readonly MACHINE_TYPE='n1-standard-2'

The bash script, part4_install_istio.sh, includes the ISTIO_HOME variable. The value should correspond to your local path to Istio 1.1.0. On my local Mac, this value is shown below.

readonly ISTIO_HOME='/Applications/istio-1.1.0'

Deploy GKE Cluster

Next, deploy the GKE cluster using the included bash script, part3_create_gke_cluster.sh. This will create a Regional, multi-zone, 3-node GKE cluster, using the latest version of GKE at the time of this post, 1.12.5-gke.5. The cluster will be deployed to the same region as the MongoDB Atlas and CloudAMQP clusters, GCP’s us-central1 (Iowa) region. Planning where your Cloud resources will reside, for both SaaS providers and primary Cloud providers can be critical to minimizing latency for network I/O intensive applications.

screen_shot_2019-03-09_at_5_44_33_pm

Deploy Istio using Helm

With the GKE cluster and associated infrastructure in place, deploy Istio. For this post, I have chosen to install Istio using Helm, as recommended my Istio. To deploy Istio using Helm, use the included bash script, part4_install_istio.sh.

screen_shot_2019-03-09_at_5_47_57_pm

The script installs Istio, using the Helm Chart in the local Istio 1.1.0 install/kubernetes/helm/istio directory, which you installed as a requirement for this demonstration. The Istio install script overrides several default values in the Istio Helm Chart using the --set, flag. The list of available configuration values is detailed in the Istio Chart’s GitHub project. The options enable Istio’s observability features, which we will explore in part two. Features include Kiali, Grafana, Prometheus, and Jaeger.

helm install ${ISTIO_HOME}/install/kubernetes/helm/istio-init \
  --name istio-init \
  --namespace istio-system

helm install ${ISTIO_HOME}/install/kubernetes/helm/istio \
  --name istio \
  --namespace istio-system \
  --set prometheus.enabled=true \
  --set grafana.enabled=true \
  --set kiali.enabled=true \
  --set tracing.enabled=true

kubectl apply --namespace istio-system \
  -f ./resources/secrets/kiali.yaml

Below, we see the Istio-related Workloads running on the cluster, including the observability tools.

screen_shot_2019-03-09_at_5_58_35_pm

Below, we see the corresponding Istio-related Service resources running on the cluster.

screen_shot_2019-03-09_at_5_59_14_pm

Modify DNS Records

Instead of using IP addresses to route traffic the GKE cluster and its applications, we will use DNS. As explained earlier, I have chosen a single domain for the post, example-api.com, and four subdomains. One set of subdomains is for the Angular UI, in the dev Namespace and the test Namespace. The other set of subdomains is for the edge microservice, Service A, which the API calls. Traffic is routed to specific Kubernetes Service resources, based on the URL.

Deploying the GKE cluster and Istio triggers the creation of a Google Load Balancer, four IP addresses, and all required firewall rules. One of the four IP addresses, the one shown below, associated with the Forwarding rule, will be associated with the front-end of the load balancer.screen_shot_2019-03-09_at_5_49_37_pm

Below, we see the new load balancer, with the front-end IP address and the backend VM pool of three GKE cluster’s worker nodes. Each node is assigned one of the IP addresses, as shown above.

screen_shot_2019-03-09_at_5_57_20_pm

As shown below, using Google Cloud DNS, I have created the four subdomains and assigned the IP address of the load balancer’s front-end to all four subdomains. Ingress traffic to these addresses will be routed through the Istio ingress Gateway and the four Istio VirtualServices, to the appropriate Kubernetes Service resources. Use your choice of DNS management tools to create the four A Type DNS records.

screen_shot_2019-03-09_at_5_56_29_pm

Deploy the Reference Platform

Next, deploy the eight Go-based microservices, the Angular UI, and the associated Kubernetes and Istio resources to the GKE cluster. To deploy the platform, use the included bash deploy script, part5a_deploy_resources.sh. If anything fails and you want to remove the existing resources and re-deploy, without destroying the GKE cluster or Istio, you can use the part5b_delete_resources.sh delete script.

screen_shot_2019-03-09_at_6_01_29_pm

The deploy script deploys all the resources two Kubernetes Namespaces, dev and test. This will allow us to see how we can differentiate between Namespaces when using the observability tools.

Below, we see the Istio-related resources, which we just deployed. They include the Istio Gateway, four Istio VirtualService, and two Istio ServiceEntry resources.

screen_shot_2019-03-10_at_10_48_49_pm

Below, we see the platform’s Workloads (Kubernetes Deployment resources), running on the cluster. Here we see two Pods for each Workload, a total of 18 Pods, running in the dev Namespace. Each Pod contains both the deployed microservice or UI component, as well as a copy of Istio’s Envoy Proxy.

screen_shot_2019-03-09_at_6_12_59_pm

Below, we see the corresponding Kubernetes Service resources running in the dev Namespace.

screen_shot_2019-03-09_at_6_03_02_pm

Below, a similar view of the Deployment resources running in the test Namespace. Again, we have two Pods for each deployment with each Pod contains both the deployed microservice or UI component, as well as a copy of Istio’s Envoy Proxy.

screen_shot_2019-03-09_at_6_13_16_pm

Test the Platform

We do want to ensure the platform’s eight Go-based microservices and Angular UI are working properly, communicating with each other, and communicating with the external MongoDB Atlas and CloudAMQP RabbitMQ clusters. The easiest way to test the cluster is by viewing the Angular UI in a web browser.

screen_shot_2019-03-19_at_8_43_10_pm

The UI requires you to input the host domain of the Service A, the API’s edge service. Since you cannot use my subdomain, and the JavaScript code is running locally to your web browser, this option allows you to provide your own host domain. This is the same domain or domains you inserted into the two Istio VirtualService for the UI. This domain route your API calls to either the FQDN (fully qualified domain name) of the Service A Kubernetes Service running in the dev namespace, service-a.dev.svc.cluster.local, or the test Namespace, service-a.test.svc.cluster.local.

screen_shot_2019-03-17_at_12_02_22_pm.png

You can also use performance testing tools to load-test the platform. Many issues will not show up until the platform is under load. I recently starting using hey, a modern load generator tool, as a replacement for Apache Bench (ab), Unlike ab, hey supports HTTP/2 endpoints, which is required to test the platform on GKE with Istio. Below, I am running hey directly from Google Cloud Shell. The tool is simulating 25 concurrent users, generating a total of 1,000 HTTP/2-based GET requests to Service A.

screen_shot_2019-03-19_at_8_53_47_pm

Troubleshooting

If for some reason the UI fails to display, or the call from the UI to the API fails, and assuming all Kubernetes and Istio resources are running on the GKE cluster (all green), the most common explanation is usually a misconfiguration of the following resources:

  1. Your four Cloud DNS records are not correct. They are not pointing to the load balancer’s front-end IP address;
  2. You did not configure the four Kubernetes VirtualService resources with the correct subdomains;
  3. The GKE-based microservices cannot reach the external MongoDB Atlas and CloudAMQP RabbitMQ clusters. Likely, the Kubernetes Secret is constructed incorrectly, or the two ServiceEntry resources contain the wrong host information for those external clusters;

I suggest starting the troubleshooting by calling Service A, the API’s edge service, directly, using cURL or Postman. You should see a JSON response payload, similar to the following. This suggests the issue is with the UI, not the API.

screen_shot_2019-03-17_at_12_06_27_pm.png

Next, confirm that the four MongoDB databases were created for Service D, Service, F, Service, G, and Service H. Also, confirm that new documents are being written to the database’s collections.

screen_shot_2019-03-17_at_11_55_19_am

Next, confirm new the RabbitMQ queue was created, using the CloudAMQP RabbitMQ Management Console. Service D produces messages, which Service F consumes from the queue.

screen_shot_2019-03-09_at_6_22_08_pm

Lastly, review the Stackdriver logs to see if there are any obvious errors.

screen-shot-2019-03-08-at-4_44_03-pm

Part Two

In part two of this post, we will explore each observability tool, and see how they can help us manage our GKE cluster and the reference platform running in the cluster.

screen_shot_2019-03-09_at_11_38_34_pm

Since the cluster only takes minutes to fully create and deploy resources to, if you want to tear down the GKE cluster, run the part6_tear_down.sh script.

screen_shot_2019-03-10_at_10_58_55_pm.png

All opinions expressed in this post are my own and not necessarily the views of my current or past employers or their clients.

, , , , , , , , , , , , , , , ,

Leave a comment

Automating Multi-Environment Kubernetes Virtual Clusters with Google Cloud DNS, Auth0, and Istio 1.0

Kubernetes supports multiple virtual clusters within the same physical cluster. These virtual clusters are called Namespaces. Namespaces are a way to divide cluster resources between multiple users. Many enterprises use Namespaces to divide the same physical Kubernetes cluster into different virtual software development environments as part of their overall Software Development Lifecycle (SDLC). This practice is commonly used in ‘lower environments’ or ‘non-prod’ (not Production) environments. These environments commonly include Continous Integration and Delivery (CI/CD), Development, Integration, Testing/Quality Assurance (QA), User Acceptance Testing (UAT), Staging, Demo, and Hotfix. Namespaces provide a basic form of what is referred to as soft multi-tenancy.

Generally, the security boundaries and performance requirements between non-prod environments, within the same enterprise, are less restrictive than Production or Disaster Recovery (DR) environments. This allows for multi-tenant environments, while Production and DR are normally single-tenant environments. In order to approximate the performance characteristics of Production, the Performance Testing environment is also often isolated to a single-tenant. A typical enterprise would minimally have a non-prod, performance, production, and DR environment.

Using Namespaces to create virtual separation on the same physical Kubernetes cluster provides enterprises with more efficient use of virtual compute resources, reduces Cloud costs, eases the management burden, and often expedites and simplifies the release process.

Demonstration

In this post, we will re-examine the topic of virtual clusters, similar to the recent post, Managing Applications Across Multiple Kubernetes Environments with Istio: Part 1 and Part 2. We will focus specifically on automating the creation of the virtual clusters on GKE with Istio 1.0, managing the Google Cloud DNS records associated with the cluster’s environments, and enabling both HTTPS and token-based OAuth access to each environment. We will use the Storefront API for our demonstration, featured in the previous three posts, including Building a Microservices Platform with Confluent Cloud, MongoDB Atlas, Istio, and Google Kubernetes Engine.

gke-routing.png

Source Code

The source code for this post may be found on the gke branch of the storefront-kafka-docker GitHub repository.

git clone --branch gke --single-branch --depth 1 --no-tags \
  https://github.com/garystafford/storefront-kafka-docker.git

Source code samples in this post are displayed as GitHub Gists, which may not display correctly on all mobile and social media browsers, such as LinkedIn.

This project contains all the code to deploy and configure the GKE cluster and Kubernetes resources.

Screen Shot 2019-01-19 at 11.49.31 AM.png

To follow along, you will need to register your own domain, arrange for an Auth0, or alternative, authentication and authorization service, and obtain an SSL/TLS certificate.

SSL/TLS Wildcard Certificate

In the recent post, Securing Your Istio Ingress Gateway with HTTPS, we examined how to create and apply an SSL/TLS certificate to our GKE cluster, to secure communications. Although we are only creating a non-prod cluster, it is more and more common to use SSL/TLS everywhere, especially in the Cloud. For this post, I have registered a single wildcard certificate, *.api.storefront-demo.com. This certificate will cover the three second-level subdomains associated with the virtual clusters: dev.api.storefront-demo.com, test.api.storefront-demo.com, and uat.api.storefront-demo.com. Setting the environment name, such as dev.*, as the second-level subdomain of my storefront-demo domain, following the first level api.* subdomain, makes the use of a wildcard certificate much easier.

screen_shot_2019-01-13_at_10.04.23_pm

As shown below, my wildcard certificate contains the Subject Name and Subject Alternative Name (SAN) of *.api.storefront-demo.com. For Production, api.storefront-demo.com, I prefer to use a separate certificate.

screen_shot_2019-01-13_at_10.36.33_pm_detail

Create GKE Cluster

With your certificate in hand, create the non-prod Kubernetes cluster. Below, the script creates a minimally-sized, three-node, multi-zone GKE cluster, running on GCP, with Kubernetes Engine cluster version 1.11.5-gke.5 and Istio on GKE version 1.0.3-gke.0. I have enabled the master authorized networks option to secure my GKE cluster master endpoint. For the demo, you can add your own IP address CIDR on line 9 (i.e. 1.2.3.4/32), or remove lines 30 – 31 to remove the restriction (gist).

  • Lines 16–39: Create a 3-node, multi-zone GKE cluster with Istio;
  • Line 48: Creates three non-prod Namespaces: dev, test, and uat;
  • Lines 51–53: Enable Istio automatic sidecar injection within each Namespace;

If successful, the results should look similar to the output, below.

screen_shot_2019-01-15_at_11.51.08_pm

The cluster will contain a pool of three minimally-sized VMs, the Kubernetes nodes.

screen_shot_2019-01-16_at_12.06.03_am

Deploying Resources

The Istio Gateway and three ServiceEntry resources are the primary resources responsible for routing the traffic from the ingress router to the Services, within the multiple Namespaces. Both of these resource types are new to Istio 1.0 (gist).

  • Lines 9–16: Port config that only accepts HTTPS traffic on port 443 using TLS;
  • Lines 18–20: The three subdomains being routed to the non-prod GKE cluster;
  • Lines 28, 63, 98: The three subdomains being routed to the non-prod GKE cluster;
  • Lines 39, 47, 65, 74, 82, 90, 109, 117, 125: Routing to FQDN of Storefront API Services within the three Namespaces;

Next, deploy the Istio and Kubernetes resources to the new GKE cluster. For the sake of brevity, we will deploy the same number of instances and the same version of each the three Storefront API services (Accounts, Orders, Fulfillment) to each of the three non-prod environments (dev, test, uat). In reality, you would have varying numbers of instances of each service, and each environment would contain progressive versions of each service, as part of the SDLC of each microservice (gist).

  • Lines 13–14: Deploy the SSL/TLS certificate and the private key;
  • Line 17: Deploy the Istio Gateway and three ServiceEntry resources;
  • Lines 20–22: Deploy the Istio Authentication Policy resources each Namespace;
  • Lines 26–37: Deploy the same set of resources to the dev, test, and uat Namespaces;

The deployed Storefront API Services should look as follows.

screen_shot_2019-01-13_at_7.16.03_pm

Google Cloud DNS

Next, we need to enable DNS access to the GKE cluster using Google Cloud DNS. According to Google, Cloud DNS is a scalable, reliable and managed authoritative Domain Name System (DNS) service running on the same infrastructure as Google. It has low latency, high availability, and is a cost-effective way to make your applications and services available to your users.

Whenever a new GKE cluster is created, a new Network Load Balancer is also created. By default, the load balancer’s front-end is an external IP address.

screen_shot_2019-01-15_at_11.56.01_pm.png

Using a forwarding rule, traffic directed at the external IP address is redirected to the load balancer’s back-end. The load balancer’s back-end is comprised of three VM instances, which are the three Kubernete nodes in the GKE cluster.

screen_shot_2019-01-15_at_11.56.19_pm

If you are following along with this post’s demonstration, we will assume you have a domain registered and configured with Google Cloud DNS. I am using the storefront-demo.com domain, which I have used in the last three posts to demonstrate Istio and GKE.

Google Cloud DNS has a fully functional web console, part of the Google Cloud Console. However, using the Cloud DNS web console is impractical in a DevOps CI/CD workflow, where Kubernetes clusters, Namespaces, and Workloads are ephemeral. Therefore we will use the following script. Within the script, we reset the IP address associated with the A records for each non-prod subdomains associated with storefront-demo.com domain (gist).

  • Lines 23–25: Find the previous load balancer’s front-end IP address;
  • Lines 27–29: Find the new load balancer’s front-end IP address;
  • Line 35: Start the Cloud DNS transaction;
  • Lines 37–47: Add the DNS record changes to the transaction;
  • Line 49: Execute the Cloud DNS transaction;

The outcome of the script is shown below. Note how changes are executed as part of a transaction, by automatically creating a transaction.yaml file. The file contains the six DNS changes, three additions and three deletions. The command executes the transaction and then deletes the transaction.yaml file.

> sh ./part3_set_cloud_dns.sh
Old LB IP Address: 35.193.208.115
New LB IP Address: 35.238.196.231

Transaction started [transaction.yaml].

dev.api.storefront-demo.com.
Record removal appended to transaction at [transaction.yaml].
Record addition appended to transaction at [transaction.yaml].

test.api.storefront-demo.com.
Record removal appended to transaction at [transaction.yaml].
Record addition appended to transaction at [transaction.yaml].

uat.api.storefront-demo.com.
Record removal appended to transaction at [transaction.yaml].
Record addition appended to transaction at [transaction.yaml].

Executed transaction [transaction.yaml] for managed-zone [storefront-demo-com-zone].
Created [https://www.googleapis.com/dns/v1/projects/gke-confluent-atlas/managedZones/storefront-demo-com-zone/changes/53].

ID  START_TIME                STATUS
55  2019-01-16T04:54:14.984Z  pending

Based on my own domain and cluster details, the transaction.yaml file looks as follows. Again, note the six DNS changes, three additions, followed by three deletions (gist).

Confirm DNS Changes

Use the dig command to confirm the DNS records are now correct and that DNS propagation has occurred. The IP address returned by dig should be the external IP address assigned to the front-end of the Google Cloud Load Balancer.

> dig dev.api.storefront-demo.com +short
35.238.196.231

Or, all the three records.

echo \
  "dev.api.storefront-demo.com\n" \
  "test.api.storefront-demo.com\n" \
  "uat.api.storefront-demo.com" \
  > records.txt | dig -f records.txt +short

35.238.196.231
35.238.196.231
35.238.196.231

Optionally, more verbosely by removing the +short option.

> dig +nocmd dev.api.storefront-demo.com

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30763
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;dev.api.storefront-demo.com.   IN  A

;; ANSWER SECTION:
dev.api.storefront-demo.com. 299 IN A   35.238.196.231

;; Query time: 27 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Wed Jan 16 18:00:49 EST 2019
;; MSG SIZE  rcvd: 72

The resulting records in the Google Cloud DNS management console should look as follows.

screen_shot_2019-01-15_at_11.57.12_pm

JWT-based Authentication

As discussed in the previous post, Istio End-User Authentication for Kubernetes using JSON Web Tokens (JWT) and Auth0, it is typical to limit restrict access to the Kubernetes cluster, Namespaces within the cluster, or Services running within Namespaces to end-users, whether they are humans or other applications. In that previous post, we saw an example of applying a machine-to-machine (M2M) Istio Authentication Policy to only the uat Namespace. This scenario is common when you want to control access to resources in non-production environments, such as UAT, to outside test teams, accessing the uat Namespace through an external application. To simulate this scenario, we will apply the following Istio Authentication Policy to the uat Namespace. (gist).

For the dev and test Namespaces, we will apply an additional, different Istio Authentication Policy. This policy will protect against the possibility of dev and test M2M API consumers interfering with uat M2M API consumers and vice-versa. Below is the dev and test version of the Policy (gist).

Testing Authentication

Using Postman, with the ‘Bearer Token’ type authentication method, as detailed in the previous post, a call a Storefront API resource in the uat Namespace should succeed. This also confirms DNS and HTTPS are working properly.

screen_shot_2019-01-15_at_11.58.41_pm

The dev and test Namespaces require different authentication. Trying to use no Authentication, or authenticating as a UAT API consumer, will result in a 401 Unauthorized HTTP status, along with the Origin authentication failed. error message.

screen_shot_2019-01-16_at_12.00.55_am

Conclusion

In this brief post, we demonstrated how to create a GKE cluster with Istio 1.0.x, containing three virtual clusters, or Namespaces. Each Namespace represents an environment, which is part of an application’s SDLC. We enforced HTTP over TLS (HTTPS) using a wildcard SSL/TLS certificate. We also enforced end-user authentication using JWT-based OAuth 2.0 with Auth0. Lastly, we provided user-friendly DNS routing to each environment, using Google Cloud DNS. Short of a fully managed API Gateway, like Apigee, and automating the execution of the scripts with Jenkins or Spinnaker, this cluster is ready to provide a functional path to Production for developing our Storefront API.

All opinions expressed in this post are my own and not necessarily the views of my current or past employers or their clients.

, , , , , , , , , , ,

1 Comment

Istio End-User Authentication for Kubernetes using JSON Web Tokens (JWT) and Auth0

In the recent post, Building a Microservices Platform with Confluent Cloud, MongoDB Atlas, Istio, and Google Kubernetes Engine, we built and deployed a microservice-based, cloud-native API to Google Kubernetes Engine, with Istio 1.0.x, on Google Cloud Platform. For brevity, we intentionally omitted a few key features required to operationalize and secure the API. These missing features included HTTPS, user authentication, request quotas, request throttling, and the integration of a full lifecycle API management tool, like Google Apigee.

In a follow-up post, Securing Your Istio Ingress Gateway with HTTPS, we disabled HTTP access to the API running on the GKE cluster. We then enabled bidirectional encryption of communications between a client and GKE cluster with HTTPS.

In this post, we will further enhance the security of the Storefront Demo API by enabling Istio end-user authentication using JSON Web Token-based credentials. Using JSON Web Tokens (JWT), pronounced ‘jot’, will allow Istio to authenticate end-users calling the Storefront Demo API. We will use Auth0, an Authentication-as-a-Service provider, to generate JWT tokens for registered Storefront Demo API consumers, and to validate JWT tokens from Istio, as part of an OAuth 2.0 token-based authorization flow.

istio-gke-auth

JSON Web Tokens

Token-based authentication, according to Auth0, works by ensuring that each request to a server is accompanied by a signed token which the server verifies for authenticity and only then responds to the request. JWT, according to JWT.io, is an open standard (RFC 7519) that defines a compact and self-contained way for securely transmitting information between parties as a JSON object. This information can be verified and trusted because it is digitally signed. Other common token types include Simple Web Tokens (SWT) and Security Assertion Markup Language Tokens (SAML).

JWTs can be signed using a secret with the Hash-based Message Authentication Code (HMAC) algorithm, or a public/private key pair using Rivest–Shamir–Adleman (RSA) or Elliptic Curve Digital Signature Algorithm (ECDSA). Authorization is the most common scenario for using JWT. Within the token payload, you can easily specify user roles and permissions as well as resources that the user can access.

A registered API consumer makes an initial request to the Authorization server, in which they exchange some form of credentials for a token. The JWT is associated with a set of specific user roles and permissions. Each subsequent request will include the token, allowing the user to access authorized routes, services, and resources that are permitted with that token.

Auth0

To use JWTs for end-user authentication with Istio, we need a way to authenticate credentials associated with specific users and exchange those credentials for a JWT. Further, we need a way to validate the JWTs from Istio. To meet these requirements, we will use Auth0. Auth0 provides a universal authentication and authorization platform for web, mobile, and legacy applications. According to G2 Crowd, competitors to Auth0 in the Customer Identity and Access Management (CIAM) Software category include Okta, Microsoft Azure Active Directory (AD) and AD B2C, Salesforce Platform: Identity, OneLogin, Idaptive, IBM Cloud Identity Service, and Bitium.

screen_shot_2019-01-09_at_10.18.16_am.png

Auth0 currently offers four pricing plans: Free, Developer, Developer Pro, and Enterprise. Subscriptions to plans are on a monthly or discounted yearly basis. For this demo’s limited requirements, you need only use Auth0’s Free Plan.

screen_shot_2019-01-06_at_6.11.45_pm

Client Credentials Grant

The OAuth 2.0 protocol defines four flows, or grants types, to get an Access Token, depending on the application architecture and the type of end-user. We will be simulating a third-party, external application that needs to consume the Storefront API, using the Client Credentials grant type. According to Auth0, The Client Credentials Grant, defined in The OAuth 2.0 Authorization Framework RFC 6749, section 4.4, allows an application to request an Access Token using its Client Id and Client Secret. It is used for non-interactive applications, such as a CLI, a daemon, or a Service running on your backend, where the token is issued to the application itself, instead of an end user.

jwt-istio-authorize-flow

With Auth0, we need to create two types of entities, an Auth0 API and an Auth0 Application. First, we define an Auth0 API, which represents the Storefront API we are securing. Second, we define an Auth0 Application, a consumer of our API. The Application is associated with the API. This association allows the Application (consumer of the API) to authenticate with Auth0 and receive a JWT. Note there is no direct integration between Auth0 and Istio or the Storefront API. We are facilitating a decoupled, mutual trust relationship between Auth0, Istio, and the registered end-user application consuming the API.

Start by creating a new Auth0 API, the ‘Storefront Demo API’. For this demo, I used my domain’s URL as the Identifier. For use with Istio, choose RS256 (RSA Signature with SHA-256), an asymmetric algorithm that uses a public/private key pair, as opposed to the HS256 symmetric algorithm. With RS256, Auth0 will use the same private key to both create the signature and to validate it. Auth0 has published a good post on the use of RS256 vs. HS256 algorithms.

screen_shot_2019-01-05_at_9.39.01_am

screen_shot_2019-01-05_at_1.49.06_pm

Scopes

Auth0 allows granular access control to your API through the use of Scopes. The permissions represented by the Access Token in OAuth 2.0 terms are known as scopes, According to Auth0. The scope parameter allows the application to express the desired scope of the access request. The scope parameter can also be used by the authorization server in the response to indicate which scopes were actually granted.

Although it is necessary to define and assign at least one scope to our Auth0 Application, we will not actually be using those scopes to control fine-grain authorization to resources within the Storefront API. In this demo, if an end-user is authenticated, they will be authorized to access all Storefront API resources.

screen_shot_2019-01-05_at_9.45.22_am

Machine to Machine Applications

Next, define a new Auth0 Machine to Machine (M2M) Application, ‘Storefront Demo API Consumer 1’.

screen_shot_2019-01-06_at_7.05.21_pm.png

Next, authorize the new M2M Application to request access to the new Storefront Demo API. Again, we are not using scopes, but at least one scope is required, or you will not be able to authenticate, later.

screen_shot_2019-01-06_at_7.23.40_pm.png

Each M2M Application has a unique Client ID and Client Secret, which are used to authenticate with the Auth0 server and retrieve a JWT.

screen_shot_2019-01-05_at_1.50.32_pm

Multiple M2M Applications may be authorized to request access to APIs.

screen_shot_2019-01-05_at_1.50.17_pm

In the Endpoints tab of the Advanced Application Settings, there are a series of OAuth URLs. To authorize our new M2M Application to consume the Storefront Demo API, we need the ‘OAuth Authorization URL’.

screen_shot_2019-01-06_at_7.32.54_pm.png

Testing Auth0

To test the Auth0 JWT-based authentication and authorization workflow, I prefer to use Postman. Conveniently, Auth0 provides a Postman Collection with all the HTTP request you will need, already built. Use the Client Credentials POST request. The grant_type header value will always be client_credentials. You will need to supply the Auth0 Application’s Client ID and Client Secret as the client_id and client_secret header values. The audience header value will be the API Identifier you used to create the Auth0 API earlier.

screen_shot_2019-01-06_at_5.25.50_pm

If the HTTP request is successful, you should receive a JWT access_token in response, which will allow us to authenticate with the Storefront API, later. Note the scopes you defined with Auth0 are also part of the response, along with the token’s TTL.

jwt.io Debugger

For now, test the JWT using the jwt.io Debugger page. If everything is working correctly, the JWT should be successfully validated.

screen_shot_2019-01-05_at_1.54.35_pm

Istio Authentication Policy

To enable Istio end-user authentication using JWT with Auth0, we add an Istio Policy authentication resource to the existing set of deployed resources. You have a few choices for end-user authentication, such as:

  1. Applied globally, to all Services across all Namespaces via the Istio Ingress Gateway;
  2. Applied locally, to all Services within a specific Namespace (i.e. uat);
  3. Applied locally, to a single Service or Services within a specific Namespace (i.e prod.accounts);

In reality, since you would likely have more than one registered consumer of the API, with different roles, you would have more than one Authentication Policy applied the cluster.

For this demo, we will enable global end-user authentication to the Storefront API, using JWTs, at the Istio Ingress Gateway. To create an Istio Authentication Policy resource, we use the Istio Authentication API version authentication.istio.io/v1alpha1(gist).

The single audiences YAML map value is the same Audience header value you used in your earlier Postman request, which was the API Identifier you used to create the Auth0 Storefront Demo API earlier. The issuer YAML scalar value is Auth0 M2M Application’s Domain value, found in the ‘Storefront Demo API Consumer 1’ Settings tab. The jwksUri YAML scalar value is the JSON Web Key Set URL value, found in the Endpoints tab of the Advanced Application Settings.

screen_shot_2019-01-06_at_8.26.55_pm.png

The JSON Web Key Set URL is a publicly accessible endpoint. This endpoint will be accessed by Istio to obtain the public key used to authenticate the JWT.

screen_shot_2019-01-06_at_5.27.40_pm

Assuming you have already have deployed the Storefront API to the GKE cluster, simply apply the new Istio Policy. We should now have end-user authentication enabled on the Istio Ingress Gateway using JSON Web Tokens.

kubectl apply -f ./resources/other/ingressgateway-jwt-policy.yaml

Finer-grain Authentication

If you need finer-grain authentication of resources, alternately, you can apply an Istio Authentication Policy across a Namespace and to a specific Service or Services. Below, we see an example of applying a Policy to only the uat Namespace. This scenario is common when you want to control access to resources in non-production environments, such as UAT, to outside test teams or a select set of external beta-testers. According to Istio, to apply Namespace-wide end-user authentication, across a single Namespace, it is necessary to name the Policy, default (gist).

Below, we see an even finer-grain Policy example, scoped to just the accounts Service within just the prod Namespace. This scenario is common when you have an API consumer whose role only requires access to a portion of the API. For example, a marketing application might only require access to the accounts Service, but not the orders or fulfillment Services (gist).

Test Authentication

To test end-user authentication, first, call any valid Storefront Demo API endpoint, without supplying a JWT for authorization. You should receive a ‘401 Unauthorized’ HTTP response code, along with an Origin authentication failed. message in the response body. This means the Storefront Demo API is now inaccessible unless the API consumer supplies a JWT, which can be successfully validated by Istio.

screen_shot_2019-01-06_at_5.22.36_pm

Next, add authorization to the Postman request by selecting the ‘Bearer Token’ type authentication method. Copy and paste the JWT (access_token) you received earlier from the Client Credentials request. This will add an Authorization request header. In curl, the request header would look as follows (gist).

Make the request with Postman. If the Istio Policy is applied correctly, the request should now receive a successful response from the Storefront API. A successful response indicates that Istio successfully validated the JWT, located in the Authorization header, against the Auth0 Authorization Server. Istio then allows the user, the ‘Storefront Demo API Consumer 1’ application, access to all Storefront API resources.

screen_shot_2019-01-06_at_5.22.20_pm

Troubleshooting

Istio has several pages of online documentation on troubleshooting authentication issues. One of the first places to look for errors, if your end-user authentication is not working, but the JWT is valid, is the Istio Pilot logs. The core component used for traffic management in Istio, Pilot, manages and configures all the Envoy proxy instances deployed in a particular Istio service mesh. Pilot distributes authentication policies, like our new end-user authentication policy, and secure naming information to the proxies.

Below, in Google Stackdriver Logging, we see typical log entries indicating the Pilot was unable to retrieve the JWT public key (recall we are using RS256 public/private key pair asymmetric algorithm). This particular error was due to a typo in the Istio Policy authentication resource YAML file.

screen_shot_2019-01-06_at_8.49.56_pm

Below we see an Istio Mixer log entry containing details of a Postman request to the Accounts Storefront service /accounts/customers/summary endpoint. According to Istio, Mixer is the Istio component responsible for providing policy controls and telemetry collection. Note the apiClaims section of the textPayload of the log entry, corresponds to the Payload Segment of the JWT passed in this request. The log entry clearly shows that the JWT was decoded and validated by Istio, before forwarding the request to the Accounts Service.

screen_shot_2019-01-07_at_8.59.50_pm.png

Conclusion

In this brief post, we added end-user authentication to our Storefront Demo API, running on GKE with Istio. Although still not Production-ready, we have secured the Storefront API with both HTTPS client-server encryption and JSON Web Token-based authorization. Next steps would be to add mutual TLS (mTLS) and a fully-managed API Gateway in front of the Storefront API GKE cluster, to provide advanced API features, like caching, quotas and rate limits.

All opinions expressed in this post are my own and not necessarily the views of my current or past employers or their clients.

, , , , , , , , , , ,

3 Comments