Posts Tagged event-driven

Event-driven, Serverless Architectures with AWS Lambda, SQS, DynamoDB, and API Gateway

Introduction

In this post, we will explore modern application development using an event-driven, serverless architecture on AWS. To demonstrate this architecture, we will integrate several fully-managed services, all part of the AWS Serverless Computing platform, including Lambda, API Gateway, SQS, S3, and DynamoDB. The result will be an application composed of small, easily deployable, loosely coupled, independently scalable, serverless components.

What is ‘Event-Driven’?

According to Otavio Ferreira, Manager, Amazon SNS, and James Hood, Senior Software Development Engineer, in their AWS Compute Blog, Enriching Event-Driven Architectures with AWS Event Fork Pipelines, “Many customers are choosing to build event-driven applications in which subscriber services automatically perform work in response to events triggered by publisher services. This architectural pattern can make services more reusable, interoperable, and scalable.” This description of an event-driven architecture perfectly captures the essence of the following post. All interactions between application components in this post will be as a direct result of triggering an event.

What is ‘Serverless’?

Mistakingly, many of us think of serverless as just functions (aka Function-as-a-Service or FaaS). When it comes to functions on AWS, Lambda is just one of many fully-managed services that make up the AWS Serverless Computing platform. So, what is ‘serverless’? According to AWS, “Serverless applications don’t require provisioning, maintaining, and administering servers for backend components such as compute, databases, storage, stream processing, message queueing, and more.

As a Developer, one of my favorite features of serverless is the cost, or lack thereof. With serverless on AWS, you pay for consistent throughput or execution duration rather than by server unit, and, at least on AWS, you don’t pay for idle resources. This is not always true of ‘serverless’ offerings on other leading Cloud platforms. Remember, if you’re paying for it but not using it, it’s not serverless.

If you’re paying for it but not using it, it’s not serverless.

Demonstration

To demonstrate an event-driven, serverless architecture, we will build, package, and deploy an application capable of extracting messages from CSV files placed in S3, transforming those messages, queueing them to SQS, and finally, writing the messages to DynamoDB, using Lambda functions throughout. We will also expose a RESTful API, via API Gateway, to perform CRUD-like operations on those messages in DynamoDB.

AWS Technologies

In this demonstration, we will use several AWS serverless services, including the following.

Each Lambda will use function-specific execution roles, part of AWS Identity and Access Management (IAM). We will log the event details and monitor services using Amazon CloudWatch.

To codify, build, package, deploy, and manage the Lambda functions and other AWS resources in a fully automated fashion, we will also use the following AWS services:

Architecture

The high-level architecture for the platform provisioned and deployed in this post is illustrated in the diagram below. There are two separate workflows. In the first workflow (top), data is extracted from CSV files placed in S3, transformed, queued to SQS, and written to DynamoDB, using Python-based Lambda functions throughout. In the second workflow (bottom), data is manipulated in DynamoDB through interactions with a RESTful API, exposed via an API Gateway, and backed by Node.js-based Lambda functions.

new-01-sqs-dynamodb

Using the vast array of current AWS services, there are several ways we could extract, transform, and load data from static files into DynamoDB. The demonstration’s event-driven, serverless architecture represents just one possible approach.

Source Code

All source code for this post is available on GitHub in a single public repository, serverless-sqs-dynamo-demo. To clone the GitHub repository, execute the following command.

git clone --branch master --single-branch --depth 1 --no-tags \
  https://github.com/garystafford/serverless-sqs-dynamo-demo.git

The project files relevant to this demonstration are organized as follows.

.
├── README.md
├── lambda_apigtw_to_dynamodb
│   ├── app.js
│   ├── events
│   ├── node_modules
│   ├── package.json
│   └── tests
├── lambda_s3_to_sqs
│   ├── __init__.py
│   ├── app.py
│   ├── requirements.txt
│   └── tests
├── lambda_sqs_to_dynamodb
│   ├── __init__.py
│   ├── app.py
│   ├── requirements.txt
│   └── tests
├── requirements.txt
├── template.yaml
└── sample_data
    ├── data.csv
    ├── data_bad_msg.csv
    └── data_good_msg.csv

Some source code samples in this post are GitHub Gists, which may not display correctly on all social media browsers, such as LinkedIn.

Prerequisites

The demonstration assumes you already have an AWS account. You will need the latest copy of the AWS CLI, SAM CLI, and Python 3 installed on your development machine.

Additionally, you will need two existing S3 buckets. One bucket will be used to store the packaged project files for deployment. The second bucket is where we will place CSV data files, which, in turn, will trigger events that invoke multiple Lambda functions.

Deploying the Project

Before diving into the code, we will deploy the project to AWS. Conveniently, the entire project’s resources are codified in an AWS SAM template. We are using the AWS Serverless Application Model (SAM). AWS SAM is a model used to define serverless applications on AWS. According to the official SAM GitHub project documentation, AWS SAM is based on AWS CloudFormation. A serverless application is defined in a CloudFormation template and deployed as a CloudFormation stack.

Template Parameter

CloudFormation will create and uniquely name the SQS queues and the DynamoDB table. However, to avoid circular references, a common issue when creating resources associated with S3 event notifications, it is easier to use a pre-existing bucket. To start, you will need to change the SAM template’s DataBucketName parameter’s default value to your own S3 bucket name. Again, this bucket is where we will eventually push the CSV data files. Alternately, override the default values using the sam build command, next.

Parameters:
  DataBucketName:
    Type: String
    Description: S3 bucket where CSV files are processed
    Default: your-data-bucket-name

SAM CLI Commands

With the DataBucketName parameter set, proceed to validate, build, package, and deploy the project using the SAM CLI and the commands below. In addition to the sam validate command, I also like to use the aws cloudformation validate-template command to validate templates and catch any potential, additional errors.

Note the S3_BUCKET_BUILD variable, below, refers to the name of the S3 bucket SAM will use package and deploy the project from, as opposed to the S3 bucket, which the CSV data files will be placed into (gist).


# variables
S3_BUILD_BUCKET=your_build_bucket_name
STACK_NAME=your_cloudformation_stack_name
# validate
sam validate –template template.yaml
aws cloudformation validate-template \
–template-body file://template.yaml
# build
sam build –template template.yaml
# package
sam package \
–output-template-file packaged.yaml \
–s3-bucket $S3_BUILD_BUCKET
# deploy
sam deploy –template-file packaged.yaml \
–stack-name $STACK_NAME \
–capabilities CAPABILITY_IAM \
–debug

After validating the template, SAM will build and package each individual Lambda function and its associated dependencies. Below, we see each individual Lambda function being packaged with a copy of its dependencies.

screen_shot_2019-09-30_at_8_42_41_pm

Once packaged, SAM will deploy the project and create the AWS resources as a CloudFormation stack.

screen_shot_2019-09-30_at_8_43_11_pm

Once the stack creation is complete, use the CloudFormation management console to review the AWS resources created by SAM. There are approximately 14 resources defined in the SAM template, which result in 33 individual resources deployed as part of the CloudFormation stack.

screen_shot_2019-09-30_at_8_44_46_pm

Note the stack’s output values. You will need these values to interact with the deployed platform, later in the demonstration.

screen_shot_2019-09-30_at_8_45_13_pm

Test the Deployed Application

Once the CloudFormation stack has deployed without error, copying a CSV file to the S3 bucket is the quickest way to confirm everything is working. The project includes test data files with 20 rows of test message data. Below is a sample of the CSV file, which is included in the project. The data was collected from IoT devices that measured response time from wired versus wireless devices on a LAN network; the message details are immaterial to this demonstration (gist).



timestamp location source local_dest local_avg remote_dest remote_avg
1559040909.3853335 location-03 wireless router-1 4.39 device-1 9.09
1559040919.5273902 location-03 wireless router-1 0.49 device-1 16.75
1559040929.6446512 location-03 wireless router-1 0.56 device-1 8.31
1559040939.7712135 location-03 wireless router-1 1.64 device-1 9.4
1559040949.891723 location-03 wireless router-1 1.18 device-1 9.07
1559040960.011338 location-03 wireless router-1 0.42 device-1 8.4
1559040970.1319716 location-03 wireless router-1 1.73 device-1 8.66
1559040980.2533505 location-03 wireless router-1 0.67 device-1 8.61
1559040990.3816211 location-03 wireless router-1 1.27 device-1 10.87
1559041000.5105414 location-03 wireless router-1 1.63 device-1 10.08

Run the following commands to copy the test data file to your S3 bucket.

S3_DATA_BUCKET=your_data_bucket_name
aws s3 cp sample_data/data.csv s3://$S3_DATA_BUCKET

Visit the DynamoDB management console. You should observe a new DynamoDB table.

screen_shot_2019-09-30_at_8_47_10_pm

Within the new DynamoDB table, you should observe twenty items, corresponding to each of the twenty rows in the CSV file, uploaded to S3.

screen_shot_2019-09-30_at_8_51_07_pm

Drill into an individual item within the table and review its attributes. They should match the rows in the CSV file.

screen_shot_2019-09-30_at_8_51_54_pm

Both the Python- and Node.js-based Lambda functions have their default logging levels set to debug. The debug-level output from each Lambda function is streamed to individual Amazon CloudWatch Log Groups. We can use the CloudWatch logs to troubleshoot any issues with the deployed application. Below we see an example of CloudWatch log entries for the request and response payloads generated from GetMessageFunction Lambda function, which is querying DynamoDB for a single Item.

screen_shot_2019-10-03_at_9_16_22_pm

Event-Driven Patterns

There are three distinct and discrete event-driven dataflows within the demonstration’s architecture

  1. S3 Event Source for Lambda (S3 to SQS)
  2. SQS Event Source for Lambda (SQS to DynamoDB)
  3. API Gateway Event Source for Lambda (API Gateway to DynamoDB)

Let’s examine each event-driven dataflow and the Lambda code associated with that part of the architecture.

S3 Event Source for Lambda

Whenever a file is copied into the target S3 bucket, an S3 Event Notification triggers an asynchronous invocation of a Lambda. According to AWS, when you invoke a function asynchronously, the Lambda sends the event to the SQS queue. A separate process reads events from the queue and executes your Lambda function.

new-02-sqs-dynamodb

The Lambda’s function handler, written in Python, reads the CSV file, whose filename is contained in the event. The Lambda extracts the rows in the CSV file, transforms the data, and pushes each message to the SQS queue (gist).


def lambda_handler(event, context):
bucket = event['Records'][0]['s3']['bucket']['name']
key = urllib.parse.unquote_plus(
event['Records'][0]['s3']['object']['key'],
encoding='utf-8'
)
messages = read_csv_file(bucket, key)
process_messages(messages)

view raw

app.py

hosted with ❤ by GitHub

Below is an example of a message body, part an SQS message, extracted from a single row of the CSV file, and sent by the Lambda to the SQS queue. The timestamp has been converted to separate date and time fields by the Lambda. The DynamoDB table is part of the SQS message body. The key/value pairs in the Item JSON object reflect the schema of the DynamoDB table (gist).


{
"TableName": "your-dynamodb-table-name",
"Item": {
"date": {
"S": "2001-01-01"
},
"time": {
"S": "09:01:05"
},
"location": {
"S": "location-03"
},
"source": {
"S": "wireless"
},
"local_dest": {
"S": "router-1"
},
"local_avg": {
"N": "5.55"
},
"remote_dest": {
"S": "device-1"
},
"remote_avg": {
"N": "10.10"
}
}
}

SQS Event Source for Lambda

According to AWS, SQS offers two types of message queues, Standard and FIFO (First-In-First-Out). An SQS FIFO queue is designed to guarantee that messages are processed exactly once, in the exact order that they are sent. A Standard SQS queue offers maximum throughput, best-effort ordering, and at-least-once delivery.

Examining the SQS management console, you should observe that the CloudFormation stack creates two SQS Standard queues—a primary queue and a Dead Letter Queue (DLQ). According to AWS, Amazon SQS supports dead-letter queues, which other queues (source queues) can target for messages that cannot be processed (consumed) successfully.

screen_shot_2019-09-30_at_8_55_33_pm

Examining the SQS Lambda Triggers tab, you should observe the Lambda, which will be triggered by the SQS events.

screen_shot_2019-09-30_at_8_56_25_pm

When a message is pushed into the SQS queue by the previous process, an SQS event is fired, which synchronously triggers an invocation of the Lambda using the SQS Event Source for Lambda functionality. When a function is invoked synchronously, Lambda runs the function and waits for a response.

new-03-sqs-dynamodb

In the demonstration, the Lambda’s function handler, also written in Python, pulls the message off of the SQS queue and writes the message (DynamoDB put) to the DynamoDB table. Although writing is the primary use case in this demonstration, an event could also trigger a get, scan, update, or delete command to be executed on the DynamoDB table (gist).


def lambda_handler(event, context):
operations = {
'DELETE': lambda dynamo, x: dynamo.delete_item(**x),
'POST': lambda dynamo, x: dynamo.put_item(**x),
'PUT': lambda dynamo, x: dynamo.update_item(**x),
'GET': lambda dynamo, x: dynamo.get_item(**x),
'GET_ALL': lambda dynamo, x: dynamo.scan(**x),
}
for record in event['Records']:
payload = loads(record['body'], parse_float=str)
operation = record['messageAttributes']['Method']['stringValue']
if operation in operations:
try:
operations[operation](dynamo_client, payload)
except Exception as e:
logger.error(e)
else:
logger.error('Unsupported method \'{}\''.format(operation))

view raw

app.py

hosted with ❤ by GitHub

API Gateway Event Source for Lambda

Examining the API Gateway management console, you should observe that CloudFormation created a new Edge-optimized API. The API contains several resources and their associated HTTP methods.

screen_shot_2019-09-30_at_9_02_52_pm

Each API resource is associated with a deployed Lambda function. Switching to the Lambda console, you should observe a total of seven new Lambda functions. There are five Lambda functions related to the API, in addition to the Lambda called by the S3 event notifications and the Lambda called by the SQS event notifications.

screen_shot_2019-09-30_at_8_46_24_pm

Examining one of the Lambda functions associated with the API Gateway, we should observe that the API Gateway trigger for the Lambda (lower left and bottom).

screen_shot_2019-09-30_at_8_59_39_pm

When an end-user makes an HTTP(S) request via the RESTful API exposed by the API Gateway, an event is fired, which synchronously invokes a Lambda using the API Gateway Event Source for Lambda functionality. The event contains details about the HTTP request that is received. The event triggers any one of five different Lambda functions, depending on the HTTP request method.

new-04-sqs-dynamodb

The Lambda code, written in Node.js, contains five function handlers. Each handler corresponds to an HTTP method, including GET (DynamoDB get) POST (put), PUT (update), DELETE (delete), and SCAN (scan). Below is an example of the getMessage handler function. The function accepts two inputs. First, a path parameter, the date, which is the primary partition key for the DynamoDB table. Second, a query parameter, the time, which is the primary sort key for the DynamoDB table. Both the primary partition key and sort key must be passed to DynamoDB to retrieve the requested record (gist).


exports.getMessage = async (event, context) => {
if (tableName == null) {
tableName = process.env.TABLE_NAME;
}
params = {
TableName: tableName,
Key: {
"date": event.pathParameters.date,
"time": event.queryStringParameters.time
}
};
console.debug(params.Key);
return await new Promise((resolve, reject) => {
docClient.get(params, (error, data) => {
if (error) {
console.error(`getMessage ERROR=${error.stack}`);
resolve({
statusCode: 400,
error: `Could not get messages: ${error.stack}`
});
} else {
console.info(`getMessage data=${JSON.stringify(data)}`);
resolve({
statusCode: 200,
body: JSON.stringify(data)
});
}
});
});
};

view raw

app.js

hosted with ❤ by GitHub

Test the API

To test the Lambda functions, called by our API, we can use the sam local invoke command, part of the SAM CLI. Using this command, we can test the local Lambda functions, without them being deployed to AWS. The command allows us to trigger events, which the Lambda functions will handle. This is useful as we continue to develop, test, and re-deploy the Lambda functions to our Development, Staging, and Production environments.

The local Node.js-based, API-related Lambda functions, just like their deployed copies, will execute commands against the actual DynamoDB on AWS. The Github project contains a set of five sample events, corresponding to the five Lambda functions, which in turn are associated with five different HTTP methods and API resources. For example, the event_getMessage.json event is associated with the GET HTTP method and calls the /message/{date}?time={time} resource endpoint, to return a single item. This event, shown below, triggers the GetMessageFunction Lambda (gist).


{
"body": "",
"resource": "/",
"path": "/message",
"httpMethod": "GET",
"isBase64Encoded": false,
"queryStringParameters": {
"time": "06:45:43"
},
"pathParameters": {
"date": "2000-01-01"
},
"stageVariables": {}
}

We can trigger all the events from using the CLI. The local Lambda expects the DynamoDB table name to exist as an environment variable. Make sure you set it locally, first, before executing the sam local invoke commands (gist).


# variables (required by local lambda functions)
TABLE_NAME=your-dynamodb-table-name
# local testing (All CRUD functions)
sam local invoke PostMessageFunction \
–event lambda_apigtw_to_dynamodb/events/event_postMessage.json
sam local invoke GetMessageFunction \
–event lambda_apigtw_to_dynamodb/events/event_getMessage.json
sam local invoke GetMessagesFunction \
–event lambda_apigtw_to_dynamodb/events/event_getMessages.json
sam local invoke PutMessageFunction \
–event lambda_apigtw_to_dynamodb/events/event_putMessage.json
sam local invoke DeleteMessageFunction \
–event lambda_apigtw_to_dynamodb/events/event_deleteMessage.json

If the events were successfully handled by the local Lambda functions, in the terminal, you should see the same HTTP response status codes you would expect from calling the RESTful resources via the API Gateway. Below, for example, we see the POST event being handled by the PostMessageFunction Lambda, adding a record to the DynamoDB table, and returning a successful status of 201 Created.

screen_shot_2019-10-04_at_2_50_41_pm.png

Testing the Deployed API

To test the actual deployed API, we can call one of the API’s resources using an HTTP client, such as Postman. To locate the URL used to invoke the API resource, look at the ‘Prod’ Stage for the new API. This can be found in the Stages tab of the API Gateway console. For example, note the Invoke URL for the POST HTTP method of the /message resource, shown below.

screen_shot_2019-10-04_at_3_02_21_pm

Below, we see an example of using Postman to make an HTTP GET request the /message/{date}?time={time} resource. We pass the required query and path parameters for date and for time. The request should receive a single item in response from DynamoDB, via the API Gateway and the associated Lambda. Here, the request was successful, and the Lambda function returns a 200 OK status.

screen_shot_2019-09-30_at_9_20_45_pm

Similarly, below, we see an example of calling the same /message endpoint using the HTTP POST method. In the body of the POST request, we pass the DynamoDB table name and the Item object. Again, the POST is successful, and the Lambda function returns a 201 Created status.

screen_shot_2019-10-03_at_10_05_31_pm

Cleaning Up

To complete the demonstration and remove the AWS resources, run the following commands. It is necessary to delete all objects from the S3 data bucket, first, before deleting the CloudFormation stack. Else, the stack deletion will fail.

S3_DATA_BUCKET=your_data_bucket_name
STACK_NAME=your_stack_name

aws s3 rm s3://$S3_DATA_BUCKET/data.csv # and any other objects

aws cloudformation delete-stack \
  --stack-name $STACK_NAME

Conclusion

In this post, we explored a simple example of building a modern application using an event-driven serverless architecture on AWS. We used several services, all part of the AWS Serverless Computing platform, including Lambda, API Gateway, SQS, S3, and DynamoDB. In addition to these, AWS has additional serverless services, which could enhance this demonstration, in particular, Amazon Kinesis, AWS Step Functions, Amazon SNS, and AWS AppSync.

In a future post, we will look at how to further test the individual components within this demonstration’s application stack, and how to automate its deployment using DevOps and CI/CD principals on AWS.

All opinions expressed in this post are my own and not necessarily the views of my current or past employers or their clients.

, , , , , , , , ,

2 Comments

Architecting Cloud-Optimized Apps with AKS (Azure’s Managed Kubernetes), Azure Service Bus, and Cosmos DB

An earlier post, Eventual Consistency: Decoupling Microservices with Spring AMQP and RabbitMQ, demonstrated the use of a message-based, event-driven, decoupled architectural approach for communications between microservices, using Spring AMQP and RabbitMQ. This distributed computing model is known as eventual consistency. To paraphrase microservices.io, ‘using an event-driven, eventually consistent approach, each service publishes an event whenever it updates its data. Other services subscribe to events. When an event is received, a service (subscriber) updates its data.

That earlier post illustrated a fairly simple example application, the Voter API, consisting of a set of three Spring Boot microservices backed by MongoDB and RabbitMQ, and fronted by an API Gateway built with HAProxy. All API components were containerized using Docker and designed for use with Docker CE for AWS as the Container-as-a-Service (CaaS) platform.

Optimizing for Kubernetes on Azure

This post will demonstrate how a modern application, such as the Voter API, is optimized for Kubernetes in the Cloud (Kubernetes-as-a-Service), in this case, AKS, Azure’s new public preview of Managed Kubernetes for Azure Container Service. According to Microsoft, the goal of AKS is to simplify the deployment, management, and operations of Kubernetes. I wrote about AKS in detail, in my last post, First Impressions of AKS, Azure’s New Managed Kubernetes Container Service.

In addition to migrating to AKS, the Voter API will take advantage of additional enterprise-grade Azure’s resources, including Azure’s Service Bus and Cosmos DB, replacements for the Voter API’s RabbitMQ and MongoDB. There are several architectural options for the Voter API’s messaging and NoSQL data source requirements when moving to Azure.

  1. Keep Dockerized RabbitMQ and MongoDB – Easy to deploy to Kubernetes, but not easily scalable, highly-available, or manageable. Would require storage optimized Azure VMs for nodes, node affinity, and persistent storage for data.
  2. Replace with Cloud-based Non-Azure Equivalents – Use SaaS-based equivalents, such as CloudAMQP (RabbitMQ-as-a-Service) and MongoDB Atlas, which will provide scalability, high-availability, and manageability.
  3. Replace with Azure Service Bus and Cosmos DB – Provides all the advantages of  SaaS-based equivalents, and additionally as Azure resources, benefits from being in the Azure Cloud alongside AKS.

Source Code

The Kubernetes resource files and deployment scripts used in this post are all available on GitHub. This is the only project you need to clone to reproduce the AKS example in this post.

git clone \
  --branch master --single-branch --depth 1 --no-tags \
  https://github.com/garystafford/azure-aks-sb-cosmosdb-demo.git

The Docker images for the three Spring microservices deployed to AKS, Voter, Candidate, and Election, are available on Docker Hub. Optionally, the source code, including Dockerfiles, for the Voter, Candidate, and Election microservices, as well as the Voter Client are available on GitHub, in the kub-aks branch.

git clone \
  --branch kub-aks --single-branch --depth 1 --no-tags \
  https://github.com/garystafford/candidate-service.git

git clone \
  --branch kub-aks --single-branch --depth 1 --no-tags \
  https://github.com/garystafford/election-service.git

git clone \
  --branch kub-aks --single-branch --depth 1 --no-tags \
  https://github.com/garystafford/voter-service.git

git clone \
  --branch kub-aks --single-branch --depth 1 --no-tags \
  https://github.com/garystafford/voter-client.git

Azure Service Bus

To demonstrate the capabilities of Azure’s Service Bus, the Voter API’s Spring microservice’s source code has been re-written to work with Azure Service Bus instead of RabbitMQ. A future post will explore the microservice’s messaging code. It is more likely that a large application, written specifically for a technology that is easily portable such as RabbitMQ or MongoDB, would likely remain on that technology, even if the application was lifted and shifted to the Cloud or moved between Cloud Service Providers (CSPs). Something important to keep in mind when choosing modern technologies – portability.

Service Bus is Azure’s reliable cloud Messaging-as-a-Service (MaaS). Service Bus is an original Azure resource offering, available for several years. The core components of the Service Bus messaging infrastructure are queues, topics, and subscriptions. According to Microsoft, ‘the primary difference is that topics support publish/subscribe capabilities that can be used for sophisticated content-based routing and delivery logic, including sending to multiple recipients.

Since the three Voter API’s microservices are not required to produce messages for more than one other service consumer, Service Bus queues are sufficient, as opposed to a pub/sub model using Service Bus topics.

Cosmos DB

Cosmos DB, Microsoft’s globally distributed, multi-model database, offers throughput, latency, availability, and consistency guarantees with comprehensive service level agreements (SLAs). Ideal for the Voter API, Cosmos DB supports MongoDB’s data models through the MongoDB API, a MongoDB database service built on top of Cosmos DB. The MongoDB API is compatible with existing MongoDB libraries, drivers, tools, and applications. Therefore, there are no code changes required to convert the Voter API from MongoDB to Cosmos DB. I simply had to change the database connection string.

NGINX Ingress Controller

Although the Voter API’s HAProxy-based API Gateway could be deployed to AKS, it is not optimal for Kubernetes. Instead, the Voter API will use an NGINX-based Ingress Controller. NGINX will serve as an API Gateway, as HAProxy did, previously.

According to NGINX, ‘an Ingress is a Kubernetes resource that lets you configure an HTTP load balancer for your Kubernetes services. Such a load balancer usually exposes your services to clients outside of your Kubernetes cluster.

An Ingress resource requires an Ingress Controller to function. Continuing from NGINX, ‘an Ingress Controller is an application that monitors Ingress resources via the Kubernetes API and updates the configuration of a load balancer in case of any changes. Different load balancers require different Ingress controller implementations. In the case of software load balancers, such as NGINX, an Ingress controller is deployed in a pod along with a load balancer.

There are currently two NGINX-based Ingress Controllers available, one from Kubernetes and one directly from NGINX. Both being equal, for this post, I chose the Kubernetes version, without RBAC (Kubernetes offers a version with and without RBAC). RBAC should always be used for actual cluster security. There are several advantages of using either version of the NGINX Ingress Controller for Kubernetes, including Layer 4 TCP and UDP and Layer 7 HTTP load balancing, reverse proxying, ease of SSL termination, dynamically-configurable path-based rules, and support for multiple hostnames.

Azure Web App

Lastly, the Voter Client application, not really part of the Voter API, but useful for demonstration purposes, will be converted from a containerized application to an Azure Web App. Since it is not part of the Voter API, separating the Client application from AKS makes better architectural sense. Web Apps are a powerful, richly-featured, yet incredibly simple way to host applications and services on Azure. For more information on using Azure Web Apps, read my recent post, Developing Applications for the Cloud with Azure App Services and MongoDB Atlas.

Revised Component Architecture

Below is a simplified component diagram of the new architecture, including Azure Service Bus, Cosmos DB, and the NGINX Ingress Controller. The new architecture looks similar to the previous architecture, but as you will see, it is actually very different.

AKS-r8.png

Process Flow

To understand the role of each API component, let’s look at one of the event-driven, decoupled process flows, the creation of a new election candidate. In the simplified flow diagram below, an API consumer executes an HTTP POST request containing the new candidate object as JSON. The Candidate microservice receives the HTTP request and creates a new document in the Cosmos DB Voter database. A Spring RepositoryEventHandler within the Candidate microservice responds to the document creation and publishes a Create Candidate event message, containing the new candidate object as JSON, to the Azure Service Bus Candidate Queue.

Candidate_Producer

Independently, the Voter microservice is listening to the Candidate Queue. Whenever a new message is produced by the Candidate microservice, the Voter microservice retrieves the message off the queue. The Voter microservice then transforms the new candidate object contained in the incoming message to its own candidate data model and creates a new document in its own Voter database.

Voter_Consumer

The same process flows exist between the Election and the Candidate microservices. The Candidate microservice maintains current elections in its database, which are retrieved from the Election queue.

Data Models

It is useful to understand, the Candidate microservice’s candidate domain model is not necessarily identical to the Voter microservice’s candidate domain model. Each microservice may choose to maintain its own representation of a vote, a candidate, and an election. The Voter service transforms the new candidate object in the incoming message based on its own needs. In this case, the Voter microservice is only interested in a subset of the total fields in the Candidate microservice’s model. This is the beauty of decoupling microservices, their domain models, and their datastores.

Other Events

The versions of the Voter API microservices used for this post only support Election Created events and Candidate Created events. They do not handle Delete or Update events, which would be necessary to be fully functional. For example, if a candidate withdraws from an election, the Voter service would need to be notified so no one places votes for that candidate. This would normally happen through a Candidate Delete or Candidate Update event.

Provisioning Azure Service Bus

First, the Azure Service Bus is provisioned. Provisioning the Service Bus may be accomplished using several different methods, including manually using the Azure Portal or programmatically using Azure Resource Manager (ARM) with PowerShell or Terraform. I chose to provision the Azure Service Bus and the two queues using the Azure Portal for expediency. I chose the Basic Service Bus Tier of service, of which there are three tiers, Basic, Standard, and Premium.

Azure_006_Full_ServiceBus

The application requires two queues, the candidate.queue, and the election.queue.

Azure_008_ServiceBus

Provisioning Cosmos DB

Next, Cosmos DB is provisioned. Like Azure Service Bus, Cosmos DB may be provisioned using several methods, including manually using the Azure Portal, programmatically using Azure Resource Manager (ARM) with PowerShell or Terraform, or using the Azure CLI, which was my choice.

az cosmosdb create \
  --name cosmosdb_instance_name_goes_here \
  --resource-group resource_group_name_goes_here \
  --location "East US=0" \
  --kind MongoDB

The post’s Cosmos DB instance exists within the single East US Region, with no failover. In a real Production environment, you would configure Cosmos DB with multi-region failover. I chose MongoDB as the type of Cosmos DB database account to create. The allowed values are GlobalDocumentDB, MongoDB, Parse. All other settings were left to the default values.

The three Spring microservices each have their own database. You do not have to create the databases in advance of consuming the Voter API. The databases and the database collections will be automatically created when new documents are first inserted by the microservices. Below, the three databases and their collections have been created and populated with documents.

Azure_001_CosmosDB

The GitHub project repository also contains three shell scripts to generate sample vote, candidate, and election documents. The scripts will delete any previous documents from the database collections and generate new sets of sample documents. To use, you will have to update the scripts with your own Voter API URL.

AKS_012_AddVotes2

MongoDB Aggregation Pipeline

Each of the three Spring microservices uses Spring Data MongoDB, which takes advantage of MongoDB’s Aggregation Framework. According to MongoDB, ‘the aggregation framework is modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms the documents into an aggregated result.’ Below is an example of aggregation from the Candidate microservice’s VoterContoller class.

Aggregation aggregation = Aggregation.newAggregation(
    match(Criteria.where("election").is(election)),
    group("candidate").count().as("votes"),
    project("votes").and("candidate").previousOperation(),
    sort(Sort.Direction.DESC, "votes")
);

To use MongoDB’s aggregation framework with Cosmos DB, it is currently necessary to activate the MongoDB Aggregation Pipeline Preview Feature of Cosmos DB. The feature can be activated from the Azure Portal, as shown below.

Azure_002_CosmosDB

Cosmos DB Emulator

Be warned, Cosmos DB can be very expensive, even without database traffic or any Production-grade bells and whistles. Be careful when spinning up instances on Azure for learning purposes, the cost adds up quickly! In less than ten days, while writing this post, my cost was almost US$100 for the Voter API’s Cosmos DB instance.

I strongly recommend downloading the free Azure Cosmos DB Emulator to develop and test applications from your local machine. Although certainly not as convenient, it will save you the cost of developing for Cosmos DB directly on Azure.

With Cosmos DB, you pay for reserved throughput provisioned and data stored in containers (a collection of documents or a table or a graph). Yes, that’s right, Azure charges you per MongoDB collection, not even per database. Azure Cosmos DB’s current pricing model seems less than ideal for microservice architectures, each with their own database instance.

By default the reserved throughput, billed as Request Units (RU) per second or RU/s, is set to 1,000 RU/s per collection. For development and testing, you can reduce each collection to a minimum of 400 RU/s. The Voter API creates five collections at 1,000 RU/s or 5,000 RU/s total. Reducing this to a total of 2,000 RU/s makes Cosmos DB marginally more affordable to explore.

Azure_003_CosmosDB.PNG

Building the AKS Cluster

An existing Azure Resource Group is required for AKS. I chose to use the latest available version of Kubernetes, 1.8.2.

# login to azure
az login \
  --username your_username  \
  --password your_password

# create resource group
az group create \
  --resource-group resource_group_name_goes_here \
  --location eastus

# create aks cluster
az aks create \
  --name cluser_name_goes_here \
  --resource-group resource_group_name_goes_here \
  --ssh-key-value path_to_your_public_key \
  --kubernetes-version 1.8.2

# get credentials to access aks cluster
az aks get-credentials \
  --name cluser_name_goes_here \
  --resource-group resource_group_name_goes_here

# display cluster's worker nodes
kubectl get nodes --output=wide

By default, AKS will provision a three-node Kubernetes cluster using Azure’s Standard D1 v2 Virtual Machines. According to Microsoft, ‘D series VMs are general purpose VM sizes provide balanced CPU-to-memory ratio. They are ideal for testing and development, small to medium databases, and low to medium traffic web servers.’ Azure D1 v2 VM’s are based on Linux OS images, currently Debian 8 (Jessie), with 1 vCPU and 3.5 GB of memory. By default with AKS, each VM receives 30 GiB of Standard HDD attached storage.

AKS_002_CreateCluster
You should always select the type and quantity of the cluster’s VMs and their attached storage, optimized for estimated traffic volumes and the specific workloads you are running. This can be done using the --node-count--node-vm-size, and --node-osdisk-size arguments with the az aks create command.

Deployment

The Voter API resources are deployed to its own Kubernetes Namespace, voter-api. The NGINX Ingress Controller resources are deployed to a different namespace, ingress-nginx. Separate namespaces help organize individual Kubernetes resources and separate different concerns.

Voter API

First, the voter-api namespace is created. Then, five required Kubernetes Secrets are created within the namespace. These secrets all contain sensitive information, such as passwords, that should not be shared. There is one secret for each of the three Cosmos DB database connection strings, one secret for the Azure Service Bus connection string, and one secret for the Let’s Encrypt SSL/TLS certificate and private key, used for secure HTTPS access to the Voter API.

AKS_004_CreateVoterAPI

Secrets

The Voter API’s secrets are used to populate environment variables within the pod’s containers. The environment variables are then available for use within the containers. Below is a snippet of the Voter pods resource file showing how the Cosmos DB and Service Bus connection strings secrets are used to populate environment variables.

env:
  - name: AZURE_SERVICE_BUS_CONNECTION_STRING
    valueFrom:
      secretKeyRef:
        name: azure-service-bus
        key: connection-string
  - name: SPRING_DATA_MONGODB_URI
    valueFrom:
      secretKeyRef:
        name: azure-cosmosdb-voter
        key: connection-string

Shown below, the Cosmos DB and Service Bus connection strings secrets have been injected into the Voter container and made available as environment variables to the microservice’s executable JAR file on start-up. As environment variables, the secrets are visible in plain text. Access to containers should be tightly controlled through Kubernetes RBAC and Azure AD, to ensure sensitive information, such as secrets, remain confidential.

AKS_014_EnvVars

Next, the three Kubernetes ReplicaSet resources, corresponding to the three Spring microservices, are created using Deployment controllers. According to Kubernetes, a Deployment that configures a ReplicaSet is now the recommended way to set up replication. The Deployments specify three replicas of each of the three Spring Services, resulting in a total of nine Kubernetes Pods.

Each pod, by default, will be scheduled on a different node if possible. According to Kubernetes, ‘the scheduler will automatically do a reasonable placement (e.g. spread your pods across nodes, not place the pod on a node with insufficient free resources, etc.).’ Note below how each of the three microservice’s three replicas has been scheduled on a different node in the three-node AKS cluster.

AKS_005B_CreateVoterPods

Next, the three corresponding Kubernetes ClusterIP-type Services are created. And lastly, the Kubernetes Ingress is created. According to Kubernetes, the Ingress resource is an API object that manages external access to the services in a cluster, typically HTTP. Ingress provides load balancing, SSL termination, and name-based virtual hosting.

The Ingress configuration contains the routing rules used with the NGINX Ingress Controller. Shown below are the routing rules for each of the three microservices within the Voter API. Incoming API requests are routed to the appropriate pod and service port by NGINX.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: voter-ingress
  namespace: voter-api
  annotations:
    ingress.kubernetes.io/ssl-redirect: "true"
spec:
  tls:
  - hosts:
    - api.voter-demo.com
    secretName: api-voter-demo-secret
  rules:
  - http:
      paths:
      - path: /candidate
        backend:
          serviceName: candidate
          servicePort: 8080
      - path: /election
        backend:
          serviceName: election
          servicePort: 8080
      - path: /voter
        backend:
          serviceName: voter
          servicePort: 8080

The screengrab below shows all of the Voter API resources created on AKS.

AKS_005B_CreateVoterAPI

NGINX Ingress Controller

After completing the deployment of the Voter API, the NGINX Ingress Controller is created. It starts with creating the ingress-nginx namespace. Next, the NGINX Ingress Controller is created, consisting of the NGINX Ingress Controller, three Kubernetes ConfigMap resources, and a default back-end application. The Controller and backend each have their own Service resources. Like the Voter API, each has three replicas, for a total of six pods. Together, the Ingress resource and NGINX Ingress Controller manage traffic to the Spring microservices.

The screengrab below shows all of the NGINX Ingress Controller resources created on AKS.

AKS_015_NGINX_Resources.PNG

The  NGINX Ingress Controller Service, shown above, has an external public IP address associated with itself.  This is because that Service is of the type, Load Balancer. External requests to the Voter API will be routed through the NGINX Ingress Controller, on this IP address.

kind: Service
apiVersion: v1
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
  labels:
    app: ingress-nginx
spec:
  externalTrafficPolicy: Local
  type: LoadBalancer
  selector:
    app: ingress-nginx
  ports:
  - name: http
    port: 80
    targetPort: http
  - name: https
    port: 443
    targetPort: https

If you are only using HTTPS, not HTTP, then the references to HTTP and port 80 in the Ingress configuration are unnecessary. The NGINX Ingress Controller’s resources are explained in detail in the GitHub documentation, along with further configuration instructions.

DNS

To provide convenient access to the Voter API and the Voter Client, my domain, voter-demo.com, is associated with the public IP address associated with the Voter API Ingress Controller and with the public IP address associated with the Voter Client Azure Web App. DNS configuration is done through Azure’s DNS Zone resource.

AKS_008_FinalDNS

The two TXT type records might not look as familiar as the SOA, NS, and A type records. The TXT records are required to associate the domain entries with the Voter Client Azure Web App. Browsing to http://www.voter-demo.com or simply http://voter-demo.com brings up the Voter Client.

VoterClientAngular5_small.png

The Client sends and receives data via the Voter API, available securely at https://api.voter-demo.com.

Routing API Requests

With the Pods, Services, Ingress, and NGINX Ingress Controller created and configured, as well as the Azure Layer 4 Load Balancer and DNS Zone, HTTP requests from API consumers are properly and securely routed to the appropriate microservices. In the example below, three back-to-back requests are made to the voter/info API endpoint. HTTP requests are properly routed to one of the three Voter pod replicas using the default round-robin algorithm, as proven by the observing the different hostnames (pod names) and the IP addresses (private pod IPs) in each HTTP response.

AKS_013B_LoadBalancingReplicaSets.PNG

Final Architecture

Shown below is the final Voter API Azure architecture. To simplify the diagram, I have deliberately left out the three microservice’s ClusterIP-type Services, the three default back-end application pods, and the default back-end application’s ClusterIP-type Service. All resources shown below are within the single East US Azure region, except DNS, which is a global resource.

kub-aks-v10-small.png

Shown below is the new Azure Resource Group created by Azure during the AKS provisioning process. The Resource Group contains the various resources required to support the AKS cluster, NGINX Ingress Controller, and the Voter API. Necessary Azure resources were automatically provisioned when I provisioned AKS and when I created the new Voter API and NGINX resources.

AKS_006_FinalRG

In addition to the Resource Group above, the original Resource Group contains the AKS Container Service resource itself, Service Bus, Cosmos DB, and the DNS Zone resource.

AKS_007_FinalRG

The Voter Client Web App, consisting of the Azure App Service and App Service plan resource, is located in a third, separate Resource Group, not shown here.

Cleaning Up AKS

A nice feature of AKS, running a az aks delete command will delete all the Azure resources created as part of provisioning AKS, the API, and the Ingress Controller. You will have to delete the Cosmos DB, Service Bus, and DNS Zone resources, separately.

az aks delete \
  --name cluser_name_goes_here \
  --resource-group resource_group_name_goes_here

Conclusion

Taking advantage of Kubernetes with AKS, and the array of Azure’s enterprise-grade resources, the Voter API was shifted from a simple Docker architecture to a production-ready solution. The Voter API is now easier to manage, horizontally scalable, fault-tolerant, and marginally more secure. It is capable of reliably supporting dozens more microservices, with multiple replicas. The Voter API will handle a high volume of data transactions and event messages.

There is much more that needs to be done to productionalize the Voter API on AKS, including:

  • Add multi-region failover of Cosmos DB
  • Upgrade to Service Bus Standard or Premium Tier
  • Optimized Azure VMs and storage for anticipated traffic volumes and application-specific workloads
  • Implement Kubernetes RBAC
  • Add Monitoring, logging, and alerting with Envoy or similar
  • Secure end-to-end TLS communications with Itsio or similar
  • Secure the API with OAuth and Azure AD
  • Automate everything with DevOps – AKS provisioning, testing code, creating resources, updating microservices, and managing data

All opinions in this post are my own, and not necessarily the views of my current or past employers or their clients.

, , , , , , , , , , , , , ,

4 Comments