Posts Tagged Serverless
Event-driven, Serverless Architectures with AWS Lambda, SQS, DynamoDB, and API Gateway
Posted by Gary A. Stafford in AWS, Bash Scripting, Cloud, DevOps, JavaScript, Python, Software Development on October 4, 2019
Introduction
In this post, we will explore modern application development using an event-driven, serverless architecture on AWS. To demonstrate this architecture, we will integrate several fully-managed services, all part of the AWS Serverless Computing platform, including Lambda, API Gateway, SQS, S3, and DynamoDB. The result will be an application composed of small, easily deployable, loosely coupled, independently scalable, serverless components.
What is ‘Event-Driven’?
According to Otavio Ferreira, Manager, Amazon SNS, and James Hood, Senior Software Development Engineer, in their AWS Compute Blog, Enriching Event-Driven Architectures with AWS Event Fork Pipelines, “Many customers are choosing to build event-driven applications in which subscriber services automatically perform work in response to events triggered by publisher services. This architectural pattern can make services more reusable, interoperable, and scalable.” This description of an event-driven architecture perfectly captures the essence of the following post. All interactions between application components in this post will be as a direct result of triggering an event.
What is ‘Serverless’?
Mistakenly, many of us think of serverless as just functions (aka Function-as-a-Service, or FaaS). On AWS, Lambda is just one of many fully-managed services that make up the AWS Serverless Computing platform. So, what is ‘serverless’? According to AWS, “Serverless applications don’t require provisioning, maintaining, and administering servers for backend components such as compute, databases, storage, stream processing, message queueing, and more.”
As a Developer, one of my favorite features of serverless is the cost, or lack thereof. With serverless on AWS, you pay for consistent throughput or execution duration rather than by server unit, and, at least on AWS, you don’t pay for idle resources. This is not always true of ‘serverless’ offerings on other leading Cloud platforms. Remember, if you’re paying for it but not using it, it’s not serverless.
If you’re paying for it but not using it, it’s not serverless.
Demonstration
To demonstrate an event-driven, serverless architecture, we will build, package, and deploy an application capable of extracting messages from CSV files placed in S3, transforming those messages, queueing them to SQS, and finally, writing the messages to DynamoDB, using Lambda functions throughout. We will also expose a RESTful API, via API Gateway, to perform CRUD-like operations on those messages in DynamoDB.
AWS Technologies
In this demonstration, we will use several AWS serverless services, including the following.
- AWS Lambda
- Amazon Simple Storage Service (S3)
- Amazon DynamoDB
- Amazon API Gateway
- Amazon Simple Queue Service (SQS)
Each Lambda will use function-specific execution roles, part of AWS Identity and Access Management (IAM). We will log the event details and monitor services using Amazon CloudWatch.
To codify, build, package, deploy, and manage the Lambda functions and other AWS resources in a fully automated fashion, we will also use the following AWS services:
- AWS Serverless Application Model (SAM)
- AWS CloudFormation
- AWS Command Line Interface (CLI)
- AWS SDK for JavaScript in Node.js
- AWS SDK for Python (boto3)
Architecture
The high-level architecture for the platform provisioned and deployed in this post is illustrated in the diagram below. There are two separate workflows. In the first workflow (top), data is extracted from CSV files placed in S3, transformed, queued to SQS, and written to DynamoDB, using Python-based Lambda functions throughout. In the second workflow (bottom), data is manipulated in DynamoDB through interactions with a RESTful API, exposed via an API Gateway, and backed by Node.js-based Lambda functions.
Using the vast array of current AWS services, there are several ways we could extract, transform, and load data from static files into DynamoDB. The demonstration’s event-driven, serverless architecture represents just one possible approach.
Source Code
All source code for this post is available on GitHub in a single public repository, serverless-sqs-dynamo-demo. To clone the GitHub repository, execute the following command.
git clone --branch master --single-branch --depth 1 --no-tags \
  https://github.com/garystafford/serverless-sqs-dynamo-demo.git
The project files relevant to this demonstration are organized as follows.
.
├── README.md
├── lambda_apigtw_to_dynamodb
│   ├── app.js
│   ├── events
│   ├── node_modules
│   ├── package.json
│   └── tests
├── lambda_s3_to_sqs
│   ├── __init__.py
│   ├── app.py
│   ├── requirements.txt
│   └── tests
├── lambda_sqs_to_dynamodb
│   ├── __init__.py
│   ├── app.py
│   ├── requirements.txt
│   └── tests
├── requirements.txt
├── template.yaml
└── sample_data
    ├── data.csv
    ├── data_bad_msg.csv
    └── data_good_msg.csv
Some source code samples in this post are GitHub Gists, which may not display correctly on all social media browsers, such as LinkedIn.
Prerequisites
The demonstration assumes you already have an AWS account. You will need the latest copy of the AWS CLI, SAM CLI, and Python 3 installed on your development machine.
Additionally, you will need two existing S3 buckets. One bucket will be used to store the packaged project files for deployment. The second bucket is where we will place CSV data files, which, in turn, will trigger events that invoke multiple Lambda functions.
Deploying the Project
Before diving into the code, we will deploy the project to AWS. Conveniently, the entire project’s resources are codified in an AWS Serverless Application Model (SAM) template. AWS SAM is a model used to define serverless applications on AWS. According to the official SAM GitHub project documentation, AWS SAM is based on AWS CloudFormation: a serverless application is defined in a CloudFormation template and deployed as a CloudFormation stack.
Template Parameter
CloudFormation will create and uniquely name the SQS queues and the DynamoDB table. However, to avoid circular references, a common issue when creating resources associated with S3 event notifications, it is easier to use a pre-existing bucket. To start, you will need to change the SAM template’s DataBucketName parameter’s default value to your own S3 bucket name. Again, this bucket is where we will eventually push the CSV data files. Alternately, override the default value using the sam build command, shown next.
Parameters:
  DataBucketName:
    Type: String
    Description: S3 bucket where CSV files are processed
    Default: your-data-bucket-name
SAM CLI Commands
With the DataBucketName parameter set, proceed to validate, build, package, and deploy the project using the SAM CLI and the commands below. In addition to the sam validate command, I also like to use the aws cloudformation validate-template command to validate templates and catch any potential, additional errors.
Note the S3_BUILD_BUCKET variable, below, refers to the name of the S3 bucket SAM will use to package and deploy the project from, as opposed to the S3 bucket into which the CSV data files will be placed (gist).
# variables
S3_BUILD_BUCKET=your_build_bucket_name
STACK_NAME=your_cloudformation_stack_name

# validate
sam validate --template template.yaml

aws cloudformation validate-template \
  --template-body file://template.yaml

# build
sam build --template template.yaml

# package
sam package \
  --output-template-file packaged.yaml \
  --s3-bucket $S3_BUILD_BUCKET

# deploy
sam deploy --template-file packaged.yaml \
  --stack-name $STACK_NAME \
  --capabilities CAPABILITY_IAM \
  --debug
After validating the template, SAM will build and package each individual Lambda function and its associated dependencies. Below, we see each individual Lambda function being packaged with a copy of its dependencies.
Once packaged, SAM will deploy the project and create the AWS resources as a CloudFormation stack.
Once the stack creation is complete, use the CloudFormation management console to review the AWS resources created by SAM. There are approximately 14 resources defined in the SAM template, which result in 33 individual resources deployed as part of the CloudFormation stack.
Note the stack’s output values. You will need these values to interact with the deployed platform, later in the demonstration.
Test the Deployed Application
Once the CloudFormation stack has deployed without error, copying a CSV file to the S3 bucket is the quickest way to confirm everything is working. The project includes test data files with 20 rows of test message data. Below is a sample of the CSV file, which is included in the project. The data was collected from IoT devices that measured response time from wired versus wireless devices on a LAN; the message details are immaterial to this demonstration (gist).
timestamp,location,source,local_dest,local_avg,remote_dest,remote_avg
1559040909.3853335,location-03,wireless,router-1,4.39,device-1,9.09
1559040919.5273902,location-03,wireless,router-1,0.49,device-1,16.75
1559040929.6446512,location-03,wireless,router-1,0.56,device-1,8.31
1559040939.7712135,location-03,wireless,router-1,1.64,device-1,9.4
1559040949.891723,location-03,wireless,router-1,1.18,device-1,9.07
1559040960.011338,location-03,wireless,router-1,0.42,device-1,8.4
1559040970.1319716,location-03,wireless,router-1,1.73,device-1,8.66
1559040980.2533505,location-03,wireless,router-1,0.67,device-1,8.61
1559040990.3816211,location-03,wireless,router-1,1.27,device-1,10.87
1559041000.5105414,location-03,wireless,router-1,1.63,device-1,10.08
Run the following commands to copy the test data file to your S3 bucket.
S3_DATA_BUCKET=your_data_bucket_name

aws s3 cp sample_data/data.csv s3://$S3_DATA_BUCKET
Visit the DynamoDB management console. You should observe a new DynamoDB table.
Within the new DynamoDB table, you should observe twenty items, corresponding to each of the twenty rows in the CSV file, uploaded to S3.
Drill into an individual item within the table and review its attributes. They should match the rows in the CSV file.
Both the Python- and Node.js-based Lambda functions have their default logging levels set to debug. The debug-level output from each Lambda function is streamed to individual Amazon CloudWatch Log Groups. We can use the CloudWatch logs to troubleshoot any issues with the deployed application. Below we see an example of CloudWatch log entries for the request and response payloads generated from the GetMessageFunction Lambda function, which is querying DynamoDB for a single Item.
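For reference, a minimal sketch of one way a Python-based handler might enable that debug-level output; the project's actual logging configuration may differ.

import logging

# Lambda attaches a default handler; setting the root logger to DEBUG
# sends debug-level entries to the function's CloudWatch Log Group
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

def lambda_handler(event, context):
    logger.debug('Received event: %s', event)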
Event-Driven Patterns
There are three distinct and discrete event-driven dataflows within the demonstration’s architecture:
- S3 Event Source for Lambda (S3 to SQS)
- SQS Event Source for Lambda (SQS to DynamoDB)
- API Gateway Event Source for Lambda (API Gateway to DynamoDB)
Let’s examine each event-driven dataflow and the Lambda code associated with that part of the architecture.
S3 Event Source for Lambda
Whenever a file is copied into the target S3 bucket, an S3 Event Notification triggers an asynchronous invocation of a Lambda. According to AWS, when you invoke a function asynchronously, Lambda sends the event to an internal queue. A separate process reads events from the queue and executes your Lambda function.
The Lambda’s function handler, written in Python, reads the CSV file, whose filename is contained in the event. The Lambda extracts the rows in the CSV file, transforms the data, and pushes each message to the SQS queue (gist).
def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(
        event['Records'][0]['s3']['object']['key'],
        encoding='utf-8'
    )
    messages = read_csv_file(bucket, key)
    process_messages(messages)
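The read_csv_file and process_messages helpers are not shown above; the following is a hypothetical sketch of what they might look like using boto3, assuming a QUEUE_URL environment variable. The project's actual implementation may differ, but the Method message attribute mirrors the one consumed by the SQS-triggered Lambda later in this post.

import csv
import os

import boto3

s3_client = boto3.client('s3')
sqs_client = boto3.client('sqs')
QUEUE_URL = os.environ.get('QUEUE_URL')  # hypothetical environment variable

def read_csv_file(bucket, key):
    # download the CSV object and parse each row into a dictionary
    response = s3_client.get_object(Bucket=bucket, Key=key)
    lines = response['Body'].read().decode('utf-8').splitlines()
    return list(csv.DictReader(lines))

def process_messages(messages):
    for message in messages:
        # each row becomes one SQS message; the 'Method' attribute tells
        # the downstream consumer which DynamoDB operation to perform
        sqs_client.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=str(message),
            MessageAttributes={
                'Method': {'DataType': 'String', 'StringValue': 'POST'}
            }
        )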
Below is an example of a message body, part of an SQS message, extracted from a single row of the CSV file and sent by the Lambda to the SQS queue. The timestamp has been converted to separate date and time fields by the Lambda. The DynamoDB table name is part of the SQS message body. The key/value pairs in the Item JSON object reflect the schema of the DynamoDB table (gist).
{
  "TableName": "your-dynamodb-table-name",
  "Item": {
    "date": {
      "S": "2001-01-01"
    },
    "time": {
      "S": "09:01:05"
    },
    "location": {
      "S": "location-03"
    },
    "source": {
      "S": "wireless"
    },
    "local_dest": {
      "S": "router-1"
    },
    "local_avg": {
      "N": "5.55"
    },
    "remote_dest": {
      "S": "device-1"
    },
    "remote_avg": {
      "N": "10.10"
    }
  }
}
SQS Event Source for Lambda
According to AWS, SQS offers two types of message queues, Standard and FIFO (First-In-First-Out). An SQS FIFO queue is designed to guarantee that messages are processed exactly once, in the exact order that they are sent. A Standard SQS queue offers maximum throughput, best-effort ordering, and at-least-once delivery.
Examining the SQS management console, you should observe that the CloudFormation stack creates two SQS Standard queues—a primary queue and a Dead Letter Queue (DLQ). According to AWS, Amazon SQS supports dead-letter queues, which other queues (source queues) can target for messages that cannot be processed (consumed) successfully.
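For context, a hedged sketch of how a source queue and its dead-letter queue might be related in a SAM/CloudFormation template; the resource names and maxReceiveCount below are illustrative, not the project's exact definitions.

MessageQueue:
  Type: AWS::SQS::Queue
  Properties:
    RedrivePolicy:
      deadLetterTargetArn: !GetAtt MessageDeadLetterQueue.Arn
      maxReceiveCount: 3   # after repeated failed receives, move the message to the DLQ

MessageDeadLetterQueue:
  Type: AWS::SQS::Queue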
Examining the SQS Lambda Triggers tab, you should observe the Lambda, which will be triggered by the SQS events.
When a message is pushed into the SQS queue by the previous process, an SQS event is fired, which synchronously triggers an invocation of the Lambda using the SQS Event Source for Lambda functionality. When a function is invoked synchronously, Lambda runs the function and waits for a response.
In the demonstration, the Lambda’s function handler, also written in Python, pulls the message off of the SQS queue and writes the message (DynamoDB put) to the DynamoDB table. Although writing is the primary use case in this demonstration, an event could also trigger a get, scan, update, or delete command to be executed on the DynamoDB table (gist).
def lambda_handler(event, context):
    operations = {
        'DELETE': lambda dynamo, x: dynamo.delete_item(**x),
        'POST': lambda dynamo, x: dynamo.put_item(**x),
        'PUT': lambda dynamo, x: dynamo.update_item(**x),
        'GET': lambda dynamo, x: dynamo.get_item(**x),
        'GET_ALL': lambda dynamo, x: dynamo.scan(**x),
    }

    for record in event['Records']:
        payload = loads(record['body'], parse_float=str)
        operation = record['messageAttributes']['Method']['stringValue']
        if operation in operations:
            try:
                operations[operation](dynamo_client, payload)
            except Exception as e:
                logger.error(e)
        else:
            logger.error('Unsupported method \'{}\''.format(operation))
API Gateway Event Source for Lambda
Examining the API Gateway management console, you should observe that CloudFormation created a new Edge-optimized API. The API contains several resources and their associated HTTP methods.
Each API resource is associated with a deployed Lambda function. Switching to the Lambda console, you should observe a total of seven new Lambda functions. There are five Lambda functions related to the API, in addition to the Lambda called by the S3 event notifications and the Lambda called by the SQS event notifications.
Examining one of the Lambda functions associated with the API Gateway, we should observe the API Gateway trigger for the Lambda (lower left and bottom).
When an end-user makes an HTTP(S) request via the RESTful API exposed by the API Gateway, an event is fired, which synchronously invokes a Lambda using the API Gateway Event Source for Lambda functionality. The event contains details about the HTTP request that is received. The event triggers any one of five different Lambda functions, depending on the HTTP request method.
The Lambda code, written in Node.js, contains five function handlers. Each handler corresponds to an HTTP method, including GET (DynamoDB get), POST (put), PUT (update), DELETE (delete), and SCAN (scan). Below is an example of the getMessage handler function. The function accepts two inputs: first, a path parameter, the date, which is the primary partition key for the DynamoDB table; second, a query parameter, the time, which is the primary sort key for the DynamoDB table. Both the primary partition key and sort key must be passed to DynamoDB to retrieve the requested record (gist).
exports.getMessage = async (event, context) => {
    if (tableName == null) {
        tableName = process.env.TABLE_NAME;
    }
    params = {
        TableName: tableName,
        Key: {
            "date": event.pathParameters.date,
            "time": event.queryStringParameters.time
        }
    };
    console.debug(params.Key);

    return await new Promise((resolve, reject) => {
        docClient.get(params, (error, data) => {
            if (error) {
                console.error(`getMessage ERROR=${error.stack}`);
                resolve({
                    statusCode: 400,
                    error: `Could not get messages: ${error.stack}`
                });
            } else {
                console.info(`getMessage data=${JSON.stringify(data)}`);
                resolve({
                    statusCode: 200,
                    body: JSON.stringify(data)
                });
            }
        });
    });
};
Test the API
To test the Lambda functions called by our API, we can use the sam local invoke command, part of the SAM CLI. Using this command, we can test the local Lambda functions without them being deployed to AWS. The command allows us to trigger events, which the Lambda functions will handle. This is useful as we continue to develop, test, and re-deploy the Lambda functions to our Development, Staging, and Production environments.
The local Node.js-based, API-related Lambda functions, just like their deployed copies, will execute commands against the actual DynamoDB on AWS. The GitHub project contains a set of five sample events, corresponding to the five Lambda functions, which in turn are associated with five different HTTP methods and API resources. For example, the event_getMessage.json event is associated with the GET HTTP method and calls the /message/{date}?time={time} resource endpoint to return a single item. This event, shown below, triggers the GetMessageFunction Lambda (gist).
{
  "body": "",
  "resource": "/",
  "path": "/message",
  "httpMethod": "GET",
  "isBase64Encoded": false,
  "queryStringParameters": {
    "time": "06:45:43"
  },
  "pathParameters": {
    "date": "2000-01-01"
  },
  "stageVariables": {}
}
We can trigger all the events using the CLI. The local Lambda functions expect the DynamoDB table name to exist as an environment variable. Make sure you set it locally, first, before executing the sam local invoke commands (gist).
# variables (required by local lambda functions)
TABLE_NAME=your-dynamodb-table-name

# local testing (All CRUD functions)
sam local invoke PostMessageFunction \
  --event lambda_apigtw_to_dynamodb/events/event_postMessage.json

sam local invoke GetMessageFunction \
  --event lambda_apigtw_to_dynamodb/events/event_getMessage.json

sam local invoke GetMessagesFunction \
  --event lambda_apigtw_to_dynamodb/events/event_getMessages.json

sam local invoke PutMessageFunction \
  --event lambda_apigtw_to_dynamodb/events/event_putMessage.json

sam local invoke DeleteMessageFunction \
  --event lambda_apigtw_to_dynamodb/events/event_deleteMessage.json
If the events were successfully handled by the local Lambda functions, in the terminal, you should see the same HTTP response status codes you would expect from calling the RESTful resources via the API Gateway. Below, for example, we see the POST event being handled by the PostMessageFunction Lambda, adding a record to the DynamoDB table, and returning a successful status of 201 Created.
Testing the Deployed API
To test the actual deployed API, we can call one of the API’s resources using an HTTP client, such as Postman. To locate the URL used to invoke the API resource, look at the ‘Prod’ Stage for the new API. This can be found in the Stages tab of the API Gateway console. For example, note the Invoke URL for the POST HTTP method of the /message resource, shown below.
Below, we see an example of using Postman to make an HTTP GET request to the /message/{date}?time={time} resource. We pass the required query and path parameters for date and for time. The request should receive a single item in response from DynamoDB, via the API Gateway and the associated Lambda. Here, the request was successful, and the Lambda function returns a 200 OK status.
Similarly, below, we see an example of calling the same /message endpoint using the HTTP POST method. In the body of the POST request, we pass the DynamoDB table name and the Item object. Again, the POST is successful, and the Lambda function returns a 201 Created status.
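For readers without Postman, the same requests can be made from the command line. Below is a sketch using curl; the Invoke URL is a placeholder for the value found in the API Gateway console or the stack’s outputs, and the sample date and time come from the project's test data.

# placeholder Invoke URL from the 'Prod' Stage of the deployed API
API_URL=https://your-api-id.execute-api.your-region.amazonaws.com/Prod

# GET a single item by partition key (date) and sort key (time)
curl "$API_URL/message/2000-01-01?time=06:45:43"

# POST a new item, passing the table name and the Item object in the body
curl -X POST "$API_URL/message" \
  -H "Content-Type: application/json" \
  -d '{"TableName": "your-dynamodb-table-name",
       "Item": {"date": {"S": "2000-01-01"},
                "time": {"S": "06:45:43"},
                "location": {"S": "location-03"}}}'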
Cleaning Up
To complete the demonstration and remove the AWS resources, run the following commands. It is necessary to delete all objects from the S3 data bucket before deleting the CloudFormation stack; otherwise, the stack deletion will fail.
S3_DATA_BUCKET=your_data_bucket_name
STACK_NAME=your_stack_name

aws s3 rm s3://$S3_DATA_BUCKET/data.csv # and any other objects

aws cloudformation delete-stack \
  --stack-name $STACK_NAME
Conclusion
In this post, we explored a simple example of building a modern application using an event-driven, serverless architecture on AWS. We used several services, all part of the AWS Serverless Computing platform, including Lambda, API Gateway, SQS, S3, and DynamoDB. AWS offers other serverless services that could enhance this demonstration, in particular Amazon Kinesis, AWS Step Functions, Amazon SNS, and AWS AppSync.
In a future post, we will look at how to further test the individual components within this demonstration’s application stack, and how to automate its deployment using DevOps and CI/CD principles on AWS.
All opinions expressed in this post are my own and not necessarily the views of my current or past employers or their clients.
IoT Telemetry Collection using Google Protocol Buffers, Google Cloud Functions, Cloud Pub/Sub, and MongoDB Atlas
Posted by Gary A. Stafford in Big Data, Cloud, GCP, Python, Serverless, Software Development on May 21, 2019
Collect IoT sensor telemetry using Google Protocol Buffers’ serialized binary format over HTTPS, serverless Google Cloud Functions, Google Cloud Pub/Sub, and MongoDB Atlas on GCP, as an alternative to integrated Cloud IoT platforms and standard IoT protocols. Aggregate, analyze, and build machine learning models with the data using tools such as MongoDB Compass, Jupyter Notebooks, and Google’s AI Platform Notebooks.
Introduction
Most of the dominant Cloud providers offer integrated IoT (Internet of Things) and IIoT (Industrial IoT) services. Amazon has AWS IoT, Microsoft Azure has multiple offerings, including IoT Central, IBM’s offerings include the IBM Watson IoT Platform, Alibaba Cloud has multiple IoT/IIoT solutions for different vertical markets, and Google offers the Google Cloud IoT platform. All of these solutions are marketed as industrial-grade, highly-performant, scalable technology stacks. They are capable of scaling to tens of thousands of IoT devices or more, and of handling massive amounts of streaming telemetry.
In reality, not everyone needs a fully integrated IoT solution. Academic institutions, research labs, tech start-ups, and many commercial enterprises want to leverage the Cloud for IoT applications, but may not be ready for a fully-integrated IoT platform or are resistant to Cloud vendor platform lock-in.
Similarly, depending on the performance requirements and the type of application, organizations may not need or want to start out using IoT/IIoT industry-standard data and transport protocols, such as MQTT (Message Queue Telemetry Transport) or CoAP (Constrained Application Protocol), over UDP (User Datagram Protocol). They may prefer to transmit telemetry over HTTP using TCP, or securely, using HTTPS (HTTP over TLS).
Demonstration
In this demonstration, we will collect environmental sensor data from a number of IoT device sensors and stream that telemetry over the Internet to Google Cloud. Each IoT device is installed in a different physical location. The devices contain a variety of common sensors, including humidity and temperature, motion, and light intensity.

Prototype IoT Devices used in this Demonstration
We will transmit the sensor telemetry data as JSON over HTTP to serverless Google Cloud Function HTTPS endpoints. We will then switch to using Google’s Protocol Buffers to transmit binary data over HTTP. We should observe a reduction in the message size contained in the request payload as we move from JSON to Protobuf, which should reduce system latency and cost.
Data received by Cloud Functions over HTTP will be published asynchronously to Google Cloud Pub/Sub. A second Cloud Function will respond to all published events and push the messages to MongoDB Atlas on GCP. Once in Atlas, we will aggregate, transform, analyze, and build machine learning models with the data, using tools such as MongoDB Compass, Jupyter Notebooks, and Google’s AI Platform Notebooks.
For this demonstration, the architecture for JSON over HTTP will look as follows. All sensors will transmit data to a single Cloud Function HTTPS endpoint.
For Protobuf over HTTP, the architecture will look as follows in the demonstration. Each type of sensor will transmit data to a different Cloud Function HTTPS endpoint.
Although the Cloud Functions will automatically scale horizontally to accommodate additional load created by the volume of telemetry being received, there are also other options to scale the system. For example, we could create individual pipelines of functions and topic/subscriptions for each sensor type. We could also split the telemetry data across multiple MongoDB Atlas Collections, based on sensor type, instead of a single collection. In all cases, we will still benefit from the Cloud Function’s horizontal scaling capabilities.
Source Code
All source code is available on GitHub. Use the following command to clone the project.
git clone \
  --branch master --single-branch --depth 1 --no-tags \
  https://github.com/garystafford/iot-protobuf-demo.git
You will need to adjust the project’s environment variables to fit your own development and Cloud environments. All source code for this post is written in Python. It is intended for Python 3 interpreters but has been tested using Python 2 interpreters. The project’s Jupyter Notebooks can be viewed from within the project on GitHub or using the free, online Jupyter nbviewer.
Technologies
Protocol Buffers
According to Google, Protocol Buffers (aka Protobuf) are a language- and platform-neutral, efficient, extensible, automated mechanism for serializing structured data for use in communications protocols, data storage, and more. Protocol Buffers are 3 to 10 times smaller and 20 to 100 times faster than XML.
Each protocol buffer message is a small logical record of information, containing a series of strongly-typed name-value pairs. Once you have defined your messages, you run the protocol buffer compiler for your application’s language on your .proto file to generate data access classes.
Google Cloud Functions
According to Google, Cloud Functions is Google’s event-driven, serverless compute platform. Key features of Cloud Functions include automatic scaling, high availability, fault tolerance, no servers to provision, manage, patch, or update, paying only while your code runs, and easily connecting to and extending other cloud services. Cloud Functions natively support multiple event types, including HTTP, Cloud Pub/Sub, Cloud Storage, and Firebase. Current language support includes Python, Go, and Node.
Google Cloud Pub/Sub
According to Google, Cloud Pub/Sub is an enterprise message-oriented middleware for the Cloud. It is a scalable, durable event ingestion and delivery system. By providing many-to-many, asynchronous messaging that decouples senders and receivers, it allows for secure and highly available communication among independent applications. Cloud Pub/Sub delivers low-latency, durable messaging that integrates with systems hosted on the Google Cloud Platform and externally.
MongoDB Atlas
MongoDB Atlas is a fully-managed MongoDB-as-a-Service, available on AWS, Azure, and GCP. Atlas, a mature SaaS product, offers high-availability, uptime service-level agreements, elastic scalability, cross-region replication, enterprise-grade security, LDAP integration, BI Connector, and much more.
MongoDB Atlas currently offers four pricing plans, Free, Basic, Pro, and Enterprise. Plans range from the smallest, free M0-sized MongoDB cluster, with shared RAM and 512 MB storage, up to the massive M400 MongoDB cluster, with 488 GB of RAM and 3 TB of storage.
Cost Effectiveness of Cloud Functions
At true IIoT scale, Google Cloud Functions may not be the most efficient or cost-effective method of ingesting telemetry data. Based on Google’s pricing model, you get two million free function invocations per month, with each additional million invocations costing USD $0.40. The total cost also includes memory usage, total compute time, and outbound data transfer. If your system is comprised of tens or hundreds of IoT devices, Cloud Functions may prove cost-effective.
However, with thousands of devices or more, each transmitting data multiple times per minute, you could quickly outgrow the cost-effectiveness of Cloud Functions. In that case, you might look to Google’s Cloud IoT platform. Alternately, you can build your own platform with Google products such as Knative, letting you choose to run your containers either fully managed with the newly-released Cloud Run, or in your Google Kubernetes Engine cluster with Cloud Run on GKE.
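To put the pricing model in perspective, a rough back-of-the-envelope calculation, using the 15-second transmission frequency from this demonstration and only the per-invocation pricing quoted above (memory, compute time, and egress charges are ignored, and the device count is illustrative):

devices = 1_000
invocations_per_device_per_month = (60 / 15) * 60 * 24 * 30   # 4/min -> 172,800/month
total_invocations = devices * invocations_per_device_per_month  # ~172.8 million
free_tier = 2_000_000
billable_millions = max(0, total_invocations - free_tier) / 1_000_000
invocation_cost = billable_millions * 0.40                      # ~USD 68 per month
print(round(invocation_cost, 2))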
Sensor Scripts
For each sensor type, I have developed separate Python scripts, which run on each IoT device. There are two versions of each script, one for JSON over HTTP and one for Protobuf over HTTP.
JSON over HTTPS
Below we see the script, dht_sensor_http_json.py, used to transmit humidity and temperature data via JSON over HTTP to a Google Cloud Function running on GCP. The JSON request payload contains a timestamp, IoT device ID, device type, and the temperature and humidity sensor readings. The URL for the Google Cloud Function is stored as an environment variable, local to the IoT devices, and set when the script is deployed.
import json
import logging
import os
import socket
import sys
import time

import Adafruit_DHT
import requests

URL = os.environ.get('GCF_URL')
JWT = os.environ.get('JWT')
SENSOR = Adafruit_DHT.DHT22
TYPE = 'DHT22'
PIN = 18
FREQUENCY = 15


def main():
    if not URL or not JWT:
        sys.exit("Are the Environment Variables set?")
    get_sensor_data(socket.gethostname())


def get_sensor_data(device_id):
    while True:
        humidity, temperature = Adafruit_DHT.read_retry(SENSOR, PIN)
        payload = {'device': device_id,
                   'type': TYPE,
                   'timestamp': time.time(),
                   'data': {'temperature': temperature,
                            'humidity': humidity}}
        post_data(payload)
        time.sleep(FREQUENCY)


def post_data(payload):
    payload = json.dumps(payload)
    headers = {
        'Content-Type': 'application/json; charset=utf-8',
        'Authorization': JWT
    }
    try:
        requests.post(URL, json=payload, headers=headers)
    except requests.exceptions.ConnectionError:
        logging.error('Error posting data to Cloud Function!')
    except requests.exceptions.MissingSchema:
        logging.error('Error posting data to Cloud Function! Are Environment Variables set?')


if __name__ == '__main__':
    sys.exit(main())
Telemetry Frequency
Although the sensors are capable of producing data many times per minute, for this demonstration, sensor telemetry is intentionally limited to only being transmitted every 15 seconds. To reduce system complexity, potential latency, back-pressure, and cost, in my opinion, you should only produce telemetry data at the frequency your requirements dictate.
JSON Web Tokens
For security, in addition to the HTTPS endpoints exposed by the Google Cloud Functions, I have incorporated the use of a JSON Web Token (JWT). JSON Web Tokens are an open, industry-standard RFC 7519 method for representing claims securely between two parties. In this case, the JWT is used to verify the identity of the sensor scripts sending telemetry to the Cloud Functions. The JWT contains an id, password, and expiration, all signed with a secret key, which is known to each Cloud Function, in order to verify the IoT device’s identity. Without the correct JWT being passed in the Authorization header, the request to the Cloud Function will fail with an HTTP status code of 401 Unauthorized. Below is an example of the JWT’s payload data.
{ "sub": "IoT Protobuf Serverless Demo", "id": "iot-demo-key", "password": "t7J2gaQHCFcxMD6584XEpXyzWhZwRrNJ", "iat": 1557407124, "exp": 1564664724 }
For this demonstration, I created a temporary JWT using jwt.io. The HTTP Functions are using PyJWT, a Python library that allows you to encode and decode the JWT. The PyJWT library allows the Function to decode and validate the JWT (Bearer Token) from the incoming request’s Authorization header. The JWT is stored as an environment variable. Deployment instructions are included in the GitHub project.
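As an illustration of that validation logic, a minimal PyJWT round trip; the secret and claim values below are placeholders, not the demonstration's actual values.

import jwt  # PyJWT

SECRET_KEY = 'your-secret-key'  # placeholder, known to each Cloud Function

# encode a token containing the id and password claims
token = jwt.encode({'id': 'iot-demo-key', 'password': 'your-password'},
                   SECRET_KEY, algorithm='HS256')

# the Cloud Function decodes and validates the Bearer token from the
# Authorization header before accepting the telemetry
payload = jwt.decode(token, SECRET_KEY, algorithms=['HS256'])
assert payload['id'] == 'iot-demo-key'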
JSON Payload
Below is a typical JSON request payload (pretty-printed), containing DHT sensor data. This particular message is 148 bytes in size. The message format is intentionally reader-friendly. We could certainly shorten the message’s key fields, to reduce the payload size by an additional 15-20 bytes.
{ "device": "rp829c7e0e", "type": "DHT22", "timestamp": 1557585090.476025, "data": { "temperature": 17.100000381469727, "humidity": 68.0999984741211 } }
Protocol Buffers
For the demonstration, I have built a Protocol Buffers file, sensors.proto, to support the data output by three sensor types: digital humidity and temperature (DHT), passive infrared sensor (PIR), and digital light intensity (DLI). I am using the newer proto3 version of the protocol buffers language. I have created a common Protobuf sensor message schema, with the variable sensor telemetry stored in the nested data object, within each message type.
It is important to use the correct Protobuf Scalar Value Type to maintain numeric precision in the language you compile for. For simplicity, I am using a double to represent the timestamp, as well as the numeric humidity and temperature readings. Alternately, you could choose Google’s Protobuf Well-Known Types, such as Timestamp, to store the timestamp.
syntax = "proto3"; package sensors; // DHT22 message SensorDHT { string device = 1; string type = 2; double timestamp = 3; DataDHT data = 4; } message DataDHT { double temperature = 1; double humidity = 2; } // Onyehn_PIR message SensorPIR { string device = 1; string type = 2; double timestamp = 3; DataPIR data = 4; } message DataPIR { bool motion = 1; } // Anmbest_MD46N message SensorDLI { string device = 1; string type = 2; double timestamp = 3; DataDLI data = 4; } message DataDLI { bool light = 1; }
Since the sensor data will be captured with scripts written in Python 3, the Protocol Buffers file is compiled for Python, resulting in the file, sensors_pb2.py.
protoc --python_out=. sensors.proto
Protocol Buffers over HTTPS
Below we see the alternate DHT sensor script, dht_sensor_http_pb.py, which transmits a Protocol Buffers-based binary request payload over HTTPS to a Google Cloud Function running on GCP. Note the request’s Content-Type header has been changed from application/json to application/x-protobuf. In this case, instead of JSON, the same data fields are stored in an instance of the Protobuf’s SensorDHT message type (sensors_pb2.SensorDHT()). Note the import sensors_pb2 statement. This statement imports the compiled Protocol Buffers file, which is stored locally to the script on the IoT device.
import logging
import os
import socket
import sys
import time

import Adafruit_DHT
import requests
import sensors_pb2

URL = os.environ.get('GCF_DHT_URL')
JWT = os.environ.get('JWT')
SENSOR = Adafruit_DHT.DHT22
TYPE = 'DHT22'
PIN = 18
FREQUENCY = 15


def main():
    if not URL or not JWT:
        sys.exit("Are the Environment Variables set?")
    get_sensor_data(socket.gethostname())


def get_sensor_data(device_id):
    while True:
        try:
            humidity, temperature = Adafruit_DHT.read_retry(SENSOR, PIN)
            sensor_dht = sensors_pb2.SensorDHT()
            sensor_dht.device = device_id
            sensor_dht.type = TYPE
            sensor_dht.timestamp = time.time()
            sensor_dht.data.temperature = temperature
            sensor_dht.data.humidity = humidity
            payload = sensor_dht.SerializeToString()
            post_data(payload)
            time.sleep(FREQUENCY)
        except TypeError:
            logging.error('Error getting sensor data!')


def post_data(payload):
    headers = {
        'Content-Type': 'application/x-protobuf',
        'Authorization': JWT
    }
    try:
        requests.post(URL, data=payload, headers=headers)
    except requests.exceptions.ConnectionError:
        logging.error('Error posting data to Cloud Function!')
    except requests.exceptions.MissingSchema:
        logging.error('Error posting data to Cloud Function! Are Environment Variables set?')


if __name__ == '__main__':
    sys.exit(main())
Protobuf Binary Payload
To understand the binary Protocol Buffers-based payload, we can write a sample SensorDHT message to a file on disk as a byte array.
message = sensorDHT.SerializeToString()

binary_file_output = open("./data_binary.txt", "wb")
file_byte_array = bytearray(message)
binary_file_output.write(file_byte_array)
Then, using the hexdump command, we can view a representation of the binary data file.
> hexdump -C data_binary.txt
00000000  0a 08 38 32 39 63 37 65  30 65 12 05 44 48 54 32  |..829c7e0e..DHT2|
00000010  32 1d 05 a0 b9 4e 22 0a  0d ec 51 b2 41 15 cd cc  |2....N"...Q.A...|
00000020  38 42                                             |8B|
00000022
The binary data file size is 48 bytes on disk, as compared to the equivalent JSON file size of 148 bytes on disk (32% the size). As a test, we could then send that binary data file as the payload of a POST to the Cloud Function, as shown below using Postman. Postman will serialize the binary data file’s contents to a binary string before transmitting.
Similarly, we can serialize the same binary Protocol Buffers-based SensorDHT message to a binary string using the SerializeToString method.
message = sensorDHT.SerializeToString()
print(message)
The resulting binary string resembles the following.
b'\n\nrp829c7e0e\x12\x05DHT22\x19c\xee\xbcg\xf5\x8e\xccA"\x12\t\x00\x00\x00\xa0\x99\x191@\x11\x00\x00\x00`f\x06Q@'
The binary string length of the serialized message, and therefore the request payload sent by Postman and received by the Cloud Function for this particular message, is 111 bytes, as compared to the JSON payload size of 148 bytes (75% the size).
Validate Protobuf Payload
To validate that the data contained in the Protobuf payload is identical to the JSON payload, we can parse the payload from the serialized binary string using the Protobuf ParseFromString method. We then convert it to JSON using the Protobuf MessageToJson method.
message = sensorDHT.SerializeToString()

message_parsed = sensors_pb2.SensorDHT()
message_parsed.ParseFromString(message)
print(MessageToJson(message_parsed))
The resulting JSON object is identical to the JSON payload sent using JSON over HTTPS, earlier in the demonstration.
{ "device": "rp829c7e0e", "type": "DHT22", "timestamp": 1557585090.476025, "data": { "temperature": 17.100000381469727, "humidity": 68.0999984741211 } }
Google Cloud Functions
There are a series of Google Cloud Functions, specifically four HTTP Functions, which accept the sensor data over HTTP from the IoT devices. Each function exposes an HTTPS endpoint. According to Google, you use HTTP functions when you want to invoke your function via an HTTP(S) request. To allow for HTTP semantics, HTTP function signatures accept HTTP-specific arguments.
Below, I have deployed a single function that accepts JSON sensor telemetry from all sensor types, and three functions for Protobuf, one for each sensor type: DHT, PIR, and DLI.
JSON Message Processing
Below, we see the Cloud Function, main.py, which processes the incoming JSON over HTTPS payload from all sensor types. Once the request’s JWT is validated, the JSON message payload is serialized to a byte string and sent to a common Google Cloud Pub/Sub Topic. Note the JWT secret key, id, and password, and the Google Cloud Pub/Sub Topic are all stored as environment variables, local to the Cloud Functions. In my tests, the JSON-based HTTP Functions took an average of 9–18 ms to execute successfully.
import logging
import os

import jwt
from flask import make_response, jsonify
from flask_api import status
from google.cloud import pubsub_v1

TOPIC = os.environ.get('TOPIC')
SECRET_KEY = os.getenv('SECRET_KEY')
ID = os.getenv('ID')
PASSWORD = os.getenv('PASSWORD')


def incoming_message(request):
    if not validate_token(request):
        return make_response(jsonify({'success': False}),
                             status.HTTP_401_UNAUTHORIZED,
                             {'ContentType': 'application/json'})

    request_json = request.get_json()
    if not request_json:
        return make_response(jsonify({'success': False}),
                             status.HTTP_400_BAD_REQUEST,
                             {'ContentType': 'application/json'})

    send_message(request_json)

    return make_response(jsonify({'success': True}),
                         status.HTTP_201_CREATED,
                         {'ContentType': 'application/json'})


def validate_token(request):
    auth_header = request.headers.get('Authorization')
    if not auth_header:
        return False
    auth_token = auth_header.split(" ")[1]
    if not auth_token:
        return False
    try:
        payload = jwt.decode(auth_token, SECRET_KEY)
        if payload['id'] == ID and payload['password'] == PASSWORD:
            return True
    except jwt.ExpiredSignatureError:
        return False
    except jwt.InvalidTokenError:
        return False


def send_message(message):
    publisher = pubsub_v1.PublisherClient()
    publisher.publish(topic=TOPIC, data=bytes(str(message), 'utf-8'))
The Cloud Functions are deployed to GCP using the gcloud functions deploy CLI command (I use Jenkins to automate the deployments). I have wrapped the deploy commands into bash scripts. The script also copies over a common environment variables YAML file, consumed by the Cloud Function. Each Function has a deployment script, included in the project.
# get latest env vars file
cp -f ./../env_vars_file/env.yaml .

# deploy function
gcloud functions deploy http_json_to_pubsub \
  --runtime python37 \
  --trigger-http \
  --region us-central1 \
  --memory 256 \
  --entry-point incoming_message \
  --env-vars-file env.yaml
Using a .gcloudignore file, the gcloud functions deploy CLI command deploys three files: the cloud function (main.py), the required Python packages file (requirements.txt), and the environment variables file (env.yaml). Google automatically installs dependencies using the requirements.txt file.
Protobuf Message Processing
Below, we see the Cloud Function, main.py, which processes the incoming Protobuf over HTTPS payload from DHT sensor types. Once the sensor data Protobuf message payload is received by the HTTP Function, it is deserialized to JSON and then serialized to a byte string. The byte string is then sent to a Google Cloud Pub/Sub Topic. In my tests, the Protobuf-based HTTP Functions took an average of 7–14 ms to execute successfully.
As before, note the import sensors_pb2 statement. This statement imports the compiled Protocol Buffers file, which in this case is stored locally alongside the Cloud Function’s code. It is used to parse a serialized message into its original Protobuf SensorDHT message type.
import logging
import os

import jwt
import sensors_pb2
from flask import make_response, jsonify
from flask_api import status
from google.cloud import pubsub_v1
from google.protobuf.json_format import MessageToJson

TOPIC = os.environ.get('TOPIC')
SECRET_KEY = os.getenv('SECRET_KEY')
ID = os.getenv('ID')
PASSWORD = os.getenv('PASSWORD')


def incoming_message(request):
    if not validate_token(request):
        return make_response(jsonify({'success': False}),
                             status.HTTP_401_UNAUTHORIZED,
                             {'ContentType': 'application/json'})

    data = request.get_data()
    if not data:
        return make_response(jsonify({'success': False}),
                             status.HTTP_400_BAD_REQUEST,
                             {'ContentType': 'application/json'})

    sensor_pb = sensors_pb2.SensorDHT()
    sensor_pb.ParseFromString(data)
    sensor_json = MessageToJson(sensor_pb)
    send_message(sensor_json)

    return make_response(jsonify({'success': True}),
                         status.HTTP_201_CREATED,
                         {'ContentType': 'application/json'})


def validate_token(request):
    auth_header = request.headers.get('Authorization')
    if not auth_header:
        return False
    auth_token = auth_header.split(" ")[1]
    if not auth_token:
        return False
    try:
        payload = jwt.decode(auth_token, SECRET_KEY)
        if payload['id'] == ID and payload['password'] == PASSWORD:
            return True
    except jwt.ExpiredSignatureError:
        return False
    except jwt.InvalidTokenError:
        return False


def send_message(message):
    publisher = pubsub_v1.PublisherClient()
    publisher.publish(topic=TOPIC, data=bytes(message, 'utf-8'))
Cloud Pub/Sub Functions
In addition to HTTP Functions, the demonstration uses a function triggered by Google Cloud Pub/Sub Triggers. According to Google, Cloud Functions can be triggered by messages published to Cloud Pub/Sub Topics in the same GCP project as the function. The function automatically subscribes to the Topic. Below, we see that the function has automatically subscribed to the iot-data-demo Cloud Pub/Sub Topic.
Sending Telemetry to MongoDB Atlas
The common Cloud Function, triggered by messages published to Cloud Pub/Sub, then sends the messages to MongoDB Atlas. There is a minimal amount of cleanup required to re-format the Cloud Pub/Sub messages to BSON (binary JSON). Interestingly, according to bsonspec.org, BSON can be compared to binary interchange formats, like Protocol Buffers. BSON is more schema-less than Protocol Buffers, which can give it an advantage in flexibility but also a slight disadvantage in space efficiency (BSON has overhead for field names within the serialized data).
The function uses PyMongo to connect to MongoDB Atlas. According to their website, PyMongo is a Python distribution containing tools for working with MongoDB and is the recommended way to work with MongoDB from Python.
import base64
import json
import logging
import os

import pymongo

MONGODB_CONN = os.environ.get('MONGODB_CONN')
MONGODB_DB = os.environ.get('MONGODB_DB')
MONGODB_COL = os.environ.get('MONGODB_COL')


def read_message(event, context):
    message = base64.b64decode(event['data']).decode('utf-8')
    message = message.replace("'", '"')
    message = message.replace('True', 'true')
    message = json.loads(message)

    client = pymongo.MongoClient(MONGODB_CONN)
    db = client[MONGODB_DB]
    col = db[MONGODB_COL]
    col.insert_one(message)
The function responds to the published events and sends the messages to the MongoDB Atlas cluster, running in the same Region, us-central1, as the Cloud Functions and Pub/Sub Topic. Below, we see the current options available when provisioning an Atlas cluster.
MongoDB Atlas provides a rich, web-based UI for managing and monitoring MongoDB clusters, databases, collections, security, and performance.
Although Cloud Pub/Sub to Atlas function execution times are longer in duration than the HTTP functions, the latency is greatly reduced by locating the Cloud Pub/Sub Topic, Cloud Functions, and MongoDB Atlas cluster into the same GCP Region. Cross-region execution times were as high as 500-600 ms, while same-region execution times averaged 200-225 ms. Selecting a more performant Atlas cluster would likely result in even lower function execution times.
Aggregating Data with MongoDB Compass
MongoDB Compass is a free, convenient, desktop application for interacting with your MongoDB databases. You can view the collected sensor data, review message (document) schema, manage indexes, and build complex MongoDB aggregations.
When performing analytics or machine learning, I primarily use MongoDB Compass to preview the captured telemetry data and build aggregation pipelines. Aggregation operations process data records and return computed results. This feature saves a ton of time, filtering and preparing data for further analysis, visualization, and machine learning with Jupyter Notebooks.
Aggregation pipelines can be directly exported to Java, Node, C#, and Python 3. The exported aggregation pipeline code can be placed directly into your Python applications and Jupyter Notebooks.
Below, the exported aggregation pipeline code from MongoDB Compass is used to load a resultset directly into a Pandas DataFrame. This particular aggregation returns time-series DHT sensor data from a specific IoT device over a 72-hour period.
DEVICE_1 = 'rp59adf374'

pipeline = [
    {
        '$match': {
            'type': 'DHT22',
            'device': DEVICE_1,
            'timestamp': {
                '$gt': 1557619200,
                '$lt': 1557792000
            }
        }
    }, {
        '$project': {
            '_id': 0,
            'timestamp': 1,
            'temperature': '$data.temperature',
            'humidity': '$data.humidity'
        }
    }, {
        '$sort': {
            'timestamp': 1
        }
    }
]

aggResult = iot_data.aggregate(pipeline)
df1 = pd.DataFrame(list(aggResult))
MongoDB Atlas Performance
In this demonstration, from Python3-based Jupyter Notebooks, I was able to consistently query a MongoDB Atlas collection of almost 70k documents for resultsets containing 3 days (72 hours) worth of digital temperature and humidity data, roughly 10.2k documents, in an average of 825 ms. That is round trip from my local development laptop to MongoDB Atlas running on GCP, in a different geographic region.
Query times on GCP are much faster, such as when running a Notebook in JupyterLab on Google’s AI Platform, or a PySpark job with Cloud Dataproc, against Atlas. Running the same Jupyter Notebook directly on Google’s AI Platform, the same MongoDB Atlas query took an average of 450 ms versus 825 ms (1.83x faster). This was across two different GCP Regions; same Region times should be even faster.
GCP Observability
There are several choices for observing the system’s Google Cloud Functions, Google Cloud Pub/Sub, and MongoDB Atlas. As shown above, the GCP Cloud Functions interface lets you see the individual function executions, execution times, memory usage, and active instances, over varying time intervals.
For a more detailed view of Google Cloud Functions and Google Cloud Pub/Sub, I built two custom dashboards using Stackdriver. According to Google, Stackdriver aggregates metrics, logs, and events from infrastructure, giving developers and operators a rich set of observable signals. I built a custom Stackdriver Cloud Functions dashboard (shown below) and a Cloud Pub/Sub Topics and Subscriptions dashboard.
For functions, I chose to display execution times, memory usage, the number of executions, and network egress, all in a single pane of glass, using four graphs. Below, I am using the 95th percentile average for monitoring. The 95th percentile asserts that 95% of the time, the observed values are below this amount and the remaining 5% of the time, the observed values are above that amount.
Data Analysis using Jupyter Notebooks
According to jupyter.org, the Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. Uses include data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more. The widespread use of Jupyter Notebooks has grown significantly, as Big Data, AI, and ML have all experienced explosive growth.
PyCharm
JetBrains PyCharm, my favorite Python IDE, has direct integrations with Jupyter Notebooks. In fact, PyCharm’s most recent updates to the Professional Edition greatly enhanced those integrations. PyCharm offers round-trip editing in the IDE and the Jupyter Notebook web browser interface. PyCharm allows you to run and debug individual cells within the notebook. PyCharm automatically starts the Jupyter Server and appropriate kernel for the Notebook you have opened. And, one of my favorite features, PyCharm’s variable viewer tracks the current value of a variable, automatically.
Below, we see the example Analytics Notebook, included in the demonstration’s project, displayed in PyCharm 19.1.2 (Professional Edition). To effectively work with Notebooks in PyCharm really requires a full-size monitor. Working on a laptop with PyCharm’s crowded Notebook UI is workable, but certainly not as effective as on a larger monitor.
Jupyter Notebook Server
Below, we see the same Analytics Notebook, shown above in PyCharm, opened in Jupyter Notebook Server’s web-based client interface, running locally on the development workstation. The web browser-based interface also offers a rich set of features for Notebook development.
From within the Notebook, we are able to query the data from MongoDB Atlas, again using PyMongo, and load the resultsets into Pandas DataFrames. As an alternative to hard-coded values and environment variables, with Notebooks, I use the python-dotenv Python package. This package allows me to place my environment variables in a common .env file and reference them from any Notebook. The package has many options for managing environment variables.
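A minimal sketch of that pattern, assuming a local .env file; the variable names mirror those used by the Cloud Function shown earlier.

import os
from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from the local .env file

MONGODB_CONN = os.getenv('MONGODB_CONN')
MONGODB_DB = os.getenv('MONGODB_DB')
MONGODB_COL = os.getenv('MONGODB_COL')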
We can then analyze the data using a number of common frameworks, including Pandas, Matplotlib, SciPy, PySpark, and NumPy, to name but a few. Below, we see time series data from four different sensors, on the same IoT device. Viewing the data together, we can study the causal effect of one environmental variable on another, such as the impact of light on temperature or humidity.
Below, we can use histograms to visualize temperature frequencies for intervals, over time, for a given device location.
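A hedged sketch of how such a histogram might be produced from the DataFrame built earlier with the exported aggregation pipeline; the bin count and labels are illustrative.

import matplotlib.pyplot as plt

# df1 is the Pandas DataFrame of DHT telemetry loaded earlier
df1['temperature'].plot.hist(bins=20)
plt.xlabel('Temperature')
plt.ylabel('Frequency')
plt.title('Temperature frequencies for location-03')
plt.show()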
Machine Learning using Jupyter Notebooks
In addition to data analytics, we can use Jupyter Notebooks with tools such as scikit-learn to build machine learning models based on our sensor telemetry. Scikit-learn is a set of machine learning tools in Python, built on NumPy, SciPy, and matplotlib. Below, I have used JupyterLab on Google’s AI Platform and scikit-learn to build several models, based on the sensor data.
Using scikit-learn, we can build models to predict such things as which IoT device generated a specific temperature and humidity reading, or the temperature and humidity, given the time of day, device location, and external environmental variables, or discover anomalies in the sensor telemetry.
Scikit-learn makes it easy to construct randomized training and test datasets, to build models, using data from multiple IoT devices, as shown below.
The project includes a Jupyter Notebook that demonstrates how to build several ML models using sensor data. Examples of supervised learning algorithms used to build the classification models in this demonstration include Support Vector Machine (SVM), k-nearest neighbors (k-NN), and Random Forest Classifier.
Having data from multiple sensors, we are able to enrich the ML models by adding additional categorical (discrete) features to our training data. For example, we could look at the effect of light, motion, and time of day on temperature and humidity.
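As a minimal sketch of that workflow, assuming a Pandas DataFrame of telemetry with hypothetical feature columns and a device label (the project’s actual Notebook differs), a classifier might be trained as follows.

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# df stands in for a telemetry DataFrame, one row per sensor reading;
# the feature and label column names below are hypothetical
X = df[['temperature', 'humidity', 'light', 'motion', 'hour_of_day']]
y = df['device']

# Randomized training and test datasets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

print('Accuracy:', accuracy_score(y_test, model.predict(X_test)))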
Conclusion
Hopefully, this post has demonstrated how to efficiently collect telemetry data from IoT devices using Google Protocol Buffers over HTTPS, serverless Google Cloud Functions, Cloud Pub/Sub, and MongoDB Atlas, all on the Google Cloud Platform. Once captured, the telemetry data was easily aggregated and analyzed using common tools, such as MongoDB Compass and Jupyter Notebooks. Further, we used the data and tools to build machine learning models for prediction and anomaly detection.
All opinions expressed in this post are my own and not necessarily the views of my current or past employers or their clients.
Image: everythingpossible © 123RF.com
Integrating Search Capabilities with Actions for Google Assistant, using GKE and Elasticsearch: Part 2
Posted by Gary A. Stafford in Cloud, GCP, Java Development, JavaScript, Serverless, Software Development on September 24, 2018
Voice and text-based conversational interfaces, such as chatbots, have recently seen tremendous growth in popularity. Much of this growth can be attributed to leading Cloud providers, such as Google, Amazon, and Microsoft, who now provide affordable, end-to-end development, machine learning-based training, and hosting platforms for conversational interfaces.
Cloud-based machine learning services greatly improve a conversational interface’s ability to interpret user intent accurately. However, returning relevant responses to user inquiries also requires that interfaces have access to rich informational datastores, and the ability to quickly and efficiently query and analyze that data.
In this two-part post, we will enhance the capabilities of a voice and text-based conversational interface by integrating it with a search and analytics engine. By interfacing an Action for Google Assistant conversational interface with Elasticsearch, we will improve the Action’s ability to provide relevant results to the end-user. Instead of querying a traditional database for static responses to user intent, our Action will access a Near Real-time (NRT) Elasticsearch index of searchable documents. The Action will leverage Elasticsearch’s advanced search and analytics capabilities to optimize and shape user responses, based on their intent.
Action Preview
Here is a brief YouTube video preview of the final Action for Google Assistant, integrated with Elasticsearch, running on an Apple iPhone.
Architecture
If you recall from part one of this post, the high-level architecture of our search engine-enhanced Action for Google Assistant resembles the following. Most of the components are running on Google Cloud.
Source Code
All open-sourced code for this post can be found on GitHub in two repositories, one for the Spring Boot Service and one for the Action for Google Assistant. Code samples in this post are displayed as GitHub Gists, which may not display correctly on some mobile and social media browsers. Links to gists are also provided.
Development Process
In part two of this post, we will tie everything together by creating and integrating our Action for Google Assistant:
- Create the new Actions for Google Assistant project using the Actions on Google console;
- Develop the Action’s Intents and Entities using the Dialogflow console;
- Develop, deploy, and test the Cloud Function to GCP;
Let’s explore each step in more detail.
New ‘Actions on Google’ Project
With Elasticsearch running and the Spring Boot Service deployed to our GKE cluster, we can start building our Actions for Google Assistant. Using the Actions on Google web console, we first create a new Actions project.
The Directory Information tab is where we define metadata about the project. This information determines how it will look in the Actions directory and is required to publish your project. The Actions directory is where users discover published Actions on the web and mobile devices.
The Directory Information tab also includes sample invocations, which may be used to invoke our Actions.
Actions and Intents
Our project will contain a series of related Actions. According to Google, an Action is ‘an interaction you build for the Assistant that supports a specific intent and has a corresponding fulfillment that processes the intent.’ To build our Actions, we first want to create our Intents. To do so, we will want to switch from the Actions on Google console to the Dialogflow console. Actions on Google provides a link for switching to Dialogflow in the Actions tab.
We will build our Action’s Intents in Dialogflow. The term Intent, as used by Dialogflow, is standard terminology across other voice-assistant platforms, such as Amazon’s Alexa and Microsoft’s Azure Bot Service and LUIS. In Dialogflow, we will be building Intents: the Find Multiple Posts Intent, Find Post Intent, Find By ID Intent, and so forth.
Below, we see the Find Post Intent. The Find Post Intent is responsible for handling our user’s requests for a single post about a topic, for example, ‘Find a post about Docker.’ The Intent shown below contains a fair number, though not an exhaustive list, of training phrases. These represent possible ways a user might express intent when invoking the Action.
Below, we see the Find Multiple Posts Intent. The Find Multiple Posts Intent is responsible for handling our user’s requests for a list of posts about a topic, for example, ‘I’m interested in Docker.’ Similar to the Find Post Intent above, the Find Multiple Posts Intent contains a list of training phrases.
Dialog Model Training
According to Google, the greater the number of natural language examples in the Training Phrases section of Intents, the better the classification accuracy. Every time a user interacts with our Action, the user’s utterances are logged. Using the Training tab in the Dialogflow console, we can train our model by reviewing and approving or correcting how the Action handled the user’s utterances.
Below we see the user’s utterances, part of an interaction with the Action. We have the option to review and approve the Intent that was called to handle the utterance, re-assign it, or delete it. This helps improve the accuracy of our dialog model.
Dialogflow Entities
Each of the highlighted words in the training phrases maps to the facts parameter, which maps to a collection of @topic Entities. Entities represent a list of values the Action is trained to recognize. According to Google, there are three types of entities: ‘system’ (defined by Dialogflow), ‘developer’ (defined by a developer), and ‘user’ (built for each individual end-user in every request). We will be creating ‘developer’ type entities for our Action’s Intents.
Automated Expansion
We do not have to define every possible topic a user might search for as an entity. By enabling the Allow Automated Expansion option, we allow the Agent to recognize values that have not been explicitly listed in the entity. Google describes Agents as NLU (Natural Language Understanding) modules.
Entity Synonyms
An entity may contain synonyms. Multiple synonyms are mapped to a single reference value. The reference value is the value passed to the Cloud Function by the Action. For example, take the reference value of ‘GCP.’ The user might ask Google about ‘GCP’. However, the user might also substitute the words ‘Google Cloud’ or ‘Google Cloud Platform.’ Using synonyms, if the user utters any of these three synonymous words or phrases in their intent, the reference value, ‘GCP’, is passed in the request.
But, what if the post contains the phrase, ‘Google Cloud Platform’ more frequently than, or instead of, ‘GCP’? If the acronym, ‘GCP’, is defined as the entity reference value, then it is the value passed to the function, even if you ask for ‘Google Cloud Platform’. In the use case of searching blog posts by topic, entity synonyms are not an effective search strategy.
Elasticsearch Synonyms
A better way to solve for synonyms is by using the synonyms feature of Elasticsearch. Take, for example, the topic of ‘Istio’; Istio is also considered a Service Mesh. If I ask for posts about ‘Service Mesh’, I would like to get back posts that contain the phrase ‘Service Mesh’, but also the word ‘Istio’. To accomplish this, you would define an association between ‘Istio’ and ‘Service Mesh’, as part of the Elasticsearch WordPress posts index.
Searches for ‘Istio’ against that index would return results that contain ‘Istio’ and/or contain ‘Service Mesh’; the reverse is also true. Having created and applied a custom synonyms filter to the index, we see how Elasticsearch responds to an analysis of the natural language style phrase, ‘What is a Service Mesh?’. As shown by the tokens output in Kibana’s Dev Tools Console, Elasticsearch understands that ‘service mesh’ is synonymous with ‘istio’.
If we query the same five fields as our Action, for the topic of ‘service mesh’, we get four hits for posts (indexed documents) that contain ‘service mesh’ and/or ‘istio’.
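As a minimal sketch of how such a synonym filter might be defined and verified, using Elasticsearch’s REST API from Python, consider the following. The host, index name, analyzer name, and synonym list are placeholders for illustration; the actual ElasticPress-generated index is configured differently.

import requests

ES = 'http://elk-host:9200'  # placeholder Elastic Stack host

# Create a demo index with a synonym token filter and a custom analyzer
settings = {
    'settings': {
        'analysis': {
            'filter': {
                'topic_synonyms': {'type': 'synonym',
                                   'synonyms': ['istio, service mesh']}
            },
            'analyzer': {
                'synonym_analyzer': {'tokenizer': 'standard',
                                     'filter': ['lowercase', 'topic_synonyms']}
            }
        }
    }
}
requests.put(f'{ES}/posts-demo', json=settings)

# Analyze a natural-language phrase; the token output should include 'istio'
analyze = {'analyzer': 'synonym_analyzer', 'text': 'What is a Service Mesh?'}
print(requests.post(f'{ES}/posts-demo/_analyze', json=analyze).json())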
Actions on Google Integration
Another configuration item that needs to be completed in Dialogflow is the Actions on Google integration. This will integrate our Action with Google Assistant. Google currently provides more than fifteen different integrations, including Google Assistant, Slack, Facebook Messenger, Twitter, and Twilio, as shown below.
To configure the Google Assistant integration, we choose the Welcome Intent as our Action’s Explicit Invocation intent, then designate our other Intents as Implicit Invocation intents. According to Google, this integration allows our Action to reach users on every device where the Google Assistant is available.
Action Fulfillment
When a user’s intent is received, it is fulfilled by the Action. In the Dialogflow Fulfillment console, we see the Action has two fulfillment options, a Webhook or an inline-editable Cloud Function. A Webhook allows us to pass information from a matched intent into a web service and get a result back from the service. Our Action’s Webhook will call our Cloud Function on GCP, using the Cloud Function’s URL endpoint (we’ll get this URL in the next section).
Google Cloud Functions
Our Cloud Function, called by our Action, is written in Node.js. Our function, index.js, is divided into four sections, which are: constants and environment variables, intent handlers, helper functions, and the function’s entry point. The helper functions are part of the Helper module, contained in the helper.js file.
Constants and Environment Variables
This section, in both index.js and helper.js, defines the global constants and environment variables used within the function. Values that reference environment variables, such as SEARCH_API_HOSTNAME, are defined in the .env.yaml file. All environment variables in the .env.yaml file will be set during the Cloud Function’s deployment, described later in this post. Environment variables for Cloud Functions were recently released and are still considered beta functionality (gist).
// author: Gary A. Stafford | |
// site: https://programmaticponderings.com | |
// license: MIT License | |
'use strict'; | |
/* CONSTANTS AND GLOBAL VARIABLES */ | |
const Helper = require('./helper'); | |
let helper = new Helper(); | |
const { | |
dialogflow, | |
Button, | |
Suggestions, | |
BasicCard, | |
SimpleResponse, | |
List | |
} = require('actions-on-google'); | |
const functions = require('firebase-functions'); | |
const app = dialogflow({debug: true}); | |
app.middleware(conv => { | |
conv.hasScreen = | |
conv.surface.capabilities.has('actions.capability.SCREEN_OUTPUT'); | |
conv.hasAudioPlayback = | |
conv.surface.capabilities.has('actions.capability.AUDIO_OUTPUT'); | |
}); | |
const SUGGESTION_1 = 'tell me about Docker'; | |
const SUGGESTION_2 = 'help'; | |
const SUGGESTION_3 = 'cancel'; |
The npm module dependencies declared in this section are defined in the dependencies section of the package.json file. Function dependencies include Actions on Google, Firebase Functions, Winston, and Request (gist).
{ | |
"name": "functionBlogSearchAction", | |
"description": "Programmatic Ponderings Search Action for Google Assistant", | |
"version": "1.0.0", | |
"private": true, | |
"license": "MIT License", | |
"author": "Gary A. Stafford", | |
"engines": { | |
"node": ">=8" | |
}, | |
"scripts": { | |
"deploy": "sh ./deploy-cloud-function.sh" | |
}, | |
"dependencies": { | |
"@google-cloud/logging-winston": "^0.9.0", | |
"actions-on-google": "^2.2.0", | |
"dialogflow": "^0.6.0", | |
"dialogflow-fulfillment": "^0.5.0", | |
"firebase-admin": "^6.0.0", | |
"firebase-functions": "^2.0.2", | |
"request": "^2.88.0", | |
"request-promise-native": "^1.0.5", | |
"winston": "2.4.4" | |
} | |
} |
Intent Handlers
The intent handlers in this section correspond to the intents in the Dialogflow console. Each handler responds with either SimpleResponse, BasicCard, and Suggestion Chip response types, or SimpleResponse, List, and Suggestion Chip response types. These response types were covered in part one of this post (gist).
/* INTENT HANDLERS */ | |
app.intent('Welcome Intent', conv => { | |
const WELCOME_TEXT_SHORT = 'What topic are you interested in reading about?'; | |
const WELCOME_TEXT_LONG = `You can say things like: \n` + | |
` _'Find a post about GCP'_ \n` + | |
` _'I'd like to read about Kubernetes'_ \n` + | |
` _'I'm interested in Docker'_`; | |
conv.ask(new SimpleResponse({ | |
speech: WELCOME_TEXT_SHORT, | |
text: WELCOME_TEXT_SHORT, | |
})); | |
if (conv.hasScreen) { | |
conv.ask(new BasicCard({ | |
text: WELCOME_TEXT_LONG, | |
title: 'Programmatic Ponderings Search', | |
})); | |
conv.ask(new Suggestions([SUGGESTION_1, SUGGESTION_2, SUGGESTION_3])); | |
} | |
}); | |
app.intent('Fallback Intent', conv => { | |
const FACTS_LIST = "Kubernetes, Docker, Cloud, DevOps, AWS, Spring, Azure, Messaging, and GCP"; | |
const HELP_TEXT_SHORT = 'Need a little help?'; | |
const HELP_TEXT_LONG = `Some popular topics include: ${FACTS_LIST}.`; | |
conv.ask(new SimpleResponse({ | |
speech: HELP_TEXT_LONG, | |
text: HELP_TEXT_SHORT, | |
})); | |
if (conv.hasScreen) { | |
conv.ask(new BasicCard({ | |
text: HELP_TEXT_LONG, | |
title: 'Programmatic Ponderings Search Help', | |
})); | |
conv.ask(new Suggestions([SUGGESTION_1, SUGGESTION_2, SUGGESTION_3])); | |
} | |
}); | |
app.intent('Find Post Intent', async (conv, {topic}) => { | |
let postTopic = topic.toString(); | |
let posts = await helper.getPostsByTopic(postTopic, 1); | |
if (posts !== undefined && posts.length < 1) { | |
helper.topicNotFound(conv, postTopic); | |
return; | |
} | |
let post = posts[0]; | |
let formattedDate = helper.convertDate(post.post_date); | |
const POST_SPOKEN = `The top result for '${postTopic}' is the post, '${post.post_title}', published ${formattedDate}, with a relevance score of ${post._score.toFixed(2)}`; | |
const POST_TEXT = `Description: ${post.post_excerpt} \nPublished: ${formattedDate} \nScore: ${post._score.toFixed(2)}`; | |
conv.ask(new SimpleResponse({ | |
speech: POST_SPOKEN, | |
text: post.title, | |
})); | |
if (conv.hasScreen) { | |
conv.ask(new BasicCard({ | |
title: post.post_title, | |
text: POST_TEXT, | |
buttons: new Button({ | |
title: `Read Post`, | |
url: post.guid, | |
}), | |
})); | |
conv.ask(new Suggestions([SUGGESTION_1, SUGGESTION_2, SUGGESTION_3])); | |
} | |
}); | |
app.intent('Find Multiple Posts Intent', async (conv, {topic}) => { | |
let postTopic = topic.toString(); | |
let postCount = 6; | |
let posts = await helper.getPostsByTopic(postTopic, postCount); | |
if (posts !== undefined && posts.length < 1) { | |
helper.topicNotFound(conv, postTopic); | |
return; | |
} | |
const POST_SPOKEN = `Here's a list of the top ${posts.length} posts about '${postTopic}'`; | |
conv.ask(new SimpleResponse({ | |
speech: POST_SPOKEN, | |
})); | |
let itemsArray = {}; | |
posts.forEach(function (post) { | |
itemsArray[post.ID] = { | |
title: `Post ID ${post.ID}`, | |
description: `${post.post_title.substring(0,80)}... \nScore: ${post._score.toFixed(2)}`, | |
}; | |
}); | |
if (conv.hasScreen) { | |
conv.ask(new List({ | |
title: 'Top Results', | |
items: itemsArray | |
})); | |
conv.ask(new Suggestions([SUGGESTION_1, SUGGESTION_2, SUGGESTION_3])); | |
} | |
}); | |
app.intent('Find By ID Intent', async (conv, {topic}) => { | |
let postId = topic.toString(); | |
let post = await helper.getPostById(postId); | |
if (post === undefined) { | |
helper.postIdNotFound(conv, postId); | |
return; | |
} | |
let formattedDate = helper.convertDate(post.post_date); | |
const POST_SPOKEN = `Okay, I found that post`; | |
const POST_TEXT = `Description: ${post.post_excerpt} \nPublished: ${formattedDate}`; | |
conv.ask(new SimpleResponse({ | |
speech: POST_SPOKEN, | |
text: post.title, | |
})); | |
if (conv.hasScreen) { | |
conv.ask(new BasicCard({ | |
title: post.post_title, | |
text: POST_TEXT, | |
buttons: new Button({ | |
title: `Read Post`, | |
url: post.guid, | |
}), | |
})); | |
conv.ask(new Suggestions([SUGGESTION_1, SUGGESTION_2, SUGGESTION_3])); | |
} | |
}); | |
app.intent('Option Intent', async (conv, params, option) => { | |
let postId = option.toString(); | |
let post = await helper.getPostById(postId); | |
if (post === undefined) { | |
helper.postIdNotFound(conv, postId); | |
return; | |
} | |
let formattedDate = helper.convertDate(post.post_date); | |
const POST_SPOKEN = `Sure, here's that post`; | |
const POST_TEXT = `Description: ${post.post_excerpt} \nPublished: ${formattedDate}`; | |
conv.ask(new SimpleResponse({ | |
speech: POST_SPOKEN, | |
text: post.title, | |
})); | |
if (conv.hasScreen) { | |
conv.ask(new BasicCard({ | |
title: post.post_title, | |
text: POST_TEXT, | |
buttons: new Button({ | |
title: `Read Post`, | |
url: post.guid, | |
}), | |
})); | |
conv.ask(new Suggestions([SUGGESTION_1, SUGGESTION_2, SUGGESTION_3])); | |
} | |
}); |
The Welcome Intent handler handles explicit invocations of our Action. The Fallback Intent handler handles both help requests, as well as cases when Dialogflow is unable to handle the user’s request.
As described above in the Dialogflow section, the Find Post Intent handler is responsible for handling our user’s requests for a single post about a topic, for example, ‘Find a post about Docker’. To fulfill the user request, the Find Post Intent handler calls the Helper module’s getPostsByTopic function, passing the requested topic and specifying a result set size of one post with the highest relevance score, above an arbitrary minimum score of 1.0.
Similarly, the Find Multiple Posts Intent handler is responsible for handling our user’s requests for a list of posts about a topic; for example, ‘I’m interested in Docker’. To fulfill the user request, the Find Multiple Posts Intent handler calls the Helper module’s getPostsByTopic function, passing the requested topic and specifying a result set size of a maximum of six posts, each with a relevance score greater than 1.0.
The Find By ID Intent handler is responsible for handling our user’s requests for a specific, unique post ID; for example, ‘Post ID 22141’. To fulfill the user request, the Find By ID Intent handler calls the Helper module’s getPostById function, passing the unique post ID, as shown in the gist above.
Entry Point
The entry point creates a way to handle the communication with Dialogflow’s fulfillment API (gist).
/* ENTRY POINT */ | |
exports.functionBlogSearchAction = functions.https.onRequest(app); |
Helper Functions
The helper functions are part of the Helper module, contained in the helper.js file. In addition to typical utility functions like formatting dates, there are two functions that interface with Elasticsearch, via our Spring Boot API: getPostsByTopic and getPostById. As described above, the intent handlers call one of these functions to obtain search results from Elasticsearch.
The getPostsByTopic function serves both the Find Post Intent handler and the Find Multiple Posts Intent handler, described above. The only difference between the two calls is the size of the response set, either one result or a maximum of six results (gist).
// author: Gary A. Stafford | |
// site: https://programmaticponderings.com | |
// license: MIT License | |
'use strict'; | |
/* CONSTANTS AND GLOBAL VARIABLES */ | |
const { | |
dialogflow, | |
BasicCard, | |
SimpleResponse, | |
} = require('actions-on-google'); | |
const app = dialogflow({debug: true}); | |
app.middleware(conv => { | |
conv.hasScreen = | |
conv.surface.capabilities.has('actions.capability.SCREEN_OUTPUT'); | |
conv.hasAudioPlayback = | |
conv.surface.capabilities.has('actions.capability.AUDIO_OUTPUT'); | |
}); | |
const SEARCH_API_HOSTNAME = process.env.SEARCH_API_HOSTNAME; | |
const SEARCH_API_PORT = process.env.SEARCH_API_PORT; | |
const SEARCH_API_ENDPOINT = process.env.SEARCH_API_ENDPOINT; | |
const rpn = require('request-promise-native'); | |
const winston = require('winston'); | |
const Logger = winston.Logger; | |
const Console = winston.transports.Console; | |
const {LoggingWinston} = require('@google-cloud/logging-winston'); | |
const loggingWinston = new LoggingWinston(); | |
const logger = new Logger({ | |
level: 'info', // log at 'info' and above | |
transports: [ | |
new Console(), | |
loggingWinston, | |
], | |
}); | |
/* HELPER FUNCTIONS */ | |
module.exports = class Helper { | |
/** | |
* Returns an collection of ElasticsearchPosts objects based on a topic | |
* @param postTopic topic to search for | |
* @param responseSize | |
* @returns {Promise<any>} | |
*/ | |
getPostsByTopic(postTopic, responseSize = 1) { | |
return new Promise((resolve, reject) => { | |
const SEARCH_API_RESOURCE = `dismax-search?value=${postTopic}&start=0&size=${responseSize}&minScore=1`; | |
const SEARCH_API_URL = `http://${SEARCH_API_HOSTNAME}:${SEARCH_API_PORT}/${SEARCH_API_ENDPOINT}/${SEARCH_API_RESOURCE}`; | |
logger.info(`getPostsByTopic API URL: ${SEARCH_API_URL}`); | |
let options = { | |
uri: SEARCH_API_URL, | |
json: true | |
}; | |
rpn(options) | |
.then(function (posts) { | |
posts = posts.ElasticsearchPosts; | |
logger.info(`getPostsByTopic Posts: ${JSON.stringify(posts)}`); | |
resolve(posts); | |
}) | |
.catch(function (err) { | |
logger.error(`Error: ${err}`); | |
reject(err) | |
}); | |
}); | |
} | |
// truncated for brevity | |
}; |
Both functions use the request and request-promise-native npm modules to call the Spring Boot service’s RESTful API over HTTP. However, instead of returning a callback, the request-promise-native module allows us to return a native ES6 Promise. By returning a Promise, we can use async/await in our intent handlers. Using async/await with Promises is a newer way of handling asynchronous operations in Node.js. This asynchronous programming model, using Promises, is described in greater detail in my previous post, Building Serverless Actions for Google Assistant with Google Cloud Functions, Cloud Datastore, and Cloud Storage.
The getPostById function serves both the Find By ID Intent handler and the Option Intent handler, described above. This function is similar to the getPostsByTopic function, calling the Spring Boot service’s RESTful API endpoint and passing the Post ID (gist).
// author: Gary A. Stafford | |
// site: https://programmaticponderings.com | |
// license: MIT License | |
// truncated for brevity | |
module.exports = class Helper { | |
/** | |
* Returns a single result based in the Post ID | |
* @param postId ID of the Post to search for | |
* @returns {Promise<any>} | |
*/ | |
getPostById(postId) { | |
return new Promise((resolve, reject) => { | |
const SEARCH_API_RESOURCE = `${postId}`; | |
const SEARCH_API_URL = `http://${SEARCH_API_HOSTNAME}:${SEARCH_API_PORT}/${SEARCH_API_ENDPOINT}/${SEARCH_API_RESOURCE}`; | |
logger.info(`getPostById API URL: ${SEARCH_API_URL}`); | |
let options = { | |
uri: SEARCH_API_URL, | |
json: true | |
}; | |
rpn(options) | |
.then(function (post) { | |
post = post.ElasticsearchPosts; | |
logger.info(`getPostById Post: ${JSON.stringify(post)}`); | |
resolve(post); | |
}) | |
.catch(function (err) { | |
logger.error(`Error: ${err}`); | |
reject(err) | |
}); | |
}); | |
} | |
// truncated for brevity | |
}; |
Cloud Function Deployment
To deploy the Cloud Function to GCP, we use the gcloud CLI with the beta version of the functions deploy command. According to Google, gcloud is a part of the Google Cloud SDK. You must download and install the SDK on your system and initialize it before you can use gcloud. Currently, Cloud Functions are only available in four regions. I have included a shell script, deploy-cloud-function.sh, to make this step easier. It is called using the npm run deploy script (gist).
#!/usr/bin/env sh | |
# author: Gary A. Stafford | |
# site: https://programmaticponderings.com | |
# license: MIT License | |
set -ex | |
# Set constants | |
REGION="us-east1" | |
FUNCTION_NAME="<your_function_name>" | |
# Deploy the Google Cloud Function | |
gcloud beta functions deploy ${FUNCTION_NAME} \ | |
--runtime nodejs8 \ | |
--region ${REGION} \ | |
--trigger-http \ | |
--memory 256MB \ | |
--env-vars-file .env.yaml |
The creation or update of the Cloud Function can take up to two minutes. Note the output indicates the environment variables, contained in the .env.yaml file, have been deployed. The URL endpoint of the function and the function’s entry point are also both output.
If you recall, the URL endpoint of the Cloud Function is required in the Dialogflow Fulfillment tab. The URL can be retrieved from the deployment output (shown above). The Cloud Function is now deployed and will be called by the Action when a user invokes the Action.
What is Deployed
The .gcloudignore file is created the first time you deploy a new function. Using the .gcloudignore file, you limit the files deployed to GCP. For this post, of all the files in the project, only four files, index.js, helper.js, package.json, and the PNG file used in the Action’s responses, need to be deployed. All other project files are earmarked in the .gcloudignore file to avoid being deployed.
Simulation Testing and Debugging
With our Action and all its dependencies deployed and configured, we can test the Action using the Simulation console on Actions on Google. According to Google, the Action Simulation console allows us to manually test our Action by simulating a variety of Google-enabled hardware devices and their settings.
Below, in the Simulation console, we see the successful display of our Programmatic Ponderings Search Action for Google Assistant containing the expected Simple Response, List, and Suggestion Chips response types, triggered by a user’s invocation of the Action.
The simulated response indicates that the Google Cloud Function was called, and it responded successfully. That also indicates the Dialogflow-based Action successfully communicated with the Cloud Function, the Cloud Function successfully communicated with the Spring Boot service instances running on Google Kubernetes Engine, and finally, the Spring Boot services successfully communicated with Elasticsearch running on Google Compute Engine.
If we had issues with the testing, the Action Simulation console also contains tabs containing the request and response objects sent to and from the Cloud Function, the audio response, a debug console, any errors, and access to the logs.
Stackdriver Logging
In the log output below, from our Cloud Function, we see our Cloud Function’s activities. These activities include informational log entries, which we explicitly defined in our Cloud Function using the winston and @google-cloud/logging-winston npm modules. According to Google, the author of the module, Stackdriver Logging for Winston provides an easy-to-use, higher-level layer (transport) for working with Stackdriver Logging, compatible with Winston. Developing an effective logging strategy is essential to maintaining and troubleshooting your code in Development, as well as Production.
Conclusion
In this two-part post, we observed how the capabilities of a voice and text-based conversational interface, such as an Action for Google Assistant, may be enhanced through integration with a search and analytics engine, such as Elasticsearch. This post barely scratched the surface of what could be achieved with such an integration. Elasticsearch, as well as other leading Lucene-based search and analytics engines, such as Apache Solr, have tremendous capabilities, which are easily integrated with machine learning-based conversational interfaces, resulting in a more powerful and more intuitive end-user experience.
All opinions expressed in this post are my own and not necessarily the views of my current or past employers, their clients, or Google.
Integrating Search Capabilities with Actions for Google Assistant, using GKE and Elasticsearch: Part 1
Posted by Gary A. Stafford in Cloud, GCP, Java Development, JavaScript, Serverless, Software Development on September 20, 2018
Voice and text-based conversational interfaces, such as chatbots, have recently seen tremendous growth in popularity. Much of this growth can be attributed to leading Cloud providers, such as Google, Amazon, and Microsoft, who now provide affordable, end-to-end development, machine learning-based training, and hosting platforms for conversational interfaces.
Cloud-based machine learning services greatly improve a conversational interface’s ability to interpret user intent accurately. However, returning relevant responses to user inquiries also requires that interfaces have access to rich informational datastores, and the ability to quickly and efficiently query and analyze that data.
In this two-part post, we will enhance the capabilities of a voice and text-based conversational interface by integrating it with a search and analytics engine. By interfacing an Action for Google Assistant conversational interface with Elasticsearch, we will improve the Action’s ability to provide relevant results to the end-user. Instead of querying a traditional database for static responses to user intent, our Action will access a Near Realtime (NRT) Elasticsearch index of searchable documents. The Action will leverage Elasticsearch’s advanced search and analytics capabilities to optimize and shape user responses, based on their intent.
Action Preview
Here is a brief YouTube video preview of the final Action for Google Assistant, integrated with Elasticsearch, running on an Apple iPhone.
Google Technologies
The high-level architecture of our search engine-enhanced Action for Google Assistant will look as follows.
Here is a brief overview of the key technologies we will incorporate into our architecture.
Actions on Google
According to Google, Actions on Google is the platform for developers to extend the Google Assistant. Actions on Google is a web-based platform that provides a streamlined user-experience to create, manage, and deploy Actions. We will use the Actions on Google platform to develop our Action in this post.
Dialogflow
According to Google, Dialogflow is an enterprise-grade NLU platform that makes it easy for developers to design and integrate conversational user interfaces into mobile apps, web applications, devices, and bots. Dialogflow is powered by Google’s machine learning for Natural Language Processing (NLP).
Google Cloud Functions
Google Cloud Functions are part of Google’s event-driven, serverless compute platform, part of the Google Cloud Platform (GCP). Google Cloud Functions are analogous to Amazon’s AWS Lambda and Azure Functions. Features include automatic scaling, high availability, fault tolerance, no servers to provision, manage, patch or update, and a payment model based on the function’s execution time.
Google Kubernetes Engine
Kubernetes Engine is a managed, production-ready environment, available on GCP, for deploying containerized applications. According to Google, Kubernetes Engine is a reliable, efficient, and secure way to run Kubernetes clusters in the Cloud.
Elasticsearch
Elasticsearch is a leading, distributed, RESTful search and analytics engine. Elasticsearch is a product of Elastic, the company behind the Elastic Stack, which includes Elasticsearch, Kibana, Beats, Logstash, X-Pack, and Elastic Cloud. Elasticsearch provides a distributed, multitenant-capable, full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is similar to Apache Solr in terms of features and functionality. Both Solr and Elasticsearch are based on Apache Lucene.
Other Technologies
In addition to the major technologies highlighted above, the project also relies on the following:
- Google Container Registry – As an alternative to Docker Hub, we will store the Spring Boot API service’s Docker Image in Google Container Registry, making deployment to GKE a breeze.
- Google Cloud Deployment Manager – Google Cloud Deployment Manager allows users to specify all the resources needed for an application in a declarative format using YAML. The Elastic Stack will be deployed with Deployment Manager.
- Google Compute Engine – Google Compute Engine delivers scalable, high-performance virtual machines (VMs) running in Google’s data centers, on their worldwide fiber network.
- Google Stackdriver – Stackdriver aggregates metrics, logs, and events from our Cloud-based project infrastructure, for troubleshooting. We are also integrating Stackdriver Logging for Winston into our Cloud Function for fast application feedback.
- Google Cloud DNS – Hosts the primary project domain and subdomains for the search engine and API. Google Cloud DNS is a scalable, reliable and managed authoritative Domain Name System (DNS) service running on the same infrastructure as Google.
- Google VPC Network Firewall – Firewall rules provide fine-grained, secure access controls to our API and search engine. We will need several firewall port openings to talk to the Elastic Stack.
- Spring Boot – Pivotal’s Spring Boot project makes it easy to create stand-alone, production-grade Spring-based Java applications, such as our Spring Boot service.
- Spring Data Elasticsearch – Pivotal Software’s Spring Data Elasticsearch project provides easy integration to Elasticsearch from our Java-based Spring Boot service.
Demonstration
To demonstrate an Action for Google Assistant with search engine integration, we need an index of content to search. In this post, we will build an informational Action, the Programmatic Ponderings Search Action, that responds to a user’s interests in certain technical topics, by returning post suggestions from the Programmatic Ponderings blog. For this demonstration, I have indexed the last two years worth of blog posts into Elasticsearch, using the ElasticPress WordPress plugin.
Source Code
All open-sourced code for this post can be found on GitHub in two repositories, one for the Spring Boot Service and one for the Action for Google Assistant. Code samples in this post are displayed as GitHub Gists, which may not display correctly on some mobile and social media browsers. Links to gists are also provided.
Development Process
This post will focus on the development and integration of the Action for Google Assistant with Elasticsearch, via a Google Cloud Function, Kubernetes Engine, and the Spring Boot API service. The post is not intended to be a general how-to on developing for Actions for Google Assistant, Google Cloud Platform, Elasticsearch, or WordPress.
Building and integrating the Action will involve the following steps:
- Design the Action’s conversation model;
- Provision the Elastic Stack on Google Compute Engine using Deployment Manager;
- Create an Elasticsearch index of blog posts;
- Provision the Kubernetes cluster on GCP with GKE;
- Develop and deploy the Spring Boot API service to Kubernetes;
Covered in Part Two of the Post:
- Create a new Actions project using the Actions on Google console;
- Develop the Action’s Intents using the Dialogflow console;
- Develop, deploy, and test the Cloud Function to GCP;
Let’s explore each step in more detail.
Conversational Model
The conversational model of the Programmatic Ponderings Search Action for Google Assistant allows the Action to be invoked in two ways, with or without an intent. Below on the left, we see an example of an invocation of the Action – ‘Talk to Programmatic Ponderings’. Google Assistant then responds to the user, asking for more information (intent) – ‘What topic are you interested in reading about?’.
Below on the left, we see an invocation of the Action, which includes the intent – ‘Ask Programmatic Ponderings to find a post about Kubernetes’. Google Assistant will respond directly, both verbally and visually with the most relevant post.
When a user requests a single result, for example, ‘Find a post about Docker’, Google Assistant will include Simple Response, Basic Card, and Suggestion Chip response types for devices with a display. This is shown in the center, above. The user may continue to ask for additional facts or choose to cancel the Action at any time.
When a user requests multiple results, for example, ‘I’m interested in Docker’, Google Assistant will include Simple Response, List, and Suggestion Chip response types for devices with a display. An example of a List Response is shown in the center of the previous set of screengrabs, above. The user will receive up to six results in the list, with a relevance score of 1.0 or greater. The user may choose to click on any of the post results in the list, which will initiate a new search using the post’s unique ID, as shown on the right, in the first set of screengrabs, above.
The conversational model also understands a request for help and to cancel the interaction.
GCP Account and Project
The following steps assume you have an existing GCP account and you have created a project on GCP to house the Cloud Function, GKE Cluster, and Elastic Stack on Google Compute Engine. The post also assumes that you have the latest Google Cloud SDK installed on your development machine, and have authenticated your identity from the command line (gist).
# Authenticate with the Google Cloud SDK | |
export PROJECT_ID="<your_project_id>" | |
gcloud beta auth login | |
gcloud config set project ${PROJECT_ID} | |
# Update components or new runtime nodejs8 may be unknown | |
gcloud components update |
Elasticsearch on GCP
There are a number of options available to host Elasticsearch. Elastic, the company behind Elasticsearch, offers the Elasticsearch Service, a fully managed, scalable, and reliable service on AWS and GCP. AWS also offers their own managed Elasticsearch Service. I found some limitations with AWS’ Elasticsearch Service, which made integration with Spring Data Elasticsearch difficult. According to AWS, the service supports HTTP but does not support TCP transport.
For this post, we will stand up the Elastic Stack on GCP using an offering from the Google Cloud Platform Marketplace. A well-known provider of packaged applications for multiple Cloud platforms, Bitnami, offers the ELK Stack (the previous name for the Elastic Stack), running on Google Compute Engine.
GCP Marketplace Solutions are deployed using the Google Cloud Deployment Manager. The Bitnami ELK solution is a complete stack with all the necessary software and software-defined Cloud infrastructure to securely run Elasticsearch. You select the instance’s zone(s), machine type, boot disk size, and security and networking configurations. Using that configuration, the Deployment Manager will deploy the solution and provide you with information and credentials for accessing the Elastic Stack. For this demo, we will configure a minimally-sized, single VM instance to run the Elastic Stack.
Below we see the Bitnami ELK stack’s components being created on GCP, by the Deployment Manager.
Indexed Content
With the Elastic Stack fully provisioned, I then configured WordPress to index the last two years of Programmatic Ponderings blog posts to Elasticsearch on GCP. If you want to follow along with this post and need content to index, there is plenty of open source and public domain indexable content available on the Internet – books, movie lists, government and weather data, online catalogs of products, and so forth. Anything in a document database is directly indexable in Elasticsearch. Elastic even provides a set of index samples, available on their GitHub site.
Firewall Ports for Elasticsearch
The Deployment Manager opens up firewall ports 80 and 443. To index the WordPress posts, I also had to open port 9200. According to Elastic, Elasticsearch uses port 9200 for communicating with its RESTful API, using JSON over HTTP. For security, I locked down this firewall opening to my WordPress server’s address as the source (gist).
SOURCE_IP=<wordpress_ip_address> | |
PORT=9200 | |
gcloud compute \ | |
--project=wp-search-bot \ | |
firewall-rules create elk-1-tcp-${PORT} \ | |
--description=elk-1-tcp-${PORT} \ | |
--direction=INGRESS \ | |
--priority=1000 \ | |
--network=default \ | |
--action=ALLOW \ | |
--rules=tcp:${PORT} \ | |
--source-ranges=${SOURCE_IP} \ | |
--target-tags=elk-1-tcp-${PORT} |
The two existing firewall rules for ports 80 and 443 should also be locked down to your own IP address as the source. Common Elasticsearch ports are constantly scanned by hackers, who will quickly hijack your Elasticsearch contents and hold them for ransom, in addition to deleting your indexes. Similar tactics are used on well-known and unprotected ports for many platforms, including Redis, MySQL, PostgreSQL, MongoDB, and Microsoft SQL Server.
Kibana
Once the posts are indexed, the best way to view the resulting Elasticsearch documents is through Kibana, which is included as part of the Bitnami solution. Below we see approximately thirty posts, spread out across two years.
Each Elasticsearch document, representing an indexed WordPress blog post, contains over 125 fields of information. Fields include a unique post ID, post title, content, publish date, excerpt, author, URL, and so forth. All these fields are exposed through Elasticsearch’s API, and as we will see, will be available to our Spring Boot service to query.
Spring Boot Service
To ensure decoupling between the Action for Google Assistant and Elasticsearch, we will expose a RESTful search API, written in Java using Spring Boot and Spring Data Elasticsearch. The API will expose a tailored set of flexible endpoints to the Action. Google’s machine learning services will ensure our conversational model is trained to understand user intent. The API’s query algorithm and Elasticsearch’s rich Lucene-based search features will ensure the most relevant results are returned. We will host the Spring Boot service on Google Kubernetes Engine (GKE).
We will use a Spring REST Controller to expose our RESTful web service’s resources to our Action’s Cloud Function. The current Spring Boot service contains five /elastic resource endpoints, exposed by the ElasticsearchPostController class. Of those five, two endpoints will be called by our Action in this demo, the /{id} and the /dismax-search endpoints. The endpoints can be seen using the Swagger UI. Our Spring Boot service implements SpringFox, which has the option to expose the Swagger interactive API UI.
The /{id} endpoint accepts a unique post ID as a path variable in the API call and returns a single ElasticsearchPost object, wrapped in a Map object and serialized to a JSON payload (gist).
@RequestMapping(value = "/{id}") | |
@ApiOperation(value = "Returns a post by id") | |
public Map<String, Optional<ElasticsearchPost>> findById(@PathVariable("id") long id) { | |
Optional<ElasticsearchPost> elasticsearchPost = elasticsearchPostRepository.findById(id); | |
Map<String, Optional<ElasticsearchPost>> elasticsearchPostMap = new HashMap<>(); | |
elasticsearchPostMap.put("ElasticsearchPosts", elasticsearchPost); | |
return elasticsearchPostMap; | |
} |
Below we see an example response from the Spring Boot service to an API call to the /{id} endpoint, for post ID 22141. Since we are returning a single post, based on ID, the relevance score will always be 0.0 (gist).
# http http://api.chatbotzlabs.com/blog/api/v1/elastic/22141 | |
HTTP/1.1 200 | |
Content-Type: application/json;charset=UTF-8 | |
Date: Mon, 17 Sep 2018 23:15:01 GMT | |
Transfer-Encoding: chunked | |
{ | |
"ElasticsearchPosts": { | |
"ID": 22141, | |
"_score": 0.0, | |
"guid": "https://programmaticponderings.com/?p=22141", | |
"post_date": "2018-04-13 12:45:19", | |
"post_excerpt": "Learn to manage distributed applications, spanning multiple Kubernetes environments, using Istio on GKE.", | |
"post_title": "Managing Applications Across Multiple Kubernetes Environments with Istio: Part 1" | |
} | |
} |
This controller’s /{id} endpoint relies on a method exposed by the ElasticsearchPostRepository interface. The ElasticsearchPostRepository is a Spring Data Repository, which extends ElasticsearchRepository. The repository exposes the findById() method, which returns a single instance of the type, ElasticsearchPost, from Elasticsearch (gist).
package com.example.elasticsearch.repository; | |
import com.example.elasticsearch.model.ElasticsearchPost; | |
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository; | |
public interface ElasticsearchPostRepository extends ElasticsearchRepository<ElasticsearchPost, Long> { | |
} |
The ElasticsearchPost class is annotated as an Elasticsearch Document, similar to other Spring Data Document annotations, such as Spring Data MongoDB. The ElasticsearchPost class is instantiated to hold the deserialized JSON documents that Elasticsearch stores as indexed data (gist).
package com.example.elasticsearch.model; | |
import com.fasterxml.jackson.annotation.JsonIgnoreProperties; | |
import com.fasterxml.jackson.annotation.JsonProperty; | |
import org.springframework.data.annotation.Id; | |
import org.springframework.data.elasticsearch.annotations.Document; | |
import java.io.Serializable; | |
@JsonIgnoreProperties(ignoreUnknown = true) | |
@Document(indexName = "<elasticsearch_index_name>", type = "post") | |
public class ElasticsearchPost implements Serializable { | |
@Id | |
@JsonProperty("ID") | |
private long id; | |
@JsonProperty("_score") | |
private float score; | |
@JsonProperty("post_title") | |
private String title; | |
@JsonProperty("post_date") | |
private String publishDate; | |
@JsonProperty("post_excerpt") | |
private String excerpt; | |
@JsonProperty("guid") | |
private String url; | |
// Setters removed for brevity... | |
} |
Dis Max Query
The second API endpoint called by our Action is the /dismax-search endpoint. We use this endpoint to search for a particular post topic, such as ‘Docker’. This type of search, as opposed to the Spring Data Repository method used by the /{id} endpoint, requires the use of an ElasticsearchTemplate. The ElasticsearchTemplate allows us to form more complex Elasticsearch queries than is possible using an ElasticsearchRepository class. Below, the /dismax-search endpoint accepts four input request parameters in the API call, which are the topic to search for, the starting point and size of the response to return, and the minimum relevance score (gist).
@RequestMapping(value = "/dismax-search") | |
@ApiOperation(value = "Performs dismax search and returns a list of posts containing the value input") | |
public Map<String, List<ElasticsearchPost>> dismaxSearch(@RequestParam("value") String value, | |
@RequestParam("start") int start, | |
@RequestParam("size") int size, | |
@RequestParam("minScore") float minScore) { | |
List<ElasticsearchPost> elasticsearchPosts = elasticsearchService.dismaxSearch(value, start, size, minScore); | |
Map<String, List<ElasticsearchPost>> elasticsearchPostMap = new HashMap<>(); | |
elasticsearchPostMap.put("ElasticsearchPosts", elasticsearchPosts); | |
return elasticsearchPostMap; | |
} |
The logic to create and execute the ElasticsearchTemplate query is handled by the ElasticsearchService class. The ElasticsearchPostController calls the ElasticsearchService. The ElasticsearchService handles querying Elasticsearch and returning a list of ElasticsearchPost objects to the ElasticsearchPostController. The dismaxSearch method, called by the /dismax-search endpoint’s controller method, uses the ElasticsearchTemplate instance to build the request to Elasticsearch’s RESTful API (gist).
public List<ElasticsearchPost> dismaxSearch(String value, int start, int size, float minScore) { | |
QueryBuilder queryBuilder = getQueryBuilder(value); | |
Client client = elasticsearchTemplate.getClient(); | |
SearchResponse response = client.prepareSearch() | |
.setQuery(queryBuilder) | |
.setSize(size) | |
.setFrom(start) | |
.setMinScore(minScore) | |
.addSort("_score", SortOrder.DESC) | |
.setExplain(true) | |
.execute() | |
.actionGet(); | |
List<SearchHit> searchHits = Arrays.asList(response.getHits().getHits()); | |
ObjectMapper mapper = new ObjectMapper(); | |
List<ElasticsearchPost> elasticsearchPosts = new ArrayList<>(); | |
searchHits.forEach(hit -> { | |
try { | |
elasticsearchPosts.add(mapper.readValue(hit.getSourceAsString(), ElasticsearchPost.class)); | |
elasticsearchPosts.get(elasticsearchPosts.size() - 1).setScore(hit.getScore()); | |
} catch (IOException e) { | |
e.printStackTrace(); | |
} | |
}); | |
return elasticsearchPosts; | |
} |
To obtain the most relevant search results, we will use Elasticsearch’s Dis Max Query combined with the Match Phrase Query. Elastic describes the Dis Max Query as:
‘a query that generates the union of documents produced by its subqueries, and that scores each document with the maximum score for that document as produced by any subquery, plus a tie breaking increment for any additional matching subqueries.’
In short, the Dis Max Query allows us to query and weight (boost importance) multiple indexed fields, across all documents. The Match Phrase Query analyzes the text (our topic) and creates a phrase query out of the analyzed text.
After some experimentation, I found the most relevant search results were returned by applying greater weighting (boost) to the post’s title and excerpt, followed by the post’s tags and categories, and finally, the actual text of the post. I also limited results to a minimum score of 1.0. Just because a word or phrase is repeated in a post doesn’t mean it is indicative of the post’s subject matter. Setting a minimum score attempts to help ensure the requested topic is featured more prominently in the resulting post or posts. Increasing the minimum score will decrease the number of search results, but, theoretically, increase their relevance (gist).
private QueryBuilder getQueryBuilder(String value) { | |
value = value.toLowerCase(); | |
return QueryBuilders.disMaxQuery() | |
.add(matchPhraseQuery("post_title", value).boost(3)) | |
.add(matchPhraseQuery("post_excerpt", value).boost(3)) | |
.add(matchPhraseQuery("terms.post_tag.name", value).boost(2)) | |
.add(matchPhraseQuery("terms.category.name", value).boost(2)) | |
.add(matchPhraseQuery("post_content", value).boost(1)); | |
} |
Below we see the results of a /dismax-search API call to our service, querying for posts about the topic ‘Istio’, with a minimum score of 2.0. The search resulted in a serialized JSON payload containing three ElasticsearchPost objects (gist).
http http://api.chatbotzlabs.com/blog/api/v1/elastic/dismax-search?minScore=2&size=3&start=0&value=Istio

HTTP/1.1 200
Content-Type: application/json;charset=UTF-8
Date: Tue, 18 Sep 2018 03:50:35 GMT
Transfer-Encoding: chunked

{
    "ElasticsearchPosts": [
        {
            "ID": 21867,
            "_score": 5.91989,
            "guid": "https://programmaticponderings.com/?p=21867",
            "post_date": "2017-12-22 16:04:17",
            "post_excerpt": "Learn to deploy and configure Istio on Google Kubernetes Engine (GKE).",
            "post_title": "Deploying and Configuring Istio on Google Kubernetes Engine (GKE)"
        },
        {
            "ID": 22313,
            "_score": 3.6616292,
            "guid": "https://programmaticponderings.com/?p=22313",
            "post_date": "2018-04-17 07:01:38",
            "post_excerpt": "Learn to manage distributed applications, spanning multiple Kubernetes environments, using Istio on GKE.",
            "post_title": "Managing Applications Across Multiple Kubernetes Environments with Istio: Part 2"
        },
        {
            "ID": 22141,
            "_score": 3.6616292,
            "guid": "https://programmaticponderings.com/?p=22141",
            "post_date": "2018-04-13 12:45:19",
            "post_excerpt": "Learn to manage distributed applications, spanning multiple Kubernetes environments, using Istio on GKE.",
            "post_title": "Managing Applications Across Multiple Kubernetes Environments with Istio: Part 1"
        }
    ]
}
Understanding Relevance Scoring
When returning search results, such as in the example above, the top result is the one with the highest score. The highest score should denote the most relevant result to the search query. According to Elastic, in their documentation titled The Theory Behind Relevance Scoring, scoring is explained this way:
‘Lucene (and thus Elasticsearch) uses the Boolean model to find matching documents, and a formula called the practical scoring function to calculate relevance. This formula borrows concepts from term frequency/inverse document frequency and the vector space model but adds more-modern features like a coordination factor, field length normalization, and term or query clause boosting.’
In order to better understand this technical explanation of relevance scoring, it is much easier to see it applied to our example. Note the first search result above, Post ID 21867, has the highest score, 5.91989. Knowing that we are searching five fields (title, excerpt, tags, categories, and content), and boosting certain fields more than others, how was this score determined? Conveniently, Spring Data Elasticsearch’s SearchRequestBuilder class exposes the setExplain method, which we call in the dismaxSearch method, shown above. By passing a boolean value of true to the setExplain method, we are able to see the detailed scoring algorithms used by Elasticsearch for the top result, shown below (gist).
5.9198895 = max of:
  5.8995476 = weight(post_title:istio in 3) [PerFieldSimilarity], result of:
    5.8995476 = score(doc=3,freq=1.0 = termFreq=1.0), product of:
      3.0 = boost
      1.6739764 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
        1.0 = docFreq
        7.0 = docCount
      1.1747572 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        11.0 = avgFieldLength
        7.0 = fieldLength
  5.9198895 = weight(post_excerpt:istio in 3) [PerFieldSimilarity], result of:
    5.9198895 = score(doc=3,freq=1.0 = termFreq=1.0), product of:
      3.0 = boost
      1.6739764 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
        1.0 = docFreq
        7.0 = docCount
      1.1788079 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        12.714286 = avgFieldLength
        8.0 = fieldLength
  3.3479528 = weight(terms.post_tag.name:istio in 3) [PerFieldSimilarity], result of:
    3.3479528 = score(doc=3,freq=1.0 = termFreq=1.0), product of:
      2.0 = boost
      1.6739764 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
        1.0 = docFreq
        7.0 = docCount
      1.0 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        16.0 = avgFieldLength
        16.0 = fieldLength
  2.52272 = weight(post_content:istio in 3) [PerFieldSimilarity], result of:
    2.52272 = score(doc=3,freq=100.0 = termFreq=100.0), product of:
      1.1631508 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
        2.0 = docFreq
        7.0 = docCount
      2.1688676 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
        100.0 = termFreq=100.0
        1.2 = parameter k1
        0.75 = parameter b
        2251.1428 = avgFieldLength
        2840.0 = fieldLength
What this detail shows us is that of the five fields searched, the term ‘Istio’ was located in four of the five fields (all except ‘categories’). Using the practical scoring function described by Elasticsearch, and taking into account our boost values, we see that the post’s ‘excerpt’ field achieved the highest weight of 5.9198895 (a boost of 3.0, multiplied by an idf of 1.6739764 and a tfNorm of 1.1788079).
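To make the arithmetic concrete, here is how the top document’s score is assembled from the explain detail above (values copied from the output, lightly rounded). Each field’s weight is the product of its boost, idf, and tfNorm values, and the Dis Max Query then takes the maximum across the matching fields.

\[ \mathrm{weight_{field}} = \mathrm{boost} \times \mathrm{idf} \times \mathrm{tfNorm} \]
\[ \mathrm{weight_{post\_excerpt}} = 3.0 \times 1.6739764 \times 1.1788079 \approx 5.9198895 \]
\[ \mathrm{score} = \max(5.8995476,\ 5.9198895,\ 3.3479528,\ 2.52272) = 5.9198895 \]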
Being able to view the scoring explanation helps us tune our search results. For example, according to the details, the term ‘Istio’ appeared 100 times (termFreq=100.0) in the main body of the post (the ‘content’ field). We might ask ourselves if we are giving enough relevance to the content as opposed to the other fields. We might choose to increase the ‘content’ field’s boost, or decrease the boost of the other fields, to produce higher-quality search results.
Google Kubernetes Engine
With the Elastic Stack running on Google Compute Engine, and the Spring Boot API service built, we can now provision a Kubernetes cluster to run our Spring Boot service. The service will sit between our Action’s Cloud Function and Elasticsearch. We will use Google Kubernetes Engine (GKE) to manage our Kubernetes cluster on GCP. A GKE cluster is a managed group of uniform VM instances for running Kubernetes. The VMs are managed by Google Compute Engine, which delivers virtual machines running in Google’s data centers, on their worldwide fiber network.
A GKE cluster can be provisioned using GCP’s Cloud Console or using the Cloud SDK, Google’s command-line interface for Google Cloud Platform products and services. I prefer using the CLI, which helps enable DevOps automation through tools like Jenkins and Travis CI.
GCP_PROJECT="wp-search-bot" | |
GKE_CLUSTER="wp-search-cluster" | |
GCP_ZONE="us-east1-b" | |
NODE_COUNT="1" | |
INSTANCE_TYPE="n1-standard-1" | |
GKE_VERSION="1.10.7-gke.1" | |
gcloud beta container \ | |
--project ${GCP_PROJECT} clusters create ${GKE_CLUSTER} \ | |
--zone ${GCP_ZONE} \ | |
--username "admin" \ | |
--cluster-version ${GKE_VERION} \ | |
--machine-type ${INSTANCE_TYPE} --image-type "COS" \ | |
--disk-type "pd-standard" --disk-size "100" \ | |
--scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \ | |
--num-nodes ${NODE_COUNT} \ | |
--enable-cloud-logging --enable-cloud-monitoring \ | |
--network "projects/wp-search-bot/global/networks/default" \ | |
--subnetwork "projects/wp-search-bot/regions/us-east1/subnetworks/default" \ | |
--additional-zones "us-east1-b","us-east1-c","us-east1-d" \ | |
--addons HorizontalPodAutoscaling,HttpLoadBalancing \ | |
--no-enable-autoupgrade --enable-autorepair |
Below is the command I used to provision a minimally sized three-node GKE cluster, replete with the latest available version of Kubernetes. Although a one-node cluster is sufficient for early-stage development, testing should be done on a multi-node cluster to ensure the service will operate properly with multiple instances running behind a load-balancer (gist).
GCP_PROJECT="wp-search-bot" | |
GKE_CLUSTER="wp-search-cluster" | |
GCP_ZONE="us-east1-b" | |
NODE_COUNT="1" | |
INSTANCE_TYPE="n1-standard-1" | |
GKE_VERSION="1.10.7-gke.1" | |
gcloud beta container \ | |
--project ${GCP_PROJECT} clusters create ${GKE_CLUSTER} \ | |
--zone ${GCP_ZONE} \ | |
--username "admin" \ | |
--cluster-version ${GKE_VERION} \ | |
--machine-type ${INSTANCE_TYPE} --image-type "COS" \ | |
--disk-type "pd-standard" --disk-size "100" \ | |
--scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \ | |
--num-nodes ${NODE_COUNT} \ | |
--enable-cloud-logging --enable-cloud-monitoring \ | |
--network "projects/wp-search-bot/global/networks/default" \ | |
--subnetwork "projects/wp-search-bot/regions/us-east1/subnetworks/default" \ | |
--additional-zones "us-east1-b","us-east1-c","us-east1-d" \ | |
--addons HorizontalPodAutoscaling,HttpLoadBalancing \ | |
--no-enable-autoupgrade --enable-autorepair |
Below, we see the three n1-standard-1 instance type worker nodes, one in each of three different geographic locations, referred to as zones, all within the us-east1 region. Multiple instances, spread across multiple zones, provide single-region high availability for our Spring Boot service. With GKE, the Master Node is fully managed by Google.
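As a quick sanity check, something like the following should list the cluster’s worker nodes and the underlying Compute Engine instances. This is only a sketch; it assumes the Cloud SDK and kubectl are installed, and that the instance names carry GKE’s default gke-<cluster-name> prefix.

# Point kubectl at the new GKE cluster
gcloud container clusters get-credentials wp-search-cluster \
  --project wp-search-bot --zone us-east1-b

# List the worker nodes registered with Kubernetes
kubectl get nodes -o wide

# List the underlying Compute Engine instances and their zones
gcloud compute instances list --filter="name~'gke-wp-search-cluster'"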
Building Service Image
In order to deploy our Spring Boot service, we must first build a Docker Image and make that image available to our Kubernetes cluster. For lowest latency, I’ve chosen to build and publish the image to Google Container Registry, in addition to Docker Hub. The Spring Boot service’s Docker image is built on the latest Debian-based OpenJDK 10 Slim base image, available on Docker Hub. The Spring Boot JAR file is copied into the image (gist).
FROM openjdk:10.0.2-13-jdk-slim

LABEL maintainer="Gary A. Stafford <garystafford@rochester.rr.com>"
ENV REFRESHED_AT 2018-09-08

EXPOSE 8080
WORKDIR /tmp
COPY /build/libs/*.jar app.jar

CMD ["java", "-jar", "-Djava.security.egd=file:/dev/./urandom", "-Dspring.profiles.active=gcp", "app.jar"]
To automate the build and publish processes with tools such as Jenkins or Travis CI, we will use a simple shell script. The script builds the Spring Boot service using Gradle, then builds the Docker Image containing the Spring Boot JAR file, tags and publishes the Docker image to the image repository, and finally, redeploys the Spring Boot service container to GKE using kubectl (gist).
#!/usr/bin/env sh

# author: Gary A. Stafford
# site: https://programmaticponderings.com
# license: MIT License

IMAGE_REPOSITORY=<your_image_repo>
IMAGE_NAME=<your_image_name>
GCP_PROJECT=<your_project>
TAG=<your_image_tag>

# Build Spring Boot app
./gradlew clean build

# Build Docker file
docker build -f Docker/Dockerfile --no-cache -t ${IMAGE_REPOSITORY}/${IMAGE_NAME}:${TAG} .

# Push image to Docker Hub
docker push ${IMAGE_REPOSITORY}/${IMAGE_NAME}:${TAG}

# Push image to GCP Container Registry (GCR)
docker tag ${IMAGE_REPOSITORY}/${IMAGE_NAME}:${TAG} gcr.io/${GCP_PROJECT}/${IMAGE_NAME}:${TAG}
docker push gcr.io/${GCP_PROJECT}/${IMAGE_NAME}:${TAG}

# Re-deploy Workload (containerized app) to GKE
kubectl replace --force -f gke/${IMAGE_NAME}.yaml
Below we see the latest version of our Spring Boot Docker image published to Google Container Registry (GCR).
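If you prefer the command line to the console, the published images and tags can also be listed with the Cloud SDK. This is a minimal sketch, assuming the gcr.io/wp-search-bot repository and the wp-es-demo image name used elsewhere in this post.

# List images pushed to Google Container Registry for this project
gcloud container images list --repository=gcr.io/wp-search-bot

# List the tags for the Spring Boot service's image
gcloud container images list-tags gcr.io/wp-search-bot/wp-es-demo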
Deploying the Service
To deploy the Spring Boot service’s container to GKE, we will use a Kubernetes Deployment Controller. The Deployment Controller manages the Pods and ReplicaSets. As a deployment alternative, you could choose to use CoreOS’ Operator Framework to create an Operator or use Helm to create a Helm Chart. Along with the Deployment Controller, there is a ConfigMap and a Horizontal Pod Autoscaler. The ConfigMap contains environment variables that will be available to the Spring Boot service instances running in the Kubernetes Pods. Variables include the host and port of the Elasticsearch cluster on GCP and the name of the Elasticsearch index created by WordPress. These values will override any configuration values set in the service’s application.yml properties file.
The Deployment Controller creates a ReplicaSet with three Pods, running the Spring Boot service, one on each worker node (gist).
---
apiVersion: "v1"
kind: "ConfigMap"
metadata:
  name: "wp-es-demo-config"
  namespace: "dev"
  labels:
    app: "wp-es-demo"
data:
  cluster_nodes: "<your_elasticsearch_instance_tcp_host_and_port>"
  cluster_name: "elasticsearch"
---
apiVersion: "extensions/v1beta1"
kind: "Deployment"
metadata:
  name: "wp-es-demo"
  namespace: "dev"
  labels:
    app: "wp-es-demo"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: "wp-es-demo"
  template:
    metadata:
      labels:
        app: "wp-es-demo"
    spec:
      containers:
      - name: "wp-es-demo"
        image: "gcr.io/wp-search-bot/wp-es-demo"
        imagePullPolicy: Always
        env:
        - name: "SPRING_DATA_ELASTICSEARCH_CLUSTER-NODES"
          valueFrom:
            configMapKeyRef:
              key: "cluster_nodes"
              name: "wp-es-demo-config"
        - name: "SPRING_DATA_ELASTICSEARCH_CLUSTER-NAME"
          valueFrom:
            configMapKeyRef:
              key: "cluster_name"
              name: "wp-es-demo-config"
---
apiVersion: "autoscaling/v1"
kind: "HorizontalPodAutoscaler"
metadata:
  name: "wp-es-demo-hpa"
  namespace: "dev"
  labels:
    app: "wp-es-demo"
spec:
  scaleTargetRef:
    kind: "Deployment"
    name: "wp-es-demo"
    apiVersion: "apps/v1beta1"
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 80
To properly load-balance the three Spring Boot service Pods, we will also deploy a Kubernetes Service of ServiceType LoadBalancer. According to Kubernetes, a Kubernetes Service is an abstraction which defines a logical set of Pods and a policy by which to access them (gist).
---
apiVersion: "v1"
kind: "Service"
metadata:
  name: "wp-es-demo-service"
  namespace: "dev"
  labels:
    app: "wp-es-demo"
spec:
  ports:
  - protocol: "TCP"
    port: 80
    targetPort: 8080
  selector:
    app: "wp-es-demo"
  type: "LoadBalancer"
  loadBalancerIP: ""
Below, we see three instances of the Spring Boot service deployed to the GKE cluster on GCP. Each Pod, containing an instance of the Spring Boot service, is in a load-balanced pool, behind our service load balancer, and exposed on port 80.
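A quick way to confirm the Deployment, the Pods, and the load balancer from the command line is a minimal check such as the following, using the resource names and namespace defined in the manifests above.

# Verify the Deployment and its three Pods in the dev namespace
kubectl get deployments,pods -n dev

# Retrieve the external IP assigned to the LoadBalancer Service
kubectl get service wp-es-demo-service -n dev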
Testing the API
We can test our API and ensure it is talking to Elasticsearch, and returning expected results using the Swagger UI, shown previously, or tools like Postman, shown below.
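Alternatively, a request similar to the earlier dismax-search example can be pointed directly at the Service’s external IP from the command line. This is only a sketch; it assumes the same context path as the earlier example, with <external_ip> standing in for the address reported by kubectl.

http "http://<external_ip>/blog/api/v1/elastic/dismax-search?minScore=2&size=3&start=0&value=Istio"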
Communication Between GKE and Elasticsearch
Similar to port 9200, which needed to be opened for indexing content over HTTP, we also need to open firewall port 9300 between the Spring Boot service on GKE and Elasticsearch. According to Elastic, Elasticsearch Java clients talk to the Elasticsearch cluster over port 9300, using the native Elasticsearch transport protocol (TCP).
Again, locking this port down to the GKE cluster as the source is critical for security (gist).
SOURCE_IP=<gke_cluster_public_ip_address>
PORT=9300

gcloud compute \
  --project=wp-search-bot \
  firewall-rules create elk-1-tcp-${PORT} \
  --description=elk-1-tcp-${PORT} \
  --direction=INGRESS \
  --priority=1000 \
  --network=default \
  --action=ALLOW \
  --rules=tcp:${PORT} \
  --source-ranges=${SOURCE_IP} \
  --target-tags=elk-1-tcp-${PORT}
Part Two
In part one, we examined the creation of the Elastic Stack, the provisioning of the GKE cluster, and the development and deployment of the Spring Boot service to Kubernetes. In part two of this post, we will tie everything together by creating and integrating our Action for Google Assistant:
- Create the new Actions project using the Actions on Google console;
- Develop the Action’s Intents using the Dialogflow console;
- Develop, deploy, and test the Cloud Function to GCP;
Related Posts
If you’re interested in comparing the development of an Action for Google Assistant with that of Amazon’s Alexa and Microsoft’s LUIS-enabled chatbots, in addition to this post, I would recommend the previous three posts in this conversation interface series:
- Building Serverless Actions for Google Assistant with Google Cloud Functions, Cloud Datastore, and Cloud Storage
- Building and Integrating LUIS-enabled Chatbots with Slack, using Azure Bot Service, Bot Builder SDK, and Cosmos DB,
- Building Asynchronous, Serverless Alexa Skills with AWS Lambda, DynamoDB, S3, and Node.js.
All three articles’ demonstrations leverage their respective Cloud platform’s machine learning-based Natural Language Understanding (NLU) services. All three take advantage of their respective Cloud platform’s NoSQL database and object storage services. Lastly, all three of the articles’ demonstrations are written in a common language, Node.js.
All opinions expressed in this post are my own and not necessarily the views of my current or past employers, their clients, or Google.
Building Serverless Actions for Google Assistant with Google Cloud Functions, Cloud Datastore, and Cloud Storage
Posted by Gary A. Stafford in Cloud, GCP, JavaScript, Serverless, Software Development on August 11, 2018
Introduction
In this post, we will create an Action for Google Assistant using the ‘Actions on Google’ development platform, Google Cloud Platform’s serverless Cloud Functions, Cloud Datastore, and Cloud Storage, and the current LTS version of Node.js. According to Google, Actions are pieces of software, designed to extend the functionality of the Google Assistant, Google’s virtual personal assistant, across a multitude of Google-enabled devices, including smartphones, cars, televisions, headphones, watches, and smart-speakers.
Here is a brief YouTube video preview of the final Action for Google Assistant we will explore in this post, running on an Apple iPhone 8.
If you want to compare the development of an Action for Google Assistant with those of AWS and Azure, in addition to this post, please read my previous two posts in this series, Building and Integrating LUIS-enabled Chatbots with Slack, using Azure Bot Service, Bot Builder SDK, and Cosmos DB and Building Asynchronous, Serverless Alexa Skills with AWS Lambda, DynamoDB, S3, and Node.js. All three of the articles’ demonstrations are written in Node.js, all three leverage their cloud platform’s machine learning-based Natural Language Understanding services, and all three take advantage of NoSQL database and storage services available on their respective cloud platforms.
Google Technologies
The final architecture of our Action for Google Assistant will look as follows.
Here is a brief overview of the key technologies we will incorporate into our architecture.
Actions on Google
According to Google, Actions on Google is the platform for developers to extend the Google Assistant. Similar to Amazon’s Alexa Skills Kit Development Console for developing Alexa Skills, Actions on Google is a web-based platform that provides a streamlined user-experience to create, manage, and deploy Actions. We will use the Actions on Google platform to develop our Action in this post.
Dialogflow
According to Google, Dialogflow is an enterprise-grade Natural Language Understanding (NLU) platform that makes it easy for developers to design and integrate conversational user interfaces into mobile apps, web applications, devices, and bots. Dialogflow is powered by Google’s machine learning for Natural Language Processing (NLP). Dialogflow was initially known as API.AI prior to being renamed by Google in late 2017.
We will use the Dialogflow web-based development platform and version 2 of the Dialogflow API, which became GA in April 2018, to build our Action for Google Assistant’s rich, natural-language conversational interface.
Google Cloud Functions
Google Cloud Functions is Google’s event-driven, serverless compute platform, part of the Google Cloud Platform (GCP). Google Cloud Functions is comparable to Amazon’s AWS Lambda and Azure Functions. Cloud Functions is a relatively new service from Google, released in beta in March 2017, and only recently becoming GA at Cloud Next ’18 (July 2018). The main features of Cloud Functions include automatic scaling, high availability, fault tolerance, no servers to provision, manage, patch or update, and a payment model based on the function’s execution time. The programmatic logic behind our Action for Google Assistant will be handled by a Cloud Function.
Node.js LTS
We will write our Action’s Google Cloud Function using the Node.js 8 runtime. Google just released the ability to write Google Cloud Functions in Node 8.11.1 and Python 3.7.0, at Cloud Next ’18 (July 2018). It is still considered beta functionality. Previously, you had to write your functions in Node version 6 (currently, 6.14.0).
Node 8, also known as Project Carbon, was the first Long Term Support (LTS) version of Node to support async/await with Promises. Async/await is the new way of handling asynchronous operations in Node.js. We will make use of async/await and Promises within our Action’s Cloud Function.
Google Cloud Datastore
Google Cloud Datastore is a highly-scalable NoSQL database. Cloud Datastore is similar in features and capabilities to Azure Cosmos DB and Amazon DynamoDB. Datastore automatically handles sharding and replication and offers features like a RESTful interface, ACID transactions, SQL-like queries, and indexes. We will use Datastore to persist the information returned to the user from our Action for Google Assistant.
Google Cloud Storage
The last technology, Google Cloud Storage, is secure and durable object storage, nearly identical to Amazon Simple Storage Service (Amazon S3) and Azure Blob Storage. We will store publicly accessible images in a Google Cloud Storage bucket, which will be displayed in Google Assistant Basic Card responses.
Demonstration
To demonstrate Actions for Google Assistant, we will build an informational Action that responds to the user with interesting facts about Azure, Microsoft’s Cloud computing platform (Google talking about Azure, ironic). Note this is not intended to be an official Microsoft bot and is only used for demonstration purposes.
Source Code
All open-sourced code for this post can be found on GitHub. Note code samples in this post are displayed as Gists, which may not display correctly on some mobile and social media browsers. Links to gists are also provided.
Development Process
This post will focus on the development and integration of an Action with Google Cloud Platform’s serverless and asynchronous Cloud Functions, Cloud Datastore, and Cloud Storage. The post is not intended to be a general how-to on developing and publishing Actions for Google Assistant, or how to specifically use services on the Google Cloud Platform.
Building the Action will involve the following steps.
- Design the Action’s conversation model;
- Import the Azure Facts Entities into Cloud Datastore on GCP;
- Create and upload the images to Cloud Storage on GCP;
- Create the new Actions on Google project using the Actions on Google console;
- Develop the Action’s Intent using the Dialogflow console;
- Bulk import the Action’s Entities using the Dialogflow console;
- Configure the Dialogflow Actions on Google Integration;
- Develop and deploy the Cloud Function to GCP;
- Test the Action using Actions on Google Simulator;
Let’s explore each step in more detail.
Conversational Model
The conversational model design of the Azure Tech Facts Action for Google Assistant is similar to the Azure Tech Facts Alexa Custom Skill, detailed in my previous post. We will have the option to invoke the Action in two ways, without initial intent (Explicit Invocation) and with intent (Implicit Invocation), as shown below. On the left, we see an example of an explicit invocation of the Action. Google Assistant then queries the user for more information. On the right, an implicit invocation of the Action includes the intent, being the Azure fact they want to learn about. Google Assistant responds directly, both verbally and visually with the fact.
Each fact returned by Google Assistant will include Simple Response, Basic Card, and Suggestions response types for devices with a display, as shown below. The user may continue to ask for additional facts or choose to cancel the Action at any time.
Lastly, as part of the conversational model, we will include the option of asking for a random fact, as well as asking for help. Examples of both are shown below. Again, Google Assistant responds to the user, vocally and, optionally, visually, for display-enabled devices.
GCP Account and Project
The following steps assume you have an existing GCP account and you have created a project on GCP to house the Cloud Function, Cloud Storage Bucket, and Cloud Datastore Entities. The post also assumes that you have the Google Cloud SDK installed on your development machine, and have authenticated your identity from the command line (gist).
# Authenticate with the Google Cloud SDK
export PROJECT_ID="<your_project_id>"
gcloud beta auth login
gcloud config set project ${PROJECT_ID}

# Update components or new runtime nodejs8 may be unknown
gcloud components update
Google Cloud Storage
First, the images, actually Azure icons available from Microsoft, displayed in the responses shown above, are uploaded to a Google Storage Bucket. To handle these tasks, we will use the gsutil CLI to create, upload, and manage the images. The gsutil CLI tool, like gcloud, is part of the Google Cloud SDK. The gsutil mb (make bucket) command creates the bucket, the gsutil cp (copy files and objects) command copies the images to the new bucket, and finally, the gsutil iam (get, set, or change bucket and/or object IAM permissions) command makes the images public. I have included a shell script, bucket-uploader.sh, to make this process easier (gist).
#!/usr/bin/env sh

# author: Gary A. Stafford
# site: https://programmaticponderings.com
# license: MIT License

set -ex

# Set constants
PROJECT_ID="<your_project_id>"
REGION="<your_region>"
IMAGE_BUCKET="<your_bucket_name>"

# Create GCP Storage Bucket
gsutil mb \
  -p ${PROJECT_ID} \
  -c regional \
  -l ${REGION} \
  gs://${IMAGE_BUCKET}

# Upload images to bucket
for file in pics/image-*; do
  gsutil cp ${file} gs://${IMAGE_BUCKET}
done

# Make all images public in bucket
gsutil iam ch allUsers:objectViewer gs://${IMAGE_BUCKET}
From the Storage Console on GCP, you should observe the images all have publicly accessible URLs. This will allow the Cloud Function to access the bucket, and retrieve and display the images. There are more secure ways to store and display the images from the function. However, this is the simplest method since we are not concerned about making the images public.
We will need the URL of the new Storage bucket later, when we develop our Action’s Cloud Function. The bucket URL can be obtained from the Storage Console on GCP, as shown below in the Link URL.
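As an alternative to the console, the bucket contents and their public URLs can be confirmed from the command line. This is a minimal sketch, using the same placeholder bucket name as the upload script above; the comment shows the standard Cloud Storage public URL format.

# List the objects uploaded to the bucket
gsutil ls gs://<your_bucket_name>

# Publicly readable objects are served at the standard URL format:
# https://storage.googleapis.com/<your_bucket_name>/<object_name>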
Google Cloud Datastore
In Cloud Datastore, the category data object is referred to as a Kind, similar to a Table in a relational database. In Datastore, we will have an ‘AzureFact’ Kind of data. In Datastore, a single object is referred to as an Entity, similar to a Row in a relational database. Each one of our entities represents a unique reference value from our Azure Facts Intent’s facts entities, such as ‘competition’ and ‘certifications’. Individual data is known as a Property in Datastore, similar to a Column in a relational database. We will have four Properties for each entity: name, response, title, and image. Lastly, a Key in Datastore is similar to a Primary Key in a relational database. The Key we will use for our entities is the unique reference value string from our Azure Facts Intent’s facts entities, such as ‘competition’ or ‘certifications’. The Key value is stored within the entity’s name Property.
There are a number of ways to create the Datastore entities for our Action, including manually from the Datastore console on GCP. However, to automate the process, we will use a script, written in Node.js and using the Google Cloud Datastore Node.js Client, to create the entities. We will use the Client API’s Datastore class upsert method, which will create or update an entire collection of entities in a single call and returns a callback. The script, upsert-entities.js, is included in source control and can be run with the following command. Below is a snippet of the script, which shows the structure of the entities (gist).
# Upload Google Datastore entities
cd data
npm install
node ./upsert-entities.js
Once the upsert command completes successfully, you should observe a collection of ‘AzureFact’ Kind Datastore Entities in the Datastore console on GCP.
Below, we see the structure of a single Datastore Entity, the ‘certifications’ Entity, containing the fact response, title, and name of the image, which is stored in our Google Storage bucket.
New ‘Actions on Google’ Project
With the images uploaded and the database entries created, we can start building our Actions for Google Assistant. Using the Actions on Google web console, we first create a new Actions project.
The Directory Information tab is where we define metadata about the project. This information determines how it will look in the Actions directory and is required to publish your project. The Actions directory is where users discover published Actions on the web and mobile devices.
Actions and Intents
Our project will contain a series of related Actions. According to Google, an Action is ‘an interaction you build for the Assistant that supports a specific intent and has a corresponding fulfillment that processes the intent.’ To build our Actions, we first want to create our Intents. To do so, we will want to switch from the Actions on Google console to the Dialogflow console. Actions on Google provides a link for switching to Dialogflow in the Actions tab.
We will build our Action’s Intents in Dialogflow. The term Intent, used by Dialogflow, is standard terminology across other voice-assistant platforms, such as Amazon’s Alexa and Microsoft’s Azure Bot Service and LUIS. In Dialogflow, we will be building three Intents: the Azure Facts Intent, the Welcome Intent, and the Fallback Intent.
Below, we see the Azure Facts Intent. The Azure Facts Intent is the main Intent, responsible for handling our user’s requests for facts about Azure. The Intent includes a fair number, but certainly not an exhaustive list, of training phrases. These represent all the possible ways a user might express intent when invoking the Action. According to Google, the greater the number of natural language examples in the Training Phrases section of Intents, the better the classification accuracy.
Intent Entities
Each of the highlighted words in the training phrases maps to the facts parameter, which maps to a collection of @facts Entities. Entities represent a list of values the Action is trained to understand. According to Google, there are three types of entities: system (defined by Dialogflow), developer (defined by a developer), and user (built for each individual end-user in every request) entities. We will be creating developer type entities for our Action’s Intent.
Synonyms
An entity contains Synonyms. Multiple synonyms may be mapped to a single reference value. The reference value is the value passed to the Cloud Function by the Action. For example, take the reference value of ‘competition’. A user might ask Google about Azure’s competition. However, the user might also substitute the words ‘competitor’ or ‘competitors’ for ‘competition’. Using synonyms, if the user utters any of these three words in their intent, they will receive the same response.
Although our Azure Facts Action is a simple example, typical Actions might contain hundreds of entities or more, each with several synonyms. Dialogflow provides the option of copying and pasting bulk entities, in either JSON or CSV format. The project’s source code includes both JSON and CSV formats, which may be input in this manner.
Automated Expansion
Not every possible fact for which Google Assistant will return a response needs an entity defined. For example, we created a ‘compliance’ Cloud Datastore Entity. The Action understands the term ‘compliance’ and will return a response to the user if they ask about Azure compliance. However, ‘compliance’ is not defined as an Intent Entity, since we have chosen not to define any synonyms for the term ‘compliance’.
In order to allow this, you must enable Allow Automated Expansion. According to Google, this option allows an Agent to recognize values that have not been explicitly listed in the entity. Google describes Agents as NLU (Natural Language Understanding) modules.
Actions on Google Integration
Another configuration item in Dialogflow that needs to be completed is Dialogflow’s Actions on Google integration. This will integrate the Azure Tech Facts Action with Google Assistant. Google provides more than a dozen different integrations, as shown below.
The Actions on Google integration configuration is simple: just choose the Azure Facts Intent as our Action’s Implicit Invocation intent, in addition to the default Welcome Intent, which is our Action’s Explicit Invocation intent. According to Google, the integration allows our Action to reach users on every device where the Google Assistant is available.
Action Fulfillment
When an intent is received from the user, it is fulfilled by the Action. In the Dialogflow Fulfillment console, we see the Action has two fulfillment options, a Webhook or a Cloud Function, which can be edited inline. A Webhook allows us to pass information from a matched intent into a web service and get a result back from the service. In our example, our Action’s Webhook will call our Cloud Function, using the Cloud Function’s URL endpoint. We first need to create our function in order to get the endpoint, which we will do next.
Google Cloud Functions
Our Cloud Function, called by our Action, is written in Node.js 8. As stated earlier, Node 8 LTS was the first LTS version to support async/await with Promises. Async/await is the new way of handling asynchronous operations in Node.js, replacing callbacks.
Our function, index.js, is divided into four sections: constants, intent handlers, helper functions, and the function’s entry point. The Cloud Function attempts to follow many of the coding practices from Google’s code examples on Github.
Constants
This section defines the global constants used within the function. Note the constant for the URL of our new Cloud Storage bucket, IMAGE_BUCKET, shown below, references an environment variable, process.env.IMAGE_BUCKET. This value is set in the .env.yaml file. All environment variables in the .env.yaml file will be set during the Cloud Function’s deployment, explained later in this post. Environment variables were recently released, and are still considered beta functionality (gist).
// author: Gary A. Stafford
// site: https://programmaticponderings.com
// license: MIT License

'use strict';

/* CONSTANTS */

const {
    dialogflow,
    Suggestions,
    BasicCard,
    SimpleResponse,
    Image,
} = require('actions-on-google');

const functions = require('firebase-functions');

const Datastore = require('@google-cloud/datastore');
const datastore = new Datastore({});

const app = dialogflow({debug: true});

app.middleware(conv => {
    conv.hasScreen =
        conv.surface.capabilities.has('actions.capability.SCREEN_OUTPUT');
    conv.hasAudioPlayback =
        conv.surface.capabilities.has('actions.capability.AUDIO_OUTPUT');
});

const IMAGE_BUCKET = process.env.IMAGE_BUCKET;

const SUGGESTION_1 = 'tell me a random fact';
const SUGGESTION_2 = 'help';
const SUGGESTION_3 = 'cancel';
The npm package dependencies declared in the constants section are defined in the dependencies section of the package.json file. Function dependencies include Actions on Google, Firebase Functions, and Cloud Datastore (gist).
"dependencies": { | |
"@google-cloud/datastore": "^1.4.1", | |
"actions-on-google": "^2.2.0", | |
"dialogflow": "^0.6.0", | |
"dialogflow-fulfillment": "^0.5.0", | |
"firebase-admin": "^6.0.0", | |
"firebase-functions": "^2.0.2" | |
} |
Intent Handlers
The three intent handlers correspond to the three intents in the Dialogflow console: Azure Facts Intent, Welcome Intent, and Fallback Intent. Each handler responds in a very similar fashion. The handlers all return a SimpleResponse for audio-only and display-enabled devices. Optionally, a BasicCard is returned for display-enabled devices (gist).
/* INTENT HANDLERS */

app.intent('Welcome Intent', conv => {
    const WELCOME_TEXT_SHORT = 'What would you like to know about Microsoft Azure?';
    const WELCOME_TEXT_LONG = `What would you like to know about Microsoft Azure? ` +
        `You can say things like: \n` +
        ` _'tell me about Azure certifications'_ \n` +
        ` _'when was Azure released'_ \n` +
        ` _'give me a random fact'_`;
    const WELCOME_IMAGE = 'image-16.png';

    conv.ask(new SimpleResponse({
        speech: WELCOME_TEXT_SHORT,
        text: WELCOME_TEXT_SHORT,
    }));

    if (conv.hasScreen) {
        conv.ask(new BasicCard({
            text: WELCOME_TEXT_LONG,
            title: 'Azure Tech Facts',
            image: new Image({
                url: `${IMAGE_BUCKET}/${WELCOME_IMAGE}`,
                alt: 'Azure Tech Facts',
            }),
            display: 'WHITE',
        }));

        conv.ask(new Suggestions([SUGGESTION_1, SUGGESTION_2, SUGGESTION_3]));
    }
});

app.intent('Fallback Intent', conv => {
    const FACTS_LIST = "Certifications, Cognitive Services, Competition, Compliance, First Offering, Functions, " +
        "Geographies, Global Infrastructure, Platforms, Categories, Products, Regions, and Release Date";
    const WELCOME_TEXT_SHORT = 'Need a little help?';
    const WELCOME_TEXT_LONG = `Current facts include: ${FACTS_LIST}.`;
    const WELCOME_IMAGE = 'image-15.png';

    conv.ask(new SimpleResponse({
        speech: WELCOME_TEXT_LONG,
        text: WELCOME_TEXT_SHORT,
    }));

    if (conv.hasScreen) {
        conv.ask(new BasicCard({
            text: WELCOME_TEXT_LONG,
            title: 'Azure Tech Facts Help',
            image: new Image({
                url: `${IMAGE_BUCKET}/${WELCOME_IMAGE}`,
                alt: 'Azure Tech Facts',
            }),
            display: 'WHITE',
        }));

        conv.ask(new Suggestions([SUGGESTION_1, SUGGESTION_2, SUGGESTION_3]));
    }
});

app.intent('Azure Facts Intent', async (conv, {facts}) => {
    let factToQuery = facts.toString();
    let fact = await buildFactResponse(factToQuery);

    const AZURE_TEXT_SHORT = `Sure, here's a fact about ${fact.title}`;

    conv.ask(new SimpleResponse({
        speech: fact.response,
        text: AZURE_TEXT_SHORT,
    }));

    if (conv.hasScreen) {
        conv.ask(new BasicCard({
            text: fact.response,
            title: fact.title,
            image: new Image({
                url: `${IMAGE_BUCKET}/${fact.image}`,
                alt: fact.title,
            }),
            display: 'WHITE',
        }));

        conv.ask(new Suggestions([SUGGESTION_1, SUGGESTION_2, SUGGESTION_3]));
    }
});
The Welcome Intent handler handles explicit invocations of our Action. The Fallback Intent handler handles both help requests, as well as cases when Dialogflow cannot match any of the user’s input. Lastly, the Azure Facts Intent handler handles implicit invocations of our Action, returning a fact to the user from Cloud Datastore, based on the user’s requested fact.
Helper Functions
The next section of the function contains two helper functions. The primary helper function is the buildFactResponse function. This is the function that queries Google Cloud Datastore for the fact. The second function, selectRandomFact, handles the fact value of ‘random’ by selecting a random fact value to query Datastore (gist).
/* HELPER FUNCTIONS */

function selectRandomFact() {
    const FACTS_ARRAY = ['description', 'released', 'global', 'regions',
        'geographies', 'platforms', 'categories', 'products', 'cognitive',
        'compliance', 'first', 'certifications', 'competition', 'functions'];
    return FACTS_ARRAY[Math.floor(Math.random() * FACTS_ARRAY.length)];
}

function buildFactResponse(factToQuery) {
    return new Promise((resolve, reject) => {
        if (factToQuery.toString().trim() === 'random') {
            factToQuery = selectRandomFact();
        }