Archive for category Serverless
Building Serverless Actions for Google Assistant with Google Cloud Functions, Cloud Datastore, and Cloud Storage
In this post, we will create an Action for Google Assistant using the ‘Actions on Google’ development platform, Google Cloud Platform’s serverless Cloud Functions, Cloud Datastore, and Cloud Storage, and the current LTS version of Node.js. According to Google, Actions are pieces of software, designed to extend the functionality of the Google Assistant, Google’s virtual personal assistant, across a multitude of Google-enabled devices, including smartphones, cars, televisions, headphones, watches, and smart-speakers.
Here is a brief YouTube video preview of the final Action for Google Assistant, we will explore in this post, running on an Apple iPhone 8.
If you want to compare the development of an Action for Google Assistant with that of an Amazon Alexa Skill, using similar serverless technologies, then also read my previous post, Building Asynchronous, Serverless Alexa Skills with AWS Lambda, DynamoDB, S3, and Node.js.
The final architecture of our Action for Google Assistant will look as follows.
Here is a brief overview of the key technologies we will incorporate into our architecture.
Actions on Google
According to Google, Actions on Google is the platform for developers to extend the Google Assistant. Similar to Amazon’s Alexa Skills Kit Development Console for developing Alexa Skills, Actions on Google is a web-based platform that provides a streamlined user-experience to create, manage, and deploy Actions. We will use the Actions on Google platform to develop our Action in this post.
According to Google, Dialogflow is an enterprise-grade Natural language understanding (NLU) platform that makes it easy for developers to design and integrate conversational user interfaces into mobile apps, web applications, devices, and bots. Dialogflow is powered by Google’s machine learning for Natural Language Processing (NLP). Dialogflow was initially known as API.AI prior being renamed by Google in late 2017.
We will use the Dialogflow web-based development platform and version 2 of the Dialogflow API, which became GA in April 2018, to build our Action for Google Assistant’s rich, natural-language conversational interface.
Google Cloud Functions
Google Cloud Functions are the event-driven serverless compute platform, part of the Google Cloud Platform (GCP). Google Cloud Functions are comparable to Amazon’s AWS Lambda and Azure Functions. Cloud Functions is a relatively new service from Google, released in beta in March 2017, and only recently becoming GA at Cloud Next ’18 (July 2018). The main features of Cloud Functions include automatic scaling, high availability, fault tolerance, no servers to provision, manage, patch or update, and a payment model based on the function’s execution time. The programmatic logic behind our Action for Google Assistant will be handled by a Cloud Function.
We will write our Action’s Google Cloud Function using the Node.js 8 runtime. Google just released the ability to write Google Cloud Functions in Node 8.11.1 and Python 3.7.0, at Cloud Next ’18 (July 2018). It is still considered beta functionality. Previously, you had to write your functions in Node version 6 (currently, 6.14.0).
Node 8, also known as Project Carbon, was the first Long Term Support (LTS) version of Node to support async/await with Promises. Async/await is the new way of handling asynchronous operations in Node.js. We will make use of async/await and Promises within our Action’s Cloud Function.
Google Cloud Datastore
Google Cloud Datastore is a highly-scalable NoSQL database. Cloud Datastore is similar in features and capabilities to Azure Cosmos DB and Amazon DynamoDB. Datastore automatically handles sharding and replication and offers features like a RESTful interface, ACID transactions, SQL-like queries, and indexes. We will use Datastore to persist the information returned to the user from our Action for Google Assistant.
Google Cloud Storage
The last technology, Google Cloud Storage is secure and durable object storage, nearly identical to Amazon Simple Storage Service (Amazon S3) and Azure Blob Storage. We will store publicly accessible images in a Google Cloud Storage bucket, which will be displayed in Google Assistant Basic Card responses.
To demonstrate Actions for Google Assistant, we will build an educational Action that responds to the user with interesting facts about Azure¹, Microsoft’s Cloud computing platform (Google talking about Azure, ironic). This is not intended to be an official Microsoft Action; it is only used for this demonstration and has not been published.
All open-sourced code for this post can be found on GitHub. Note code samples in this post are displayed as Gists, which may not display correctly on some mobile and social media browsers. Links to gists are also provided.
This post will focus on the development and integration of an Action with Google Cloud Platform’s serverless and asynchronous Cloud Functions, Cloud Datastore, and Cloud Storage. The post is not intended to be a general how-to on developing and publishing Actions for Google Assistant, or how to specifically use services on the Google Cloud Platform.
Building the Action will involve the following steps.
- Design the Action’s conversation model;
- Import the Azure Facts Entities into Cloud Datastore on GCP;
- Create and upload the images to Cloud Storage on GCP;
- Create the new Actions project using the Actions on Google console;
- Develop the Action’s Intent using the Dialogflow console;
- Bulk import the Action’s Entities using the Dialogflow console;
- Configure the Dialogflow Actions on Google Integration;
- Develop the Cloud Function an IDE and deploy it to GCP;
- Test the Action using Actions on Google Simulator;
Let’s explore each step in more detail.
The conversational model design of the Azure Tech Facts Action for Google Assistant is similar to the Azure Tech Facts Alexa Custom Skill, detailed in my previous post. We will have the option to invoke the Action in two ways, without initial intent (Explicit Invocation) and with intent (Implicit Invocation), as shown below. On the left, we see an example of an explicit invocation of the Action. Google Assistant then queries the user for more information. On the right, an implicit invocation of the Action includes the intent, being the Azure fact they want to learn about. Google Assistant responds directly, both vocally and visually with the fact.
Each fact returned by Google Assistant will include a Simple Response, Basic Card and Suggestions response types for devices with a display, as shown below. The user may continue to ask for additional facts or choose to cancel the Action at any time.
Lastly, as part of the conversational model, we will include the option of asking for a random fact, as well as asking for help. Examples of both are shown below. Again, Google Assistant responds to the user, vocally and, optionally, visually, for display-enabled devices.
GCP Account and Project
The following steps assume you have an existing GCP account and you have created a project on GCP to house the Cloud Function, Cloud Storage Bucket, and Cloud Datastore Entities. The post also assumes that you have the Google Cloud SDK installed on your development machine, and have authenticated your identity from the command line (gist).
Google Cloud Storage
First, the images, actually Azure icons available from Microsoft, displayed in the responses shown above, are uploaded to a Google Storage Bucket. To handle these tasks, we will use the
gsutil CLI to create, upload, and manage the images. The gsutil CLI tool, like
gcloud, is part of the Google Cloud SDK. The
gsutil mb (make bucket) command creates the bucket,
gsutil cp (copy files and objects) command is used to copy the images to the new bucket, and finally, the
gsutil iam (get, set, or change bucket and/or object IAM permissions) command is used to make the images public. I have included a shell script,
bucket-uploader.sh, to make this process easier. (gist).
From the Storage Console on GCP, you should observe the images all have publicly accessible URLs. This will allow the Cloud Function to access the bucket, and retrieve and display the images. There are more secure ways to store and display the images from the function. However, this is the simplest method since we are not concerned about making the images public.
We will need the URL of the new Storage bucket, later, when we develop to our Action’s Cloud Function. The bucket URL can be obtained from the Storage Console on GCP, as shown below in the Link URL.
Google Cloud Datastore
In Cloud Datastore, the category data object is referred to as a Kind, similar to a Table in a relational database. In Datastore, we will have an ‘AzureFact’ Kind of data. In Datastore, a single object is referred to as an Entity, similar to a Row in a relational database. Each one of our entities represents a unique reference value from our Azure Facts Intent’s facts entities, such as ‘competition’ and ‘certifications’. Individual data is known as a Property in Datastore, similar to a Column in a relational database. We will have four Properties for each entity: name, response, title, and image. Lastly, a Key in Datastore is similar to a Primary Key in a relational database. The Key we will use for our entities is the unique reference value string from our Azure Facts Intent’s facts entities, such as ‘competition’ or ‘certifications’. The Key value is stored within the entity’s name Property.
There are a number of ways to create the Datastore entities for our Action, including manually from the Datastore console on GCP. However, to automate the process, we will use a script, written in Node.js and using the Google Cloud Datastore Node.js Client, to create the entities. We will use the Client API’s Datastore Class upsert method, which will create or update an entire collection of entities with one call and returns a callback. The script ,
upsert-entities.js, is included in source control and can be run with the following command. Below is a snippet of the script, which shows the structure of the entities (gist).
upsert command completes successfully, you should observe a collection of ‘AzureFact’ Type Datastore Entities in the Datastore console on GCP.
Below, we see the structure of a single Datastore Entity, the ‘certifications’ Entity, containing the fact response, title, and name of the image, which is stored in our Google Storage bucket.
New Actions on Google Project
With the images uploaded and the database entries created, we can start building our Action. Using the Actions on Google web console, we first create a new Actions project.
The Directory Information tab is where we define all the metadata about the Action. This information determines how the Action will look in the Actions directory and is required to publish your Action. The directory is where users discover published Actions on the web and mobile devices.
Actions and Intents
The first part of developing a Google Action is building your Actions, more commonly called Intents. I personally find the overloading of the word ‘Actions’ in the Actions on Google console to be rather confusing.
When developing the Actions, I suggest switching from the Actions on Google console to the Dialogflow console. Actions on Google provides a link to switch to Dialogflow. The first thing you will notice when switching to Dialogflow is what was called Actions in the Actions on Google console are now referred to as Intents in Dialogflow. The word Intent, used by Dialogflow, is standard terminology across other voice-assistant platforms, such as Alexa. To be clear, we are building an Action, which contains Intents — the Azure Facts Intent, Welcome Intent, and the Fallback Intent.
Below, we see the Azure Facts Intent. The Azure Facts Intent is the main Intent, responsible for handling our user’s requests for facts about Azure. The Intent includes a fair number, but certainly not an exhaustive list, of training phrases. These represent all the possible ways a user might express intent when invoking the Action. According to Google, the greater the number of natural language examples in the Training Phrases section of Intents, the better the classification accuracy.
Each of the highlighted words in the training phrases maps to the facts parameter, which maps to a collection of @facts Entities. Entities represent a list of intents the Action is trained to understand. According to Google, there are three types of entities: system (defined by Dialogflow), developer (defined by a developer), and user (built for each individual end-user in every request) entities. We will be creating developer type entities for our Action’s Intent.
An entity contains Synonyms. Multiple synonyms may be mapped to a single reference value. The reference value is the value passed to the Cloud Function by the Action. For example, take the reference value of ‘competition’. A user might ask Google about Azure’s competition. However, the user might also substitute the words ‘competitor’ or ‘competitors’ for ‘competition’. Using synonyms, if the user utters any of these three words in their intent, they will receive the same response.
Although our Azure Facts Action is a simple example, typical Actions might contain hundreds of entities or more, each with several synonyms. Dialogflow provides the option of copy and pasting bulk entities, in either JSON or CSV format. The project’s source code includes both JSON or CSV formats, which may be input in this manner.
Not every possible fact, which will have a response, returned by Google Assistant, needs an entity defined. For example, we created a ‘compliance’ Cloud Datastore Entity. The Action understands the term ‘compliance’ and will return a response to the user if they ask about Azure compliance. However, ‘compliance’ is not defined as an Intent Entity, since we have chosen not to define any synonyms for the term ‘compliance’.
In order to allow this, you must enable Allow Automated Expansion. According to Google, this option allows an Agent to recognize values that have not been explicitly listed in the entity. Google describes Agents as NLU (Natural Language Understanding) modules.
Actions on Google Integration
Another configuration item in Dialogflow that needs to be completed is the Dialogflow’s Actions on Google integration. This will integrate the Azure Tech Facts Action with Google Assistant. Google provides more than a dozen different integrations, as shown below.
The Dialogflow’s Actions on Google integration configuration is simple, just choose the Azure Facts Intent as our Action’s Implicit Invocation intent, in addition to the default Welcome Intent, which is our Action’s Explicit Invocation intent. According to Google, integration allows our Action to reach users on every device where the Google Assistant is available.
When an intent is received from the user, it is fulfilled by the Action. In the Dialogflow Fulfillment console, we see the Action has two fulfillment options, a Webhook or a Cloud Function, which can be edited inline. A Webhook allows us to pass information from a matched intent into a web service and get a result back from the service. In our example, our Action’s Webhook will call our Cloud Function, using the Cloud Function’s URL endpoint. We first need to create our function in order to get the endpoint, which we will do next.
Google Cloud Functions
Our Cloud Function, called by our Action, is written in Node.js 8. As stated earlier, Node 8 LTS was the first LTS version to support async/await with Promises. Async/await is the new way of handling asynchronous operations in Node.js, replacing callbacks.
Our function, index.js, is divided into four sections: constants, intent handlers, helper functions, and the function’s entry point. The Cloud Function attempts to follow many of the coding practices from Google’s code examples on Github.
The section defines the global constants used within the function. Note the constant for the URL of our new Cloud Storage bucket, on line 30 below,
IMAGE_BUCKET, references an environment variable,
process.env.IMAGE_BUCKET. This value is set in the
.env.yaml file. All environment variables in the
.env.yaml file will be set during the Cloud Function’s deployment, explained later in this post. Environment variables were recently released, and are still considered beta functionality (gist).
The npm package dependencies declared in the constants section, are defined in the dependencies section of the
package.json file. Function dependencies include Actions on Google, Firebase Functions, and Cloud Datastore (gist).
The three intent handlers correspond to the three intents in the Dialogflow console: Azure Facts Intent, Welcome Intent, and Fallback Intent. Each handler responds in a very similar fashion. The handlers all return a
SimpleResponse for audio-only and display-enabled devices. Optionally, a
BasicCard is returned for display-enabled devices (gist).
The Welcome Intent handler handles explicit invocations of our Action. The Fallback Intent handler handles both help requests, as well as cases when Dialogflow cannot match any of the user’s input. Lastly, the Azure Facts Intent handler handles implicit invocations of our Action, returning a fact to the user from Cloud Datastore, based on the user’s requested fact.
The next section of the function contains two helper functions. The primary function is the
buildFactResponse function. This is the function that queries Google Cloud Datastore for the fact. The second function, the
selectRandomFact, handles the fact value of ‘random’, by selecting a random fact value to query Datastore. (gist).
Async/Await, Promises, and Callbacks
Let’s look closer at the relationship and asynchronous nature of the Azure Facts Intent intent handler and
buildFactResponse function. Below, note the
async function on line 1 in the intent and the
await function on line 3, which is part of the
buildFactResponse function call. This is typically how we see async/await applied when calling an asynchronous function, such as
await function allows the intent’s execution to wait for the
buildFactResponse function’s Promise to be resolved, before attempting to use the resolved value to construct the response.
buildFactResponse function returns a Promise, as seen on line 28. The Promise’s payload contains the results of the successful callback from the Datastore API’s
runQuery function. The
runQuery function returns a callback, which is then resolved and returned by the Promise, as seen on line 40 (gist).
The payload returned by Google Datastore, through the resolved Promise to the intent handler, will resemble the example response, shown below. Note the
title key/value pairs in the
textPayload section of the response payload. These are what are used to format the
BasicCard responses (gist).
Cloud Function Deployment
To deploy the Cloud Function to GCP, use the
gcloud CLI with the beta version of the functions deploy command. According to Google,
gcloud is a part of the Google Cloud SDK. You must download and install the SDK on your system and initialize it before you can use
gcloud. You should ensure that your function is deployed to the same region as your Google Storage Bucket. Currently, Cloud Functions are only available in four regions. I have included a shell script,
deploy-cloud-function.sh, to make this step easier. (gist).
The creation or update of the Cloud Function can take up to two minutes. Note the
.gcloudignore file referenced in the verbose output below. This file is created the first time you deploy a new function. Using the the
.gcloudignore file, you can limit the deployed files to just the function (
index.js) and the
package.json file. There is no need to deploy any other files to GCP.
If you recall, the URL endpoint of the Cloud Function is required in the Dialogflow Fulfillment tab. The URL can be retrieved from the deployment output (shown above), or from the Cloud Functions Console on GCP (shown below). The Cloud Function is now deployed and will be called by the Action when a user invokes the Action.
Simulation Testing and Debugging
With our Action and all its dependencies deployed and configured, we can test the Action using the Simulation console on Actions on Google. According to Google, the Action Simulation console allows us to manually test our Action by simulating a variety of Google-enabled hardware devices and their settings. You can also access debug information such as the request and response that your fulfillment receives and sends.
Below, in the Action Simulation console, we see the successful display of the initial Azure Tech Facts containing the expected Simple Response, Basic Card, and Suggestions, triggered by a user’s explicit invocation of the Action.
The simulated response indicates that the Google Cloud Function was called, and it responded successfully. It also indicates that the Google Cloud Function was able to successfully retrieve the correct image from Google Cloud Storage.
Below, we see the successful response to the user’s implicit invocation of the Action, in which they are seeking a fact about Azure’s Cognitive Services. The simulated response indicates that the Google Cloud Function was called, and it responded successfully. It also indicates that the Google Cloud Function was able to successfully retrieve the correct Entity from Google Cloud Datastore, as well as the correct image from Google Cloud Storage.
If we had issues with the testing, the Action Simulation console also contains tabs containing the request and response objects sent to and from the Cloud Function, the audio response, a debug console, and any errors.
Logging and Analytics
In addition to the Simulation console’s ability to debug issues with our service, we also have Google Stackdriver Logging. The Stackdriver logs, which are viewed from the GCP management console, contain the complete requests and responses, to and from the Cloud Function, from the Google Assistant Action. The Stackdriver logs will also contain any logs entries you have explicitly placed in the Cloud Function.
We also have the ability to view basic Analytics about our Action from within the Dialogflow Analytics console. Analytics displays metrics, such as the number of sessions, the number of queries, the number of times each Intent was triggered, how often users exited the Action from an intent, and Sessions flows, shown below.
In simple Action such as this one, the Session flow is not very beneficial. However, in more complex Actions, with multiple Intents and a variety potential user interactions, being able to visualize Session flows becomes essential to understanding the user’s conversational path through the Action.
In this post, we have seen how to use the Actions on Google development platform and the latest version of the Dialogflow API to build Google Actions. Google Actions rather effortlessly integrate with the breath Google Cloud Platform’s many serverless offerings, including Google Cloud Functions, Cloud Datastore, and Cloud Storage.
We have seen how Google is quickly maturing their serverless functions, to compete with AWS and Azure, with the recently announced support of LTS version 8 of Node.js and Python, to create an Actions for Google Assistant.
Impact of Serverless
As an Engineer, I have spent endless days, late nights, and thankless weekends, building, deploying and managing servers, virtual machines, container clusters, persistent storage, and database servers. I think what is most compelling about platforms like Actions on Google, but even more so, serverless technologies on GCP, is that I spend the majority of my time architecting and developing compelling software. I don’t spend time managing infrastructure, worrying about capacity, configuring networking and security, and doing DevOps.
¹Azure is a trademark of Microsoft
All opinions expressed in this post are my own and not necessarily the views of my current or past employers, their clients, or Google and Microsoft.
In the following post, we will use the new version 2 of the Alexa Skills Kit, AWS Lambda, Amazon DynamoDB, Amazon S3, and the latest LTS version Node.js, to create an Alexa Custom Skill. According to Amazon, a custom skill allows you to define the requests the skill can handle (intents) and the words users say to invoke those requests (utterances).
Alexa Skills Kit
According to Amazon, the Alexa Skills Kit (ASK) is a collection of self-service APIs, tools, documentation, and code samples that makes it possible to add skills to Alexa. The Alexa Skills Kit supports building different types of skills. Currently, Alexa skill types include Custom, Smart Home, Video, Flash Briefing, and List Skills. Each skill type makes use of a different Alexa Skill API.
AWS Serverless Platform
To create a custom skill for Alexa, you currently have the choice of using an AWS Lambda function or a web service. The AWS Lambda is part of an ecosystem of Cloud services and Developer tools, Amazon refers to as the AWS Serverless Platform. The platform’s services are designed to support the development and hosting of highly-performant, enterprise-grade serverless applications.
In April 2018, AWS Lamba announced support for the Node.js 8.10 runtime, which is the current Long Term Support (LTS) version of Node.js. Node 8, also known as Project Carbon, was the first LTS version of Node to support async/await with Promises. Async/await is the new way of handling asynchronous operations in Node.js. We will make use of async/await and Promises with the custom skill.
To demonstrate Alexa Custom Skills we will build an informational skill that responds to the user with interesting facts about Azure¹, Microsoft’s Cloud computing platform (Alexa talking about Azure, ironic, I know). This is not an official Microsoft skill; it is only used for this demonstration and has not been published.
All open-source code for this post can be found on GitHub. Code samples in this post are displayed as Gists, which may not display correctly on some mobile and social media browsers. Links to gists are also provided.
Important, this post and the associated source code were updated from v1.0 to v2.0 on 13 August 2018. You should clone the GitHub project again, to correspond with this revised post, if you originally cloned the project before 14 August 2018. Code changes were significant.
This objective of the fact-based skill will be to demonstrate the following.
- Build, deploy, and test an Alexa Custom Skill using AWS Lambda and Node.js;
- Use DynamoDB to store and retrieve Alexa voice responses;
- Maintain a count of user’s questions in DynamoDB using atomic counters;
- Use Amazon S3 to store and retrieve images, used in Display Cards;
- Log Alexa Skill activities using Amazon CloudWatch Logs;
Steps to Build
Building the Azure fact skill will involve the following steps.
- Design the Alexa skill’s voice interaction model;
- Design the skill’s Display Cards for Alexa-enabled products, to enhance the voice experience;
- Create the skill’s DynamoDB table and import the responses the skill will return;
- Create an S3 bucket and upload the images used for the Display Cards;
- Write the Alexa Skill, which involves mapping the user’s spoken input to the intents your cloud-based service can handle;
- Write the Lambda function, which involves responding to the user’s utterances, by building and returning appropriate voice and display card responses, from DynamoDB and S3;
- Extend the default ASK-generated AWS IAM Role, to allow the Lambda to update DynamoDB;
- Deploy the skill;
- Test the skill;
Let’s explore each step in detail.
Voice Interaction Model
First, we must design the fact skill’s voice interaction model. We need to consider the way we want the user to interact with the skill. What is the user’s conversational journey? How do they invoke your skill? How will the user provide intent?
This skill will require two intent slot values, the fact the user is interested in (i.e. ‘global infrastructure’) and the user’s first name (i.e. ‘Susan’). We will train the skill to allow Alexa to query the user for each slot value, but also allow the user to provide either or both values in the initial intent invocation. We will also allow the user to request a random fact.
Shown below in the Alexa Skills Kit Development Console Test tab are three examples of interactions the skill is trained to understand and handle:
- The first example on the left invokes the skill with no intent (‘Alexa, load Azure Tech Facts). The user is led through a series of three questions to obtain the full intent.
- The center example is similar, however, the initial invocation contains a partial intent (‘Alexa, ask Azure Tech Facts for a fact about certifications’). Alexa must still ask for the user’s name.
- Lastly, the example on the right is a so-called ‘one-shot’ invocation (‘Alexa, ask Azure Tech Facts about Azure’s platforms for Gary’). The user’s invocation of the skill contains a complete intent, allowing Alexa to respond immediately with a fact about Azure platforms.
In all cases, our skill has the ability to continue to provide the user with additional facts if they chose, or they may cancel at any time.
We also need to design how Alexa will respond. What is the persona will Alexa assume through her words, phrases, and use of Speech Synthesis Markup Language (SSML).
User Interaction Previews
Here are a few examples of interactions with the final Alexa skill using an iPhone 8 and the Alexa App. They are intended to show the rich conversational capabilities of custom skills more so the than the display, which is pretty poor on the Alexa App as compared to the Echo Show or even Echo Spot.
Example 1: Indirect Invocation
The first example shows a basic interaction with our Alexa skill. It demonstrates an indirect invocation, a user utterance without initial intent. It also illustrates several variations of user utterances (YouTube).
Example 2: Direct Invocation
The second example of an interaction our skill demonstrates a direct invocation, in which the initial user utterance contains intent. It also demonstrates the user following up with additional requests (YouTube).
Example 3: Direct Invocation, Help, Problem
Lastly, another direct invocation demonstrates the use of the Help Intent. You also see an example of when Alexa does not understand the user’s utterance. The user is able to repeat their request, more clearly (YouTube).
Visual Interaction Model
Many Alexa-enabled devices are capable of both vocal and visual responses. Designing for a multimodal user experience is important. The instructional skill will provide vocal responses, as well as Display Cards optimized for the Amazon Echo Show. The skill contains a basic design for the Display Card shown during the initial invocation, where there is no intent uttered by the user.
The fact skill also contains a Display Card, designed to present the final Alexa response to the user’s intent. The content of the vocal and visual response is returned from DynamoDB via the Lambda function. The random Azure icons, available from Microsoft, are hosted in an S3 bucket. Each fact response is unique, as well as the icon associated with the fact.
The Display Cards will also work on other Alexa-enabled screen-based products. Shown below is the same card on an iPhone 8 using the Amazon Alexa app. This is the same app shown in the videos, above.
Next, we create the DynamoDB table used to store the facts the Alexa skill will respond with when invoked by the user. DynamoDB is Amazon’s non-relational database that delivers reliable performance at any scale. DynamoDB consists of three basic components: tables, items, and attributes.
There are numerous ways to create a DynamoDB table. For simplicity, I created the
AzureFacts DynamoDB table using the AWS CLI (gist). You could also choose CloudFormation, or create the table using any of nine or more programming languages with an AWS SDK.
AzureFacts table’s schema has four key/value pair attributes per item: Fact, Response, Image, and Hits. The Fact attribute, a string, contains the name of the fact the user is seeking. The Fact attribute also serves as the table’s unique partition key. The Response attribute, a string, contains the conversational response Alexa will return. The Image attribute, a string, contains the name of the image in the S3 bucket displayed by Alexa. Lastly, the Hits attribute, a number, stores the number of user requests for a particular fact.
Importing Table Items
After the DynamoDB table is created, the pre-defined facts are imported into the empty table using AWS CLI (gist). The JSON-formatted data file, AzureFacts.json, is included with the source code on GitHub.
The resulting table should appear as follows in the AWS Management Console.
Note the imported items shown below. The Hits counts reflect the number of times each fact has been requested.
Shown below is a detailed view of a single item that was imported into the DynamoDB table.
Amazon S3 Image Bucket
Next, we create the Amazon S3 bucket, which will house the images, actually Azure icons as PNGs, returned by Alexa with each fact. Again, I used the AWS CLI for simplicity (gist).
The images can be uploaded manually to the bucket through a web browser, or programmatically, using the AWS CLI or SDKs. You will need to ensure the images are made public so they can be displayed by Alexa.
Next, we create the actual Alexa custom skill. I have used version 2 of the Alexa Skills Kit (ASK) Software Development Kit (SDK) for Node.js and the new ASK Command Line Interface (ASK CLI) to create the skill. The ASK SDK v2 for Node.js was recently released in April 2018. If you have previously written Alexa skills using version 1 of the Node.js SDK, the creation of a new project and the format of the Lambda Node.js code is somewhat different. I strongly suggest reviewing the example skills provided by Amazon on GitHub.
With version 1, I would have likely used the Alexa Skills Kit Development Console to develop and deploy the skill, and separate IDE, like JetBrains WebStorm, to write the Lambda. The JSON-format skill would live in the Alexa Skills Kit Development Console, and my Lambda in source control. I would have used AWS Serverless Application Model (AWS SAM) or Claudia.js to handle the deployment of Lambda functions.
With version 2 of ASK, you can easily create and manage the Alexa skill’s JSON-formatted code, as well as the Lambda, all from the command-line and a single IDE or text editor. All components that comprise the skill can be kept together in source control. I now only use the Alexa Skills Kit Development Console to preview my deployed skill and for testing. I am not going to go into detail about creating a new project using the ASK CLI, I suggest reviewing Amazon’s instructional guides.
Below, I have initiated a new AWS profile for the Alexa skill using the
ask init command.
There are three main parts to the new skill project created by the ASK CLI: the skill’s manifest (skill.json), model(s) (en-US.json), and API endpoint, the Lambda (index.js). The skill’s manifest, skill.json, contains information (metadata) about the skill. This is the same information you find in the Distribution tab of the Alexa Skills Kit Development Console. The manifest includes publishing information, example phrases to invoke the skill, the skill’s category, distribution locales, privacy information, and the location of the skill’s API endpoint, the Lambda. An end-user would most commonly see this information in Amazon Alexa app when adding skills to their Alexa-enabled devices.
Next, the skill’s model, en-US.json, is located the models sub-directory. This file defines the skill’s custom interaction model, it contains the skill’s interaction model written in JSON, which includes the invocation name, intents, standard and custom slots, sample utterances, slot values, and synonyms of those values. This is the same information you would find in the Build tab of the Alexa Skills Kit Development Console. Amazon has an excellent guide to creating your custom skill’s interaction model.
Intents and Intent Slots
The skill’s custom interaction model contains the
AzureFactsIntent intent, along with the boilerplate Cancel, Help and Stop intents. The
AzureFactsIntent intent contains two intent slots,
myName intent slot is a standard AMAZON.US_FIRST_NAME slot type. According to Amazon, this slot type understands thousands of popular first names commonly used by speakers in the United States. Shown below, I have included a short list of sample utterances in the intent model, which helps improve voice recognition for Alexa (gist).
Custom Slot Types and Entities
myQuestion intent slot is a custom slot type. According to Amazon, a custom slot type defines a list of representative values for the slot. The
myQuestion slot contains all the available facts the custom instructional skill understands and can retrieve from DynamoDB. Like
myName, the user can provide the fact intent in various ways (gist).
This slot also contains synonyms for each fact. Collectively, the slot value, it’s synonyms, and the optional ID are collectively referred to as an Entity. According to Amazon, entity resolution improves the way Alexa matches possible slot values in a user’s utterance with the slots defined in the skill’s interaction model.
An example of an entity in the
myQuestion custom slot type is ‘competition’. A user can ask Alexa to tell them about Azure’s competition. The slot value ‘competition’ returns a fact about Azure’s leading competitors, as reported on the G2 Crowd website’s Microsoft Azure Alternatives & Competitors page. However, the user might also substitute the words ‘competitor’ or ‘competitors’ for ‘competition’. Using synonyms, if the user utters any of these three words in their intent, they will receive the same response from Alexa (gist).
Initializing a skill with the ASK CLI also creates the default API endpoint, a Lambda (index.js). The serverless Lambda function is written in Node.js 8.10. As mentioned in the Introduction, AWS recently announced support for the Node.js 8.10 runtime, in April. This is the first LTS version of Node to support async/await with Promises. Node’s async/await is the new way of handling asynchronous operations in Node.js.
The layout of the custom skill’s Lambda’s code closely follows the custom Alexa Fact Skill example. I suggest closely reviewing this example. The Lambda has four main sections: constants, setup code, intent handlers, and helper functions.
In addition to the boilerplate Help, Stop, Error, and Session intent handlers, there are the
LaunchRequestHandler and the
AzureFactsIntent handlers. According to Amazon, a
LaunchRequestHandler fires when the Lambda receives a
LaunchRequest from Alexa, in which the user invokes the skill with the invocation name, but does not provide any command mapping to an intent.
AzureFactsIntent aligns with the custom intent we defined in the skill’s model (
en-US.json), of the same name. This handler handles an
IntentRequest from Alexa. This handler and the
buildFactResponse function the handler calls are what translate a request for a fact from the user into a request to DynamoDB for a response.
AzureFactsIntent handler checks the
IntentRequest for both the
myQuestion slot values. If the values are unfulfilled, the
AzureFactsIntent handler delegates responsibility back to Alexa, using a Dialog delegate directive (
addDelegateDirective). Alexa then requests the slot values from the user in a conversational interaction. Alexa then calls the
AzureFactsIntent handler again (gist).
Once both slot values are received by the
AzureFactsIntent handler, it calls the
buildFactResponse function, passing in the
myQuestion slot values. In turn, the
buildFactResponse function calls
AWS.DynamoDB.DocumentClient.update. The DynamoDB update returns a callback. In turn, the
What is unique about the DynamoDB
update call in this case, is it actually performs two functions. First, it implements an Atomic Counter. According to AWS, an atomic counter is a numeric DynamoDB attribute that is incremented, unconditionally, without interfering with other write requests. The update increments the numeric Hits attribute of the requested fact by exactly one. Secondly, the update returns the DynamoDB item. We can increment the count and get the response in a single call.
buildFactResponse function’s Promise returns the DynamoDB item, a JSON object, from the callback. An example of a JSON response payload is shown below. (gist).
AzureFactsIntent handler uses the async/await methods to perform the call to the
buildFactResponse function. Note line 7 of the
AzureFactsIntent handler below, where the
async method is applied directly to the handler. Note line 33 where the
await method is used with the call to the
buildFactResponse function (gist).
AzureFactsIntent handler awaits the Promise from the
buildFactResponse function. In an async function, you can await for any Promise or catch its rejection cause. If the update callback and the ensuing Promise were both returned successfully, the
AzureFactsIntent handler returns both a vocal and visual response to Alexa.
AWS IAM Role
By default, an AWS IAM Role was created by ASK when the project was initialized, the
ask-lambda-alexa-skill-azure-facts role. This role is automatically associated with the AWS Managed Policy,
AWSLambdaBasicExecutionRole. This managed policy simply allows the skill’s Lambda function to create Amazon CloudWatch Events (gist).
For the skill’s Lambda to read and write to DynamoDB, we must extend the default role’s permissions, by adding an additional policy. I have created a new
AzureFacts_Alexa_Skill IAM Policy, which allows the associated role to get and update items from the
AzureFacts DynamoDB table, and that is it. The role only has access to two of forty possible DynamoDB actions, and only for the
AzureFacts table, and nothing else. Following the principle of Least Privilege is a cornerstone of AWS Security (gist).
Below, we see the new IAM Policy in the AWS Management Console.
Below, we see the policy being applied to the skill’s IAM Role, along with the original AWS managed policy.
Deploying the Skill
Version 2 of the ASK CLI makes deploying the Alexa custom skill very easy. Using the ASK CLI’s
deploy command, we can validate and deploy the skill (manifest), model, and Lambda, all at once, as shown below. This makes DevOps automation of skill deployments with tools like Jenkins or AWS CodeDeploy straight-forward.
You can verify the skill has been deployed, from the Alexa Skills Kit Development Console. You should observe the skill’s model (intents, slots, entities, and endpoints) in the Build tab. You should observe the skill’s publishing details in the Distribution tab. Note deploying the skill does not submit the skill to Amazon’s for review and publishing, you must still submit the skill separately.
From the AWS Lambda Management Console, you should observe the skill’s Lambda was deployed. You should observe only the skill can trigger the Lambda. Lastly, you should observe that the correct IAM Role was applied to the Lambda, giving the Lambda access to Amazon CloudWatch Logs and Amazon DynamoDB.
Testing the Skill
The ASK CLI comes with the simulate command. According to Amazon, the
simulate command simulates an invocation of the skill with text-based input. Again, the ASK CLI makes DevOps test automation with tools like Jenkins or AWS CodeDeploy pretty easy (gist).
Below, are the results of simulating the invocation. The
simulate command returns the expected verbal response, including any SSML, and the visual responses (the Display Card). You could easily write an automation script to run a battery of these tests on every code commit, and prior to deployment.
I also like to manually test my skills from the Alexa Skills Kit Development Console Test tab. You may invoke the skill using your voice or by typing the skill invocation.
The Alexa Skills Kit Development Console Test tab both shows and speaks Alexa’s response. The console also displays the request and response body (JSON input/output), as well as the Display Card for an Echo Show and Echo Spot.
Lastly, the Alexa Skills Kit Development Console Test tab displays the Device Log. The log captures Alexa Directives and Events. I have found the Device Log to be very helpful in troubleshooting problems with deployed skills.
By default the custom skill outputs events to CloudWatch Logs. I have added the DynamoDB callback payload, as well as the slot values of myName and myQuestion to the logs, for each successful Alexa response. CloudWatch logs, like the Device Logs above, are very helpful in troubleshooting problems with deployed skills.
In this brief post, we have seen how to use the new ASK SDK/CLI version 2, services from the AWS Serverless Platform, and the LTS version of Node.js, to create an Alexa Custom Skill. Using the AWS Serverless Platform, we could easily extend the example to take advantage of additional serverless services, such as the use of Amazon SNS and SQS for notifications and messaging and Amazon Kinesis for analytics.
In a future post, we will extend this example, adding the capability to securely add and update our DynamoDB table’s items. We will use addition AWS services, including Amazon Cognito to authorize access to our API. We will also use AWS API Gateway to integrate with our Lambdas, producing a completely serverless API.
¹Azure is a trademark of Microsoft
All opinions expressed in this post are my own and not necessarily the views of my current or past employers or their clients.