In the following post, we will use the new version 2 of the Alexa Skills Kit, AWS Lambda, Amazon DynamoDB, Amazon S3, and the latest LTS version Node.js, to create an Alexa Custom Skill. According to Amazon, a custom skill allows you to define the requests the skill can handle (intents) and the words users say to invoke those requests (utterances).
If you want to compare the development of an Alexa Custom Skill with those of Google and Azure, in addition to this post, please read my previous two posts in this series, Building and Integrating LUIS-enabled Chatbots with Slack, using Azure Bot Service, Bot Builder SDK, and Cosmos DB and Building Serverless Actions for Google Assistant with Google Cloud Functions, Cloud Datastore, and Cloud Storage. All three of the article’s demonstrations are written in Node.js, all three leverage their cloud platform’s machine learning-based Natural Language Understanding services, and all three take advantage of NoSQL database and storage services available on their respective cloud platforms.
The final high-level architecture of our Alexa Custom Skill will look as follows.
Here is a brief overview of the key AWS technologies we will incorporate into our Skill’s architecture.
Alexa Skills Kit
According to Amazon, the Alexa Skills Kit (ASK) is a collection of self-service APIs, tools, documentation, and code samples that makes it possible to add skills to Alexa. The Alexa Skills Kit supports building different types of skills. Currently, Alexa skill types include Custom, Smart Home, Video, Flash Briefing, and List Skills. Each skill type makes use of a different Alexa Skill API.
AWS Serverless Platform
To create a custom skill for Alexa, you currently have the choice of using an AWS Lambda function or a web service. The AWS Lambda is part of an ecosystem of Cloud services and Developer tools, Amazon refers to as the AWS Serverless Platform. The platform’s services are designed to support the development and hosting of highly-performant, enterprise-grade serverless applications.
In April 2018, AWS Lamba announced support for the Node.js 8.10 runtime, which is the current Long Term Support (LTS) version of Node.js. Node 8, also known as Project Carbon, was the first LTS version of Node to support async/await with Promises. Async/await is the new way of handling asynchronous operations in Node.js. We will make use of async/await and Promises with the custom skill.
To demonstrate Alexa Custom Skills we will build an informational skill that responds to the user with interesting facts about Azure¹, Microsoft’s Cloud computing platform (Alexa talking about Azure, ironic, I know). This is not an official Microsoft skill; it is only used for this demonstration and has not been published.
All open-source code for this post can be found on GitHub. Code samples in this post are displayed as GitHub Gists, which may not display correctly on some mobile and social media browsers. Links to gists are also provided.
Important, this post and the associated source code were updated from v1.0 to v2.0 on 13 August 2018. You should clone the GitHub project again, to correspond with this revised post, if you originally cloned the project before 14 August 2018. Code changes were significant.
This objective of the fact-based skill will be to demonstrate the following.
- Build, deploy, and test an Alexa Custom Skill using AWS Lambda and Node.js;
- Use DynamoDB to store and retrieve Alexa voice responses;
- Maintain a count of user’s questions in DynamoDB using atomic counters;
- Use Amazon S3 to store and retrieve images, used in Display Cards;
- Log Alexa Skill activities using Amazon CloudWatch Logs;
Steps to Build
Building the Azure fact skill will involve the following steps.
- Design the Alexa skill’s voice interaction model;
- Design the skill’s Display Cards for Alexa-enabled products, to enhance the voice experience;
- Create the skill’s DynamoDB table and import the responses the skill will return;
- Create an S3 bucket and upload the images used for the Display Cards;
- Write the Alexa Skill, which involves mapping the user’s spoken input to the intents your cloud-based service can handle;
- Write the Lambda function, which involves responding to the user’s utterances, by building and returning appropriate voice and display card responses, from DynamoDB and S3;
- Extend the default ASK-generated AWS IAM Role, to allow the Lambda to update DynamoDB;
- Deploy the skill;
- Test the skill;
Let’s explore each step in detail.
Voice Interaction Model
First, we must design the fact skill’s voice interaction model. We need to consider the way we want the user to interact with the skill. What is the user’s conversational journey? How do they invoke your skill? How will the user provide intent?
This skill will require two intent slot values, the fact the user is interested in (i.e. ‘global infrastructure’) and the user’s first name (i.e. ‘Susan’). We will train the skill to allow Alexa to query the user for each slot value, but also allow the user to provide either or both values in the initial intent invocation. We will also allow the user to request a random fact.
Shown below in the Alexa Skills Kit Development Console Test tab are three examples of interactions the skill is trained to understand and handle:
- The first example on the left invokes the skill with no intent (‘Alexa, load Azure Tech Facts). The user is led through a series of three questions to obtain the full intent.
- The center example is similar, however, the initial invocation contains a partial intent (‘Alexa, ask Azure Tech Facts for a fact about certifications’). Alexa must still ask for the user’s name.
- Lastly, the example on the right is a so-called ‘one-shot’ invocation (‘Alexa, ask Azure Tech Facts about Azure’s platforms for Gary’). The user’s invocation of the skill contains a complete intent, allowing Alexa to respond immediately with a fact about Azure platforms.
In all cases, our skill has the ability to continue to provide the user with additional facts if they chose, or they may cancel at any time.
We also need to design how Alexa will respond. What is the persona will Alexa assume through her words, phrases, and use of Speech Synthesis Markup Language (SSML).
User Interaction Previews
Here are a few examples of interactions with the final Alexa skill using an iPhone 8 and the Alexa App. They are intended to show the rich conversational capabilities of custom skills more so the than the display, which is pretty poor on the Alexa App as compared to the Echo Show or even Echo Spot.
Example 1: Indirect Invocation
The first example shows a basic interaction with our Alexa skill. It demonstrates an indirect invocation, a user utterance without initial intent. It also illustrates several variations of user utterances (YouTube).
Example 2: Direct Invocation
The second example of an interaction our skill demonstrates a direct invocation, in which the initial user utterance contains intent. It also demonstrates the user following up with additional requests (YouTube).
Example 3: Direct Invocation, Help, Problem
Lastly, another direct invocation demonstrates the use of the Help Intent. You also see an example of when Alexa does not understand the user’s utterance. The user is able to repeat their request, more clearly (YouTube).
Visual Interaction Model
Many Alexa-enabled devices are capable of both vocal and visual responses. Designing for a multimodal user experience is important. The instructional skill will provide vocal responses, as well as Display Cards optimized for the Amazon Echo Show. The skill contains a basic design for the Display Card shown during the initial invocation, where there is no intent uttered by the user.
The fact skill also contains a Display Card, designed to present the final Alexa response to the user’s intent. The content of the vocal and visual response is returned from DynamoDB via the Lambda function. The random Azure icons, available from Microsoft, are hosted in an S3 bucket. Each fact response is unique, as well as the icon associated with the fact.
The Display Cards will also work on other Alexa-enabled screen-based products. Shown below is the same card on an iPhone 8 using the Amazon Alexa app. This is the same app shown in the videos, above.
Next, we create the DynamoDB table used to store the facts the Alexa skill will respond with when invoked by the user. DynamoDB is Amazon’s non-relational database that delivers reliable performance at any scale. DynamoDB consists of three basic components: tables, items, and attributes.
There are numerous ways to create a DynamoDB table. For simplicity, I created the
AzureFacts DynamoDB table using the AWS CLI (gist). You could also choose CloudFormation, or create the table using any of nine or more programming languages with an AWS SDK.
AzureFacts table’s schema has four key/value pair attributes per item: Fact, Response, Image, and Hits. The Fact attribute, a string, contains the name of the fact the user is seeking. The Fact attribute also serves as the table’s unique partition key. The Response attribute, a string, contains the conversational response Alexa will return. The Image attribute, a string, contains the name of the image in the S3 bucket displayed by Alexa. Lastly, the Hits attribute, a number, stores the number of user requests for a particular fact.
Importing Table Items
After the DynamoDB table is created, the pre-defined facts are imported into the empty table using AWS CLI (gist). The JSON-formatted data file, AzureFacts.json, is included with the source code on GitHub.
The resulting table should appear as follows in the AWS Management Console.
Note the imported items shown below. The Hits counts reflect the number of times each fact has been requested.
Shown below is a detailed view of a single item that was imported into the DynamoDB table.
Amazon S3 Image Bucket
Next, we create the Amazon S3 bucket, which will house the images, actually Azure icons as PNGs, returned by Alexa with each fact. Again, I used the AWS CLI for simplicity (gist).
The images can be uploaded manually to the bucket through a web browser, or programmatically, using the AWS CLI or SDKs. You will need to ensure the images are made public so they can be displayed by Alexa.
Next, we create the actual Alexa custom skill. I have used version 2 of the Alexa Skills Kit (ASK) Software Development Kit (SDK) for Node.js and the new ASK Command Line Interface (ASK CLI) to create the skill. The ASK SDK v2 for Node.js was recently released in April 2018. If you have previously written Alexa skills using version 1 of the Node.js SDK, the creation of a new project and the format of the Lambda Node.js code is somewhat different. I strongly suggest reviewing the example skills provided by Amazon on GitHub.
With version 1, I would have likely used the Alexa Skills Kit Development Console to develop and deploy the skill, and separate IDE, like JetBrains WebStorm, to write the Lambda. The JSON-format skill would live in the Alexa Skills Kit Development Console, and my Lambda in source control. I would have used AWS Serverless Application Model (AWS SAM) or Claudia.js to handle the deployment of Lambda functions.
With version 2 of ASK, you can easily create and manage the Alexa skill’s JSON-formatted code, as well as the Lambda, all from the command-line and a single IDE or text editor. All components that comprise the skill can be kept together in source control. I now only use the Alexa Skills Kit Development Console to preview my deployed skill and for testing. I am not going to go into detail about creating a new project using the ASK CLI, I suggest reviewing Amazon’s instructional guides.
Below, I have initiated a new AWS profile for the Alexa skill using the
ask init command.
There are three main parts to the new skill project created by the ASK CLI: the skill’s manifest (skill.json), model(s) (en-US.json), and API endpoint, the Lambda (index.js). The skill’s manifest, skill.json, contains information (metadata) about the skill. This is the same information you find in the Distribution tab of the Alexa Skills Kit Development Console. The manifest includes publishing information, example phrases to invoke the skill, the skill’s category, distribution locales, privacy information, and the location of the skill’s API endpoint, the Lambda. An end-user would most commonly see this information in Amazon Alexa app when adding skills to their Alexa-enabled devices.
Next, the skill’s model, en-US.json, is located the models sub-directory. This file defines the skill’s custom interaction model, it contains the skill’s interaction model written in JSON, which includes the invocation name, intents, standard and custom slots, sample utterances, slot values, and synonyms of those values. This is the same information you would find in the Build tab of the Alexa Skills Kit Development Console. Amazon has an excellent guide to creating your custom skill’s interaction model.
Intents and Intent Slots
The skill’s custom interaction model contains the
AzureFactsIntent intent, along with the boilerplate Cancel, Help and Stop intents. The
AzureFactsIntent intent contains two intent slots,
myName intent slot is a standard AMAZON.US_FIRST_NAME slot type. According to Amazon, this slot type understands thousands of popular first names commonly used by speakers in the United States. Shown below, I have included a short list of sample utterances in the intent model, which helps improve voice recognition for Alexa (gist).
Custom Slot Types and Entities
myQuestion intent slot is a custom slot type. According to Amazon, a custom slot type defines a list of representative values for the slot. The
myQuestion slot contains all the available facts the custom instructional skill understands and can retrieve from DynamoDB. Like
myName, the user can provide the fact intent in various ways (gist).
This slot also contains synonyms for each fact. Collectively, the slot value, it’s synonyms, and the optional ID are collectively referred to as an Entity. According to Amazon, entity resolution improves the way Alexa matches possible slot values in a user’s utterance with the slots defined in the skill’s interaction model.
An example of an entity in the
myQuestion custom slot type is ‘competition’. A user can ask Alexa to tell them about Azure’s competition. The slot value ‘competition’ returns a fact about Azure’s leading competitors, as reported on the G2 Crowd website’s Microsoft Azure Alternatives & Competitors page. However, the user might also substitute the words ‘competitor’ or ‘competitors’ for ‘competition’. Using synonyms, if the user utters any of these three words in their intent, they will receive the same response from Alexa (gist).
Initializing a skill with the ASK CLI also creates the default API endpoint, a Lambda (index.js). The serverless Lambda function is written in Node.js 8.10. As mentioned in the Introduction, AWS recently announced support for the Node.js 8.10 runtime, in April. This is the first LTS version of Node to support async/await with Promises. Node’s async/await is the new way of handling asynchronous operations in Node.js.
The layout of the custom skill’s Lambda’s code closely follows the custom Alexa Fact Skill example. I suggest closely reviewing this example. The Lambda has four main sections: constants, setup code, intent handlers, and helper functions.
In addition to the boilerplate Help, Stop, Error, and Session intent handlers, there are the
LaunchRequestHandler and the
AzureFactsIntent handlers. According to Amazon, a
LaunchRequestHandler fires when the Lambda receives a
LaunchRequest from Alexa, in which the user invokes the skill with the invocation name, but does not provide any command mapping to an intent.
AzureFactsIntent aligns with the custom intent we defined in the skill’s model (
en-US.json), of the same name. This handler handles an
IntentRequest from Alexa. This handler and the
buildFactResponse function the handler calls are what translate a request for a fact from the user into a request to DynamoDB for a response.
AzureFactsIntent handler checks the
IntentRequest for both the
myQuestion slot values. If the values are unfulfilled, the
AzureFactsIntent handler delegates responsibility back to Alexa, using a Dialog delegate directive (
addDelegateDirective). Alexa then requests the slot values from the user in a conversational interaction. Alexa then calls the
AzureFactsIntent handler again (gist).
Once both slot values are received by the
AzureFactsIntent handler, it calls the
buildFactResponse function, passing in the
myQuestion slot values. In turn, the
buildFactResponse function calls
AWS.DynamoDB.DocumentClient.update. The DynamoDB update returns a callback. In turn, the
What is unique about the DynamoDB
update call in this case, is it actually performs two functions. First, it implements an Atomic Counter. According to AWS, an atomic counter is a numeric DynamoDB attribute that is incremented, unconditionally, without interfering with other write requests. The update increments the numeric Hits attribute of the requested fact by exactly one. Secondly, the update returns the DynamoDB item. We can increment the count and get the response in a single call.
buildFactResponse function’s Promise returns the DynamoDB item, a JSON object, from the callback. An example of a JSON response payload is shown below. (gist).
AzureFactsIntent handler uses the async/await methods to perform the call to the
buildFactResponse function. Note line 7 of the
AzureFactsIntent handler below, where the
async method is applied directly to the handler. Note line 33 where the
await method is used with the call to the
buildFactResponse function (gist).
AzureFactsIntent handler awaits the Promise from the
buildFactResponse function. In an async function, you can await for any Promise or catch its rejection cause. If the update callback and the ensuing Promise were both returned successfully, the
AzureFactsIntent handler returns both a vocal and visual response to Alexa.
AWS IAM Role
By default, an AWS IAM Role was created by ASK when the project was initialized, the
ask-lambda-alexa-skill-azure-facts role. This role is automatically associated with the AWS Managed Policy,
AWSLambdaBasicExecutionRole. This managed policy simply allows the skill’s Lambda function to create Amazon CloudWatch Events (gist).
For the skill’s Lambda to read and write to DynamoDB, we must extend the default role’s permissions, by adding an additional policy. I have created a new
AzureFacts_Alexa_Skill IAM Policy, which allows the associated role to get and update items from the
AzureFacts DynamoDB table, and that is it. The role only has access to two of forty possible DynamoDB actions, and only for the
AzureFacts table, and nothing else. Following the principle of Least Privilege is a cornerstone of AWS Security (gist).
Below, we see the new IAM Policy in the AWS Management Console.
Below, we see the policy being applied to the skill’s IAM Role, along with the original AWS managed policy.
Deploying the Skill
Version 2 of the ASK CLI makes deploying the Alexa custom skill very easy. Using the ASK CLI’s
deploy command, we can validate and deploy the skill (manifest), model, and Lambda, all at once, as shown below. This makes DevOps automation of skill deployments with tools like Jenkins or AWS CodeDeploy straight-forward.
You can verify the skill has been deployed, from the Alexa Skills Kit Development Console. You should observe the skill’s model (intents, slots, entities, and endpoints) in the Build tab. You should observe the skill’s publishing details in the Distribution tab. Note deploying the skill does not submit the skill to Amazon’s for review and publishing, you must still submit the skill separately.
From the AWS Lambda Management Console, you should observe the skill’s Lambda was deployed. You should observe only the skill can trigger the Lambda. Lastly, you should observe that the correct IAM Role was applied to the Lambda, giving the Lambda access to Amazon CloudWatch Logs and Amazon DynamoDB.
Testing the Skill
The ASK CLI comes with the simulate command. According to Amazon, the
simulate command simulates an invocation of the skill with text-based input. Again, the ASK CLI makes DevOps test automation with tools like Jenkins or AWS CodeDeploy pretty easy (gist).
Below, are the results of simulating the invocation. The
simulate command returns the expected verbal response, including any SSML, and the visual responses (the Display Card). You could easily write an automation script to run a battery of these tests on every code commit, and prior to deployment.
I also like to manually test my skills from the Alexa Skills Kit Development Console Test tab. You may invoke the skill using your voice or by typing the skill invocation.
The Alexa Skills Kit Development Console Test tab both shows and speaks Alexa’s response. The console also displays the request and response body (JSON input/output), as well as the Display Card for an Echo Show and Echo Spot.
Lastly, the Alexa Skills Kit Development Console Test tab displays the Device Log. The log captures Alexa Directives and Events. I have found the Device Log to be very helpful in troubleshooting problems with deployed skills.
By default the custom skill outputs events to CloudWatch Logs. I have added the DynamoDB callback payload, as well as the slot values of myName and myQuestion to the logs, for each successful Alexa response. CloudWatch logs, like the Device Logs above, are very helpful in troubleshooting problems with deployed skills.
In this brief post, we have seen how to use the new ASK SDK/CLI version 2, services from the AWS Serverless Platform, and the LTS version of Node.js, to create an Alexa Custom Skill. Using the AWS Serverless Platform, we could easily extend the example to take advantage of additional serverless services, such as the use of Amazon SNS and SQS for notifications and messaging and Amazon Kinesis for analytics.
In a future post, we will extend this example, adding the capability to securely add and update our DynamoDB table’s items. We will use addition AWS services, including Amazon Cognito to authorize access to our API. We will also use AWS API Gateway to integrate with our Lambdas, producing a completely serverless API.
¹Azure is a trademark of Microsoft
All opinions expressed in this post are my own and not necessarily the views of my current or past employers or their clients.