Posts Tagged Voice
Building Serverless Actions for Google Assistant with Google Cloud Functions, Cloud Datastore, and Cloud Storage
In this post, we will create an Action for Google Assistant using the ‘Actions on Google’ development platform, Google Cloud Platform’s serverless Cloud Functions, Cloud Datastore, and Cloud Storage, and the current LTS version of Node.js. According to Google, Actions are pieces of software, designed to extend the functionality of the Google Assistant, Google’s virtual personal assistant, across a multitude of Google-enabled devices, including smartphones, cars, televisions, headphones, watches, and smart-speakers.
Here is a brief YouTube video preview of the final Action for Google Assistant, we will explore in this post, running on an Apple iPhone 8.
If you want to compare the development of an Action for Google Assistant with those of AWS and Azure, in addition to this post, please read my previous two posts in this series, Building and Integrating LUIS-enabled Chatbots with Slack, using Azure Bot Service, Bot Builder SDK, and Cosmos DB and Building Asynchronous, Serverless Alexa Skills with AWS Lambda, DynamoDB, S3, and Node.js. All three of the article’s demonstrations are written in Node.js, all three leverage their cloud platform’s machine learning-based Natural Language Understanding services, and all three take advantage of NoSQL database and storage services available on their respective cloud platforms.
The final architecture of our Action for Google Assistant will look as follows.
Here is a brief overview of the key technologies we will incorporate into our architecture.
Actions on Google
According to Google, Actions on Google is the platform for developers to extend the Google Assistant. Similar to Amazon’s Alexa Skills Kit Development Console for developing Alexa Skills, Actions on Google is a web-based platform that provides a streamlined user-experience to create, manage, and deploy Actions. We will use the Actions on Google platform to develop our Action in this post.
According to Google, Dialogflow is an enterprise-grade Natural language understanding (NLU) platform that makes it easy for developers to design and integrate conversational user interfaces into mobile apps, web applications, devices, and bots. Dialogflow is powered by Google’s machine learning for Natural Language Processing (NLP). Dialogflow was initially known as API.AI prior being renamed by Google in late 2017.
We will use the Dialogflow web-based development platform and version 2 of the Dialogflow API, which became GA in April 2018, to build our Action for Google Assistant’s rich, natural-language conversational interface.
Google Cloud Functions
Google Cloud Functions are the event-driven serverless compute platform, part of the Google Cloud Platform (GCP). Google Cloud Functions are comparable to Amazon’s AWS Lambda and Azure Functions. Cloud Functions is a relatively new service from Google, released in beta in March 2017, and only recently becoming GA at Cloud Next ’18 (July 2018). The main features of Cloud Functions include automatic scaling, high availability, fault tolerance, no servers to provision, manage, patch or update, and a payment model based on the function’s execution time. The programmatic logic behind our Action for Google Assistant will be handled by a Cloud Function.
We will write our Action’s Google Cloud Function using the Node.js 8 runtime. Google just released the ability to write Google Cloud Functions in Node 8.11.1 and Python 3.7.0, at Cloud Next ’18 (July 2018). It is still considered beta functionality. Previously, you had to write your functions in Node version 6 (currently, 6.14.0).
Node 8, also known as Project Carbon, was the first Long Term Support (LTS) version of Node to support async/await with Promises. Async/await is the new way of handling asynchronous operations in Node.js. We will make use of async/await and Promises within our Action’s Cloud Function.
Google Cloud Datastore
Google Cloud Datastore is a highly-scalable NoSQL database. Cloud Datastore is similar in features and capabilities to Azure Cosmos DB and Amazon DynamoDB. Datastore automatically handles sharding and replication and offers features like a RESTful interface, ACID transactions, SQL-like queries, and indexes. We will use Datastore to persist the information returned to the user from our Action for Google Assistant.
Google Cloud Storage
The last technology, Google Cloud Storage is secure and durable object storage, nearly identical to Amazon Simple Storage Service (Amazon S3) and Azure Blob Storage. We will store publicly accessible images in a Google Cloud Storage bucket, which will be displayed in Google Assistant Basic Card responses.
To demonstrate Actions for Google Assistant, we will build an informational Action that responds to the user with interesting facts about Azure, Microsoft’s Cloud computing platform (Google talking about Azure, ironic). Note this is not intended to be an official Microsoft bot and is only used for demonstration purposes.
All open-sourced code for this post can be found on GitHub. Note code samples in this post are displayed as Gists, which may not display correctly on some mobile and social media browsers. Links to gists are also provided.
This post will focus on the development and integration of an Action with Google Cloud Platform’s serverless and asynchronous Cloud Functions, Cloud Datastore, and Cloud Storage. The post is not intended to be a general how-to on developing and publishing Actions for Google Assistant, or how to specifically use services on the Google Cloud Platform.
Building the Action will involve the following steps.
- Design the Action’s conversation model;
- Import the Azure Facts Entities into Cloud Datastore on GCP;
- Create and upload the images to Cloud Storage on GCP;
- Create the new Actions on Google project using the Actions on Google console;
- Develop the Action’s Intent using the Dialogflow console;
- Bulk import the Action’s Entities using the Dialogflow console;
- Configure the Dialogflow Actions on Google Integration;
- Develop and deploy the Cloud Function to GCP;
- Test the Action using Actions on Google Simulator;
Let’s explore each step in more detail.
The conversational model design of the Azure Tech Facts Action for Google Assistant is similar to the Azure Tech Facts Alexa Custom Skill, detailed in my previous post. We will have the option to invoke the Action in two ways, without initial intent (Explicit Invocation) and with intent (Implicit Invocation), as shown below. On the left, we see an example of an explicit invocation of the Action. Google Assistant then queries the user for more information. On the right, an implicit invocation of the Action includes the intent, being the Azure fact they want to learn about. Google Assistant responds directly, both verbally and visually with the fact.
Each fact returned by Google Assistant will include a Simple Response, Basic Card and Suggestions response types for devices with a display, as shown below. The user may continue to ask for additional facts or choose to cancel the Action at any time.
Lastly, as part of the conversational model, we will include the option of asking for a random fact, as well as asking for help. Examples of both are shown below. Again, Google Assistant responds to the user, vocally and, optionally, visually, for display-enabled devices.
GCP Account and Project
The following steps assume you have an existing GCP account and you have created a project on GCP to house the Cloud Function, Cloud Storage Bucket, and Cloud Datastore Entities. The post also assumes that you have the Google Cloud SDK installed on your development machine, and have authenticated your identity from the command line (gist).
Google Cloud Storage
First, the images, actually Azure icons available from Microsoft, displayed in the responses shown above, are uploaded to a Google Storage Bucket. To handle these tasks, we will use the
gsutil CLI to create, upload, and manage the images. The gsutil CLI tool, like
gcloud, is part of the Google Cloud SDK. The
gsutil mb (make bucket) command creates the bucket,
gsutil cp (copy files and objects) command is used to copy the images to the new bucket, and finally, the
gsutil iam (get, set, or change bucket and/or object IAM permissions) command is used to make the images public. I have included a shell script,
bucket-uploader.sh, to make this process easier. (gist).
From the Storage Console on GCP, you should observe the images all have publicly accessible URLs. This will allow the Cloud Function to access the bucket, and retrieve and display the images. There are more secure ways to store and display the images from the function. However, this is the simplest method since we are not concerned about making the images public.
We will need the URL of the new Storage bucket, later, when we develop to our Action’s Cloud Function. The bucket URL can be obtained from the Storage Console on GCP, as shown below in the Link URL.
Google Cloud Datastore
In Cloud Datastore, the category data object is referred to as a Kind, similar to a Table in a relational database. In Datastore, we will have an ‘AzureFact’ Kind of data. In Datastore, a single object is referred to as an Entity, similar to a Row in a relational database. Each one of our entities represents a unique reference value from our Azure Facts Intent’s facts entities, such as ‘competition’ and ‘certifications’. Individual data is known as a Property in Datastore, similar to a Column in a relational database. We will have four Properties for each entity: name, response, title, and image. Lastly, a Key in Datastore is similar to a Primary Key in a relational database. The Key we will use for our entities is the unique reference value string from our Azure Facts Intent’s facts entities, such as ‘competition’ or ‘certifications’. The Key value is stored within the entity’s name Property.
There are a number of ways to create the Datastore entities for our Action, including manually from the Datastore console on GCP. However, to automate the process, we will use a script, written in Node.js and using the Google Cloud Datastore Node.js Client, to create the entities. We will use the Client API’s Datastore Class upsert method, which will create or update an entire collection of entities with one call and returns a callback. The script ,
upsert-entities.js, is included in source control and can be run with the following command. Below is a snippet of the script, which shows the structure of the entities (gist).
upsert command completes successfully, you should observe a collection of ‘AzureFact’ Type Datastore Entities in the Datastore console on GCP.
Below, we see the structure of a single Datastore Entity, the ‘certifications’ Entity, containing the fact response, title, and name of the image, which is stored in our Google Storage bucket.
New ‘Actions on Google’ Project
With the images uploaded and the database entries created, we can start building our Actions for Google Assistant. Using the Actions on Google web console, we first create a new Actions project.
The Directory Information tab is where we define metadata about the project. This information determines how it will look in the Actions directory and is required to publish your project. The Actions directory is where users discover published Actions on the web and mobile devices.
Actions and Intents
Our project will contain a series of related Actions. According to Google, an Action is ‘an interaction you build for the Assistant that supports a specific intent and has a corresponding fulfillment that processes the intent.’ To build our Actions, we first want to create our Intents. To do so, we will want to switch from the Actions on Google console to the Dialogflow console. Actions on Google provides a link for switching to Dialogflow in the Actions tab.
We will build our Action’s Intents in Dialogflow. The term Intent, used by Dialogflow, is standard terminology across other voice-assistant platforms, such as Amazon’s Alexa and Microsoft’s Azure Bot Service and LUIS. In Dialogflow, will be building Intents—the Azure Facts Intent, Welcome Intent, and the Fallback Intent.
Below, we see the Azure Facts Intent. The Azure Facts Intent is the main Intent, responsible for handling our user’s requests for facts about Azure. The Intent includes a fair number, but certainly not an exhaustive list, of training phrases. These represent all the possible ways a user might express intent when invoking the Action. According to Google, the greater the number of natural language examples in the Training Phrases section of Intents, the better the classification accuracy.
Each of the highlighted words in the training phrases maps to the facts parameter, which maps to a collection of @facts Entities. Entities represent a list of intents the Action is trained to understand. According to Google, there are three types of entities: system (defined by Dialogflow), developer (defined by a developer), and user (built for each individual end-user in every request) entities. We will be creating developer type entities for our Action’s Intent.
An entity contains Synonyms. Multiple synonyms may be mapped to a single reference value. The reference value is the value passed to the Cloud Function by the Action. For example, take the reference value of ‘competition’. A user might ask Google about Azure’s competition. However, the user might also substitute the words ‘competitor’ or ‘competitors’ for ‘competition’. Using synonyms, if the user utters any of these three words in their intent, they will receive the same response.
Although our Azure Facts Action is a simple example, typical Actions might contain hundreds of entities or more, each with several synonyms. Dialogflow provides the option of copy and pasting bulk entities, in either JSON or CSV format. The project’s source code includes both JSON or CSV formats, which may be input in this manner.
Not every possible fact, which will have a response, returned by Google Assistant, needs an entity defined. For example, we created a ‘compliance’ Cloud Datastore Entity. The Action understands the term ‘compliance’ and will return a response to the user if they ask about Azure compliance. However, ‘compliance’ is not defined as an Intent Entity, since we have chosen not to define any synonyms for the term ‘compliance’.
In order to allow this, you must enable Allow Automated Expansion. According to Google, this option allows an Agent to recognize values that have not been explicitly listed in the entity. Google describes Agents as NLU (Natural Language Understanding) modules.
Actions on Google Integration
Another configuration item in Dialogflow that needs to be completed is the Dialogflow’s Actions on Google integration. This will integrate the Azure Tech Facts Action with Google Assistant. Google provides more than a dozen different integrations, as shown below.
The Dialogflow’s Actions on Google integration configuration is simple, just choose the Azure Facts Intent as our Action’s Implicit Invocation intent, in addition to the default Welcome Intent, which is our Action’s Explicit Invocation intent. According to Google, integration allows our Action to reach users on every device where the Google Assistant is available.
When an intent is received from the user, it is fulfilled by the Action. In the Dialogflow Fulfillment console, we see the Action has two fulfillment options, a Webhook or a Cloud Function, which can be edited inline. A Webhook allows us to pass information from a matched intent into a web service and get a result back from the service. In our example, our Action’s Webhook will call our Cloud Function, using the Cloud Function’s URL endpoint. We first need to create our function in order to get the endpoint, which we will do next.
Google Cloud Functions
Our Cloud Function, called by our Action, is written in Node.js 8. As stated earlier, Node 8 LTS was the first LTS version to support async/await with Promises. Async/await is the new way of handling asynchronous operations in Node.js, replacing callbacks.
Our function, index.js, is divided into four sections: constants, intent handlers, helper functions, and the function’s entry point. The Cloud Function attempts to follow many of the coding practices from Google’s code examples on Github.
The section defines the global constants used within the function. Note the constant for the URL of our new Cloud Storage bucket, on line 30 below,
IMAGE_BUCKET, references an environment variable,
process.env.IMAGE_BUCKET. This value is set in the
.env.yaml file. All environment variables in the
.env.yaml file will be set during the Cloud Function’s deployment, explained later in this post. Environment variables were recently released, and are still considered beta functionality (gist).
The npm package dependencies declared in the constants section, are defined in the dependencies section of the
package.json file. Function dependencies include Actions on Google, Firebase Functions, and Cloud Datastore (gist).
The three intent handlers correspond to the three intents in the Dialogflow console: Azure Facts Intent, Welcome Intent, and Fallback Intent. Each handler responds in a very similar fashion. The handlers all return a
SimpleResponse for audio-only and display-enabled devices. Optionally, a
BasicCard is returned for display-enabled devices (gist).
The Welcome Intent handler handles explicit invocations of our Action. The Fallback Intent handler handles both help requests, as well as cases when Dialogflow cannot match any of the user’s input. Lastly, the Azure Facts Intent handler handles implicit invocations of our Action, returning a fact to the user from Cloud Datastore, based on the user’s requested fact.
The next section of the function contains two helper functions. The primary function is the
buildFactResponse function. This is the function that queries Google Cloud Datastore for the fact. The second function, the
selectRandomFact, handles the fact value of ‘random’, by selecting a random fact value to query Datastore. (gist).
Async/Await, Promises, and Callbacks
Let’s look closer at the relationship and asynchronous nature of the Azure Facts Intent intent handler and
buildFactResponse function. Below, note the
async function on line 1 in the intent and the
await function on line 3, which is part of the
buildFactResponse function call. This is typically how we see async/await applied when calling an asynchronous function, such as
await function allows the intent’s execution to wait for the
buildFactResponse function’s Promise to be resolved, before attempting to use the resolved value to construct the response.
buildFactResponse function returns a Promise, as seen on line 28. The Promise’s payload contains the results of the successful callback from the Datastore API’s
runQuery function. The
runQuery function returns a callback, which is then resolved and returned by the Promise, as seen on line 40 (gist).
The payload returned by Google Datastore, through the resolved Promise to the intent handler, will resemble the example response, shown below. Note the
title key/value pairs in the
textPayload section of the response payload. These are what are used to format the
BasicCard responses (gist).
Cloud Function Deployment
To deploy the Cloud Function to GCP, use the
gcloud CLI with the beta version of the functions deploy command. According to Google,
gcloud is a part of the Google Cloud SDK. You must download and install the SDK on your system and initialize it before you can use
gcloud. You should ensure that your function is deployed to the same region as your Google Storage Bucket. Currently, Cloud Functions are only available in four regions. I have included a shell script,
deploy-cloud-function.sh, to make this step easier. (gist).
The creation or update of the Cloud Function can take up to two minutes. Note the
.gcloudignore file referenced in the verbose output below. This file is created the first time you deploy a new function. Using the the
.gcloudignore file, you can limit the deployed files to just the function (
index.js) and the
package.json file. There is no need to deploy any other files to GCP.
If you recall, the URL endpoint of the Cloud Function is required in the Dialogflow Fulfillment tab. The URL can be retrieved from the deployment output (shown above), or from the Cloud Functions Console on GCP (shown below). The Cloud Function is now deployed and will be called by the Action when a user invokes the Action.
Simulation Testing and Debugging
With our Action and all its dependencies deployed and configured, we can test the Action using the Simulation console on Actions on Google. According to Google, the Action Simulation console allows us to manually test our Action by simulating a variety of Google-enabled hardware devices and their settings. You can also access debug information such as the request and response that your fulfillment receives and sends.
Below, in the Action Simulation console, we see the successful display of the initial Azure Tech Facts containing the expected Simple Response, Basic Card, and Suggestions, triggered by a user’s explicit invocation of the Action.
The simulated response indicates that the Google Cloud Function was called, and it responded successfully. It also indicates that the Google Cloud Function was able to successfully retrieve the correct image from Google Cloud Storage.
Below, we see the successful response to the user’s implicit invocation of the Action, in which they are seeking a fact about Azure’s Cognitive Services. The simulated response indicates that the Google Cloud Function was called, and it responded successfully. It also indicates that the Google Cloud Function was able to successfully retrieve the correct Entity from Google Cloud Datastore, as well as the correct image from Google Cloud Storage.
If we had issues with the testing, the Action Simulation console also contains tabs containing the request and response objects sent to and from the Cloud Function, the audio response, a debug console, and any errors.
Logging and Analytics
In addition to the Simulation console’s ability to debug issues with our service, we also have Google Stackdriver Logging. The Stackdriver logs, which are viewed from the GCP management console, contain the complete requests and responses, to and from the Cloud Function, from the Google Assistant Action. The Stackdriver logs will also contain any logs entries you have explicitly placed in the Cloud Function.
We also have the ability to view basic Analytics about our Action from within the Dialogflow Analytics console. Analytics displays metrics, such as the number of sessions, the number of queries, the number of times each Intent was triggered, how often users exited the Action from an intent, and Sessions flows, shown below.
In simple Action such as this one, the Session flow is not very beneficial. However, in more complex Actions, with multiple Intents and a variety potential user interactions, being able to visualize Session flows becomes essential to understanding the user’s conversational path through the Action.
In this post, we have seen how to use the Actions on Google development platform and the latest version of the Dialogflow API to build Google Actions. Google Actions rather effortlessly integrate with the breath Google Cloud Platform’s many serverless offerings, including Google Cloud Functions, Cloud Datastore, and Cloud Storage.
We have seen how Google is quickly maturing their serverless functions, to compete with AWS and Azure, with the recently announced support of LTS version 8 of Node.js and Python, to create an Actions for Google Assistant.
Impact of Serverless
As an Engineer, I have spent endless days, late nights, and thankless weekends, building, deploying and managing servers, virtual machines, container clusters, persistent storage, and database servers. I think what is most compelling about platforms like Actions on Google, but even more so, serverless technologies on GCP, is that I spend the majority of my time architecting and developing compelling software. I don’t spend time managing infrastructure, worrying about capacity, configuring networking and security, and doing DevOps.
¹Azure is a trademark of Microsoft
All opinions expressed in this post are my own and not necessarily the views of my current or past employers, their clients, or Google and Microsoft.