Posts Tagged AWS

Architecting a Successful SaaS: Understanding Cloud-based SaaS Models

Orginally published on the AWS APN Blog.

You’re a startup with an idea for a revolutionary new software product. You quickly build a beta version and deploy it to the cloud. After a successful social-marketing campaign and concerted sales effort, dozens of customers subscribe to your SaaS-based product. You’re ecstatic…until you realize you never architected your product for this level of success. You were so busy coding, raising capital, marketing, and selling, you never planned how you would scale your Sass product. How you would ensure your customer’s security, as well as your own. How you would meet the product reliability, compliance, and performance you promised. And, how you would monitor and meter your customer’s usage, no matter how fast you or they grew.

I’ve often heard budding entrepreneurs jest, if only success was their biggest problem. Certainly, success won’t be their biggest problem. For many, the problems come afterward, when they disappoint their customers by failing to deliver the quality product they promised. Or worse, damaging their customer’s reputation (and their own) by losing or exposing sensitive data. As the old saying goes, ‘you never get a second chance to make a first impression.’ Customer trust is hard-earned and easily lost. Properly architecting a scalable and secure SaaS-based product is just as important as feature development and sales. No one wants to fail on Day 1—you worked too hard to get there.

Architecting a Successful SaaS

In this series of posts, Architecting a Successful SaaS, we will explore how to properly plan and architect a SaaS product offering, designed for hosting on the cloud. We will start by answering basic questions, like, what is SaaS, what are the alternatives to SaaS for software distribution, and what are the most common SaaS product models. We will then examine different high-level SaaS architectures, review tenant isolation strategies, and explore how SaaS vendors securely interact with their customer’s cloud accounts. Finally, we will discuss how SaaS providers can meet established best practices, like those from AWS SaaS Factory and the AWS Well-Architected Framework.

For this post, I have chosen many examples of cloud services from AWS and vendors from AWS Marketplace. However, the principals discussed may be applied to other leading cloud providers, SaaS products, and cloud-based software marketplaces. All information in this post is publicly available.

What is SaaS?

According to AWS Marketplace, ‘SaaS [Software as a Service] is a delivery model for software applications whereby the vendor hosts and operates the application over the Internet. Customers pay for using the software without owning the underlying infrastructure.’ Another definition from AWS, ‘SaaS is a licensing and delivery model whereby software is centrally managed and hosted by a provider and available to customers on a subscription basis.’

A SaaS product, like other forms of software, is produced by what is commonly referred to as an Independent Software Vendor (ISV). According to Wikipedia, an Independent Software Vendor ‘is an organization specializing in making and selling software, as opposed to hardware, designed for mass or niche markets. This is in contrast to in-house software, which is developed by the organization that will use it, or custom software, which is designed or adapted for a single, specific third party. Although ISV-provided software is consumed by end-users, it remains the property of the vendor.’

Although estimates vary greatly, according to The Software as a Service (SaaS) Global Market Report 2020, the global SaaS market was valued at about $134.44B in 2018 and is expected to grow to $220.21B at a compound annual growth rate (CAGR) of 13.1% through 2022. Statista predicts SaaS revenues will grow even faster, forecasting revenues of $266B by 2022, with continued strong positive growth to $346B by 2027.

Cloud-based Usage Models

Let’s start by reviewing the three most common ways that individuals, businesses, academic institutions, the public sector, and government consume services from cloud providers such as Amazon Web Services (AWS), Microsoft Azure, Google Cloud, and IBM Cloud (now includes Red Hat).

Indirect Consumer

Indirect consumers are customers who consume cloud-based SaaS products. Indirect users are often unlikely to know which cloud provider host’s the SaaS products to which they subscribe. Many SaaS products can import and export data, as well as integrate with other SaaS products. Many successful companies run their entire business in the cloud using a combination of SaaS products from multiple vendors.

SaaS-28

Examples

  • An advertising firm that uses Google G Suite for day-to-day communications and collaboration between its employees and clients.
  • A large automotive parts manufacturer that runs its business using the Workday cloud-based Enterprise Resource Management (ERP) suite.
  • A software security company that uses Zendesk for customer support. They also use the Slack integration for Zendesk to view, create, and take action on support tickets, using Slack channels.
  • A recruiting firm that uses Zoom Meetings & Chat to interview remote candidates. They also use the Zoom integration with Lever recruiting software, to schedule interviews.

Direct Consumer

Direct consumers are customers who use cloud-based Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) services to build and run their software; the DIY (do it yourself) model. The software deployed in the customer’s account may be created by the customer or purchased from a third-party software vendor and deployed within the customer’s cloud account. Direct users may purchase IaaS and PaaS services from multiple cloud providers.

SaaS-18

Examples

Hybrid Consumers

Hybrid consumers are customers who use a combination of IaaS, PaaS, and SaaS services. Customers often connect multiple IaaS, PaaS, and SaaS services as part of larger enterprise software application platforms.

SaaS-27

Examples

  • A payroll company that hosts its proprietary payroll software product, using IaaS products like Amazon EC2 and Elastic Load Balancing. In addition, they use an integrated SaaS-based fraud detection product, like Cequence Security CQ botDefense, to ensure the safety and security of payroll customers.
  • An online gaming company that operates its applications using the fully-managed container-based PaaS service, Amazon ECS. To promote their gaming products, they use a SaaS-based marketing product, like Mailchimp Marketing CRM.

Cloud-based Software

Most cloud-based software is sold in one of two ways, Customer-deployed or SaaS. Below, we see a breakdown by the method of product delivery on AWS Marketplace. All items in the chart, except SaaS, represent Customer-deployed products. Serverless applications are available elsewhere on AWS and are not represented in the AWS Marketplace statistics.

DeliveryTypes
AWS Marketplace: All Products – Delivery Methods (February 2020)

Customer-deployed

An ISV who sells customer-deployed software products to consumers of cloud-based IaaS and PaaS services. Products are installed by the customer, Systems Integrator (SI), or the ISV into the customer’s cloud account. Customer-deployed products are reminiscent of traditional ‘boxed’ software.

Customers typically pay a reoccurring hourly, monthly, or annual subscription fee for the software product, commonly referred to as pay-as-you-go (PAYG). The subscription fee paid to the vendor is in addition to the fees charged to the customer by the cloud service provider for the underlying compute resources on which the customer-deployed product runs in the customer’s cloud account.

Some customer-deployed products may also require a software license. Software licenses are often purchased separately through other channels. Applying a license you already own to a newly purchased product is commonly referred to as bring your own license (BYOL). BYOL is common in larger enterprise customers, who may have entered into an Enterprise License Agreement (ELA) with the ISV.

PlanTypesCD
AWS Marketplace: Customer-deployed Product Subscription Types (February 2020)

Customer-deployed cloud-based software products can take a variety of forms. The most common deliverables include some combination of virtual machines (VMs) such as Amazon Machine Images (AMIs), Docker images, Amazon SageMaker models, or Infrastructure as Code such as AWS CloudFormationHashiCorp Terraform, or Helm Charts. Customers usually pull these deliverables from a vendor’s AWS account or other public or private source code or binary repositories. Below, we see the breakdown of customer-deployed products, by the method of delivery, on AWS Marketplace.

DeliveryTypesCD2
AWS Marketplace: Customer-deployed Product Delivery Methods (February 2020)

Although historically, AMIs have been the predominant method of customer-deployed software delivery, newer technologies, such as Docker images, serverless, SageMaker models, and AWS Data Exchange datasets will continue to grow in this segment. The AWS Serverless Application Repository (SAR), currently contains over 500 serverless applications, not reflected in this chart. AWS appears to be moving toward making it easier to sell serverless software applications in AWS Marketplace, according to one recent post.

Customer-deployed cloud-based software products may require a connection between the installed product and the ISV for product support, license verification, product upgrades, or security notifications.

SaaS-17

Examples

SaaS

An ISV who sells SaaS software products to customers. The SaaS product is deployed, managed, and sold by the ISV and hosted by a cloud provider, such as AWS. A SaaS product may or may not interact with a customer’s cloud account. SaaS products are similar to customer-deployed products with respect to their subscription-based fee structure. Subscriptions may be based on a unit of measure, often a period of time. Subscriptions may also be based on the number of users, requests, hosts, or the volume of data.

AWS Marketplace: SaaS Products - Delivery Methods (February 2020)
AWS Marketplace: SaaS Products – Pricing Plans (February 2020)

A significant difference between SaaS products and customer-deployed products is the lack of direct customer costs from the underlying cloud provider. The underlying costs are bundled into the subscription fee for the SaaS product.

Similar to Customer-deployed products, SaaS products target both consumers and businesses. SaaS products span a wide variety of consumer, business, industry-specific, and technical categories. AWS Marketplace offers products from vendors covering eight major categories and over 70 sub-categories.

SaaSCats
AWS Marketplace: SaaS Product Categories (February 2020)

SaaS Product Variants

I regularly work with a wide variety of cloud-based software vendors. In my experience, most cloud-based SaaS products fit into one of four categories, based on the primary way a customer interacts with the SaaS product:

  • Stand-alone: A SaaS product that has no interaction with the customer’s cloud account;
  • Data Access: A SaaS product that connects to the customer’s cloud account to only obtain data;
  • Augmentation: A SaaS product that connects to the customer’s cloud account, interacting with and augmenting the customer’s software;
  • Discrete Service: A variation of augmentation, a SaaS product that provides a discrete service or function as opposed to a more complete software product;

Stand-alone

A stand-alone SaaS product has no interaction with a customer’s cloud account. Customers of stand-alone SaaS products interact with the product through an interface provided by the SaaS vendor. Many stand-alone SaaS products can import and export customer data, as well as integrate with other cloud-based SaaS products. Stand-alone SaaS products may target consumers, known as Business-to-Consumer (B2C SaaS). They may also target businesses, known as Business-to-Business (B2B SaaS).

SaaS-29

Examples

Data Access

A SaaS product that connects to a customer’s data sources in their cloud account or on-prem. These SaaS products often fall into the categories of Big Data and Data Analytics, Machine Learning and Artificial Intelligence, and IoT (Internet of Things). Products in these categories work with large quantities of data. Given the sheer quantity of data or real-time nature of the data, importing or manually inputting data directly into the SaaS product, through the SaaS vendor’s user interface is unrealistic. Often, these SaaS products will cache some portion of the customer’s data to reduce customer’s data transfer costs.

Similar to the previous stand-alone SaaS products, customers of these SaaS products interact with the product thought a user interface provided by the SaaS vendor.

SaaS-14

Examples

  • Zepl provides an enterprise data science analytics platform, which enables data exploration, analysis, and collaboration. Zepl sells its Zepl Science and Analytics Platform SaaS product on AWS Marketplace. The Zepl product provides integration to many types of customer data sources including Snowflake, Amazon S3, Amazon Redshift, Amazon Athena, Google BigQuery, Apache Cassandra (Amazon MCS), and other SQL databases.
  • Sisense provides an enterprise-grade, cloud-native business intelligence and analytics platform, powered by AI. Sisense offers its Sisense Business Intelligence SaaS product on AWS Marketplace. This product lets customers prepare and analyze disparate big datasets using Sisense’s Data Connectors. The wide array of connectors provide connectivity to dozens of different cloud-based and on-prem data sources.
  • Databricks provides a unified data analytics platform, designed for massive-scale data engineering and collaborative data science. Databricks offers its Databricks Unified Analytics Platform SaaS product on AWS Marketplace. Databricks allows customers to interact with data across many different data sources, data storage types, and data types, including batch and streaming.
  • DataRobot provides an enterprise AI platform, which enables global enterprises to collaboratively harness the power of AI. DataRobot sells its DataRobot Automated Machine Learning for AWS SaaS product on AWS Marketplace. Using connectors, like Skyvia’s OData connector, customers can connect their data sources to the DataRobot product.

Augmentation

A SaaS product that interacts with, or augments a customer’s application, which is managed by the customer in their own cloud account. These SaaS products often maintain secure, loosely-coupled, unidirectional or bidirectional connections between the vendor’s SaaS product and the customer’s account. Vendors on AWS often use services like Amazon EventBridgeAWS PrivateLink, VPC Peering, Amazon S3, Amazon Kinesis, Amazon SQS, and Amazon SNS to interact with customer’s accounts and exchange data. Often, these SaaS products fall within the categories of Security, Logging and Monitoring, and DevOps.

Customers of these types of SaaS products generally interact with their own software, as well as the SaaS product thought an interface provided by the SaaS vendor.

SaaS-24

Examples

  • CloudCheckr provides solutions that enable clients to optimize costs, security, and compliance on leading cloud providers. CloudCheckr sells its Cloud Management Platform SaaS product on AWS Marketplace. CloudCheckr uses an AWS IAM cross-account role and Amazon S3 to exchange data between the customer’s account and their SaaS product.
  • Splunk provides the leading software platform for real-time Operational Intelligence. Splunk sells its Splunk Cloud SaaS product on AWS Marketplace. Splunk Cloud enables rapid application troubleshooting, ensures security and compliance, and provides monitoring of business-critical services in real-time. According to their documentation, Splunk uses a combination of AWS S3, Amazon SQS, and Amazon SNS services to transfer AWS CloudTrail logs from the customer’s accounts to Splunk Cloud.

Discrete Service

Discrete SaaS products are a variation of SaaS augmentation products. Discrete SaaS products provide specific, distinct functionality to a customer’s software application. These products may be an API, data source, or machine learning model, which is often accessed completely through a vendor’s API. The products have a limited or no visual user interface. These SaaS products are sometimes referred to as a ‘Service as a Service’. Discrete SaaS products often fall into the categories of Artificial Intelligence and Machine Learning, Financial Services, Reference Data, and Authentication and Authorization.

SaaS-30

Examples

AWS Data Exchange

There is a new category of products on AWS Marketplace. Released in November 2019, AWS Data Exchange makes it easy to find, subscribe to, and use third-party data in the cloud. According to AWS, Data Exchange vendors can publish new data, as well as automatically publish revisions to existing data and notify subscribers. Once subscribed to a data product, customers can use the AWS Data Exchange API to load data into Amazon S3 and then analyze it with a wide variety of AWS analytics and machine learning services.

SaaS-25

Data Exchange seems to best fit the description of a customer-deployed product. However, given the nature of the vendor-subscriber relationship, where data may be regularly exchanged—revised and published by the vendor and pulled by the subscriber—I would consider Data Exchange a cloud-based hybrid product.

AWS Data Exchange products are available on AWS Marketplace. The list of qualified data providers is growing and includes Reuters, Foursquare, TransUnion, Pitney Bowes, IMDb, Epsilon, ADP, Dun & Bradstreet, and others. As illustrated below, data sets are available in the categories of financial services, public sector, healthcare, media, telecommunications, and more.

DataTypes
AWS Marketplace: Data Exchange Product Categories (February 2020)

Examples

Conclusion

In this first post, we’ve become familiar with the common ways in which customers consume cloud-based IaaS, PaaS, and SaaS products and services. We also explored the different ways in which ISVs sell their software products to customers. In future posts, we will examine different high-level SaaS architectures, review tenant isolation strategies, and explore how SaaS vendors securely interact with their customer’s cloud accounts. Finally, we will discuss how SaaS providers can meet best-practices, like those from AWS SaaS Factory and the AWS Well-Architected Framework.

References

Here are some great references to learn more about building and managing SaaS products on AWS.

This blog represents my own view points and not of my employer, Amazon Web Services.

, , , , , ,

Leave a comment

Using Amazon Polly Text-to-Speech Service to Expand your Blog’s Audience

 

Audio Version of Post

Writing a blog about your latest product’s features? Case studies on your latest’s customer integrations? Opinions and insights into your industry, or your customer’s industries? Well written blog posts not only inform, they can also showcase you or your organization’s experience and expertise. However, if you blog on a regular basis, you know there is a lot of content out there to steal a reader’s interest. Your audience’s interest in your post can be short-lived.

A great way to extend the reach of your posts and maintain interest longer is to produce an audio version. Audio versions can be included along with the written post. Audio versions may also be shared separately on popular services such as YouTube and SoundCloud. Many multi-tasking readers may prefer to listen to a post as opposed to reading. I often listen while commuting, working out, or running.

screen_shot_2020-03-26_at_9.55.47_pm

screen_shot_2020-03-26_at_10.24.53_pm

Amazon Polly’s text-to-speech (TTS) capabilities make it easy to convert a written post to a lifelike, professionally-sounding audio version. In this brief post, we will learn how to use Amazon Polly to convert blog posts to audio.

screen_shot_2020-03-25_at_3.58.19_pm

Preparing Post for Audio

Most posts only require minor modifications to optimize them for conversion to audio.

  • Adding an opening statement to your post, like ‘Audio version of the post’, followed by the post’s title, provides a good way to start the audio. For example, ‘Audio Introduction to: Getting Started with Data Analytics using Jupyter Notebooks, PySpark, and Docker’.
  • If your post includes code, you probably want to exclude those sections. If the post contains a large amount of code, you might consider only creating an audio introduction to the post.
  • If your post contains graphs or charts, which are referenced in the post, I suggest adding text, such as ‘See the Chart’, along with the caption of the graph or chart. For example, ‘See the Chart – AWS Marketplace: Product Delivery Methods (February 2020)’.
  • Create a simple URL for your post and add it to the audio, either at the beginning or end of the post. For example, ‘To read the full version of this post, including code samples, please go to tiny url dot com forward slash streaming warehouse’.

Custom Lexicons

Amazon Polly offers the ability to use custom lexicons, or vocabularies. According to AWS, you can modify the pronunciation of particular words, such as company names, acronyms, foreign words, and neologisms. If you write industry-specific or highly technical blogs, you will find creating a lexicon is probably necessary to ensure your accompanying audio sounds accurate. In my own technical posts, I most often use a custom lexicon file for acronyms and company names. While many acronyms are spelled out, others are not and have unique pronunciations. Likewise, many company names have a unique pronunciation.

Take for example the following acronyms, which I used in my last few posts: PaaS, BYOL, ELA, PAYG, IPv4, IPv6, IAM, ENI. Using the default lexicon of Amazon Polly, we end up with incorrect pronunciations for all of these acronyms.

Now listen to the pronunciation of the same acronyms, after we apply a custom lexicon.

Lexicons must conform to the Pronunciation Lexicon Specification (PLS) W3C recommendation. The lexicon files are in XML format. Below is a snippet of a sample lexicon files.

Amazon Polly Console

Amazon Polly supports synthesizing speech from either plain text or SSML input. From the Amazon Polly’s Management Console, copy and paste your post’s prepared text into the ‘Plain text’ tab.

Next, choose the Voice Engine. If you are using English, I suggest ‘Neural’. According to AWS, Amazon Polly has a Neural TTS (NTTS) system that can produce even higher quality voices than its standard voices. The NTTS system produces the most natural and human-like text-to-speech voices possible.

Choose your Language and Region. Then, select your Voice; I prefer ‘Joanna’. In my opinion, her voice has a natural, lifelike sound. If you prefer a male voice, ‘Matthew’ is quite natural sounding. Lastly, upload your lexicon file(s).

screen_shot_2020-03-25_at_4.00.52_pm

To start the process, choose ‘Synthesize to S3’. Indicate the S3 bucket you would like the mp3 format audio file, output into. You can also add a prefix to the mp3 files. For most average length posts, the text synthesis process takes less than one minute. To be notified, you can include an Amazon SNS topic ARN. Select ‘Synthesize’.

screen_shot_2020-03-25_at_4.04.35_pm

Polly creates a synthesis task.

screen_shot_2020-03-25_at_4.04.42_pm

The synthesis tasks may be viewed from the ‘S3 synthesis tasks’ tab.

screen_shot_2020-03-25_at_4.05.42_pm

Once the synthesis task is complete, the resulting mp3 audio file may be viewed and downloaded from the S3 Management Console. If you are using a Mac, QuickTime Player works great to review the audio file.

screen_shot_2020-03-25_at_4.06.22_pm

AWS CLI and SDK

Amazon Polly may also be used from the AWS CLI or using the AWS SDK. In the example below, we have replicated the same operations performed in the Console, this time using the AWS CLI. First, upload your lexicon file using the polly put-lexicon command. Then call the polly start-speech-synthesis-task command to create a synthesis task.

The output should look similar to the screengrab, below. The results will be identical to using the Console.

screen_shot_2020-03-25_at_10.56.05_pm

You can check the task’s results using the polly list-speech-synthesis-tasks command.

Conclusion

In this brief post, we saw a great use case for Amazon Polly, converting your written blog posts into audio. Creating audio versions of our blogs is a great way to extend the reach of the post to a potentially new audience and maintain your current audience’s interest a little longer. Amazon Polly has several other features and capabilities to explore.

This blog represents my own view points and not of my employer, Amazon Web Services.

, , , , ,

Leave a comment

Streaming Data Analytics with Amazon Kinesis Data Firehose, Redshift, and QuickSight

Databases are ideal for storing and organizing data that requires a high volume of transaction-oriented query processing while maintaining data integrity. In contrast, data warehouses are designed for performing data analytics on vast amounts of data from one or more disparate sources. In our fast-paced, hyper-connected world, those sources often take the form of continuous streams of web application logs, e-commerce transactions, social media feeds, online gaming activities, financial trading transactions, and IoT sensor readings. Streaming data must be analyzed in near real-time, while often first requiring cleansing, transformation, and enrichment.

In the following post, we will demonstrate the use of Amazon Kinesis Data Firehose, Amazon Redshift, and Amazon QuickSight to analyze streaming data. We will simulate time-series data, streaming from a set of IoT sensors to Kinesis Data Firehose. Kinesis Data Firehose will write the IoT data to an Amazon S3 Data Lake, where it will then be copied to Redshift in near real-time. In Amazon Redshift, we will enhance the streaming sensor data with data contained in the Redshift data warehouse, which has been gathered and denormalized into a star schema.

Streaming-Kinesis-Redshift

In Redshift, we can analyze the data, asking questions like, what is the min, max, mean, and median temperature over a given time period at each sensor location. Finally, we will use Amazon Quicksight to visualize the Redshift data using rich interactive charts and graphs, including displaying geospatial sensor data.

screen_shot_2020-03-04_at_9.27.33_pm

Featured Technologies

The following AWS services are discussed in this post.

Amazon Kinesis Data Firehose

According to Amazon, Amazon Kinesis Data Firehose can capture, transform, and load streaming data into data lakes, data stores, and analytics tools. Direct Kinesis Data Firehose integrations include Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and Splunk. Kinesis Data Firehose enables near real-time analytics with existing business intelligence (BI) tools and dashboards.

Amazon Redshift

According to Amazon, Amazon Redshift is the most popular and fastest cloud data warehouse. With Redshift, users can query petabytes of structured and semi-structured data across your data warehouse and data lake using standard SQL. Redshift allows users to query and export data to and from data lakes. Redshift can federate queries of live data from Redshift, as well as across one or more relational databases.

Amazon Redshift Spectrum

According to Amazon, Amazon Redshift Spectrum can efficiently query and retrieve structured and semistructured data from files in Amazon S3 without having to load the data into Amazon Redshift tables. Redshift Spectrum tables are created by defining the structure for data files and registering them as tables in an external data catalog. The external data catalog can be AWS Glue or an Apache Hive metastore. While Redshift Spectrum is an alternative to copying the data into Redshift for analysis, we will not be using Redshift Spectrum in this post.

Amazon QuickSight

According to Amazon, Amazon QuickSight is a fully managed business intelligence service that makes it easy to deliver insights to everyone in an organization. QuickSight lets users easily create and publish rich, interactive dashboards that include Amazon QuickSight ML Insights. Dashboards can then be accessed from any device and embedded into applications, portals, and websites.

What is a Data Warehouse?

According to Amazon, a data warehouse is a central repository of information that can be analyzed to make better-informed decisions. Data flows into a data warehouse from transactional systems, relational databases, and other sources, typically on a regular cadence. Business analysts, data scientists, and decision-makers access the data through business intelligence tools, SQL clients, and other analytics applications.

Demonstration

Source Code

All the source code for this post can be found on GitHub. Use the following command to git clone a local copy of the project.

CloudFormation

Use the two AWS CloudFormation templates, included in the project, to build two CloudFormation stacks. Please review the two templates and understand the costs of the resources before continuing.

The first CloudFormation template, redshift.yml, provisions a new Amazon VPC with associated network and security resources, a single-node Redshift cluster, and two S3 buckets.

The second CloudFormation template, kinesis-firehose.yml, provisions an Amazon Kinesis Data Firehose delivery stream, associated IAM Policy and Role, and an Amazon CloudWatch log group and two log streams.

Change the REDSHIFT_PASSWORD value to ensure your security. Optionally, change the REDSHIFT_USERNAME value. Make sure that the first stack completes successfully, before creating the second stack.

Review AWS Resources

To confirm all the AWS resources were created correctly, use the AWS Management Console.

Kinesis Data Firehose

In the Amazon Kinesis Dashboard, you should see the new Amazon Kinesis Data Firehose delivery stream, redshift-delivery-stream.

screen_shot_2020-03-02_at_3.58.12_pm

The Details tab of the new Amazon Kinesis Firehose delivery stream should look similar to the following. Note the IAM Role, FirehoseDeliveryRole, which was created and associated with the delivery stream by CloudFormation.

screen_shot_2020-03-02_at_6.30.25_pm

We are not performing any transformations of the incoming messages. Note the new S3 bucket that was created and associated with the stream by CloudFormation. The bucket name was randomly generated. This bucket is where the incoming messages will be written.

screen_shot_2020-03-02_at_6.31.13_pm

Note the buffer conditions of 1 MB and 60 seconds. Whenever the buffer of incoming messages is greater than 1 MB or the time exceeds 60 seconds, the messages are written in JSON format, using GZIP compression, to S3. These are the minimal buffer conditions, and as close to real-time streaming to Redshift as we can get.

screen_shot_2020-03-02_at_6.31.28_pm

Note the COPY command, which is used to copy the messages from S3 to the message table in Amazon Redshift. Kinesis uses the IAM Role, ClusterPermissionsRole, created by CloudFormation, for credentials. We are using a Manifest to copy the data to Redshift from S3. According to Amazon, a Manifest ensures that the COPY command loads all of the required files, and only the required files, for a data load. The Manifests are automatically generated and managed by the Kinesis Firehose delivery stream.

screen_shot_2020-03-02_at_6.31.43_pm

Redshift Cluster

In the Amazon Redshift Console, you should see a new single-node Redshift cluster consisting of one Redshift dc2.large Dense Compute node type.

screen_shot_2020-03-02_at_7.09.35_pm

Note the new VPC, Subnet, and VPC Security Group created by CloudFormation. Also, observe that the Redshift cluster is publically accessible from outside the new VPC.

screen_shot_2020-03-02_at_7.09.41_pm

Redshift Ingress Rules

The single-node Redshift cluster is assigned to an AWS Availability Zone in the US East (N. Virginia) us-east-1 AWS Region. The cluster is associated with a VPC Security Group. The Security Group contains three inbound rules, all for Redshift port 5439. The IP addresses associated with the three inbound rules provide access to the following: 1) a /27 CIDR block for Amazon QuickSight in us-east-1, a /27 CIDR block for Amazon Kinesis Firehose in us-east-1, and to you, a /32 CIDR block with your current IP address. If your IP address changes or you do not use the us-east-1 Region, you will need to change one or all of these IP addresses. The list of Kinesis Firehose IP addresses is here. The list of QuickSight IP addresses is here.

screen_shot_2020-03-02_at_7.09.59_pm

If you cannot connect to Redshift from your local SQL client, most often, your IP address has changed and is incorrect in the Security Group’s inbound rule.

Redshift SQL Client

You can choose to use the Redshift Query Editor to interact with Redshift or use a third-party SQL client for greater flexibility. To access the Redshift Query Editor, use the user credentials specified in the redshift.yml CloudFormation template.

screen_shot_2020-03-02_at_4.01.49_pm

There is a lot of useful functionality in the Redshift Console and within the Redshift Query Editor. However, a notable limitation of the Redshift Query Editor, in my opinion, is the inability to execute multiple SQL statements at the same time. Whereas, most SQL clients allow multiple SQL queries to be executed at the same time.

screen_shot_2020-03-04_at_8.48.39_am

I prefer to use JetBrains PyCharm IDE. PyCharm has out-of-the-box integration with RedShift. Using PyCharm, I can edit the project’s Python, SQL, AWS CLI shell, and CloudFormation code, all from within PyCharm.

screen_shot_2020-03-02_at_7.12.11_pm

If you use any of the common SQL clients, you will need to set-up a JDBC (Java Database Connectivity) or ODBC (Open Database Connectivity) connection to Redshift. The ODBC and JDBC connection strings can be found in the Redshift cluster’s Properties tab or in the Outputs tab from the CloudFormation stack, redshift-stack.

screen_shot_2020-03-04_at_9.07.14_am

You will also need the RedShift database username and password you included in the aws cloudformation create-stack AWS CLI command you executed previously. Below, we see PyCharm’s Project Data Sources window containing a new data source for the Redshift dev database.

screen_shot_2020-03-02_at_6.33.01_pm

Database Schema and Tables

When CloudFormation created the Redshift cluster, it also created a new database, dev. Using the Redshift Query Editor or your SQL client of choice, execute the following series of SQL commands to create a new database schema, sensor, and six tables in the sensor schema.

Star Schema

The tables represent denormalized data, taken from one or more relational database sources. The tables form a star schema.  The star schema is widely used to develop data warehouses. The star schema consists of one or more fact tables referencing any number of dimension tables. The location, manufacturer, sensor, and history tables are dimension tables. The sensors table is a fact table.

In the diagram below, the foreign key relationships are virtual, not physical. The diagram was created using PyCharm’s schema visualization tool. Note the schema’s star shape. The message table is where the streaming IoT data will eventually be written. The message table is related to the sensors fact table through the common guid field.

schema-light-2

Sample Data to S3

Next, copy the sample data, included in the project, to the S3 data bucket created with CloudFormation. Each CSV-formatted data file corresponds to one of the tables we previously created. Since the bucket name is semi-random, we can use the AWS CLI and jq to get the bucket name, then use it to perform the copy commands.

The output from the AWS CLI should look similar to the following.

screen_shot_2020-03-02_at_7.18.22_pm

Sample Data to RedShift

Whereas a relational database, such as Amazon RDS is designed for online transaction processing (OLTP), Amazon Redshift is designed for online analytic processing (OLAP) and business intelligence applications. To write data to Redshift we typically use the COPY command versus frequent, individual INSERT statements, as with OLTP, which would be prohibitively slow. According to Amazon, the Redshift COPY command leverages the Amazon Redshift massively parallel processing (MPP) architecture to read and load data in parallel from files on Amazon S3, from a DynamoDB table, or from text output from one or more remote hosts.

In the following series of SQL statements, replace the placeholder, your_bucket_name, in five places with your S3 data bucket name. The bucket name will start with the prefix, redshift-stack-databucket. The bucket name can be found in the Outputs tab of the CloudFormation stack, redshift-stack. Next, replace the placeholder, cluster_permissions_role_arn, with the ARN (Amazon Resource Name) of the ClusterPermissionsRole. The ARN is formatted as follows, arn:aws:iam::your-account-id:role/ClusterPermissionsRole. The ARN can be found in the Outputs tab of the CloudFormation stack, redshift-stack.

Using the Redshift Query Editor or your SQL client of choice, execute the SQL statements to copy the sample data from S3 to each of the corresponding tables in the Redshift dev database. The TRUNCATE commands guarantee there is no previous sample data present in the tables.

Database Views

Next, create four Redshift database Views. These views may be used to analyze the data in Redshift, and later, in Amazon QuickSight.

  1. sensor_msg_detail: Returns aggregated sensor details, using the sensors fact table and all five dimension tables in a SQL Join.
  2. sensor_msg_count: Returns the number of messages received by Redshift, for each sensor.
  3. sensor_avg_temp: Returns the average temperature from each sensor, based on all the messages received from each sensor.
  4. sensor_avg_temp_current: View is identical for the previous view but limited to the last 30 minutes.

Using the Redshift Query Editor or your SQL client of choice, execute the following series of SQL statements.

At this point, you should have a total of six tables and four views in the sensor schema of the dev database in Redshift.

Test the System

With all the necessary AWS resources and Redshift database objects created and sample data in the Redshift database, we can test the system. The included Python script, kinesis_put_test_msg.py, will generate a single test message and send it to Kinesis Data Firehose. If everything is working, the message should be delivered from Kinesis Data Firehose to S3, then copied to Redshift, and appear in the message table.

Install the required Python packages and then execute the Python script.

Run the following SQL query to confirm the record is in the message table of the dev database. It will take at least one minute for the message to appear in Redshift.

Once the message is confirmed to be present in the message table, delete the record by truncating the table.

Streaming Data

Assuming the test message worked, we can proceed with simulating the streaming IoT sensor data. The included Python script, kinesis_put_streaming_data.py, creates six concurrent threads, representing six temperature sensors. The simulated data uses an algorithm that follows an oscillating sine wave or sinusoid, representing rising and falling temperatures. In the script, I have configured each thread to start with an arbitrary offset to add some randomness to the simulated data.

screen_shot_2020-03-04_at_9.27.33_pm

The variables within the script can be adjusted to shorten or lengthen the time it takes to stream the simulated data. By default, each of the six threads creates 400 messages per sensor, in one-minute increments. Including the offset start of each proceeding thread, the total runtime of the script is about 7.5 hours to generate 2,400 simulated IoT sensor temperature readings and push to Kinesis Data Firehose. Make sure you can guarantee you will maintain a connection to the Internet for the entire runtime of the script. I normally run the script in the background, from a small EC2 instance.

To use the Python script, execute either of the two following commands. Using the first command will run the script in the foreground. Using the second command will run the script in the background.

If you used the foreground command, you should see messages being generated by each thread and sent to Kinesis Data Firehose. Each message contains the GUID of the sensor, a timestamp, and a temperature reading.

screen_shot_2020-03-03_at_7.01.37_am

The messages are sent to Kinesis Data Firehose, which in turn writes the messages to S3. The messages are written in JSON format using GZIP compression. Below, we see an example of the GZIP compressed JSON files in S3. The JSON files are partitioned by year, month, day, and hour.

screen_shot_2020-03-04_at_11.04.36_pm

Confirm Data Streaming to Redshift

From the Amazon Kinesis Firehose Console Metrics tab, you should see incoming messages flowing to S3 and on to Redshift.

screen_shot_2020-03-03_at_6.17.14_pm

Executing the following SQL query should show an increasing number of messages.

How Near Real-time?

Earlier, we saw how the Amazon Kinesis Data Firehose delivery stream was configured to buffer data at the rate of 1 MB or 60 seconds. Whenever the buffer of incoming messages is greater than 1 MB or the time exceeds 60 seconds, the messages are written to S3. Each record in the message table has two timestamps. The first timestamp, ts, is when the temperature reading was recorded. The second timestamp, created, is when the message was written to Redshift, using the COPY command. We can calculate the delta in seconds between the two timestamps using the following SQL query in Redshift.

Using the results of the Redshift query, we can visualize the results in Amazon QuickSight. In my own tests, we see that for 2,400 messages, over approximately 7.5 hours, the minimum delay was 1 second, and a maximum delay was 64 seconds. Hence, near real-time, in this case, is about one minute or less, with an average latency of roughly 30 seconds.

latency

Analyzing the Data with Redshift

I suggest waiting at least thirty minutes for a significant number of messages copied into Redshift. With the data streaming into Redshift, execute each of the database views we created earlier. You should see the streaming message data, joined to the existing static data in RedShift. As data continues to stream into Redshift, the views will display different results based on the current message table contents.

Here, we see the first ten results of the sensor_msg_detail view.

Next, we see the results of the sensor_avg_temp view.

Amazon QuickSight

In a recent post, Getting Started with Data Analysis on AWS using AWS Glue, Amazon Athena, and QuickSight: Part 2, I detailed getting started with Amazon QuickSight. In this post, I will assume you are familiar with QuickSight.

Amazon recently added a full set of aws quicksight APIs for interacting with QuickSight. Though, for this part of the demonstration, we will be working directly in the Amazon QuickSight Console, as opposed to the AWS CLI, AWS CDK, or CloudFormation.

Redshift Data Sets

To visualize the data from Amazon Redshift, we start by creating Data Sets in QuickSight. QuickSight supports a large number of data sources for creating data sets. We will use the Redshift data source. If you recall, we added an inbound rule for QuickSight, allowing us to connect to our Redshift cluster in us-east-1.

screen_shot_2020-03-02_at_10.21.35_pm

We will select the sensor schema, which is where the tables and views for this demonstration are located.

screen_shot_2020-03-02_at_10.21.49_pm

We can choose any of the tables or views in the Redshift dev database that we want to use for visualization.

screen_shot_2020-03-02_at_10.22.11_pm

Below, we see examples of two new data sets, shown in the QuickSight Data Prep Console. Note how QuickSight automatically recognizes field types, including dates, latitude, and longitude.

screen_shot_2020-03-02_at_10.23.12_pm

screen_shot_2020-03-02_at_10.59.32_pm

Visualizations

Using the data sets, QuickSight allows us to create a wide number of rich visualizations. Below, we see the simulated time-series data from the six temperature sensors.

screen_shot_2020-03-04_at_9.26.26_pm

Next, we see an example of QuickSight’s ability to show geospatial data. The Map shows the location of each sensor and the average temperature recorded by that sensor.

screen_shot_2020-03-04_at_9.26.51_pm

Cleaning Up

To remove the resources created for this post, use the following series of AWS CLI commands.

Conclusion

In this brief post, we have learned how streaming data can be analyzed in near real-time, in Amazon Redshift, using Amazon Kinesis Data Firehose. Further, we explored how the results of those analyses can be visualized in Amazon QuickSight. For customers that depend on a data warehouse for data analytics, but who also have streaming data sources, the use of Amazon Kinesis Data Firehose or Amazon Redshift Spectrum is an excellent choice.

This blog represents my own viewpoints and not of my employer, Amazon Web Services.

, , , , , , ,

Leave a comment

Event-driven, Serverless Architectures with AWS Lambda, SQS, DynamoDB, and API Gateway

Introduction

In this post, we will explore modern application development using an event-driven, serverless architecture on AWS. To demonstrate this architecture, we will integrate several fully-managed services, all part of the AWS Serverless Computing platform, including Lambda, API Gateway, SQS, S3, and DynamoDB. The result will be an application composed of small, easily deployable, loosely coupled, independently scalable, serverless components.

What is ‘Event-Driven’?

According to Otavio Ferreira, Manager, Amazon SNS, and James Hood, Senior Software Development Engineer, in their AWS Compute Blog, Enriching Event-Driven Architectures with AWS Event Fork Pipelines, “Many customers are choosing to build event-driven applications in which subscriber services automatically perform work in response to events triggered by publisher services. This architectural pattern can make services more reusable, interoperable, and scalable.” This description of an event-driven architecture perfectly captures the essence of the following post. All interactions between application components in this post will be as a direct result of triggering an event.

What is ‘Serverless’?

Mistakingly, many of us think of serverless as just functions (aka Function-as-a-Service or FaaS). When it comes to functions on AWS, Lambda is just one of many fully-managed services that make up the AWS Serverless Computing platform. So, what is ‘serverless’? According to AWS, “Serverless applications don’t require provisioning, maintaining, and administering servers for backend components such as compute, databases, storage, stream processing, message queueing, and more.

As a Developer, one of my favorite features of serverless is the cost, or lack thereof. With serverless on AWS, you pay for consistent throughput or execution duration rather than by server unit, and, at least on AWS, you don’t pay for idle resources. This is not always true of ‘serverless’ offerings on other leading Cloud platforms. Remember, if you’re paying for it but not using it, it’s not serverless.

If you’re paying for it but not using it, it’s not serverless.

Demonstration

To demonstrate an event-driven, serverless architecture, we will build, package, and deploy an application capable of extracting messages from CSV files placed in S3, transforming those messages, queueing them to SQS, and finally, writing the messages to DynamoDB, using Lambda functions throughout. We will also expose a RESTful API, via API Gateway, to perform CRUD-like operations on those messages in DynamoDB.

AWS Technologies

In this demonstration, we will use several AWS serverless services, including the following.

Each Lambda will use function-specific execution roles, part of AWS Identity and Access Management (IAM). We will log the event details and monitor services using Amazon CloudWatch.

To codify, build, package, deploy, and manage the Lambda functions and other AWS resources in a fully automated fashion, we will also use the following AWS services:

Architecture

The high-level architecture for the platform provisioned and deployed in this post is illustrated in the diagram below. There are two separate workflows. In the first workflow (top), data is extracted from CSV files placed in S3, transformed, queued to SQS, and written to DynamoDB, using Python-based Lambda functions throughout. In the second workflow (bottom), data is manipulated in DynamoDB through interactions with a RESTful API, exposed via an API Gateway, and backed by Node.js-based Lambda functions.

new-01-sqs-dynamodb

Using the vast array of current AWS services, there are several ways we could extract, transform, and load data from static files into DynamoDB. The demonstration’s event-driven, serverless architecture represents just one possible approach.

Source Code

All source code for this post is available on GitHub in a single public repository, serverless-sqs-dynamo-demo. To clone the GitHub repository, execute the following command.

git clone --branch master --single-branch --depth 1 --no-tags \
  https://github.com/garystafford/serverless-sqs-dynamo-demo.git

The project files relevant to this demonstration are organized as follows.

.
├── README.md
├── lambda_apigtw_to_dynamodb
│   ├── app.js
│   ├── events
│   ├── node_modules
│   ├── package.json
│   └── tests
├── lambda_s3_to_sqs
│   ├── __init__.py
│   ├── app.py
│   ├── requirements.txt
│   └── tests
├── lambda_sqs_to_dynamodb
│   ├── __init__.py
│   ├── app.py
│   ├── requirements.txt
│   └── tests
├── requirements.txt
├── template.yaml
└── sample_data
    ├── data.csv
    ├── data_bad_msg.csv
    └── data_good_msg.csv

Some source code samples in this post are GitHub Gists, which may not display correctly on all social media browsers, such as LinkedIn.

Prerequisites

The demonstration assumes you already have an AWS account. You will need the latest copy of the AWS CLI, SAM CLI, and Python 3 installed on your development machine.

Additionally, you will need two existing S3 buckets. One bucket will be used to store the packaged project files for deployment. The second bucket is where we will place CSV data files, which, in turn, will trigger events that invoke multiple Lambda functions.

Deploying the Project

Before diving into the code, we will deploy the project to AWS. Conveniently, the entire project’s resources are codified in an AWS SAM template. We are using the AWS Serverless Application Model (SAM). AWS SAM is a model used to define serverless applications on AWS. According to the official SAM GitHub project documentation, AWS SAM is based on AWS CloudFormation. A serverless application is defined in a CloudFormation template and deployed as a CloudFormation stack.

Template Parameter

CloudFormation will create and uniquely name the SQS queues and the DynamoDB table. However, to avoid circular references, a common issue when creating resources associated with S3 event notifications, it is easier to use a pre-existing bucket. To start, you will need to change the SAM template’s DataBucketName parameter’s default value to your own S3 bucket name. Again, this bucket is where we will eventually push the CSV data files. Alternately, override the default values using the sam build command, next.

Parameters:
  DataBucketName:
    Type: String
    Description: S3 bucket where CSV files are processed
    Default: your-data-bucket-name

SAM CLI Commands

With the DataBucketName parameter set, proceed to validate, build, package, and deploy the project using the SAM CLI and the commands below. In addition to the sam validate command, I also like to use the aws cloudformation validate-template command to validate templates and catch any potential, additional errors.

Note the S3_BUCKET_BUILD variable, below, refers to the name of the S3 bucket SAM will use package and deploy the project from, as opposed to the S3 bucket, which the CSV data files will be placed into (gist).

After validating the template, SAM will build and package each individual Lambda function and its associated dependencies. Below, we see each individual Lambda function being packaged with a copy of its dependencies.

screen_shot_2019-09-30_at_8_42_41_pm

Once packaged, SAM will deploy the project and create the AWS resources as a CloudFormation stack.

screen_shot_2019-09-30_at_8_43_11_pm

Once the stack creation is complete, use the CloudFormation management console to review the AWS resources created by SAM. There are approximately 14 resources defined in the SAM template, which result in 33 individual resources deployed as part of the CloudFormation stack.

screen_shot_2019-09-30_at_8_44_46_pm

Note the stack’s output values. You will need these values to interact with the deployed platform, later in the demonstration.

screen_shot_2019-09-30_at_8_45_13_pm

Test the Deployed Application

Once the CloudFormation stack has deployed without error, copying a CSV file to the S3 bucket is the quickest way to confirm everything is working. The project includes test data files with 20 rows of test message data. Below is a sample of the CSV file, which is included in the project. The data was collected from IoT devices that measured response time from wired versus wireless devices on a LAN network; the message details are immaterial to this demonstration (gist).

Run the following commands to copy the test data file to your S3 bucket.

S3_DATA_BUCKET=your_data_bucket_name
aws s3 cp sample_data/data.csv s3://$S3_DATA_BUCKET

Visit the DynamoDB management console. You should observe a new DynamoDB table.

screen_shot_2019-09-30_at_8_47_10_pm

Within the new DynamoDB table, you should observe twenty items, corresponding to each of the twenty rows in the CSV file, uploaded to S3.

screen_shot_2019-09-30_at_8_51_07_pm

Drill into an individual item within the table and review its attributes. They should match the rows in the CSV file.

screen_shot_2019-09-30_at_8_51_54_pm

Both the Python- and Node.js-based Lambda functions have their default logging levels set to debug. The debug-level output from each Lambda function is streamed to individual Amazon CloudWatch Log Groups. We can use the CloudWatch logs to troubleshoot any issues with the deployed application. Below we see an example of CloudWatch log entries for the request and response payloads generated from GetMessageFunction Lambda function, which is querying DynamoDB for a single Item.

screen_shot_2019-10-03_at_9_16_22_pm

Event-Driven Patterns

There are three distinct and discrete event-driven dataflows within the demonstration’s architecture

  1. S3 Event Source for Lambda (S3 to SQS)
  2. SQS Event Source for Lambda (SQS to DynamoDB)
  3. API Gateway Event Source for Lambda (API Gateway to DynamoDB)

Let’s examine each event-driven dataflow and the Lambda code associated with that part of the architecture.

S3 Event Source for Lambda

Whenever a file is copied into the target S3 bucket, an S3 Event Notification triggers an asynchronous invocation of a Lambda. According to AWS, when you invoke a function asynchronously, the Lambda sends the event to the SQS queue. A separate process reads events from the queue and executes your Lambda function.

new-02-sqs-dynamodb

The Lambda’s function handler, written in Python, reads the CSV file, whose filename is contained in the event. The Lambda extracts the rows in the CSV file, transforms the data, and pushes each message to the SQS queue (gist).

Below is an example of a message body, part an SQS message, extracted from a single row of the CSV file, and sent by the Lambda to the SQS queue. The timestamp has been converted to separate date and time fields by the Lambda. The DynamoDB table is part of the SQS message body. The key/value pairs in the Item JSON object reflect the schema of the DynamoDB table (gist).

SQS Event Source for Lambda

According to AWS, SQS offers two types of message queues, Standard and FIFO (First-In-First-Out). An SQS FIFO queue is designed to guarantee that messages are processed exactly once, in the exact order that they are sent. A Standard SQS queue offers maximum throughput, best-effort ordering, and at-least-once delivery.

Examining the SQS management console, you should observe that the CloudFormation stack creates two SQS Standard queues—a primary queue and a Dead Letter Queue (DLQ). According to AWS, Amazon SQS supports dead-letter queues, which other queues (source queues) can target for messages that cannot be processed (consumed) successfully.

screen_shot_2019-09-30_at_8_55_33_pm

Examining the SQS Lambda Triggers tab, you should observe the Lambda, which will be triggered by the SQS events.

screen_shot_2019-09-30_at_8_56_25_pm

When a message is pushed into the SQS queue by the previous process, an SQS event is fired, which synchronously triggers an invocation of the Lambda using the SQS Event Source for Lambda functionality. When a function is invoked synchronously, Lambda runs the function and waits for a response.

new-03-sqs-dynamodb

In the demonstration, the Lambda’s function handler, also written in Python, pulls the message off of the SQS queue and writes the message (DynamoDB put) to the DynamoDB table. Although writing is the primary use case in this demonstration, an event could also trigger a get, scan, update, or delete command to be executed on the DynamoDB table (gist).

API Gateway Event Source for Lambda

Examining the API Gateway management console, you should observe that CloudFormation created a new Edge-optimized API. The API contains several resources and their associated HTTP methods.

screen_shot_2019-09-30_at_9_02_52_pm

Each API resource is associated with a deployed Lambda function. Switching to the Lambda console, you should observe a total of seven new Lambda functions. There are five Lambda functions related to the API, in addition to the Lambda called by the S3 event notifications and the Lambda called by the SQS event notifications.

screen_shot_2019-09-30_at_8_46_24_pm

Examining one of the Lambda functions associated with the API Gateway, we should observe that the API Gateway trigger for the Lambda (lower left and bottom).

screen_shot_2019-09-30_at_8_59_39_pm

When an end-user makes an HTTP(S) request via the RESTful API exposed by the API Gateway, an event is fired, which synchronously invokes a Lambda using the API Gateway Event Source for Lambda functionality. The event contains details about the HTTP request that is received. The event triggers any one of five different Lambda functions, depending on the HTTP request method.

new-04-sqs-dynamodb

The Lambda code, written in Node.js, contains five function handlers. Each handler corresponds to an HTTP method, including GET (DynamoDB get) POST (put), PUT (update), DELETE (delete), and SCAN (scan). Below is an example of the getMessage handler function. The function accepts two inputs. First, a path parameter, the date, which is the primary partition key for the DynamoDB table. Second, a query parameter, the time, which is the primary sort key for the DynamoDB table. Both the primary partition key and sort key must be passed to DynamoDB to retrieve the requested record (gist).

Test the API

To test the Lambda functions, called by our API, we can use the sam local invoke command, part of the SAM CLI. Using this command, we can test the local Lambda functions, without them being deployed to AWS. The command allows us to trigger events, which the Lambda functions will handle. This is useful as we continue to develop, test, and re-deploy the Lambda functions to our Development, Staging, and Production environments.

The local Node.js-based, API-related Lambda functions, just like their deployed copies, will execute commands against the actual DynamoDB on AWS. The Github project contains a set of five sample events, corresponding to the five Lambda functions, which in turn are associated with five different HTTP methods and API resources. For example, the event_getMessage.json event is associated with the GET HTTP method and calls the /message/{date}?time={time} resource endpoint, to return a single item. This event, shown below, triggers the GetMessageFunction Lambda (gist).

We can trigger all the events from using the CLI. The local Lambda expects the DynamoDB table name to exist as an environment variable. Make sure you set it locally, first, before executing the sam local invoke commands (gist).

If the events were successfully handled by the local Lambda functions, in the terminal, you should see the same HTTP response status codes you would expect from calling the RESTful resources via the API Gateway. Below, for example, we see the POST event being handled by the PostMessageFunction Lambda, adding a record to the DynamoDB table, and returning a successful status of 201 Created.

screen_shot_2019-10-04_at_2_50_41_pm.png

Testing the Deployed API

To test the actual deployed API, we can call one of the API’s resources using an HTTP client, such as Postman. To locate the URL used to invoke the API resource, look at the ‘Prod’ Stage for the new API. This can be found in the Stages tab of the API Gateway console. For example, note the Invoke URL for the POST HTTP method of the /message resource, shown below.

screen_shot_2019-10-04_at_3_02_21_pm

Below, we see an example of using Postman to make an HTTP GET request the /message/{date}?time={time} resource. We pass the required query and path parameters for date and for time. The request should receive a single item in response from DynamoDB, via the API Gateway and the associated Lambda. Here, the request was successful, and the Lambda function returns a 200 OK status.

screen_shot_2019-09-30_at_9_20_45_pm

Similarly, below, we see an example of calling the same /message endpoint using the HTTP POST method. In the body of the POST request, we pass the DynamoDB table name and the Item object. Again, the POST is successful, and the Lambda function returns a 201 Created status.

screen_shot_2019-10-03_at_10_05_31_pm

Cleaning Up

To complete the demonstration and remove the AWS resources, run the following commands. It is necessary to delete all objects from the S3 data bucket, first, before deleting the CloudFormation stack. Else, the stack deletion will fail.

S3_DATA_BUCKET=your_data_bucket_name
STACK_NAME=your_stack_name

aws s3 rm s3://$S3_DATA_BUCKET/data.csv # and any other objects

aws cloudformation delete-stack \
  --stack-name $STACK_NAME

Conclusion

In this post, we explored a simple example of building a modern application using an event-driven serverless architecture on AWS. We used several services, all part of the AWS Serverless Computing platform, including Lambda, API Gateway, SQS, S3, and DynamoDB. In addition to these, AWS has additional serverless services, which could enhance this demonstration, in particular, Amazon Kinesis, AWS Step Functions, Amazon SNS, and AWS AppSync.

In a future post, we will look at how to further test the individual components within this demonstration’s application stack, and how to automate its deployment using DevOps and CI/CD principals on AWS.

All opinions expressed in this post are my own and not necessarily the views of my current or past employers or their clients.

, , , , , , , , ,

Leave a comment

Getting Started with PostgreSQL using Amazon RDS, CloudFormation, pgAdmin, and Python

Introduction

In the following post, we will explore how to get started with Amazon Relational Database Service (RDS) for PostgreSQL. CloudFormation will be used to build a PostgreSQL master database instance and a single read replica in a new VPC. AWS Systems Manager Parameter Store will be used to store our CloudFormation configuration values. Amazon RDS Event Notifications will send text messages to our mobile device to let us know when the RDS instances are ready for use. Once running, we will examine a variety of methods to interact with our database instances, including pgAdmin, Adminer, and Python.

Technologies

The primary technologies used in this post include the following.

PostgreSQL

Image result for postgres logoAccording to its website, PostgreSQL, commonly known as Postgres, is the world’s most advanced Open Source relational database. Originating at UC Berkeley in 1986, PostgreSQL has more than 30 years of active core development. PostgreSQL has earned a strong reputation for its proven architecture, reliability, data integrity, robust feature set, extensibility. PostgreSQL runs on all major operating systems and has been ACID-compliant since 2001.

Amazon RDS for PostgreSQL

Image result for amazon rds logoAccording to Amazon, Amazon Relational Database Service (RDS) provides six familiar database engines to choose from, including Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle Database, and SQL Server. RDS is available on several database instance types - optimized for memory, performance, or I/O.

Amazon RDS for PostgreSQL makes it easy to set up, operate, and scale PostgreSQL deployments in the cloud. Amazon RDS supports the latest PostgreSQL version 11, which includes several enhancements to performance, robustness, transaction management, query parallelism, and more.

AWS CloudFormation

Deployment__Management_copy_AWS_CloudFormation-512

According to Amazon, CloudFormation provides a common language to describe and provision all the infrastructure resources within AWS-based cloud environments. CloudFormation allows you to use a JSON- or YAML-based template to model and provision all the resources needed for your applications across all AWS regions and accounts, in an automated and secure manner.

Demonstration

Architecture

Below, we see an architectural representation of what will be built in the demonstration. This is not a typical three-tier AWS architecture, wherein the RDS instances would be placed in private subnets (data tier) and accessible only by the application tier, running on AWS. The architecture for the demonstration is designed for interacting with RDS through external database clients such as pgAdmin, and applications like our local Python scripts, detailed later in the post.

RDS AWS Arch Diagram

Source Code

All source code for this post is available on GitHub in a single public repository, postgres-rds-demo.

.
├── LICENSE.md
├── README.md
├── cfn-templates
│   ├── event.template
│   ├── rds.template
├── parameter_store_values.sh
├── python-scripts
│   ├── create_pagila_data.py
│   ├── database.ini
│   ├── db_config.py
│   ├── query_postgres.py
│   ├── requirements.txt
│   └── unit_tests.py
├── sql-scripts
│   ├── pagila-insert-data.sql
│   └── pagila-schema.sql
└── stack.yml

To clone the GitHub repository, execute the following command.

git clone --branch master --single-branch --depth 1 --no-tags \
  https://github.com/garystafford/aws-rds-postgres.git

Prerequisites

For this demonstration, I will assume you already have an AWS account. Further, that you have the latest copy of the AWS CLI and Python 3 installed on your development machine. Optionally, for pgAdmin and Adminer, you will also need to have Docker installed.

Steps

In this demonstration, we will perform the following steps.

  • Put CloudFormation configuration values in Parameter Store;
  • Execute CloudFormation templates to create AWS resources;
  • Execute SQL scripts using Python to populate the new database with sample data;
  • Configure pgAdmin and Python connections to RDS PostgreSQL instances;

AWS Systems Manager Parameter Store

With AWS, it is typical to use services like AWS Systems Manager Parameter Store and AWS Secrets Manager to store overt, sensitive, and secret configuration values. These values are utilized by your code, or from AWS services like CloudFormation. Parameter Store allows us to follow the proper twelve-factor, cloud-native practice of separating configuration from code.

To demonstrate the use of Parameter Store, we will place a few of our CloudFormation configuration items into Parameter Store. The demonstration’s GitHub repository includes a shell script, parameter_store_values.sh, which will put the necessary parameters into Parameter Store.

Below, we see several of the demo’s configuration values, which have been put into Parameter Store.

screen_shot_2019-08-08_at_7_00_17_pm

SecureString

Whereas our other parameters are stored in Parameter Store as String datatypes, the database’s master user password is stored as a SecureString data-type. Parameter Store uses an AWS Key Management Service (KMS) customer master key (CMK) to encrypt the SecureString parameter value.

screen_shot_2019-08-08_at_7_01_15_pm

SMS Text Alert Option

Before running the Parameter Store script, you will need to change the /rds_demo/alert_phone parameter value in the script (shown below) to your mobile device number, including country code, such as ‘+12038675309’. Amazon SNS will use it to send SMS messages, using Amazon RDS Event Notification. If you don’t want to use this messaging feature, simply ignore this parameter and do not execute the event.template CloudFormation template in the proceeding step.

aws ssm put-parameter \
  --name /rds_demo/alert_phone \
  --type String \
  --value "your_phone_number_here" \
  --description "RDS alert SMS phone number" \
  --overwrite

Run the following command to execute the shell script, parameter_store_values.sh, which will put the necessary parameters into Parameter Store.

sh ./parameter_store_values.sh

CloudFormation Templates

The GitHub repository includes two CloudFormation templates, cfn-templates/event.template and cfn-templates/rds.template. This event template contains two resources, which are an AWS SNS Topic and an AWS RDS Event Subscription. The RDS template also includes several resources, including a VPC, Internet Gateway, VPC security group, two public subnets, one RDS master database instance, and an AWS RDS Read Replica database instance.

The resources are split into two CloudFormation templates so we can create the notification resources, first, independently of creating or deleting the RDS instances. This will ensure we get all our SMS alerts about both the creation and deletion of the databases.

Template Parameters

The two CloudFormation templates contain a total of approximately fifteen parameters. For most, you can use the default values I have set or chose to override them. Four of the parameters will be fulfilled from Parameter Store. Of these, the master database password is treated slightly differently because it is secure (encrypted in Parameter Store). Below is a snippet of the template showing both types of parameters. The last two are fulfilled from Parameter Store.

DBInstanceClass:
  Type: String
  Default: "db.t3.small"
DBStorageType:
  Type: String
  Default: "gp2"
DBUser:
  Type: String
  Default: "{{resolve:ssm:/rds_demo/master_username:1}}"
DBPassword:
  Type: String
  Default: "{{resolve:ssm-secure:/rds_demo/master_password:1}}"
  NoEcho: True

Choosing the default CloudFormation parameter values will result in two minimally-configured RDS instances running the PostgreSQL 11.4 database engine on a db.t3.small instance with 10 GiB of General Purpose (SSD) storage. The db.t3 DB instance is part of the latest generation burstable performance instance class. The master instance is not configured for Multi-AZ high availability. However, the master and read replica each run in a different Availability Zone (AZ) within the same AWS Region.

Parameter Versioning

When placing parameters into Parameter Store, subsequent updates to a parameter result in the version number of that parameter being incremented. Note in the examples above, the version of the parameter is required by CloudFormation, here, ‘1’. If you chose to update a value in Parameter Store, thus incrementing the parameter’s version, you will also need to update the corresponding version number in the CloudFormation template’s parameter.

{
    "Parameter": {
        "Name": "/rds_demo/rds_username",
        "Type": "String",
        "Value": "masteruser",
        "Version": 1,
        "LastModifiedDate": 1564962073.774,
        "ARN": "arn:aws:ssm:us-east-1:1234567890:parameter/rds_demo/rds_username"
    }
}

Validating Templates

Although I have tested both templates, I suggest validating the templates yourself, as you usually would for any CloudFormation template you are creating. You can use the AWS CLI CloudFormation validate-template CLI command to validate the template. Alternately, or I suggest additionally, you can use CloudFormation Lintercfn-lint command.

aws cloudformation validate-template \
  --template-body file://cfn-templates/rds.template

cfn-lint -t cfn-templates/cfn-templates/rds.template

Create the Stacks

To execute the first CloudFormation template and create a CloudFormation Stack containing the two event notification resources, run the following create-stack CLI command.

aws cloudformation create-stack \
  --template-body file://cfn-templates/event.template \
  --stack-name RDSEventDemoStack

The first stack only takes less than one minute to create. Using the AWS CloudFormation Console, make sure the first stack completes successfully before creating the second stack with the command, below.

aws cloudformation create-stack \
  --template-body file://cfn-templates/rds.template \
  --stack-name RDSDemoStack

screen_shot_2019-08-04_at_10_35_20_pm

Wait for my Text

In my tests, the CloudFormation RDS stack takes an average of 25–30 minutes to create and 15–20 minutes to delete, which can seem like an eternity. You could use the AWS CloudFormation console (shown below) or continue to use the CLI to follow the progress of the RDS stack creation.

screen_shot_2019-08-08_at_7_39_45_pm.png

However, if you recall, the CloudFormation event template creates an AWS RDS Event Subscription. This resource will notify us when the databases are ready by sending text messages to our mobile device.

screen_shot_2019-08-04_at_11_06_31_pm

In the CloudFormation events template, the RDS Event Subscription is configured to generate Amazon Simple Notification Service (SNS) notifications for several specific event types, including RDS instance creation and deletion.

  MyEventSubscription:
    Properties:
      Enabled: true
      EventCategories:
        - availability
        - configuration change
        - creation
        - deletion
        - failover
        - failure
        - recovery
      SnsTopicArn:
        Ref: MyDBSNSTopic
      SourceType: db-instance
    Type: AWS::RDS::EventSubscription

Amazon SNS will send SMS messages to the mobile number you placed into Parameter Store. Below, we see messages generated during the creation of the two instances, displayed on an Apple iPhone.

img-2839

Amazon RDS Dashboard

Once the RDS CloudFormation stack has successfully been built, the easiest way to view the results is using the Amazon RDS Dashboard, as shown below. Here we see both the master and read replica instances have been created and are available for our use.

screen_shot_2019-08-08_at_7_05_24_pm

The RDS dashboard offers CloudWatch monitoring of each RDS instance.

screen_shot_2019-08-08_at_7_06_17_pm

The RDS dashboard also provides detailed configuration information about each RDS instance.

screen_shot_2019-08-08_at_7_06_26_pm

The RDS dashboard’s Connection & security tab is where we can obtain connection information about our RDS instances, including the RDS instance’s endpoints. Endpoints information will be required in the next part of the demonstration.

screen_shot_2019-08-08_at_7_06_01_pm

Sample Data

Now that we have our PostgreSQL database instance and read replica successfully provisioned and configured on AWS, with an empty database, we need some test data. There are several sources of sample PostgreSQL databases available on the internet to explore. We will use the Pagila sample movie rental database by pgFoundry. Although the database is several years old, it provides a relatively complex relational schema (table relationships shown below) and plenty of sample data to query, about 100 database objects and 46K rows of data.

pagila_tablespng

In the GitHub repository, I have included the two Pagila database SQL scripts required to install the sample database’s data structures (DDL), sql-scripts/pagila-schema.sql, and the data itself (DML), sql-scripts/pagila-insert-data.sql.

To execute the Pagila SQL scripts and install the sample data, I have included a Python script. If you do not want to use Python, you can skip to the Adminer section of this post. Adminer also has the capability to import SQL scripts.

Before running any of the included Python scripts, you will need to install the required Python packages and configure the database.ini file.

Python Packages

To install the required Python packages using the supplied python-scripts/requirements.txt file, run the below commands.

cd python-scripts
pip3 install --upgrade -r requirements.txt

We are using two packages, psycopg2 and configparser, for the scripts. Psycopg is a PostgreSQL database adapter for Python. According to their website, Psycopg is the most popular PostgreSQL database adapter for the Python programming language. The configparser module allows us to read configuration from files similar to Microsoft Windows INI files. The unittest package is required for a set of unit tests includes the project, but not discussed as part of the demo.

screen_shot_2019-08-13_at_11_06_10_pm

Database Configuration

The python-scripts/database.ini file, read by configparser, provides the required connection information to our RDS master and read replica instance’s databases. Use the input parameters and output values from the CloudFormation RDS template, or the Amazon RDS Dashboard to obtain the required connection information, as shown in the example, below. Your host values will be unique for your master and read replica. The host values are the instance’s endpoint, listed in the RDS Dashboard’s Configuration tab.

[docker]
host=localhost
port=5432
database=pagila
user=masteruser
password=5up3r53cr3tPa55w0rd

[master]
host=demo-instance.dkfvbjrazxmd.us-east-1.rds.amazonaws.com
port=5432
database=pagila
user=masteruser
password=5up3r53cr3tPa55w0rd

[replica]
host=demo-replica.dkfvbjrazxmd.us-east-1.rds.amazonaws.com
port=5432
database=pagila
user=masteruser
password=5up3r53cr3tPa55w0rd

With the INI file configured, run the following command, which executes a supplied Python script, python-scripts/create_pagila_data.py, to create the data structure and insert sample data into the master RDS instance’s Pagila database. The database will be automatically replicated to the RDS read replica instance. From my local laptop, I found the Python script takes approximately 40 seconds to create all 100 database objects and insert 46K rows of movie rental data. That is compared to about 13 seconds locally, using a Docker-based instance of PostgreSQL.

python3 ./create_pagila_data.py

The Python script’s primary function, create_pagila_db(), reads and executes the two external SQL scripts.

def create_pagila_db():
    """
    Creates Pagila database by running DDL and DML scripts
    """

    try:
        global conn
        with conn:
            with conn.cursor() as curs:
                curs.execute(open("../sql-scripts/pagila-schema.sql", "r").read())
                curs.execute(open("../sql-scripts/pagila-insert-data.sql", "r").read())
                conn.commit()
                print('Pagila SQL scripts executed')
    except (psycopg2.OperationalError, psycopg2.DatabaseError, FileNotFoundError) as err:
        print(create_pagila_db.__name__, err)
        close_conn()
        exit(1)

If the Python script executes correctly, you should see output indicating there are now 28 tables in our master RDS instance’s database.

screen_shot_2019-08-08_at_7_13_11_pm

pgAdmin

pgAdmin is a favorite tool for interacting with and managing PostgreSQL databases. According to its website, pgAdmin is the most popular and feature-rich Open Source administration and development platform for PostgreSQL.

The project includes an optional Docker Swarm stack.yml file. The stack will create a set of three Docker containers, including a local copy of PostgreSQL 11.4, Adminer, and pgAdmin 4. Having a local copy of PostgreSQL, using the official Docker image, is helpful for development and trouble-shooting RDS issues.

screen_shot_2019-08-10_at_1_43_24_pm.png

Use the following commands to deploy the Swarm stack.

# create stack
docker swarm init
docker stack deploy -c stack.yml postgres

# get status of new containers
docker stack ps postgres --no-trunc
docker container ls

If you do not want to spin up the whole Docker Swarm stack, you could use the docker run command to create just a single pgAdmin Docker container. The pgAdmin 4 Docker image being used is the image recommended by pgAdmin.

docker pull dpage/pgadmin4

docker run -p 81:80 \
  -e "PGADMIN_DEFAULT_EMAIL=user@domain.com" \
  -e "PGADMIN_DEFAULT_PASSWORD=SuperSecret" \
  -d dpage/pgadmin4

docker container ls | grep pgadmin4

Database Server Configuration

Once pgAdmin is up and running, we can configure the master and read replica database servers (RDS instances) using the connection string information from your database.ini file or from the Amazon RDS Dashboard. Below, I am configuring the master RDS instance (server).

screen_shot_2019-08-08_at_7_25_27_pm

With that task complete, below, we see the master RDS instance and the read replica, as well as my local Docker instance configured in pgAdmin (left side of screengrab). Note how the Pagila database has been replicated automatically, from the RDS master to the read replica instance.

screen_shot_2019-08-08_at_7_29_00_pm

Building SQL Queries

Switching to the Query tab, we can run regular SQL queries against any of the database instances. Below, I have run a simple SELECT query against the master RDS instance’s Pagila database, returning the complete list of movie titles, along with their genre and release date.

screen_shot_2019-08-08_at_7_27_58_pm

The pgAdmin Query tool even includes an Explain tab to view a graphical representation of the same query, very useful for optimization. Here we see the same query, showing an analysis of the execution order. A popup window displays information about the selected object.

screen_shot_2019-08-08_at_7_28_35_pm

Query the Read Replica

To demonstrate the use of the read replica, below I’ve run the same query against the RDS read replica’s copy of the Pagila database. Any schema and data changes against the master instance are replicated to the read replica(s).

screen_shot_2019-08-08_at_7_30_14_pm

Adminer

Adminer is another good general-purpose database management tool, similar to pgAdmin, but with a few different capabilities. According to its website, with Adminer, you get a tidy user interface, rich support for multiple databases, performance, and security, all from a single PHP file. Adminer is my preferred tool for database admin tasks. Amazingly, Adminer works with MySQL, MariaDB, PostgreSQL, SQLite, MS SQL, Oracle, SimpleDB, Elasticsearch, and MongoDB.

Below, we see the Pagila database’s tables and views displayed in Adminer, along with some useful statistical information about each database object.

screen_shot_2019-08-09_at_7_04_07_am

Similar to pgAdmin, we can also run queries, along with other common development and management tasks, from within the Adminer interface.

screen_shot_2019-08-09_at_7_05_16_am

Import Pagila with Adminer

Another great feature of Adminer is the ability to easily import and export data. As an alternative to Python, you could import the Pagila data using Adminer’s SQL file import function. Below, you see an example of importing the Pagila database objects into the Pagila database, using the file upload function.

screen_shot_2019-08-09_at_7_27_53_am.png

IDE

For writing my AWS infrastructure as code files and Python scripts, I prefer JetBrains PyCharm Professional Edition (v19.2). PyCharm, like all the JetBrains IDEs, has the ability to connect to and manage PostgreSQL database. You can write and run SQL queries, including the Pagila SQL import scripts. Microsoft Visual Studio Code is another excellent, free choice, available on multiple platforms.

screen_shot_2019-08-11_at_9_40_57_pm

Python and RDS

Although our IDE, pgAdmin, and Adminer are useful to build and test our queries, ultimately, we still need to connect to the Amazon RDS PostgreSQL instances and perform data manipulation from our application code. The GitHub repository includes a sample python script, python-scripts/query_postgres.py. This script uses the same Python packages and connection functions as our Pagila data creation script we ran earlier. This time we will perform the same SELECT query using Python as we did previously with pgAdmin and Adminer.

cd python-scripts
python3 ./query_postgres.py

With a successful database connection established, the scripts primary function, get_movies(return_count), performs the SELECT query. The function accepts an integer representing the desired number of movies to return from the SELECT query. A separate function within the script handles closing the database connection when the query is finished.

def get_movies(return_count=100):
    """
    Queries for all films, by genre and year
    """

    try:
        global conn
        with conn:
            with conn.cursor() as curs:
                curs.execute("""
                    SELECT title AS title, name AS genre, release_year AS released
                    FROM film f
                             JOIN film_category fc
                                  ON f.film_id = fc.film_id
                             JOIN category c
                                  ON fc.category_id = c.category_id
                    ORDER BY title
                    LIMIT %s;
                """, (return_count,))

                movies = []
                row = curs.fetchone()
                while row is not None:
                    movies.append(row)
                    row = curs.fetchone()

                return movies
    except (psycopg2.OperationalError, psycopg2.DatabaseError) as err:
        print(get_movies.__name__, err)
    finally:
        close_conn()


def main():
    set_connection('docker')
    for movie in get_movies(10):
        print('Movie: {0}, Genre: {1}, Released: {2}'
              .format(movie[0], movie[1], movie[2]))

Below, we see an example of the Python script’s formatted output, limited to only the first ten movies.

screen_shot_2019-08-13_at_10_51_47_pm.png

Using the Read Replica

For better application performance, it may be optimal to redirect some or all of the database reads to the read replica, while leaving writes, updates, and deletes to hit the master instance. The script can be easily modified to execute the same query against the read replica rather than the master RDS instance by merely passing the desired section, ‘replica’ versus ‘master’, in the call to the set_connection(section) function. The section parameter refers to one of the two sections in the database.ini file. The configparser module will handle retrieving the correct connection information.

set_connection('replica')

Cleaning Up

When you are finished with the demonstration, the easiest way to clean up all the AWS resources and stop getting billed is to delete the two CloudFormation stacks using the AWS CLI, in the following order.

aws cloudformation delete-stack \
  --stack-name RDSDemoStack

# wait until the above resources are completely deleted
aws cloudformation delete-stack \
  --stack-name RDSEventDemoStack

You should receive the following SMS notifications as the first CloudFormation stack is being deleted.

img-2841

You can delete the running Docker stack using the following command. Note, you will lose all your pgAdmin server connection information, along with your local Pagila database.

docker stack rm postgres

Conclusion

In this brief post, we just scraped the surface of the many benefits and capabilities of Amazon RDS for PostgreSQL. The best way to learn PostgreSQL and the benefits of Amazon RDS is by setting up your own RDS instance, insert some sample data, and start writing queries in your favorite database client or programming language.

All opinions expressed in this post are my own and not necessarily the views of my current or past employers or their clients.

, , , , , , , ,

Leave a comment

Managing AWS Infrastructure as Code using Ansible, CloudFormation, and CodeBuild

Introduction

When it comes to provisioning and configuring resources on the AWS cloud platform, there is a wide variety of services, tools, and workflows you could choose from. You could decide to exclusively use the cloud-based services provided by AWS, such as CodeBuild, CodePipeline, CodeStar, and OpsWorks. Alternatively, you could choose open-source software (OSS) for provisioning and configuring AWS resources, such as community editions of Jenkins, HashiCorp Terraform, Pulumi, Chef, and Puppet. You might also choose to use licensed products, such as Octopus Deploy, TeamCity, CloudBees Core, Travis CI Enterprise, and XebiaLabs XL Release. You might even decide to write your own custom tools or scripts in Python, Go, JavaScript, Bash, or other common languages.

The reality in most enterprises I have worked with, teams integrate a combination of AWS services, open-source software, custom scripts, and occasionally licensed products to construct complete, end-to-end, infrastructure as code-based workflows for provisioning and configuring AWS resources. Choices are most often based on team experience, vendor relationships, and an enterprise’s specific business use cases.

In the following post, we will explore one such set of easily-integrated tools for provisioning and configuring AWS resources. The tool-stack is comprised of Red Hat Ansible, AWS CloudFormation, and AWS CodeBuild, along with several complementary AWS technologies. Using these tools, we will provision a relatively simple AWS environment, then deploy, configure, and test a highly-available set of Apache HTTP Servers. The demonstration is similar to the one featured in a previous post, Getting Started with Red Hat Ansible for Google Cloud Platform.

ansible-aws-stack2.png

Why Ansible?

With its simplicity, ease-of-use, broad compatibility with most major cloud, database, network, storage, and identity providers amongst other categories, Ansible has been a popular choice of Engineering teams for configuration-management since 2012. Given the wide variety of polyglot technologies used within modern Enterprises and the growing predominance of multi-cloud and hybrid cloud architectures, Ansible provides a common platform for enabling mature DevOps and infrastructure as code practices. Ansible is easily integrated with higher-level orchestration systems, such as AWS CodeBuild, Jenkins, or Red Hat AWX and Tower.

Technologies

The primary technologies used in this post include the following.

Red Hat Ansible

ansibleAnsible, purchased by Red Hat in October 2015, seamlessly provides workflow orchestration with configuration management, provisioning, and application deployment in a single platform. Unlike similar tools, Ansible’s workflow automation is agentless, relying on Secure Shell (SSH) and Windows Remote Management (WinRM). If you are interested in learning more on the advantages of Ansible, they’ve published a whitepaper on The Benefits of Agentless Architecture.

According to G2 Crowd, Ansible is a clear leader in the Configuration Management Software category, ranked right behind GitLab. Competitors in the category include GitLab, AWS Config, Puppet, Chef, Codenvy, HashiCorp Terraform, Octopus Deploy, and JetBrains TeamCity.

AWS CloudFormation

Deployment__Management_copy_AWS_CloudFormation-512

According to AWS, CloudFormation provides a common language to describe and provision all the infrastructure resources within AWS-based cloud environments. CloudFormation allows you to use a JSON- or YAML-based template to model and provision, in an automated and secure manner, all the resources needed for your applications across all AWS regions and accounts.

Codifying your infrastructure, often referred to as ‘Infrastructure as Code,’ allows you to treat your infrastructure as just code. You can author it with any IDE, check it into a version control system, and review the files with team members before deploying it.

AWS CodeBuild

code-build-console-iconAccording to AWS, CodeBuild is a fully managed continuous integration service that compiles your source code, runs tests, and produces software packages that are ready to deploy. With CodeBuild, you don’t need to provision, manage, and scale your own build servers. CodeBuild scales continuously and processes multiple builds concurrently, so your builds are not left waiting in a queue.

CloudBuild integrates seamlessly with other AWS Developer tools, including CodeStar, CodeCommit, CodeDeploy, and CodePipeline.

According to G2 Crowd, the main competitors to AWS CodeBuild, in the Build Automation Software category, include Jenkins, CircleCI, CloudBees Core and CodeShip, Travis CI, JetBrains TeamCity, and Atlassian Bamboo.

Other Technologies

In addition to the major technologies noted above, we will also be leveraging the following services and tools to a lesser extent, in the demonstration:

  • AWS CodeCommit
  • AWS CodePipeline
  • AWS Systems Manager Parameter Store
  • Amazon Simple Storage Service (S3)
  • AWS Identity and Access Management (IAM)
  • AWS Command Line Interface (CLI)
  • CloudFormation Linter
  • Apache HTTP Server

Demonstration

Source Code

All source code for this post is contained in two GitHub repositories. The CloudFormation templates and associated files are in the ansible-aws-cfn GitHub repository. The Ansible Roles and related files are in the ansible-aws-roles GitHub repository. Both repositories may be cloned using the following commands.

git clone --branch master --single-branch --depth 1 --no-tags \ 
  https://github.com/garystafford/ansible-aws-cfn.git

git clone --branch master --single-branch --depth 1 --no-tags \
  https://github.com/garystafford/ansible-aws-roles.git

Development Process

The general process we will follow for provisioning and configuring resources in this demonstration are as follows:

  • Create an S3 bucket to store the validated CloudFormation templates
  • Create an Amazon EC2 Key Pair for Ansible
  • Create two AWS CodeCommit Repositories to store the project’s source code
  • Put parameters in Parameter Store
  • Write and test the CloudFormation templates
  • Configure Ansible and AWS Dynamic Inventory script
  • Write and test the Ansible Roles and Playbooks
  • Write the CodeBuild build specification files
  • Create an IAM Role for CodeBuild and CodePipeline
  • Create and test CodeBuild Projects and CodePipeline Pipelines
  • Provision, deploy, and configure the complete web platform to AWS
  • Test the final web platform

Prerequisites

For this demonstration, I will assume you already have an AWS account, the AWS CLI, Python, and Ansible installed locally, an S3 bucket to store the final CloudFormation templates and an Amazon EC2 Key Pair for Ansible to use for SSH.

 Continuous Integration and Delivery Overview

In this demonstration, we will be building multiple CI/CD pipelines for provisioning and configuring our resources to AWS, using several AWS services. These services include CodeCommit, CodeBuild, CodePipeline, Systems Manager Parameter Store, and Amazon Simple Storage Service (S3). The diagram below shows the complete CI/CD workflow we will build using these AWS services, along with Ansible.

aws_devops

AWS CodeCommit

According to Amazon, AWS CodeCommit is a fully-managed source control service that makes it easy to host secure and highly scalable private Git repositories. CodeCommit eliminates the need to operate your own source control system or worry about scaling its infrastructure.

Start by creating two AWS CodeCommit repositories to hold the two GitHub projects your cloned earlier. Commit both projects to your own AWS CodeCommit repositories.

screen_shot_2019-07-26_at_9_02_54_pm

Configuration Management

We have several options for storing the configuration values necessary to provision and configure the resources on AWS. We could set configuration values as environment variables directly in CodeBuild. We could set configuration values from within our Ansible Roles. We could use AWS Systems Manager Parameter Store to store configuration values. For this demonstration, we will use a combination of all three options.

AWS Systems Manager Parameter Store

According to Amazon, AWS Systems Manager Parameter Store provides secure, hierarchical storage for configuration data management and secrets management. You can store data such as passwords, database strings, and license codes as parameter values, as either plain text or encrypted.

The demonstration uses two CloudFormation templates. The two templates have several parameters. A majority of those parameter values will be stored in Parameter Store, retrieved by CloudBuild, and injected into the CloudFormation template during provisioning.

screen_shot_2019-07-26_at_9_38_33_pm

The Ansible GitHub project includes a shell script, parameter_store_values.sh, to put the necessary parameters into Parameter Store. The script requires the AWS Command Line Interface (CLI) to be installed locally. You will need to change the KEY_PATH key value in the script (snippet shown below) to match the location your private key, part of the Amazon EC2 Key Pair you created earlier for use by Ansible.

KEY_PATH="/path/to/private/key"

# put encrypted parameter to Parameter Store
aws ssm put-parameter \
  --name $PARAMETER_PATH/ansible_private_key \
  --type SecureString \
  --value "file://${KEY_PATH}" \
  --description "Ansible private key for EC2 instances" \
  --overwrite

SecureString

Whereas all other parameters are stored in Parameter Store as String datatypes, the private key is stored as a SecureString datatype. Parameter Store uses an AWS Key Management Service (KMS) customer master key (CMK) to encrypt the SecureString parameter value. The IAM Role used by CodeBuild (discussed later) will have the correct permissions to use the KMS key to retrieve and decrypt the private key SecureString parameter value.

screen_shot_2019-07-26_at_9_41_42_pm

CloudFormation

The demonstration uses two CloudFormation templates. The first template, network-stack.template, contains the AWS network stack resources. The template includes one VPC, one Internet Gateway, two NAT Gateways, four Subnets, two Elastic IP Addresses, and associated Route Tables and Security Groups. The second template, compute-stack.template, contains the webserver compute stack resources. The template includes an Auto Scaling Group, Launch Configuration, Application Load Balancer (ALB), ALB Listener, ALB Target Group, and an Instance Security Group. Both templates originated from the AWS CloudFormation template sample library, and were modified for this demonstration.

The two templates are located in the cfn_templates directory of the CloudFormation project, as shown below in the tree view.

.
├── LICENSE.md
├── README.md
├── buildspec_files
│   ├── build.sh
│   └── buildspec.yml
├── cfn_templates
│   ├── compute-stack.template
│   └── network-stack.template
├── codebuild_projects
│   ├── build.sh
│   └── cfn-validate-s3.json
├── codepipeline_pipelines
│   ├── build.sh
│   └── cfn-validate-s3.json
└── requirements.txt

The templates require no modifications for the demonstration. All parameters are in Parameter store or set by the Ansible Roles, and consumed by the Ansible Playbooks via CodeBuild.

Ansible

We will use Red Hat Ansible to provision the network and compute resources by interacting directly with CloudFormation, deploy and configure Apache HTTP Server, and finally, perform final integration tests of the system. In my opinion, the closest equivalent to Ansible on the AWS platform is AWS OpsWorks. OpsWorks lets you use Chef and Puppet (direct competitors to Ansible) to automate how servers are configured, deployed, and managed across Amazon EC2 instances or on-premises compute environments.

Ansible Config

To use Ansible with AWS and CloudFormation, you will first want to customize your project’s ansible.cfg file to enable the aws_ec2 inventory plugin. Below is part of my configuration file as a reference.

[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp
fact_caching_timeout = 300

host_key_checking = False
roles_path = roles
inventory = inventories/hosts
remote_user = ec2-user
private_key_file = ~/.ssh/ansible

[inventory]
enable_plugins = host_list, script, yaml, ini, auto, aws_ec2

Ansible Roles

According to Ansible, Roles are ways of automatically loading certain variable files, tasks, and handlers based on a known file structure. Grouping content by roles also allows easy sharing of roles with other users. For the demonstration, I have written four roles, located in the roles directory, as shown below in the project tree view. The default, common role is not used in this demonstration.

.
├── LICENSE.md
├── README.md
├── ansible.cfg
├── buildspec_files
│   ├── buildspec_compute.yml
│   ├── buildspec_integration_tests.yml
│   ├── buildspec_network.yml
│   └── buildspec_web_config.yml
├── codebuild_projects
│   ├── ansible-test.json
│   ├── ansible-web-config.json
│   ├── build.sh
│   ├── cfn-compute.json
│   ├── cfn-network.json
│   └── notes.md
├── filter_plugins
├── group_vars
├── host_vars
├── inventories
│   ├── aws_ec2.yml
│   ├── ec2.ini
│   ├── ec2.py
│   └── hosts
├── library
├── module_utils
├── notes.md
├── parameter_store_values.sh
├── playbooks
│   ├── 10_cfn_network.yml
│   ├── 20_cfn_compute.yml
│   ├── 30_web_config.yml
│   └── 40_integration_tests.yml
├── production
├── requirements.txt
├── roles
│   ├── cfn_compute
│   ├── cfn_network
│   ├── common
│   ├── httpd
│   └── integration_tests
├── site.yml
└── staging

The four roles include a role for provisioning the network, the cfn_network role. A role for configuring the compute resources, the cfn_compute role. A role for deploying and configuring the Apache servers, the httpd role. Finally, a role to perform final integration tests of the platform, the integration_tests role. The individual roles help separate the project’s major parts, network, compute, and middleware, into logical code files. Each role was initially built using Ansible Galaxy (ansible-galaxy init). They follow Galaxy’s standard file structure, as shown in the tree view below, of the cfn_network role.

.
├── README.md
├── defaults
│   └── main.yml
├── files
├── handlers
│   └── main.yml
├── meta
│   └── main.yml
├── tasks
│   ├── create.yml
│   ├── delete.yml
│   └── main.yml
├── templates
├── tests
│   ├── inventory
│   └── test.yml
└── vars
    └── main.yml

Testing Ansible Roles

In addition to checking each role during development and on each code commit with Ansible Lint, each role contains a set of unit tests, in the tests directory, to confirm the success or failure of the role’s tasks. Below we see a basic set of tests for the cfn_compute role. First, we gather Facts about the deployed EC2 instances. Facts information Ansible can automatically derive from your remote systems. We check the facts for expected properties of the running EC2 instances, including timezone, Operating System, major OS version, and the UserID. Note the use of the failed_when conditional. This Ansible playbook error handling conditional is used to confirm the success or failure of tasks.

---
- name: Test cfn_compute Ansible role
  gather_facts: True
  hosts: tag_Group_webservers

  pre_tasks:
  - name: List all ansible facts
    debug:
      msg: "{{ ansible_facts }}"

  tasks:
  - name: Check if EC2 instance's timezone is set to 'UTC'
    debug:
      msg: Timezone is UTC
    failed_when: ansible_facts['date_time']['tz'] != 'UTC'

  - name: Check if EC2 instance's OS is 'Amazon'
    debug:
      msg: OS is Amazon
    failed_when: ansible_facts['distribution_file_variety'] != 'Amazon'

  - name: Check if EC2 instance's OS major version is '2018'
    debug:
      msg: OS major version is 2018
    failed_when: ansible_facts['distribution_major_version'] != '2018'

  - name: Check if EC2 instance's UserID is 'ec2-user'
    debug:
      msg: UserID is ec2-user
    failed_when: ansible_facts['user_id'] != 'ec2-user'

If we were to run the test on their own, against the two correctly provisioned and configured EC2 web servers, we would see results similar to the following.

screen_shot_2019-07-26_at_6_55_04_pm

In the cfn_network role unit tests, below, note the use of the Ansible cloudformation_facts module. This module allows us to obtain facts about the successfully completed AWS CloudFormation stack. We can then use these facts to drive additional provisioning and configuration, or testing. In the task below, we get the network CloudFormation stack’s Outputs. These are the exact same values we would see in the stack’s Output tab, from the AWS CloudFormation management console.

---
- name: Test cfn_network Ansible role
  gather_facts: False
  hosts: localhost

  pre_tasks:
    - name: Get facts about the newly created cfn network stack
      cloudformation_facts:
        stack_name: "ansible-cfn-demo-network"
      register: cfn_network_stack_facts

    - name: List 'stack_outputs' from cached facts
      debug:
        msg: "{{ cloudformation['ansible-cfn-demo-network'].stack_outputs }}"

  tasks:
  - name: Check if the AWS Region of the VPC is {{ lookup('env','AWS_REGION') }}
    debug:
      msg: "AWS Region of the VPC is {{ lookup('env','AWS_REGION') }}"
    failed_when: cloudformation['ansible-cfn-demo-network'].stack_outputs['VpcRegion'] != lookup('env','AWS_REGION')

Similar to the CloudFormation templates, the Ansible roles require no modifications. Most of the project’s parameters are decoupled from the code and stored in Parameter Store or CodeBuild buildspec files (discussed next). The few parameters found in the roles, in the defaults/main.yml files are neither account- or environment-specific.

Ansible Playbooks

The roles will be called by our Ansible Playbooks. There is a create and a delete set of tasks for the cfn_network and cfn_compute roles. Either create or delete tasks are accessible through the role, using the main.yml file and referencing the create or delete Ansible Tags.

---
- import_tasks: create.yml
  tags:
    - create

- import_tasks: delete.yml
  tags:
    - delete

Below, we see the create tasks for the cfn_network role, create.yml, referenced above by main.yml. The use of the cloudcormation module in the first task allows us to create or delete AWS CloudFormation stacks and demonstrates the real power of Ansible—the ability to execute complex AWS resource provisioning, by extending its core functionality via a module. By switching the Cloud module, we could just as easily provision resources on Google Cloud, Azure, AliCloud, OpenStack, or VMWare, to name but a few.

---
- name: create a stack, pass in the template via an S3 URL
  cloudformation:
    stack_name: "{{ stack_name }}"
    state: present
    region: "{{ lookup('env','AWS_REGION') }}"
    disable_rollback: false
    template_url: "{{ lookup('env','TEMPLATE_URL') }}"
    template_parameters:
      VpcCIDR: "{{ lookup('env','VPC_CIDR') }}"
      PublicSubnet1CIDR: "{{ lookup('env','PUBLIC_SUBNET_1_CIDR') }}"
      PublicSubnet2CIDR: "{{ lookup('env','PUBLIC_SUBNET_2_CIDR') }}"
      PrivateSubnet1CIDR: "{{ lookup('env','PRIVATE_SUBNET_1_CIDR') }}"
      PrivateSubnet2CIDR: "{{ lookup('env','PRIVATE_SUBNET_2_CIDR') }}"
      TagEnv: "{{ lookup('env','TAG_ENVIRONMENT') }}"
    tags:
      Stack: "{{ stack_name }}"

The CloudFormation parameters in the above task are mainly derived from environment variables, whose values were retrieved from the Parameter Store by CodeBuild and set in the environment. We obtain these external values using Ansible’s Lookup Plugins. The stack_name variable’s value is derived from the role’s defaults/main.yml file. The task variables use the Python Jinja2 templating system style of encoding.

variables

The associated Ansible Playbooks, which call the tasks, are located in the playbooks directory, as shown previously in the tree view. The playbooks define a few required parameters, like where the list of hosts will be derived and calls the appropriate roles. For our simple demonstration, only a single role is called per playbook. Typically, in a larger project, you would call multiple roles from a single playbook. Below, we see the Network playbook, playbooks/10_cfn_network.yml, which calls the cfn_network role.

---
- name: Provision VPC and Subnets
  hosts: localhost
  connection: local
  gather_facts: False

  roles:
    - role: cfn_network

Dynamic Inventory

Another principal feature of Ansible is demonstrated in the Web Server Configuration playbook, playbooks/30_web_config.yml, shown below. Note the hosts to which we want to deploy and configure Apache HTTP Server is based on an AWS tag value, indicated by the reference to tag_Group_webservers. This indirectly refers to an AWS tag, named Group, with the value of webservers, which was applied to our EC2 hosts by CloudFormation. The ability to generate a Dynamic Inventory, using a dynamic external inventory system, is a key feature of Ansible.

---
- name: Configure Apache Web Servers
  hosts: tag_Group_webservers
  gather_facts: False
  become: yes
  become_method: sudo

  roles:
    - role: httpd

To generate a dynamic inventory of EC2 hosts, we are using the Ansible AWS EC2 Dynamic Inventory script, inventories/ec2.py and inventories/ec2.ini files. The script dynamically queries AWS for all the EC2 hosts containing specific AWS tags, belonging to a particular Security Group, Region, Availability Zone, and so forth.

I have customized the AWS EC2 Dynamic Inventory script’s configuration in the inventories/aws_ec2.yml file. Amongst other configuration items, the file defines  keyed_groups. This instructs the script to inventory EC2 hosts according to their unique AWS tags and tag values.

plugin: aws_ec2
remote_user: ec2-user
private_key_file: ~/.ssh/ansible
regions:
  - us-east-1
keyed_groups:
  - key: tags.Name
    prefix: tag_Name_
    separator: ''
  - key: tags.Group
    prefix: tag_Group_
    separator: ''
hostnames:
  - dns-name
  - ip-address
  - private-dns-name
  - private-ip-address
compose:
  ansible_host: ip_address

Once you have built the CloudFormation compute stack in the proceeding section of the demonstration, to build the dynamic EC2 inventory of hosts, you would use the following command.

ansible-inventory -i inventories/aws_ec2.yml --graph

You would then see an inventory of all your EC2 hosts, resembling the following.

@all:
  |--@aws_ec2:
  |  |--ec2-18-234-137-73.compute-1.amazonaws.com
  |  |--ec2-3-95-215-112.compute-1.amazonaws.com
  |--@tag_Group_webservers:
  |  |--ec2-18-234-137-73.compute-1.amazonaws.com
  |  |--ec2-3-95-215-112.compute-1.amazonaws.com
  |--@tag_Name_Apache_Web_Server:
  |  |--ec2-18-234-137-73.compute-1.amazonaws.com
  |  |--ec2-3-95-215-112.compute-1.amazonaws.com
  |--@ungrouped:

Note the two EC2 web servers instances, listed under tag_Group_webservers. They represent the target inventory onto which we will install Apache HTTP Server. We could also use the tag, Name, with the value tag_Name_Apache_Web_Server.

AWS CodeBuild

Recalling our diagram, you will note the use of CodeBuild is a vital part of each of our five DevOps workflows. CodeBuild is used to 1) validate the CloudFormation templates, 2) provision the network resources,  3) provision the compute resources, 4) install and configure the web servers, and 5) run integration tests.

aws_devops

Splitting these processes into separate workflows, we can redeploy the web servers without impacting the compute resources or redeploy the compute resources without affecting the network resources. Often, different teams within a large enterprise are responsible for each of these resources categories—architecture, security (IAM), network, compute, web servers, and code deployments. Separating concerns makes a shared ownership model easier to manage.

Build Specifications

CodeBuild projects rely on a build specification or buildspec file for its configuration, as shown below. CodeBuild’s buildspec file is synonymous to Jenkins’ Jenkinsfile. Each of our five workflows will use CodeBuild. Each CodeBuild project references a separate buildspec file, included in the two GitHub projects, which by now you have pushed to your two CodeCommit repositories.

screen_shot_2019-07-26_at_6_10_59_pm

Below we see an example of the buildspec file for the CodeBuild project that deploys our AWS network resources, buildspec_files/buildspec_network.yml.

version: 0.2

env:
  variables:
    TEMPLATE_URL: "https://s3.amazonaws.com/garystafford_cloud_formation/cf_demo/network-stack.template"
    AWS_REGION: "us-east-1"
    TAG_ENVIRONMENT: "ansible-cfn-demo"
  parameter-store:
    VPC_CIDR: "/ansible_demo/vpc_cidr"
    PUBLIC_SUBNET_1_CIDR: "/ansible_demo/public_subnet_1_cidr"
    PUBLIC_SUBNET_2_CIDR: "/ansible_demo/public_subnet_2_cidr"
    PRIVATE_SUBNET_1_CIDR: "/ansible_demo/private_subnet_1_cidr"
    PRIVATE_SUBNET_2_CIDR: "/ansible_demo/private_subnet_2_cidr"

phases:
  install:
    runtime-versions:
      python: 3.7
    commands:
      - pip install -r requirements.txt -q
  build:
    commands:
      - ansible-playbook -i inventories/aws_ec2.yml playbooks/10_cfn_network.yml --tags create  -v
  post_build:
    commands:
      - ansible-playbook -i inventories/aws_ec2.yml roles/cfn_network/tests/test.yml

There are several distinct sections to the buildspec file. First, in the variables section, we define our variables. They are a combination of three static variable values and five variable values retrieved from the Parameter Store. Any of these may be overwritten at build-time, using the AWS CLI, SDK, or from the CodeBuild management console. You will need to update some of the variables to match your particular environment, such as the TEMPLATE_URL to match your S3 bucket path.

Next, the phases of our build. Again, if you are familiar with Jenkins, think of these as Stages with multiple Steps. The first phase, install, builds a Docker container, in which the build process is executed. Here we are using Python 3.7. We also run a pip command to install the required Python packages from our requirements.txt file. Next, we perform our build phase by executing an Ansible command.

 ansible-playbook \
  -i inventories/aws_ec2.yml \
  playbooks/10_cfn_network.yml --tags create -v

The command calls our playbook, playbooks/10_cfn_network.yml. The command references the create tag. This causes the playbook to run to cfn_network role’s create tasks (roles/cfn_network/tasks/create.yml), as defined in the main.yml file (roles/cfn_network/tasks/main.yml). Lastly, in our post_build phase, we execute our role’s unit tests (roles/cfn_network/tests/test.yml), using a second Ansible command.

CodeBuild Projects

Next, we need to create CodeBuild projects. You can do this using the AWS CLI or from the CodeBuild management console (shown below). I have included individual templates and a creation script in each project, in the codebuild_projects directory, which you could use to build the projects, using the AWS CLI. You would have to modify the JSON templates, replacing all references to my specific, unique AWS resources, with your own. For the demonstration, I suggest creating the five projects manually in the CodeBuild management console, using the supplied CodeBuild project templates as a guide.

screen_shot_2019-07-26_at_6_10_12_pm

CodeBuild IAM Role

To execute our CodeBuild projects, we need an IAM Role or Roles CodeBuild with permission to such resources as CodeCommit, S3, and CloudWatch. For this demonstration, I chose to create a single IAM Role for all workflows. I then allowed CodeBuild to assign the required policies to the Role as needed, which is a feature of CodeBuild.

screen_shot_2019-07-26_at_6_52_23_pm

CodePipeline Pipeline

In addition to CodeBuild, we are using CodePipeline for our first of five workflows. CodePipeline validates the CloudFormation templates and pushes them to our S3 bucket. The pipeline calls the corresponding CodeBuild project to validate each template, then deploys the valid CloudFormation templates to S3.

codepipeline

In true CI/CD fashion, the pipeline is automatically executed every time source code from the CloudFormation project is committed to the CodeCommit repository.

screen_shot_2019-07-26_at_6_12_51_pm

CodePipeline calls CodeBuild, which performs a build, based its buildspec file. This particular CodeBuild buildspec file also demonstrates another ability of CodeBuild, executing an external script. When we have a complex build phase, we may choose to call an external script, such as a Bash or Python script, verses embedding the commands in the buildspec.

version: 0.2

phases:
  install:
    runtime-versions:
      python: 3.7
  pre_build:
    commands:
      - pip install -r requirements.txt -q
      - cfn-lint -v
  build:
    commands:
      - sh buildspec_files/build.sh

artifacts:
  files:
    - '**/*'
  base-directory: 'cfn_templates'
  discard-paths: yes

Below, we see the script that is called. Here we are using both the CloudFormation Linter, cfn-lint, and the cloudformation validate-template command to validate our templates for comparison. The two tools give slightly different, yet relevant, linting results.

#!/usr/bin/env bash

set -e

for filename in cfn_templates/*.*; do
    cfn-lint -t ${filename}
    aws cloudformation validate-template \
      --template-body file://${filename}
done

Similar to the CodeBuild project templates, I have included a CodePipeline template, in the codepipeline_pipelines directory, which you could modify and create using the AWS CLI. Alternatively, I suggest using the CodePipeline management console to create the pipeline for the demo, using the supplied CodePipeline template as a guide.

screen_shot_2019-07-26_at_6_11_51_pm

Below, the stage view of the final CodePipleine pipeline.

screen_shot_2019-07-26_at_6_12_26_pm

Build the Platform

With all the resources, code, and DevOps workflows in place, we should be ready to build our platform on AWS. The CodePipeline project comes first, to validate the CloudFormation templates and place them into your S3 bucket. Since you are probably not committing new code to the CloudFormation file CodeCommit repository,  which would trigger the pipeline, you can start the pipeline using the AWS CLI (shown below) or via the management console.

# list names of pipelines
aws codepipeline list-pipelines

# execute the validation pipeline
aws codepipeline start-pipeline-execution --name cfn-validate-s3

screen_shot_2019-07-26_at_6_08_03_pm

The pipeline should complete within a few seconds.

screen_shot_2019-07-26_at_10_12_53_pm.png

Next, execute each of the four CodeBuild projects in the following order.

# list the names of the projects
aws codebuild list-projects

# execute the builds in order
aws codebuild start-build --project-name cfn-network
aws codebuild start-build --project-name cfn-compute

# ensure EC2 instance checks are complete before starting
# the ansible-web-config build!
aws codebuild start-build --project-name ansible-web-config
aws codebuild start-build --project-name ansible-test

As the code comment above states, be careful not to start the ansible-web-config build until you have confirmed the EC2 instance Status Checks have completed and have passed, as shown below. The previous, cfn-compute build will complete when CloudFormation finishes building the new compute stack. However, the fact CloudFormation finished does not indicate that the EC2 instances are fully up and running. Failure to wait will result in a failed build of the ansible-web-config CodeBuild project, which installs and configures the Apache HTTP Servers.

screen_shot_2019-07-26_at_6_27_52_pm

Below, we see the cfn_network CodeBuild project first building a Python-based Docker container, within which to perform the build. Each build is executed in a fresh, separate Docker container, something that can trip you up if you are expecting a previous cache of Ansible Facts or previously defined environment variables, persisted across multiple builds.

screen_shot_2019-07-26_at_6_15_12_pm

Below, we see the two completed CloudFormation Stacks, a result of our CodeBuild projects and Ansible.

screen_shot_2019-07-26_at_6_44_43_pm

The fifth and final CodeBuild build tests our platform by attempting to hit the Apache HTTP Server’s default home page, using the Application Load Balancer’s public DNS name.

screen_shot_2019-07-26_at_6_32_09_pm

Below, we see an example of what happens when a build fails. In this case, one of the final integration tests failed to return the expected results from the ALB endpoint.

screen_shot_2019-07-26_at_6_40_37_pm

Below, with the bug is fixed, we rerun the build, which re-executed the tests, successfully.

screen_shot_2019-07-26_at_6_38_21_pm

We can manually confirm the platform is working by hitting the same public DNS name of the ALB as our tests in our browser. The request should load-balance our request to one of the two running web server’s default home page. Normally, at this point, you would deploy your application to Apache, using a software continuous deployment tool, such as Jenkins, CodeDeploy, Travis CI, TeamCity, or Bamboo.

screen_shot_2019-07-26_at_6_39_26_pm

Cleaning Up

To clean up the running AWS resources from the demonstration, first delete the CloudFormation compute stack, then delete the network stack. To do so, execute the following commands, one at a time. The commands call the same playbooks we called to create the stacks, except this time, we use the delete tag, as opposed to the create tag.

# first delete cfn compute stack
ansible-playbook \ 
  -i inventories/aws_ec2.yml \ 
  playbooks/20_cfn_compute.yml -t delete -v

# then delete cfn network stack
ansible-playbook \ 
  -i inventories/aws_ec2.yml \ 
  playbooks/10_cfn_network.yml -t delete -v

You should observe the following output, indicating both CloudFormation stacks have been deleted.

screen_shot_2019-07-26_at_7_12_38_pm

Confirm the stacks were deleted from the CloudFormation management console or from the AWS CLI.

 

All opinions expressed in this post are my own and not necessarily the views of my current or past employers or their clients.

, , , , , , , , , , , , ,

Leave a comment

Building Asynchronous, Serverless Alexa Skills with AWS Lambda, DynamoDB, S3, and Node.js

Introduction

In the following post, we will use the new version 2 of the Alexa Skills Kit, AWS Lambda, Amazon DynamoDB, Amazon S3, and the latest LTS version Node.js, to create an Alexa Custom Skill. According to Amazon, a custom skill allows you to define the requests the skill can handle (intents) and the words users say to invoke those requests (utterances).

If you want to compare the development of an Alexa Custom Skill with those of Google and Azure, in addition to this post, please read my previous two posts in this series, Building and Integrating LUIS-enabled Chatbots with Slack, using Azure Bot Service, Bot Builder SDK, and Cosmos DB and Building Serverless Actions for Google Assistant with Google Cloud Functions, Cloud Datastore, and Cloud Storage. All three of the article’s demonstrations are written in Node.js, all three leverage their cloud platform’s machine learning-based Natural Language Understanding services, and all three take advantage of NoSQL database and storage services available on their respective cloud platforms.

AWS Technologies

The final high-level architecture of our Alexa Custom Skill will look as follows.

Alexa Skill Final Architecture v2.png

Here is a brief overview of the key AWS technologies we will incorporate into our Skill’s architecture.

Alexa Skills Kit

According to Amazon, the Alexa Skills Kit (ASK) is a collection of self-service APIs, tools, documentation, and code samples that makes it possible to add skills to Alexa. The Alexa Skills Kit supports building different types of skills. Currently, Alexa skill types include Custom, Smart Home, Video, Flash Briefing, and List Skills. Each skill type makes use of a different Alexa Skill API.

AWS Serverless Platform

To create a custom skill for Alexa, you currently have the choice of using an AWS Lambda function or a web service. The AWS Lambda is part of an ecosystem of Cloud services and Developer tools, Amazon refers to as the AWS Serverless Platform. The platform’s services are designed to support the development and hosting of highly-performant, enterprise-grade serverless applications.

In this post, we will leverage three of the AWS Serverless Platform’s services, including Amazon DynamoDB, Amazon Simple Storage Service (Amazon S3), and AWS Lambda.

Node.js

AWS Lamba supports multiple programming languages, including Node.js (JavaScript), Python, Java (Java 8 compatible), and C# (.NET Core) and Go. All are excellent choices for writing modern serverless functions. For this post, we will use Node.js. According to Node.js Foundation, Node.js is an asynchronous event-driven JavaScript runtime built on Chrome’s V8 JavaScript engine.

In April 2018, AWS Lamba announced support for the Node.js 8.10 runtime, which is the current Long Term Support (LTS) version of Node.js. Node 8, also known as Project Carbon, was the first LTS version of Node to support async/await with Promises. Async/await is the new way of handling asynchronous operations in Node.js. We will make use of async/await and Promises with the custom skill.

Demonstration

To demonstrate Alexa Custom Skills we will build an informational skill that responds to the user with interesting facts about Azure¹, Microsoft’s Cloud computing platform (Alexa talking about Azure, ironic, I know). This is not an official Microsoft skill; it is only used for this demonstration and has not been published.

Source Code

All open-source code for this post can be found on GitHub. Code samples in this post are displayed as GitHub Gists, which may not display correctly on some mobile and social media browsers. Links to gists are also provided.

Important, this post and the associated source code were updated from v1.0 to v2.0 on 13 August 2018. You should clone the GitHub project again, to correspond with this revised post, if you originally cloned the project before 14 August 2018. Code changes were significant.

Objectives

This objective of the fact-based skill will be to demonstrate the following.

  • Build, deploy, and test an Alexa Custom Skill using AWS Lambda and Node.js;
  • Use DynamoDB to store and retrieve Alexa voice responses;
  • Maintain a count of user’s questions in DynamoDB using atomic counters;
  • Use Amazon S3 to store and retrieve images, used in Display Cards;
  • Log Alexa Skill activities using Amazon CloudWatch Logs;

Steps to Build

Building the Azure fact skill will involve the following steps.

  • Design the Alexa skill’s voice interaction model;
  • Design the skill’s Display Cards for Alexa-enabled products, to enhance the voice experience;
  • Create the skill’s DynamoDB table and import the responses the skill will return;
  • Create an S3 bucket and upload the images used for the Display Cards;
  • Write the Alexa Skill, which involves mapping the user’s spoken input to the intents your cloud-based service can handle;
  • Write the Lambda function, which involves responding to the user’s utterances, by building and returning appropriate voice and display card responses, from DynamoDB and S3;
  • Extend the default ASK-generated AWS IAM Role, to allow the Lambda to update DynamoDB;
  • Deploy the skill;
  • Test the skill;

Let’s explore each step in detail.

Voice Interaction Model

First, we must design the fact skill’s voice interaction model. We need to consider the way we want the user to interact with the skill. What is the user’s conversational journey? How do they invoke your skill? How will the user provide intent?

This skill will require two intent slot values, the fact the user is interested in (i.e. ‘global infrastructure’) and the user’s first name (i.e. ‘Susan’). We will train the skill to allow Alexa to query the user for each slot value, but also allow the user to provide either or both values in the initial intent invocation. We will also allow the user to request a random fact.

Shown below in the Alexa Skills Kit Development Console Test tab are three examples of interactions the skill is trained to understand and handle:

  1. The first example on the left invokes the skill with no intent (‘Alexa, load Azure Tech Facts). The user is led through a series of three questions to obtain the full intent.
  2. The center example is similar, however, the initial invocation contains a partial intent (‘Alexa, ask Azure Tech Facts for a fact about certifications’). Alexa must still ask for the user’s name.
  3. Lastly, the example on the right is a so-called ‘one-shot’ invocation (‘Alexa, ask Azure Tech Facts about Azure’s platforms for Gary’). The user’s invocation of the skill contains a complete intent, allowing Alexa to respond immediately with a fact about Azure platforms.

alexa-skill-post-020

In all cases, our skill has the ability to continue to provide the user with additional facts if they chose, or they may cancel at any time.

We also need to design how Alexa will respond. What is the persona will Alexa assume through her words, phrases, and use of Speech Synthesis Markup Language (SSML).

User Interaction Previews

Here are a few examples of interactions with the final Alexa skill using an iPhone 8 and the Alexa App. They are intended to show the rich conversational capabilities of custom skills more so the than the display, which is pretty poor on the Alexa App as compared to the Echo Show or even Echo Spot.

Example 1: Indirect Invocation

The first example shows a basic interaction with our Alexa skill. It demonstrates an indirect invocation, a user utterance without initial intent. It also illustrates several variations of user utterances (YouTube).

Example 2: Direct Invocation

The second example of an interaction our skill demonstrates a direct invocation, in which the initial user utterance contains intent. It also demonstrates the user following up with additional requests (YouTube).

Example 3: Direct Invocation, Help, Problem

Lastly, another direct invocation demonstrates the use of the Help Intent. You also see an example of when Alexa does not understand the user’s utterance.  The user is able to repeat their request, more clearly (YouTube).

Visual Interaction Model

Many Alexa-enabled devices are capable of both vocal and visual responses. Designing for a multimodal user experience is important. The instructional skill will provide vocal responses, as well as Display Cards optimized for the Amazon Echo Show. The skill contains a basic design for the Display Card shown during the initial invocation, where there is no intent uttered by the user.

alexa-skill-post-021

The fact skill also contains a Display Card, designed to present the final Alexa response to the user’s intent. The content of the vocal and visual response is returned from DynamoDB via the Lambda function. The random Azure icons, available from Microsoft, are hosted in an S3 bucket. Each fact response is unique, as well as the icon associated with the fact.

alexa-skill-post-022

The Display Cards will also work on other Alexa-enabled screen-based products. Shown below is the same card on an iPhone 8 using the Amazon Alexa app. This is the same app shown in the videos, above.

alexa-skill-post-027

DynamoDB

Next, we create the DynamoDB table used to store the facts the Alexa skill will respond with when invoked by the user. DynamoDB is Amazon’s non-relational database that delivers reliable performance at any scale. DynamoDB consists of three basic components: tables, items, and attributes.

There are numerous ways to create a DynamoDB table. For simplicity, I created the AzureFacts DynamoDB table using the AWS CLI (gist). You could also choose CloudFormation, or create the table using any of nine or more programming languages with an AWS SDK.

The AzureFacts table’s schema has four key/value pair attributes per item: Fact, Response, Image, and Hits. The Fact attribute, a string, contains the name of the fact the user is seeking. The Fact attribute also serves as the table’s unique partition key. The Response attribute, a string, contains the conversational response Alexa will return. The Image attribute, a string, contains the name of the image in the S3 bucket displayed by Alexa. Lastly, the Hits attribute, a number, stores the number of user requests for a particular fact.

Importing Table Items

After the DynamoDB table is created, the pre-defined facts are imported into the empty table using AWS CLI (gist). The JSON-formatted data file, AzureFacts.json, is included with the source code on GitHub.

The resulting table should appear as follows in the AWS Management Console.

alexa-skill-post-004

Note the imported items shown below. The Hits counts reflect the number of times each fact has been requested.

alexa-skill-post-005

Shown below is a detailed view of a single item that was imported into the DynamoDB table.

alexa-skill-post-006

Amazon S3 Image Bucket

Next, we create the Amazon S3 bucket, which will house the images, actually Azure icons as PNGs, returned by Alexa with each fact. Again, I used the AWS CLI for simplicity (gist).

The images can be uploaded manually to the bucket through a web browser, or programmatically, using the AWS CLI or SDKs. You will need to ensure the images are made public so they can be displayed by Alexa.

alexa-skill-post-007

Alexa Skill

Next, we create the actual Alexa custom skill. I have used version 2 of the Alexa Skills Kit (ASK) Software Development Kit (SDK) for Node.js and the new ASK Command Line Interface (ASK CLI) to create the skill. The ASK SDK v2 for Node.js was recently released in April 2018. If you have previously written Alexa skills using version 1 of the Node.js SDK, the creation of a new project and the format of the Lambda Node.js code is somewhat different. I strongly suggest reviewing the example skills provided by Amazon on GitHub.

With version 1, I would have likely used the Alexa Skills Kit Development Console to develop and deploy the skill, and separate IDE, like JetBrains WebStorm, to write the Lambda. The JSON-format skill would live in the Alexa Skills Kit Development Console, and my Lambda in source control. I would have used AWS Serverless Application Model (AWS SAM) or Claudia.js to handle the deployment of Lambda functions.

With version 2 of ASK, you can easily create and manage the Alexa skill’s JSON-formatted code, as well as the Lambda, all from the command-line and a single IDE or text editor. All components that comprise the skill can be kept together in source control. I now only use the Alexa Skills Kit Development Console to preview my deployed skill and for testing. I am not going to go into detail about creating a new project using the ASK CLI, I suggest reviewing Amazon’s instructional guides.

Below, I have initiated a new AWS profile for the Alexa skill using the ask init command.

alexa-skill-post-008

There are three main parts to the new skill project created by the ASK CLI: the skill’s manifest (skill.json), model(s) (en-US.json), and API endpoint, the Lambda (index.js). The skill’s manifest, skill.json, contains information (metadata) about the skill. This is the same information you find in the Distribution tab of the Alexa Skills Kit Development Console. The manifest includes publishing information, example phrases to invoke the skill, the skill’s category, distribution locales, privacy information, and the location of the skill’s API endpoint, the Lambda. An end-user would most commonly see this information in Amazon Alexa app when adding skills to their Alexa-enabled devices.

alexa-skill-post-026

Next, the skill’s model, en-US.json, is located the models sub-directory. This file defines the skill’s custom interaction model, it contains the skill’s interaction model written in JSON, which includes the invocation name, intents, standard and custom slots, sample utterances, slot values, and synonyms of those values. This is the same information you would find in the Build tab of the Alexa Skills Kit Development Console. Amazon has an excellent guide to creating your custom skill’s interaction model.

Intents and Intent Slots

The skill’s custom interaction model contains the AzureFactsIntent intent, along with the boilerplate Cancel, Help and Stop intents. The AzureFactsIntent intent contains two intent slots, myName and myQuestion. The myName intent slot is a standard AMAZON.US_FIRST_NAME slot type. According to Amazon, this slot type understands thousands of popular first names commonly used by speakers in the United States. Shown below, I have included a short list of sample utterances in the intent model, which helps improve voice recognition for Alexa (gist).

Custom Slot Types and Entities

The myQuestion intent slot is a custom slot type. According to Amazon, a custom slot type defines a list of representative values for the slot. The myQuestion slot contains all the available facts the custom instructional skill understands and can retrieve from DynamoDB. Like myName, the user can provide the fact intent in various ways (gist).

This slot also contains synonyms for each fact. Collectively, the slot value, it’s synonyms, and the optional ID are collectively referred to as an Entity. According to Amazon, entity resolution improves the way Alexa matches possible slot values in a user’s utterance with the slots defined in the skill’s interaction model.

An example of an entity in the myQuestion custom slot type is ‘competition’. A user can ask Alexa to tell them about Azure’s competition. The slot value ‘competition’ returns a fact about Azure’s leading competitors, as reported on the G2 Crowd website’s Microsoft Azure Alternatives & Competitors page. However, the user might also substitute the words ‘competitor’ or ‘competitors’ for ‘competition’. Using synonyms, if the user utters any of these three words in their intent, they will receive the same response from Alexa (gist).

Lambda

Initializing a skill with the ASK CLI also creates the default API endpoint, a Lambda (index.js). The serverless Lambda function is written in Node.js 8.10. As mentioned in the Introduction, AWS recently announced support for the Node.js 8.10 runtime, in April. This is the first LTS version of Node to support async/await with Promises. Node’s async/await is the new way of handling asynchronous operations in Node.js.

The layout of the custom skill’s Lambda’s code closely follows the custom Alexa Fact Skill example. I suggest closely reviewing this example. The Lambda has four main sections: constants, setup code, intent handlers, and helper functions.

In addition to the boilerplate Help, Stop, Error, and Session intent handlers, there are the LaunchRequestHandler and the AzureFactsIntent handlers. According to Amazon, a LaunchRequestHandler fires when the Lambda receives a LaunchRequest from Alexa, in which the user invokes the skill with the invocation name, but does not provide any command mapping to an intent.

The AzureFactsIntent aligns with the custom intent we defined in the skill’s model (en-US.json), of the same name. This handler handles an IntentRequest from Alexa. This handler and the buildFactResponse function the handler calls are what translate a request for a fact from the user into a request to DynamoDB for a response.

The AzureFactsIntent handler checks the IntentRequest for both the myName and myQuestion slot values. If the values are unfulfilled, the AzureFactsIntent handler delegates responsibility back to Alexa, using a Dialog delegate directive (addDelegateDirective). Alexa then requests the slot values from the user in a conversational interaction. Alexa then calls the AzureFactsIntent handler again (gist).

Once both slot values are received by the AzureFactsIntent handler, it calls the buildFactResponse function, passing in the myName and myQuestion slot values. In turn, the buildFactResponse function calls AWS.DynamoDB.DocumentClient.update. The DynamoDB update returns a callback. In turn, the buildFactResponse function returns a Promise, a standard built-in object type, part of the JavaScript ES2015 spec (gist).

What is unique about the DynamoDB update call in this case, is it actually performs two functions. First, it implements an Atomic Counter. According to AWS, an atomic counter is a numeric DynamoDB attribute that is incremented, unconditionally, without interfering with other write requests. The update increments the numeric Hits attribute of the requested fact by exactly one. Secondly, the update returns the DynamoDB item. We can increment the count and get the response in a single call.

The buildFactResponse function’s Promise returns the DynamoDB item, a JSON object, from the callback. An example of a JSON response payload is shown below. (gist).

The AzureFactsIntent handler uses the async/await methods to perform the call to the buildFactResponse function. Note line 7 of the AzureFactsIntent handler below, where the async method is applied directly to the handler. Note line 33 where the await method is used with the call to the buildFactResponse function (gist).

The AzureFactsIntent handler awaits the Promise from the buildFactResponse function. In an async function, you can await for any Promise or catch its rejection cause. If the update callback and the ensuing Promise were both returned successfully, the AzureFactsIntent handler returns both a vocal and visual response to Alexa.

AWS IAM Role

By default, an AWS IAM Role was created by ASK when the project was initialized, the ask-lambda-alexa-skill-azure-facts role. This role is automatically associated with the AWS Managed Policy, AWSLambdaBasicExecutionRole. This managed policy simply allows the skill’s Lambda function to create Amazon CloudWatch Events (gist).

For the skill’s Lambda to read and write to DynamoDB, we must extend the default role’s permissions, by adding an additional policy. I have created a new AzureFacts_Alexa_Skill IAM Policy, which allows the associated role to get and update items from the AzureFacts DynamoDB table, and that is it. The role only has access to two of forty possible DynamoDB actions, and only for the AzureFacts table, and nothing else. Following the principle of Least Privilege is a cornerstone of AWS Security (gist).

Below, we see the new IAM Policy in the AWS Management Console.

alexa-skill-post-011

Below, we see the policy being applied to the skill’s IAM Role, along with the original AWS managed policy.

alexa-skill-post-012

Deploying the Skill

Version 2 of the ASK CLI makes deploying the Alexa custom skill very easy. Using the ASK CLI’s deploy command, we can validate and deploy the skill (manifest),  model, and Lambda, all at once, as shown below. This makes DevOps automation of skill deployments with tools like Jenkins or AWS CodeDeploy straight-forward.

alexa-skill-post-009

You can verify the skill has been deployed, from the Alexa Skills Kit Development Console. You should observe the skill’s model (intents, slots, entities, and endpoints) in the Build tab. You should observe the skill’s publishing details in the Distribution tab. Note deploying the skill does not submit the skill to Amazon’s for review and publishing, you must still submit the skill separately.

alexa-skill-post-013

From the AWS Lambda Management Console, you should observe the skill’s Lambda was deployed. You should observe only the skill can trigger the Lambda. Lastly, you should observe that the correct IAM Role was applied to the Lambda, giving the Lambda access to Amazon CloudWatch Logs and Amazon DynamoDB.

alexa-skill-post-010

Testing the Skill

The ASK CLI comes with the simulate command. According to Amazon, the simulate command simulates an invocation of the skill with text-based input. Again, the ASK CLI makes DevOps test automation with tools like Jenkins or AWS CodeDeploy pretty easy (gist).

Below, are the results of simulating the invocation. The simulate command returns the expected verbal response, including any SSML, and the visual responses (the Display Card). You could easily write an automation script to run a battery of these tests on every code commit, and prior to deployment.

alexa-skill-post-024

I also like to manually test my skills from the Alexa Skills Kit Development Console Test tab. You may invoke the skill using your voice or by typing the skill invocation.

alexa-skill-post-014

The Alexa Skills Kit Development Console Test tab both shows and speaks Alexa’s response. The console also displays the request and response body (JSON input/output), as well as the Display Card for an Echo Show and Echo Spot.

alexa-skill-post-015

Lastly, the Alexa Skills Kit Development Console Test tab displays the Device Log. The log captures Alexa Directives and Events. I have found the Device Log to be very helpful in troubleshooting problems with deployed skills.

alexa-skill-post-025.png

CloudWatch Logs

By default the custom skill outputs events to CloudWatch Logs. I have added the DynamoDB callback payload, as well as the slot values of myName and myQuestion to the logs, for each successful Alexa response. CloudWatch logs, like the Device Logs above, are very helpful in troubleshooting problems with deployed skills.

alexa-skill-post-016

Conclusion

In this brief post, we have seen how to use the new ASK SDK/CLI version 2, services from the AWS Serverless Platform, and the LTS version of Node.js, to create an Alexa Custom Skill. Using the AWS Serverless Platform, we could easily extend the example to take advantage of additional serverless services, such as the use of Amazon SNS and SQS for notifications and messaging and Amazon Kinesis for analytics.

In a future post, we will extend this example, adding the capability to securely add and update our DynamoDB table’s items. We will use addition AWS services, including Amazon Cognito to authorize access to our API. We will also use AWS API Gateway to integrate with our Lambdas, producing a completely serverless API.

¹Azure is a trademark of Microsoft

All opinions expressed in this post are my own and not necessarily the views of my current or past employers or their clients.

, , , , , , , , , , , ,

4 Comments