Using Amazon Polly Text-to-Speech Service to Expand your Blog’s Audience

Introduction

Writing a blog about your latest product’s features? Case studies on your latest customer integrations? Opinions and insights into your industry, or your customers’ industries? Well-written blog posts not only inform, they can also showcase your or your organization’s experience and expertise. However, if you blog on a regular basis, you know there is a lot of content out there competing for a reader’s attention. Your audience’s interest in your post can be short-lived.

A great way to extend the reach of your posts and maintain interest longer is to produce an audio version. Audio versions can be included along with the written post. Audio versions may also be shared separately on popular services such as YouTube and SoundCloud. Many multi-tasking readers may prefer to listen to a post rather than read it. I often listen while commuting, working out, or running.

Amazon Polly’s text-to-speech (TTS) capabilities make it easy to convert a written post to a lifelike, professional-sounding audio version. In this brief post, we will learn how to use Amazon Polly to convert blog posts to audio.

Preparing Post for Audio

Most posts only require minor modifications to optimize them for conversion to audio.

  • Adding an opening statement to your post, like ‘Audio version of the post’, followed by the post’s title, provides a good way to start the audio. For example, ‘Audio Introduction to: Getting Started with Data Analytics using Jupyter Notebooks, PySpark, and Docker’.
  • If your post includes code, you probably want to exclude those sections. If the post contains a large amount of code, you might consider only creating an audio introduction to the post.
  • If your post contains graphs or charts that are referenced in the text, I suggest adding text, such as ‘See the Chart’, along with the caption of the graph or chart. For example, ‘See the Chart – AWS Marketplace: Product Delivery Methods (February 2020)’.
  • Create a simple URL for your post and add it to the audio, either at the beginning or end of the post. For example, ‘To read the full version of this post, including code samples, please go to tiny url dot com forward slash streaming warehouse’.
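
The preparation steps above can be partly scripted. Below is a minimal bash sketch, assuming the post body has already been saved as plain text with its code sections removed. The function name prepare_audio_text and its three arguments are hypothetical, chosen only for illustration; the opening and closing wording follows the examples in the list.

```shell
# Hypothetical helper: wrap a plain-text post body with a spoken opening
# statement and a closing pointer to the full post.
#   $1 = post title
#   $2 = URL written the way it should be read aloud
#   $3 = path to the plain-text body (code sections already removed)
prepare_audio_text() {
  local title="$1" spoken_url="$2" body_file="$3"
  # Opening statement followed by a blank line.
  printf 'Audio version of the post: %s.\n\n' "$title"
  # The post body, unchanged.
  cat "$body_file"
  # Closing statement with the spoken form of the URL.
  printf '\nTo read the full version of this post, including code samples, please go to %s.\n' "$spoken_url"
}

# Example usage (file names are placeholders):
# prepare_audio_text "My Post Title" \
#   "tiny url dot com forward slash streaming warehouse" \
#   post_body.txt > blog_text_file.txt
```

The result is ordinary text, so it can be pasted into the Console or passed to the CLI like any other prepared post.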

Custom Lexicons

Amazon Polly offers the ability to use custom lexicons, or vocabularies. According to AWS, you can modify the pronunciation of particular words, such as company names, acronyms, foreign words, and neologisms. If you write industry-specific or highly technical blogs, you will find creating a lexicon is probably necessary to ensure your accompanying audio sounds accurate. In my own technical posts, I most often use a custom lexicon file for acronyms and company names. While many acronyms are spelled out, others are not and have unique pronunciations. Likewise, many company names have a unique pronunciation.

Take, for example, the following acronyms, which I used in my last few posts: PaaS, BYOL, ELA, PAYG, IPv4, IPv6, IAM, ENI. Using the default lexicon of Amazon Polly, we end up with incorrect pronunciations for all these acronyms.

After we apply a custom lexicon, the same acronyms are pronounced as intended.

Lexicons must conform to the Pronunciation Lexicon Specification (PLS) W3C recommendation. The lexicon files are in XML format. Below is a snippet of a sample lexicon file.


<?xml version="1.0" encoding="UTF-8"?>
<lexicon
    version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
    alphabet="ipa"
    xml:lang="en-US">
  <lexeme>
    <grapheme>PaaS</grapheme>
    <alias>pass</alias>
  </lexeme>
  <lexeme>
    <grapheme>BYOL</grapheme>
    <alias>b-y-o-l</alias>
  </lexeme>
  <lexeme>
    <grapheme>ELA</grapheme>
    <alias>e-l-a</alias>
  </lexeme>
  <lexeme>
    <grapheme>PAYG</grapheme>
    <alias>p-a-y-g</alias>
  </lexeme>
  <lexeme>
    <grapheme>IPv4</grapheme>
    <alias>i-p-v-4</alias>
  </lexeme>
  <lexeme>
    <grapheme>IPv6</grapheme>
    <alias>i-p-v-6</alias>
  </lexeme>
  <lexeme>
    <grapheme>IAM</grapheme>
    <alias>i-a-m</alias>
  </lexeme>
  <lexeme>
    <grapheme>ENI</grapheme>
    <alias>e-n-i</alias>
  </lexeme>
</lexicon>
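
Most of the entries above follow one pattern: the alias is the acronym lowercased and spelled out with hyphens. As a rough bash sketch of how such entries could be generated, the hypothetical helpers spell_out and lexeme below print one lexeme element per term. They are illustrative only, perform no XML escaping, and assume the terms contain no characters such as & or <.

```shell
# Hypothetical helpers for building <lexeme> entries like those above.

# spell_out: lowercase a term and put a hyphen between its characters,
# matching the alias style used for BYOL, IPv4, and similar entries.
spell_out() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/./&-/g; s/-$//'
}

# lexeme: print one <lexeme> element.
#   $1 = grapheme (the term as written)
#   $2 = optional alias (how to say it); defaults to the spelled-out form
lexeme() {
  local grapheme="$1"
  local alias_text="${2:-}"
  [ -n "$alias_text" ] || alias_text=$(spell_out "$grapheme")
  printf '<lexeme>\n  <grapheme>%s</grapheme>\n  <alias>%s</alias>\n</lexeme>\n' \
    "$grapheme" "$alias_text"
}

lexeme BYOL        # alias becomes b-y-o-l
lexeme PaaS pass   # explicit alias, since PaaS is spoken as a word
```

The printed elements still need to be placed inside a lexicon root element like the one in the sample file.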

Amazon Polly Console

Amazon Polly supports synthesizing speech from either plain text or SSML input. From the Amazon Polly Management Console, copy and paste your post’s prepared text into the ‘Plain text’ tab.

Next, choose the Voice Engine. If you are using English, I suggest ‘Neural’. According to AWS, Amazon Polly has a Neural TTS (NTTS) system that can produce even higher quality voices than its standard voices. The NTTS system produces the most natural and human-like text-to-speech voices possible.

Choose your Language and Region. Then, select your Voice; I prefer ‘Joanna’. In my opinion, her voice has a natural, lifelike sound. If you prefer a male voice, ‘Matthew’ is quite natural sounding. Lastly, upload your lexicon file(s).

To start the process, choose ‘Synthesize to S3’. Specify the S3 bucket where the MP3 audio file should be written. You can also add a key prefix to the MP3 files. For most average-length posts, the text synthesis process takes less than one minute. To be notified when the task completes, you can include an Amazon SNS topic ARN. Select ‘Synthesize’.

Polly creates a synthesis task.

The synthesis tasks may be viewed from the ‘S3 synthesis tasks’ tab.

Once the synthesis task is complete, the resulting mp3 audio file may be viewed and downloaded from the S3 Management Console. If you are using a Mac, QuickTime Player works great to review the audio file.

AWS CLI and SDK

Amazon Polly may also be used from the AWS CLI or using the AWS SDK. In the example below, we have replicated the same operations performed in the Console, this time using the AWS CLI. First, upload your lexicon file(s) using the polly put-lexicon command. Each lexicon can only be up to 4,000 characters in size. Then call the polly start-speech-synthesis-task command to create a synthesis task.


# Placeholder values; replace with your own file, bucket, and prefix.
# Bucket names may not contain underscores.
TEXT_FILE_CONTENTS=$(cat path/to/my/blog_text_file.txt)
OUTPUT_BUCKET=my-bucket-name
TOPIC=blog

# Upload the custom lexicons.
aws polly put-lexicon \
  --name blogvocab \
  --content file://path/to/my/blogvocab.pls

aws polly put-lexicon \
  --name techterms \
  --content file://path/to/my/techterms.pls

# Create the synthesis task; the text is quoted so it is passed as one argument.
aws polly start-speech-synthesis-task \
  --engine neural \
  --language-code en-US \
  --lexicon-names blogvocab techterms \
  --output-format mp3 \
  --output-s3-bucket-name "${OUTPUT_BUCKET}" \
  --output-s3-key-prefix "${TOPIC}" \
  --text "${TEXT_FILE_CONTENTS}" \
  --text-type text \
  --voice-id Joanna
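
Because each lexicon is limited to 4,000 characters, it can help to check a file's size before uploading it. The following is a minimal bash sketch, assuming a hypothetical helper named check_lexicon_size. The limit is taken from the text above, and wc -m counts characters according to the current locale, so treat the result as an approximation of what Polly will count.

```shell
# Hypothetical pre-flight check: fail if a lexicon file exceeds the
# 4,000-character limit before calling `aws polly put-lexicon`.
MAX_LEXICON_CHARS=4000

check_lexicon_size() {
  local file="$1" chars
  if [ ! -r "$file" ]; then
    echo "$file: cannot read file" >&2
    return 2
  fi
  # Arithmetic expansion strips the padding some `wc` implementations emit.
  chars=$(( $(wc -m < "$file") ))
  if [ "$chars" -gt "$MAX_LEXICON_CHARS" ]; then
    echo "$file: $chars characters, over the $MAX_LEXICON_CHARS limit" >&2
    return 1
  fi
  echo "$file: $chars characters, OK"
}

# Example: upload only if the size check passes (paths are placeholders).
# check_lexicon_size path/to/my/blogvocab.pls &&
#   aws polly put-lexicon --name blogvocab --content file://path/to/my/blogvocab.pls
```

A lexicon that fails the check can be split into several smaller files, since the synthesis command accepts more than one lexicon name.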

The output should look similar to the screenshot below. The resulting audio will be identical to that produced using the Console.

screen_shot_2020-03-25_at_10.56.05_pm

You can check the task’s results using the polly list-speech-synthesis-tasks command.
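
To wait for a single task rather than list all of them, the get-speech-synthesis-task command can be polled with the task ID printed when the task was created. The sketch below is one possible approach in bash, not the only one; the function name wait_for_polly_task, the polling interval, and the retry limit are all assumptions made for illustration.

```shell
# Hypothetical helper: poll a synthesis task until it completes or fails.
#   $1 = task ID (the TaskId field returned by start-speech-synthesis-task)
#   $2 = seconds between polls (default 5)
#   $3 = maximum number of polls (default 60)
# Returns 0 on completed, 1 on failed, 2 if the CLI call errors, 3 on timeout.
wait_for_polly_task() {
  local task_id="$1" interval="${2:-5}" max_polls="${3:-60}" status i=0
  while [ "$i" -lt "$max_polls" ]; do
    # Ask only for the status field, as plain text.
    status=$(aws polly get-speech-synthesis-task \
      --task-id "$task_id" \
      --query 'SynthesisTask.TaskStatus' \
      --output text) || return 2
    case "$status" in
      completed) echo "completed"; return 0 ;;
      failed)    echo "failed" >&2; return 1 ;;
    esac
    i=$((i + 1))
    sleep "$interval"
  done
  echo "timed out waiting for task $task_id" >&2
  return 3
}

# Example usage (the task ID is a placeholder):
# wait_for_polly_task "your-task-id" && echo "Audio file is ready in S3"
```

For a quick overview instead, the list command accepts a status filter, for example aws polly list-speech-synthesis-tasks --status completed.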

Conclusion

In this brief post, we saw a great use case for Amazon Polly: converting written blog posts into audio. Creating audio versions of your posts is a great way to extend their reach to a potentially new audience and to hold your current audience’s interest a little longer. Amazon Polly has several other features and capabilities to explore.

This blog represents my own viewpoints and not those of my employer, Amazon Web Services.
