Tinkering with Voice Apps During Quarantine

I guess the upside to being in a global pandemic is it gives some of us an opportunity to try something new. Tinkering with new technology platforms can be a great way to make the most of being stuck at home.

Custom Voice experiences for Amazon Echo and Google Home have been around for a few years and Rocket has built dozens of experiences for our clients. If you are interested in building your own voice experience, here are a few tips we’ve learned along the way.

Use Cocktail Party Etiquette to Determine Session Length

We’ve all attended a party (back when parties were a thing) and gotten stuck in a conversation that goes on too long. That same dynamic plays out in Voice experiences, so it’s really important to measure, test and refine your overall session length. You don’t want your Voice experience to be the bore of the party. Both Google and Amazon offer built-in analytics that show where users drop off. For every experience, you’ll see the point where you’ve asked users too many questions and they close the session. We’ve found that five questions, within an overall session length under 60 seconds, is the sweet spot to keep users engaged.
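One way to enforce that limit is to count questions in the session and wrap up once the budget is spent. Here is a minimal sketch in Python, assuming your skill keeps a plain dictionary of session attributes; the `next_turn` function and the `questions_asked` key are names we made up for illustration, not part of any SDK.

```python
MAX_QUESTIONS = 5  # the sweet spot from our testing; tune it against your own analytics


def next_turn(session_attributes: dict, next_question: str) -> dict:
    """Ask the next question, or wrap up once the question budget is spent."""
    asked = session_attributes.get("questions_asked", 0)
    if asked >= MAX_QUESTIONS:
        # Budget spent: say goodbye instead of asking yet another question.
        return {
            "speech": "Thanks, that's all I need. Goodbye!",
            "end_session": True,
            "attributes": session_attributes,
        }
    # Copy the attributes so the caller's dictionary is not mutated.
    updated = {**session_attributes, "questions_asked": asked + 1}
    return {"speech": next_question, "end_session": False, "attributes": updated}
```

Treat the cap as a starting point and adjust it once your own drop-off numbers come in.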

Audio Clips Drive Engagement

The digital assistants are pretty darn impressive but the voice responses can come across as stilted or canned. While it’s interesting that users will quickly adapt their own speech patterns to converse with this artificial intelligence, we have noticed in user testing that the canned nature of the responses can get tiring over time. One way to keep your engagement high is to include unexpected easter eggs or non-Alexa audio clips in your custom skill. A few years ago, if you asked an Amazon Echo, “Alexa, how many Oscars has Alec Baldwin won?” Alec Baldwin would interrupt Alexa and provide his own response. It was a cute moment that likely increased the engagement time and warmth of the experience. Even if you can’t afford Alec Baldwin’s SAG rate, adding sound effects or non-Alexa voice responses can keep your users coming back.
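On Alexa, audio clips are mixed into a response with the SSML `audio` tag. The sketch below builds an SSML output speech object in Python; the function name and the example URL are placeholders of ours, and Amazon documents additional limits on file format and length that this sketch does not check.

```python
from xml.sax.saxutils import escape, quoteattr


def speech_with_clip(before: str, clip_url: str, after: str = "") -> dict:
    """Build an SSML outputSpeech object that plays a clip between spoken text."""
    if not clip_url.startswith("https://"):
        raise ValueError("audio clips must be served over HTTPS")
    # Escape the spoken text so characters like & and < do not break the SSML.
    parts = [escape(before), f"<audio src={quoteattr(clip_url)}/>"]
    if after:
        parts.append(escape(after))
    return {"type": "SSML", "ssml": "<speak>" + " ".join(parts) + "</speak>"}
```

The returned dictionary goes in the `outputSpeech` field of the skill's response.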

Context is King

Unlike some people, Alexa doesn’t do well with one-word answers. A simple “YES” or “NO” will sometimes confuse Alexa and introduce bugs or unexpected dead-ends in your Alexa Skill. As you’re working on your first Alexa Skill, remember that wherever you expect a one-word response, you need to programmatically “remind” Alexa of the question’s context. The Alexa Skills Kit has some great examples for how to handle this correctly.
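One common way to “remind” Alexa is to store the question you just asked in the session attributes and look it up when a yes or no comes back. The sketch below assumes the built-in `AMAZON.YesIntent` and `AMAZON.NoIntent`; the `pending_question` key, the question names and the replies are hypothetical.

```python
# Replies keyed by the question we last asked, then by the intent we received.
FOLLOW_UPS = {
    "confirm_booking": {
        "AMAZON.YesIntent": "Great, your ride is booked.",
        "AMAZON.NoIntent": "Okay, I won't book it.",
    },
    "share_eta": {
        "AMAZON.YesIntent": "Who should I share it with?",
        "AMAZON.NoIntent": "No problem. Enjoy your ride.",
    },
}


def handle_yes_no(intent_name: str, session_attributes: dict) -> str:
    """Resolve a bare yes/no using the question stored in the session."""
    pending = session_attributes.get("pending_question")
    answers = FOLLOW_UPS.get(pending)
    if answers is None or intent_name not in answers:
        # No context to interpret the answer: recover instead of dead-ending.
        return "Sorry, I lost track of what I asked. What would you like to do?"
    return answers[intent_name]
```

Whenever the skill asks a yes/no question, it would set `pending_question` in the session attributes it sends back with the response.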

For “Utterances”, The More the Merrier

An “utterance” is the phrase a user speaks to ask the digital assistant for something. One mistake we made early on was not anticipating enough variations of that phrase. For example, say you built an Uber-like Voice skill for a user to arrange for a ride. The rookie mistake is to just create one utterance (e.g. “Alexa, I want to book a ride”). The pro move is to create multiple utterances to account for every turn of phrase (e.g. “Alexa, I need a ride”, “Alexa, book me a ride”, “Alexa, I need to schedule a ride”). The Node.js Alexa Skills Kit Sample has a great open-source framework to help you auto-generate a bunch of different utterances for each of your questions.
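To show the idea behind generating utterances, here is a small stand-in sketch in Python. It is not the framework from the Node.js sample; it simply combines lists of phrase fragments, and the fragments themselves are examples we picked.

```python
from itertools import product


def expand_utterances(*slots):
    """Combine one fragment from each slot into every possible phrase."""
    phrases = (" ".join(part for part in combo if part) for combo in product(*slots))
    # dict.fromkeys removes duplicates while keeping the original order.
    return list(dict.fromkeys(phrases))


# Sample utterances are listed without the wake word ("Alexa").
ride_utterances = expand_utterances(
    ["i need", "i want", "book me", "get me"],
    ["a ride", "a car"],
    ["", "now", "for tonight"],
)
```

Three short lists are enough to produce two dozen sample utterances for a single intent, which you can then prune by hand.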

Gently Guide Your Users

It’s smart to assume that most people won’t know what to do when they open your Alexa Skill. When you design your Alexa Skill it’s critical, particularly at the opening, to gently guide the user down the path you want them to travel. When a user first opens your Alexa Skill, don’t just confirm the Skill has been opened; confirm it and point the user to the next step. An example of this would be to lead with “Opening <YOUR SKILL>, do you want to do X or Y?” versus just saying “Opening <YOUR SKILL>”. Assume your users know nothing and lead them from there.
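In code, the guided opening comes down to a launch response that asks a question, keeps the session open and supplies a reprompt in case the user stays silent. This sketch builds the response as a plain Python dictionary in the shape of the Alexa response JSON; the function name and the wording of the prompts are our own.

```python
def launch_response(skill_name: str, option_x: str, option_y: str) -> dict:
    """Welcome the user and immediately offer the next step."""
    prompt = f"Opening {skill_name}. Do you want to {option_x} or {option_y}?"
    reprompt = f"You can say {option_x} or {option_y}."
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": prompt},
            # Spoken if the user does not answer the first prompt.
            "reprompt": {"outputSpeech": {"type": "PlainText", "text": reprompt}},
            # Keep listening; ending the session here would strand the user.
            "shouldEndSession": False,
        },
    }
```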

Skills that are published in the Alexa Skills Store are required to have a Help Intent, but just because it’s required doesn’t mean that users will automatically ask for help. Instead, you can slowly teach the user about the functionality in your skill as they go. Continuing with the Uber-like example above, after booking a ride Alexa could say “Your ride will arrive in 5 minutes. Would you like to share your ETA with a friend?” By doing this instead of saying “Your ride is booked”, you’re not only increasing engagement with the user, but you’re also teaching them about some of the features of your Alexa Skill.

Linking Your Users

If you need to have a deeper integration with a user of your system, you’ll need to enable Account Linking with your Alexa Skill. This process simply surfaces the access token used to make authenticated requests against your backend API. Once the user links successfully, that access token will be accessible from your skill’s code. If you’re already using OAuth 2 and are following the specification, there’s a strong chance you can simply point the skill’s configuration to your existing authentication service. If you’re not quite following the OAuth 2 spec, you can set up an intermediary to make sure that tokens are passed back to Alexa in a way that Amazon expects.
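Once linking is configured, the token arrives with each request. Here is a minimal sketch of reading it from the request JSON and, if it is missing, replying with a card that prompts the user to link their account. The function names and the spoken text are ours; the field names follow the Alexa request and response format as we understand it.

```python
from typing import Optional


def get_access_token(request_envelope: dict) -> Optional[str]:
    """Return the linked account's access token, or None if not linked."""
    user = request_envelope.get("context", {}).get("System", {}).get("user", {})
    return user.get("accessToken")


def link_account_response() -> dict:
    """Ask the user to link their account through the Alexa app."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {
                "type": "PlainText",
                "text": "Please link your account in the Alexa app to continue.",
            },
            # A LinkAccount card shows the linking prompt in the Alexa app.
            "card": {"type": "LinkAccount"},
            "shouldEndSession": True,
        },
    }
```

A handler would call `get_access_token` first and send the token as a bearer credential to your backend only when it is present.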

Alexa currently supports the Implicit Grant flow and the Authorization Code Grant flow. The Implicit Grant is easier to reason about when you first start working on linking accounts, though it is limited by the token’s expiry date because it provides no refresh token. If your authentication service is set up in a way where tokens expire, you’ll likely have to use the Authorization Code Grant flow, which requires an extra step and introduces the concept of the refresh token. If your tokens don’t expire, the Implicit Grant is the way to go.

Give Yourself a Full Week for Approval

While you may get lucky, you should allow yourself seven days from submission to final Amazon approval of your Alexa Skill. During our own submission process, we’ve noticed that Amazon has a two-stage process. The first stage seems to run a bunch of standard scripts to test your Skill for common errors, such as how you handle good intents, bad intents and a bad application ID. The second stage is when an employee of Amazon looks at your app to determine if it meets their standards, terms and conditions. If you follow Amazon’s best practices and make sure you handle intents well, the process should be pretty smooth.
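One of those checks, rejecting a bad application ID, is easy to handle up front. The sketch below compares the ID in the incoming request JSON with your own skill ID before doing anything else; the `EXPECTED_APP_ID` value is a placeholder, and how you reject the request (an HTTP error, an exception) depends on where your skill is hosted.

```python
EXPECTED_APP_ID = "amzn1.ask.skill.your-skill-id"  # placeholder; use your own skill ID


def is_request_for_this_skill(envelope: dict, expected_app_id: str = EXPECTED_APP_ID) -> bool:
    """Check that the request was addressed to this skill."""
    app = envelope.get("context", {}).get("System", {}).get("application", {})
    app_id = app.get("applicationId")
    if app_id is None:
        # Fall back to the copy of the ID carried in the session object.
        app_id = envelope.get("session", {}).get("application", {}).get("applicationId")
    return app_id == expected_app_id
```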

There you have it. These are our tried and tested tips for creating great voice apps. You can also check out our ebook on “How to Build Your First Amazon Alexa Skill” if you want to learn more.