Alexa Presentation Language Tutorial: Getting Started

Voice is obviously the primary method of interaction with Alexa, but sometimes a picture really is worth a thousand words. Amazon has recently announced a new way to enhance Alexa skills with interactive visuals on devices with a screen, such as the Echo Show. It's called Alexa Presentation Language (APL) and it's a major step forward for the platform. Before APL, GUI options for Alexa skills were limited to 9 predefined display templates. This setup had the advantage of keeping the look and feel consistent across all skills, but ultimately it was too limiting. Many of our customers have been asking us to design and implement Alexa skills with custom graphic interfaces, and thanks to APL we are finally able to deliver on such requests.

The documentation for APL is pretty good and has been getting better over time. Some excellent code samples can be found under alexa-labs on GitHub and in the APL authoring tool. Having worked with APL even before it became public, I figured I would contribute to this list of useful resources with a series of blog posts describing the process of implementing a simple Alexa skill that takes full advantage of APL. Let's build a skill that plays animal sounds, similar to those sturdy sound books with buttons that little kids love to play with. Feel free to follow along, and if you get stuck, the code revisions described below map pretty closely to the commit history of the animal-sounds-apl repo on my GitHub.

There are many good ways to write a backend for an Alexa skill, but the stack with the most Amazon blessing (and therefore the best tools and support) right now seems to be JavaScript code using the Alexa Skills Kit (ASK) SDK v2 for Node.js deployed to AWS Lambda, so let's stick to that. To that point, running the ask new command from the ASK CLI toolkit is a great way to get up and running with a new Alexa project in no time. Given no extra options, it produces a deployable Hello World skill built on the tech outlined above that we can use as a skeleton for our project. If you need to set up ASK CLI on your machine, follow this quick start guide.

Only a few changes to the Hello World project produced by ask new are needed before we can start adding APL. In the interaction model, change the invocation name to animal sounds a. p. l.. Also, turn HelloWorldIntent into AnimalSoundIntent with slots called animal and article, add the matching animals and articles slot types filled with some sample values, and provide a few sample utterances. After these edits, the contents of the en-US.json file should look something like this:

{
  "interactionModel": {
    "languageModel": {
      "invocationName": "animal sounds a. p. l.",
      "intents": [
        {
          "name": "AMAZON.CancelIntent",
          "samples": []
        },
        {
          "name": "AMAZON.HelpIntent",
          "samples": []
        },
        {
          "name": "AMAZON.StopIntent",
          "samples": []
        },
        {
          "name": "AnimalSoundIntent",
          "slots": [
            {
              "name": "animal",
              "type": "animals"
            },
            {
              "name": "article",
              "type": "articles"
            }
          ],
          "samples": [
            "{animal}",
            "{article} {animal}",
            "{animal} sound",
            "{article} {animal} sound"
          ]
        }
      ],
      "types": [
        {
          "name": "animals",
          "values": [
            {
              "name": {
                "value": "cat"
              }
            },
            {
              "name": {
                "value": "dog"
              }
            },
            {
              "name": {
                "value": "cow"
              }
            }
          ]
        },
        {
          "name": "articles",
          "values": [
            {
              "name": {
                "value": "a"
              }
            },
            {
              "name": {
                "value": "an"
              }
            },
            {
              "name": {
                "value": "the"
              }
            }
          ]
        }
      ]
    }
  }
}

Next, let's make some adjustments to the skill manifest in the skill.json file. The publishing information doesn't really matter for skills in development, so the changes to fields like examplePhrases and description can be kept to a minimum for now. The most important change is to add an interface of type ALEXA_PRESENTATION_APL to the interfaces array under apis.custom. This tells the Alexa device that rendering APL will be involved and is required if we want to see our designs appear on screens. Here is what skill.json should look like after these changes:

{
  "manifest": {
    "publishingInformation": {
      "locales": {
        "en-US": {
          "summary": "This skill plays animal sounds and displays animal pictures using APL.",
          "examplePhrases": [
            "Alexa, open animal sounds a. p. l.",
            "Alexa, ask animal sounds a. p. l. for a cat sound"
          ],
          "name": "animal-sounds-apl",
          "description": "This skill plays animal sounds and displays animal pictures using APL."
        }
      },
      "isAvailableWorldwide": true,
      "testingInstructions": "Sample Testing Instructions.",
      "category": "CHILDRENS_EDUCATION_AND_REFERENCE",
      "distributionCountries": []
    },
    "apis": {
      "custom": {
        "endpoint": {
          "sourceDir": "lambda/custom"
        },
        "interfaces": [
          {
            "type": "ALEXA_PRESENTATION_APL"
          }
        ]
      }
    },
    "manifestVersion": "1.0"
  }
}

Moving on to the code (contained in lambda/custom/index.js), there are a few speechText values and some parameters to the handlerInput.responseBuilder.withSimpleCard calls that should be adjusted to something that makes more sense for our skill's intent handlers. More importantly, we need an AnimalSoundIntentHandler in place of the HelloWorldIntentHandler. For the purposes of our sample skill, something simple like this will suffice:

const AnimalSoundIntentHandler = {
  canHandle(handlerInput) {
    return handlerInput.requestEnvelope.request.type === 'IntentRequest'
      && handlerInput.requestEnvelope.request.intent.name === 'AnimalSoundIntent';
  },
  handle(handlerInput) {
    // Hardcoded sound strings and publicly accessible image URLs for the animals from the model.
    const animalData = {
      cat: {sound: 'meow', image: 'https://s3.amazonaws.com/animal-sounds-apl/images/cat.png'},
      dog: {sound: 'bark', image: 'https://s3.amazonaws.com/animal-sounds-apl/images/dog.png'},
      cow: {sound: 'mooo', image: 'https://s3.amazonaws.com/animal-sounds-apl/images/cow.png'}
    };

    // Look up the animal requested via the slot value.
    const requestedAnimal = handlerInput.requestEnvelope.request.intent.slots.animal.value;
    const requestedAnimalSound = animalData[requestedAnimal].sound;
    const requestedAnimalImage = animalData[requestedAnimal].image;

    // E.g. "Cat says meow!"
    const speechText = `${requestedAnimal.charAt(0).toUpperCase()}${requestedAnimal.slice(1)} says ${requestedAnimalSound}!`;

    return handlerInput.responseBuilder
      .speak(speechText)
      .withSimpleCard('Animal Sounds APL', speechText)
      .getResponse();
  },
};
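
The new handler also has to be registered with the skill builder at the bottom of index.js. Here is a minimal sketch of that registration, assuming the handler names generated by the ask new Hello World template (yours may differ slightly):

const skillBuilder = Alexa.SkillBuilders.custom();

exports.handler = skillBuilder
  .addRequestHandlers(
    LaunchRequestHandler,
    AnimalSoundIntentHandler, // replaces HelloWorldIntentHandler
    HelpIntentHandler,
    CancelAndStopIntentHandler,
    SessionEndedRequestHandler
  )
  .addErrorHandlers(ErrorHandler)
  .lambda();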

For the sake of example, a local object with sound strings and URLs to publicly accessible images for the animals from the model is good enough. Only the sound strings are used for now (to build the speech output), but we'll start using the images soon enough. Remember to add AnimalSoundIntentHandler to the parameters of the skillBuilder.addRequestHandlers call (as sketched above); otherwise, we are ready to start writing some APL to define what we want displayed whenever AnimalSoundIntent gets invoked. APL GUIs are implemented in APL documents, which are JSON files made up of APL components that get instantiated using the following syntax pattern:

{
    "type": "aplComponentName",
    "property1": "property1value",
    "property2": "property2value",
    ...
}
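
To make that concrete, here is what a standalone Text component could look like once the pattern is filled in (the values here are just placeholders of my own choosing):

{
    "type": "Text",
    "text": "Cat says meow!",
    "fontSize": "50px",
    "color": "rgb(251,184,41)"
}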

The best place to write APL is the Start from scratch section of the APL authoring tool. It lets you see the visual output of your work simulated in the browser for a number of different screen sizes and even push a preview to a real device, both great frontend development features that give you a quick feedback loop. There is also a Data JSON tab that lets you populate your template with some test data, as well as a toggle to switch between raw APL code and a more abstracted editor. The boilerplate code the authoring tool sets you up with is likely going to look something like this:

{
    "type": "APL",
    "version": "1.0",
    "theme": "dark",
    "import": [],
    "resources": [],
    "styles": {},
    "layouts": {},
    "mainTemplate": {
        "items": []
    }
}

Most of these fields are used for code optimizations, which I will get to in another post. The only part you need to worry about for now is mainTemplate. For simple layouts, you can nest all of your APL components right there. Here is what you could do for AnimalSoundIntent:

{
    "type": "APL",
    "version": "1.0",
    "theme": "dark",
    "import": [],
    "resources": [],
    "styles": {},
    "layouts": {},
    "mainTemplate": {
        "parameters": [
            "payload"
        ],
        "items": [
            {
                "type": "Frame",
                "width": "100vw",
                "height": "100vh",
                "backgroundColor": "rgb(22,147,165)",
                "items": [
                    {
                        "type": "Container",
                        "width": "100vw",
                        "height": "100vh",
                        "alignItems": "center",
                        "justifyContent": "spaceAround",
                        "items": [
                            {
                                "type": "Text",
                                "text": "${payload.animalSoundData.message}",
                                "fontSize": "50px",
                                "color": "rgb(251,184,41)"
                            },
                            {
                                "type": "Image",
                                "source": "${payload.animalSoundData.image}",
                                "height": "50vh",
                                "width": "30vw",
                                "scale": "best-fit"
                            }
                        ]
                    }
                ]
            }
        ]
    }
}

Let's break this down. "parameters": [ "payload" ] gives us a reference to the data object our skill's backend code will be sending along with the APL document. The other property of mainTemplate, "items", is used for nesting components inside other components.

Frame makes up the first layer of our layout. It can be used to create and style rectangular and oval shapes, but all we are using it for this time is to set the background color, defined through its RGB value. The vw and vh units make the Frame fill all of the available viewport space.

The next layer of our layout is a Container component. Containers do not produce any visible output, but they are very useful for positioning and constraining the components nested inside of them. "alignItems": "center" will seem familiar to anyone who has come across CSS Flexbox: it's a very straightforward way to center items along the cross axis, which is horizontal for Containers by default. "justifyContent": "spaceAround" is another Flexbox-inspired property that adds equal amounts of space between and around all of the items along the main axis, which is vertical for Containers by default.

The final layer of our layout consists of the Text and Image components sourced from the payload object we defined earlier. The syntax for getting values out of this data source is very similar to the one used in JavaScript's template literals, with placeholders indicated by a dollar sign and curly braces. Text's "fontSize" and "color" properties should be pretty self-explanatory, and the same goes for Image's "width", "height" and "scale".

With the APL document described above and given the following Data JSON:

{
    "animalSoundData": {
        "message": "Cat says meow!",
        "image": "https://s3.amazonaws.com/animal-sounds-apl/images/cat.png"
    }
}

your APL authoring tool should look more or less like this at this point:

APL authoring tool

The only step remaining to enable rendering of this layout for each AnimalSoundIntent request is to send the APL document, along with the data it binds to, in a directive included with each response. First, add the APL document to the project directory structure; lambda/custom/aplDocuments/animalSound.json is a good place for it. Then, in AnimalSoundIntentHandler, replace the call to withSimpleCard() on handlerInput.responseBuilder with addDirective(), like this:

return handlerInput.responseBuilder
  .speak(speechText)
  .addDirective({
    type: 'Alexa.Presentation.APL.RenderDocument',
    version: '1.0',
    document: require('./aplDocuments/animalSound.json'),
    datasources: {
      'animalSoundData': {
        'message': speechText,
        'image': requestedAnimalImage
      }
    }
  })
  .getResponse();
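
One caveat worth keeping in mind: devices without a screen (like a standard Echo) won't render APL, and their requests don't list Alexa.Presentation.APL among the supported interfaces in context.System.device. A sketch of one way to guard the directive with that check (the supportsAPL helper name is my own) could look like this:

// Returns a truthy value only when the requesting device reports APL support.
const supportsAPL = (handlerInput) => {
  const system = handlerInput.requestEnvelope.context.System;
  return system.device
    && system.device.supportedInterfaces
    && system.device.supportedInterfaces['Alexa.Presentation.APL'];
};

// Inside the handler: only attach the RenderDocument directive when APL is supported.
if (supportsAPL(handlerInput)) {
  handlerInput.responseBuilder.addDirective({
    type: 'Alexa.Presentation.APL.RenderDocument',
    version: '1.0',
    document: require('./aplDocuments/animalSound.json'),
    datasources: {
      animalSoundData: {
        message: speechText,
        image: requestedAnimalImage
      }
    }
  });
}

return handlerInput.responseBuilder
  .speak(speechText)
  .getResponse();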

With the directive in place (and once the updated skill gets deployed), you should be able to see this interaction in the test section of the Alexa developer console:

Alexa developer console test simulator

Next time, I'll describe how to add a launch screen with a list of selectable animal sounds using the Sequence and TouchWrapper components. In the meantime, feel free to open an issue on GitHub if you have any questions!