Building a Chatbot with AWS Lex

Evan Morgan - 30/01/2018

Working with Chatbots

A lot of people will be familiar with Alexa, Amazon’s helpful and cheery assistant that’s ready to play music for you, tell jokes and assure you it’s not here to take over the world. However, not so many will know of Lex - a deep-learning conversational interface that’s effectively the brains behind that friendly female voice on the Amazon Echo and Echo Dot.

This article focuses on Travel Tech Labs’ development experience with AWS Lex in a travel context: its strengths, a few pitfalls and our overall impression of Lex as a service. Using Amazon Web Services, you can build your own chatbot to work within an app, on social media or even in a travel call centre when you integrate with Amazon Connect.

This amazing technology has the capacity to change the interface of the travel industry, but it all revolves around one question - how ready is Lex to replace a conversation with a real person?

Chatbot Breakdown - How Does it Work?

How does it even do that?

On a simple level, Lex uses machine learning to process massive amounts of conversational data, constantly reviewing its own understanding of how conversational flow works. Using this, Lex provides a service with which we can make chatbots that get better over time as they process more data. Simple? Not really… But the important part to take from this is that we can make chatbots with Lex that can operate 24/7, responding to travellers’ demands and inquiries while we sleep soundly in our beds.

So how does a chatbot work from a developer perspective? I’m glad you asked. A chatbot is made up of intents, which represent a user’s intentional interactions with the chatbot, i.e. why is this user talking to me? So for example, in the image below, we see that the user has said a sentence (or an utterance, as AWS calls it) which tells “Travel Bot” that it should be using the “BookATrip” intent. By using this intent, it knows the next step is to use an AWS Lambda function and then issue a confirmation. Once confirmed, it can proceed to ask you further about the trip you’d like to book, e.g. “Are you booking a single or return journey?”.

Hidden Step: After booking your trip, 'robot_uprising.exe' will be executed.

Intents use slots to populate parameters that can later be used or returned to the user. These slots are essentially lists of possible words/values users will say to the bot. For example, an utterance could be “Book a {Trip}” where {Trip} is any value in the Trip slot. This is how, in the Travel Bot example, the bot has the Type slot filled in with “Flight”.
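To make the intent/utterance/slot relationship concrete, here is a toy sketch in plain Python. It is only an illustration of the concept and is not how Lex works internally (Lex uses machine learning rather than exact pattern matching). The slot values, the second sample utterance and the helper names are all made up for the example.

```python
import re

# Hypothetical slot type: the values a user might say for {Trip}.
SLOT_VALUES = {"Trip": ["flight", "train", "ferry"]}

# Hypothetical sample utterances for the BookATrip intent.
INTENT_UTTERANCES = {
    "BookATrip": ["book a {Trip}", "i want to book a {Trip}"],
}


def template_to_regex(template):
    """Turn 'book a {Trip}' into a regex with one named group per slot."""
    # re.split with a capture group alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)  # literal text
        else:
            values = "|".join(re.escape(v) for v in SLOT_VALUES[part])
            pattern += f"(?P<{part}>{values})"  # slot placeholder
    return re.compile(f"^{pattern}$", re.IGNORECASE)


def match_intent(utterance):
    """Return (intent_name, slots) for the first matching template."""
    text = utterance.strip()
    for intent, templates in INTENT_UTTERANCES.items():
        for template in templates:
            match = template_to_regex(template).match(text)
            if match:
                return intent, match.groupdict()
    return None, {}
```

Calling `match_intent("Book a Flight")` picks the BookATrip intent and fills the Trip slot with the word the user said, which is the same shape of result the bot hands on to the next step.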

Voice or Text?

Hello World!

When developing our Lex Travel Chatbot, we spotted a lot of developer resources and tutorials giving tips on developing Lex chatbots that exclusively focus on text chat. Since there are so many resources using Lex as a text chatbot, we thought it might be an interesting exercise to investigate its possibilities as a voice application.

For our application, we wanted our users to give information over voice, which would then be used by an AWS Lambda function. With so few voice-focused resources available, it was difficult to judge the capabilities of Lex voice chat before using it ourselves. Now that we’re out of the development phase with this product, we can say that the voice chat has both strengths and weaknesses.

Strengths

Do you even lift, bro?

First off, let’s establish that AWS Lex is a good service. For a service that does something as complex as creating an artificially intelligent chatbot with voice capabilities, it has some big advantages and some things that fit just right:

  • Text understanding/extrapolation is quite strong - When using a chat-only program such as a social media chatbot or a Slack bot, the misunderstanding rate is quite low. It’s rare that a user will mistype something and blame the chat for it. It’s even rarer that the chatbot would see you want to book a “single” journey and mistake that for a return.
  • Interface is simple and easy to understand - even for non-technical users, the setup for a chatbot is easy with Lex. Sure, there’s a CLI and an SDK for those who want to get more in depth, but for Joe Soap, someone who’s just mastered checking e-mail, it’s surprisingly easy to click “Create Bot”, “Create” and “Publish”.
  • Integration with Lambda is easy - for the developers among us, integration with AWS Lambda is made as simple as “Do you want a Lambda function to trigger at the beginning and/or end of this conversation? If so, click here to pick which ones”. This integration could’ve been a lot harder to configure or find, but as with a lot of AWS, Lex is well integrated with other services such as Lambda and Connect.
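As a rough idea of what such a Lambda function can look like, here is a minimal sketch. It assumes the Lex (V1) Lambda event format, where `invocationSource` is `DialogCodeHook` during the conversation and `FulfillmentCodeHook` at the end, and where the function answers with a `dialogAction`. The `JourneyType` slot, its valid values and the reply text are hypothetical and not taken from our actual bot.

```python
VALID_JOURNEYS = {"single", "return"}  # hypothetical values for a JourneyType slot


def plain_text(text):
    """Build a Lex message object."""
    return {"contentType": "PlainText", "content": text}


def elicit_slot(intent_name, slots, slot_to_elicit, text):
    """Ask the user again for one specific slot."""
    return {
        "dialogAction": {
            "type": "ElicitSlot",
            "intentName": intent_name,
            "slots": slots,
            "slotToElicit": slot_to_elicit,
            "message": plain_text(text),
        }
    }


def delegate(slots):
    """Hand control back to Lex to decide the next step."""
    return {"dialogAction": {"type": "Delegate", "slots": slots}}


def close(text):
    """End the conversation and report the intent as fulfilled."""
    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": plain_text(text),
        }
    }


def lambda_handler(event, context):
    intent = event["currentIntent"]
    slots = dict(intent["slots"])  # copy so the incoming event is not mutated

    if event["invocationSource"] == "DialogCodeHook":
        # Runs during the conversation: validate what has been filled so far.
        journey = slots.get("JourneyType")
        if journey is not None and journey.lower() not in VALID_JOURNEYS:
            slots["JourneyType"] = None  # clear the bad value before re-asking
            return elicit_slot(
                intent["name"],
                slots,
                "JourneyType",
                "Are you booking a single or return journey?",
            )
        return delegate(slots)

    # FulfillmentCodeHook: every slot is filled, so do the actual work here.
    return close(f"Your {slots['JourneyType']} {slots['Trip']} is booked.")
```

The same function can be wired to both hooks, or you can use two separate functions; a single handler that branches on `invocationSource` just keeps the sketch short.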

Weaknesses & Pitfalls

AI is not responding. Close the program or wait for it to respond?

Despite the advantages of AWS Lex, there are a few non-obvious pitfalls in the development process. AWS have copious amounts of documentation around Lex, but sometimes that just hinders your search for the answer to one simple question. Here are a few we spent some time on:

  • Voice Formatting - Lex only accepts audio files in two formats - Opus and PCM. This is a real problem if your recording software isn’t very flexible, and native Android or iOS do not support recording in either of these formats. This means that a conversion must take place somewhere between the client and Lex, which reduces the performance of the system.
  • Slots - Amazon provides built-in slots which can be used for common things like time and date. However, you can’t see the content of these slots, so it’s a matter of trial and error to see what kinds of inputs you can expect them to successfully parse. On the plus side, you can at least expand on these built-in slots.
  • Word Recognition with Voice - Despite Alexa being quite good at recognising words, Lex seemed to struggle with things like airport names. It’s easy to understand why, as these are not ordinary words but proper nouns, e.g. Dublin, Barcelona, etc. Pre-defined slots are supposed to help with this, but even with quite comprehensive slots, Lex’s best attempts at these words still fell short.
  • No Direct Integration with API Gateway - if you’re using an API-first approach like we did, you’ll want to hide your chatbot behind an API. Unfortunately there’s no easy integration from API Gateway to Lex and you must resort to using Lambda as a proxy which adds to latency.
  • Only Exists in 2 Regions - speaking of latency, Lex is only available in Northern Virginia and Ireland. This is not ideal for two reasons. Firstly, the travel industry in Asia is big and growing. Having an API in Asia which contacts a bot in another region adds latency and reduces system performance, and performance is vital in a voice application. Secondly, the same applies to any region apart from Europe or the east coast of America, and cross-region development can add its own complications.
  • Limited Language Options - Currently Amazon Alexa supports at least 3 languages with 5 dialects for English. However, Lex only supports 1 language in 1 dialect - US English.
  • Deletion - As explained previously, a chatbot is made of intents, which use slots to understand what a user is saying. However, let’s say you want to delete a slot for whatever reason - you MUST delete every version of every intent that uses that slot. Otherwise AWS will give you an error message telling you that the slot is in use by an intent. Similarly, to delete an intent you have to delete every version of every chatbot that uses that intent.
  • Change Propagation - Some changes to the chatbot would take up to 20 minutes to propagate through the service, meaning that the time difference between updating the bot and testing the update could be 20 minutes.
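For the API Gateway point above, the Lambda proxy can be quite small. The following is a sketch only: it assumes an API Gateway Lambda proxy integration (JSON request in `event["body"]`), the boto3 `lex-runtime` client and its `post_text` call, and a text request rather than audio. The bot name, alias and the request fields `text` and `userId` are hypothetical. The optional `lex_client` argument is there so the handler can be exercised locally with a stand-in client.

```python
import json

BOT_NAME = "TravelBot"  # hypothetical bot name
BOT_ALIAS = "prod"      # hypothetical alias


def get_lex_client():
    """Create the real client lazily so the module loads without the AWS SDK."""
    import boto3
    return boto3.client("lex-runtime")


def lambda_handler(event, context, lex_client=None):
    # API Gateway (proxy integration) passes the request body as a string.
    body = json.loads(event.get("body") or "{}")
    text = body.get("text")
    user_id = body.get("userId")
    if not text or not user_id:
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "text and userId are required"}),
        }

    client = lex_client or get_lex_client()
    reply = client.post_text(
        botName=BOT_NAME,
        botAlias=BOT_ALIAS,
        userId=user_id,
        inputText=text,
    )

    # Pass back only the parts of the Lex reply the caller needs.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({
            "message": reply.get("message"),
            "dialogState": reply.get("dialogState"),
            "slots": reply.get("slots"),
        }),
    }
```

A voice version would follow the same shape but call `post_content` with the audio bytes and a matching content type instead of `post_text`. Either way, the extra hop through Lambda is where the added latency comes from.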

That being said, it’s important to focus on a bottom-up approach when building a voice chatbot with Lex. You need to establish what works and build from there, then test it again, adding more bit by bit. This can be a real struggle for devs in the travel industry, as Lex finds it difficult to process place names through voice but does so very easily through text.

Conclusion


AWS Lex is a promising technology that features an easy-to-use interface for creating chatbots. With it, we created a travel chatbot that was perfectly suitable for text communication and chatting via social media to book a trip, and we identified some key considerations for working with Lex in the future.

Unfortunately, it’s just not ready for interactive voice communication where conversational flow is important. This is especially true in the travel industry where being able to resolve proper nouns like place names and airport names is very important. However, this has to be said with some caution as AWS are normally very quick at developing updates for their services and we are currently working with Amazon to overcome some of these difficulties. Watch this space.


Evan Morgan

Software Engineer