The Art and Business of Speech Recognition: Creating the Noble Voice
FedEx wanted to create a speech-recognition system that would help people in its U.S. market determine how much it would cost to send packages. They knew that people who call them on the phone (that is, who don't use the Web interface or a rate book) don't ship frequently. So this application had to work for people with little FedEx shipping experience. Most people ship packages using the standard FedEx envelopes and boxes. But some use their own packaging. FedEx needed to obtain the dimensions of these nonstandard containers from the callers to ensure that they were not too large to send via the FedEx standard service. The system was designed to ask for each dimension using slightly different wording. Here's how the series of questions ran.
This is an example of scaffolding prompts. In scaffolding prompts, questions in a series become progressively shorter based on the knowledge callers have gained as they go along. When the FedEx system starts asking about the dimensions of the container, the system incrementally shortens the text with each successive question. This structure assumes, as a "real person" would in similar circumstances, that most people can remember the directions they were given when the questioning began.
In the first question about dimensions, the system asks for the "approximate length" in "inches," also "rounded off to the nearest inch." This is meant to convey that the exact length isn't important, but the units must be in whole inches (not fractions) and not in centimeters, cubits, or any other unit of measurement. The next question is a variation of the first one. The system asks for the approximate "width" of the container, but again reminds the caller to measure in inches. Why the reminder? Because a caller might say a package is 18 inches long and 2 feet wide. We don't want them to forget that we're looking only for responses in inches. Now, after asking the caller two similar questions, and receiving two valid answers (as far as the recognizer is concerned), we can assume the caller has the hang of things. Therefore, we can make the final question brief and casual: "What's the height?"
Callers appreciate the scaffolding prompt structure, not only because it spares them from answering unnecessarily lengthy questions, but also because it is what they would expect in a conversation with a real person. So what happens if the caller has problems with the second question? As data taken from the FedEx system (and others) show, callers seldom have problems knowing how to answer the subsequent, similar question.
But even for the ones who do have problems, the timeout and retry prompts help them answer the question by providing an example or telling them how to use touchtones to enter the information. In the FedEx Rate Finder system, the system first asks for the length, then width, and then height. If the caller gets to the second question (for the width) and for some reason doesn't know how to answer the system, the first timeout prompt tells the caller, "I didn't hear you. Please say the approximate width of your package in inches, and round off to the nearest inch." However, if the caller said something that the system didn't understand, the system assumes that the caller was on the right track but that for some reason it couldn't understand the response, and says, "Please say the approximate width again, for example, you could say seven inches." If the caller doesn't respond a second time, then the system plays the second timeout prompt and provides even more information, "Sorry, I didn't hear you that time either. Using the telephone keypad, enter the approximate width of your package in inches. If you need help, just say 'Help.'" If, on the other hand, the caller again says something that the system doesn't understand, it plays the second retry prompt, which was designed on the assumption that the conditions of the call might have changed to make recognition difficult (after the many successful interactions with the system that have led the caller here), and tells the caller, "Please enter the approximate width, in inches, using the telephone keypad." The point here is to understand that every caller must go through the entire sequence (from step 1 to step 2 to step 3), and generally the majority will learn along the way. Once the context has been established, the system need only prompt them with as much information as necessary to answer each question.
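The timeout and retry escalation described above for the width question can be sketched in a few lines of Python. This is only an illustration, not the FedEx implementation: the names `Failure`, `WIDTH_ERROR_PROMPTS`, and `error_prompt` are hypothetical, and the rule for mixed histories (the prompt type follows the most recent failure, while the level follows the total number of failures, capped at two) is an assumption, since the text describes only a repeated timeout and a repeated misrecognition. The prompt wording is taken from the examples quoted above.

```python
from enum import Enum


class Failure(Enum):
    """Hypothetical labels for the two ways a turn can fail."""
    TIMEOUT = "timeout"    # the caller said nothing
    NO_MATCH = "no_match"  # the caller spoke, but the recognizer did not understand


# Prompt text quoted from the width question; keys are (failure type, level).
WIDTH_ERROR_PROMPTS = {
    (Failure.TIMEOUT, 1): (
        "I didn't hear you. Please say the approximate width of your "
        "package in inches, and round off to the nearest inch."
    ),
    (Failure.NO_MATCH, 1): (
        "Please say the approximate width again, for example, "
        "you could say seven inches."
    ),
    (Failure.TIMEOUT, 2): (
        "Sorry, I didn't hear you that time either. Using the telephone "
        "keypad, enter the approximate width of your package in inches. "
        "If you need help, just say 'Help.'"
    ),
    (Failure.NO_MATCH, 2): (
        "Please enter the approximate width, in inches, "
        "using the telephone keypad."
    ),
}


def error_prompt(failures):
    """Return the prompt to play after the latest failure in `failures`.

    Assumption: the type of prompt follows the most recent failure, and the
    level is the total number of failures so far, capped at 2.
    """
    if not failures:
        raise ValueError("error_prompt needs at least one failure")
    level = min(len(failures), 2)
    return WIDTH_ERROR_PROMPTS[(failures[-1], level)]


if __name__ == "__main__":
    # A caller who is silent once, then says something unrecognizable.
    history = []
    for event in (Failure.TIMEOUT, Failure.NO_MATCH):
        history.append(event)
        print(error_prompt(history))
```

Keeping the prompts in a table keyed by failure type and level makes the design visible at a glance: the first-level prompts restate or exemplify the spoken answer, while both second-level prompts steer the caller to the telephone keypad.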