The Art and Business of Speech Recognition: Creating the Noble Voice
Proper recording of audio prompts is key to creating an effective speech-recognition system. Once the voice talent has been cast and the prompt script finalized, it's time to get to a recording studio and ensure that all the words are spoken and recorded to express the right ideas in the right way with the right feeling. As the director of the recording session, the designer must ensure that all of the following elements are captured correctly.
If any of the above elements are off, unexpected errors can occur later on. Mistakes aren't always immediately obvious, either; often the difference between the right prompt and the wrong prompt can be quite subtle.
For example, let's say we're recording a prompt for an airline flight information system. One of the prompts is "Would you like the gate information for an international flight?" Looks simple and unambiguous, right? But there are at least three possible ways to interpret that line. If a voice talent were asked to record the prompt without any specific direction about its context, he or she might assume that the system was simply offering information to the user, and might accordingly read it as a fairly flat and simple question. But what if we actually wanted the prompt to distinguish "gate information" from some other type of information, such as the arrival time or baggage claim location? In that case, the word "gate" should be emphasized: "Would you like the GATE information for an international flight?" Or perhaps we want to distinguish international flights from domestic flights. In that case, the correct reading would be "Would you like the gate information for an INTERNATIONAL flight?" By recording the prompts with the right emphasis, you can ensure that the system asks each question in a way that conveys the desired intent.
An effective speech-recognition system should create an affinity between the client and the caller. That's why pronunciation can be important. Generally, callers want to hear words pronounced the way they themselves pronounce them. And even if the voice talent has no discernible regional accent, he or she might pronounce a word (often a place name) differently from the local population simply for lack of knowledge. For example, imagine that a local bank in Oregon was acquired by a larger bank, and the new bank's speech-recognition system had a prompt offering branch location information for "Ore'-uh-gahn" instead of "Ore'-uh-gin."
Bank customers in that state, perhaps already concerned that their bank is no longer locally owned, might feel even more alienated by the perceived mispronunciation. This kind of mistake becomes even more likely if the voice talent has to read a list of a few hundred cities, not all of which are familiar to the talent.
Of course, not all prompts need such careful attention. But there is indeed a direct relationship between the effectiveness of the system and the attention paid to how each prompt is spoken. Even very good voice talents, people who have been working in the industry for 30 years, won't necessarily understand how to say a particular prompt, because the prompts are often recorded out of context and out of order. Imagine recording the phrase "Since today is a national holiday..." on the first day of the recording sessions and then, three days later, recording all the prompts that fit into the next position, for example, "...the bank's branches will be open from 8 A.M. until noon." The result could be the audio equivalent of a ransom note composed from cutout newspaper and magazine letters; everything's there, but nothing fits together well.
It is the job of the designer/director to get the voice talent to convey not only the correct meaning of the prompts, but also the ideas and feelings behind them. Does the client want to present itself as caring and compassionate? Precise and efficient? Friendly and informal? Conservative and formal? The voice talent needs to know this in order to deliver the prompts effectively. Knowing how to get a voice talent to produce the right quality of sound is something that all directors for all media learn by doing.
There are many techniques: some directors describe the intent of the prompt to the voice talent and then keep refining their descriptions, while other directors (like me) ask the voice talent to mimic the way I say a prompt until the talent gets the hang of it and can extrapolate how to say the subsequent phrases with far less directing.