The Art and Business of Speech Recognition: Creating the Noble Voice
Prompt Creation: Text-to-Speech and Recorded Voices
There are two ways to create audio prompts. One is to record a real person, called a voice talent, saying phrases; the other is to use text-to-speech (TTS) software, which converts text stored in digital form (for example, an e-mail message) into a spoken utterance in real time. TTS is generally used as a cost-effective way to read dynamic information that would otherwise be difficult or impossible to prerecord, such as the daily news or the weather. There are two popular types of TTS engines: those that synthesize the sound (formant TTS) and those that take thousands of small pieces of prerecorded human speech and concatenate them, or string them together (concatenative TTS). The following are the primary differences among the three methods (recorded phrases and the two types of TTS) of producing the audio files.
Most often it's a good idea to use recorded prompts, since they sound the most natural and recording them generally takes only a fraction of the total development time. I don't advocate using only TTS prompts for an entire application, because that could compromise the ability to express the endless variation that the human voice can produce to convey particular thoughts. The preferred and more traditional method is to record a real person, the voice talent, saying phrases that are stored digitally in a computer, with each phrase saved as a unique file and played to the caller as appropriate. Even though callers know that they're not listening to a live person, they are much more comfortable interacting with something that sounds more like a fellow human being [1] and less like the somewhat emotionally removed HAL 9000 from 2001: A Space Odyssey.

[1] See Byron Reeves and Clifford Nass, The Media Equation: How People Treat Computers, Television, and New Media Like Real People and Places (New York: Cambridge University Press, 1996), pp. 106–107.

Production of effective audio prompts requires three tasks.