The Art and Business of Speech Recognition: Creating the Noble Voice

Directing voice talents requires a great deal of concentration and attention to detail. Just as on stage or in movies, even the most seasoned, professional talents need direction to deliver the best performance. It's up to the designer to articulate how the application should feel and, consequently, how the voice talent should sound.

Should the talent affect a produced, professional sound? Or should he or she speak more casually, for example by not hitting every ending consonant (like the last "t" in "that"), to sound more conversational? Even within these general directions there are subcategories, including

  • Speed (how quickly the words are spoken)

  • Range (the actual notes in the phrases)

  • Duration of pauses between words and phrases

  • Elongation of words (when people want to sound reluctant, they often stretch out some of the words, as in "Welllll, juuuust this once.")

Keep in mind that the entire application doesn't need to be totally consistent in terms of the style of direction. Sometimes it's a good idea to mix in casual phrases between the important phrases to add color to the application. This can also provide a mental respite to listeners as they think about what they've been doing. For example, a stock trading application may use a lightly humorous, off-the-cuff phrase like "You don't need to remember any commands. I'll make sure you know what you're doing before I stop providing hints." If the talent is directed to deliver this line in a slightly tongue-in-cheek way, it can help relax callers.

Conversely, some casual applications may benefit from the adoption of a more formal tone for certain lines or phrases to provide added emphasis and ensure that the caller pays close attention. Here's an example from an airline application: "Since your luggage has been located, it will be sent to the address you provided us on the claim form. If you don't want it sent there, say 'Agent' and we'll change the address right away."

A great way to get voice talents into the right frame of mind for the recording is to ask them to visualize a place or an environment that conveys the feeling of the application. This is not as strange as it may sound. After all, when people write reviews of music, they often do so using art terminology (describing the color, form, or texture of a musical performance), and art is often described musically (using words like rhythm, dissonance, and loudness).

I once directed the recording for an application that was going to be used by a very large and broad spectrum of people worldwide. I told the voice talent to imagine a Monet painting called "Entrance to the Village of Vetheuil in Winter," [2] which depicts a dirt road fading into the distance. In the painting, a few people walk down the road surrounded by fields with a couple of houses in the distance. Overhead, the clouds of a winter storm have just broken to reveal a glow of orange afternoon light.

[2] The painting is part of the permanent collection of the Museum of Fine Arts in Boston.

This imagery helped set the idea of a cooler, relaxed pace. But I also wanted the voice talent to imagine that he and the caller were walking down the dirt road in a purposeful yet unhurried way. It helped the voice talent to find a particular range in his voice that felt comfortable to him and would feel comforting to the caller. This use of guided imagery is a great way to elicit a good voice performance, and the imagery need not be fine art, or any art at all. To get a voice talent in the right frame of mind to record for a super-hip voice portal application, for example, you might suggest imagining an evening among beautiful people at the coolest nightclub or disco in town. What's right is whatever works.

Often, the same voice talent will sound very different depending on who is directing. This can become noticeable when a company goes back to record additional prompts for an existing application. If the voice talent is the same but the director is different, the resulting prompts may not match the earlier ones. Sometimes this effect can be mitigated if the voice talent listens to several of the previous recordings and tries to copy them; however, this is rarely a substitute for consistent direction. Once the voice talent is in the right mindset, it's time to start recording.

Usually, the most important phrases in the application are the initial prompts that every caller will hear: the "welcome" prompt, for example, or the first questions asked. I highly recommend recording all of these high-level prompts in the first session, before any other prompts. This enables the voice talent to retain the context of the recording and maintain a consistent tone.

That means you should select the most common, logical path for the majority of callers, and put all the prompts from that path at the beginning of the script. Then, when the voice talent is recording the phrases that have been logically arranged, he or she can gain an understanding of how the finished product will sound to callers, and can start thinking about how to voice it properly. Often, when prompts are arranged in this way, the voice talents themselves will know enough to spot and correct some of their own errors before the director can do so.
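
If the prompts are kept in a list or spreadsheet, the reordering described above can be done mechanically before the script is printed. The following is only a minimal sketch in Python, not a prescribed tool: the `Prompt` record, the `order_recording_script` function, and the prompt IDs are all hypothetical names invented for illustration, and the "common path" is assumed to be supplied by the designer as an ordered list of prompt IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prompt:
    """One line of the recording script (hypothetical structure)."""
    prompt_id: str
    text: str


def order_recording_script(prompts, common_path_ids):
    """Put prompts on the most common caller path first, in call-flow order.

    Prompts that are not on the common path keep their original relative
    order and follow afterward. IDs in common_path_ids that match no
    prompt are ignored; repeated IDs are used only once.
    """
    by_id = {p.prompt_id: p for p in prompts}
    path_ids = list(dict.fromkeys(common_path_ids))  # de-duplicate, keep order
    on_path = [by_id[pid] for pid in path_ids if pid in by_id]
    path_set = set(path_ids)
    remaining = [p for p in prompts if p.prompt_id not in path_set]
    return on_path + remaining


# Illustrative data only: these prompts are assumptions, not from a real application.
script = [
    Prompt("help", "Here are some things you can say."),
    Prompt("welcome", "Welcome. What would you like to do?"),
    Prompt("error", "Sorry, I didn't catch that."),
    Prompt("get_quote", "Which company would you like a quote for?"),
]
ordered = order_recording_script(script, ["welcome", "get_quote"])
for prompt in ordered:
    print(prompt.prompt_id, "-", prompt.text)
```

The point of the sketch is simply that the voice talent reads the first prompts in the same sequence a typical caller hears them; how the common path is chosen remains a design judgment.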
