You’ve most likely used an automatic speech recognition device before, whether you had a “natural” conversation with Siri on your iPhone or gave voice prompts to navigate through a menu when you called your bank. How much do you understand about how this technology works, though? Learning the basics of automatic speech recognition can help you figure out how best to communicate with these devices and get the responses or commands you’re actually looking for.
Let’s look at what’s actually happening when you use automatic speech recognition technology with this infographic from West Interactive. First, you speak into your device’s microphone, and the device creates a waveform while also reducing the background noise and normalizing the volume. The device then breaks the waveform into phonemes, the sounds that are the building blocks of words. Starting from the first phoneme of a word, the device uses statistical analysis and context to determine the phonemes and words most likely to follow.
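To make the cleanup step a little more concrete, here is a minimal Python sketch of reducing background noise and normalizing volume on a list of audio samples. It is an illustration only, not how any particular device does it: the function names, the 0.05 threshold, and the 0.9 target peak are all assumptions, and a simple "noise gate" stands in for the far more sophisticated noise reduction that real products use.

```python
def noise_gate(samples, threshold=0.05):
    """Crude noise reduction: silence any sample quieter than the threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def normalize_peak(samples, target_peak=0.9):
    """Scale the samples so the loudest one has magnitude target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Pure silence (or empty input): nothing to scale.
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


def preprocess(samples, threshold=0.05, target_peak=0.9):
    """Hypothetical cleanup pipeline: gate out quiet noise, then normalize."""
    return normalize_peak(noise_gate(samples, threshold), target_peak)


if __name__ == "__main__":
    # Samples are floats between -1.0 and 1.0 (an assumed format).
    raw = [0.01, -0.02, 0.3, -0.45, 0.02]
    print(preprocess(raw))
```

The quiet samples at the start and end are treated as background noise and zeroed, and the remaining samples are scaled up so that a soft speaker and a loud speaker end up at a similar volume before the phoneme step.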
Devices that use natural language rather than directed dialog can actually learn from the people who speak to them. That’s because data from past interactions is stored in the automatic speech recognition program, and the program can draw on words and common combinations of words that you’ve used in the past to better determine your meaning. For example, “weather” and “whether” sound identical, but the program can work out which one you mean based on how frequently you use each word and the context in which you use it (you’re more likely to mean “weather” than “whether” when you want the program to pull up the local forecast).
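The “weather” versus “whether” idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the method any real speech recognizer uses: the `HomophoneResolver` class and its scoring rule (how often you have used a spelling, plus how often the surrounding words appeared alongside it before) are invented here purely for illustration.

```python
from collections import Counter, defaultdict


class HomophoneResolver:
    """Toy model: choose between same-sounding words using past usage."""

    def __init__(self):
        self.word_counts = Counter()                 # how often each spelling was used
        self.context_counts = defaultdict(Counter)   # spelling -> words seen near it

    def learn(self, spelling, utterance):
        """Store one past interaction in which `spelling` was the intended word."""
        self.word_counts[spelling] += 1
        for word in utterance.lower().split():
            if word != spelling:
                self.context_counts[spelling][word] += 1

    def resolve(self, candidates, context):
        """Return the candidate spelling that best fits the surrounding words."""
        words = context.lower().split()

        def score(spelling):
            seen_together = sum(self.context_counts[spelling][w] for w in words)
            return self.word_counts[spelling] + seen_together

        return max(candidates, key=score)


if __name__ == "__main__":
    resolver = HomophoneResolver()
    resolver.learn("weather", "what is the weather forecast today")
    resolver.learn("weather", "weather forecast for tomorrow")
    resolver.learn("whether", "tell me whether the store is open")

    print(resolver.resolve(["weather", "whether"], "what's the forecast"))
    print(resolver.resolve(["weather", "whether"], "tell me if the store is open"))
```

With this small history, the first request resolves to “weather” because “forecast” has appeared next to it before, while the second resolves to “whether” because words like “tell,” “store,” and “open” have only been seen with that spelling.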
Now that you understand the basics of automatic speech recognition, here are a few tips to help you make the most of it.
Minimize background noise. While the program may be able to cut out some background noise, automatic speech recognition is often confused by multiple voices, loud sounds, or ambient noises.
Speak as clearly as possible. Since the program is trying to figure out the phonemes that you’re using, you’ll need to enunciate each word to get the most accurate results.
Provide context for homophones. As mentioned in the example above, an automatic speech recognition program will be better able to determine the meaning of “What’s the weather forecast?” than simply “What’s the weather?” because the word “forecast” provides additional context that helps the program recognize that you mean “weather” instead of “whether.”
Correct errors. Think about using autocorrect on a smartphone. When you manually change a word that your phone tries to autocorrect, the program will store that correction in its data. If you use the word that you manually entered much more frequently than the word that the phone tried to autocorrect to, data will show that your manually entered word has a higher probability of being used, and the program will begin anticipating that word rather than correcting it. Automatic speech recognition works the same way, so the more you talk to a program, the more it will learn.
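The autocorrect analogy can be sketched as a simple tally in Python. This is a hypothetical illustration, not the logic of any real keyboard or speech product: the `CorrectionLearner` class, its method names, and the rule "switch once your word has been chosen more often than the default" are all assumptions made for the example.

```python
from collections import Counter, defaultdict


class CorrectionLearner:
    """Toy model: remember which word the user actually wanted each time."""

    def __init__(self):
        # input as typed or heard -> tally of the words the user ended up keeping
        self.choices = defaultdict(Counter)

    def record(self, heard, final_word):
        """Store one interaction: for this input, the user kept `final_word`."""
        self.choices[heard][final_word] += 1

    def suggest(self, heard, default):
        """Return the program's default guess unless another word was kept more often."""
        counts = self.choices.get(heard)
        if not counts:
            return default
        best_word, best_count = counts.most_common(1)[0]
        return best_word if best_count > counts[default] else default


if __name__ == "__main__":
    learner = CorrectionLearner()
    print(learner.suggest("helo", default="hello"))  # no history yet

    learner.record("helo", "hello")    # accepted the default once
    for _ in range(3):
        learner.record("helo", "Helo")  # overrode it three times

    print(learner.suggest("helo", default="hello"))
```

Before any corrections are stored, the program falls back on its default guess. Once the manually entered word has been kept more often than the default, the tally tips and the program starts anticipating that word instead.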
While automatic speech recognition still has some limitations, the technology is rapidly evolving, and it has many applications in everyday life. It will undoubtedly be exciting to watch the software become more fine-tuned and its functionality more advanced in the coming years.