Using Speech Recognition and Voice Commands

3/14/2011 9:43:28 PM

The crew of the starship Enterprise thought nothing of talking to the computer that ran the ship and its systems. Windows Speech Recognition, a feature introduced in Windows Vista and available in all editions of Windows 7, comes closest to fulfilling that futuristic vision of computing. You won't be able to blast Klingon warships into space dust with voice commands, but if you set slightly more realistic expectations, we predict you'll be extremely impressed with Windows Speech Recognition.

Before you can get started, you'll need to have the right gear. The most important piece of equipment, naturally, is a high-quality microphone. Microsoft recommends a USB headset model for best performance. The headset ensures a consistent distance between your mouth and the microphone, and a USB connection has an all-digital signal path, unlike direct connections to an onboard sound card. Both factors increase your chances of success in accurate speech recognition.

1. Tuning and Tweaking Windows Speech Recognition

After installing your hardware and any required drivers, you're ready to begin using Windows Speech Recognition. You need to run through a quick setup routine, which in turn strongly encourages you to complete the Windows Speech Recognition tutorial. Even if you normally prefer to dive right into a new feature, we recommend that you make an exception for this tutorial. In small part, that's because the tutorial does an excellent job of introducing the Speech Recognition feature. Much more important, though, the speech recognition engine uses your responses during the tutorial to train itself to recognize your voice and phrasing. (And it's really not that long, honest.)

With the tutorial out of the way, you can start Windows Speech Recognition using its shortcut on the Start menu. If you need to adjust any setup options, you can do so from Speech Recognition in Control Panel.

When Speech Recognition is running, you see the capsule-shaped microphone interface pinned to the top of the screen. When the microphone icon is blue and the word Listening appears, the speech recognition engine is hanging on your every word—or for that matter, on stray sounds, which it will try to convert into text or commands. If you're not actively dictating, click the microphone button (or say "Stop listening"). The microphone icon turns gray. If you chose manual activation in the initial setup, you'll need to click the microphone icon again (or press Ctrl+Windows logo key) to resume; if you chose voice activation mode, the word "Sleeping" appears to indicate that it is listening only for the magic phrase "Start listening" to begin again.

Inside Out: Hide or move the speech recognition interface

If you find that the speech recognition interface covers up important information when docked at the top of the screen, you have several choices. You can move it so that it floats on the screen. Or you can hide it, by speaking the command "Hide speech recognition." To make the interface visible once again, just say "Show speech recognition."

To see a list of all options that you can adjust for Windows Speech Recognition, say "Show speech options" or right-click the microphone interface.

Without question, speech recognition involves a learning curve, but a modest amount of time and effort expended up front pays substantial dividends in the long run. One technique that improves both your skills and the accuracy of the speech recognition engine is to run through the Speech Recognition Voice Training sessions. Each module includes tips, suggestions, and background information that you read out loud. The more modules you complete, the more information the computer has to work with the next time you speak.

When you speak to the computer, it parses the sounds and tries to determine whether they represent commands (which control movement of on-screen objects and the behavior of programs) or dictation (which represents text you want to insert in an editing window or a text box). Windows Speech Recognition has an extensive vocabulary, and it's smart enough to limit the commands it listens for to those that are applicable to the activity you're currently engaged in. By learning the words and phrases it is most likely to respond to, you increase the odds of having it carry out your commands properly. At any time, you can say "What can I say?" This all-purpose command opens the Windows Speech Recognition Quick Reference Card, a Help And Support dialog box that breaks most commands down into related groups.
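The command-versus-dictation decision described above can be pictured as a lookup against the commands that are valid in the current context. The following Python sketch is purely illustrative and is not how Windows Speech Recognition is actually implemented; the context names, the command sets, and the interpret function are all hypothetical.

```python
# Illustrative sketch only: the contexts, command lists, and function names
# are assumptions, not part of Windows Speech Recognition itself.

# Phrases assumed to be available everywhere (all taken from this article).
GLOBAL_COMMANDS = {"start listening", "stop listening", "what can i say", "show numbers"}

# Hypothetical per-context commands: only the active context's set is consulted.
CONTEXT_COMMANDS = {
    "text_editor": {"undo that", "new paragraph", "new line"},
    "desktop": {"show desktop", "start"},
}


def normalize(utterance: str) -> str:
    """Lowercase the utterance and drop surrounding whitespace and end punctuation."""
    return utterance.strip().strip("?.!").lower()


def interpret(utterance: str, context: str) -> tuple[str, str]:
    """Return ("command", phrase) if the phrase is valid in this context;
    otherwise return ("dictation", original text) to be inserted as text."""
    phrase = normalize(utterance)
    active = GLOBAL_COMMANDS | CONTEXT_COMMANDS.get(context, set())
    if phrase in active:
        return ("command", phrase)
    return ("dictation", utterance)
```

In this sketch, "New paragraph" counts as a command only while the hypothetical text_editor context is active. Anywhere else it falls through to dictation, which mirrors the idea of limiting the commands to those that apply to the current activity.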

2. Controlling a PC with Voice Commands

The guiding principle for working with windows, dialog boxes, menus, and other on-screen objects is simple: "Say what you see." So, for example, you can say "Start," and Windows Speech Recognition will display the Start menu. You can then say "All Programs" to open that menu, and continue working your way to the program you want by saying the names of objects and menu items you see on the screen. If you know the name of the program you want to open, you can skip that navigation and just say "Open" followed by the program's name.

You can also "click what you see" (or double-click or right-click). If a window has menus available, you can speak the names of those menus ("File," "Open") just as if you were clicking them.

If you can't figure out what to say to get Windows Speech Recognition to click an object on the screen, make a note of where the object you want to click is located, and then say "Show numbers." This command enumerates every clickable object on the screen and overlays a number on each one, as shown in Figure 1, which depicts what happens to Control Panel when you choose this option.

Figure 1. When you say "Show numbers," Windows Speech Recognition tags every clickable object with a number. Say "click" or "double-click" followed by the number to accomplish your goal.
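The numbering scheme can be modeled as a simple mapping from numbers to on-screen objects. The Python sketch below is an illustration under assumptions, not the actual Windows mechanism: the show_numbers and resolve_click helpers are hypothetical, and the object names in the usage note are made-up sample data.

```python
# Illustrative sketch only: both helper functions are hypothetical.

def show_numbers(clickable_objects: list[str]) -> dict[int, str]:
    """Assign a number, starting at 1, to every clickable object on screen."""
    return {number: name for number, name in enumerate(clickable_objects, start=1)}


def resolve_click(command: str, overlay: dict[int, str]) -> tuple[str, str]:
    """Turn a phrase such as "double-click 2" into (action, object name)."""
    # Split on the last space: everything before it is the action word.
    action, _, number_text = command.strip().lower().rpartition(" ")
    if action not in {"click", "double-click"}:
        raise ValueError(f"unrecognized action: {action!r}")
    number = int(number_text)  # assumes the recognizer delivers digits
    if number not in overlay:
        raise ValueError(f"no object is tagged with number {number}")
    return (action, overlay[number])
```

For example, with the sample list ["System and Security", "Network and Internet"], the phrase "click 2" resolves to a click on "Network and Internet".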

Show Numbers works equally well with webpages, identifying clickable regions and objects on the page. It also works with the Start menu and the taskbar, offering an easy way to open and switch programs. If you prefer, you can say "Switch to" followed by the text in the title bar of the program you want to switch to. To work with individual windows, you can use the "minimize," "maximize," and "close" commands, followed by the name of the program. For the currently selected window, use the shortcut "that," as in "Minimize that." To minimize all open windows, say "Show desktop."

To scroll through text in a window, say "Scroll up" or "Scroll down." For more control over scrolling, add a number from 1 through 20 after the command; the larger the number, the farther the window scrolls.
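As a rough illustration of the scroll syntax, the following Python sketch validates a spoken scroll phrase and its optional amount. The parse_scroll function is hypothetical, and it assumes the recognizer has already turned spoken numbers into digits.

```python
# Illustrative sketch only: parse_scroll is a hypothetical helper.

def parse_scroll(utterance: str):
    """Return (direction, amount) for a valid scroll phrase, else None.

    amount is None when no number was spoken, because the default
    scroll distance is not specified here.
    """
    words = utterance.strip().lower().split()
    if len(words) not in (2, 3) or words[0] != "scroll":
        return None
    if words[1] not in {"up", "down"}:
        return None
    if len(words) == 2:
        return (words[1], None)
    if not words[2].isdigit():
        return None
    amount = int(words[2])
    # Only numbers from 1 through 20 are accepted.
    if not 1 <= amount <= 20:
        return None
    return (words[1], amount)
```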

3. Using Speech to Enter and Edit Data

If it can't interpret what you say as a command, Windows Speech Recognition assumes that you're trying to dictate. It then inserts its best guess at what you meant to say at the current insertion point. The accuracy of speech recognition is reasonably good after a short period of training, and it gets much better with time and practice. But it's not perfect, nor are you likely to dictate smooth sentences with perfect syntax. As a result, you'll want to master the basics of text editing using the voice commands in this section.

To delete the most recent word or phrase you dictated, say "Undo" or "Undo that."

If you want to change a word, phrase, or sentence, start by saying "Select" followed by the word itself or, for a phrase, "Select" followed by the first word, "through," and the last word of the phrase. "Select next [or previous] sentence" works, as does "Select previous five words" or "Select next two sentences." After you make a selection, you can delete it or copy it to the Clipboard ("Copy that").

The "Go to" command is powerful. If you follow it with a unique word that appears in the text, that word is selected immediately. If the word appears multiple times, each occurrence is highlighted with a number. Say the number and then say "OK." You can also say "Go to before" or "Go to after" followed by a particular word, and Windows Speech Recognition moves the insertion point to that spot. To go to the top or bottom of the current editing window, say "Go to the start of the document" or "Go to the end of the document."
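The way "Go to" handles unique and repeated words can be sketched as a search that either selects the single match or numbers the candidates. This Python example is an illustration only. The find_word and go_to functions are hypothetical, and whole-word, case-insensitive matching is an assumption rather than documented behavior.

```python
import re

# Illustrative sketch only: both functions are hypothetical.

def find_word(text: str, word: str) -> list[tuple[int, int]]:
    """Return the (start, end) span of every whole-word, case-insensitive match."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [match.span() for match in pattern.finditer(text)]


def go_to(text: str, word: str) -> tuple[str, dict[int, tuple[int, int]]]:
    """Select a unique match at once, or number the matches so one can be chosen."""
    spans = find_word(text, word)
    if not spans:
        return ("not_found", {})
    if len(spans) == 1:
        return ("selected", {1: spans[0]})
    # Several matches: tag each with a number, as the on-screen overlay does.
    return ("choose", {n: span for n, span in enumerate(spans, start=1)})
```

In the "choose" case, the caller would wait for the spoken number followed by "OK" before selecting the corresponding span.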

If you need to correct a word that was misrecognized, say "Correct" followed by the word. When you do, Windows Speech Recognition reexamines what you said and displays a list of words or phrases that might be a better match, as shown in Figure 2. If the word you spoke is on the list, say its number, followed by "OK." If the word isn't on the list, try saying it again. Or say "Spell it" and then recite each letter, with or without phonetic helpers ("A as in apple").

Punctuation is easy: to insert a period, comma, colon, semicolon, or apostrophe, just say the word. Literally. The Quick Reference Card has a long list of punctuation marks the speech recognition engine will translate.

To enter a carriage return, say "New paragraph" or "New line."

Figure 2. Windows Speech Recognition takes these corrections to heart, adjusting its recognition database to ensure it doesn't make the same mistake twice.

You can simulate the action of pressing any key by saying "Press" followed by the name of the key. To repeat a key, add a number and the word "times" after the key name (for example, "Press Backspace 3 times"). A handful of special keys are recognized without the magic introductory word "press": Home, End, Space, Tab, Enter, and Backspace all fall into this category.
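The key-press syntax lends itself to a small parser. The Python sketch below is illustrative only. The parse_key_command function and its regular expression are hypothetical, and the sketch assumes the recognizer delivers the repeat count as digits.

```python
import re

# Illustrative sketch only: the names and the pattern are assumptions.

# Keys the article lists as recognized without the word "press".
SPECIAL_KEYS = {"home", "end", "space", "tab", "enter", "backspace"}

# "press <key>" with an optional "<count> times" suffix.
PRESS_PATTERN = re.compile(r"^press (?P<key>.+?)(?: (?P<count>\d+) times)?$")


def parse_key_command(utterance: str):
    """Return (key name, repeat count), or None if this is not a key command."""
    phrase = utterance.strip().lower()
    if phrase in SPECIAL_KEYS:
        return (phrase, 1)
    match = PRESS_PATTERN.match(phrase)
    if match is None:
        return None
    count = int(match.group("count") or 1)
    return (match.group("key"), count)
```

A phrase that returns None here would, following the logic described earlier in this section, be treated as dictation rather than as a key press.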