Hardware and Software Requirements

A dictation application requires certain hardware and software on the user's computer. Not all computers have the memory, speed, microphone, or speakers required to support speech, so it is a good idea to design the application so that speech is optional. These hardware and software requirements should be considered when designing a speech application:
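Whatever the exact requirements, an application can keep speech optional by probing at run time for an installed recognition engine and disabling its speech features when none is found. The following is a minimal sketch that assumes the later SAPI 5.x COM interfaces, which may differ from the Speech API version this document describes:

    // Minimal sketch (SAPI 5.x assumed): probe for an installed recognizer so the
    // application can silently disable its speech features on machines without one.
    // Build roughly as: cl /EHsc speechcheck.cpp ole32.lib sapi.lib
    #include <windows.h>
    #include <atlbase.h>    // CComPtr
    #include <sapi.h>
    #include <sphelper.h>   // SpEnumTokens
    #include <cstdio>

    // Returns true if at least one speech recognition engine is registered.
    bool IsSpeechRecognitionAvailable()
    {
        CComPtr<IEnumSpObjectTokens> cpEnum;
        ULONG ulCount = 0;

        // Enumerate the recognizer tokens registered under SPCAT_RECOGNIZERS.
        HRESULT hr = SpEnumTokens(SPCAT_RECOGNIZERS, NULL, NULL, &cpEnum);
        if (SUCCEEDED(hr))
            hr = cpEnum->GetCount(&ulCount);

        return SUCCEEDED(hr) && ulCount > 0;
    }

    int main()
    {
        if (FAILED(::CoInitialize(NULL)))
            return 1;

        // Speech stays optional: fall back to keyboard and mouse when no engine exists.
        wprintf(L"Speech recognition %ls\n",
                IsSpeechRecognitionAvailable()
                    ? L"is available - dictation features enabled"
                    : L"is not installed - keyboard and mouse only");

        ::CoUninitialize();
        return 0;
    }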
For a list of engine vendors that support the Speech API, see the ENGINE.DOC file included with the Speech Software Development Kit.

Limitations

Even the most sophisticated speech recognition engine has limitations that affect what it can recognize and how accurate the recognition will be. The following list illustrates many of the limitations found today. These limitations pose some problems, but they do not prevent the design and development of savvy applications that use dictation.

Microphones and Sound Cards

The microphone is the largest source of problems that speech recognition encounters. Microphones inherently have the following problems:
Most applications can do little about the microphone itself. One way that vendors can deal with this is to test and verify the user's microphone setup as part of the installation of any speech component software. Microphone-test software can also be delivered along with the other components so that the user can periodically test and adjust the microphone and its configuration. Most users of dictation will wear close-talk microphones for maximum accuracy. Close-talk microphones have the best characteristics for speech recognition; they alleviate a number of the problems that weak microphones cause in both command and control and dictation applications.

Speech Recognizers Make Mistakes

Speech recognizers make mistakes, and they will always make mistakes. The only thing that is changing is that roughly every two years recognizers make half as many mistakes as they did before; no matter how good a recognizer is, it will still misrecognize speech. To make matters worse, dictation engines produce misrecognitions that are correctly spelled and often grammatically correct, but that mean nothing. Unfortunately, the misrecognitions sometimes mean something completely different from what the user intended. These errors illustrate some of the complexity of speech communication, particularly because people are not accustomed to attributing strange wording to speech errors. To minimize some of the misrecognitions, an application can:
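One such step is making it easy for the user to re-check the microphone, as suggested above. The sketch below assumes the SAPI 5.x COM interfaces (which may differ from the Speech API version this document describes) and asks the installed recognizer to display its own microphone setup wizard:

    // Sketch (SAPI 5.x assumed): ask the installed recognizer to show its own
    // microphone setup wizard, for example from an "Adjust Microphone..." menu item.
    #include <windows.h>
    #include <atlbase.h>
    #include <sapi.h>

    HRESULT RunMicrophoneSetup(HWND hwndParent)
    {
        CComPtr<ISpRecognizer> cpRecognizer;
        HRESULT hr = cpRecognizer.CoCreateInstance(CLSID_SpSharedRecognizer);
        if (FAILED(hr))
            return hr;

        // Not every engine ships a microphone wizard, so ask before displaying it.
        BOOL fSupported = FALSE;
        hr = cpRecognizer->IsUISupported(SPDUI_MicTraining, NULL, 0, &fSupported);
        if (FAILED(hr))
            return hr;
        if (!fSupported)
            return S_FALSE;   // caller can hide the menu item or show its own help

        // The engine displays and manages the wizard window itself.
        return cpRecognizer->DisplayUI(hwndParent, NULL, SPDUI_MicTraining, NULL, 0);
    }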
Is it a Command?

When a speech recognizer is listening for dictation, users will often want to interject commands such as "cross-out" to delete the previous word or "capitalize-that". Applications should make sure that:
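One way to keep such commands separate from dictated text, sketched here under the assumption of the SAPI 5.x COM interfaces (an earlier Speech API version may expose a different mechanism), is to activate a small command grammar alongside the dictation grammar on the same recognition context:

    // Sketch (SAPI 5.x assumed): one recognition context with two grammars, so the
    // engine can tell editing commands such as "cross-out" apart from dictated text.
    #include <windows.h>
    #include <atlbase.h>
    #include <sapi.h>

    static const ULONGLONG GRAMMAR_DICTATION = 1;   // grammar IDs chosen by the app
    static const ULONGLONG GRAMMAR_COMMANDS  = 2;

    HRESULT EnableDictationWithCommands(ISpRecoContext *pContext,
                                        ISpRecoGrammar **ppDictation,
                                        ISpRecoGrammar **ppCommands)
    {
        // General dictation grammar: the engine's large built-in vocabulary.
        HRESULT hr = pContext->CreateGrammar(GRAMMAR_DICTATION, ppDictation);
        if (SUCCEEDED(hr)) hr = (*ppDictation)->LoadDictation(NULL, SPLO_STATIC);
        if (SUCCEEDED(hr)) hr = (*ppDictation)->SetDictationState(SPRS_ACTIVE);

        // Small command grammar; "commands.xml" is a hypothetical grammar file
        // holding rules for phrases such as "cross-out" and "capitalize-that".
        if (SUCCEEDED(hr)) hr = pContext->CreateGrammar(GRAMMAR_COMMANDS, ppCommands);
        if (SUCCEEDED(hr)) hr = (*ppCommands)->LoadCmdFromFile(L"commands.xml", SPLO_STATIC);
        if (SUCCEEDED(hr)) hr = (*ppCommands)->SetRuleState(NULL, NULL, SPRS_ACTIVE);

        // When a recognition arrives, SPPHRASE::ullGrammarID reports which grammar
        // produced it, so a command is executed instead of being typed as text.
        return hr;
    }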
Finite Number of Words

Speech recognizers listen for 20,000 to 100,000 words. Because of this, roughly one out of every fifty words a user speaks is not recognized, because it is not among the 20,000 to 100,000 words supported by the engine. An application can reduce the engine's error rate by telling the engine which words to expect.

Other Problems

Some other problems crop up:
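One concrete way to tell the engine which words to expect, as suggested above, is to add application-specific terms to the user lexicon. The sketch below assumes the SAPI 5.x ISpLexicon interface, which may differ from the Speech API version this document describes; the NULL pronunciation in particular is an assumption about engine behavior:

    // Sketch (SAPI 5.x assumed): add application-specific terms to the shared user
    // lexicon so the dictation engine expects words outside its stock vocabulary.
    #include <windows.h>
    #include <atlbase.h>
    #include <sapi.h>

    HRESULT AddExpectedWords(const WCHAR **ppWords, size_t cWords)
    {
        CComPtr<ISpLexicon> cpLexicon;
        HRESULT hr = cpLexicon.CoCreateInstance(CLSID_SpLexicon);
        if (FAILED(hr))
            return hr;

        const LANGID langEnUS = MAKELANGID(LANG_ENGLISH, SUBLANG_ENGLISH_US);
        for (size_t i = 0; SUCCEEDED(hr) && i < cWords; ++i)
        {
            // Passing NULL for the pronunciation assumes the engine will derive one
            // from the spelling; a real application may need to supply phoneme IDs.
            hr = cpLexicon->AddPronunciation(ppWords[i], langEnUS, SPPS_Noun, NULL);
        }
        return hr;
    }

    // Usage: const WCHAR *terms[] = { L"Redmond", L"SAPI" };
    //        AddExpectedWords(terms, 2);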
Application Design Considerations

Here are some design considerations for applications using command and control speech recognition.

Design Speech Recognition In from the Start

Don't make the mistake of implementing speech recognition in your application as an afterthought. If speech is merely grafted onto an application designed for a mouse and keyboard, the result is a poor design; applications designed for just the keyboard and mouse get little benefit from speech recognition. The speech interface is at a point similar to where the mouse interface was when applications were designed for keyboard input only: not until applications were deliberately designed for mousing did the mouse prove generally effective for user input.

Do Not Replace the Keyboard and Mouse

Most dictation systems provide discrete dictation, allowing users to speak up to 50 words per minute. While this is faster than hunt-and-peck typists, touch typists can type at least 70 words per minute, so discrete dictation will not be used by touch typists. Continuous dictation allows up to 120 words per minute.

Communicate Speech Awareness

Since most applications today do not include speech recognition, users will find speech recognition a new technology. They probably won't assume that your application has it, and they won't know how to use it. When you design a speech recognition application, it is important to communicate to the user that your application is speech-aware and to provide him or her with the commands it understands. It is also important to provide command sets that are consistent and complete.

Manage User Expectations

Users will often expect speech-enabled applications to provide a level of comprehension and interaction comparable to the futuristic speech-enabled computers of Star Trek and 2001: A Space Odyssey. Some users will expect the computer to correctly transcribe every word that they speak, understand it, and then act upon it in an intelligent manner. You should convey as clearly as possible exactly what an application can and cannot do, and emphasize that the user should speak clearly, using words the application understands.

Where the Engine Comes From

If an application implements speech recognition, it can work on an end user's PC only if the system has a speech recognition engine installed. The application has two choices:
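Whichever choice is made, the application ultimately has to find an engine on the user's machine and bind to it. The sketch below assumes the later SAPI 5.x COM interfaces and helpers (which may differ from the SDK version this document describes); it lists the installed recognition engines and attaches an in-process recognizer to the first one found:

    // Sketch (SAPI 5.x assumed): list the recognition engines installed on the
    // user's PC and attach an in-process recognizer to the first one found.
    #include <windows.h>
    #include <atlbase.h>
    #include <sapi.h>
    #include <sperror.h>    // SPERR_NOT_FOUND
    #include <sphelper.h>   // SpEnumTokens, SpGetDescription, CSpDynamicString
    #include <cstdio>

    HRESULT PickInstalledEngine(ISpRecognizer **ppRecognizer)
    {
        *ppRecognizer = NULL;   // stays NULL unless an engine is found

        CComPtr<IEnumSpObjectTokens> cpEnum;
        HRESULT hr = SpEnumTokens(SPCAT_RECOGNIZERS, NULL, NULL, &cpEnum);
        if (FAILED(hr))
            return hr;

        CComPtr<ISpObjectToken> cpToken;
        while (cpEnum->Next(1, &cpToken, NULL) == S_OK)
        {
            // Print the engine's friendly name (typically vendor and version).
            CSpDynamicString dstrDesc;
            if (SUCCEEDED(SpGetDescription(cpToken, &dstrDesc)))
                wprintf(L"Installed engine: %ls\n", (WCHAR *)dstrDesc);

            if (*ppRecognizer == NULL)   // bind to the first engine we see
            {
                CComPtr<ISpRecognizer> cpReco;
                if (SUCCEEDED(cpReco.CoCreateInstance(CLSID_SpInprocRecognizer)) &&
                    SUCCEEDED(cpReco->SetRecognizer(cpToken)))
                {
                    *ppRecognizer = cpReco.Detach();
                }
            }
            cpToken.Release();   // release before the next Next() call
        }
        return (*ppRecognizer != NULL) ? S_OK : SPERR_NOT_FOUND;
    }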