Speaking to Write / Word for Word: An Overview of Speech Recognition

by Bob Follansbee, Ed. D.

(Reprinted with permission from the International Dyslexia Association quarterly newsletter, Perspectives, Fall, 2003 vol. 29, No. 4. It's worth joining IDA just to get the Perspectives newsletter. Each issue has many articles like this.)

Few technological innovations have prompted as much promise for students with learning disabilities (LD), and as many dashed expectations, as speech recognition. You may have heard that speech recognition is the "magic bullet" for some students, but you may also have heard stories about how speech recognition simply "doesn't work." What do we really know about speech recognition for students with LD and how it works or doesn't?

The current situation with speech recognition

Generally, when people today speak of "speech recognition" they are referring to continuous speech recognition, so named because, based on its claims, one can speak to the computer in a natural voice and at a normal rate of speech, and it will simply write what was said. This is a fine example of marketing hyperbole that glosses over important aspects of using the software. We will discuss below how such results might be achieved for some students with LD.

The primary continuous speech products now available are various editions of Dragon NaturallySpeaking (now owned by Nuance) and IBM ViaVoice. Dragon products operate in Windows only, while ViaVoice offers versions for both Windows and the Macintosh. There is also a newer Macintosh-only product, iListen, by MacSpeech. (An older technology, discrete speech recognition, represented by DragonDictate, is less readily available now but may still be preferable for some individuals; see below.)

Product information can be obtained at the following sites:

Does speech recognition work? The promise and the reality

This question does not have a simple, one-size-fits-all answer. However, with the proper introduction to and training on the technology, and with reasonable expectations, speech recognition software can operate with remarkable ease and accuracy, and can be a tremendous boon to some students who might otherwise never have a successful writing experience. It can be an avenue for some students to begin to participate more effectively and more independently in grade-appropriate work. Students who use speech recognition do not automatically become better writers, but they are almost always able to produce more work, more easily, which then allows them to engage in writing instruction and composition at a more advanced level. Instead of simple editing decisions like recopying a few sentences to make them more legible, or correcting spelling and grammar mistakes, students can focus on the more complex cognitive tasks that come with producing more text - tasks like revision and reorganization. Students can begin to write at a level that matches their grade level and, in fact, the level at which they could actually dictate text.

On the other hand, failure to understand the training and usage requirements of speech recognition products often leads to unsuccessful experiences with this technology. Additional unsuccessful experiences are not what you want for your students with LD.

The first speech recognition usage issue relates to the potential user. Speech recognition is not the appropriate tool for every individual. Generally, successful users of speech recognition exhibit at least some of the following characteristics:

These points assume a basic level of cognitive ability and development, and most successful users are at least 10 years old. However, these points are not rules, but guidelines for consideration. With the appropriate training and support, students of varying ages and abilities can use speech recognition in some manner.

Evaluating the success of speech recognition is not necessarily a straightforward or objective matter, but should be based on how well the software addresses the individual's source(s) of writing difficulty. For example, a logical "objective" measure of success is the error rate in the software's recognition of a student's speech. However, some struggling and frustrated students may feel very successful, perhaps for the first time, simply being able to produce a much greater amount of text, regardless of the software's recognition accuracy. Corrections to errors in software accuracy can be treated as a different form of editing, which is part of any writer's task.
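For readers who do want to track the "objective" measure mentioned above, the following is a minimal sketch of one common way to compute a recognition error rate: count the fewest word substitutions, omissions, and insertions needed to turn what the software wrote into what the student said, then divide by the number of words spoken. The function name and sample sentences are illustrative assumptions, not part of any speech recognition product, and the comparison is deliberately simple (it lowercases the text and splits on spaces, so punctuation attached to a word counts as part of that word).

```python
def word_error_rate(spoken: str, recognized: str) -> float:
    """Return word-level edit distance divided by the number of words spoken.

    Hypothetical helper for illustration; not part of any product's API.
    """
    ref = spoken.lower().split()      # what the student actually said
    hyp = recognized.lower().split()  # what the software put on the page
    if not ref:
        raise ValueError("spoken text must contain at least one word")

    # prev[j] holds the fewest edits needed to match the ref words seen
    # so far against the first j recognized words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            omission = prev[j] + 1       # a spoken word is missing from the text
            insertion = curr[j - 1] + 1  # the text contains an extra word
            curr.append(min(substitution, omission, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    rate = word_error_rate("I have to study for the test",
                           "I have two study for the test")
    print(f"Error rate: {rate:.0%}")
```

A number like this is best read alongside the other signs of success described above, such as how much text the student is now able to produce.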

We know from considerable experience through our projects and from the input of many participants in our listserv, that speech recognition is being successfully implemented with students around the country. Many successful users are students with LD who are using the software at home to complete homework and longer written assignments - the school may not even be aware that a student is using speech recognition. Some successful users are individuals who begin using speech recognition in school through advocacy and under the requirements of an IEP; situations like this can often quickly result in other students using speech recognition also, with students helping to train each other.

What is needed?

Successful implementation of speech recognition does not happen by accident. Schools must be committed to providing basic resources, staff must be committed to working with the student and the technology, and students must be committed to the process of improving their performance both in the mastery of the software and in their writing skills.

First, it is desirable that speech recognition be considered as only part of a continuum of assistive technology strategies. Speech recognition is relatively intensive in terms of the training required for successful use, and it is also somewhat limited in terms of the environments where it can be used successfully. Other strategies should be considered for any given student. On the other hand it is a very powerful solution in some cases and may be the most effective (or only) way to "reclaim" certain students. Finally, there is no reason why some students might not use multiple text creation strategies, including speech recognition, depending on the demands of the assignment or the situation.

Introduction and training

Learning to use speech recognition effectively requires four things.

  1. First, the new user must train the software to recognize his or her voice through an initial enrollment process that can be demanding for poor readers: the enrollee must read text that is presented on the screen. However, the enrollment process allows pausing so that the text can be previewed with the student screen-by-screen. While this process used to take 30 minutes of continual reading in earlier software versions, it has now been cut down to about 10 minutes. Then the user must learn to:
  2. Speak so that the software can understand what is said. This is not the same as speaking in conversation. While the software does adapt to the user's particular voice, for most users each word must be enunciated relatively clearly (e.g., "I have to study..." rather than "I hafta study..."). Users are encouraged to speak in multiple word utterances, but this can be varied from whole sentences to phrases to even single words, depending on what works best for the student's voice and style. This process will require some monitoring by a knowledgeable support person and time spent practicing, much as it takes practice to learn to type. We recommend using simple writing tasks with low cognitive demands during this phase - don't begin with a major assignment.
  3. Make corrections and otherwise operate the software. Learning to make corrections through the software is especially important so that the software learns the user's voice better. This is the step that is often left too much to chance. The first few hours of dictation should be carefully monitored so that errors in recognition are corrected through the software and the student understands where the software is having difficulty understanding his or her voice. Other aspects of operating the software, such as using voice commands (e.g., "Save File") may be important or motivating to some students, but voice commands may be mis-recognized, thereby serving as additional sources of frustration. For students with LD, we recommend use of the mouse and keyboard for such actions whenever possible.
  4. Compose through a new medium. Composing via speech is different from doing so through a pencil or the keyboard. This requires time to accommodate. In the meantime, other scaffolding strategies to support writing, such as pre-writing activities and editing, may still be appropriate.

The most common complaint encountered in speech recognition implementation is that the software "just doesn't work" when the student talks to it. Digging below the surface of such complaints often shows that the student and adult supporter have not really understood some of the basic requirements for training the software and have expectations of "natural speech" that exceed the capacities of the technology. Users must remember that the computer does not have any real comprehension of language. When the user is enrolling to a "voice file," the software is matching that user's voice to the models it already has. It is essential that the user learn both how to speak to the software to maximize its understanding (#2 above), and also how to make corrections the proper way to provide the software voice file with useful information about its recognition errors (#3). Proper training and patience are often the solution.

For successful implementation of speech recognition, students, staff, and schools have specific responsibilities.

Schools must provide:

Support staff should:

Individual students (and their parents) must:

From the perspective of costs, speech recognition software is relatively inexpensive. Moreover, a single piece of speech recognition software can be used by more than one student, limited only by amount of disk space for voice files and available time to use a single computer. Speech recognition does require relatively newer, more powerful computers, so there may be an initial hardware expense. The greatest costs in use of speech recognition are those involved in initial training of support staff who will be teaching students how to use the software effectively, and subsequent time for actual student training. However, we are seeing that training costs per student are dropping as each staff member begins working with more than one student.

Other considerations

Another type of speech recognition was mentioned above. Discrete speech recognition requires that the user speak one-word-at-a-time. We have found that some students prefer this slower pace of dictation and composition. The last product to provide this, DragonDictate, is now very hard to find. However, the current versions of continuous speech recognition products accommodate slower-paced dictation to some extent.

Our recommendations

There are important differences for individuals and especially children and students with disabilities in the operation of the various continuous speech products, so knowledge of these can be important. A few pointers are listed below.

All other things being equal, Dragon NaturallySpeaking Preferred is our recommended system for students with LD in schools. The Preferred edition costs between $150 and $200 and is recommended over less expensive versions of NaturallySpeaking because it includes several features that can be of particular benefit: digital playback of the user's voice (what the user actually said) and synthesized speech readback of the text (what the software actually put on the page). Students with LD can use these features to check for errors. All versions of NaturallySpeaking also include other features that can be important for students: dynamic updating of the words in the correction window, easy management of the correction window, and presence of an arrow to signal location (like a bouncing ball) during digital readback. NaturallySpeaking versions 4 and above accommodate adolescent voices.

According to most reviews and the author's personal experience, IBM ViaVoice is equal to the Dragon products in accuracy for most adult and adolescent users. ViaVoice also includes synthesized speech readback of the text to show what the software actually put on the page, but has more limited digital readback of the speaker's voice. On the whole, we believe that the interface is somewhat less effective than that of the Dragon products for some students with LD, as management of the correction window and cursor location seems more intrusive. These characteristics might not be a problem for many users. Certainly, if students need a Macintosh product, ViaVoice is a fine alternative.

MacSpeech iListen is a Macintosh-only alternative. The author has only used an early release version that seemed promising, but was not a complete product. Several members of the Speak to Write e-mail list report that the product performed well with middle-school and older students, and compared it favorably to ViaVoice for the Macintosh. It is not clear whether the most recent version has text-to-speech readback.

Two products from the UK bear mention. One is Screen Speaker (Keystone), a utility that provides much-needed text-to-speech support within the correction window and adaptations to the enrollment process for the Dragon speech recognition products. Another product is Read & Write Gold (textHELP!), a utility that provides general literacy support in many applications and which has a speech recognition module.

Final thoughts and words of encouragement

As with any promising new educational strategy, the use of speech recognition has proceeded in fits and starts. Yet, we have seen firsthand the positive effects of speech recognition software on the lives of individual students with LD. Now, there are increasing numbers of students across the country who are using speech recognition successfully to meet some or all of their writing needs. As more schools move forward with this and other technologies in a more systematic way, we better understand how to assure successful outcomes for students and also how to manage the costs that inevitably accompany such initiatives.

Endnotes

  1. This article is based on work done under two projects funded by the U.S. Dept. of Education/National Institute on Disability and Rehabilitation Research (NIDRR) and awarded to the Education Development Center, Newton, MA: Speaking to Write (# H133G70143) and Word for Word (# H133G000204). Some of the information here is elaborated in Speaking to Write, although that project is now inactive. However, the Speaking to Write project operates an e-mail list that continues to be an active source of information about speech recognition from many people who are supporting its use in schools and elsewhere.
  2. Based on the following versions of the software: Dragon NaturallySpeaking v.6; IBM ViaVoice v.9. Newer versions of both products now exist.

Bob Follansbee, Ed.D., works at the Education Development Center (EDC) in Newton, MA, where he has directed several federally-funded projects that involve development of educational tools and strategies to help students with disabilities participate and succeed in the regular education curriculum. Among these are two projects focused on the use of speech recognition by students. He oversees the operation of the Speak to Write e-mail list. Before coming to EDC, he was director of the Computer Learning Program, an assistive technology service of the Communication Enhancement Center at Children's Hospital, Boston.

For clarity, there has been some editing of this document.


Copyright 1999-2010 Next Generation Technologies Incorporated
