Speaking to Write / Word for Word: An Overview of Speech Recognition
by Bob Follansbee, Ed. D.
(Reprinted with permission from the International Dyslexia Association quarterly newsletter, Perspectives, Fall, 2003 vol. 29, No. 4. It's worth joining IDA just to get the Perspectives newsletter. Each issue has many articles like this.)
Few technological innovations have prompted as much promise for students with learning disabilities (LD), and as many dashed expectations, as speech recognition. You may have heard that speech recognition is the "magic bullet" for some students, but you may also have heard stories about how speech recognition simply "doesn't work." What do we really know about speech recognition for students with LD and how it works or doesn't?
The current situation with speech recognition
Generally, when people today speak of "speech recognition" they are referring to continuous speech recognition, so named because, based on its claims, one can speak to the computer in a natural voice and at a normal rate of speech, and it will simply write what was said. This is a fine example of marketing hyperbole that glosses over important aspects of using the software. We will discuss below how such results might be achieved for some students with LD.
The primary continuous speech products now available are various editions of Dragon NaturallySpeaking (now owned by Nuance) and IBM ViaVoice. Dragon products operate in Windows only, while ViaVoice offers versions for both Windows and the Macintosh. There is also a newer Macintosh-only product, iListen, by MacSpeech. (An older technology, discrete speech recognition, represented by DragonDictate, is less readily available now but may still be preferable for some individuals; see below.)
Product information can be obtained at the following sites:
Does speech recognition work? The promise and the reality
This question does not have a simple, one-size-fits-all answer. However, with the proper introduction to and training on the technology, and with reasonable expectations, speech recognition software can operate with remarkable ease and accuracy, and can be a tremendous boon to some students who might otherwise never have a successful writing experience. It can be an avenue for some students to begin to participate more effectively and more independently in grade-appropriate work. Students who use speech recognition do not automatically become better writers, but they are almost always able to produce more work, more easily, which then allows them to engage in writing instruction and composition at a more advanced level. Instead of simple editing decisions like recopying a few sentences to make them more legible, or correcting spelling and grammar mistakes, students can focus on the more complex cognitive tasks that come with producing more text - tasks like revision and reorganization. Students can begin to write at a level that matches their grade level and, in fact, the level at which they could actually dictate text.
On the other hand, failure to understand the training and usage requirements of speech recognition products often leads to unsuccessful experiences with this technology. Additional unsuccessful experiences are not what you want for your students with LD.
The first speech recognition usage issue relates to the potential user. Speech recognition is not the appropriate tool for every individual. Generally, successful users of speech recognition exhibit at least some of the following characteristics:
- Ability to compose orally. This does not have to be well-developed, but at least show potential.
- Ability to reflect on and modify their own speech patterns and articulation of words. This requires a level of linguistic self-monitoring and control over the muscles of articulation.
- Patience and perseverance needed to complete the initial stages of using the software.
- Understanding of the purposes of literacy. Even students with LD who shun participation in literacy-based activities may understand why literacy is important.
These points assume a basic level of cognitive ability and development, and most successful users are at least 10 years old. However, these points are not rules, but guidelines for consideration. With the appropriate training and support, students of varying ages and abilities can use speech recognition in some manner.
Evaluating the success of speech recognition is not necessarily a straightforward or objective matter, but should be based on how well the software addresses the individual's source(s) of writing difficulty. For example, a logical "objective" measure of success is the error rate in the software's recognition of a student's speech. However, some struggling and frustrated students may feel very successful, perhaps for the first time, simply being able to produce a much greater amount of text, regardless of the software's recognition accuracy. Corrections to errors in software accuracy can be treated as a different form of editing, which is part of any writer's task.
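For readers who do want to track the "objective" measure mentioned above, the following is a minimal sketch of one common way to compute a recognition error rate: count the word-level substitutions, insertions, and deletions needed to turn what the software wrote into what the student actually said, then divide by the number of words spoken. The function name and the sample sentences are illustrative assumptions, not part of any speech recognition product.

```python
def word_error_rate(spoken: str, recognized: str) -> float:
    """Return word-level edit distance divided by the number of spoken words.

    Hypothetical helper for illustration; not taken from any product.
    """
    ref = spoken.lower().split()
    hyp = recognized.lower().split()
    if not ref:
        raise ValueError("spoken text must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 spoken words
    # and the first j recognized words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # spoken word missing from the output
                curr[j - 1] + 1,     # extra word in the output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one wrong word out of four spoken words.
print(word_error_rate("the dog ran home", "the dock ran home"))
```

A number like this is only one view of success; as noted above, a student who is finally producing far more text may be succeeding even when the error rate is not especially low.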
We know from considerable experience through our projects and from the input of many participants in our listserv that speech recognition is being successfully implemented with students around the country. Many successful users are students with LD who are using the software at home to complete homework and longer written assignments - the school may not even be aware that a student is using speech recognition. Some successful users are individuals who begin using speech recognition in school through advocacy and under the requirements of an IEP; situations like this can often quickly result in other students using speech recognition also, with students helping to train each other.
What is needed?
Successful implementation of speech recognition does not happen by accident. Schools must be committed to providing basic resources, staff must be committed to working with the student and the technology, and students must be committed to the process of improving their performance both in the mastery of the software and in their writing skills.
First, it is desirable that speech recognition be considered as only part of a continuum of assistive technology strategies. Speech recognition is relatively intensive in terms of the training required for successful use, and it is also somewhat limited in terms of the environments where it can be used successfully. Other strategies should be considered for any given student. On the other hand, it is a very powerful solution in some cases and may be the most effective (or only) way to "reclaim" certain students. Finally, there is no reason why some students might not use multiple text creation strategies, including speech recognition, depending on the demands of the assignment or the situation.
Introduction and training
Learning to use speech recognition effectively requires four things.
- First, the new user must train the software to recognize his or her voice through an initial enrollment process that can be demanding for poor readers: the enrollee must read text that is presented on the screen. However, the enrollment process allows pausing so that the text can be previewed with the student screen-by-screen. While this process used to take 30 minutes of continual reading in earlier software versions, it has now been cut down to about 10 minutes. Then the user must learn to:
- Speak so that the software can understand what is said. This is not the same as speaking in conversation. While the software does adapt to the user's particular voice, for most users each word must be enunciated relatively clearly (e.g., "I have to study..." rather than "I hafta study..."). Users are encouraged to speak in multiple word utterances, but this can be varied from whole sentences to phrases to even single words, depending on what works best for the student's voice and style. This process will require some monitoring by a knowledgeable support person and time spent practicing, much as it takes practice to learn to type. We recommend using simple writing tasks with low cognitive demands during this phase - don't begin with a major assignment.
- Make corrections and otherwise operate the software. Learning to make corrections through the software is especially important so that the software learns the user's voice better. This is the step that is often left too much to chance. The first few hours of dictation should be carefully monitored so that errors in recognition are corrected through the software and the student understands where the software is having difficulty understanding his or her voice. Other aspects of operating the software, such as using voice commands (e.g., "Save File") may be important or motivating to some students, but voice commands may be mis-recognized, thereby serving as additional sources of frustration. For students with LD, we recommend use of the mouse and keyboard for such actions whenever possible.
- Compose through a new medium. Composing via speech is different from doing so through a pencil or the keyboard. This requires time to accommodate. In the meantime, other scaffolding strategies to support writing, such as pre-writing activities and editing, may still be appropriate.
The most common complaint encountered in speech recognition implementation is that the software "just doesn't work" when the student talks to it. Digging below the surface of such complaints often demonstrates that the student and adult supporter have not really understood some of the basic requirements for training the software and have expectations of "natural speech" that exceed the capacities of the technology. Users must remember that the computer does not have any real comprehension of language. When the user is enrolling to a "voice file," the software is matching that user's voice to the models it already has. It is essential that the user learn both how to speak to the software to maximize its understanding (#2 above), and also how to make corrections the proper way to provide the software voice file with useful information about its recognition errors (#3). Proper training and patience are often the solution.
For successful implementation of speech recognition, students, staff, and schools have specific responsibilities.
Schools must provide:
- A support staff member to provide training and support in use of the technology. This person might be a special education teacher, speech pathologist, technology or inclusion specialist, or even the English teacher, etc.
- Opportunities for collaboration between the speech recognition support staff member and the teacher(s) who implement or support the student's writing requirements. In some cases this might be the same person, but will often involve two or more different teachers.
- Training in implementation for an appropriate staff member. This applies not only to actual workshop time, but also to supported practice time for the staff member.
- Consulting support for staff and students as needed during implementation with students.
- Adequate hardware and technical support for hardware problems, software installation, etc.
- Space for use of speech recognition. This technology does not require absolute silence, and can be used with considerable background noise if set up properly. However, some environments are very difficult to accommodate. A typically problematic space is the kind often encountered in older school buildings: high ceilings with hard surfaces (tile, plaster, etc.) everywhere and no acoustic absorption. Finding smaller spaces or area adjustments (e.g., a carpeted corner, use of a carrel, etc.) can help with this. Sometimes this space may be found within the classroom when it is not excessively noisy (e.g., one would not try to dictate during an "inside recess" period). Most successful users in schools have at least one alternative location identified for dictating in any given period, and the user must be willing to use that alternative location when it seems appropriate. See also the next two points.
- Space consideration for speech recognition is also important because the act of composing is often a private matter and some students may feel awkward "writing out loud" in front of others. On the other hand, the speech recognition user may react negatively to being removed from the regular classroom, so students' perceptions of these issues should be understood.
- Location of an appropriate space for use of speech recognition also requires sensitivity to the needs of other students in their classes. The use of speech recognition by one or more students might be disruptive to other students, so this matter must be considered.
- Time for staff to work with the student during initial stages of speech recognition use. Students need the most support when they are first using the software, and staff should have some leeway to provide this.
- Academic (substitute) credit for students who learn to use speech recognition. Rather than adding an extra requirement for the already over-burdened student with LD, learning to use speech recognition might count as part of a class in computer literacy or be integrated into requirements of an English writing class.
Support staff should:
- Be willing to learn how to support the students using speech recognition. Adequate support requires learning the strategies that successful speech recognition users must know, which is further helped by learning how to use the software themselves.
- Provide a gradual "ramping up" of work requirements for students using speech recognition. This is very important! Once students gain fluency in using speech recognition they are often faced with a new phenomenon - the requirement to complete the same work as their peers. We have seen situations where students responded to this realization by rejecting the technology or rebelling in other ways. We believe it is critical to increase work demands gradually to allow students who previously were unable to write effectively a chance to accept their "new" writing abilities and acclimate to these new responsibilities.
- Be committed to providing some "make up" instruction. Typically, by the time of middle school, students who have perennially struggled with writing have missed a lot of important instruction in writing basics. Immediate overemphasis on deficient mechanics can be discouraging. Teachers should first value and support the increase in amount produced and work on higher-level organization (thinking) issues while helping students come to a gradual appreciation of the importance of writing mechanics.
- Be willing to try speech recognition with some students who might be unlikely to ever use the technology completely independently, but who might use the software with some level of support to produce text that they otherwise could not.
Individual students (and their parents) must:
- Acknowledge that this technology is not necessarily the correct solution for all students, including themselves.
- Acknowledge that mastery of the software requires effort and some flexibility in ways of working.
- Acknowledge that mastery of the software will entail an increase (hopefully gradual) in workload to reflect the level of work expected for grade level (or depending on other identified disabilities), and express a willingness to participate on that basis.
From the perspective of costs, speech recognition software is relatively inexpensive. Moreover, a single piece of speech recognition software can be used by more than one student, limited only by amount of disk space for voice files and available time to use a single computer. Speech recognition does require relatively newer, more powerful computers, so there may be an initial hardware expense. The greatest costs in use of speech recognition are those involved in initial training of support staff who will be teaching students how to use the software effectively, and subsequent time for actual student training. However, we are seeing that training costs per student are dropping as each staff member begins working with more than one student.
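The point about falling training costs is simple amortization: the one-time cost of training a staff member is spread over every student that person goes on to support. The sketch below illustrates the arithmetic only; the function name and all dollar figures are placeholder assumptions chosen for the example, not data from the projects described here.

```python
def training_cost_per_student(staff_training_cost: float,
                              student_training_cost: float,
                              students_supported: int) -> float:
    """Spread a one-time staff training cost across the students supported.

    All inputs are hypothetical planning figures supplied by the reader.
    """
    if students_supported < 1:
        raise ValueError("at least one student must be supported")
    return staff_training_cost / students_supported + student_training_cost


# Placeholder figures: 600 for staff training, 100 of staff time per student.
for n in (1, 2, 4):
    print(n, training_cost_per_student(600.0, 100.0, n))
```

With these placeholder figures the per-student cost falls from 700 with one student to 250 with four, which is the pattern described above.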
Another type of speech recognition was mentioned above. Discrete speech recognition requires that the user speak one-word-at-a-time. We have found that some students prefer this slower pace of dictation and composition. The last product to provide this, DragonDictate, is now very hard to find. However, the current versions of continuous speech recognition products accommodate slower-paced dictation to some extent.
There are important differences for individuals and especially children and students with disabilities in the operation of the various continuous speech products, so knowledge of these can be important. A few pointers are listed below.
All other things being equal, Dragon NaturallySpeaking Preferred is our recommended system for students with LD in schools. The Preferred edition costs between $150 and $200 and is recommended over less expensive versions of NaturallySpeaking because it includes several features that can be of particular benefit: digital playback of the user's voice (what the user actually said) and synthesized speech readback of the text (what the software actually put on the page). Students with LD can use these features to check for errors. All versions of NaturallySpeaking also include other features that can be important for students: dynamic updating of the words in the correction window, easy management of the correction window, and presence of an arrow to signal location (like a bouncing ball) during digital readback. NaturallySpeaking versions 4 and above accommodate adolescent voices.
According to most reviews and the author's personal experience, IBM ViaVoice is equal to the Dragon products in accuracy for most adult and adolescent users. ViaVoice also includes synthesized speech readback of the text to see what the software actually put on the page, but has more limited digital readback of the speaker's voice. On the whole, we believe that the interface is somewhat less effective than that of the Dragon products for some students with LD, as management of the correction window and cursor location seem more intrusive. These characteristics might not be a problem for many users. Certainly, if students need a Macintosh product, ViaVoice is a fine alternative.
MacSpeech iListen is a Macintosh-only alternative. The author has only used an early release version that seemed promising, but was not a complete product. Several members of the Speak to Write e-mail list report that the product performed well with middle-school and older students, and compared it favorably to ViaVoice for the Macintosh. It is not clear whether the most recent version has text-to-speech readback.
Two products from the UK bear mention. One is Screen Speaker (Keystone), a utility that provides much-needed text-to-speech support within the correction window and adaptations to the enrollment process for the Dragon speech recognition products. Another product is Read & Write Gold (textHELP!), a utility that provides general literacy support in many applications and which has a speech recognition module.
Final thoughts and words of encouragement
As with any promising new educational strategy, the use of speech recognition has proceeded in fits and starts. Yet we have seen firsthand the positive outcomes of use of speech recognition software on the lives of individual students with LD. Now, there are increasing numbers of students across the country who are using speech recognition successfully to meet some or all of their writing needs. As more schools move forward with this and other technologies in a more systematic way, we better understand how to assure successful outcomes for students and also how to manage the costs that inevitably accompany such initiatives.
- This article is based on work done under two U.S. Dept. of Education/National Institute on Disability and Rehabilitation Research (NIDRR) funded projects awarded to the Education Development Center, Newton, MA: Speaking to Write (# H133G70143) and Word for Word (# H133G000204). Some of the information here is elaborated in Speaking to Write, although that project is now inactive. However, the Speaking to Write project operates an e-mail list that continues to be an active source of information about speech recognition from many people who are supporting its use in schools and elsewhere.
- Based on the following versions of the software: Dragon NaturallySpeaking v.6; IBM ViaVoice v.9. Newer versions of both products now exist.
For clarity, there has been some editing of this document.