Multimodal Input for Computer Access and Augmentative Communication

Alice Smith, John Dunaway, Patrick Demasco, Denise Peischl · 1996 · Proceedings of the Second Annual ACM Conference on Assistive Technologies (Assets '96) · doi:10.1145/228347.228361

Summary

This paper from the University of Delaware describes a research project exploring the integration of speech recognition with head-pointing as a multimodal input method for people with physical disabilities who cannot use standard keyboards and mice. The authors frame the problem using the concept of information bandwidth — traditional assistive technologies force users to channel all interaction through a single narrow input mode, whereas multimodal input can widen this bandwidth by exploiting multiple communication channels simultaneously. The project is structured in two phases: Phase I addresses computer access for users with relatively unimpaired speech who need alternatives to keyboard and mouse, while Phase II targets augmentative communication for people with moderate to severe speech impairments. The paper draws on Nigay and Coutaz's design space model for multimodality, which characterizes interaction along dimensions of fusion (combined vs. independent modalities) and temporal use (sequential vs. parallel). The authors focus on the "exclusive" style — independent, sequential use of modalities — where speech recognition handles symbolic input (typing) and head-pointing handles spatial input (pointing and dragging), matching each technology to its natural strength.
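The two dimensions of the design space can be sketched as a small lookup. This is a minimal illustration, not code from the paper: the summary above names only the "exclusive" style, the other three style names are taken from Nigay and Coutaz's model, and the `classify` and `route` helpers and their string labels are hypothetical.

```python
from enum import Enum


class Fusion(Enum):
    COMBINED = "combined"
    INDEPENDENT = "independent"


class TemporalUse(Enum):
    SEQUENTIAL = "sequential"
    PARALLEL = "parallel"


# Four interaction styles, one per cell of the fusion x temporal-use grid.
# Only "exclusive" is named in the summary; the rest follow Nigay and Coutaz.
STYLES = {
    (Fusion.INDEPENDENT, TemporalUse.SEQUENTIAL): "exclusive",
    (Fusion.COMBINED, TemporalUse.SEQUENTIAL): "alternate",
    (Fusion.INDEPENDENT, TemporalUse.PARALLEL): "concurrent",
    (Fusion.COMBINED, TemporalUse.PARALLEL): "synergistic",
}


def classify(fusion: Fusion, temporal_use: TemporalUse) -> str:
    """Return the style name for one cell of the design space."""
    return STYLES[(fusion, temporal_use)]


def route(input_kind: str) -> str:
    """Hypothetical routing for the exclusive style: each kind of input
    goes to exactly one modality, and the modalities are used one at a time."""
    routing = {
        "symbolic": "speech recognition",  # typing text
        "spatial": "head-pointing",        # pointing and dragging
    }
    return routing[input_kind]


print(classify(Fusion.INDEPENDENT, TemporalUse.SEQUENTIAL))
print(route("symbolic"), "/", route("spatial"))
```

In the exclusive style the routing table is fixed and no input events from the two modalities are merged, which is what distinguishes it from the combined (alternate and synergistic) cells.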

Key findings

The pilot experiments compared head-pointing with an on-screen keyboard (HeadMaster/WiViK) against speech recognition (DragonDictate) on text generation tasks. Among the head-pointing conditions, head-pointing with prediction and learning performed best, with typing rates ranging from 7.3 to 15.1 words per minute (WPM) and a mean of 12.0 WPM. DragonDictate with learning averaged 26.6 WPM, a little more than double the head-pointing mean. However, participants unanimously judged speech recognition without learning difficult and extremely frustrating. A critical finding was that neither system alone clearly reached asymptotic performance, suggesting a learning curve that extends well beyond the pilot testing period. Speech recognition without learning improved only slightly over successive trials, and head-pointing with prediction but without learning was the only condition whose performance trended toward leveling off. Subjective feedback revealed that transcribing complete paragraphs was awkward: subjects had difficulty dividing their attention among the transcription text, the target paragraphs, and the DragonDictate/WiViK interface. The pilot results led to significant methodological refinements, including simplified dictation protocols and modified evaluation software.
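The comparison between the two mean rates is simple arithmetic, sketched below. The two constants are the means reported above; the `speedup` helper is a hypothetical name introduced only for illustration, and the calculation says nothing about per-subject variation.

```python
# Mean text-generation rates reported for the pilot, in words per minute.
HEAD_POINTING_MEAN_WPM = 12.0   # head-pointing with prediction and learning
DRAGONDICTATE_MEAN_WPM = 26.6   # DragonDictate with learning


def speedup(faster_wpm: float, slower_wpm: float) -> float:
    """Ratio of two typing rates (hypothetical helper)."""
    return faster_wpm / slower_wpm


ratio = speedup(DRAGONDICTATE_MEAN_WPM, HEAD_POINTING_MEAN_WPM)
print(f"{ratio:.2f}x")  # prints "2.22x"
```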

Relevance

This paper provides valuable early evidence for the principle that combining assistive input technologies can outperform either technology used alone — a concept that remains central to modern assistive technology design. The finding that speech recognition works best for symbolic input while head-pointing excels at spatial tasks has been validated by subsequent research and is reflected in current multimodal assistive systems that combine voice control with eye-tracking or head-tracking. For practitioners, the paper highlights the importance of adequate training time for alternative input technologies — the dramatic difference between speech recognition with and without learning underscores that user training is as critical as technology selection. The research also demonstrates the challenges of conducting rigorous assistive technology evaluation, including the difficulty of recruiting sufficient participants and the need for extended learning periods, issues that continue to affect AT research methodology today.

Tags: multimodal interaction · speech recognition · head pointing · augmentative and alternative communication · assistive technology · computer access · motor disability · alternative input