
Exploring the Use of Speech Input by Blind People on Mobile Devices

Shiri Azenkot, Nicole B. Lee · 2013 · Proceedings of the 15th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS) · doi:10.1145/2513383.2513440

Summary

This paper investigates how blind people use speech input on mobile devices through two studies: a survey of 169 participants (64 blind/low-vision, 105 sighted) and a laboratory study with 8 blind participants composing paragraphs on an iPod Touch using speech dictation versus the on-screen keyboard with VoiceOver. The research was motivated by the observation that while significant effort has gone into developing gesture-based nonvisual text entry methods (Braille-based systems achieving 7-23 WPM), speech is a natural, fast, and already well-integrated input modality on iOS and Android that had not been studied as an eyes-free input method. The survey explored usage frequency, message types, and satisfaction, while the lab study had participants compose formal paragraphs (4-8 sentences, professional tone) to observe the full dictation-review-edit workflow. The study used composition tasks rather than transcription to reflect real-world usage, since speech recognizers are trained on conversational speech rather than read-aloud phrases.

Key findings

The survey found that 90.6% of blind/low-vision (BLV) participants had used dictation recently, versus 55.2% of sighted participants. BLV participants used speech more frequently (last use within the past day, versus the past week for sighted participants), composed longer messages, and were significantly more satisfied with speech input (p < 0.001). In the lab study, speech input was nearly five times as fast as the keyboard (19.5 WPM vs. 4.3 WPM, p < 0.001), but when dictating, participants spent an average of 80.3% of their time reviewing and editing the recognized text, compared to only 9% when using the keyboard. The average ASR word error rate was 10.2% (ranging from 0% to 35.6%). Three editing techniques were observed: "hone in, delete, and reenter" (the most common: navigate to the error, backspace, retype), "hone in, select, and reenter" (more efficient but rarely used), and "delete and start over" (used by less experienced participants). Six of eight participants preferred speech despite the editing challenges. Key frustrations were that VoiceOver did not flag low-confidence recognitions, did not communicate punctuation clearly, and did not identify misspelled words, which were marked only visually with underlines.
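For readers unfamiliar with the two metrics above, the following is a minimal sketch of how they are conventionally computed. It assumes the standard definition of word error rate (word-level edit distance divided by the number of reference words); the review does not state how the paper scored its transcripts, so the function name `word_error_rate` and the example sentences are illustrative assumptions, not the authors' procedure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: the conventional WER definition, not necessarily the
    exact scoring procedure used in the paper.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra recognized word
                prev[j - 1] + cost,  # substitution, or match if cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical dictation result with one misrecognized word out of six.
intended = "please send the report by friday"
recognized = "please send a report by friday"
wer = word_error_rate(intended, recognized)

# Speed ratio from the WPM figures reported in the review.
speed_ratio = 19.5 / 4.3

print(f"WER: {wer:.1%}")
print(f"Speech vs. keyboard speed ratio: {speed_ratio:.1f}x")
```

The ratio of the reported rates works out to roughly 4.5, which is what "nearly five times" refers to.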

Relevance

This research reveals a critical bottleneck in mobile accessibility: while speech input dramatically speeds up text entry for blind users, reviewing and editing the recognized text is so inefficient that it consumes roughly 80% of composition time. This finding has direct implications for mobile platform developers: with recognition errors already fairly low on average, improving eyes-free error detection and correction is likely to yield greater productivity gains than further improving recognition accuracy alone. The paper identifies specific VoiceOver shortcomings that remained relevant for years: the inability to communicate low-confidence words, inconsistent punctuation feedback, and no indication of spelling errors. For accessibility practitioners, the key insight is that input speed is only part of the text entry equation; the entire compose-review-edit workflow must be accessible. The study also highlights that blind users have fundamentally different editing needs from sighted users: they cannot quickly scan text for errors and must review it word by word or character by character. The research challenges the paper identifies for nonvisual text entry (text selection, cursor positioning, error detection, grammar correction, out-of-vocabulary words) remain active areas of work.

Tags: visual impairment · blindness · speech input · speech recognition · mobile accessibility · text entry · VoiceOver · dictation · eyes-free interaction