
Improving the Usability of Speech-Based Interfaces for Blind Users

Ian J. Pitt, Alistair D. N. Edwards · 1996 · Proceedings of the Second Annual ACM Conference on Assistive Technologies (Assets '96) · doi:10.1145/228347.228367

Summary

This paper from the University of York examines the usability problems inherent in speech-based interfaces for blind computer users and presents a study comparing how blind and sighted subjects process information delivered through synthetic speech. Drawing on psycholinguistic research, the authors identify six problems with existing screen reader adaptations: the overwhelming quantity of information converted to speech; poor ordering of information, with important content buried amid less relevant text; inappropriate placement of pauses, since speech synthesizers pause at line breaks and punctuation rather than at grammatically meaningful boundaries; inadequate prosody (the rise and fall in pitch that helps listeners parse sentences); poor pronunciation quality, which increases cognitive load; and underuse of non-speech sounds, which could convey simple information more efficiently. The evaluation presented a Hangman-style spelling game through a Hal screen reader with an Apollo speech synthesizer to 10 blind and 7 sighted subjects at the RNIB Vocational College in Loughborough, UK. Subjects worked either in pairs (one blind, one sighted) or individually, and their discussions and interactions were recorded on video.
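The pause-placement problem can be illustrated with a minimal Python sketch that emits SSML (the W3C Speech Synthesis Markup Language). The first function mimics the behavior criticized in the paper by inserting a `<break>` wherever the text wraps; the second inserts breaks only between phrases supplied by the caller. Phrase segmentation is assumed to come from elsewhere (for example a parser), and the function names, example sentence and 300 ms pause length are hypothetical rather than taken from the study. How a given synthesizer treats punctuation inside each phrase remains engine-dependent.

```python
from xml.sax.saxutils import escape


def ssml_pause_at_line_breaks(text: str, pause_ms: int = 300) -> str:
    """Naive behavior: pause wherever the text happens to wrap."""
    lines = [escape(line.strip()) for line in text.splitlines() if line.strip()]
    joined = f' <break time="{pause_ms}ms"/> '.join(lines)
    return f"<speak>{joined}</speak>"


def ssml_pause_at_phrases(phrases: list[str], pause_ms: int = 300) -> str:
    """Pause only between grammatical phrases supplied by the caller."""
    # Collapse internal whitespace so a phrase that was wrapped across
    # lines is spoken without an artificial pause in the middle.
    cleaned = [escape(" ".join(p.split())) for p in phrases if p.strip()]
    joined = f' <break time="{pause_ms}ms"/> '.join(cleaned)
    return f"<speak>{joined}</speak>"


if __name__ == "__main__":
    wrapped = "You have guessed the letters E, T and\nA. You have five lives left."
    # Break lands between "and" and "A.", splitting the list of letters.
    print(ssml_pause_at_line_breaks(wrapped))
    # Hypothetical phrase segmentation of the same text.
    phrases = ["You have guessed the letters E, T and A.", "You have five lives left."]
    # Break lands between the two clauses instead.
    print(ssml_pause_at_phrases(phrases))
```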

Key findings

The study revealed striking differences between blind and sighted users. All sighted subjects and over half of the blind subjects wanted less speech output. Of all subjects, 55% preferred a different ordering of information and 40% wanted more pauses. Notably, blind subjects recalled information sequences correctly more often than sighted subjects, though it is unclear whether this reflects better auditory processing or coping strategies developed through experience with speech interfaces. Both groups had considerable difficulty remembering the letters already guessed and the lives remaining; this information was presented, but it was buried amid other speech. Subjects frequently talked over speech output they considered unimportant and consequently missed important information that followed. The prompt "Please enter a letter" was universally considered too long and distracting. Subjects strongly agreed that information should be available on demand rather than forced on them, with hot keys giving access to specific items such as the letter sequence or the remaining lives. When asked directly, no subject wanted non-speech sounds, yet during discussion many spontaneously suggested tones as success/failure indicators or as a replacement for the "Please enter a letter" prompt, recognizing that non-speech sounds could convey simple information more quickly and with less distraction than speech.
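The on-demand principle the subjects asked for can be sketched as follows: after each guess the interface returns only a short cue (a tone name, plus speech only when something unusual happens) instead of re-reading the whole game state, and the word pattern, the letters tried and the lives remaining are spoken only when the user presses a hot key. This is a minimal Python sketch under stated assumptions; the class name, the key bindings (F1, F2, F3), the tone names and the message wording are all hypothetical and do not come from the study.

```python
from dataclasses import dataclass, field


@dataclass
class HangmanAudio:
    """Hypothetical Hangman state with terse feedback and on-demand queries."""

    word: str
    lives: int = 6
    guessed: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.word = self.word.lower()

    def guess(self, letter: str) -> tuple[str, str]:
        """Return (tone, speech) for one guess; speech stays empty unless needed."""
        letter = letter.lower()
        if letter in self.guessed:
            return ("error_tone", f"{letter} already tried")
        self.guessed.append(letter)
        if letter in self.word:
            return ("success_tone", "")
        self.lives -= 1
        return ("failure_tone", "")

    def pattern(self) -> str:
        """Spell the word, saying 'blank' for letters not yet found."""
        return " ".join(c if c in self.guessed else "blank" for c in self.word)

    def letters_tried(self) -> str:
        if not self.guessed:
            return "no letters tried"
        return "tried " + " ".join(sorted(self.guessed))

    def on_hotkey(self, key: str) -> str:
        """Speak one item of state only when the user asks for it."""
        handlers = {
            "F1": self.pattern,
            "F2": self.letters_tried,
            "F3": lambda: f"{self.lives} lives left",
        }
        handler = handlers.get(key)
        return handler() if handler else ""


if __name__ == "__main__":
    game = HangmanAudio("cat")
    print(game.guess("a"))       # ('success_tone', '')
    print(game.guess("z"))       # ('failure_tone', '')
    print(game.on_hotkey("F1"))  # blank a blank
    print(game.on_hotkey("F3"))  # 5 lives left
```

Binding each item to its own key mirrors the subjects' request for direct access to specific information; whether the success and failure cues are rendered as tones or as very short words is a separate design choice that the sketch leaves open.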

Relevance

This paper identifies usability problems with screen reader output that remain highly relevant nearly 30 years later. Its core findings, that speech interfaces present too much information, in the wrong order, with inadequate pausing and prosody, describe issues that modern screen reader users still encounter. The recommendation that information be available on demand rather than forced parallels current practice in screen reader interaction design, where verbose and brief output modes give users control over information density. The finding that blind subjects recalled spoken sequences more accurately than sighted subjects, yet still struggled with excessive output, reinforces the principle that good accessibility design should minimize cognitive load rather than rely on users adapting to poor design. For practitioners designing speech-based interfaces, the paper provides actionable guidelines: keep speech short and terse, place important information at the end of utterances (leveraging the recency effect), place pauses at grammatical boundaries, give users control over what is spoken, and consider non-speech sounds for simple status information. The observation that subjects proposed non-speech sounds during discussion despite rejecting them when asked directly highlights the gap between stated and actual preferences in accessibility research.
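The ordering and terseness guidelines can be combined in a small utterance composer. In this minimal Python sketch, each candidate item carries a priority; terse mode drops anything below a threshold, and the remaining items are ordered so that the most important one is spoken last, where the recency effect should help the listener retain it. The `Item` type, the priority scale, the threshold and the example messages are assumptions made for illustration; the paper does not prescribe a numeric scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    text: str
    priority: int  # higher means more important (hypothetical scale)


def compose_utterance(items: list[Item], terse: bool = True, min_priority: int = 2) -> str:
    """Build one utterance with the most important item at the end."""
    # In terse mode, drop low-priority filler such as long prompts.
    kept = [i for i in items if not terse or i.priority >= min_priority]
    # sorted() is stable, so items of equal priority keep their original order;
    # ascending order puts the highest-priority item last.
    ordered = sorted(kept, key=lambda i: i.priority)
    return " ".join(i.text for i in ordered)


if __name__ == "__main__":
    items = [
        Item("Five lives left.", 3),
        Item("Please enter a letter.", 1),
        Item("Wrong guess.", 2),
    ]
    print(compose_utterance(items))               # Wrong guess. Five lives left.
    print(compose_utterance(items, terse=False))  # Please enter a letter. Wrong guess. Five lives left.
```

Keeping the verbosity switch as a parameter reflects the paper's call for user control over what is spoken: the same items can be rendered tersely or in full depending on the user's preference.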

Tags: blindness and low vision · screen reader · speech synthesis · usability · speech dialogue design · text-to-speech · auditory interface · prosody · cognitive load