Deaf, Hard of Hearing, and Hearing Perspectives on Using Automatic Speech Recognition in Conversation

Abraham Glasser, Kesavan Kushalnagar, Raja Kushalnagar · 2017 · Proceedings of the 19th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS) · doi:10.1145/3132525.3134781

Summary

This experience report describes the real-world accessibility challenges encountered by five participants (two deaf, one hard of hearing, and two hearing), a group that included the authors, when using the seven most popular automatic speech recognition (ASR) applications (DEAFCOM, Dragon Dictation, Siri, Virtual Voice, Ava, Google Assistant, and Amazon Alexa) for commands and group conversation from Fall 2016 through Summer 2017. The study was conducted at Rochester Institute of Technology and examined ASR use across four contexts: classroom communication, job interviews, informal conversation, and speech production practice. The paper provides background: approximately 30 million Americans have bilateral hearing loss, about 1 million are functionally deaf, and speech production quality is correlated with hearing loss. Deaf and hard of hearing (DHH) speakers show wide variation in articulation, pitch, and prosody, and even when their speech is intelligible to hearing peers, ASR systems trained on non-DHH speech perform poorly on it. In commercial ASR systems, deaf speech had a word error rate of approximately 78%, compared to 18% for hearing speech. Even with real-time human captioners, DHH individuals receive only 50-80% of information, compared to 84-95% for hearing peers. The study found that DHH users cannot participate effectively in conversations if ASR lag exceeds 5 seconds or variance exceeds 2-3 seconds.
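
For readers unfamiliar with the metric, word error rate is conventionally computed as WER = (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the ASR output into the reference transcript, and N is the number of words in the reference. The sketch below assumes that standard definition; the report as summarized here does not state how its figures were computed, and the function name and example sentences are hypothetical illustrations, not data from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumes the conventional WER definition; tokenization is a simple
    lowercase whitespace split, which is an illustrative simplification.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: two of five reference words are misrecognized.
print(word_error_rate("turn on the kitchen light",
                      "turn of the chicken light"))
```

Under this definition, a WER of 78% means that roughly four of every five reference words would need correcting, and the value can exceed 100% when the recognizer inserts many extra words.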

Key findings

Under ideal conditions (quiet one-to-one settings, American-accented hearing speakers, good WiFi, conversations under 5 minutes), all apps performed reasonably well. Performance degraded significantly under five conditions: duration beyond 5 minutes (5 of 7 apps showed significant lag and jitter), background noise (all apps inserted random text even when noise was imperceptible to hearing peers), multiple speakers (lag increased and apps could not identify who was speaking), hearing speakers with accents from other countries, and deaf speech or accent (most apps failed to recognize DHH speakers' prosody and articulation even when their speech was understandable to hearing peers). DHH-specific apps (DEAFCOM, Virtual Voice, Ava) had less lag but still had high error rates for deaf speech. Key usability barriers included voice-only output on devices like Alexa, which made responses inaccessible to DHH users; the inability to edit errors in real-time transcripts; the lack of speaker identification in group settings; and a social dynamic in which hearing peers' patience and attitude significantly affected the interaction. DHH users could not monitor their own speech volume or inflections and needed feedback about phone placement and volume. Writing or typing, while accurate and low-variance, was 3-4 times slower than speech and unsustainable for extended communication. Deaf signers expressed a preference for visual interfaces that recognize sign language rather than speech. The shared experience of using ASR together created a "collective learning experience" in which hearing users realized that they too had comprehension difficulties with the technology.

Relevance

This report demonstrates that the shift toward voice-controlled and aural interfaces (Alexa, Siri, Google Assistant) creates a significant and growing accessibility barrier for DHH individuals. As these interfaces replace visual and text-based alternatives, DHH users are increasingly excluded from mainstream technology. The 78% word error rate (WER) for deaf speech versus 18% for hearing speech in commercial ASR is a stark disparity that current systems are not designed to address. For accessibility practitioners and ASR developers, the recommendations are clear: include text input alongside voice; provide visual feedback for all aural output; support external microphones (lapel, Bluetooth) for noisy environments; add speaker identification for group use; and, critically, train ASR models on DHH speech patterns. The study also highlights that lab-reported ASR accuracy improvements (Microsoft at 6% WER, Google at 5%) do not translate to real-world conversational settings, where error rates of 20-25% persist even for hearing speakers. The social insight that using ASR together helped hearing people understand the communication barriers DHH people face suggests potential for ASR tools as empathy-building experiences.

Tags: automatic speech recognition · deaf and hard of hearing · speech recognition · communication accessibility · voice interface · workplace accessibility · aural interface