Expanding a Large Inclusive Study of Human Listening Rates

Danielle Bragg, Katharina Reinecke, Richard E. Ladner · 2021 · ACM Transactions on Accessible Computing · doi:10.1145/3461700

Summary

This study presents the first large-scale, inclusive online investigation of human listening rates for fast synthetic speech. It was conducted over 12 months on the LabintheWild crowdsourcing platform with 1,409 participants. The research asked how fast people can comprehend text-to-speech output, a question with direct implications for conversational agents and assistive technology. To identify each participant's maximum intelligible listening rate efficiently, the study used a binary search algorithm. Participants heard words at varying speeds and identified rhyming pairs, and the algorithm raised or lowered the speed depending on whether they answered correctly. Because the study was designed to work with screen readers, it addressed a significant gap in prior research, which typically excluded visually impaired participants even though they are primary users of synthetic speech. The participant pool included 270 people who are visually impaired, enabling direct comparison between experienced and inexperienced listeners.
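A minimal sketch of a binary search of this kind follows. It assumes a 0-100 rate scale, a fixed number of steps, and a hypothetical `understands(rate)` callback standing in for one rhyme-judgment trial. The study's actual step count, stopping rule, and handling of lucky guesses are not described in this summary, so this is an illustration under those assumptions, not the authors' procedure.

```python
from typing import Callable


def estimate_listening_rate(
    understands: Callable[[float], bool],
    low: float = 0.0,
    high: float = 100.0,
    steps: int = 10,
) -> float:
    """Binary search for the fastest rate at which a listener still answers correctly.

    `understands(rate)` is a hypothetical stand-in for one trial: play word
    pairs at `rate` and return True if the rhyme judgment was correct.
    """
    for _ in range(steps):
        mid = (low + high) / 2
        if understands(mid):
            low = mid   # correct: threshold is at or above mid, so try faster
        else:
            high = mid  # incorrect: threshold is below mid, so slow down
    # Each step halves the interval; with the defaults it ends 100/1024 wide.
    return (low + high) / 2


if __name__ == "__main__":
    # Simulated listener who understands everything up to rate 62.
    print(estimate_listening_rate(lambda rate: rate <= 62))
```

With the simulated listener above, the estimate lands within 0.05 of 62. Because a rhyme judgment can be answered correctly by chance, a real procedure would likely need more than one trial per step or some tolerance for errors; the sketch ignores this for clarity.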

Key findings

The mean Listening Rate across all participants was 55.3 on a 0-100 scale (approximately 298 words per minute), well above the 120-180 WPM typical of human speech. On raw means, visually impaired participants scored only slightly higher than sighted participants (55.7 vs 55.2), but when controlling for other factors, visual impairment increased the predicted listening rate by 7.36 points. Age decreased listening rate by 0.21 points per year, while being a native English speaker increased it by 9.17 points. Among visually impaired participants, each year of screen reader experience increased listening rate by 3.08 points, suggesting that early exposure to screen readers, particularly during childhood, leads to faster listening. The study identified "super-listeners" (the top 0.6%, 9 participants), who achieved perfect scores at maximum VoiceOver speed, and "great-listeners" (the top 4%, 56 participants), with rates of 86 or higher. Notably, visually impaired participants under 45 outperformed all other demographic groups, reflecting the combined effects of youth and extensive screen reader experience.
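The reported coefficients can be combined to compare two hypothetical participants. This sketch assumes the visual impairment, age, and native-English coefficients come from a single additive linear model. Because the intercept is not reported in this summary, it computes only the predicted difference between two profiles, with all other covariates held equal. The screen reader experience effect (3.08 points per year) applies to the visually impaired subgroup and is left out. The `Profile` class and function names are hypothetical.

```python
from dataclasses import dataclass

# Coefficients as given in the summary, in Listening Rate points (0-100 scale).
# ASSUMPTION: all three come from one additive linear model. The intercept is
# not reported, so only differences between profiles are meaningful.
COEF_VISUALLY_IMPAIRED = 7.36
COEF_AGE_PER_YEAR = -0.21
COEF_NATIVE_ENGLISH = 9.17


@dataclass
class Profile:
    """Hypothetical participant description."""
    age: float
    visually_impaired: bool
    native_english: bool


def partial_score(p: Profile) -> float:
    """Sum of the reported terms for one profile, excluding the unknown intercept."""
    return (
        COEF_VISUALLY_IMPAIRED * float(p.visually_impaired)
        + COEF_AGE_PER_YEAR * p.age
        + COEF_NATIVE_ENGLISH * float(p.native_english)
    )


def predicted_difference(a: Profile, b: Profile) -> float:
    """Predicted Listening Rate of `a` minus that of `b`; the intercept cancels."""
    return partial_score(a) - partial_score(b)


if __name__ == "__main__":
    young = Profile(age=25, visually_impaired=True, native_english=True)
    older = Profile(age=55, visually_impaired=False, native_english=False)
    print(f"{predicted_difference(young, older):.2f}")  # → 22.83
```

Under the additive-model assumption, the 22.83-point gap decomposes as 7.36 (visual impairment) + 9.17 (native English) + 0.21 × 30 (thirty fewer years of age).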

Relevance

This research has direct implications for the design of voice interfaces and conversational agents. Current systems often use fixed speech rates optimized for average users, but this study shows that experienced screen reader users can comprehend speech at rates far exceeding these defaults. Developers should therefore consider adaptive speech rates that adjust to each user's abilities. The findings also highlight vocational opportunities: people who are visually impaired may excel at time-sensitive auditory work such as real-time transcription, translation, or audio analysis. For accessibility practitioners, the study reinforces the value of inclusive research design. Making studies accessible to screen reader users improves scientific validity, and it shows that people with disabilities bring unique abilities, not only accessibility requirements. Because the study tested only one synthetic voice (VoiceOver's Alex), future work should explore how voice characteristics affect comprehension.

Tags: screen readers · synthetic speech · text-to-speech · listening rates · auditory processing · visual impairment · crowdsourced research · VoiceOver · LabintheWild