Accessibility Evaluation of Classroom Captions
Raja S. Kushalnagar, Walter S. Lasecki, Jeffrey P. Bigham · 2014 · ACM Transactions on Accessible Computing · doi:10.1145/2543578
Summary
This paper presents a comprehensive evaluation of real-time captioning approaches for classroom lectures, comparing Communication Access Realtime Translation (CART), Automatic Speech Recognition (ASR), and a novel collaborative captioning system called Legion:Scribe. The authors first analyzed 240 YouTube educational lectures to characterize the captioning challenge: speakers average 169.4 words per minute (WPM) but can burst to over 400 WPM, vocabularies range from 2,000 to 3,000 unique words per hour (larger than TV captions), and 12% of content includes visuospatial references that are difficult to caption. The collaborative approach uses multiple non-expert typists working simultaneously, with their partial inputs merged using a sequence alignment algorithm similar to those used in bioinformatics. The study employed both quantitative analysis and a controlled eye-tracking experiment with 48 participants (24 deaf or hard of hearing (DHH), 24 hearing) to measure caption readability, comprehension, and user preferences across the three captioning methods.
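The merging step can be illustrated with a minimal sketch. It is not the Legion:Scribe implementation, which aligns the inputs of many typists at once. This simplified version assumes only two typists, whitespace-separated words, and exact word matches, and it uses a longest-common-subsequence alignment as a stand-in for the sequence alignment the paper describes. The function name `merge_partial_captions` is hypothetical.

```python
def merge_partial_captions(a, b):
    """Merge two partial word lists by aligning the words they share.

    Simplified two-typist illustration (an assumption, not the paper's
    algorithm): shared words are anchors, and words seen by only one
    typist are slotted in between the anchors in their original order.
    """
    n, m = len(a), len(b)
    # lcs[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    merged = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            # Both typists captured this word: emit it once.
            merged.append(a[i])
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            # Skipping a[i] keeps the best alignment: only typist A has it.
            merged.append(a[i])
            i += 1
        else:
            # Otherwise the word was captured only by typist B.
            merged.append(b[j])
            j += 1
    # Whatever remains was typed by only one of the two typists.
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged


typist_a = "the quick fox jumps over".split()
typist_b = "quick brown fox over the lazy dog".split()
print(" ".join(merge_partial_captions(typist_a, typist_b)))
```

Neither typist captured the full sentence, but the shared words ("quick", "fox", "over") anchor the alignment so that the gaps in one input are filled from the other. A real system would also need to handle typos and timing, which this sketch ignores.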
Key findings
The collaborative captioning approach significantly outperformed ASR and performed comparably to professional CART on most measures. In user ratings, collaborative captions (M=3.15) were rated easier to follow than CART (M=3.09) and significantly better than ASR (M=2.19). Latency measurements showed CART at 4.2 seconds, collaborative at 3.87 seconds, and ASR at 7.9 seconds. Eye-tracking data revealed that participants spent significantly more time fixating on ASR captions, indicating greater reading difficulty. Qualitative feedback emphasized the importance of caption "flow": participants preferred the smoother, more consistent text delivery of collaborative captions over the verbatim but sometimes choppy CART output. Hearing and DHH participants showed similar preferences. The study also found that deaf readers, accustomed to TV captions, were more tolerant of disfluencies and omissions than hearing readers, suggesting that caption design should consider the target audience's reading experience.
Relevance
This research has significant implications for educational accessibility and demonstrates that professional stenography is not the only path to high-quality real-time captions. The collaborative crowdsourcing approach could dramatically reduce captioning costs while maintaining quality, potentially enabling on-demand captioning services through smartphones and online platforms. The finding that caption "flow" matters as much as accuracy challenges the field to develop better evaluation metrics beyond word error rate. For practitioners, the study validates that non-expert typists working collaboratively can produce captions that DHH users find acceptable, opening possibilities for peer-based captioning in classrooms. The eye-tracking methodology also provides a model for evaluating caption accessibility that goes beyond subjective ratings.
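For reference, word error rate, the baseline metric mentioned above, is conventionally computed as the word-level edit distance between a reference transcript and a caption, divided by the number of reference words. The sketch below follows that standard definition. The function name and the sample sentences are illustrative assumptions, not material from the paper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix processed so far
    # and the first j hypothesis words (row-by-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from caption
            insertion = curr[j - 1] + 1  # extra word in caption
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


reference = "the cell membrane controls what enters"
caption = "the cell membrane control what"
print(round(word_error_rate(reference, caption), 3))  # one substitution + one deletion over 6 words
```

Because the metric counts each word error equally and ignores timing, two captions with the same score can read very differently, which is consistent with the study's point that flow needs to be evaluated separately.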
Tags: real-time captioning · deaf and hard of hearing · classroom accessibility · crowdsourcing · eye tracking · CART · ASR · caption readability