Improving Real-Time Captioning Experiences for Deaf and Hard of Hearing Students

Saba Kawas, George Karalis, Tzu Wen, Richard E. Ladner · 2016 · Proceedings of the 18th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '16) · doi:10.1145/2982142.2982164

Summary

This paper takes a holistic, qualitative approach to understanding deaf and hard of hearing (DHH) university students' experiences with real-time captioning in mainstream classrooms. It examines both human-based captioning (CART, Communication Access Realtime Translation) and machine-based solutions that use automatic speech recognition (ASR). Whereas previous research focused primarily on captioning accuracy and technical performance, this study investigates the full user experience, including setup, reliability, autonomy, and presentation. The researchers used several qualitative methods at the University of Washington: five in-class observations with two DHH students; a weeklong diary study with two participants; semi-structured interviews with five DHH students, two professors, two disability services coordinators, and one CART captioner; and usability evaluations of Skype Translator as an ASR tool with three DHH students. The data were analyzed through affinity diagramming and thematic analysis. The study also included a 2.5-hour co-design workshop with eight stakeholders (three DHH university students, a CS professor, two technology providers including one who is deaf, plus a CART captioner and two ASL interpreters for accommodation).

Key findings

Accuracy and reliability remained the top concerns across all captioning solutions, but the study also uncovered significant user experience issues beyond accuracy. CART had several limitations: captioners arrived late (one student reported that her captioner always arrived 5 minutes late after walking across campus); captioners were unavailable when sick or on vacation, which left the student to handle a complex remote setup; equipment placement dictated where students had to sit (near power outlets); and students had no control over caption presentation (font, size, scrolling, brightness). With CART, students also could not independently scroll back through transcripts during class. ASR via Skype Translator offered greater student autonomy and the ability to scroll back, but it suffered from lower accuracy (typos, incorrect grammar, extraneous words, unnatural breaks), a complex two-laptop setup, a single microphone that missed other students' questions, and an inability to capture multiple speakers. Both solutions were affected by professor speaking speed, speaker accents, and background noise, and both left students with the fundamental problem of splitting attention between the speaker, the lecture slides, and the caption screen. The co-design workshop validated these findings and added that screen brightness was distracting in dark lecture halls. Participants generated 14 prioritized design requirements and six recommended features, including student control over caption display, subject-specific dictionaries uploadable by students and teachers, automatic microphone detection, error communication with troubleshooting guidance, instructor guidance resources, and post-class transcript availability.

Relevance

This research matters because it reframes real-time captioning from a purely technical accuracy problem to a holistic user experience challenge. For accessibility practitioners, it shows that even the gold-standard CART service has substantial usability problems: captioner scheduling logistics, equipment constraints on seating, and lack of student control over presentation all limit the DHH student's autonomy and classroom experience. The finding that students want independence, and solutions they can operate without coordinating with others, points to a crucial design principle: the best-performing technology does not necessarily deliver the best user experience if it depends heavily on other people. The 14 design requirements and six feature recommendations provide an actionable framework for improving any real-time captioning tool. The methodology, which combines observations, diary studies, interviews, and co-design across multiple stakeholder groups, is also a model for evaluating access technology holistically rather than focusing narrowly on technical metrics.

Tags: deaf and hard of hearing · real-time captioning · CART · automatic speech recognition · education · co-design · inclusive design · captioning · classroom accessibility