Understanding Social and Environmental Factors to Enable Collective Access Approaches to the Design of Captioning Technology

Emma McDonnell · 2022 · Proceedings of the 24th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '22) · doi:10.1145/3517428.3550417

Summary

This doctoral consortium paper presents a dissertation research program that reimagines how captioning technology should be designed by applying the disability justice principle of collective access: the idea that accessibility is a shared responsibility of all group members, not solely the concern of disabled individuals. Emma McDonnell, a hearing researcher at the University of Washington, argues that HCI accessibility research too often designs technology to help Deaf and disabled people navigate existing (inaccessible) social contexts, rather than designing technology that changes those contexts by engaging hearing and nondisabled people as active participants in creating access. The dissertation comprises four studies spanning different captioning contexts. The first (completed, published at CSCW 2021) interviewed 15 Deaf and hard of hearing (DHH) captioning users about their experiences in small-group conversations, finding that social dynamics, such as whether hearing interlocutors adapt their behavior, profoundly shape whether captioning actually provides access. The second (completed) ran co-design sessions with mixed groups of DHH and hearing participants (13 total across 3 groups) to collaboratively design online captioning features, following a three-session protocol of gameplay with auto-captions, group interviews, sketching, and video prototype evaluation. The third (proposed) will systematically analyze TikTok captioning practices and their impact on DHH users, examining how platform-specific factors like algorithmic censorship and user-generated caption styles affect accessibility. The fourth (proposed) will explore the experiences of professional Communication Access Realtime Translation (CART) captioners and compare human and automatic captioning approaches.

Key findings

From the completed first study, a critical finding was that captioning is often poorly suited to interactive conversations (as opposed to one-way lectures) due to lag, overlapping speech, and inability to convey signed language. Social dynamics were found to be as important as technical quality: hearing interlocutors' willingness to adapt their behavior (speaking clearly, taking turns, checking comprehension) directly determined whether captioning provided meaningful access. Online conversations introduced new barriers (lack of spatial information about speakers) but also new opportunities (stronger turn-taking norms, omnipresent text channels). From the co-design study, mixed groups had varying levels of planning around accessibility: some trusted that DHH members would raise concerns, while others established explicit access approaches. Participants consistently wanted speaker identification, overlap alerts, and feedback to hearing participants about their speech rate and volume. A key insight was that while technology cannot force behavioral change, it could provide information and structure to guide people toward more captioning-friendly norms. Participants valued minimizing visual and cognitive overload, user control and customizability, and using established interaction paradigms. The proposed TikTok study will address a unique collective access problem: captioning on TikTok depends on majority-hearing creators, and the platform's censorship practices actively disincentivize accurate captioning through algorithmic penalties.

Relevance

This research fundamentally reframes how accessibility practitioners should think about captioning — and accessible technology more broadly. Rather than treating captioning as a tool that helps DHH individuals bridge a gap, McDonnell positions it as a shared communication infrastructure that all participants have a stake in. This has practical implications: captioning tools designed only for DHH users miss the opportunity to nudge hearing participants toward more accessible communication behaviors (speaking clearly, not overlapping, checking that captions are accurate). The collective access framework is particularly relevant for organizations implementing captioning in meetings and events. Instead of simply providing captions and considering the job done, the research suggests designing systems that give real-time feedback to all participants — such as alerts when speech rate is too fast or when speakers overlap — making access a visible, shared responsibility. The TikTok component highlights how platform design choices (like algorithmic censorship that penalizes certain words) can create cascading accessibility barriers, relevant for anyone working on social media accessibility. The emphasis on co-design with mixed DHH and hearing groups offers a methodological model for developing accessibility features that serve entire communities rather than individual users.

Tags: captioning · collective access · disability justice · deaf and hard of hearing · co-design · social accessibility · automatic speech recognition · social media accessibility