Captions Versus Transcripts for Online Video Content

Raja S. Kushalnagar, Walter S. Lasecki, Jeffrey P. Bigham · 2013 · Proceedings of the 10th International Cross-Disciplinary Conference on Web Accessibility (W4A) · doi:10.1145/2461121.2461142

Summary

This paper investigates a fundamental challenge for deaf and hard-of-hearing (DHH) users of online video. Hearing viewers can watch and listen simultaneously through separate sensory channels, whereas DHH viewers must process two competing visual streams, the video content and the caption text, through a single visual channel. This creates cognitive overload, particularly for educational content such as MOOCs, where the video contains dense visual information (slides, formulas, demonstrations) that must be attended to alongside the captions.

The authors compare two presentation styles: traditional on-screen captions (1-2 lines overlaid on the video, showing 1-2 seconds of audio) and off-screen transcripts (multiple lines displayed separately from the video, providing several seconds of content history). The study recruited 17 DHH college students aged 18-24, all experienced with both captions and transcripts. They watched four YouTube clips split into caption and transcript segments: three educational videos (a chemistry lecture, an optical illusion, and a software demonstration) and one action video (a bike race). Participants rated each on readability, ease of following, and ease of understanding using 5-point Likert scales, and provided open-ended feedback.

Key findings

On-screen captions were significantly preferred for readability (Z = 144.5, p < 0.001), but there was no significant difference between captions and transcripts for following the video (Z = 342.0, p = 0.114) or understanding the content (Z = 290.0, p = 0.243). The result suggests a trade-off: captions were easier to read because they sit closer to the video, while transcripts compensated by providing content history, with words staying on screen up to eight times longer than in captions so that viewers could look back at them. Eye-tracking research cited in the paper found that viewers spend 84% of their time looking at captions versus only 68% on transcripts, suggesting transcript users can devote more visual attention to the video itself. Participant feedback highlighted three key issues: captions were hard to read against varying video backgrounds ("make the color of the font yellow so it easier to read"); captions moved too fast for dense content ("I cannot understand captions when they are too fast"); and transcript text was not as prominent as captions ("I want to see the transcript text be bigger and easier to find"). For dense educational content, participants showed a slight preference for transcripts in following and understanding, though this was not statistically significant.
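The summary reports test statistics and p-values but does not name the statistical test. As an illustration only, the sketch below assumes a Wilcoxon signed-rank test, a common choice for paired ordinal ratings such as Likert scores, and uses the normal approximation without a tie correction. The function name `wilcoxon_signed_rank` and the ratings are hypothetical and are not taken from the paper.

```python
import math


def wilcoxon_signed_rank(a, b):
    """Paired Wilcoxon signed-rank test (normal approximation, no tie correction).

    Returns (w, z, p): the smaller signed-rank sum, its z-score,
    and a two-sided p-value.
    """
    # Zero differences carry no information about direction and are dropped.
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    # Under the null hypothesis, W is approximately normal for larger n.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w, z, p


# Hypothetical 5-point readability ratings from eight viewers (not study data).
caption_ratings = [5, 4, 5, 4, 5, 3, 4, 5]
transcript_ratings = [3, 4, 4, 2, 3, 3, 2, 4]

w, z, p = wilcoxon_signed_rank(caption_ratings, transcript_ratings)
print(f"W = {w}, z = {z:.3f}, p = {p:.4f}")
```

With samples as small as the hypothetical one here, the normal approximation is rough; an exact test or a statistics library would be the more careful choice for real data.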

Relevance

This study has direct implications for the rapidly growing online education industry. Standard on-screen captions are the overwhelmingly dominant accommodation for DHH users, so the finding that they may not be optimal for educational content challenges a fundamental assumption in video accessibility. For MOOCs, lecture recordings, and technical tutorials, where viewers need to attend to visual demonstrations and read text at the same time, providing transcript options alongside or instead of traditional captions could improve learning outcomes. The paper anticipates the need for personalized, adaptive captioning interfaces that adjust to content complexity and to the amount of simultaneous visual and textual information. For accessibility practitioners, the key takeaway is that caption compliance alone does not ensure an effective learning experience: the presentation method matters as much as the presence of captions. Modern video platforms should offer viewers a choice between caption styles, and educational content creators should consider whether their visual content creates cognitive overload when combined with standard captions.

Tags: captioning · deaf accessibility · transcripts · online education · MOOCs · video accessibility · cognitive overload · multimedia accessibility