Redesigning Educational Videos for Deaf and Hard-of-Hearing Learners

Si Chen, Haocong Cheng, Suzy Su, Lu Ming, Sarah Masud, Qi Wang, Yun Huang · 2026 · Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26) · doi:10.1145/3772318.3791470

Summary

Educational videos have exploded in higher education and online learning, but accessibility guidance for d/Deaf and Hard-of-Hearing (DHH) learners has barely moved beyond captions and transcripts. Chen and colleagues argue this is a theoretical gap as much as a practical one: Mayer's Cognitive Theory of Multimedia Learning assumes a dual-channel (visual + auditory) processing model, yet DHH learners who rely primarily on vision are forced to process captions, on-screen text, the speaker, diagrams, and animations through a single overloaded channel. The paper proposes motion-driven design (using animation, timing, typography, and overlays to re-sequence visual information) as a way to adapt multimedia learning theory (MLT) for single-channel learners.

The authors conducted a three-phase study around a 15-minute Coursera video on Augmented Reality. Phase 1 was a formative study with six DHH college students and three DHH-experienced instructors; together these nine participants annotated the video with 105 challenge-suggestion pairs, and thematic analysis yielded four recurring delivery challenges and four corresponding design ideas. Phase 2 evaluated the design ideas with 16 DHH learners using 17 original/revised video-clip pairs, NASA-TLX cognitive load measures, and Learning Cognitive Demand (LCD) Likert scales drawn from MLT. Phase 3 asked six DHH educators to judge how each idea applied to talking-head, screencast, animation, hand-drawing, programming, interview, recorded classroom, and slide-based video types.

Key findings

The four design ideas are D-Illustrate (replace irrelevant or monotonous visuals, such as static talking-heads, with relevant graphics), D-Align (use overlays, shading, and sequencing to sync visuals to the caption moment and prevent split attention), D-Declutter (use typography to emphasise essential on-screen text and fade irrelevant text rather than removing it), and D-Slowdown (insert micro-breaks and slow fast visuals to allow caption reading). Mixed linear model analysis of the NASA-TLX data (n=16) showed that D-Illustrate produced the strongest, most consistent gains: significant reductions in mental demand, physical demand, and temporal pressure, plus significantly increased learning satisfaction (EMM +1.15, p<.001). D-Align and D-Declutter improved satisfaction only; D-Slowdown produced no significant TLX changes and was often perceived as unnatural, especially for talking-heads and speech-driven scenes. On the LCD scales, D-Illustrate was strongest for Focusing on Essential Information (M=5.50) and Fostering Connection between Text and Image (M=5.46). Educators in Phase 3 mapped the ideas to a 2x2 taxonomy of visual load by textual load: D-Illustrate fits low-visual contexts, D-Align fits visually complex contexts (slides, coding, hand-drawing), D-Declutter fits text-heavy contexts, and D-Slowdown fits high-textual-load contexts but must be applied selectively. Some deaf signers reported no synchronisation benefit, highlighting individual residual-hearing differences.
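The educators' mapping of design ideas to visual and textual load can be sketched as a small lookup. This is only an illustration of the summary above, not code from the paper: the function name `suggest_design_ideas` and the reduction of each load to a binary high/low flag are assumptions made here for the example.

```python
from typing import List


def suggest_design_ideas(high_visual_load: bool, high_textual_load: bool) -> List[str]:
    """Return candidate design ideas for a clip.

    Hypothetical helper: the binary high/low inputs are an assumption of
    this sketch; the paper's taxonomy is described only in prose here.
    """
    ideas: List[str] = []

    # Visual axis: visually complex clips (slides, coding, hand-drawing)
    # suit D-Align; low-visual clips (e.g. static talking-heads) suit
    # D-Illustrate.
    if high_visual_load:
        ideas.append("D-Align")
    else:
        ideas.append("D-Illustrate")

    # Textual axis: text-heavy clips suit D-Declutter; D-Slowdown also
    # fits high textual load, but only when applied selectively.
    if high_textual_load:
        ideas.append("D-Declutter")
        ideas.append("D-Slowdown (apply selectively)")

    return ideas


if __name__ == "__main__":
    # A static talking-head with little on-screen text.
    print(suggest_design_ideas(high_visual_load=False, high_textual_load=False))
    # A dense slide-based lecture.
    print(suggest_design_ideas(high_visual_load=True, high_textual_load=True))
```

In practice the load of a clip would be judged by a person reviewing the footage; the sketch only records which technique the educators associated with each condition.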

Relevance

For practitioners producing educational or training video, this paper reframes DHH accessibility as more than caption quality. The single-channel reality means even a perfectly captioned talking-head can overwhelm a DHH learner because the caption, slide text, and speaker all compete for one visual channel. The 2x2 load taxonomy in Table 4 is directly actionable: match the design technique to the visual and textual load of the content rather than applying one rule everywhere. Be cautious with uniform slowdowns: learners found them unnatural and infantilising. The study's limitations include a single AR video, a predominantly culturally Deaf signing sample, no learning-outcome measures, and a US university context. The findings likely do not generalise to mobile/TikTok-style content, DeafBlind learners (colour typography is inaccessible via braille), or DHH learners with low literacy. The work opens a clear research line on AI-assisted video editing for accessibility, though current generative tools were judged inadequate for the required precision.

Tags: deaf and hard of hearing · video accessibility · captioning · multimedia learning · cognitive accessibility · educational technology · motion design · inclusive design

Standards referenced: WCAG · Universal Design for Learning