Evaluating the Benefit of Highlighting Key Words in Captions for People who are Deaf or Hard of Hearing

Sushant Kafle, Peter Yeung, Matt Huenerfauth · 2019 · Proceedings of the 21st International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS) · doi:10.1145/3308561.3353781

Summary

Through two formative studies and a larger evaluation study, this paper investigates whether visually highlighting important words in video captions benefits Deaf and Hard of Hearing (DHH) users. DHH users face a distinctive visual-attention challenge when watching captioned video. They must simultaneously read captions, watch the instructor, view slides, and follow other visual elements, and all of these compete for the visual channel, whereas hearing viewers can split the load between listening and watching. Prior research showed that highlighting important words in static text improves reading comprehension and recall. This had not been studied for video captions, which appear only briefly (2-4 seconds), in short segments (1-2 lines), and while viewers divide their attention across multiple visual streams.

Two formative studies, each with 6 DHH participants, explored design preferences. Round 1 compared seven markup strategies (font color, bold, background color, uppercase, underline, italics, font size) with 20% of words highlighted. Participants preferred underlining as the least distracting yet most noticeable strategy, with bold a close second. They found italics and uppercase distracting, and italicized text harder to read. Round 2 used underlining only and compared four percentages of highlighted words (5%, 15%, 25%, 35%). Participants preferred 5-15% of words highlighted, finding 5% the most readable and 15% the easiest to follow along with.
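The Round 1 design space can be sketched as a small renderer that applies one of the seven markup strategies to words already flagged as important. This is a minimal illustration under assumptions. The CSS values (the gold font color, the background color, the 120% font size) and the function names `render_caption` and `highlight_fraction` are hypothetical. The study's actual stimuli and rendering pipeline are not described here.

```python
import html

# Hypothetical CSS for each of the seven markup strategies compared in Round 1.
# The specific color, background, and size values are illustrative assumptions,
# not the stimuli used in the study.
STRATEGY_CSS = {
    "font_color": "color: #ffd700;",
    "bold": "font-weight: bold;",
    "background_color": "background-color: #1f4e79;",
    "uppercase": "text-transform: uppercase;",
    "underline": "text-decoration: underline;",
    "italics": "font-style: italic;",
    "font_size": "font-size: 120%;",
}


def render_caption(words, highlighted, strategy):
    """Return an HTML caption line with the words at `highlighted` indices styled.

    words: list of caption tokens
    highlighted: set of token indices judged important
    strategy: one of the STRATEGY_CSS keys (KeyError otherwise)
    """
    css = STRATEGY_CSS[strategy]
    parts = []
    for i, word in enumerate(words):
        text = html.escape(word)  # keep caption text safe inside HTML
        if i in highlighted:
            parts.append(f'<span style="{css}">{text}</span>')
        else:
            parts.append(text)
    return " ".join(parts)


def highlight_fraction(words, highlighted):
    """Share of caption words that carry markup (Round 1 used 0.20)."""
    return len(highlighted) / len(words) if words else 0.0


words = "the mitochondrion is the powerhouse of the cell".split()
important = {1, 4}  # hypothetical choice: "mitochondrion", "powerhouse"
print(render_caption(words, important, "underline"))
print(f"{highlight_fraction(words, important):.0%} of words highlighted")
```

With 2 of 8 words marked, this short example lands at 25%, above the 5-15% range participants preferred. On a real caption stream the fraction would be measured over many more words.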

Key findings

The main evaluation study involved 30 DHH participants (17 Deaf, 13 Hard of Hearing; mean age 25, SD 6.02; WRAT reading score 82.6, about one SD below the national average). It compared captions on online educational lecture videos with and without underlining of important words (15% of words). Important words were identified by consensus among three researchers. Each participant viewed 12 video segments (1.5 minutes each) from four lecture topics, alternating between the highlighted and non-highlighted conditions (counterbalanced).

Across 270 responses, statistically significant differences were found on all subjective measures except temporal demand. Highlighted captions were rated significantly easier to follow (mean 3.9 vs 3.32, p<0.0001), easier to read (mean 4.09 vs 3.53, p<0.0001), better for identifying important words and concepts (mean 4.18 vs 3.05, p<0.0001), and more understandable overall (mean 3.9 vs 3.5, p<0.001). NASA-TLX measures showed significantly lower mental demand (p<0.01) and lower effort (p<0.001) with highlighting. Temporal demand did not differ significantly. Participants felt equally comfortable with the video pace in both conditions, which suggests that highlighting reduces cognitive load without changing perceived time pressure. Prior research found DHH users sensitive to changes in caption text appearance and concerned about distraction. Despite this, participants in this study did not find the highlighting distracting, in contrast to concerns raised in earlier work on visualizing ASR confidence scores in captions.
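The alternating, counterbalanced presentation of the 12 segments can be sketched as follows. The exact counterbalancing scheme is not given here. This sketch assumes a simple two-order design: even-numbered participants start with a highlighted segment, odd-numbered participants start with a non-highlighted one, and segments are shown in a fixed order. The names `condition_sequence`, `CONDITIONS`, and `N_PARTICIPANTS` are hypothetical.

```python
from collections import Counter

CONDITIONS = ("highlighted", "non_highlighted")
N_SEGMENTS = 12  # 12 segments (~1.5 min each) drawn from four lecture topics
N_PARTICIPANTS = 30


def condition_sequence(participant_id, n_segments=N_SEGMENTS):
    """Alternate conditions segment by segment.

    Assumed scheme: even IDs start highlighted, odd IDs start non-highlighted,
    so across participants each serial position (and, with a fixed segment
    order, each segment) is seen equally often in both conditions.
    """
    start = participant_id % 2
    return [CONDITIONS[(start + k) % 2] for k in range(n_segments)]


sequences = [condition_sequence(pid) for pid in range(N_PARTICIPANTS)]
# Condition counts at each serial position across all participants.
per_position = [Counter(seq[k] for seq in sequences) for k in range(N_SEGMENTS)]

print(Counter(condition_sequence(0)))  # six segments per condition
print(per_position[0])                 # 15 participants per condition
```

Under this assumed scheme, each participant sees six segments per condition. Because segments are in a fixed order, every segment is shown highlighted to half of the 30 participants, so segment and topic difficulty are balanced across conditions.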

Relevance

This research addresses a significant gap in caption accessibility for DHH users. While captioning is legally required in many contexts, the quality of the captioning experience varies enormously. DHH users of educational videos face a fundamentally different cognitive challenge than hearing students. They cannot passively absorb spoken information while watching visual content, and must instead actively divide their gaze between captions and all other visual elements. Any enhancement that reduces the cognitive burden of caption reading has direct educational implications, as DHH students in mainstream educational settings have less access to content than their hearing peers. For accessibility practitioners, the paper provides concrete, empirically validated design guidance: underline 5-15% of caption words identified as important. In the formative studies, participants preferred underlining over the other six markup strategies tested, including bold, italics, and color. Highlighting reduced perceived mental demand and effort while improving perceived readability and understandability, and it did so without creating distraction, which suggests this is a low-risk enhancement. The research is timely given the growth of online educational video. Platforms such as Coursera and Khan Academy, as well as services hosting university lecture recordings, could implement automatic caption highlighting using word-importance prediction systems. The multi-phase methodology (formative preference studies to select design parameters, then a larger summative evaluation) provides a responsible template for caption enhancement research.
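An automatic pipeline of the kind suggested above could score each caption word, underline the top 15% within each cue, and emit WebVTT, whose cue-text syntax includes a `<u>` underline span. This is a sketch under stated assumptions. The importance scores are hypothetical stand-ins for a word-importance predictor's output; in the study, important words were chosen by researcher consensus. The round-half-up rule for the word count is an arbitrary choice, and how `<u>` is rendered depends on the player. The helper names are also hypothetical.

```python
import html
import math


def select_important(scores, fraction=0.15):
    """Indices of the top `fraction` of words by importance score.

    `scores` is assumed to come from some word-importance predictor (hypothetical).
    The count is rounded half up, with at least one word per non-empty cue.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not scores:
        return set()
    k = max(1, math.floor(fraction * len(scores) + 0.5))
    # Stable sort: among equal scores, earlier words win.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def vtt_timestamp(seconds):
    """Format seconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def vtt_cue(start, end, words, scores, fraction=0.15):
    """One WebVTT cue with the selected words wrapped in <u>...</u>."""
    if len(words) != len(scores):
        raise ValueError("need exactly one score per word")
    marked = select_important(scores, fraction)
    tokens = []
    for i, word in enumerate(words):
        text = html.escape(word, quote=False)  # escape &, <, > for cue text
        tokens.append(f"<u>{text}</u>" if i in marked else text)
    return f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}\n" + " ".join(tokens)


def to_webvtt(cues):
    """cues: iterable of (start_s, end_s, words, scores) tuples."""
    return "WEBVTT\n\n" + "\n\n".join(vtt_cue(*c) for c in cues) + "\n"


words = "gradient descent updates each weight in the direction that reduces the loss".split()
scores = [0.92, 0.88, 0.41, 0.10, 0.63, 0.05, 0.02, 0.37, 0.04, 0.45, 0.02, 0.71]  # hypothetical
print(to_webvtt([(1.0, 4.5, words, scores)]))
```

Selecting per cue keeps every caption line near the target density, but it ignores context from neighboring cues and forces at least one underline even on very short cues. Ranking over a whole transcript and then splitting it into cues would be the alternative.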

Tags: Deaf and hard of hearing · captioning · text highlighting · educational accessibility · video accessibility · reading comprehension · cognitive load · user study · visual attention · natural language processing