Deaf and Hard-of-hearing Users Evaluating Designs for Highlighting Key Words in Educational Lecture Videos

Sushant Kafle, Becca Dingman, Matt Huenerfauth · 2021 · ACM Transactions on Accessible Computing · doi:10.1145/3470651

Summary

This empirical study investigates design preferences for keyword highlighting in video captions among Deaf and Hard of Hearing (DHH) users. Style guides such as the APA Publication Manual and the Chicago Manual of Style recommend ways to highlight text in static documents, but these guidelines were not designed for dynamic captions and have not been validated with DHH readers.

DHH viewers face distinct cognitive challenges when reading captions: they must divide visual attention among the caption text, the speaker, and visual aids such as slides, while the caption pace is set by the video rather than by the reader. Research shows English literacy rates are lower among deaf adults in the US, and lower-literacy users find fast-moving captions especially challenging. Captions also lose information that is present in speech, such as vocal emphasis, emotional tone, and accents, which makes keyword highlighting a potential way to restore some of that lost emphasis.

The researchers conducted two experimental studies with 72 DHH participants in total to compare design parameters systematically. The studies compared three text decoration styles (underlining, italicizing, boldfacing), two granularity levels (word-level vs. sentence-level highlighting), and two approaches to repeated keywords (highlighting the first occurrence only vs. all occurrences). Participants viewed educational lecture videos from the Harper dataset and TED Talks. Keywords were selected manually by human annotators after automatic keyword extraction via Microsoft Azure proved insufficiently accurate. Participants answered Likert-scale questions about readability and their ability to identify important concepts.

Key findings

The study produced clear, empirically grounded recommendations that partially contradict prior guidance.

For text decoration, **boldface was significantly preferred** over both italics (p=0.001 for concept identification, p=0.005 for readability) and underlining (p=0.031 for readability). This contradicts the researchers' own prior ASSETS'19 paper, which had recommended underlining based on underpowered formative interviews. Participants, especially those with low vision, reported that italics were difficult to perceive, and some noted that italics could unintentionally convey sarcasm. Underlining was perceived as more distracting than boldface.

For granularity, **word-level highlighting was strongly preferred** over sentence-level highlighting. Participants found word-level highlighting significantly better for identifying important concepts (p=0.0001) and easier to follow (p=0.02). Qualitative feedback indicated that sentence-level highlighting felt "irregular"; one participant noted that their brain "automatically stopped paying attention" during long unhighlighted stretches between highlighted sentences.

Results for repeated keywords were inconclusive: there was no significant difference between highlighting only the first occurrence and highlighting all occurrences. Participant comments were split. Some felt repeated highlighting "can lose its meaning," while others preferred seeing every occurrence marked.

Across video genres, participants showed significantly higher interest in highlighting for educational videos than for entertainment or news content.

Relevance

This research provides the first empirically validated guidelines for keyword highlighting designed specifically for DHH caption users. The key practical recommendation is clear: use **boldface, word-level highlighting** when emphasizing important terms in educational video captions. The boldface result contradicts traditional static-text style guides, which often recommend italics, and shows why accessibility guidelines must be validated with the actual target user population. For captioning practitioners and platform developers, the findings suggest that caption highlighting features should default to boldface rather than underlining or italics. The strong preference for word-level over sentence-level highlighting, by contrast, aligns with static-text style guides and gives that recommendation empirical support in the dynamic caption context.

The study also highlights limitations in current automatic keyword extraction: the researchers found that Microsoft Azure's text analytics produced keywords of "insufficient quality" and had to rely on human annotators. This gap is an opportunity for AI/ML research to develop keyword extraction tuned specifically for caption accessibility. Participants expressed concerns about subjectivity in keyword selection and the risk of highlighting the wrong words, which suggests that any deployed system would need high accuracy to avoid being counterproductive. The finding that educational videos generated the most interest in highlighting indicates where to deploy this technology first.
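
To make the boldface, word-level recommendation concrete, here is a minimal sketch of how a captioning tool might apply it. It is not the authors' system. It assumes captions are delivered as WebVTT, whose cue text permits `<b>` spans, and that the keyword list has already been chosen by human annotators. The function name `highlight_keywords` and its parameters are hypothetical, introduced only for illustration.

```python
import re


def highlight_keywords(cue_text, keywords, first_only=False, seen=None):
    """Wrap whole-word keyword matches in WebVTT <b> tags.

    cue_text   -- text of one caption cue
    keywords   -- annotator-chosen keywords (hypothetical input)
    first_only -- if True, bold only the first occurrence of each keyword
    seen       -- optional set shared across cues to track earlier matches
    """
    if not keywords:
        return cue_text
    if seen is None:
        seen = set()

    # Longest keywords first so multi-word phrases win over their sub-words.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in ordered) + r")\b",
        re.IGNORECASE,
    )

    def wrap(match):
        key = match.group(0).lower()
        if first_only and key in seen:
            return match.group(0)  # already highlighted earlier; leave plain
        seen.add(key)
        return "<b>" + match.group(0) + "</b>"

    return pattern.sub(wrap, cue_text)


if __name__ == "__main__":
    cue = "Mitosis divides the cell. Mitosis is fast."
    print(highlight_keywords(cue, ["mitosis"]))
    print(highlight_keywords(cue, ["mitosis"], first_only=True))
```

Passing one shared `seen` set across all cues of a video with `first_only=True` gives the first-occurrence-only variant. The study found no significant difference between the two approaches, so the all-occurrences default here is an arbitrary choice rather than a recommendation.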

Tags: deaf and hard of hearing · captions · captioning · video accessibility · text highlighting · educational accessibility · user study