Understanding How Deaf and Hard of Hearing Viewers Visually Explore Captioned Live TV News

Akhter Al Amin, Saad Hassan, Sooyeon Lee, Matt Huenerfauth · 2023 · Proceedings of the 20th International Web for All Conference (W4A '23) · doi:10.1145/3587281.3587287

Summary

This study investigates how Deaf and Hard of Hearing (DHH) viewers distribute their visual attention across different information regions on screen while watching captioned live television news. More than 360 million people worldwide are DHH, and many rely on captions to access television content. However, captions can block important onscreen visual information. This is a particular problem during news broadcasts, whose dense layouts combine multiple simultaneous information regions such as headlines, scrolling news tickers, speaker faces, over-the-shoulder graphics, and channel logos. Prior research had proposed general caption placement guidelines, but these relied primarily on DHH viewers' subjective ratings of region importance rather than on direct behavioral measures of where viewers actually look. This study used a Tobii Pro Nano eye tracker to record the gaze behavior of 19 DHH participants (16 D/deaf, 3 hard of hearing) as they watched 28 news video clips from 9 TV channels. The researchers annotated 11 distinct information regions in each video frame and placed captions in controlled locations (the upper or lower third of the screen) chosen so that they did not occlude any other region, which isolated attention patterns from occlusion effects. Participants also gave qualitative feedback in semi-structured interviews conducted at two points during the study, explaining what influenced their viewing behavior and which regions they considered most important.
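To make the gaze measure concrete, the sketch below computes each region's share of total fixation time from eye-tracker fixations and annotated region boxes. It is a minimal illustration under stated assumptions, not the authors' pipeline: the `Fixation` and `Region` types, the region names, and the pixel coordinates are hypothetical, regions are treated as fixed rectangles for the whole clip (the study annotated regions per frame), and regions are assumed not to overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fixation:
    t: float         # onset time in seconds from clip start
    x: float         # gaze x in pixels
    y: float         # gaze y in pixels
    duration: float  # fixation duration in seconds


@dataclass(frozen=True)
class Region:
    # Hypothetical axis-aligned bounding box, assumed fixed for the clip.
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def proportional_fixation_time(fixations: list[Fixation],
                               regions: list[Region]) -> dict[str, float]:
    """Share of total fixation time that landed inside each region.

    Fixations outside every region still count toward the denominator.
    Each fixation is credited to the first region containing it, which
    assumes the regions do not overlap.
    """
    total = sum(f.duration for f in fixations)
    per_region = {r.name: 0.0 for r in regions}
    for f in fixations:
        for r in regions:
            if r.contains(f.x, f.y):
                per_region[r.name] += f.duration
                break
    if total == 0:
        return per_region
    return {name: d / total for name, d in per_region.items()}


regions = [Region("speaker_face", 800, 200, 1100, 500),
           Region("scrolling_news", 0, 980, 1920, 1080)]
fixations = [Fixation(0.0, 900, 300, 0.5),   # on the speaker's face
             Fixation(0.5, 100, 1000, 0.25),  # on the news ticker
             Fixation(0.8, 1500, 100, 0.25)]  # outside both regions
print(proportional_fixation_time(fixations, regions))
```

Ranking regions by these shares is one way to obtain a behavior-based importance ordering that can be set against subjective ratings.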

Key findings

The eye-tracking data revealed that the importance ranking of information regions based on actual gaze behavior differed significantly from rankings based on subjective judgments in prior work. In the gaze data, the speaker's face received the highest proportional fixation time, followed by the discussion topic and the over-the-shoulder video, whereas prior subjective studies had ranked scrolling news as most important. The researchers identified four distinct temporal attention patterns: Group 1 (discussion topic, scrolling news, over-the-shoulder text) showed a peak at the start of the video followed by slowly decreasing sustained attention; Group 2 (speaker's face, listener's face, over-the-shoulder video/animation) maintained sustained attention throughout; Group 3 (speaker's name, program title, speaker's location) drew low attention with occasional brief peaks; and Group 4 (channel logo, time and temperature) received very low attention overall. Qualitative analysis revealed that viewers scanned static textual content early and their attention to it diminished once they had read it, while dynamic visual content such as faces and animations held continuous attention because it conveys emotion, body language, and ongoing context.
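The temporal groups describe how attention to a region changes over the course of a clip. The sketch below shows one plausible way to build such an attention curve: it splits the clip into fixed time bins and reports, per bin, the share of fixation time spent inside a region. The bin size, tuple layout, and the `peaks_early` heuristic are illustrative assumptions; the study's own procedure for deriving the four groups is not reproduced here.

```python
import math


def attention_curve(fixations, region_box, clip_length, bin_size=5.0):
    """Per-bin share of fixation time spent inside region_box.

    fixations: iterable of (onset_s, x, y, duration_s) tuples.
    region_box: (left, top, right, bottom) in pixels, assumed fixed for the clip.
    Each fixation is credited entirely to the bin containing its onset.
    """
    n_bins = max(1, math.ceil(clip_length / bin_size))
    in_region = [0.0] * n_bins
    total = [0.0] * n_bins
    left, top, right, bottom = region_box
    for onset, x, y, dur in fixations:
        b = min(int(onset // bin_size), n_bins - 1)
        total[b] += dur
        if left <= x <= right and top <= y <= bottom:
            in_region[b] += dur
    return [r / t if t > 0 else 0.0 for r, t in zip(in_region, total)]


def peaks_early(curve):
    """Crude check for a Group-1-like shape (hypothetical heuristic):
    the first bin holds the maximum and the last bin is lower than the first."""
    return len(curve) > 1 and curve[0] == max(curve) and curve[-1] < curve[0]


# Hypothetical headline box across the top of a 1920x1080 frame.
headline = (0, 0, 1920, 150)
fixations = [(0.5, 100, 50, 1.0), (2.0, 900, 500, 1.0),
             (6.0, 100, 50, 0.5), (7.0, 900, 500, 1.5),
             (11.0, 900, 500, 2.0)]
curve = attention_curve(fixations, headline, clip_length=15.0)
print(curve, peaks_early(curve))
```

A Group 2 region would instead produce a curve that stays roughly flat at a high level, and a Group 4 region one that stays near zero.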

Relevance

This research has direct practical implications for captioning professionals, broadcasters, and developers of automated caption placement systems. The four-group temporal framework provides concrete guidance: during the first few seconds of a news segment, captioners should especially avoid blocking Group 1 regions; Group 2 regions (faces, animations) should never be blocked, because they receive sustained attention throughout; Group 3 regions can tolerate brief occlusion between caption blocks; and Group 4 regions are the lowest priority. The study demonstrates that subjective importance ratings alone are insufficient for understanding DHH viewers' actual information needs, since behavioral eye-tracking data reveals different priorities. This finding is relevant for policymakers developing captioning standards, as current guidelines from bodies such as the FCC and DCMP do not address the temporal dimension of region importance during information-dense video. The work also motivates the development of more sophisticated caption quality metrics that penalize occlusions differently depending on when they occur during a video.
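As an illustration of a time-aware quality metric, the sketch below weights each caption occlusion by the temporal group of the region it covers and by how far into the segment it occurs. This is a hypothetical metric, not one proposed in the paper: the weights, the exponential decay for Group 1, and the input layout are all assumed values chosen only to mirror the qualitative patterns described above.

```python
import math

# Hypothetical weights for illustration only; not values from the study.
SUSTAINED_WEIGHT = {2: 1.0, 3: 0.2, 4: 0.05}
GROUP1_PEAK, GROUP1_FLOOR, GROUP1_DECAY_S = 1.0, 0.3, 5.0


def region_weight(group, t):
    """Assumed importance of a region in `group` at t seconds into a segment."""
    if group == 1:
        # Starts at the peak and decays exponentially toward a floor,
        # mimicking the "early peak, then slowly decreasing" pattern.
        return GROUP1_FLOOR + (GROUP1_PEAK - GROUP1_FLOOR) * math.exp(-t / GROUP1_DECAY_S)
    # Groups 2-4 get a constant weight over time.
    return SUSTAINED_WEIGHT[group]


def occlusion_penalty(occlusions):
    """Sum of time-weighted occlusion over sampled frames.

    occlusions: iterable of (t, group, overlap_fraction, frame_duration_s),
    where overlap_fraction is the share of the region's area the caption covers.
    """
    return sum(region_weight(g, t) * frac * dt for t, g, frac, dt in occlusions)


# The same half-covered Group 1 region costs more at the start of a segment.
early = occlusion_penalty([(0.0, 1, 0.5, 1.0)])
late = occlusion_penalty([(20.0, 1, 0.5, 1.0)])
print(early, late)
```

A soft penalty like this still allows Group 2 occlusions at a high cost; a placement system that treats faces as never-block regions could instead reject such positions outright as a hard constraint.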

Tags: deaf and hard of hearing · captioning · eye tracking · television accessibility · caption placement · gaze behavior · live captioning · news consumption · attention · caption quality metrics

Standards referenced: DCMP Captioning Key · BBC Mobile Accessibility Guidelines