Enhancing Non-Speech Information Communicated in Closed Captioning Through Critical Design

May, Lloyd; Park, So Yeon; Berger, Jonathan · 2023 · Proceedings of the 25th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS) · doi:10.1145/3597638.3608398

Summary

This paper investigates the communication of non-speech information (NSI) in closed captioning for d/Deaf and hard of hearing (DHH) viewers through a mixed-methods study combining needfinding interviews, prototype development, surveys, and evaluation interviews with 20 DHH participants from the USA and South Africa. NSI encompasses environmental sounds, music, and sound effects in video content — the sounds typically described in brackets like "[ominous music]" or "[footsteps]" in captions. The researchers first conducted needfinding interviews with six DHH participants to identify frustrations with current NSI captioning, revealing four key issues: inconsistency in what gets captioned and how across platforms; absence of temporal information (e.g., when music changes intensity, starts, or stops); genre-specific problems (e.g., tension in horror movies poorly communicated); and unclear narrative importance of captioned sounds. Using a critical design framework — a research methodology that foregrounds design ethics and reveals hidden values — the team developed an Audio-Reactive Animated Overlay (ARAO) prototype: two animated lines on either side of the video screen that dynamically respond to audio features including beat onset, loudness, spectral roughness, and frequency content, mapping these to visual properties like line movement, thickness, color offset, and graininess. The prototype was built in Unity and tested at two sensitivity levels (low and high reactivity) across six 15-second video clips representing three tension profiles (stagnant, gradually changing, suddenly changing) from documentary content.
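
The feature-to-visual mapping described above can be illustrated with a minimal sketch. The summary does not say which audio feature drives which visual property, how features are normalized, or what the two sensitivity levels correspond to numerically, so the pairings, the [0, 1] normalization, and the gain values below are all assumptions made for illustration. The original prototype was built in Unity; this Python version only shows the shape of the idea, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class AudioFeatures:
    """Per-frame audio features, each assumed pre-normalized to [0, 1]."""
    beat_onset: float
    loudness: float
    roughness: float
    spectral_centroid: float  # hypothetical stand-in for "frequency content"


@dataclass
class LineVisuals:
    """Visual properties of one overlay line, each in [0, 1]."""
    displacement: float
    thickness: float
    color_offset: float
    graininess: float


# Hypothetical gains for the low- and high-reactivity conditions.
SENSITIVITY_GAIN = {"low": 0.5, "high": 1.0}


def clamp01(x: float) -> float:
    """Keep a value inside the [0, 1] range."""
    return max(0.0, min(1.0, x))


def map_features(f: AudioFeatures, sensitivity: str = "low") -> LineVisuals:
    """Scale each feature by the sensitivity gain and assign it to a visual.

    The feature-to-property pairing here is an assumption, not taken
    from the paper.
    """
    gain = SENSITIVITY_GAIN[sensitivity]
    return LineVisuals(
        displacement=clamp01(f.beat_onset * gain),
        thickness=clamp01(f.loudness * gain),
        color_offset=clamp01(f.spectral_centroid * gain),
        graininess=clamp01(f.roughness * gain),
    )
```

Calling `map_features` once per audio frame would yield a stream of visual parameters for the two lines; a real renderer would likely also smooth these values over time to avoid flicker.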

Key findings

Survey results revealed overwhelmingly negative perceptions of current NSI captioning quality: the mean agreement score for "NSI captioning is overall of high quality" was just 1.9 out of 5 (standard deviation σ = 0.8). Participants ranked the most important NSI characteristics to communicate as type of sound (ranked first most often), exact timing, volume/loudness, and sound location, with timbre/sound color ranked least important. The ARAO prototype received mixed but informative responses. No statistically significant differences in ease of understanding, distraction, emotion communicated, or emotion felt were found between the caption-only and ARAO conditions or between the two sensitivity levels. The qualitative interviews were more nuanced: the prototype showed particular promise in communicating temporal aspects of NSI that captions cannot convey, such as rising tension in music before a jump-scare or the rhythm and loudness of sounds over time. Eight participants found the ARAO most suitable for specific genres (horror movies, action/sci-fi, romantic movies, and narrative-driven content) rather than as a universal solution. A strong theme of customization emerged: the mean agreement score for wanting to customize caption visual design was 4.6 out of 5 (σ = 0.6), and participants wanted control over ARAO color, placement, thickness, and sensitivity, as well as the ability to toggle it on or off by genre. The study also proposes the Selection, Curation, and Communication (SCC) framework, which divides NSI communication design into three phases: Selection (what NSI to communicate), Curation (what properties of the selected NSI to convey), and Communication (how to present the curated information).
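
One way to read the three SCC phases is as a small pipeline driven by viewer preferences. The sketch below is an illustration under stated assumptions, not something the paper specifies: the `SoundEvent` fields, the `narrative_weight` score, the `ViewerPrefs` settings, and the modality names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SoundEvent:
    label: str               # type of sound, e.g. "ominous music"
    start_s: float           # onset time in seconds
    loudness: float          # assumed normalized to [0, 1]
    narrative_weight: float  # hypothetical importance score in [0, 1]


@dataclass
class ViewerPrefs:
    """Hypothetical per-viewer customization settings."""
    overlay_enabled_genres: set = field(default_factory=lambda: {"horror"})
    min_narrative_weight: float = 0.5
    show_timing: bool = True
    show_loudness: bool = False


def select(events, prefs):
    """Selection: decide which sounds to communicate at all."""
    return [e for e in events if e.narrative_weight >= prefs.min_narrative_weight]


def curate(event, prefs):
    """Curation: decide which properties of a selected sound to convey."""
    props = {"type": event.label}
    if prefs.show_timing:
        props["time_s"] = event.start_s
    if prefs.show_loudness:
        props["loudness"] = event.loudness
    return props


def communicate(props, genre, prefs):
    """Communication: decide how to present the curated properties."""
    modality = "overlay+text" if genre in prefs.overlay_enabled_genres else "text"
    return {"modality": modality, "caption": f"[{props['type']}]", **props}


def scc(events, genre, prefs):
    """Run Selection, then Curation, then Communication for each event."""
    return [communicate(curate(e, prefs), genre, prefs) for e in select(events, prefs)]
```

Keeping the three phases as separate functions mirrors the framework's point that each is a distinct design decision, and it gives each customization setting a single place to take effect.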

Relevance

This research challenges the long-standing one-size-fits-all approach to NSI captioning and provides evidence that the current model of fixed text descriptions in brackets fails to meet DHH users' needs. The finding that NSI captioning quality is rated so poorly (1.9/5) despite being essential for narrative comprehension should concern every content creator and streaming platform. For accessibility practitioners, the Selection, Curation, and Communication (SCC) framework offers a practical design tool for thinking about NSI captioning decisions in any media context. The strong demand for customization (4.6/5) suggests that future captioning systems should allow users to control not just the font size and color of speech captions, but also the level of detail, style, and modality of NSI communication, potentially including visual overlays alongside or instead of text descriptions. The international scope of the study (including South African participants who use South African Sign Language, SASL) also highlights how captioning access varies globally, with some regions lacking captions entirely for important content such as government addresses. While the ARAO prototype itself needs further refinement, the underlying insight that abstract visual representations can communicate temporal and emotional qualities of sound that text cannot opens new design directions for multimodal captioning.

Tags: captioning · non-speech information · deaf and hard of hearing · critical design · sound communication · media accessibility · customization · video accessibility

Standards referenced: WCAG 2.0 · FCC Captioning Requirements