Rescribe: Authoring and Automatically Editing Audio Descriptions
Amy Pavel, Gabriel Reyes, Jeffrey P. Bigham · 2020 · Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology (UIST '20) · doi:10.1145/3379337.3415864
Summary
This paper introduces Rescribe, a tool that helps authors create and refine audio descriptions for videos. Audio descriptions make video content accessible to blind and visually impaired viewers by narrating important visual information during gaps in the existing audio track. A key challenge is fitting all necessary descriptive content into the limited silent gaps without overlapping dialogue or important sounds. Professional audio description production requires specialized teams of script writers, audio engineers, voiceover actors, and producers, making it expensive and inaccessible to most content creators. Rescribe addresses this by letting authors first draft all desired descriptions without worrying about timing constraints, then using a dynamic programming algorithm to optimize the placement and length of descriptions within available gaps. The system uses NLP techniques including sentence compression (removing adjectives, prepositional phrases, and clauses from parse trees) and extractive summarization to automatically shorten descriptions that are too long. It scores candidate descriptions on coherence (using GPT-2 language model probability), informativeness (prioritizing video-specific vocabulary over common words), and edit quality (minimizing audio cuts). Rescribe also introduces a novel form called extended-inline descriptions, which slightly lengthen the source audio track by looping music segments to accommodate more description content without the jarring pauses of traditional extended descriptions.
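The fitting step described above can be illustrated with a small dynamic program. This is a minimal sketch under stated assumptions, not the paper's implementation: it handles a single gap, treats each description as a list of pre-generated variants (full and shortened) that already carry a combined quality score, and estimates speaking time from word count. The names `fit_gap`, `duration_ds`, and `DECISECONDS_PER_WORD`, the speech rate, and the scores are all hypothetical.

```python
from typing import List, Optional, Tuple

# Assumption: narration runs at about 2.5 words per second, i.e. 4 tenths of a
# second per word. The rate Rescribe actually uses is not given in this summary.
DECISECONDS_PER_WORD = 4

# A variant is (text, score). In the paper the score would combine coherence,
# informativeness, and edit quality; here it is simply supplied by the caller.
Variant = Tuple[str, float]


def duration_ds(text: str) -> int:
    """Estimated speaking time of `text`, in tenths of a second."""
    return len(text.split()) * DECISECONDS_PER_WORD


def fit_gap(
    descriptions: List[List[Variant]], gap_seconds: float
) -> Tuple[float, List[Optional[str]]]:
    """Pick at most one variant per description so the total estimated
    speaking time fits in the gap and the summed score is maximal.

    Returns (best_score, chosen), where chosen[i] is the selected text for
    description i, or None if that description was dropped.
    """
    capacity = int(round(gap_seconds * 10))  # gap length in tenths of a second
    n = len(descriptions)

    # best[i][t]: best score using the first i descriptions within budget t.
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    # pick[i][t]: index of the variant chosen for description i, or None.
    pick: List[List[Optional[int]]] = [[None] * (capacity + 1) for _ in range(n)]

    for i, variants in enumerate(descriptions):
        for t in range(capacity + 1):
            best[i + 1][t] = best[i][t]  # option: drop description i
            for k, (text, score) in enumerate(variants):
                cost = duration_ds(text)
                if cost <= t and best[i][t - cost] + score > best[i + 1][t]:
                    best[i + 1][t] = best[i][t - cost] + score
                    pick[i][t] = k

    # Walk the table backwards to recover which variants were selected.
    chosen: List[Optional[str]] = [None] * n
    t = capacity
    for i in range(n - 1, -1, -1):
        k = pick[i][t]
        if k is not None:
            chosen[i] = descriptions[i][k][0]
            t -= duration_ds(chosen[i])
    return best[n][capacity], chosen


# Illustrative input: two descriptions, each with a full and a shortened variant.
drafts = [
    [("A tall woman in a red coat walks slowly into the room", 3.0),
     ("A woman walks into the room", 2.0)],
    [("She picks up the old letter from the table", 2.5),
     ("She picks up the letter", 2.0)],
]
score, chosen = fit_gap(drafts, gap_seconds=5.0)
```

With a 5-second gap, neither full variant leaves room for the other description, so the sketch selects both shortened variants. With a longer gap it keeps the full text. A fuller treatment would also decide which gap each description goes into and would generate the variants by sentence compression, as the summary describes.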
Key findings
In interviews with 7 blind and visually impaired audio description users, all participants preferred inline and extended-inline descriptions over extended descriptions, which they found disruptive to the flow of the video. Extended-inline descriptions were received with particular enthusiasm because they combined the benefit of extended descriptions (more content) with that of inline descriptions (preserved video flow). In a comparative user study with 8 novice audio description creators, Rescribe eliminated placement errors entirely, reducing overlapping descriptions from an average of 45% with Final Cut Pro to 0% with Rescribe. Using Final Cut Pro, 6 of 8 participants failed to fit their descriptions within available gaps, producing overlapping content that made the source audio difficult to understand. Rescribe users produced descriptions with a similar number of words (96 vs. 80 mean) but achieved significantly better placement. Professional audio describers who reviewed Rescribe expressed enthusiasm about the optimization for placement and shortening, noting it could replace their existing scripting software while automating tedious iteration.
Relevance
This work directly addresses a major barrier to video accessibility: the difficulty and cost of producing audio descriptions. By automating the most technically challenging aspects of audio description production (timing, placement, and compression), Rescribe lowers the barrier for novice describers and could substantially increase the availability of described video content. The tool is especially relevant because user-generated video platforms like YouTube lack built-in audio description support, and only about one-third of newly produced traditional media includes descriptions. The extended-inline description format is a novel contribution that could influence future audio description standards and player implementations. For accessibility practitioners, this research suggests that algorithmic approaches can meaningfully assist human describers without replacing creative judgment, offering a model for human-AI collaboration in accessibility content creation.
Tags: audio description · video accessibility · blind and low vision · NLP · sentence compression · authoring tools · media accessibility
Standards referenced: WCAG 2.0