Accessibility for the HTML5 <video> element
Silvia Pfeiffer, Conrad Parker · 2009 · Proceedings of the 2009 International Cross-Disciplinary Conference on Web Accessibility (W4A) · doi:10.1145/1535654.1535679
Summary
This paper documents early efforts to standardize how subtitles, captions, and other accessibility data should be associated with the then-new HTML5 <video> element. Written in 2009, when Firefox, Opera, and Safari were just beginning to implement <video> support, the paper addresses a critical gap: while browsers were adding native video playback, no agreed mechanism existed for including accessibility data, not even basic subtitles, let alone audio descriptions or sign language. The work was funded by a Mozilla grant and developed through discussions within the Xiph accessibility group and on the WHATWG mailing list. The paper presents three existing JavaScript-based implementations for synchronizing time-aligned text files, such as SRT (SubRip) subtitles, with <video> elements: Jan Gerber's jQuery.SRT approach, which uses HTML5 custom data attributes; a proposed <text> sub-element syntax, implemented by Michael Dale at Metavid.org, which supports multiple languages and categories; and Philippe Le Hegaret's W3C adaptation, which uses DFXP (Timed Text) with the proposed syntax. All three approaches map time-aligned text to HTML <div> elements with CSS styling, making subtitles part of the web page DOM.
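The synchronization step common to these implementations can be illustrated with a minimal sketch: parse the SRT cues, then look up which cue is active at a given playback time. The sketch is written in Python purely to show the logic; the implementations described in the paper are JavaScript, and this is not their code. The names Cue, parse_srt, and active_cue, and the sample subtitle text, are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cue:
    start: float  # cue start time, in seconds
    end: float    # cue end time, in seconds
    text: str     # subtitle text, possibly spanning several lines


# SRT timing line, e.g. "00:00:01,000 --> 00:00:04,000"
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    """Convert the four captured timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def parse_srt(data: str) -> List[Cue]:
    """Parse SRT text into cues; blocks without a timing line are skipped."""
    cues: List[Cue] = []
    normalized = data.replace("\r\n", "\n").strip()
    # Cue blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                fields = match.groups()
                text = "\n".join(lines[i + 1:]).strip()
                cues.append(Cue(_seconds(*fields[:4]), _seconds(*fields[4:]), text))
                break
    return cues


def active_cue(cues: List[Cue], t: float) -> Optional[Cue]:
    """Return the first cue whose interval [start, end) contains time t."""
    for cue in cues:
        if cue.start <= t < cue.end:
            return cue
    return None


# Hypothetical sample data for illustration only.
SAMPLE = """1
00:00:01,000 --> 00:00:04,000
Hello, world.

2
00:00:05,500 --> 00:00:07,250
Second line,
continued.
"""

if __name__ == "__main__":
    cues = parse_srt(SAMPLE)
    for t in (2.0, 4.5, 6.0):
        cue = active_cue(cues, t)
        print(t, repr(cue.text) if cue else None)
```

In a browser, the same lookup would typically be driven by the video's current playback time, with the active cue's text written into a styled <div>; that wiring is omitted here.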
Key findings
The paper identifies two fundamentally different approaches to video accessibility data: external text files (preferred for web applications that need editable, database-stored captions) and embedded tracks within binary video files (needed for audio descriptions, sign language, and portable file exchange). The initial <text> sub-element proposal encountered two problems: a naming conflict with SVG's <text> element and security concerns about mapping external files into HTML. The proposed solution was an <itext> element with a secure browsing context similar to an iframe, rendering time-aligned text as a transparent overlay isolated from the main page. For the Ogg container format, the paper describes multi-track capabilities for audio descriptions (using the Speex codec) and sign language video (using Theora), with Ogg Kate as an overlay codec for embedded subtitles. The paper notes that both Mozilla and Google (Chrome) were converging on SRT, the simplest format, as the first step in their implementation efforts. The authors envision time-aligned text extending well beyond subtitles to include karaoke text, ticker text, music lyrics, chapter markers, transcripts, and clickable annotations.
Relevance
This paper captures a pivotal moment in web video accessibility history — the period when native browser video was being standardized but accessibility was at risk of being left out of the specification. The work directly influenced the development of what eventually became the HTML5 <track> element and the WebVTT format, which are now the standard mechanisms for providing captions, subtitles, descriptions, and chapters for web video. While the specific proposals described here (the <text> and <itext> elements, Ogg Kate) were not adopted in their exact form, the core architectural decisions they explored — external vs. embedded tracks, secure rendering contexts, category-based track types, multiple language support — all shaped the final standard. For practitioners today, the paper provides valuable historical context for understanding why the <track> element works the way it does. The emphasis on open, patent-free formats (Ogg, Speex, Theora) reflects an important strand of the web standards movement that influenced HTML5's approach to media accessibility.
Tags: video accessibility · HTML5 · captioning · subtitles · web standards · timed text · open source · Ogg
Standards referenced: HTML5 · DFXP · SRT · Ogg