
aiBrowser for multimedia: introducing multimedia content accessibility for visually impaired users

Hisashi Miyashita, Daisuke Sato, Hironobu Takagi, Chieko Asakawa · 2007 · Proceedings of the 9th International ACM SIGACCESS Conference on Computers and Accessibility (Assets '07) · doi:10.1145/1296843.1296860

Summary

A companion to Sato et al.'s Flash-transcoding work, this 2007 paper from the same IBM Tokyo Research Lab team introduces aiBrowser, an accessible web browser purpose-built for the multimedia-heavy sites of the era (ABC News Video, YouTube, Disney). The authors identify two problems that screen reader users faced with Rich Internet Applications (RIAs) built in DHTML and Flash. First, streaming audio (often auto-playing ads) drowned out the screen reader's own synthesized speech, and because the system had only one global volume control, turning down the page also silenced the screen reader. Second, RIA interfaces relied on custom DIV/SPAN widgets, mouse-only controls, and DHTML updates that no screen reader of the time could follow. aiBrowser addresses these with two mechanisms. The first is a set of fixed browser-level shortcut keys (Ctrl-P play, Ctrl-S stop, Pause, Ctrl-J/K volume up/down, Ctrl-M mute) that reach directly into the embedded media object regardless of its type (Flash, Windows Media Player, RealPlayer, or QuickTime) by exposing a unified DOM-style interface over the base browser's runtime. The second is 'Fennec', an XML metadata format that sits alongside a page and describes an alternative, simplified UI for screen readers. Fennec uses XPath, ID, and Flash target-path queries to reach into both HTML and Flash content, attaches ARIA roles, regroups visually fragmented sections, and re-evaluates itself incrementally as the DHTML page mutates. The approach anticipates many of the ideas later formalised in WAI-ARIA.
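To make the first mechanism concrete, here is a minimal Python sketch of fixed shortcuts dispatching through one common interface, whatever the underlying player. It is not aiBrowser's code: the `MediaPlayer` protocol, the `FakePlayer` stand-in, `handle_key`, and the 10-point volume step are all assumptions made for illustration, and 'Pause' is taken to mean the keyboard's Pause key.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class MediaPlayer(Protocol):
    """Common interface over embedded players (method names are hypothetical)."""

    def play(self) -> None: ...
    def stop(self) -> None: ...
    def pause(self) -> None: ...
    def change_volume(self, delta: int) -> None: ...
    def toggle_mute(self) -> None: ...


@dataclass
class FakePlayer:
    """Stand-in for a wrapped Flash, Windows Media Player, RealPlayer or QuickTime object."""

    kind: str
    state: str = "stopped"
    volume: int = 50
    muted: bool = False

    def play(self) -> None:
        self.state = "playing"

    def stop(self) -> None:
        self.state = "stopped"

    def pause(self) -> None:
        self.state = "paused"

    def change_volume(self, delta: int) -> None:
        # Clamp to 0..100; the scale and the step size are assumptions.
        self.volume = max(0, min(100, self.volume + delta))

    def toggle_mute(self) -> None:
        self.muted = not self.muted


# Key bindings as listed in the summary above; each maps to one interface call.
SHORTCUTS: dict[str, Callable[[MediaPlayer], None]] = {
    "Ctrl-P": lambda p: p.play(),
    "Ctrl-S": lambda p: p.stop(),
    "Pause": lambda p: p.pause(),
    "Ctrl-J": lambda p: p.change_volume(+10),
    "Ctrl-K": lambda p: p.change_volume(-10),
    "Ctrl-M": lambda p: p.toggle_mute(),
}


def handle_key(key: str, players: list[MediaPlayer]) -> bool:
    """Apply a shortcut to every embedded player; return False for unknown keys."""
    action = SHORTCUTS.get(key)
    if action is None:
        return False
    for player in players:
        action(player)
    return True
```

Calling `handle_key("Ctrl-M", players)` mutes every wrapped player without touching the screen reader's own output, which is the separation the paper's first mechanism relies on.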

Key findings

The authors benchmarked aiBrowser against JAWS 8.0 on eight tasks across ABC News, YouTube, and Disney. With JAWS alone, three of the eight tasks were impossible, including muting the video on YouTube and anything at all on Disney's fully-Flash homepage, because the Flash content either ran in windowless mode (invisible to MSAA) or had no alternative text on its buttons, leaving JAWS announcing only '6 Button, 8 Button'. aiBrowser completed every task. Keystroke counts dropped sharply: the YouTube 'search IBM and play a Linux video' task fell from 210 keystrokes to 48 for novice users, and from 22 to 14 for advanced users. Media controls that were impossible in JAWS collapsed to a single keystroke in aiBrowser. The authors also observed that JAWS advanced-mode users relied on obscure navigation commands (M for frame, G for graphic) that require foreknowledge of a site's DOM structure, a hidden expertise tax that Fennec's heading-based landmarks eliminated. One honest caveat: when the core task was 'pick one item from a long list', even the best metadata can't save you from linear navigation.
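For a sense of scale, the YouTube keystroke counts quoted above can be turned into relative savings. The short Python calculation below uses only the four figures from this review; the `reduction` helper is a name introduced here for illustration.

```python
def reduction(before: int, after: int) -> float:
    """Percentage of keystrokes saved, rounded to one decimal place."""
    return round(100 * (before - after) / before, 1)


novice = reduction(210, 48)    # novice users: JAWS vs aiBrowser
advanced = reduction(22, 14)   # advanced users: JAWS vs aiBrowser

print(f"novice: {novice}% fewer keystrokes")      # novice: 77.1% fewer keystrokes
print(f"advanced: {advanced}% fewer keystrokes")  # advanced: 36.4% fewer keystrokes
```

The gap between the two figures suggests that most of the benefit goes to users who do not already know JAWS's advanced navigation commands.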

Relevance

This paper is an early precursor of the modern accessibility-overlay market, written while WAI-ARIA was still a W3C draft and before overlays existed. Fennec's queries-into-another-document model prefigures how today's accessibility remediation services inject ARIA attributes from external configuration, and the paper's honest accounting of where the approach breaks (radical DHTML re-layouts, long list navigation, sites whose DOM structure changes faster than the metadata) still applies. The browser-level media-control shortcuts are also worth noting: nearly twenty years later, autoplay video that drowns out screen readers remains a live accessibility complaint, and built-in browser support for page-level media controls of the kind aiBrowser offered in 2007 remains limited and inconsistent. For practitioners, the paper is a useful reminder that audio interference is an accessibility issue in its own right, not merely a usability annoyance, and that metadata-driven remediation only works when the site's structure is stable enough for the metadata to stay in sync.
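The sync problem is easy to reproduce in miniature. The Python sketch below is not Fennec's schema (which this review does not reproduce). It uses invented `(query, role)` rules and the limited XPath subset in Python's standard `xml.etree.ElementTree` to attach roles from outside the page, then shows one rule going stale after a hypothetical redesign renames a container.

```python
import xml.etree.ElementTree as ET

# Hypothetical page markup before and after a redesign that renames one id.
PAGE_V1 = """
<html><body>
  <div id="nav"><span>Home</span><span>Videos</span></div>
  <div id="player"><div class="btn">Play</div></div>
</body></html>
"""

PAGE_V2 = """
<html><body>
  <div id="nav"><span>Home</span><span>Videos</span></div>
  <div id="video-area"><div class="btn">Play</div></div>
</body></html>
"""

# External metadata: (query, role) pairs kept apart from the page itself.
# The rule format is an assumption made for this sketch.
RULES = [
    (".//div[@id='nav']", "navigation"),
    (".//div[@id='player']/div[@class='btn']", "button"),
]


def apply_rules(page_source, rules):
    """Attach a role attribute to every node a query matches.

    Returns the annotated tree and the list of queries that matched nothing,
    so stale metadata is reported instead of failing silently.
    """
    root = ET.fromstring(page_source)
    unmatched = []
    for query, role in rules:
        nodes = root.findall(query)
        if not nodes:
            unmatched.append(query)
        for node in nodes:
            node.set("role", role)
    return root, unmatched


root_v1, stale_v1 = apply_rules(PAGE_V1, RULES)
root_v2, stale_v2 = apply_rules(PAGE_V2, RULES)

print(stale_v1)  # no stale rules on the original layout
print(stale_v2)  # the player rule no longer matches after the redesign
```

On the redesigned page the Play control loses its button role even though it still works visually, which is the failure mode the paper warns about. Reporting unmatched queries is one simple way to notice the drift.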

Tags: multimedia · streaming media · flash · DHTML · rich internet applications · screen readers · alternative text · metadata · visual impairment · web accessibility · audio interference · keyboard navigation

Standards referenced: WAI-ARIA · MSAA · XPath · XHTML 2