Site-Wide Annotation: Reconstructing Existing Pages to Be Accessible

Hironobu Takagi, Chieko Asakawa, Kentarou Fukuda, Junji Maeda · 2002 · Proceedings of the Fifth International ACM Conference on Assistive Technologies (Assets '02) · doi:10.1145/638249.638265

Summary

This paper from IBM Tokyo Research Laboratory presents a system that makes inaccessible web pages accessible through external annotations, without modifying the original pages. The core problem is "page fragmentation": on visually designed web pages, different types of content (news, advertisements, navigation) are scattered across the page and grouped only visually, using colours, spacing, and images, so the groupings have no structural representation in the HTML. Screen reader users must therefore listen to a linear stream of mixed content with no way to identify groups or skip between them. The Accessibility Transcoding System acts as a proxy: it intercepts each requested page, retrieves the corresponding annotation file from a database, and applies a set of transformations. These include rearranging content groups by importance, inserting textual delimiters between groups ("Group 4 Press releases"), adding a page index for direct access to each section, inserting missing ALT text, and adding skip-to-main-content links. The annotations are stored as XML files containing XPath expressions that identify specific DOM nodes, along with metadata about each group's title, role, and importance. The system had been publicly available since March 2001 with over 100,000 users.
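
To make the transcoding step concrete, here is a minimal sketch of how XPath-based annotations could drive reordering, delimiters, and a page index. It is not the paper's implementation: the page, the annotation field names (xpath, title, importance), and the transcode function are all hypothetical, and Python's standard-library ElementTree stands in for a real HTML DOM, so the XPaths are written relative to the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: three visually separate groups with no structural labels.
PAGE = (
    "<html><body>"
    "<div><a href='/ad'>Buy now</a></div>"
    "<div><a href='/home'>Home</a><a href='/news'>News</a></div>"
    "<div><p>Press release text</p></div>"
    "</body></html>"
)

# Hypothetical annotation records. The paper stores annotations as XML files;
# these dictionaries and their field names are illustrative assumptions.
ANNOTATIONS = [
    {"xpath": "./body/div[1]", "title": "Advertisement", "importance": 1},
    {"xpath": "./body/div[2]", "title": "Navigation", "importance": 2},
    {"xpath": "./body/div[3]", "title": "Press releases", "importance": 3},
]


def transcode(page_source, annotations):
    """Return a new tree with groups reordered, delimited, and indexed."""
    root = ET.fromstring(page_source)
    body = root.find("./body")

    # Resolve every XPath before mutating the tree, so positions stay valid.
    groups = []
    for ann in annotations:
        node = root.find(ann["xpath"])
        if node is not None:
            groups.append((ann, node))

    # Most important group first (assumes a larger number means more important).
    groups.sort(key=lambda pair: pair[0]["importance"], reverse=True)

    # Children that no annotation points at are kept and appended at the end.
    annotated = {id(node) for _, node in groups}
    leftovers = [child for child in list(body) if id(child) not in annotated]
    for child in list(body):
        body.remove(child)

    # Page index: one link per group, giving direct access to each section.
    index = ET.SubElement(body, "ul")
    for number, (ann, _) in enumerate(groups, start=1):
        item = ET.SubElement(index, "li")
        link = ET.SubElement(item, "a", href=f"#group{number}")
        link.text = ann["title"]

    # Textual delimiter before each group, then the group itself.
    for number, (ann, node) in enumerate(groups, start=1):
        delimiter = ET.SubElement(body, "p", id=f"group{number}")
        delimiter.text = f"Group {number} {ann['title']}"
        body.append(node)

    for child in leftovers:
        body.append(child)
    return root


if __name__ == "__main__":
    print(ET.tostring(transcode(PAGE, ANNOTATIONS), encoding="unicode"))
```

A real proxy would also need an HTML parser that tolerates malformed markup and a lookup from URL to annotation file, both of which are left out here.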

Key findings

The key innovation was Dynamic Annotation Matching, an algorithm that automatically applies annotations to pages they were not specifically authored for. The algorithm analyses the DOM tree structure of annotation files to extract "layout patterns" (tree structures of XPaths with common root nodes) and matches these patterns against target pages. An annotator can therefore create one annotation file that works across many pages sharing the same layout template, rather than annotating page by page. In a feasibility experiment on the USA Today website (approximately 7,885 cached pages within 4 levels of the domain), a single expert annotator created 245 annotation files in 30 hours and 20 minutes, with each file covering an average of 32.2 pages. One annotation file for article pages covered approximately 1,000 pages. The Site Pattern Analyzer (SPA) tool supported the process by loading thousands of cached pages, analysing matches, visualising matching status, and offering semi-automatic correction when XPaths pointed to incorrect content. Of 2,684 pages in the most-annotated directory ("sports"), only 42 were hand-edited and only 20 remained unmatched. Across the full corpus, 58 pages were unmatched and 12 had double matches.
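
The matching idea can be illustrated with a much-simplified sketch. Here a "layout pattern" is reduced to the list of XPaths taken from one annotation file, and a page matches when every XPath resolves to a node. The paper's algorithm works on tree structures of XPaths and is more involved; the pattern names, pages, and functions below are hypothetical and only show how matched, unmatched, and double-matched pages could be reported.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation files, reduced to the XPaths they contain.
PATTERNS = {
    "article": ["./body/div[1]/h1", "./body/div[2]/p"],
    "index": ["./body/ul/li", "./body/div[1]/h1"],
}

# Hypothetical cached pages keyed by URL path.
PAGES = {
    "/sports/a.htm": "<html><body><div><h1>T</h1></div><div><p>x</p></div></body></html>",
    "/sports/b.htm": "<html><body><div><h1>U</h1></div><div><p>y</p></div></body></html>",
    "/index.htm": "<html><body><ul><li>a</li></ul><div><h1>Top</h1></div></body></html>",
    "/about.htm": "<html><body><p>About</p></body></html>",
    "/mixed.htm": "<html><body><ul><li>a</li></ul><div><h1>T</h1></div><div><p>x</p></div></body></html>",
}


def page_matches(root, xpaths):
    """Simplified test: the pattern matches if every XPath finds a node."""
    return all(root.find(xpath) is not None for xpath in xpaths)


def classify(pages, patterns):
    """Group pages by the single pattern they match; flag the rest."""
    report = {"matched": {name: [] for name in patterns},
              "unmatched": [], "double": []}
    for url, source in pages.items():
        root = ET.fromstring(source)
        hits = [name for name, xpaths in patterns.items()
                if page_matches(root, xpaths)]
        if not hits:
            report["unmatched"].append(url)
        elif len(hits) > 1:
            report["double"].append(url)
        else:
            report["matched"][hits[0]].append(url)
    return report


if __name__ == "__main__":
    for key, value in classify(PAGES, PATTERNS).items():
        print(key, value)
```

In this toy corpus one annotation file covers both sports pages, which is the effect the experiment measures at scale; the unmatched and double lists correspond to the cases that a tool such as SPA surfaces for manual correction.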

Relevance

This research from IBM Japan, led by the team behind the influential IBM Home Page Reader, tackled a fundamental and persistent challenge in web accessibility: the vast majority of web content is created without accessibility in mind, and waiting for all authors to fix their pages is impractical. The annotation-based transcoding approach offered an alternative by letting third parties add accessibility metadata externally. The Dynamic Annotation Matching algorithm was particularly significant because it addressed the scalability problem — major news sites generate thousands of pages from templates, and annotating them individually would be impossible. The concept of extracting layout patterns from DOM structures anticipated modern approaches to web scraping and automated accessibility remediation. While the specific technology has been superseded by advances in web standards (semantic HTML, ARIA) and automated testing, the core idea of third-party accessibility overlays remains active and controversial in the accessibility community today, with companies offering similar proxy-based solutions. The paper's estimate that 80-90% of annotation effort could be eliminated through automation foreshadowed the growing role of AI in accessibility remediation.

Tags: web accessibility · web transcoding · web annotation · screen reader · blindness and low vision · DOM · XPath · content adaptation

Standards referenced: Section 508 · WCAG