Towards One World Web with HearSay3

Yevgen Borodin, Jeffrey P. Bigham, Amanda Stent, I. V. Ramakrishnan · 2008 · Proceedings of the 2008 International Cross-Disciplinary Conference on Web Accessibility (W4A) · doi:10.1145/1368044.1368074

Summary

This short paper presents HearSay 3, a self-voicing non-visual web browser developed at Stony Brook University and the University of Washington and designed to address accessibility challenges introduced by Web 2.0. It builds on two previous versions: the original HearSay, which segmented pages into semantically meaningful parts, and version 2, which used link context to identify relevant page segments. HearSay 3 introduces three key capabilities. The first is transparent multilingual support: the browser detects the languages used on a web page from the XHTML lang attribute and automatically switches between the appropriate text-to-speech engines, enabling seamless reading of multilingual content such as Wikipedia pages. The second is a collaborative labeling system that lets blind users assign descriptive labels to any web page element using keyboard shortcuts or voice commands; labels are stored in both local and shared remote repositories, so one user's annotations benefit the entire community. The third is a dynamic content handling system that detects AJAX and DHTML updates and provides a non-disruptive interface for reviewing changes. The paper frames these features within the global context of web accessibility, noting that over 45 million blind people worldwide face compounded barriers from both inaccessible websites and the lack of screen readers in many languages.
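The multilingual feature can be illustrated with a minimal sketch. It assumes the page is well-formed XHTML and that language is declared through lang or xml:lang attributes inherited by child elements. The function names (language_runs, pick_engine) and the engine names in ENGINES are hypothetical and do not come from the paper, so the code shows the general idea, not HearSay 3's implementation.

```python
import xml.etree.ElementTree as ET

# Attribute name ElementTree uses for xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical mapping from language code to a text-to-speech engine name.
ENGINES = {"en": "english-voice", "fr": "french-voice", "ru": "russian-voice"}


def language_runs(xhtml, default_lang="en"):
    """Return (lang, text) runs in reading order, merging adjacent runs
    that share a language so the engine switches as rarely as possible."""
    root = ET.fromstring(xhtml)
    runs = []

    def emit(lang, text):
        text = " ".join((text or "").split())  # collapse whitespace
        if not text:
            return
        if runs and runs[-1][0] == lang:
            runs[-1] = (lang, runs[-1][1] + " " + text)
        else:
            runs.append((lang, text))

    def walk(elem, inherited):
        # An element without its own declaration inherits its parent's language.
        lang = elem.get(XML_LANG) or elem.get("lang") or inherited
        lang = lang.split("-")[0].lower()  # "fr-CA" is treated as "fr"
        emit(lang, elem.text)
        for child in elem:
            walk(child, lang)
            emit(lang, child.tail)  # text after a child belongs to the parent

    walk(root, default_lang)
    return runs


def pick_engine(lang, fallback="en"):
    """Choose an engine for a language, falling back when none is installed."""
    return ENGINES.get(lang, ENGINES[fallback])


page = '<p lang="en">Paris is called <span lang="fr">la Ville Lumière</span> by many.</p>'
for lang, text in language_runs(page):
    print(pick_engine(lang), "->", text)
```

Each run would be handed to the engine chosen for its language, so a French phrase embedded in an English sentence is read by the French voice without any action from the user.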

Key findings

HearSay 3's collaborative labeling approach rests on an important design insight: rather than relying on web developers, who have minimal personal stake in accessibility, the system aligns incentives by letting blind users improve their own browsing efficiency while simultaneously contributing labels that help others. Labels are anchored to page elements using XPath and element-specific attributes, and personal labels take precedence over shared ones. The system also envisions using user-contributed labels as training data for machine learning algorithms that could automatically detect and label similar unlabeled elements, reducing the manual burden over time. For dynamic content, HearSay 3 uses a mixed-initiative design: it plays an earcon to notify users of updates without interrupting their current activity, then lets them review the changes at their convenience and return to their original location. This approach works independently of WAI-ARIA markup, so it can handle dynamic pages even when developers have not implemented accessibility standards. The paper also notes that observations of browsing patterns showed blind users tend to avoid dynamic web pages entirely, which highlights the severity of the accessibility gap that Web 2.0 introduced.
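The precedence rule for labels can be sketched as a lookup across two repositories. This is an illustration under stated assumptions: labels are keyed only by page URL and element XPath (the paper also mentions element-specific attributes, omitted here), and the names LabelStore and resolve_label are hypothetical, not taken from HearSay 3.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class LabelStore:
    """Hypothetical repository mapping (page URL, element XPath) to a label."""
    labels: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def put(self, url: str, xpath: str, text: str) -> None:
        self.labels[(url, xpath)] = text

    def get(self, url: str, xpath: str) -> Optional[str]:
        return self.labels.get((url, xpath))


def resolve_label(url: str, xpath: str,
                  personal: LabelStore, shared: LabelStore) -> Optional[str]:
    """Return the user's own label if one exists, otherwise the shared one."""
    own = personal.get(url, xpath)
    return own if own is not None else shared.get(url, xpath)


personal, shared = LabelStore(), LabelStore()
url = "http://example.org/"
search_box = "/html/body/form/input[1]"

# Another user has labeled the element in the shared repository.
shared.put(url, search_box, "Search")
print(resolve_label(url, search_box, personal, shared))

# The current user's own label overrides the shared one.
personal.put(url, search_box, "Site search box")
print(resolve_label(url, search_box, personal, shared))
```

Keeping the two repositories separate means a user can override a community label locally without changing what other users hear.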

Relevance

HearSay 3 anticipated several ideas that became central to modern web accessibility practice. The collaborative labeling concept prefigured crowdsourced accessibility efforts and the recognition that end users, not just developers, can contribute to making the web more accessible. The mixed-initiative approach to dynamic content notifications resembles how screen readers later came to handle live regions, balancing awareness of updates with user control. The multilingual support addressed a global accessibility gap that remains significant: most screen reader development has concentrated on a handful of major languages, leaving blind users in many countries without adequate tools. For practitioners, the paper underscores that accessibility solutions must consider the full diversity of users (across languages, technical skill levels, and browsing contexts) rather than assuming a single-language, developer-driven model. The vision of user labels feeding machine learning to automatically improve accessibility foreshadowed current AI-driven accessibility remediation tools.

Tags: screen readers · non-visual web browser · blind users · collaborative accessibility · multilingual accessibility · dynamic content · Web 2.0 · crowdsourcing · text-to-speech

Standards referenced: WAI-ARIA · XHTML