Sasayaki: An Augmented Voice-Based Web Browsing Experience

Shaojian Zhu, Daisuke Sato, Hironobu Takagi, Chieko Asakawa · 2010 · Proceedings of the 12th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2010) · doi:10.1145/1878803.1878870

Summary

This paper introduces Sasayaki (meaning "whisper" in Japanese), an intelligent voice-based agent designed to augment voice browser navigation for users with visual impairments. The authors, from IBM Research Tokyo and UMBC, identify two key problems with voice-based web browsing: voice browsers can only convey limited information sequentially, making it nearly impossible for users to form a mental model of complex page structures; and voice browsers lack the contextual support that sighted users get from visual cues like text styles, layout, and cursor position. Sasayaki addresses these gaps by introducing a secondary voice channel that whispers contextually relevant information alongside the primary voice browser output. Implemented as a plug-in for the aiBrowser voice browser, Sasayaki monitors the virtual cursor position and user behavior via DOM events, retrieves predefined role-based data from the Accessibility Commons server (identifying content roles like "main", "header", "advertisement"), and determines the most appropriate contextual or task-relevant information to present. The system uses a different speech synthesizer engine and sound device from the primary voice, creating a whispered-hints experience that is less intrusive than interrupting the main audio stream.
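The monitoring loop described above (watch the virtual cursor, look up the role of the region it is in, decide whether to whisper) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Region` shape, the `Whisperer` class, the `on_cursor_move` method, and the "Entering ..." wording are all hypothetical, and the role metadata is assumed to arrive as ranges over the page's reading order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Region:
    """A page region with a role label (hypothetical metadata shape)."""
    role: str   # e.g. "main", "header", "advertisement"
    start: int  # first element index in reading order (inclusive)
    end: int    # last element index in reading order (inclusive)


def region_at(regions: List[Region], cursor: int) -> Optional[Region]:
    """Return the first region containing the cursor position, or None."""
    for region in regions:
        if region.start <= cursor <= region.end:
            return region
    return None


class Whisperer:
    """Tracks the virtual cursor and emits a hint only when the role changes."""

    def __init__(self, regions: List[Region]) -> None:
        self.regions = regions
        self.current_role: Optional[str] = None

    def on_cursor_move(self, cursor: int) -> Optional[str]:
        """Return text for the secondary voice, or None to stay silent."""
        region = region_at(self.regions, cursor)
        role = region.role if region else None
        if role == self.current_role:
            return None  # still in the same region: nothing new to say
        self.current_role = role
        if role is None:
            return None  # no role metadata for this position
        return f"Entering {role}"
```

Emitting a hint only on a role change keeps the secondary channel quiet while the user reads within one region, which fits the paper's aim of being less intrusive than interrupting the main audio stream.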

Key findings

A pilot study with three visually impaired participants (two blind, one low vision, aged 37 to 44, all native Japanese speakers with prior JAWS experience but no aiBrowser experience) suggested that Sasayaki improved navigation. Users spent less time navigating to target page elements, and advanced features like page overview, role-based jumping, and contextual prompts helped users retrieve information more rapidly. For example, on shopping websites, Sasayaki could collect and categorize data like price, stock status, and shipping across product pages, presenting it in a convenient format. For review-heavy pages, Sasayaki could invoke text-mining to summarize sentiments across many reviews, saving users from listening to each one sequentially. All participants gave high ratings for usefulness and role-based context support. Less experienced voice browser users found the advanced functions particularly valuable, while the expert user was more neutral, suggesting existing expertise partially compensated for the lack of context. One participant found the two simultaneous voices confusing, indicating a need for user-controllable timing preferences.

Relevance

Sasayaki represents an early and creative approach to a problem that persists today: screen reader and voice browser users lack the at-a-glance contextual awareness that sighted users take for granted. The dual-voice-channel concept, with a primary voice for content and a whispering secondary voice for context, is an innovative interaction pattern that anticipated later developments in intelligent assistive agents. For accessibility practitioners, the key insight is that simply reading page content aloud is insufficient; users also need structural context, location awareness, and content summaries to browse effectively. The role-based content identification through Accessibility Commons parallels the purpose that ARIA landmarks and roles serve in modern web accessibility. The finding that one participant found dual voices confusing highlights a recurring challenge in assistive technology: providing more information risks cognitive overload, so such systems need careful user-configurable controls.

Tags: web accessibility · screen reader · voice browser · visual impairment · contextual support · intelligent agent · web navigation