VERSE: Bridging Screen Readers and Voice Assistants for Enhanced Eyes-Free Web Search

Alexandra Vtyurina, Adam Fourney, Meredith Ringel Morris, Leah Findlater, Ryen W. White · 2019 · Proceedings of the 21st International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2019) · doi:10.1145/3308561.3353773

Summary

This paper identifies the complementary strengths and weaknesses of screen readers and voice assistants (VAs) for blind web searchers, then presents VERSE (Voice Exploration, Retrieval, and SEarch), a prototype that merges the best of both technologies.

An online survey of 53 legally blind screen reader and VA users revealed six key trade-off themes: (1) brevity vs. detail (VAs give quick single answers but cannot go deeper; screen readers allow thorough exploration but require wading through clutter); (2) granularity of control vs. ease of use (screen readers offer fine-grained navigation at the cost of complex keyboard commands; VAs are simple but imprecise); (3) text vs. voice input (voice is faster and avoids spelling errors but suffers from speech recognition failures and environmental constraints); (4) portability vs. agility (VAs are always at hand via smartphones or smart speakers; screen readers require sitting at a computer and launching a browser); (5) incidental vs. intentional accessibility (web content is often inaccessible to screen readers due to poor WCAG compliance, while VA content is inherently audio-first, "levelling the playing field"); and (6) transitioning between modalities (39 of 53 respondents reported needing to switch from a VA to a screen reader when the VA’s answer was insufficient).

VERSE addresses these trade-offs by augmenting a smart-speaker-based VA with screen-reader-inspired capabilities: after providing an initial concise answer (like a VA), it allows users to explore multiple search results across different verticals (web pages, Wikipedia, news, videos, images), navigate within articles by headings/paragraphs/sentences (like a screen reader), and transition seamlessly to a phone’s screen reader for deeper browsing. A companion smartphone or smartwatch serves as an optional input accelerator using familiar VoiceOver-style gestures.
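The heading/paragraph/sentence navigation described above can be pictured as a cursor that moves through an article at a chosen granularity. The following is only a minimal sketch under that assumption, not VERSE's actual implementation: the names `Section`, `ArticleCursor`, `move`, and `split_sentences` are hypothetical, the sentence splitter is deliberately naive, and the demo text is placeholder content.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Section:
    """One heading plus the paragraphs beneath it (hypothetical structure)."""
    heading: str
    paragraphs: list[str] = field(default_factory=list)


def split_sentences(paragraph: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


class ArticleCursor:
    """Reading position that can step by sentence, paragraph, or heading."""

    DEPTH = {"heading": 1, "paragraph": 2, "sentence": 3}

    def __init__(self, sections: list[Section]):
        self.sections = sections
        # Flatten the article into (section index, paragraph index, sentence).
        self.items = [
            (si, pi, sentence)
            for si, section in enumerate(sections)
            for pi, paragraph in enumerate(section.paragraphs)
            for sentence in split_sentences(paragraph)
        ]
        self.pos = 0

    def _key(self, i: int, depth: int) -> tuple:
        # Identify the unit containing item i at the requested granularity.
        si, pi, _ = self.items[i]
        return (si, pi, i)[:depth]

    def current(self) -> str:
        return self.items[self.pos][2]

    def heading(self) -> str:
        return self.sections[self.items[self.pos][0]].heading

    def move(self, unit: str, step: int = 1) -> str:
        """Step forward (step=1) or backward (step=-1) by one unit."""
        depth = self.DEPTH[unit]
        here = self._key(self.pos, depth)
        i = self.pos + step
        # Skip the remaining sentences of the unit we are currently in.
        while 0 <= i < len(self.items) and self._key(i, depth) == here:
            i += step
        if 0 <= i < len(self.items):
            target = self._key(i, depth)
            # When moving backward, rewind to the start of the target unit.
            while i > 0 and self._key(i - 1, depth) == target:
                i -= 1
            self.pos = i
        # At either end of the article the cursor stays where it is.
        return self.current()


demo = ArticleCursor([
    Section("First", ["Alpha one. Alpha two.", "Beta one."]),
    Section("Second", ["Gamma one. Gamma two."]),
])
print(demo.current())          # first sentence of the article
print(demo.move("paragraph"))  # first sentence of the next paragraph
print(demo.move("heading"))    # first sentence under the next heading
```

In a voice interface, spoken commands or companion-device gestures would map onto calls such as `move("heading")` or `move("sentence", -1)`, with the returned text passed to speech synthesis.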

Key findings

In a design probe study with 12 blind screen reader users (mean age 36.6 years, with a mean of 18.5 years of screen reader experience and 5.7 years of VA experience), VERSE received a System Usability Scale (SUS) score of 71.0 out of 100, and all participants successfully completed both search tasks. Participants especially valued access to multiple search results and search verticals, features absent from current VAs. P3 noted: "Most screen readers and search engines do use headings, but it’s hard to switch search verticals. This is different and kind of interesting." P5 highlighted the aggregation: "One thing that immediately caught my eye was that different forms of data were being pulled together. [VERSE] gathers the relevant stuff and groups it in different ways." The Wikipedia navigation feature was particularly well received, as current VAs typically read only the first sentence or introduction of an article. P7 compared: "Even though the smart speaker I use has some ability to read Wikipedia, I can’t get back and forth by section and skip around. In that way, it’s an improvement." All 12 participants preferred the phone over the smartwatch as the companion device because of the watch’s small touch target, aggressive power-saving behaviour, and latency from wireless radio management. Participants strongly desired more natural conversational interaction ("I should just have the ability to use a more natural voice like I’m having a conversation with you") and document comprehension capabilities ("read the paragraph that talks about this person’s work and it should understand").
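For readers unfamiliar with the SUS figure reported above, the sketch below shows the standard System Usability Scale scoring rule: ten items rated 1 to 5, odd-numbered items contributing (rating - 1), even-numbered items contributing (5 - rating), and the sum multiplied by 2.5 to give a 0 to 100 score. A study-level score is usually the mean across participants. The function name `sus_score` and the sample responses are invented for illustration and are not data from the study.

```python
from statistics import mean


def sus_score(responses):
    """Score one 10-item SUS questionnaire (each response is 1 to 5)."""
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("SUS needs ten responses, each between 1 and 5")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:
            # Odd items are positively worded: higher agreement is better.
            total += rating - 1
        else:
            # Even items are negatively worded: lower agreement is better.
            total += 5 - rating
    return total * 2.5


# Hypothetical questionnaires from two participants (not study data).
participants = [
    [4, 2, 4, 2, 4, 2, 4, 2, 4, 2],
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
]
scores = [sus_score(p) for p in participants]
print(scores, mean(scores))
```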

Relevance

This paper maps a critical design space for non-visual information access that has major implications as voice interfaces become ubiquitous. The survey’s six trade-off themes provide a comprehensive framework for understanding why blind users currently maintain two separate technology paradigms (screen readers and VAs) and what a unified system would need to offer. The concept of "incidental accessibility" — where VAs inadvertently provide more accessible content than many websites because their output is audio-first — is a powerful insight that could motivate web developers to design for audio channels as part of responsive design. For accessibility practitioners, VERSE demonstrates that voice assistants need not be limited to simple question-answering: by adding structured navigation, search verticals, and content summarisation, they can support the complex information-seeking tasks that blind users currently depend on screen readers for. The finding that participants wanted conversational understanding ("read the section about X") rather than rigid commands points toward a future where large language models could fulfil VERSE’s vision more naturally.

Tags: screen readers · voice assistant · blind · web search · information retrieval · web accessibility · smart speaker · smartwatch · multimodal interaction · VoiceOver

Standards referenced: WCAG