Speaking with My Screen Reader: Using Audio Fictions to Explore Conversational Access to Interfaces

Mahika Phutane, Crescentia Jung, Niu Chen, Shiri Azenkot · 2023 · ASSETS 2023: The 25th International ACM SIGACCESS Conference on Computers and Accessibility · doi:10.1145/3597638.3608404

Summary

This paper explores whether and how human-like conversational assistants could extend the screen reader experience for blind and low vision (BLV) users. Current screen readers provide linear, impersonal access to interfaces through keyboard-driven cursor navigation and text-to-speech output, while conversational assistants like Alexa and Siri offer targeted voice-based access but lack the granular control BLV users need for interface navigation. The researchers conducted two studies to investigate the design space between these paradigms. A formative interview study with 10 BLV participants (ages 20-60) explored current screen reader and voice assistant practices, but the researchers found it difficult to ground those conversations in a hypothetical future technology. This led the team to adopt a speculative design approach using "audio fictions", pre-recorded 2-3 minute dialogues between fictional users and CANVAS (Conversational Assistance for NonVisual Access to Screens), a hypothetical screen reader assistant. CANVAS assumed four distinct roles: Friend (funny, casual, irreverent), Butler (neutral, formal, respectful), Expert (serious, formal, detailed), and Caregiver (serious, neutral, mindful). Each role was situated in a different everyday context: email and social media at home, medical website browsing, food ordering on public transit, and health monitoring on a smartwatch outdoors. The roles were designed using the Nielsen Norman Group's Four Dimensions of Tone of Voice framework, varying humor, style, manner, and mood, and they also differed in their levels of control, detail, and personalization. The assistant voices were generated using AI text-to-speech to distinguish them from the human user voices.

Key findings

In the audio fiction study, fourteen BLV participants (ages 30-58, mean 40.1) responded strongly to the audio fictions, with most believing conversational interactions would enrich their screen reader experience. Butler (n=6) and Caregiver (n=5) were the most preferred roles for their empathetic, polite, and caring qualities, while Friend was the least preferred (n=8) for being "bossy," "overbearing," and "dictatorial," though some participants appreciated its humor. A key tension emerged between AI adaptation and screen reader customization: participants wanted conversational assistants (CAs) to learn and adapt to their behaviors and moods, but also expected extensive manual customization options similar to those in current screen readers, including speech rate, voice selection, and verbosity levels. Participants identified three distinct levels at which they need to maintain control: granular cursor movement (knowing their exact position on screen), medium-level screen representation (how content is summarized and presented), and high-level task assistance (delegating multi-step tasks). Privacy and consent were critical concerns, with participants distinguishing between tasks appropriate for personable assistants and those requiring impersonal "manual" screen reader access. Participants were particularly excited about conversational image descriptions that would let them ask follow-up questions about visual content on social media and websites, moving beyond static alt text. The concept of visual semantic awareness, in which CAs convey font sizes, colors, and layout, was valued for collaborative work with sighted colleagues.

Relevance

This research is timely as large language models and conversational AI become increasingly integrated into accessibility tools. The finding that BLV users view conversational screen readers as a natural next step supports the direction of current industry developments, while the nuanced findings about control, trust, and role preferences provide essential design guidance. The three-level control framework (cursor, representation, task) offers practitioners a useful lens for evaluating how much agency AI-powered accessibility tools should assume. The tension between adaptation and customization highlights a fundamental design challenge: users want AI that learns their preferences but also want explicit control over every setting, which suggests a hybrid approach is needed. The audio fiction methodology is itself a valuable contribution, demonstrating how speculative audio artifacts can elicit richer responses about future assistive technology than traditional interviews. For screen reader developers incorporating LLM capabilities, the paper's findings about privacy concerns, consent for autonomous actions, and the importance of maintaining a fallback to traditional screen reader navigation are directly actionable.

Tags: screen readers · conversational agents · blind and low vision · design fiction · voice assistants · nonvisual access · speculative design · anthropomorphism · assistive technology · AI accessibility

Standards referenced: WCAG