Reading Between the Guidelines: How Commercial Voice Assistant Guidelines Hinder Accessibility for Blind Users
Stacy M. Branham, Antony Rishin Mukkath Roy · 2019 · Proceedings of the 21st International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2019) · doi:10.1145/3308561.3353797
Summary
This paper presents a qualitative document review of public-facing voice assistant design guidelines published by five leading commercial vendors, Amazon (Alexa), Google (Google Assistant), Microsoft (Cortana), Apple (Siri), and Alibaba (AliGenie), to examine how these guidelines might empower or constrain blind users. The researchers collected 271 pages of guidelines from Google, 83 from Amazon, 42 from Microsoft, 15 from Apple, and 19 from Alibaba in September 2018, then conducted inductive thematic analysis, producing 190 coded instances organized into 18 open codes and five axial codes. The central finding is that all five vendors’ guidelines share a fundamental metaphor: voice assistant interactions should be modeled after human-human conversation. This metaphor is enacted through recommendations to model dialogue after human speech patterns, aim for "natural" conversation using everyday language, enact distinct personas with human-like personality traits, and draw on formal linguistic theories like the Cooperative Principle. While this conversational model may work well for many users, the paper argues it embeds assumptions about human capabilities (what constitutes acceptable conversation length, complexity, and speed) that systematically exclude blind users and other people with disabilities.
Key findings
The analysis identified five major themes across the guidelines: make conversations human (39 instances), make conversations personal (50 instances), make conversations efficient (68 instances), make conversations relational (13 instances), and give users a sense of control (20 instances). The efficiency theme, the largest of the five, proved the most problematic for accessibility. Guidelines impose restrictive limits on response length (at most two keywords per turn, three items per list, a "one-breath test" for utterances, and a 20-second maximum for reading a list) and discourage complexity, arguing that these constraints prevent cognitive overload. However, research shows that blind people outperform sighted peers on serial memory tasks and can comprehend speech at up to 25 syllables per second, more than double the maximum of 10 syllables per second for sighted people. Experienced screen reader users already find the speech rates of voice-activated personal assistants (VAPAs) frustratingly slow. A similar tension appears around timing: guidelines recommend handling complex, multi-intent commands from users but pair this with inflexible session timeouts, which disadvantage blind users and people with intellectual disabilities who need more time to formulate a complete utterance. Guidelines also discourage non-speech audio cues (earcons), which Google calls a "mismatch for conversational interfaces," even though such cues are valuable navigation tools for blind users accustomed to screen reader audio feedback. Notably, none of the five vendors' guidelines contained specific advice about making voice assistants accessible to people with disabilities.
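To make the efficiency arithmetic concrete, the following Python sketch computes how many syllables fit inside the guidelines' 20-second list cap at the two reported comprehension ceilings (10 and 25 syllables per second), and replaces a hard-coded three-item list limit with a page size passed in as a parameter. The helper names (syllable_budget, paginate) are hypothetical illustrations, not taken from the paper or any vendor SDK, and the calculation assumes speech is delivered at a constant rate.

```python
from typing import List

# Figures reported in the summary above.
SIGHTED_MAX_RATE = 10   # syllables per second sighted listeners can follow
BLIND_MAX_RATE = 25     # syllables per second reported for blind listeners
LIST_TIME_CAP_S = 20    # guideline cap on time spent reading a list


def syllable_budget(seconds: float, rate: float) -> float:
    """Syllables that fit in `seconds` of speech delivered at a constant `rate`."""
    return seconds * rate


def paginate(items: List[str], page_size: int) -> List[List[str]]:
    """Split `items` into consecutive pages of at most `page_size` entries.

    A guideline-style design would fix page_size at 3; here it is a parameter
    that a user preference could supply instead (hypothetical helper).
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]


if __name__ == "__main__":
    sighted = syllable_budget(LIST_TIME_CAP_S, SIGHTED_MAX_RATE)
    blind = syllable_budget(LIST_TIME_CAP_S, BLIND_MAX_RATE)
    print(sighted, blind, sighted / blind)  # 200 500 0.4
    print(paginate(["a", "b", "c", "d", "e", "f", "g"], 3))
```

At 10 syllables per second the 20-second cap holds about 200 syllables; at 25 syllables per second it holds about 500, so a cap sized for the sighted ceiling uses only 40% of what an experienced blind listener could follow in the same window.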
Relevance
This paper reveals a structural accessibility problem: the design guidelines that shape how voice assistants work for tens of millions of users are built on a definition of "human" conversation that excludes the capabilities and preferences of blind users. The irony is sharp: voice assistants are arguably the most promising mainstream platform for blind accessibility, yet their foundational design principles actively constrain that potential. The paper's three design recommendations are practical and actionable: (1) let users customize conversation preferences such as response length, list size, and speech rate on the fly rather than hard-coding limits; (2) allow custom voice commands ("voice macros") so users can define shorter trigger phrases for frequently used complex commands; and (3) support adjustable speech rates, as screen reader users expect. The finding that the guidelines omit disability considerations entirely, despite explicitly addressing children, older adults, and non-native speakers, highlights a systematic oversight in commercial voice platform development. For accessibility practitioners, this work demonstrates the value of examining upstream design documentation, not just end-user experiences, to identify where accessibility barriers are baked into technology at the design level.
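The three recommendations can be illustrated together in a minimal Python sketch, assuming a hypothetical per-user preferences object: list length and speech rate become adjustable at runtime, short user-defined triggers expand into longer commands (a simple voice macro), and responses are wrapped in a W3C SSML prosody element whose rate attribute carries the user's chosen speed. VoicePrefs and its methods are invented for illustration and are not any vendor's API; the default values, including the session timeout, are placeholders, and real platforms may restrict the range of prosody rates they accept.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List
from xml.sax.saxutils import escape


@dataclass
class VoicePrefs:
    """Hypothetical per-user conversation preferences (not a vendor API)."""

    max_list_items: int = 3          # guideline-style default, but user-adjustable
    speech_rate_percent: int = 100   # 100% = the platform's normal speaking rate
    session_timeout_s: float = 8.0   # placeholder value, not taken from any guideline
    macros: Dict[str, str] = field(default_factory=dict)

    def set_pref(self, name: str, value) -> None:
        """Change a scalar preference mid-session ("on the fly")."""
        allowed = {f.name for f in fields(self)} - {"macros"}
        if name not in allowed:
            raise KeyError(f"unknown preference: {name}")
        setattr(self, name, value)

    def add_macro(self, trigger: str, expansion: str) -> None:
        """Register a short user-defined trigger for a longer command."""
        self.macros[trigger.strip().lower()] = expansion

    def expand(self, utterance: str) -> str:
        """Return the full command for a known trigger, else the utterance unchanged."""
        return self.macros.get(utterance.strip().lower(), utterance)

    def to_ssml(self, text: str) -> str:
        """Wrap XML-escaped text in SSML, applying the user's speech rate."""
        return (
            f'<speak><prosody rate="{self.speech_rate_percent}%">'
            f"{escape(text)}</prosody></speak>"
        )

    def list_response(self, items: List[str]) -> str:
        """Read up to max_list_items entries and say how many were left out."""
        shown = items[: self.max_list_items]
        text = ", ".join(shown)
        remaining = len(items) - len(shown)
        if remaining > 0:
            text += f", and {remaining} more"
        return self.to_ssml(text)


if __name__ == "__main__":
    prefs = VoicePrefs()
    print(prefs.list_response(["milk", "eggs", "bread", "rice", "tea"]))
    # A screen reader user raises list size and speed mid-session.
    prefs.set_pref("max_list_items", 10)
    prefs.set_pref("speech_rate_percent", 200)
    prefs.add_macro("groceries", "read my shopping list")
    print(prefs.expand("Groceries"))
    print(prefs.list_response(["milk", "eggs", "bread", "rice", "tea"]))
```

Keeping these values in per-user state rather than in constants reflects the customization the paper argues for: the three-item default still serves users who want it, while a screen reader user can raise the list size and speed in the middle of a session.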
Tags: voice assistant · blind · visual impairment · conversational user interfaces · design guidelines · Alexa · Siri · Google Assistant · screen readers · accessibility barriers