Conversational Voice User Interfaces Supporting Individuals with Down Syndrome: A Literature Review
Franceli L. Cibrian, Concepción Valdez, Lauren Min, Vivian Genaro Motti · 2025 · ACM Transactions on Accessible Computing · doi:10.1145/3715160
Summary
This systematic literature review examines 43 papers published between 1998 and 2023 on the use of Conversational Voice User Interfaces (CVUIs) to support individuals with Down syndrome. Using the PRISMA methodology, the authors searched the ACM Digital Library, IEEE Xplore, PubMed, Scopus, and Web of Science to identify research at the intersection of voice-based technology and Down syndrome, making this the first review to comprehensively map this specific domain. The review addresses three research questions: what interfaces have been designed, what challenges exist with speech-to-text technologies, and what evaluations have been conducted.

Down syndrome, caused by an extra copy of chromosome 21 and occurring in roughly 1 in 1,000 to 1,100 births, affects speech production in distinctive ways. Individuals with Down syndrome experience differences in voice (pitch, breathiness, quality), speech articulation, fluency (higher rates of stuttering and cluttering), and prosody. All of these affect their ability to use voice assistants trained on neurotypical speech patterns.

The research landscape reveals that 72% of studies focused exclusively on children, with only 7% including both children and adults. Most projects aimed at skill development (35%), particularly speech and communication skills, followed by voice recognition (21%) and therapy support (21%). The review categorizes CVUIs by form factor: displays with or without speakers (the most common), speaker-only devices like Amazon Echo, and socially assistive robots. Each form factor presents different affordances for engagement and interaction design.
Key findings
Machine learning plays a central role, appearing in 62% of reviewed papers, primarily for developing speech recognition systems (57%), creating speech datasets (52%), and supporting diagnosis or assessment (30%). However, a critical gap exists: most speech datasets for Down syndrome involve small sample sizes, limiting the generalizability and accuracy of trained models.

Current commercial speech recognition achieves only around 60% accuracy for individuals with Down syndrome, compared to much higher rates for neurotypical speech. Some reported results fall well below that figure: studies found that Google achieved 36.7% and Windows only 1.7% accuracy when transcribing speech from adults with Down syndrome, while familiar human listeners understood 77.5% of the same utterances.

User evaluations (55% of papers) primarily assessed communication skills (25%), daily life assistance (18%), and social skills (13%). Socially Assistive Robots (SARs) showed promise in therapeutic contexts, improving engagement in therapy sessions and facilitating therapist-child interactions. Display-based applications like serious games helped reduce frustration compared to conventional therapies while supporting literacy and communication skills.

The review identifies three critical design considerations: form factor (balancing cost, engagement, and accessibility), dialogue design (requiring participatory methods with Down syndrome users to determine appropriate vocabulary and formality), and target outcomes (currently skewed toward children's skill development rather than adult independence). A significant gap exists in research supporting adults with Down syndrome in daily activities.
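This summary does not say how the cited studies scored recognition accuracy. As an illustration only, one common convention is word accuracy, defined as 1 minus the word error rate (WER), where WER is the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The Python sketch below assumes that convention. The function names and the example utterances are hypothetical and are not taken from the reviewed papers.

```python
def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance between two lists of words."""
    # prev[j] holds the distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from output
                curr[j - 1] + 1,     # extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_accuracy(reference_text, hypothesis_text):
    """Return 1 - WER, clipped at 0 (assumed metric, see note above)."""
    ref = reference_text.lower().split()
    hyp = hypothesis_text.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    wer = word_errors(ref, hyp) / len(ref)
    return max(0.0, 1.0 - wer)


if __name__ == "__main__":
    # Hypothetical utterance and recognizer output, for illustration only.
    reference = "turn on the kitchen light"
    recognized = "turn on a chicken light"
    print(f"word accuracy: {word_accuracy(reference, recognized):.2f}")
```

Under this definition, a recognizer that substitutes two of five words scores 0.60 on that utterance. Studies may instead score whole utterances or intelligibility ratings, so figures from different papers are not necessarily comparable.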
Relevance
This review provides essential guidance for developers and researchers working on voice interface accessibility. The finding that commercial speech recognition performs dramatically worse for Down syndrome users, sometimes achieving near-zero accuracy, underscores that voice assistants are not accessible by default. Organizations deploying voice interfaces must recognize this limitation and consider multimodal alternatives or specialized recognition models.

The emphasis on participatory design is crucial: dialogues for CVUIs cannot be designed assuming neurotypical communication patterns. Vocabulary, formality, conversation structure, and response timing must be co-designed with individuals with Down syndrome. The research also highlights that physical form factors matter: robots and displays may engage users differently than speaker-only devices, and design choices should match the target users and use contexts.

For practitioners, the review reveals an underserved population: adults with Down syndrome who could benefit from voice assistants for daily living but are largely excluded from both research and product design. As CVUIs become increasingly integrated into smart homes and mobile devices, ensuring they work for neurodiverse users requires investment in diverse speech datasets, improved recognition algorithms, and inclusive design processes. The 60% recognition accuracy ceiling represents a significant barrier that accessibility advocates should prioritize.
Tags: voice user interfaces · Down syndrome · speech recognition · intellectual disability · conversational AI · systematic review · machine learning · neurodevelopmental disorders