Overcoming Speech Barriers: Non-Verbal Voice Cue Interaction Technique for Enhancing Smart Voice Assistant Accessibility for Individuals with Dysarthria
Aisha Jaddoh, Fernando Loizides, Khadijah Alreafai, Omer Rana · 2025 · ACM Transactions on Accessible Computing · doi:10.1145/3726874
Summary
This study presents DARIA (DysARthrIA), a novel smart voice assistant interaction system that uses non-verbal voice cues instead of spoken words and sentences. The system addresses a critical accessibility gap: commercial voice assistants like Alexa and Google Home rely on automatic speech recognition (ASR) trained on typical speech, and reach word error rates as high as 78% for users with dysarthria, compared with 9% for typical speakers. Dysarthria, a motor speech disorder caused by neurological conditions such as stroke, ALS, cerebral palsy, and traumatic brain injury, results in slurred speech, poor articulation, and reduced intelligibility that worsens with longer utterances.

DARIA maps five non-verbal sounds to common smart home commands: /a/ for lights, /i/ for news, /u/ for weather, humming for music, and /ŋ/ (the "ng" sound) for phone calls. These sounds were selected based on user preferences, ease of production for dysarthric speakers, and acoustic discriminability (the vowels sit at the corners of the vowel chart, which minimizes overlap). The system runs on a Raspberry Pi and uses edge computing for privacy. Its classification model was trained on dysarthric speech recordings from the TORGO database and on custom recordings from nine individuals with dysarthria.

Two between-subjects studies were conducted with 20 participants in total (14 male, 6 female; 10 per study) at the Sultan Bin Abdulaziz Humanitarian City rehabilitation hospital in Saudi Arabia. Participants represented mild, moderate, and severe dysarthria from various etiologies. Study 1 compared DARIA with pre-defined sound-to-command mappings against Alexa. Study 2 tested a customization option in which participants created their own mappings.
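The pre-defined mapping described above amounts to a small lookup from a classified sound to a command. The following is a minimal sketch of that idea, not DARIA's actual code: the label strings, the `dispatch` function, and the 0.6 confidence threshold are all hypothetical names and values chosen for illustration.

```python
from typing import Optional

# Pre-defined sound-to-command mapping as listed in the summary.
# The label strings are illustrative assumptions, not DARIA's identifiers.
SOUND_TO_COMMAND = {
    "a": "lights",
    "i": "news",
    "u": "weather",
    "hum": "music",
    "ng": "phone call",
}


def dispatch(label: str, confidence: float, threshold: float = 0.6) -> Optional[str]:
    """Return the command for a classified sound, or None if it is ignored.

    `threshold` is a hypothetical cut-off for rejecting low-confidence
    classifications; the summary does not state one.
    """
    if confidence < threshold:
        return None
    # Unknown labels also map to None rather than raising.
    return SOUND_TO_COMMAND.get(label)


print(dispatch("a", 0.9))
print(dispatch("hum", 0.4))
```

Keeping the vocabulary to five short sounds means the lookup stays trivial; the hard part, classifying the sound itself, would sit in front of this function.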
Key findings
**Performance comparison with Alexa**: DARIA's success rates were more consistent across dysarthria severity levels, and higher than Alexa's for moderate and severe dysarthria. For the pre-defined mapping group, DARIA success rates were 72% (mild), 64% (moderate), and 66% (severe), while Alexa dropped from 82.67% (mild) to 33.6% (moderate) to just 24% (severe). Statistical analysis showed significant differences in 4 of 5 commands tested. DARIA's performance remained relatively stable as severity increased, while Alexa's declined sharply.

**Usability**: DARIA with pre-defined mappings achieved an "excellent" SUS score of 85.75, compared to Alexa's 71.5. The customization group scored 80.6 ("good"), with no significant difference from Alexa's 81.2 in that comparison.

**Memorability**: Pre-defined mappings showed 80% recall at 24 hours post-use, while recall for custom mappings was only 28%. Participants in the custom group reported choosing mappings at random, without enough time to create meaningful associations, which made recall difficult.

**User experience (SASSI)**: DARIA scored higher on likability (significant difference, large effect size), response accuracy, and speed. Both systems showed equivalently low cognitive demand. Seven of 10 participants in Study 1 and 6 of 10 in Study 2 preferred DARIA over Alexa.

**Customization paradox**: Although 4 of 10 participants expressed a desire for customization, the pre-defined system outperformed the custom option on usability and memorability, and required less physical effort. Customization required additional setup steps that reduced independence.
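To make the stability claim concrete, the reported success rates can be reduced to a single spread per system: the gap between its best and worst severity level. This is a small illustrative calculation on the figures quoted above, not an analysis from the paper; the `spread` helper is a hypothetical name.

```python
# Success rates (%) for the pre-defined mapping group, as reported above.
SUCCESS = {
    "DARIA": {"mild": 72.0, "moderate": 64.0, "severe": 66.0},
    "Alexa": {"mild": 82.67, "moderate": 33.6, "severe": 24.0},
}


def spread(rates: dict) -> float:
    """Gap between the best and worst severity level, in percentage points."""
    return round(max(rates.values()) - min(rates.values()), 2)


for system, rates in SUCCESS.items():
    print(f"{system}: spread = {spread(rates)} percentage points")
```

On these numbers DARIA varies by 8 percentage points across severity levels, while Alexa varies by 58.67.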
Relevance
This research demonstrates ability-based design principles by leveraging retained vocalization abilities rather than expecting users to adapt to systems designed for typical speech. The approach aligns with the Accountability principle: adapting the system to the user rather than vice versa.

**Design recommendations for non-verbal voice cue systems**:

1. **Meaningful mapping**: Build on natural associations between sounds and actions to enhance memorability and ease of use
2. **Guidance for customization**: If offering customization, provide clear instructions and emphasize the importance of meaningful mappings
3. **Simplicity in design**: Prefer pre-mapped, plug-and-play solutions that work immediately without complex setup
4. **Balance customization with usability**: Simplify customization processes; provide adjustable defaults rather than blank-slate configuration
5. **Minimize physical effort**: Streamline configuration to avoid multiple steps that increase fatigue

For practitioners developing voice interfaces, this study highlights that shorter utterances improve recognition for dysarthric users, edge computing can address privacy concerns with voice data, and pre-defined "good enough" solutions may outperform flexible customization options that impose cognitive and physical burdens. The tradeoff between vowel sound vocabulary size and memorability is also important: more sounds enable more commands but increase cognitive load.
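Recommendation 4 ("adjustable defaults rather than blank-slate configuration") can be sketched as a default mapping that works immediately and accepts optional per-user overrides. This is one possible reading under stated assumptions, not the study's implementation: `DEFAULT_MAPPING`, `build_mapping`, and the label strings are hypothetical, and restricting overrides to the five known sounds is an assumption made here on the premise that a classifier only recognizes the sounds it was trained on.

```python
from typing import Dict, Optional

# Defaults mirror the pre-defined mapping from the summary.
# Label strings are illustrative assumptions.
DEFAULT_MAPPING: Dict[str, str] = {
    "a": "lights",
    "i": "news",
    "u": "weather",
    "hum": "music",
    "ng": "phone call",
}


def build_mapping(overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Start from the defaults and apply only the changes the user asked for."""
    mapping = dict(DEFAULT_MAPPING)  # copy, so the defaults are never mutated
    for sound, command in (overrides or {}).items():
        if sound not in DEFAULT_MAPPING:
            # Assumption: only sounds the classifier already knows can be remapped.
            raise ValueError(f"unknown sound: {sound!r}")
        mapping[sound] = command
    return mapping


# No setup needed: the defaults work as-is.
print(build_mapping())
# A user changes a single mapping and keeps the rest.
print(build_mapping({"hum": "phone call"}))
```

With this shape, a user who changes nothing performs no configuration steps at all, which speaks to the physical-effort and independence concerns raised above.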
Tags: dysarthria · speech impairment · voice assistants · smart speakers · non-verbal interaction · assistive technology · speech recognition · ability-based design