Speech Interaction with Personal Assistive Robots Supporting Aging at Home for Individuals with Alzheimer's Disease

Frank Rudzicz, Rosalie Wang, Momotaz Begum, Alex Mihailidis · 2015 · ACM Transactions on Accessible Computing (TACCESS) · doi:10.1145/2744206

Summary

This study examines speech-based interaction between older adults with Alzheimer's disease (AD) and a mobile assistive robot called ED, designed to help with activities of daily living. The research addresses a critical healthcare challenge: many nations lack the capacity to support rapidly aging populations, and moving older adults out of their homes into long-term care is neither desirable nor sustainable. Ten older adults with AD (ages 59-88, Mini-Mental State Examination (MMSE) scores 9-25) and their caregivers participated in the study at the Toronto Rehabilitation Institute's HomeLab, a fully functional simulated apartment. ED, a 102 cm tall robot with an LCD face and text-to-speech output, guided participants through hand-washing and tea-making tasks while a teleoperator controlled its movement and monitored for distress. The study analyzed two aspects: communication patterns, using a 12-category taxonomy of "trouble-indicating behaviors" (TIBs) that signal confusion or difficulty, and automatic speech recognition (ASR) performance under realistic home conditions. The researchers intentionally used modest microphone arrangements (environmental Kinect microphones rather than expensive arrays) to establish a baseline for what could realistically be deployed in homes without specialized equipment.

Key findings

The most striking finding was that over 40% of trouble-indicating behaviors during interaction with the robot were "lack of uptake": participants simply ignored the robot. Paradoxically, interactions with the robot were nonetheless more likely to proceed without trouble: 18.1% showed no TIB, compared with only 6.7% of interactions with human conversants. This suggests that the structured, task-oriented nature of robot prompts may reduce certain types of confusion.

Speech recognition proved extremely challenging. Signal-to-noise ratios ranged from -3.42 dB to 8.14 dB, far below the 40 dB typical for clean speech. Baseline ASR accuracy was only 13.7-25.1% during sit-down interviews and dropped to 5.8-19.2% during household tasks. Applying noise reduction (LSAE) and adapting the acoustic and language models using dementia speech corpora (Carolina Conversations and DementiaBank) improved accuracy by approximately 9 percentage points (absolute) for speakers with AD. ASR accuracy tended to improve with higher MMSE cognitive scores, though this trend was not statistically significant. Critically, accuracy was significantly higher during interviews than during tasks, likely due to microphone proximity and competing noise from activities like running water.

Communication strategies validated by the research include: a slow speech rate, simple sentences with reduced syntactic complexity, one instruction at a time, closed-ended yes/no questions, minimal use of pronouns, and using the person's name to attract attention.
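To put the reported signal-to-noise ratios in perspective, the short Python sketch below converts decibel values into linear power ratios. It is only an illustration using the standard definition SNR(dB) = 10 * log10(P_signal / P_noise); the summary does not say how the study estimated SNR, and the function names `snr_db` and `power_ratio` are hypothetical helpers introduced here, not part of the original work.

```python
import math


def snr_db(signal_power: float, noise_power: float) -> float:
    """Standard definition: SNR in dB = 10 * log10(P_signal / P_noise)."""
    if signal_power <= 0 or noise_power <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(signal_power / noise_power)


def power_ratio(snr: float) -> float:
    """Inverse mapping: linear signal-to-noise power ratio for an SNR in dB."""
    return 10.0 ** (snr / 10.0)


# The dB values are the ones quoted in the summary; the labels are ours.
reported = [
    ("lowest reported", -3.42),
    ("highest reported", 8.14),
    ("clean-speech reference", 40.0),
]

for label, value in reported:
    print(f"{label}: {value:+.2f} dB -> power ratio {power_ratio(value):.2f}")
```

Under this definition, a negative value such as -3.42 dB means the noise carries more power than the speech (a ratio below 1), while 40 dB corresponds to speech power 10,000 times that of the noise. Even the best recording in the study, at 8.14 dB, has a ratio of only about 6.5.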

Relevance

This research provides essential baseline data for anyone developing voice-controlled assistive technology for people with dementia, a population that will grow dramatically as societies age and healthcare systems face the resulting demographic shift. The findings highlight that standard ASR trained on typical speech performs very poorly with this population, and that real-world home environments introduce acoustic challenges far beyond laboratory conditions. For accessibility practitioners, the key insights are: (1) speech recognition for people with AD requires specialized acoustic and language model adaptation, not just off-the-shelf systems; (2) the most common response to confusion is simply ignoring the system, so designs need strategies to re-engage users; (3) multimodal interaction combining speech with nonverbal cues (gesture, gaze, facial expression) may be necessary, since human caregivers use nonverbal strategies about one-third as often as verbal ones. The research also validates that physically embodied robots may be more acceptable than embedded smart home systems, and that step-by-step prompting approaches developed for smart homes can transfer to mobile robots. For organizations considering voice interfaces for older adults with cognitive impairment, this study demonstrates both the potential and the significant technical barriers that remain.

Tags: Alzheimer's disease · dementia · assistive robotics · speech recognition · aging in place · smart home · human-robot interaction · activities of daily living · older adults · cognitive accessibility