Evaluation of a Context-Aware Voice Interface for Ambient Assisted Living: Qualitative User Study vs. Quantitative System Evaluation

Michel Vacher, Sybille Caffiau, François Portet, Brigitte Meillon, Camille Roux, Elena Eluj, Benjamin Lecouteux, Pedro Chahuara · 2015 · ACM Transactions on Accessible Computing (TACCESS) · doi:10.1145/2738047

Summary

This study evaluates Sweet-Home, a voice-controlled smart home system designed to help older adults and people with visual impairments maintain independence at home. The research was conducted in a realistic 30-square-meter smart apartment (Domus) equipped with 150 sensors, 7 microphones distributed throughout the space, and controllable home automation devices including lights, blinds, doors, and a music player. Eleven participants, six older women (mean age 81.2) and five people with visual impairments (mean age 62.2), completed scripted scenarios built around daily activities such as waking up, preparing breakfast, and receiving visitors, using voice commands to control the environment throughout. The system combines distant speech recognition (no wearable microphone required), voice activity detection to distinguish speech from environmental sounds, and Markov Logic Networks to make context-aware decisions about which device a command targets. For example, saying "close" near the window closes the blinds, while the same command near the door locks it. The researchers compared quantitative ASR performance metrics against qualitative observations from video analysis and participant interviews, revealing a substantial gap between technical measurements and actual user experience.
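The location-dependent resolution of "close" described above can be illustrated with a deliberately simplified sketch. The study used Markov Logic Networks for this decision; the fixed lookup table below is only a stand-in for that inference, and the names `RULES` and `resolve_target` as well as the location labels are hypothetical, not taken from the paper.

```python
from typing import Optional, Tuple

# Hypothetical rule table: (spoken command, speaker location) -> (device, action).
# Assumption: the real system infers the target with a Markov Logic Network over
# sensor evidence; a static lookup is used here only to show the idea that the
# same word maps to different devices depending on context.
RULES = {
    ("close", "near_window"): ("blinds", "close"),
    ("close", "near_door"): ("door", "lock"),
}


def resolve_target(command: str, location: str) -> Optional[Tuple[str, str]]:
    """Return (device, action) for a command spoken at a location, or None if no rule applies."""
    return RULES.get((command.strip().lower(), location))


if __name__ == "__main__":
    for place in ("near_window", "near_door", "in_kitchen"):
        print(place, "->", resolve_target("close", place))
```

Returning `None` for an unknown context, rather than guessing a device, mirrors the conservative behavior reported in the study, where the system avoided acting when it was unsure.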

Key findings

The ASR reached a word error rate of 43.23% on voice commands, with 59% recall but perfect precision, meaning that the system missed many commands but produced no false activations. Age correlated with the ratio of missed commands (R²=0.6145), suggesting that older users had more difficulty being recognized. The context-aware decision system achieved 98.5% accuracy in selecting the correct target device. User behavior diverged markedly between groups. Older participants personified the system: they added politeness markers like "please" and "thank you," looked around for an interlocutor, and deviated frequently from the command grammar. Visually impaired participants treated it as a tool, following the grammar more strictly and showing greater tolerance for system failures. Despite the researchers' privacy concerns, no participant considered the microphones problematic; instead, they valued the reduced fear of falling and of losing autonomy. The main complaints were slow response times (about 1.5 times the utterance duration) and the lack of feedback confirming that a command had been heard.
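For readers unfamiliar with the metrics above, the following minimal sketch shows how word error rate, precision, and recall are conventionally computed. It is not the authors' evaluation code. The function names are hypothetical, the example utterance is an invented English phrase (the study's commands followed its own grammar), and the counts in the usage lines are illustrative values chosen only to reproduce a 59% recall with perfect precision.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, insertions, deletions) divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> tuple:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN). Empty denominators yield 0.0."""
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Invented utterance: one substituted word out of three.
    print(word_error_rate("close the blinds", "close the blind"))
    # Illustrative counts only: 59 commands detected, 41 missed, no false activations.
    print(precision_recall(true_pos=59, false_pos=0, false_neg=41))
```

With zero false positives, precision stays at 1.0 however many commands are missed, which is why a system can pair a high word error rate with no false activations.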

Relevance

This research demonstrates both the promise and the current limitations of voice-controlled assistive environments. The finding that technical metrics like WER poorly predict user satisfaction is crucial for accessibility practitioners: a system with a 43% word error rate was still accepted because it avoided false activations and provided meaningful independence benefits. The contrast between user groups highlights how different disabilities shape technology expectations and interaction patterns. For smart home and voice interface developers, the study underscores the need for flexible grammar recognition (users will say "please" regardless of training), clear feedback mechanisms, and faster response times. The age-correlated recognition difficulties point to a possible age bias in ASR training data. The finding that this population weighed privacy concerns less heavily than autonomy challenges assumptions about the acceptability of monitoring in assisted living contexts.

Tags: ambient assisted living · voice interface · smart home · aging in place · visual impairment · speech recognition · context-aware computing · older adults