Making Emergency Calls More Accessible to Older Adults Through a Hands-free Speech Interface in the House

Michel Vacher, Frédéric Aman, Solange Rossato, François Portet, Benjamin Lecouteux · 2019 · ACM Transactions on Accessible Computing (TACCESS) · doi:10.1145/3310132

Summary

This paper presents CirdoX, a hands-free voice user interface designed to detect calls for help from older adults living alone at home. Wearable personal emergency response (PER) systems, typically pendant buttons worn around the neck or on the wrist, are the mainstream way for frail individuals to call for help, but research consistently shows they are poorly adopted: older adults forget to wear them, find them stigmatizing, cannot reach them after a fall, or simply choose not to use them. A voice-based system avoids these problems because it requires no physical proximity or wearable device, accepts natural language ("Help me!" or "I fell"), and is always available through ceiling-mounted microphones. The system faces several compounding technical challenges: it must process distant speech (2.5 meters or more from the microphone) rather than close-talk input, handle voices altered by pain, fear, or panic, recognize older adults' voices, which differ acoustically from typical ASR training populations, distinguish emergency calls from ordinary conversation, and operate in real time with a very low false alarm rate while respecting privacy. Because it is ethically and practically difficult to have older adults enact falls, the study used an unusual methodology: 13 younger participants wore old-age simulators that hampered their mobility, vision, and hearing while performing emergency scenarios (slips, stumbles, falls, inability to rise from a sofa) in a living lab, alongside 4 older adults aged 61 to 83.

Key findings

The online (real-time) system achieved a Call Error Rate (CER) of 27% with the adapted acoustic model, meaning 73% of emergency calls were correctly detected. An improved offline system using Subspace GMM acoustic models reduced CER to 24% overall (23% for younger participants with simulators, 28% for older adults). A critical problem was that 26% of emergency calls were filtered out before reaching the ASR stage because the speech/non-speech discrimination module classified them as non-speech, particularly when the call for help overlapped with noise from the fall itself (only 43.2% of noisy speech events were correctly classified as speech, versus 86.3% for clean speech). Very short emergency utterances such as "aïe" (ouch) or "oh là" proved particularly challenging for the language model. Word error rate (WER) on emergency calls fell from 80.5% with the generic model to 49.3% with the adapted model, but remained high because of distant speech conditions and emotional vocal qualities. The false alarm rate was very low (0.66%), which matters for real-world deployment: a system that frequently triggers false alarms would be quickly abandoned. The adapted acoustic model, obtained by Maximum Likelihood Linear Regression (MLLR) adaptation with just 9 minutes of older adult speech from the Sweet-Home corpus, produced the largest performance gains.
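To make the two error metrics quoted above concrete, here is a minimal Python sketch. It is not the authors' evaluation code: the function names are hypothetical, word error rate (WER) is computed as the standard word-level edit distance divided by reference length, and the Call Error Rate is assumed to be the share of emergency calls that were not detected, which matches the summary's reading of "27% CER means 73% detected".

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix processed so far
    # and hypothesis[:j]
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(reference)


def call_error_rate(total_calls: int, detected_calls: int) -> float:
    """Assumed definition: fraction of emergency calls that were missed."""
    if total_calls <= 0:
        raise ValueError("total_calls must be positive")
    return (total_calls - detected_calls) / total_calls


def relative_reduction(before: float, after: float) -> float:
    """Relative improvement when an error rate drops from `before` to `after`."""
    return (before - after) / before


if __name__ == "__main__":
    # Illustrative utterance, not taken from the paper's corpus.
    ref = "help me please now".split()
    hyp = "help please now".split()
    print(f"WER: {word_error_rate(ref, hyp):.2f}")

    # Hypothetical counts chosen only to reproduce the reported 27% CER.
    print(f"CER: {call_error_rate(100, 73):.2f}")

    # WER figures reported in the summary: 80.5% generic, 49.3% adapted.
    print(f"Relative WER reduction: {relative_reduction(80.5, 49.3):.1%}")
```

Under these assumptions, the drop from 80.5% to 49.3% WER is a relative reduction of roughly 39%, even though about half of the words in emergency calls are still misrecognized. That gap between WER and CER is plausible because a call can be detected correctly even when its transcription is only partly right.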

Relevance

This research addresses a life-or-death accessibility problem: falls are the leading cause of dependency and a major cause of death among older adults, and "long-lies" (remaining on the floor for over an hour after a fall) dramatically increase mortality risk. Current PER technology fails precisely when it is most needed because it requires the person to be wearing a device and able to press a button. For ambient assisted living developers, the paper provides a detailed technical blueprint for voice-based emergency detection, including the critical finding that adapting acoustic models to older voices with very small amounts of data (9 minutes) yields substantial improvements. The 24% call error rate, while not yet sufficient for standalone deployment, indicates that the approach is viable and identifies specific areas for improvement, particularly noise robustness during falls and recognition of very short utterances. For the aging-in-place technology community, this work demonstrates a privacy-respecting alternative to camera-based monitoring: the system listens only for emergency calls and ignores all other speech, addressing the surveillance concerns raised elsewhere in our collection.

Tags: speech recognition · aging · emergency response · ambient assisted living · smart home · voice user interface · older adults · hands-free interaction · fall detection