SoundWatch: Exploring Smartwatch-based Deep Learning Approaches to Support Sound Awareness for Deaf and Hard of Hearing Users

Dhruv Jain, Hung Ngo, Pratyush Patel, Steven Goodman, Leah Findlater, Jon Froehlich · 2020 · Proceedings of the 22nd International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '20) · doi:10.1145/3373625.3416991

Summary

This paper presents SoundWatch, a smartwatch-based sound awareness system that uses deep learning to classify environmental sounds in real time and provide visual and haptic notifications to deaf and hard of hearing (DHH) users. The work is motivated by prior survey findings that smartwatches are the most preferred device for non-speech sound awareness among DHH people, owing to their glanceable, always-available, private, and socially acceptable form factor. The authors conducted two complementary studies: a quantitative evaluation comparing four low-resource deep learning models (MobileNet, Inception, ResNet-lite, and VGG-lite) across four device architectures (watch-only, watch+phone, watch+cloud, and watch+phone+cloud), followed by a qualitative lab evaluation with eight DHH participants. The system classifies 20 common environmental sounds that prior research identified as priorities for DHH users, including fire/smoke alarms, alarm clocks, door knocks, doorbells, speech, dog barking, car horns, and sirens. SoundWatch processes audio through a privacy-preserving pipeline: sound is recorded on the watch, features are computed as non-reconstructable mel-spectrograms (so that conversational content cannot be retrieved), and classification runs on the watch, phone, or cloud depending on the architecture. The app displays sound identity, loudness, classification confidence, and time of occurrence. It also offers customizable mute options, and a companion phone app lets users select which sounds to monitor.
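
The feature step in the pipeline above can be illustrated with a minimal sketch of a log-mel spectrogram written with NumPy only. This is not the authors' implementation: the function names and every parameter (16 kHz sample rate, 25 ms window, 10 ms hop, 64 mel bands) are assumptions chosen for illustration. The sketch does show why such features are lossy: the phase of each frame is discarded and neighboring frequency bins are pooled into a small number of mel bands, so the original waveform cannot be recovered exactly from the output alone.

```python
import numpy as np

def hz_to_mel(hz):
    # Common HTK-style mel formula.
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sample_rate, n_fft, n_mels):
    # n_mels triangular filters spaced evenly on the mel scale, 0 Hz to Nyquist.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def log_mel_spectrogram(audio, sample_rate=16000, n_fft=400, hop=160, n_mels=64):
    # All defaults are illustrative assumptions, not values from the paper.
    n_frames = 1 + (len(audio) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = audio[idx] * np.hanning(n_fft)            # windowed frames
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # phase is discarded here
    mel = power @ mel_filterbank(sample_rate, n_fft, n_mels).T  # bins pooled into bands
    return np.log(mel + 1e-6)                          # small offset avoids log(0)

# One second of a synthetic 440 Hz tone stands in for recorded audio.
t = np.arange(16000) / 16000.0
features = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t))
```

In this sketch only the `features` array (frames by mel bands) would be passed to a classifier or sent off the watch; the raw `audio` samples stay local.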

Key findings

VGG-lite achieved the best classification accuracy at 81.2% (SD=5.8%) across all 20 sound classes and 97.6% (SD=1.7%) for the three highest-priority sounds (fire alarm, alarm clock, door knock), performing comparably to the state of the art for non-portable devices while using one-third the memory. However, there was a strict accuracy-latency tradeoff: VGG-lite averaged 3,397 ms latency on the watch versus 256 ms for MobileNet, though MobileNet achieved only 26.5% accuracy. The watch+phone architecture provided the best overall balance of CPU usage (22.3%), memory, battery life (15.2 hours to 30% drain vs. 3.3 hours for watch-only), and end-to-end latency (~2.2 seconds). In the qualitative study, all eight DHH participants found SoundWatch useful across home, office, and outdoor contexts, but misclassifications and latency were significant concerns. Participants' accuracy-latency preferences depended on sound urgency: for urgent sounds like fire alarms and car horns, they wanted minimum delay even at the cost of accuracy, while for non-urgent sounds like speech, they preferred maximum accuracy to avoid annoying false alerts. Privacy was a major factor in architecture preference: watch+phone was most preferred because it requires no internet connection, keeping all sound data local. Participants customized enabled sounds differently per location, and five of eight participants reduced outdoor sound categories to compensate for higher background noise interference.
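
The urgency-dependent preference described above can be sketched as a simple selection rule: pick the most accurate model whose latency fits a per-sound budget. The accuracy and on-watch latency figures are the ones reported above; the per-sound latency budgets, the default budget, and all names (`ModelProfile`, `pick_model`) are hypothetical and are not part of SoundWatch. The sketch also sidesteps how a running system would know a sound's urgency before classifying it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelProfile:
    name: str
    accuracy: float     # fraction correct over all 20 classes, as reported above
    latency_ms: float   # average on-watch latency, as reported above

PROFILES = [
    ModelProfile("VGG-lite", 0.812, 3397.0),
    ModelProfile("MobileNet", 0.265, 256.0),
]

# Hypothetical latency budgets: tight for urgent sounds, loose for non-urgent ones.
LATENCY_BUDGET_MS = {"fire alarm": 500.0, "car horn": 500.0, "speech": 5000.0}

def pick_model(sound, profiles=PROFILES, budgets=LATENCY_BUDGET_MS,
               default_budget_ms=5000.0):
    """Return the most accurate profile that fits the sound's latency budget."""
    budget = budgets.get(sound, default_budget_ms)
    within = [p for p in profiles if p.latency_ms <= budget]
    if not within:
        # Nothing fits the budget: fall back to the fastest available model.
        return min(profiles, key=lambda p: p.latency_ms)
    return max(within, key=lambda p: p.accuracy)

urgent_choice = pick_model("fire alarm")
relaxed_choice = pick_model("speech")
```

With these assumed budgets, an urgent sound maps to the fast, less accurate model and a non-urgent sound maps to the slower, more accurate one, which mirrors the preference participants described.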

Relevance

SoundWatch represents an important advance in making environmental sound awareness practical and wearable for DHH users in daily life. The research is valuable for accessibility practitioners for several reasons. First, it demonstrates that useful sound classification can run on consumer smartwatches with acceptable performance, moving beyond lab prototypes toward real-world deployment. Second, the nuanced findings about accuracy-latency-privacy tradeoffs provide a practical framework for designing any always-on assistive sensing system: users do not want a single setting but context-dependent customization. The privacy implications of always-listening devices are particularly relevant: participants wanted control over what sound data leaves their device, with preferences varying by context (more private at home/office, more tolerant of cloud processing outdoors). Third, the finding that participants actively customized which sounds to monitor per location challenges one-size-fits-all approaches to assistive technology and supports ability-based design that adapts to individual preferences and contexts. Finally, because SoundWatch is open source, it enables further research and development in this space.

Tags: deaf accessibility · hard of hearing · sound awareness · deep learning · wearable technology · smartwatch · machine learning · sound classification