AdaptiveSound: An Interactive Feedback-Loop System to Improve Sound Recognition for Deaf and Hard of Hearing Users

Hang Do, Quan Dang, Jeremy Zhengqi Huang, Dhruv Jain · 2023 · ASSETS '23: Proceedings of the 25th International ACM SIGACCESS Conference on Computers and Accessibility · doi:10.1145/3597638.3608390

Summary

This paper introduces AdaptiveSound, the first feedback-loop sound recognition system designed for deaf and hard of hearing (DHH) users. Sound recognition tools help DHH people become aware of environmental sounds ranging from safety-critical alerts like fire alarms and sirens to everyday information like door knocks, microwave beeps, and water running. However, existing systems use models pre-trained on generic sound datasets that achieve only 60-80% accuracy when deployed in real-world environments, because sounds vary dramatically across contexts — a door knock sounds different on wood versus metal, and background noise levels differ between homes, offices, and outdoor spaces. AdaptiveSound addresses this by allowing users to provide corrective feedback to the model through a simple thumbs up/thumbs down interface on an Android app. When the model misclassifies a sound, the user can indicate the correct label and rate their confidence on a 1-5 scale. The system uses incremental reinforcement learning (IRL) to continuously fine-tune the model based on this feedback without requiring full retraining. The technical architecture addresses three practical challenges: class imbalance from frequently occurring sounds (mitigated using conditional GANs to generate synthetic training data for underrepresented classes), unreliable user feedback (handled through a tunable surety parameter that adjusts the learning rate), and memory constraints on mobile devices (resolved through small-batch incremental training with data deletion after training). The system uses a lightweight MobileNetV2 CNN architecture (about 8MB), processes 3-second audio windows via mel spectrograms, and runs entirely on-device using TensorFlow Lite for privacy.
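The confidence-scaled update described above can be illustrated with a small sketch. This is not the authors' implementation: the summary says only that a tunable surety parameter adjusts the learning rate, so the linear mapping from the 1-5 rating to a step size, the softmax-linear classifier head, and the names `feedback_step`, `BASE_LR`, and `EMBED_DIM` are all assumptions made for illustration.

```python
import numpy as np

NUM_CLASSES = 22   # number of sound categories reported in the summary
EMBED_DIM = 8      # hypothetical feature size, chosen only for this demo
BASE_LR = 0.1      # hypothetical base learning rate


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def feedback_step(W, features, correct_label, confidence, base_lr=BASE_LR):
    """Apply one incremental update from a single piece of user feedback.

    `confidence` is the user's 1-5 rating. Scaling the learning rate
    linearly by confidence / 5 is an assumption, not the paper's formula.
    """
    if not 1 <= confidence <= 5:
        raise ValueError("confidence must be between 1 and 5")
    lr = base_lr * confidence / 5.0
    probs = softmax(W @ features)
    target = np.zeros(W.shape[0])
    target[correct_label] = 1.0
    # Gradient of cross-entropy loss for a softmax-linear classifier.
    grad = np.outer(probs - target, features)
    return W - lr * grad


rng = np.random.default_rng(0)
W0 = rng.normal(scale=0.01, size=(NUM_CLASSES, EMBED_DIM))
x = rng.normal(size=EMBED_DIM)   # stand-in for features of one 3-second window

W_low = feedback_step(W0, x, correct_label=3, confidence=1)
W_high = feedback_step(W0, x, correct_label=3, confidence=5)
```

Under these assumptions, a correction rated 5 moves the weights five times as far as the same correction rated 1, so uncertain feedback has less influence on the model.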

Key findings

Quantitative experiments showed that after approximately 100 feedback steps (about 10 minutes of user effort), AdaptiveSound achieved 93.8% average accuracy across 22 sound categories, significantly higher than two state-of-the-art baseline systems, ProtoSound (85.9%, p<.001) and SoundWatch (79.2%, p<.001). The improvement was particularly pronounced in outdoor environments (+11.5% over ProtoSound, +23.3% over SoundWatch), where variable background noise makes pre-trained models least reliable. The highest-performing sound classes were microwave (99.2%), dog bark (98.9%), and hazard alarm (98.8%), while the most confused classes included alarm clock (confused with phone ringing) and water running (confused with dishwasher). In a 3-day field evaluation with 12 DHH participants, users described the feedback process as "straightforward," "intuitive," and "effortless." Participants used the app an average of 6.1 hours per day across homes, streets, parks, restaurants, and malls. The system recognized an average of 264.3 sound events per day, and participants provided feedback on about 25% of recognized sounds. Feedback decreased over time (from 81.7 on day one to 45.3 on day three) as the model adapted. Five participants who had used existing systems like iOS sound recognition and SoundWatch explicitly stated AdaptiveSound was "much more reliable" and "accurate." Participants particularly valued the confidence rating feature for situations where they were unsure which sound had occurred.
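The figures above imply a few derived quantities that are easy to check. The sketch below uses only numbers quoted in this summary. It assumes the day-one and day-three feedback values are counts on the same scale, which the summary does not state explicitly, and all variable names are illustrative.

```python
# Figures quoted in the summary.
adaptive_acc, proto_acc, watch_acc = 93.8, 85.9, 79.2   # average accuracy, percent
events_per_day = 264.3           # average recognized sound events per day
feedback_share = 0.25            # "about 25%" of recognized sounds got feedback
feedback_day1, feedback_day3 = 81.7, 45.3
steps, minutes = 100, 10         # about 100 feedback steps in about 10 minutes

# Accuracy gaps in percentage points.
gap_proto = adaptive_acc - proto_acc
gap_watch = adaptive_acc - watch_acc

# Approximate feedback volume, its relative decline, and effort per step.
feedback_per_day = events_per_day * feedback_share
relative_drop = (feedback_day1 - feedback_day3) / feedback_day1
seconds_per_step = minutes * 60 / steps

print(f"Gap over ProtoSound: {gap_proto:.1f} points")
print(f"Gap over SoundWatch: {gap_watch:.1f} points")
print(f"Feedback per day at 25%: {feedback_per_day:.1f}")
print(f"Drop in feedback, day 1 to day 3: {relative_drop:.1%}")
print(f"Seconds per feedback step: {seconds_per_step:.0f}")
```

On these numbers, the overall accuracy gaps are about 7.9 and 14.6 percentage points, feedback falls by roughly 45% between day one and day three, and each feedback step takes about 6 seconds.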

Relevance

AdaptiveSound represents an important shift in assistive technology design, from static, one-size-fits-all models to personalized, user-adaptive systems. The work acknowledges that DHH communities are culturally diverse, with different preferences for sound awareness: hard of hearing individuals may desire speech sound detection more than Deaf individuals, and the sounds people want to be notified about vary by context. The human-in-the-loop approach empowers users to shape the technology to their specific environments and needs rather than accepting a generic model. For practitioners, the system demonstrates that meaningful accuracy improvements are achievable with minimal user effort on consumer hardware, making it practical for real deployment. The open-source release of both the Python pipeline and the Android app (github.com/AccessibilityLab/AdaptiveSound) enables researchers and developers to build on this work. The authors responsibly note that sound recognition may not be universally desired, since some Deaf people may view it as privileging hearing norms, and suggest that it can be limited to a small subset of safety-critical sounds rather than replicating the hearing world more broadly.

Tags: deaf and hard of hearing · sound recognition · human-in-the-loop · machine learning · incremental learning · sound awareness · personalization · mobile accessibility