Sound Event Detection

Also known as: Audio Tagging, Automatic Sound Recognition

A machine learning technique that automatically identifies and classifies sounds within an audio stream, such as music, applause, laughter, environmental noises, and other non-speech audio events. It typically also marks when each sound starts and stops. In accessibility contexts, sound event detection can complement automatic speech recognition to produce more complete captions that include non-speech sounds. The technology is also used in sound awareness applications for deaf and hard of hearing people, alerting them to important environmental sounds such as doorbells, fire alarms, or approaching vehicles. Challenges include distinguishing between acoustically similar sounds, detecting events that overlap in time, and deciding which detected sounds are important enough to surface to users.
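
The post-processing stage of a detection pipeline, which turns model scores into timed events and then picks the ones worth alerting on, can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the cited source. It assumes a trained model (not shown) has already produced a probability for each sound class in each fixed-length audio frame, and that a single threshold of 0.5 suits every class. The names `SoundEvent`, `probabilities_to_events`, and `events_to_alert`, the class labels, and the `min_frames` and `priority_labels` parameters are all hypothetical. Each class is thresholded independently, so events of different classes may overlap in time. The priority filter is one simple way to decide which detections reach the user.

```python
from dataclasses import dataclass


@dataclass
class SoundEvent:
    """A detected sound with its class label and time span in seconds."""
    label: str
    onset: float
    offset: float


def probabilities_to_events(probs, labels, hop_seconds, threshold=0.5, min_frames=2):
    """Hypothetical helper: convert frame-level class probabilities into events.

    Assumes `probs` comes from a trained model (not shown). Each entry is one
    frame holding one probability per label. Each class is thresholded on its
    own, so events of different classes can overlap (multi-label detection).
    Runs shorter than `min_frames` frames are discarded as likely noise.
    """
    events = []
    for c, label in enumerate(labels):
        start = None
        # Iterate one step past the last frame so an event that lasts until
        # the end of the stream is still closed and recorded.
        for t in range(len(probs) + 1):
            active = t < len(probs) and probs[t][c] >= threshold
            if active and start is None:
                start = t
            elif not active and start is not None:
                if t - start >= min_frames:
                    events.append(SoundEvent(label, start * hop_seconds, t * hop_seconds))
                start = None
    events.sort(key=lambda e: (e.onset, e.label))
    return events


def events_to_alert(events, priority_labels):
    """Hypothetical filter: keep only events in a user-chosen priority set."""
    return [e for e in events if e.label in priority_labels]


if __name__ == "__main__":
    labels = ["speech", "doorbell", "dog_bark"]
    # Made-up scores for five 0.5-second frames, purely for illustration.
    probs = [
        [0.9, 0.1, 0.0],
        [0.8, 0.7, 0.0],
        [0.2, 0.9, 0.6],
        [0.1, 0.8, 0.7],
        [0.0, 0.2, 0.1],
    ]
    events = probabilities_to_events(probs, labels, hop_seconds=0.5)
    for e in events:
        print(e.label, e.onset, e.offset)
    # speech 0.0 1.0
    # doorbell 0.5 2.0
    # dog_bark 1.0 2.0
    for e in events_to_alert(events, {"doorbell", "fire_alarm"}):
        print("ALERT:", e.label)
    # ALERT: doorbell
```

In the example, the doorbell overlaps both the speech and the dog bark, yet all three are reported. Only the doorbell survives the priority filter. Real systems often tune a separate threshold per class and smooth the frame scores (for example with a median filter) before thresholding. The single fixed threshold here is chosen only for readability.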

Category: artificial intelligence · captioning · deaf and hard of hearing · assistive technology

Related: Non-Speech Sounds · Automatic Speech Recognition · Closed Captioning

Sources

https://doi.org/10.1145/3517428.3544808