Target Sound Extraction

Also known as: Target Sound Separation, TSE

A machine-learning task in which a model isolates a specific target sound (or class of sounds) from a complex acoustic mixture. The model is conditioned on a specification of the target, such as a class label or text description, a short reference recording of the target, or an embedding. Target sound extraction differs from blind source separation, which splits a mixture into all of its sources without knowing which source is which. It also differs from speech enhancement, which targets speech specifically. Target sound extraction generalizes the idea to arbitrary sound classes, including environmental sounds. In accessibility, TSE underpins emerging 'semantic hearing' systems that let noise-sensitive users foreground or attenuate specific sounds in real time. It can also support personalized sound-awareness tools for d/Deaf and hard-of-hearing users who want a notification when, for example, a doorbell rings or a baby cries.
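One common way to realize this conditioning is to have a network predict a time-frequency mask over the mixture's spectrogram while an embedding of the target modulates the network's features. FiLM (feature-wise linear modulation) is one widely used choice for that modulation. The NumPy sketch below shows this pattern under heavy simplifications. It works on magnitude spectrograms only and ignores phase. The query is a one-hot class label. The weights `W_gamma` and `W_beta` are hand-set to stand in for a trained network. The "doorbell"/"baby cry" frequency bands are invented for illustration. The `remix` step is a hypothetical illustration of how a semantic-hearing interface might foreground or attenuate the extracted sound with a user-chosen gain. All function names are illustrative, not any particular system's API.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def extract_target(mix_mag, query, W_gamma, W_beta):
    """Estimate the target's magnitude spectrogram from a mixture.

    mix_mag: (T, F) mixture magnitude spectrogram (T frames, F bins).
    query:   (D,) conditioning vector describing the target; here a one-hot
             class label, but it could be a text or audio embedding.
    W_gamma, W_beta: (F, D) weights mapping the query to a per-frequency
             FiLM scale and shift (hypothetical; learned in a real system).
    """
    gamma = W_gamma @ query                  # (F,) per-frequency scale
    beta = W_beta @ query                    # (F,) per-frequency shift
    features = np.log1p(mix_mag)             # (T, F) compressed mixture features
    mask = sigmoid(gamma * features + beta)  # (T, F), broadcast over frames; in (0, 1)
    return mask * mix_mag                    # masked mixture = target estimate

def remix(mix_mag, target_mag, gain):
    """Re-balance the estimated target inside the mixture (magnitude domain).

    gain > 1 foregrounds the target, 0 <= gain < 1 attenuates it,
    and gain = 0 removes the estimated target entirely.
    """
    return mix_mag + (gain - 1.0) * target_mag

# Toy setup: F = 4 frequency bins, D = 2 sound classes.
# Hypothetical hand-set weights: class 0 ("doorbell") occupies bins 0-1,
# class 1 ("baby cry") occupies bins 2-3.
W_gamma = np.zeros((4, 2))
W_beta = np.array([[ 6.0, -6.0],
                   [ 6.0, -6.0],
                   [-6.0,  6.0],
                   [-6.0,  6.0]])

rng = np.random.default_rng(0)
mix = rng.uniform(0.5, 1.0, size=(3, 4))     # T = 3 frames, F = 4 bins

doorbell = np.array([1.0, 0.0])              # one-hot query for class 0
baby_cry = np.array([0.0, 1.0])              # one-hot query for class 1

doorbell_est = extract_target(mix, doorbell, W_gamma, W_beta)
baby_est = extract_target(mix, baby_cry, W_gamma, W_beta)
without_doorbell = remix(mix, doorbell_est, gain=0.0)  # suppress the doorbell
```

The same mixture yields different outputs depending only on the query. This query-dependent behavior is what separates target sound extraction from blind separation. In practice the weights come from training on mixtures with known targets. Real systems also typically operate on complex spectrograms or raw waveforms so that phase is preserved.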

Category: AI and Accessibility · Hearing Accessibility · Machine Learning · Auditory Accessibility

Related: Semantic Hearing · Sound Classification · Active Noise Cancellation

Sources