Speaker Adaptation

Also known as: Voice Adaptation, Speaker-Adaptive Training, Voice Personalization

Speaker adaptation is the process of adjusting an existing automatic speech recognition (ASR) system, usually one trained on a large, demographically broad corpus of typical speech, to a particular individual's voice using a relatively small amount of that person's speech data. Classical techniques include Maximum Likelihood Linear Regression (MLLR), which estimates linear transforms of the acoustic model's Gaussian parameters, and Maximum A Posteriori (MAP) adaptation, which interpolates between the prior model's parameters and statistics gathered from the new speaker's data; modern systems fine-tune neural acoustic models or learn per-speaker embeddings. Speaker-adaptive models are typically contrasted with speaker-dependent models (trained from scratch on one user) and speaker-independent models (no per-user customisation). Adaptation is central to accessibility work on atypical speech, such as dysarthric speech, deaf speech, post-stroke speech, and speech affected by ALS, and underlies projects and products such as Google's Project Euphonia and Voiceitt, which aim to make voice assistants and dictation usable for speakers whom mainstream models otherwise misrecognise. Apple's Personal Voice applies a related kind of personalisation to speech synthesis rather than to recognition.
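
As a rough illustration of the MAP adaptation mentioned above, the sketch below updates the mean of a single Gaussian component from a handful of speaker-specific feature frames. It assumes the common relevance-factor form of the update, in which the adapted mean is a weighted average of the prior (speaker-independent) mean and the occupancy-weighted data. The function name `map_adapt_mean`, the relevance factor `tau`, and the toy numbers are illustrative assumptions, not taken from any particular toolkit.

```python
from typing import List, Sequence


def map_adapt_mean(
    prior_mean: Sequence[float],
    frames: Sequence[Sequence[float]],
    posteriors: Sequence[float],
    tau: float = 10.0,
) -> List[float]:
    """Hypothetical helper: MAP update of one Gaussian mean.

    prior_mean  -- mean from the speaker-independent model
    frames      -- adaptation feature vectors from the target speaker
    posteriors  -- per-frame occupancy of this Gaussian (0..1)
    tau         -- relevance factor; larger values trust the prior more
    """
    if len(frames) != len(posteriors):
        raise ValueError("frames and posteriors must have the same length")
    if tau <= 0:
        raise ValueError("tau must be positive")

    dim = len(prior_mean)
    occupancy = sum(posteriors)  # soft count of frames for this Gaussian
    weighted_sum = [0.0] * dim   # occupancy-weighted sum of the frames
    for frame, gamma in zip(frames, posteriors):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]

    # Interpolate between the prior mean and the speaker's data.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]


# Toy usage: ten frames, all fully assigned to this Gaussian.
adapted = map_adapt_mean([0.0, 0.0], [[2.0, 4.0]] * 10, [1.0] * 10, tau=10.0)
print(adapted)  # [1.0, 2.0]
```

With ten frames and `tau = 10`, the prior and the data carry equal weight, so the adapted mean lands halfway between them. With no adaptation data the function returns the prior mean unchanged, which is the behaviour that makes MAP comparatively robust when only a little speech from the target speaker is available.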

Category: Speech Technology · Speech Recognition · Machine Learning · Assistive Technology · Personalization

Related: Acoustic Model · Automatic speech recognition · Dysarthria · Hidden Markov Model · Speech Intelligibility

Sources