Speaker Diarisation

Also known as: Speaker Diarization, Speaker Segmentation

The automatic process of segmenting an audio recording by speaker identity, answering the question "who spoke when", and labelling each segment with a consistent speaker tag. It is a critical prerequisite for accessible transcripts of multi-voice audio such as interviews, podcasts, and meetings: a flat transcript without speaker labels is substantially harder to follow for readers with aphasia, cognitive disabilities, or hearing loss. Modern pipelines typically combine voice activity detection, speaker embeddings, and clustering.
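As a rough illustration of how voice activity detection, speaker embeddings, and clustering fit together, here is a minimal Python sketch. It is not how any particular product works: the function name diarise, the energy threshold, the similarity threshold, and the two-dimensional "embeddings" are all hypothetical stand-ins. A real system would obtain embeddings from a trained neural model and would usually use a more robust clustering method than the greedy one shown.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def diarise(frames, frame_sec=1.0, energy_floor=0.1, same_speaker=0.8):
    """Toy diarisation over fixed-length frames.

    frames: list of (energy, embedding) pairs, one per frame.
    All thresholds are illustrative assumptions, not tuned values.
    Returns a list of (start_sec, end_sec, label) segments.
    """
    centroids = []  # running sum of embeddings per discovered speaker
    spans = []      # [first_frame, last_frame, speaker_index]
    for i, (energy, emb) in enumerate(frames):
        # Step 1, voice activity detection: skip low-energy frames.
        if energy < energy_floor:
            continue
        # Step 2, clustering: pick the most similar known speaker,
        # or open a new cluster if none is similar enough.
        best, best_sim = None, same_speaker
        for k, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = k, sim
        if best is None:
            centroids.append(list(emb))
            best = len(centroids) - 1
        else:
            centroids[best] = [c + e for c, e in zip(centroids[best], emb)]
        # Step 3, segmentation: extend the previous span when the same
        # speaker continues on the very next frame, else start a new one.
        if spans and spans[-1][2] == best and spans[-1][1] == i - 1:
            spans[-1][1] = i
        else:
            spans.append([i, i, best])
    return [
        (first * frame_sec, (last + 1) * frame_sec, f"SPEAKER_{spk}")
        for first, last, spk in spans
    ]

# Hypothetical input: six one-second frames, the third one silent.
frames = [
    (0.9, (1.0, 0.1)),
    (0.8, (0.9, 0.2)),
    (0.02, (0.0, 0.0)),
    (0.7, (0.1, 1.0)),
    (0.9, (0.2, 0.9)),
    (0.8, (1.0, 0.0)),
]
for start, end, label in diarise(frames):
    print(f"{start:.1f}-{end:.1f}s  {label}")
```

The labels are generic (SPEAKER_0, SPEAKER_1) because diarisation alone separates voices without identifying who they belong to; mapping a label to a named person is a separate step.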

Category: Speech Technology · Media Accessibility · Audio

Related: Automatic Speech Recognition · Captioning · Transcript

Sources

https://developers.google.com/speech-to-text/docs/multiple-voices