
Endpoint Detection

Also known as: endpointing. Closely related to: Voice Activity Detection (VAD)

The process by which a speech-recognition system decides when a user has finished speaking, so that it can stop listening and send the captured audio for recognition. Off-the-shelf voice assistants typically end the turn after 500 ms to 1 s of silence. That threshold cuts off users who pause, stammer, breathe between syllables, or speak slowly. It disproportionately affects people with speech disorders, intellectual disabilities, autism, or motor speech impairments, as well as older adults and non-native speakers. Accessible voice interface design should expose the listening window as a configurable setting rather than hiding it in accessibility-only modes, and should use multi-turn sessions and extended re-prompt intervals to avoid premature cut-off.
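The silence-threshold mechanism can be illustrated with a minimal Python sketch. It assumes the audio has already been split into fixed-length frames with one energy value per frame. It also uses a plain energy threshold as a stand-in for a voice activity detector, whereas production systems often use trained VAD models instead. `EndpointConfig`, `find_endpoint`, and all parameter names and default values are hypothetical. The sketch shows that the listening window (`silence_timeout_ms`) can be an ordinary parameter that an interface could expose to every user.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class EndpointConfig:
    # Hypothetical user-adjustable listening window: how long silence must
    # last before the utterance is treated as finished. The 800 ms default
    # sits inside the 500 ms to 1 s range used by typical assistants.
    silence_timeout_ms: int = 800
    # Assumed duration of each analysis frame, in milliseconds.
    frame_ms: int = 20
    # Assumed normalized energy level below which a frame counts as silence.
    energy_threshold: float = 0.01


def find_endpoint(frame_energies: Iterable[float], config: EndpointConfig) -> Optional[int]:
    """Return the index of the frame where the endpoint is declared, or None.

    The endpoint is declared only after speech has started and silence has
    then lasted at least silence_timeout_ms without interruption.
    """
    frames_needed = max(1, config.silence_timeout_ms // config.frame_ms)
    heard_speech = False
    silent_run = 0
    for i, energy in enumerate(frame_energies):
        if energy >= config.energy_threshold:
            heard_speech = True
            silent_run = 0  # speech after a pause resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= frames_needed:
                return i
    return None  # stream ended before the silence window was reached


# Hypothetical utterance: 200 ms speech, a 600 ms mid-sentence pause,
# 200 ms more speech, then 2 s of trailing silence (20 ms frames).
energies = [0.5] * 10 + [0.0] * 30 + [0.5] * 10 + [0.0] * 100

print(find_endpoint(energies, EndpointConfig(silence_timeout_ms=500)))   # → 34 (cut off mid-pause)
print(find_endpoint(energies, EndpointConfig(silence_timeout_ms=1500)))  # → 124 (after the full utterance)
```

With a 500 ms window, the 600 ms pause ends the turn before the speaker resumes. With a 1500 ms window, the endpoint falls in the trailing silence. Because leading silence never triggers the endpoint, a user who takes a long time to start speaking is not cut off by this particular rule. A re-prompt timeout for that case would be a separate setting.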

Category: Speech Technology · Voice Interface · Automatic Speech Recognition · Speech Accessibility

Related: Automatic Speech Recognition · Voice Assistant · Speech Accessibility

Sources