Speech-to-Text

Also known as: STT, Speech Recognition, Automatic Speech Recognition, ASR

Technology that converts spoken language into written text, enabling voice-based input for digital systems. In accessibility, speech-to-text serves three main roles: it powers voice command interfaces for users who cannot use keyboard or touch input, generates real-time captions for deaf and hard-of-hearing users during conversations and media playback, and enables dictation for users with motor impairments. Modern speech-to-text systems use deep learning models and can operate in real time with high accuracy across multiple languages. Key accessibility considerations include accuracy for diverse speech patterns (accents, speech disabilities), latency for real-time captioning, and robustness to background noise in real-world environments.
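As an illustration of how the accuracy consideration might be quantified, the sketch below computes word error rate (WER), a metric commonly used to evaluate speech recognition output. The entry itself does not prescribe a metric, so the choice of WER, the function name `word_error_rate`, and the simple lowercase-and-split normalization are all assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER: word-level edit distance divided by reference length.

    Hypothetical helper for illustration; real evaluations usually apply
    more careful text normalization (punctuation, numerals, and so on).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of four reference words.
    print(word_error_rate("turn on the lights", "turn on lights"))
```

Comparing WER across groups of speakers (for example, different accents or speakers with speech disabilities) is one way to check whether a system serves diverse speech patterns equally well; a single aggregate score can hide large gaps between groups.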

Category: assistive technology · artificial intelligence

Related: Text-to-Speech · Voice Interaction · ASR Captioning · Captioning · Voice-First Design

Sources

https://www.w3.org/WAI/media/av/captions/