Automated Speech Recognition

Also known as: ASR, Speech-to-Text, Voice Recognition

Technology that converts spoken language into written text using machine learning and signal processing. In accessibility, ASR is used for real-time captioning, voice control of devices and software, and generating transcripts of audio and video content. Although ASR accuracy has improved substantially, it is still reduced by factors such as background noise, speaker accents, overlapping speech, domain-specific vocabulary, and poor audio quality. Hybrid approaches, in which human editors correct ASR output, offer a middle ground between fully automated and fully manual captioning.
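ASR accuracy is commonly reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the ASR output into a reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that calculation, not a definitive implementation. The function name word_error_rate and the simplified normalization (lowercasing and splitting on whitespace) are assumptions made for illustration; production scoring tools typically also normalize punctuation and number formats.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative helper: normalization here is only lowercasing and
    whitespace splitting, which is an assumption of this sketch.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping one row of the table at a time.
    # prev[j] holds the distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    human_transcript = "turn on the kitchen lights"
    asr_output = "turn on kitchen light"
    # One deleted word ("the") and one substitution ("lights" -> "light")
    # over five reference words.
    print(word_error_rate(human_transcript, asr_output))
```

A lower WER means fewer corrections are left for a human editor. Because insertions count as errors, WER can exceed 1.0 when the output is much longer than the reference.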

Category: Assistive Technology · Speech Recognition · Captioning · Artificial Intelligence

Related: Real-Time Captioning · CART · Natural Language Processing · Captions

Sources

https://www.w3.org/WAI/media/av/captions/
https://doi.org/10.1145/2745555.2746648