Automatic Speech Recognition (ASR)

Also known as: ASR, Speech-to-Text, Voice Recognition, Auto-Captioning

Technology that converts spoken language into written text using machine learning models. ASR powers auto-captioning in video conferencing tools, media players, and assistive devices. Accuracy has improved significantly but still degrades with background noise, speaker accents, fast speech, and greater distance from the microphone. For accessibility purposes, ASR offers an on-demand alternative to human captioners, but its output often lacks, or handles unreliably, the speaker identification, punctuation, and descriptions of non-speech sounds that human captioners provide.
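
ASR accuracy is commonly reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the ASR output into a human reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration and does not come from the sources listed for this entry; the function name word_error_rate and the sample transcripts are assumptions made for the example, and real evaluations usually normalize punctuation and casing more carefully.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: the ASR output dropped the word "the".
print(word_error_rate("turn on the captions", "turn on captions"))  # 0.25
```

A lower WER means fewer word-level errors. WER counts words only, so it does not capture missing speaker labels, punctuation, or sound descriptions.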

Category: Technology · Speech Recognition · Assistive Technology · Captioning

Related: Real-Time Captioning

Sources

https://www.w3.org/WAI/media/av/captions/
https://dl.acm.org/doi/10.1145/3290607.3308461