Synthesized Video Description

Also known as: TTS Video Description, Text-to-Speech Description, Synthesized Audio Description

An audio description for video content that is generated using text-to-speech (TTS) technology rather than recorded by a human narrator. A describer writes a text script describing the visual elements of a video, and speech synthesis software converts this text into spoken narration that is synchronized with video playback. Synthesized descriptions offer significant cost and scalability advantages over human-narrated descriptions: they eliminate the need for professional voice talent and recording equipment, can be edited and updated by modifying the text, and allow customization of voice parameters such as speed, volume, and gender. Research indicates that synthesized descriptions are generally acceptable to blind and low-vision users, particularly for educational and informational content, though human narration remains preferred for entertainment and emotionally complex material.
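
The workflow above (a timed text script that is later voiced by a speech synthesizer) can be illustrated with a minimal sketch. It assumes the script is stored as timestamped cues and serialized as a WebVTT text track, which is one plausible carrier format and not one named by the source. The names `DescriptionCue`, `estimated_duration`, and `to_webvtt`, the sample cue text, and the default rate of 160 words per minute are all hypothetical; actual speaking time depends on the TTS engine and voice.

```python
from dataclasses import dataclass


@dataclass
class DescriptionCue:
    start: float  # seconds into the video where narration should begin
    text: str     # description text written by the describer


def estimated_duration(text: str, words_per_minute: float = 160.0) -> float:
    """Rough speaking time in seconds (assumed rate; real TTS engines differ)."""
    return len(text.split()) / words_per_minute * 60.0


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: list[DescriptionCue], words_per_minute: float = 160.0) -> str:
    """Serialize cues as a WebVTT text track, one cue per description."""
    lines = ["WEBVTT", ""]
    for cue in sorted(cues, key=lambda c: c.start):
        # The cue end time is only an estimate based on word count.
        end = cue.start + estimated_duration(cue.text, words_per_minute)
        lines.append(f"{format_timestamp(cue.start)} --> {format_timestamp(end)}")
        lines.append(cue.text)
        lines.append("")
    return "\n".join(lines)


# Hypothetical two-cue script for illustration only.
script = [
    DescriptionCue(2.0, "A woman opens a laptop on a kitchen table."),
    DescriptionCue(9.5, "She types quickly, then smiles at the screen."),
]
print(to_webvtt(script))
```

Because the script stays as text, updating a description means editing a cue and regenerating the track, and changing the speaking rate only changes the estimated cue end times.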

Category: Video Accessibility · Audio · Assistive Technology · Speech Technology

Related: Audio Description · Extended Description · Text-to-Speech · Video Accessibility

Sources

https://doi.org/10.1145/1878803.1878833