Glossary

Terms used in accessibility research and practice. Each entry has a definition, common aliases, and category tags.

Supervector (also: GMM Supervector): A supervector is a high-dimensional feature representation created by concatenating the mean vectors from all components of a Gaussian Mixture Model (GMM) adapted to a specific speaker or utterance. This concatenation transforms variable-length speech into a fixed-length vector…
Synthesized Speech (also: Synthetic Speech, Speech Synthesis, TTS Output): Computer-generated speech produced by text-to-speech (TTS) engines that convert written text into spoken audio output. Synthesized speech is the primary means by which screen readers convey on-screen content to blind and visually impaired users. While modern TTS voices have…
Synthesized Video Description (also: TTS Video Description, Text-to-Speech Description, Synthesized Audio Description): An audio description for video content that is generated using text-to-speech (TTS) technology rather than recorded by a human narrator. A describer writes a text script describing the visual elements of a video, and speech synthesis software converts this text into spoken…
Synthetic Speech (also: Artificial Speech, Computer-generated Speech): Speech that is artificially produced by computer systems rather than recorded from human speakers. Synthetic speech is the output of text-to-speech systems and is fundamental to screen readers and voice assistants. Modern synthetic speech uses various generation methods…
Talking Head (also: Virtual Talking Head, Animated Face, 3D Talking Head): A talking head is a computer-generated 3D or 2D animated representation of a human face and articulatory system that produces visible speech movements synchronised with audio output. In accessibility and speech therapy contexts, talking heads are particularly valuable because…
Text-to-Speech (also: TTS, Speech Synthesis): Technology that converts written text into spoken audio output. Text-to-speech is a fundamental component of many assistive technologies, including screen readers, audio description tools, and communication devices for people with speech disabilities. Modern TTS systems use…
Time-compressed Speech (also: Accelerated Speech, Speed-altered Speech): Speech that has been digitally processed to play at a faster rate than it was originally recorded or synthesized, while preserving pitch. Unlike simply increasing playback speed (which raises pitch), time compression algorithms remove small portions of the audio signal to reduce…
Unit Selection Synthesis (also: Concatenative Unit Selection, Unit Selection TTS): A text-to-speech synthesis approach that generates speech by selecting and concatenating variable-length segments of pre-recorded human speech from a large database to match the input text. Unit selection synthesizers generally produce more natural-sounding speech than…
Universal Background Model (also: UBM): A Universal Background Model (UBM) is a large Gaussian Mixture Model trained on speech from many speakers to represent speaker-independent acoustic characteristics. The UBM serves as a reference distribution against which individual speaker models are compared, typically using…
Visual Feedback (also: Visual Biofeedback): A method of providing real-time visual information to a user about their actions, performance, or physiological state. In speech therapy and assistive technology, visual feedback systems display graphical representations of vocal output to help users understand and modify their…
Visual Speech Aid (also: Speech Reading Aid, Visual Communication Aid): A visual speech aid is an assistive device or system that converts auditory speech information into visual form to help individuals with hearing impairments follow spoken conversation. These aids may display text (as in captioning systems), phonetic symbols, lip-shape cues,…
Vocalization Analysis (also: Vocal Analysis, Infant Vocalization Analysis): Vocalization analysis is the systematic study and measurement of vocal productions, including speech, pre-speech sounds, and non-speech vocalizations. In developmental and clinical contexts, vocalization analysis involves recording, digitizing, and examining acoustic features of…
Voice Cloning (also: Voice Synthesis Cloning, Personalized Text-to-Speech): The use of machine-learning models to synthesise a target speaker's voice from a short reference recording, enabling text-to-speech output that sounds like that specific person. For accessibility, voice cloning has transformative potential: people whose voices are at risk of…
Voice Conversion (also: VC, Speech Conversion): A speech processing technique that transforms one person's voice to sound like another while preserving the linguistic content. In accessibility applications, voice conversion can improve the intelligibility of speech from people with articulation disorders by replacing unclear…
Wav2Vec (also: Wav2Vec2, Wav2Vec 2.0): A family of self-supervised speech representation models from Meta AI that learn rich acoustic embeddings directly from raw waveform audio without requiring transcribed training data. Wav2Vec 2.0, introduced in 2020, became a backbone for low-resource automatic speech…
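
Illustration for the Supervector and Universal Background Model entries: the sketch below shows one way that utterances of different lengths could each be mapped to a vector of the same length by adapting the UBM component means and concatenating them. It is a simplified sketch under stated assumptions, not a reference implementation. The function names (`adapt_means`, `supervector`), the hard nearest-mean frame assignment, and the relevance factor value are all assumptions made for illustration; practical systems typically use soft posterior assignments from a full GMM.

```python
# Simplified, illustrative sketch (assumptions: hard assignment of frames to
# the nearest UBM mean, mean-only adaptation, hypothetical function names).

def adapt_means(frames, ubm_means, relevance=16.0):
    """Shift each UBM mean toward the frames assigned to it."""
    num_components = len(ubm_means)
    dim = len(ubm_means[0])
    counts = [0] * num_components
    sums = [[0.0] * dim for _ in range(num_components)]

    for frame in frames:
        # Hard-assign the frame to the closest component (squared distance).
        nearest = min(
            range(num_components),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(frame, ubm_means[k])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += frame[d]

    adapted = []
    for k in range(num_components):
        if counts[k] == 0:
            # No frames for this component: keep the UBM mean unchanged.
            adapted.append(list(ubm_means[k]))
            continue
        # More frames -> weight closer to 1 -> mean moves closer to the data.
        weight = counts[k] / (counts[k] + relevance)
        adapted.append([
            weight * (sums[k][d] / counts[k]) + (1.0 - weight) * ubm_means[k][d]
            for d in range(dim)
        ])
    return adapted


def supervector(means):
    """Concatenate all component means into one flat, fixed-length vector."""
    return [value for mean in means for value in mean]


# Toy data: a 2-component "UBM" over 2-dimensional features.
ubm_means = [[0.0, 0.0], [10.0, 10.0]]
short_utterance = [[1.0, 1.0]]
long_utterance = [[1.0, 1.0], [9.0, 9.0], [11.0, 11.0]]

sv_short = supervector(adapt_means(short_utterance, ubm_means, relevance=1.0))
sv_long = supervector(adapt_means(long_utterance, ubm_means, relevance=1.0))
```

In this toy setup both utterances produce a vector of length 4 (2 components × 2 dimensions), regardless of how many frames each one contains.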