Glossary

Terms used in accessibility research and practice. Each entry has a definition, common aliases, and category tags.

Search results

SAPI (also: Speech Application Programming Interface, Microsoft SAPI): The Speech Application Programming Interface (SAPI) is a Microsoft Windows API that enables applications to use speech recognition and text-to-speech synthesis. SAPI provides a standardized interface between speech engines and applications, meaning that a synthetic voice built…
Semantically Unpredictable Sentences (also: SUS, SUS Test): A standardised method for evaluating speech intelligibility in which listeners are presented with sentences that are grammatically correct but semantically meaningless, such as "A polite art jumps beneath the arms" or "The law that finished shows the boots." Because the…
Speaker Adaptation (also: Voice Adaptation, Speaker-Adaptive Training, Voice Personalization): Speaker adaptation is the process of adjusting an existing automatic speech recognition (ASR) system, usually one trained on a large, demographically broad corpus of able-bodied speakers, to a particular individual's voice using a relatively small amount of that person's…
Speaker Diarisation (also: Speaker Diarization, Speaker Segmentation): The automatic process of segmenting an audio recording by speaker identity (answering "who spoke when") and labelling each segment. A critical prerequisite for accessible transcripts of multi-voice audio such as interviews, podcasts, and meetings, since a flat transcript…
Spectrogram (also: Sonogram, Spectral Display): A spectrogram is a visual representation of the frequency spectrum of a signal as it varies over time, typically showing time on the horizontal axis, frequency on the vertical axis, and intensity represented by color or brightness. In speech science and accessibility research,…
Speech Composer (also: Speech Generation, Message Composition Engine): A software component in AAC (Augmentative and Alternative Communication) systems that takes user input (typed text, selected symbols, or telegraphic phrases) and processes it for spoken output through a text-to-speech synthesiser. Advanced speech composers may include…
Speech Diversity (also: Diverse Speech, Non-Typical Speech): The full range of ways human speech varies from the narrow 'typical' speech on which most speech-AI systems are trained and benchmarked. Speech diversity includes people who stutter, d/Deaf and Hard-of-Hearing speakers, people with dysarthria, aphasia, or other neurological…
Speech Language Model (also: SLM, Audio Language Model, Speech Foundation Model): A class of large neural models that processes both speech and text in a single end-to-end framework, integrating tasks (automatic speech recognition, spoken language understanding, dialogue, speech generation) that traditionally required separate modular systems. Examples…
Speech Neuroprosthesis (also: Speech BCI, Speech Brain-Computer Interface): A brain-computer interface that decodes neural activity associated with attempted or imagined speech and converts it into text, synthesized voice, or both. Speech neuroprostheses are designed for people with anarthria or severe dysarthria from ALS, brainstem stroke, locked-in…
Speech Prosodics (also: Prosodic Features, Suprasegmental Features): Speech prosodics refers to the nonverbal acoustic features of speech that convey meaning beyond the words themselves, including pitch (fundamental frequency), rhythm, stress, intonation patterns, pausing, and speaking rate. In accessibility research, prosodic analysis serves as…
Speech Rate (also: Speaking Rate, Articulation Rate): The speed at which speech is produced, typically measured in words per minute (WPM) or syllables per second. Normal conversational speech ranges from 120-180 WPM, while screen reader users often configure synthetic speech at rates of 300-400 WPM or higher. Speech rate settings…
Speech Repair (also: Self-Correction, Speech Self-Repair, Command Correction): Speech repair is the process of correcting or modifying a spoken utterance after it has been produced, either within the same turn or in a subsequent one. In natural conversation, speakers commonly interrupt themselves to fix errors, change wording, or update information using…
Speech Visualization (also: Visual Speech Display, Speech-to-Visual Display): Speech visualization refers to techniques that convert spoken language into visual representations to aid comprehension, particularly for individuals who are deaf or hard of hearing. These displays can range from real-time captioning and waveform displays to more abstract…
Speech-Generating Device (also: SGD, Voice Output Communication Aid, VOCA): An electronic AAC device that produces spoken output from text or symbol input, enabling people with speech disabilities to communicate verbally with others. Speech-generating devices range from dedicated hardware (such as Tobii Dynavox devices) to software applications running…
Speech-to-Speech (also: S2S, Speech-to-Speech Conversion): A class of systems that transform one speech signal directly into another, for example by converting atypical input (whispered, dysarthric, accented, or cross-lingual speech) into clear, intelligible output in a target voice or language. Speech-to-speech systems differ from…
Spoken Dialogue System (also: SDS, Voice Dialogue System): A computer system that communicates with users through spoken natural language, allowing them to interact via voice rather than visual or manual interfaces. Spoken dialogue systems are used in telecare, customer service, and home care applications, and are particularly relevant…
Supervector (also: GMM Supervector): A supervector is a high-dimensional feature representation created by concatenating the mean vectors from all components of a Gaussian Mixture Model (GMM) adapted to a specific speaker or utterance. This concatenation transforms variable-length speech into a fixed-length vector…
Synthesized Speech (also: Synthetic Speech, Speech Synthesis, TTS Output): Computer-generated speech produced by text-to-speech (TTS) engines that convert written text into spoken audio output. Synthesized speech is the primary means by which screen readers convey on-screen content to blind and visually impaired users. While modern TTS voices have…
Synthesized Video Description (also: TTS Video Description, Text-to-Speech Description, Synthesized Audio Description): An audio description for video content that is generated using text-to-speech (TTS) technology rather than recorded by a human narrator. A describer writes a text script describing the visual elements of a video, and speech synthesis software converts this text into spoken…
Synthetic Speech (also: Artificial Speech, Computer-generated Speech): Speech that is artificially produced by computer systems rather than recorded from human speakers. Synthetic speech is the output of text-to-speech systems and is fundamental to screen readers and voice assistants. Modern synthetic speech uses various generation methods…
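
Illustration for the Supervector entry: the concatenation step it describes can be sketched in a few lines. This is only a minimal sketch under stated assumptions. The function name build_supervector and the numeric values are hypothetical, and the GMM adaptation that would produce the per-component means is not shown.

```python
from typing import List, Sequence


def build_supervector(component_means: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate per-component GMM mean vectors into one flat vector.

    Hypothetical helper: the means are assumed to come from a GMM that has
    already been adapted to one speaker or utterance (adaptation not shown).
    """
    if not component_means:
        raise ValueError("need at least one component mean")
    dim = len(component_means[0])
    supervector: List[float] = []
    for mean in component_means:
        if len(mean) != dim:
            raise ValueError("all component means must have the same dimensionality")
        supervector.extend(mean)  # append this component's mean in order
    return supervector


# Made-up adapted means: 3 mixture components over 2-dimensional features.
adapted_means = [[0.1, 0.2], [1.5, -0.3], [-0.7, 0.9]]
supervector = build_supervector(adapted_means)
```

With K components and D-dimensional features the result always has K x D entries (6 here), regardless of how long the underlying recording was, which is the fixed-length property the entry mentions.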
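
Illustration for the Speech Rate entry: a words-per-minute figure converts directly into listening time. The sketch below assumes a hypothetical 600-word passage and picks 150 WPM and 300 WPM as sample points from the ranges quoted in the entry; the function name is also an assumption made for illustration.

```python
def listening_time_seconds(word_count: int, words_per_minute: float) -> float:
    """Return how long a passage takes to hear at a given speech rate."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return word_count / words_per_minute * 60.0


# Hypothetical 600-word passage at two sample rates.
passage_words = 600
conversational_seconds = listening_time_seconds(passage_words, 150)
screen_reader_seconds = listening_time_seconds(passage_words, 300)
```

At these sample rates the same passage takes 240 seconds at 150 WPM and 120 seconds at 300 WPM, so doubling the rate halves the listening time.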

20 results.
