Glossary

Terms used in accessibility research and practice. Each entry has a definition, common aliases, and category tags.

Affective Captions (also: Affective Captioning, Emotive Captions): Captions that convey not only the spoken words but also the emotional qualities of speech, such as valence (positive vs. negative tone) and arousal (intensity), typically through typographic modulations like font-color, font-weight, or font-size, and increasingly through…
Algospeak (also: Algorithm-Friendly Language, Algo-Speak): The practice of using code words, creative spellings, or substitutions in online content to avoid algorithmic detection, censorship, or demonetization by social media platforms. Examples include spelling "lesbian" as "le$bian" or "le dollar bean" on TikTok. While algospeak…
Automated Speech Recognition (also: ASR, Speech-to-Text, Voice Recognition): Technology that converts spoken language into written text using machine learning and signal processing algorithms. In accessibility, ASR is used for real-time captioning, voice control of devices and software, and generating transcripts of audio and video content. While ASR…
Automatic Caption Evaluation (also: ACE, ACE Framework, ACE Metric): A caption-quality evaluation framework introduced by Sushant Kafle and Matt Huenerfauth (2017–2018) that scores automatically generated captions based on their usability for Deaf and Hard-of-Hearing readers, rather than simply counting transcription errors. For each mismatch…
Automatic Captions (also: Auto-Generated Captions, Auto Captions, ASR Captions): Captions produced by automatic speech recognition (ASR) systems without human transcription, typically generated by the hosting platform (e.g., YouTube, Zoom, Microsoft Teams) as an optional layer on uploaded or live video. Automatic captions have dramatically expanded caption…
Automatic Speech Recognition (ASR) (also: ASR, Speech-to-Text, Voice Recognition): Technology that converts spoken language into written text using computational algorithms and machine learning models. ASR powers auto-captioning features in video conferencing, media players, and assistive devices. While ASR has improved significantly, its accuracy is affected…
C-Print (also: C-Print Pro): A meaning-for-meaning real-time captioning service where a trained captioner produces a condensed transcription of spoken classroom content, as opposed to the verbatim word-for-word transcription provided by CART. C-Print captioners are trained in text-condensing strategies that…
CART (also: Communication Access Realtime Translation, Computer-Aided Real-Time Translation): A real-time captioning service in which a trained stenographer uses a specialized keyboard to transcribe spoken language into text as it is spoken, typically achieving accuracy rates above 98%. CART is considered the gold standard for real-time captioning accuracy but is…
CART (also: Communication Access Realtime Translation, Real-Time Captioning, Realtime Captioning): A professional service providing instant, verbatim text display of spoken content, typically delivered by trained stenographers using specialized equipment. CART achieves accuracy rates of 98% or higher, far exceeding automatic speech recognition systems. It is commonly used in…
CART (also: Communication Access Real-Time Translation, Real-Time Captioning, Stenography): A real-time captioning service where a trained stenographer uses a specialized keyboard to transcribe speech into text as it is spoken, typically with only a few seconds of delay. CART provides word-for-word transcription of spoken content for deaf and hard of hearing…
CEA-708 (also: CTA-708, EIA-708, Digital Closed Captioning): A US standard for digital closed captioning on digital television broadcasts and streaming, superseding the analog-era CEA-608 standard. CEA-708 supports richer presentation than its predecessor, including multiple fonts, colours, opacity, text positioning, and up to 63 caption…
Caption Accuracy (also: Captioning Accuracy, Transcription Accuracy): A measure of how correctly captions represent the spoken content, typically expressed as the percentage of words that match the ground truth transcript. Caption accuracy is critical for deaf and hard of hearing users who depend on captions for comprehension, particularly in…
Caption Customization (also: Caption Personalization, Adaptive Captioning): The ability for viewers to adjust caption properties to match their individual preferences and needs. Caption customization can encompass visual attributes like font size, color, and positioning, as well as content-level attributes like level of detail, expressiveness, and sound…
Caption Delay (also: Caption Latency, Synchronization Delay): The time lag between spoken audio and the appearance of the corresponding caption on screen. In live captioning, typical delays are around 5–6 seconds due to the time needed for captioners to hear, process, and produce text plus transmission overhead. In fast-paced sports, such…
Caption Density: The amount of caption text displayed on screen relative to the available display time and screen space. High caption density—common in fast-paced scenes with many sound events—can overwhelm viewers by requiring rapid reading while also attending to visual content. Caption…
Caption Flow (also: Captioning Flow, Text Flow): The smoothness and regularity with which caption text appears and updates on screen during real-time captioning. Good caption flow means text arrives at a consistent pace without jarring delays, sudden bursts, or choppy delivery. Research shows that caption flow significantly…
Caption Occlusion (also: Caption Blocking, Subtitle Occlusion): The phenomenon where captions or subtitles visually block or cover other important information displayed on a video screen. Caption occlusion is a significant accessibility concern for Deaf and Hard of Hearing viewers, who depend on captions for dialogue access but may…
Caption Placement (also: Caption Positioning, Subtitle Placement): The decision of where captions or subtitles are positioned on a video screen, which significantly affects the viewing experience of Deaf and Hard of Hearing users. Poor caption placement can occlude important visual information such as speakers' faces, onscreen graphics, or news…
Caption Quality (also: Subtitle Quality): The overall fitness of a set of captions or subtitles for their intended accessibility purpose. Quality is multi-dimensional: it includes text accuracy (whether spoken words are correctly transcribed, commonly measured by Word Error Rate or the NER model), synchronicity with the…
Caption Readability: The ease with which viewers can read and process caption text on screen, influenced by factors including font size, display duration, caption density, reading speed requirements, and competition with on-screen visual content. Caption readability is a core accessibility concern…
Captioning Key (also: DCMP Captioning Key): A set of guidelines and best practices for creating high-quality captions, most notably published by the Described and Captioned Media Program (DCMP). The Captioning Key covers standards for caption accuracy, consistency, placement, and the representation of non-speech sounds.…
Closed Captioning (also: CC, Closed Captions): Text displayed on screen that represents dialogue, sound effects, music, and other audio information in video content, which viewers can toggle on or off. Unlike open captions, closed captions are a separate data stream that can be enabled or disabled by the viewer. Closed…
Closed Captioning (also: CC, Closed Captions): Text displayed on a screen that transcribes spoken dialogue, identifies speakers, and describes relevant sound effects in video content. Unlike open captions, which are permanently embedded in the video, closed captions can be toggled on or off by the viewer. Closed captioning is…
Colour Commentary (also: Color Commentary, Colour commentator): In sports broadcasting, the analytical and contextual commentary provided alongside the play-by-play, offering opinions, background on players and teams, strategy discussion, and remarks during gameplay pauses. Colour commentary conveys information that is not visually present…
Communication Access Realtime Translation (also: CART, Realtime Captioning, Stenographic Captioning): A captioning service where a trained professional uses a stenographic keyboard to transcribe spoken language into text in real time, producing near-verbatim captions. CART provides the highest accuracy among live captioning methods and includes speaker identification, tone of…
Crowdsourced Captioning (also: Crowd Captioning, Collaborative Captioning): Crowdsourced captioning is an approach to creating video captions or subtitles by distributing the work across multiple contributors rather than relying on a single professional captionist. This method can leverage diverse workers with varying language skills, hearing abilities,…
Expressive Captions (also: Affective Captions, Emotion Captions, Typographic Captions): Captions that go beyond literal word-for-word transcription to convey the prosodic, emotional, or speaker-identity information that traditional captions strip out. Expressive captions may modulate font weight, size, colour, position, or animation to signal loudness, pitch,…
Extra-Speech Information (also: ESI, Paralinguistic Information): Aspects of spoken language beyond the words themselves that convey additional meaning, including how something is said rather than what is said. Examples include tone of voice (yelling, whispering), vocal emotion (sarcasm, anger, joy), singing, the language being spoken, speaker…
Game Captioning (also: Video Game Captions, Gaming Subtitles, In-Game Captions): The practice of displaying text representations of dialogue, sound effects, and other audio content within video games for deaf and hard-of-hearing players. Game captioning differs from film or television captioning because games are interactive rather than passive; players…
Gaze Switching (also: Visual Attention Switching, Split Attention): The act of shifting visual focus between two or more information sources, such as between captions and presentation slides in a classroom, or between a sign language interpreter and a speaker. Gaze switching is particularly costly for deaf and hard of hearing students who rely…
Genre Alignment: In captioning, the practice of adapting caption style, vocabulary, and tone to match the genre of the media content being captioned. For example, horror content may benefit from captions that emphasize tension and dread, while comedy content may use lighter, more playful…
Graphic Captions (also: Visual Captions, Animated Captions): A captioning approach that uses visual elements such as GIFs, animated stickers, icons, or emojis to represent sounds in audio-visual content, as an alternative or complement to traditional text-based bracket notation. Graphic captions can convey additional information about a…
Hybrid Captioning (also: AI-Augmented Captioning, Blended Captioning): A captioning approach that combines human-generated captions with AI-powered correction or enhancement to achieve higher accuracy than either method alone. Hybrid systems leverage the reliability and contextual awareness of trained human captioners while using automatic speech…
Keyword Reading Strategy (also: Content Word Strategy): The keyword reading strategy is a sentence-comprehension approach in which a reader focuses primarily on high-content words (nouns, verbs, adjectives, and adverbs) to derive the meaning of a sentence, while paying less attention to function words (determiners, prepositions, and…
Latency (also: Delay, Lag, Response Time): The time delay between when an event occurs and when its accessible representation is delivered to the user. In real-time captioning, latency is the gap between spoken words and their appearance as text, typically measured in seconds. In screen readers and other assistive…
Live Captioning (also: Real-Time Captioning, Live Captions): The process of converting spoken language into text displayed in real time, enabling Deaf and hard of hearing individuals to follow live audio content such as meetings, lectures, broadcasts, and events. Live captioning may be performed by human stenographers (CART providers),…
Live Captioning (also: Real-Time Captioning, CART): The process of creating captions in real time as audio content is being produced, rather than from a pre-existing script. Live captioning is used in television news broadcasts, live events, videoconferences, and classrooms. It presents unique challenges including a natural…
Logocentrism: In captioning studies, the systematic prioritization of speech and spoken language over non-speech sounds in captioning practices and technologies. Logocentrism in captioning manifests as speech captions receiving more attention, resources, and technical development than…
NER Model (also: Number, Edition, Recognition Model, NER Accuracy Model): A caption-quality evaluation model developed by Pablo Romero-Fresco and Juan Martínez Pérez for measuring the accuracy of live subtitling and respeaking. Unlike Word Error Rate, which penalises all errors equally, the NER model weights each error by how much it affects the…
Non-Speech Captions (also: Non-Speech Sound Captions, Non-Dialogue Captions): Textual descriptions of non-speech audio elements in media content, including environmental sounds, music, and sound effects, displayed as part of closed or open captions. Non-speech captions are essential for Deaf and Hard of Hearing viewers to access auditory information…
Non-Speech Information (also: NSI, Non-Dialogue Audio Information): Any audio content in media that is not spoken dialogue, including environmental sounds, music, sound effects, and ambient noise. Non-speech information plays a critical role in storytelling by conveying mood, indicating off-screen events, and providing contextual cues. For…
Non-Speech Sounds (also: Non-Speech Audio, Sound Effects): Auditory content in media that is not spoken dialogue, including music, environmental noises, sound effects, laughter, applause, and other ambient sounds. Non-speech sounds carry important narrative, emotional, and contextual information that contributes to a viewer's…
Onomatopoeia: Words that phonetically imitate or suggest the sound they describe, such as "buzz," "crash," "swoosh," or "sizzle." In captioning, onomatopoeia is one approach to representing non-speech sounds, offering viewers a sense of the acoustic quality of a sound. However, research shows…
Open Captioning (also: Open Captions, Burned-In Captions): Captions that are permanently embedded into the video image and cannot be turned off by the viewer. Unlike closed captions, open captions are part of the visual content itself, making them visible to all viewers regardless of device or platform support. Open captions are…
Open Captions (also: Burned-in Captions, Hard-coded Captions): Captions that are permanently embedded into a video and cannot be turned off by the viewer. Unlike closed captions, which can be toggled on or off, open captions are always visible as part of the video image itself. Open captions are sometimes used when a platform does not…
Paralinguistic Cues (also: Paralanguage, Paralinguistic Features, Non-verbal Vocal Cues): Aspects of spoken communication that carry meaning beyond the literal words themselves: tone of voice, pitch contour, loudness, rhythm, tempo, stress, pauses, and voice quality. Paralinguistic cues convey emotion, emphasis, sarcasm, uncertainty, speaker identity, and social…
Participatory Captioning: A framework proposed by Nguyen et al. (2026) that characterises social media video captioning as a collaborative, community-sustained infrastructure co-produced by viewers, creators, and platforms — rather than a top-down accessibility feature delivered unilaterally.…
Play-by-Play (also: Play-by-play announcing, Play-by-play commentary): In sports broadcasting, the moment-to-moment verbal description of on-screen action provided by the main commentator (e.g., who has the puck, who is passing to whom). Because play-by-play describes what sighted viewers can see, it largely duplicates visual information for Deaf…
Pop-on Captions (also: Pop-on style, Block captions): A captioning display style in which a complete caption appears on screen as a single block, remains visible for a readable duration, and is then replaced in one transition by the next block. Pop-on captions let viewers "glance and grab" an entire sentence at once, which viewer…
Rapid Serial Visual Presentation (also: RSVP): A text display method in which words or short phrases are shown one at a time in a fixed location on screen in quick succession, eliminating the need for eye movements (saccades) between words. RSVP was first proposed in the 1950s for reading research and adapted for practical…
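
Worked example for the Caption Accuracy entry: that entry describes accuracy as the percentage of words that match a ground-truth transcript. One common way to compute such a figure is from a word-level edit distance, which is also the basis of Word Error Rate. The sketch below is illustrative only. The function names `word_errors` and `caption_accuracy` are hypothetical, the lowercase-and-split-on-whitespace normalization is an assumption, and it is not the weighted scoring used by the NER model or by ACE.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, insertions, and deletions
    needed to turn the hypothesis into the reference (Levenshtein distance
    over words). Normalization here is an assumption: lowercase, split on
    whitespace, punctuation left attached to words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from captions
                            curr[j - 1] + 1,      # extra word in captions
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def caption_accuracy(reference: str, hypothesis: str) -> float:
    """Word-level accuracy as a percentage, i.e. 100 * (1 - WER),
    floored at 0 because insertions can push WER above 1."""
    n = len(reference.split())
    if n == 0:
        raise ValueError("reference transcript is empty")
    errors = word_errors(reference, hypothesis)
    return max(0, n - errors) * 100 / n


if __name__ == "__main__":
    truth = "the quick brown fox jumps"
    captions = "the quick brown box jumps"
    print(caption_accuracy(truth, captions))
```

Here one substituted word out of five reference words gives an accuracy of 80%. Because every error counts equally, a score like this says nothing about which errors actually hurt comprehension, which is the gap the NER Model and Automatic Caption Evaluation entries describe.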