Glossary

Terms used in accessibility research and practice. Each entry has a definition, common aliases, and category tags.

Search results

360 Video(also: 360-Degree Video, Spherical Video, Omnidirectional Video): Video content recorded or rendered to capture a full spherical or panoramic field of view, allowing viewers to look in any direction by turning their head (in a VR headset) or by dragging the view (on a screen). Unlike traditional video where a director controls the frame, 360…
AD Guidelines(also: Audio Description Guidelines, AD Standards): Established rules and best practices that govern the creation of audio descriptions for video and live performances. AD guidelines cover aspects such as what to describe (actions, characters, settings, on-screen text), language style (present tense, third person, objective),…
AD Personalization(also: Audio Description Customization, Personalized Audio Description): The practice of tailoring audio descriptions to individual user preferences rather than providing a one-size-fits-all narration. Personalization can include varying the level of detail (concise vs. comprehensive), focus (character-driven vs. environment-driven), interpretation…
AD Timing(also: Audio Description Timing, AD Placement): The process of determining when audio descriptions should be inserted into video content. Effective AD timing requires identifying natural pauses in dialogue and significant audio where descriptions can be placed without overlapping important sound. Automated AD timing systems…
Audio Description(also: AD, Descriptive Video, Video Description): A narration track added to video content that describes important visual information for people who are blind or have low vision. Audio descriptions are inserted during natural pauses in dialogue and other audio, conveying key visual elements such as actions, scene changes,…
Audio Description Authoring(also: AD Authoring, AD Creation, Description Writing): The process of writing and producing audio descriptions for video content, live performances, or other visual media. AD authoring involves watching content, identifying key visual elements, writing concise and objective descriptions, timing them to fit within available gaps, and…
Automatic Captions(also: Auto-Generated Captions, Auto Captions, ASR Captions): Captions produced by automatic speech recognition (ASR) systems without human transcription, typically generated by the hosting platform (e.g., YouTube, Zoom, Microsoft Teams) as an optional layer on uploaded or live video. Automatic captions have dramatically expanded caption…
CEA-708(also: CTA-708, EIA-708, Digital Closed Captioning): A US standard for digital closed captioning on digital television broadcasts and streaming, superseding the analog-era CEA-608 standard. CEA-708 supports richer presentation than its predecessor, including multiple fonts, colours, opacity, text positioning, and up to 63 caption…
Caption Quality(also: Subtitle Quality): The overall fitness of a set of captions or subtitles for their intended accessibility purpose. Quality is multi-dimensional: it includes text accuracy (whether spoken words are correctly transcribed, commonly measured by Word Error Rate or the NER model), synchronicity with the…
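As an illustration of the text-accuracy dimension mentioned above, here is a minimal sketch of Word Error Rate: the word-level edit distance between a reference transcript and the caption text, divided by the number of reference words. The function name `word_error_rate` and the normalisation (lowercasing, whitespace splitting) are assumptions made for this example; real evaluations usually apply more careful text normalisation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words."""
    # Assumed normalisation: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a c d")` counts one deleted word out of four reference words. Note that WER treats every error as equally costly, which is the limitation the NER model entry below addresses.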
Chroma Key(also: Green Screen, Blue Screen, Chroma Keying): A video post-production technique in which a solid, uniformly coloured background (often green or blue) is replaced with another image, video, or transparency using colour-matching software. In accessibility work, chroma key is most often encountered in the production of…
Closed Interpreting(also: Closed Sign Language Interpreting): A proposed accessibility feature for video content where a sign language interpreter video can be toggled on or off and displayed alongside the main video, analogous to closed captions for text. Unlike embedded "open" interpreters that are permanently part of the video, closed…
Crowdsourced Captioning(also: Crowd Captioning, Collaborative Captioning): Crowdsourced captioning is an approach to creating video captions or subtitles by distributing the work across multiple contributors rather than relying on a single professional captionist. This method can leverage diverse workers with varying language skills, hearing abilities,…
Description Variation(also: AD Variation, Alternative Description): Multiple versions of audio descriptions for the same video content, each reflecting different stylistic choices, levels of detail, or narrative focuses. Description variations recognize that blind and low-vision (BLV) users have diverse preferences and that a single description cannot serve all needs…
Educational Video(also: Instructional Video, Video Lecture): Video content created to teach - including talking-head lectures, screencasts, animations, hand-drawn (Khan-style) explanations, recorded classroom sessions, programming/coding demonstrations, interviews, and slide-based presentations. Accessibility of educational video depends…
Embedded Description(also: Inline Description, Integrated Description): A technique for making presentation content accessible where the speaker verbally describes relevant visual information on slides — including text, images, graphics, and other visual aids — as part of their narration during the presentation itself. Unlike audio descriptions…
Extended Audio Description(also: Extended Description): A form of audio description in which the video playback is paused to allow time for a description that would not otherwise fit within natural gaps in the audio track. Extended audio descriptions are used when the density of dialogue or other important audio leaves insufficient…
Frame Rate(also: Frames Per Second, FPS, Frame Frequency): Frame rate is the number of still images (frames) displayed or captured per second in a video stream, usually measured in frames per second (fps). Common values include 24 fps (cinema), 30 fps (US broadcast), and 60 fps (high-motion content); video calling and streaming systems…
Freeze Frame(also: Video Thumbnail, ASL Freeze Frame): In the context of ASL video interfaces, a freeze frame is a static image captured at a recognizable moment of an ASL sign, used as a visual label or thumbnail for video content. Freeze frames allow Deaf users to quickly scan, identify, and select content without watching full…
Keyframe(also: Key Frame): A keyframe is a single representative frame selected from a video scene or shot that best captures the essential visual content of that segment. In automated audio description and video captioning systems, keyframe selection is a critical step — the chosen frame is analyzed by…
Lecture Capture(also: Lecture Recording, Classroom Recording): The process of recording classroom lectures, presentations, or educational sessions using video, audio, and screen capture technology for later review by students. Lecture capture systems range from simple single-camera recordings to multi-camera setups that capture the…
Live Description(also: Real-Time Description, Live Audio Description): The practice of providing descriptions of visual content in real time as events unfold, as opposed to scripted descriptions added during post-production of recorded media. Live description is used in contexts such as livestreaming, live theatre, sporting events, and…
Livestream Accessibility(also: Live Video Accessibility): The practice of making live video broadcasts accessible to people with disabilities, particularly viewers with visual or hearing impairments. Livestreams present unique accessibility challenges because they feature multiple simultaneous visual elements (main video, webcams,…
Minimum Viable Description(also: MVD): Minimum viable description (MVD) is an emerging framework for audio description that establishes the foundational level of visual information needed to provide equal access to video content without introducing bias or cognitive overload. Rather than attempting to describe…
Momentous Depiction: A conceptual framework proposed by Niu, Clements, and Kim (2026) for using generative AI to visualize critical moments that convey the insights and meanings of disability in storytelling videos. The framework identifies four core GenAI affordances that support or constrain…
Motion Design(also: Motion Graphics, Motion-Driven Design): The practice of animating graphic elements - text, icons, diagrams, captions - in time-based media to communicate instructional content. In accessible educational video, motion design is used to guide visual attention, sequence information, and pace the presentation of captions…
NER Model(also: Number, Edition, Recognition Model; NER Accuracy Model): A caption-quality evaluation model developed by Pablo Romero-Fresco and Juan Martínez Pérez for measuring the accuracy of live subtitling and respeaking. Unlike Word Error Rate, which penalises all errors equally, the NER model weights each error by how much it affects the…
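A minimal sketch of the NER calculation follows. It assumes the commonly published form of the model, accuracy = (N − E − R) / N × 100, where N is the number of words in the subtitles, E is the weighted sum of edition errors, and R is the weighted sum of recognition errors. The severity weights used here (0.25 minor, 0.5 standard, 1.0 serious) and the function name `ner_accuracy` are not stated in the entry above and are labelled as assumptions.

```python
# Assumed severity weights; not given in the glossary entry itself.
SEVERITY_WEIGHTS = {"minor": 0.25, "standard": 0.5, "serious": 1.0}


def ner_accuracy(n_words, edition_errors, recognition_errors):
    """Return NER accuracy as a percentage.

    n_words            -- N, the number of words in the subtitles
    edition_errors     -- list of severity labels for edition errors
    recognition_errors -- list of severity labels for recognition errors
    """
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    e = sum(SEVERITY_WEIGHTS[s] for s in edition_errors)      # weighted E
    r = sum(SEVERITY_WEIGHTS[s] for s in recognition_errors)  # weighted R
    return (n_words - e - r) / n_words * 100
```

With 200 words, one minor and one standard edition error, and one serious recognition error, the weighted penalty is 0.25 + 0.5 + 1.0 = 1.75, so a single serious error costs more than the two lesser errors combined.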
Narrative Style(also: Descriptive Style, AD Voice): The distinctive approach a describer takes when writing audio descriptions, encompassing choices about language formality, emotional tone, level of interpretation, detail density, and pacing. Narrative style in audio description ranges from strictly objective and impersonal to…
Non-diegetic Sound(also: Non-diegetic Audio, Extradiegetic Sound): Sound in film, television, or games that does not originate from any source within the story world and cannot be heard by the characters - for example, orchestral score, voice-over narration, or added accessibility cues. This contrasts with diegetic sound, which exists in the…
On-Screen Text(also: OST, Visual Text, Burned-In Text): Text that appears visually within video content, including titles, subtitles, captions, signs, labels, credits, and any other written information displayed on screen. On-screen text must be read aloud or referenced in audio descriptions to ensure BLV users have access to this…
Open Captions(also: Burned-in Captions, Hard-coded Captions): Captions that are permanently embedded into a video and cannot be turned off by the viewer. Unlike closed captions, which can be toggled on or off, open captions are always visible as part of the video image itself. Open captions are sometimes used when a platform does not…
Participatory Captioning: A framework proposed by Nguyen et al. (2026) that characterises social media video captioning as a collaborative, community-sustained infrastructure co-produced by viewers, creators, and platforms — rather than a top-down accessibility feature delivered unilaterally.…
Picture-in-Picture(also: PiP, PIP): A display technique that shows a smaller video or content window overlaid on the main content, allowing viewers to see two sources simultaneously. In accessibility contexts, picture-in-picture is the primary method for presenting sign language interpretation in video and…
Scene Change Detection(also: Shot Boundary Detection, Scene Transition Detection): An automated technique for identifying transitions between different scenes or shots in video content by analyzing visual differences between consecutive frames. In audio description workflows, scene change detection helps determine optimal moments for inserting descriptions, as…
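One simple way to analyse visual differences between consecutive frames is frame differencing, sketched below. This is only one of several possible techniques and is not necessarily what any particular system uses. Frames are assumed to be equal-length flat sequences of grayscale values (0 to 255); the function name `detect_scene_changes` and the default threshold of 30 are hypothetical choices for illustration.

```python
def detect_scene_changes(frames, threshold=30.0):
    """Return indices of frames that start a new shot.

    frames    -- sequence of equal-length sequences of grayscale values
    threshold -- assumed cut-off on the mean absolute pixel difference
    """
    boundaries = []
    for i in range(1, len(frames)):
        prev, curr = frames[i - 1], frames[i]
        if len(prev) != len(curr) or not curr:
            raise ValueError("frames must be non-empty and the same size")
        # Mean absolute difference between corresponding pixels.
        mad = sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)
        if mad > threshold:
            boundaries.append(i)
    return boundaries
```

A fixed threshold like this catches hard cuts but tends to miss gradual transitions such as fades and dissolves, and it can fire on fast camera motion, so it is best read as a baseline.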
Scene Segmentation(also: Scene Detection, Shot Boundary Detection): Scene segmentation is the process of automatically dividing a video into discrete scenes or segments based on visual changes such as cuts, transitions, or the appearance of new elements in the frame. In the context of accessibility, scene segmentation is a foundational component…
Signer(also: Sign Language User, Signing Person): A person who communicates using sign language. In accessibility contexts, signers may be deaf, hard of hearing, or hearing individuals (such as interpreters, children of deaf adults, or others who have learned sign language). When creating accessible video content, signers…
Signer Box(also: Signing Space, Sign Space): The three-dimensional space in front of a sign language user within which signs are produced, typically extending from the waist to just above the head and about an arm's width to either side. The signer box is a critical concept in sign language video production, video…
Silent Gap Detection(also: Silence Detection, Audio Gap Detection): An automated technique for identifying periods of silence or absence of speech in audio tracks, used in audio description workflows to find natural insertion points for descriptions. Silent gap detection distinguishes between complete silence (no sound at all) and non-speech…
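The idea can be sketched with a basic energy threshold: split the audio into short windows, compute the root-mean-square (RMS) level of each, and report runs of quiet windows that last long enough to hold a description. This sketch only finds low-energy stretches; it does not separate speech from non-speech audio such as music. The function name `find_silent_gaps`, the parameter defaults, and the assumption that samples are floats in the range -1.0 to 1.0 are all illustrative.

```python
import math


def find_silent_gaps(samples, sample_rate, window_s=0.05,
                     rms_threshold=0.01, min_gap_s=0.5):
    """Return a list of (start_seconds, end_seconds) quiet stretches."""
    window = max(1, int(sample_rate * window_s))
    gaps = []
    gap_start = None  # start time of the quiet run in progress, if any

    for offset in range(0, len(samples), window):
        chunk = samples[offset:offset + window]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        t = offset / sample_rate
        if rms < rms_threshold:
            if gap_start is None:
                gap_start = t
        elif gap_start is not None:
            # A loud window ends the quiet run; keep it if long enough.
            if t - gap_start >= min_gap_s:
                gaps.append((gap_start, t))
            gap_start = None

    # Close a quiet run that extends to the end of the audio.
    end = len(samples) / sample_rate
    if gap_start is not None and end - gap_start >= min_gap_s:
        gaps.append((gap_start, end))
    return gaps
```

The `min_gap_s` parameter reflects the practical point that a gap is only useful if a spoken description can fit inside it.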
Social Media Video Captions(also: SMVC): An umbrella term for the textual or symbolic elements — platform-generated captions, creator-edited captions, user-generated captions, and non-speech information such as sound effects, music cues, or onomatopoeia — that are temporally aligned with video content on social media…
Sound Design(also: Audio Design): The craft of creating, selecting, and arranging audio elements - dialogue, music, ambient sound, foley, and effects - to shape the experience of a film, game, broadcast, or interactive product. For accessibility, sound design is doubly important: it carries narrative and…
Spatiotemporal Saliency(also: Spatiotemporal Saliency Estimation, Spatio-Temporal Saliency): A computer vision technique that estimates, for each pixel in a video, how visually important it is at a given moment by combining spatial contrast (features that stand out within a frame) with temporal contrast (regions that change or move differently from their recent…
Speech Gap(also: Dialogue Gap, Audio Gap): A pause or silence between lines of spoken dialogue in a video or film where audio descriptions can be inserted without overlapping with the original soundtrack. Identifying speech gaps is a critical first step in audio description production, as descriptions must fit within these…
Subtitle(also: Subtitles, Open captions (video), Movie subtitles): On-screen text that reproduces the spoken dialogue of a video, most commonly rendered in a "movie subtitle" style (white text with a black outline, one or two lines at the bottom of the frame). Subtitles are closely related to captions but are conventionally distinguished in…
Subtitles(also: Captions, Closed Captions, CC): Text displayed on screen that represents the spoken dialogue and other relevant audio information in video content. Subtitles (called captions in North America) are essential for deaf and hard of hearing viewers but are also widely used by hearing audiences in noisy…
Synthesized Video Description(also: TTS Video Description, Text-to-Speech Description, Synthesized Audio Description): An audio description for video content that is generated using text-to-speech (TTS) technology rather than recorded by a human narrator. A describer writes a text script describing the visual elements of a video, and speech synthesis software converts this text into spoken…
Talking-Head Video(also: Talking Head): A common educational video format in which a presenter speaks directly to the camera, typically filling the frame, with few or no accompanying visuals. For d/Deaf and Hard-of-Hearing learners, talking-head videos are often low in useful visual content - the speaker's face must…
Text-to-Sound(also: Text-to-Audio, TTA, Sound Generation from Text): A class of generative AI models that synthesize non-speech audio - sound effects, ambient environments, foley, or short music clips - from a natural-language description such as 'a door creaking shut' or 'cloth ruffling as a coat is removed'. Distinct from text-to-speech, which…
Text-to-Video(also: T2V, Text-to-Video Generation): A class of generative AI models that produces short video clips from natural-language prompts (and sometimes reference images). Examples at the time of writing include Runway Gen, OpenAI Sora, Google Veo, and Pika. For accessibility, text-to-video raises both opportunities —…
Tracked Captions(also: Speaker-following captions, Dynamic captions): Captions that move dynamically within the video frame to stay near the current speaker's face or mouth, rather than remaining anchored at a fixed position (typically the bottom of the video). Tracked captions reduce the visual effort required for Deaf and Hard-of-Hearing viewers…
User-Generated Captions(also: UGC captions): Captions created and added to video content by non-professional contributors — typically the video's own creator or community members — rather than by professional captioners or fully automated systems. On social media, user-generated captions are often implemented as open…
Video Enrichment(also: Enriched Video, Video Augmentation): The process of augmenting video content with additional elements such as captions, audio descriptions, images, audio cues, hyperlinks, or tactile outputs to make it more accessible or informative for different audiences. Unlike traditional approaches where added elements are…
