Glossary

Terms used in accessibility research and practice. Each entry has a definition, common aliases, and category tags.

Search results

ACCMD(also: ACCessibility MetaData, IMS ACCessibility MetaData): A metadata specification developed by the IMS Global Learning Consortium for describing the accessibility characteristics of learning resources. ACCMD provides a structured way to document whether a resource contains auditory, visual, textual, or tactile information, and to…
Affective Captions(also: Affective Captioning, Emotive Captions): Captions that convey not only the spoken words but also the emotional qualities of speech — such as valence (positive vs. negative tone) and arousal (intensity) — typically through typographic modulations like font-color, font-weight, or font-size, and increasingly through…
Ambient Audio(also: Ambient Sound, Environmental Audio, Background Audio): The background sound of an environment — voices, traffic, water, wind, music, birdsong — captured incidentally rather than as the main focus of a recording. In accessible photography and audiophotography tools, ambient audio is often recorded automatically in the seconds leading…
Audio Description(also: AD, Descriptive Video, Video Description): A narration track added to video content that describes important visual information for people who are blind or have low vision. Audio descriptions are inserted during natural pauses in dialogue and other audio, conveying key visual elements such as actions, scene changes,…
Audio Description Script(also: AD Script, Video Description Script, Described Video Script): An audio description script is the written text that forms the basis of an audio description track for video content. The script contains narration that describes visual elements — including actions, scene changes, character appearances, on-screen text, and other visual…
Audio Interference(also: Audio Conflict, Speech Conflict): Audio interference in a digital accessibility context is the overlap of two or more sound streams in a user's environment such that one masks another — most commonly, auto-playing media audio on a webpage drowning out a screen reader's synthesized speech. Because most consumer…
Audio-Video Synchrony(also: AV Sync, Lip Sync, Audio-Video Synchronization): The temporal alignment between audio and video streams in multimedia content or real-time communication. When audio and video are not properly synchronized, the mismatch can significantly impair speech understanding for people with hearing loss who rely on lipreading to…
Audiophotography(also: Audiophotograph, Audio Photograph, Sound Photograph): A medium proposed by Frohlich and Tallyn in which a photograph is packaged together with an associated audio recording — typically ambient sound captured at the moment of the shutter, a spoken caption added afterwards, or both. For accessibility practice the audiophotograph is a…
Caption Highlighting(also: Text Highlighting in Captions, Keyword Highlighting in Captions): The visual emphasis of important words within video captions to help viewers quickly identify key concepts and reduce the cognitive load of reading dynamic text. Research with Deaf and Hard of Hearing users has found that underlining 5-15% of the most important words in captions…
Captioning(also: Captions, Subtitles for the Deaf and Hard of Hearing, SDH): The process of displaying synchronized text on screen that represents spoken dialogue, sound effects, and other audio information in video content. Unlike subtitles, captions are specifically designed for deaf and hard of hearing viewers and include non-speech sounds like [door…
Connected TV(also: Smart TV, Internet TV, CTV): A television set or set-top box that can connect to the Internet, providing access to interactive features beyond traditional broadcast content including streaming applications, electronic programme guides, web browsing, and app stores. Connected TVs present significant…
Extended Audio Description(also: Extended Description): A form of audio description in which the video playback is paused to allow time for a description that would not otherwise fit within natural gaps in the audio track. Extended audio descriptions are used when the density of dialogue or other important audio leaves insufficient…
Facial Avatar(also: Singing head, Talking head avatar): A digital, animated representation of a face — typically rendered as a 3D or stylized 2D character from the neck up — driven by audio, video, or data signals to produce expressive facial behavior such as lip-sync, emotional expression, gaze, and head motion. In accessibility…
Frame Rate(also: Frames Per Second, FPS, Frame Frequency): Frame rate is the number of still images (frames) displayed or captured per second in a video stream, usually measured in frames per second (fps). Common values include 24 fps (cinema), 30 fps (US broadcast), and 60 fps (high-motion content); video calling and streaming systems…
HTML5 Track Element(also: <track> Element, Track Tag, HTML Track): The HTML5 <track> element is used to specify timed text tracks for <video> and <audio> elements, providing a standardized way to associate captions, subtitles, descriptions, chapters, and metadata with media content. Each <track> element specifies a kind (captions, subtitles,…
Immersive Video(also: Immersive Media, VR Video): Video content viewed through head-mounted displays or surrounding screens that creates a sense of being present within the recorded environment. Immersive video includes 360-degree video captured with omnidirectional cameras and computer-generated virtual reality content. In…
Media Fragments(also: Media Fragments URI, Media Fragment Identifier): Media Fragments is a W3C specification that defines a standard syntax for addressing specific portions of audio and video resources on the web using URI fragment identifiers. It allows users and applications to reference temporal segments (e.g., a specific time range within a…
Music Visualization(also: Music visualisers, Visual music): The representation of musical content — pitch, rhythm, timbre, dynamics, melody, lyrics, or emotion — through visual rather than auditory channels. Visualizations range from abstract mappings of audio features (spectrograms, particle systems, pulsing geometry, lyric typography)…
Narrative Engagement(also: Story Engagement): A multidimensional construct used in media studies and HCI research to capture how deeply a viewer is drawn into a story, including narrative understanding, attentional focus, narrative presence (the feeling of being inside the story world), and emotional engagement with…
Photo Sharing(also: Photograph Sharing, Image Sharing): The activity of showing, distributing, or discussing photographs with others — in person, via email, or through social-networking platforms. As a social practice it conveys memories, experiences, and identity; as an accessibility concern it presents barriers for blind and…
Quality of Perception(also: QoP): An evaluation framework from the multimedia-accessibility research literature for measuring how well a user can understand and use a media presentation, combining objective comprehension metrics (e.g., fact-recall or multiple-choice quiz accuracy) with subjective judgements…
Redundancy Principle: A principle from the Cognitive Theory of Multimedia Learning stating that people learn better from graphics and narration than from graphics, narration, and on-screen text presenting the same words, because presenting identical information in both spoken and written form…
SMIL(also: Synchronized Multimedia Integration Language): A W3C XML-based markup language for describing multimedia presentations that combine audio, video, text, images, and other media with precise temporal and spatial synchronization. SMIL is significant for accessibility because it includes a MediaAccessibility module that defines…
SRT(also: SubRip, SubRip Text, SRT Subtitle Format): SRT (SubRip Text) is a widely used plain-text subtitle file format originally created by the SubRip software for extracting subtitles from DVDs. An SRT file contains sequentially numbered subtitle entries, each with a time range (start and end timestamps in…
Split Attention(also: Split-Attention Effect, Divided Attention): A cognitive phenomenon in multimedia learning where users must divide their visual attention between multiple information sources presented simultaneously. In accessibility contexts, this is particularly challenging for Deaf and Hard of Hearing viewers of captioned videos, who…
Split Attention Effect(also: Split Attention): A cognitive load phenomenon in multimedia learning in which learners must divide visual attention between two or more sources of information that should be integrated, for example captions at the bottom of the screen and a diagram in the centre. The cost of switching and mentally…
Streaming Media(also: Streaming Audio, Streaming Video, Media Streaming): Streaming media is audio or video content delivered to a user in a continuous flow from a server, played back as it arrives rather than waiting for a complete download. Because streaming content produces transient sound and images, and often begins auto-playing as soon as a page…
Subtitle(also: Subtitles, Open captions (video), Movie subtitles): On-screen text that reproduces the spoken dialogue of a video, most commonly rendered in a "movie subtitle" style (white text with a black outline, one or two lines at the bottom of the frame). Subtitles are closely related to captions but are conventionally distinguished in…
Tactile Captions(also: Haptic Captions, Vibrotactile Captions): An enhanced captioning approach that supplements traditional text-based captions with vibrotactile feedback, allowing deaf and hard of hearing viewers to feel non-speech sounds (such as phone rings, doorbells, footsteps, or objects falling) through a wrist-worn or body-worn…
Teletext(also: Ceefax, Oracle): A text-based information service broadcast within the television signal that allowed viewers to access pages of text and simple graphics using their TV remote control. Originating in the UK with the BBC's Ceefax service in 1974, teletext provided news, weather, sports results,…
Timed Text(also: Timed Text Markup Language, TTML, DFXP): Timed Text Markup Language (TTML) is a W3C standard for representing timed text content such as captions, subtitles, and other text synchronized with audio or video media. Originally developed as the Distribution Format Exchange Profile (DFXP), TTML provides an XML-based format…
Tiresias(also: Tiresias Screenfont, Tiresias Font Family): A family of typefaces developed in 1998 by the Royal National Institute of Blind People (RNIB) specifically designed for legibility on screen displays, particularly television subtitling. Named after the blind prophet of Greek mythology, Tiresias became one of the most widely…
Transcript(also: Text Transcript, Video Transcript, Audio Transcript): A written document containing the complete text of spoken content from a video or audio recording, presented separately from the media rather than synchronized with it. Unlike captions, which appear on-screen in real time as speech occurs, transcripts provide all text at once,…
Transcripts(also: Transcript, Text Transcript): A written, text-based representation of spoken audio or audiovisual content. WCAG 2.1 Success Criterion 1.2.1, Audio-only and Video-only (Prerecorded), requires an alternative for time-based media (typically a transcript) for prerecorded audio-only content such as podcasts,…
Video Annotation(also: Video Metadata Annotation, Multimedia Annotation): Video annotation is the process of adding supplementary information — such as text descriptions, captions, audio descriptions, or semantic labels — to specific segments or elements of a video. In accessibility contexts, video annotations provide the additional layers of…
Video Enrichment(also: Enriched Video, Video Augmentation): The process of augmenting video content with additional elements such as captions, audio descriptions, images, audio cues, hyperlinks, or tactile outputs to make it more accessible or informative for different audiences. Unlike traditional approaches where added elements are…
Video-Based Learning(also: VBL, video-based instruction, VBI): The use of pre-recorded or streaming video as a primary medium for teaching skills, procedures, or concepts, ranging from YouTube how-to tutorials and MOOCs to specialised instructional content like safety training and vocational education. Video-based learning offers self-paced…
Video-to-Haptics(also: Video to Haptics, V2H): A class of techniques that automatically generate haptic feedback (typically vibrotactile or force cues) from visual content in video, so that viewers feel sensations synchronised with what they see. Video-to-haptics offers a non-visual channel for conveying motion, impact, and…
WebVTT(also: Web Video Text Tracks, Web Video Text Tracks Format): WebVTT (Web Video Text Tracks) is a W3C text format for providing timed text tracks (including captions, subtitles, descriptions, chapters, and metadata) synchronized with HTML5 <video> and <audio> elements. WebVTT evolved from the earlier SRT subtitle format,…

39 results.
