Glossary

Terms used in accessibility research and practice. Each entry has a definition, common aliases, and category tags.

Search results

3D Audio(also: Three-Dimensional Audio, Binaural Audio, Immersive Audio): Audio technology that creates the perception of sound sources positioned in three-dimensional space around the listener, including above, below, and at varying distances. 3D audio uses head-related transfer functions (HRTFs), interaural time differences, and distance-based…
Acoustic Event Detection(also: Sound Event Detection, Audio Event Detection, Sound Event Classification): The automated process of identifying and classifying specific sounds within an audio stream, such as recognizing a phone ringing, door knocking, fire alarm, or speech from continuous environmental audio. Acoustic event detection systems use machine learning trained on labeled…
Adaptive Multi-Rate Codec(also: AMR, AMR Codec, AMR-NB): A family of audio codecs used in mobile telephony to encode voice for transmission. AMR-NB (narrowband) operates at 300-3,400 Hz with bit rates from 4.75-12.2 kbps, while AMR-WB (wideband, also called HD Voice) extends to 50-7,000 Hz at 6.6-23.85 kbps. AMR-WB is adopted by 3GPP…
Ambient Audio(also: Ambient Sound, Environmental Audio, Background Audio): The background sound of an environment — voices, traffic, water, wind, music, birdsong — captured incidentally rather than as the main focus of a recording. In accessible photography and audiophotography tools, ambient audio is often recorded automatically in the seconds leading…
AsTeR(also: Audio System for Technical Readings): An interactive computing system developed by T. V. Raman in his 1994 PhD thesis at Cornell University that converts LaTeX documents into navigable audio documents. AsTeR parses electronic documents into a tree structure that listeners can interactively browse, enabling…
Audio Description Script(also: AD Script, Video Description Script, Described Video Script): An audio description script is the written text that forms the basis of an audio description track for video content. The script contains narration that describes visual elements — including actions, scene changes, character appearances, on-screen text, and other visual…
Audio Formatting(also: Audio Rendering): The process of converting structured electronic documents into audio output that conveys not just textual content but also the logical structure and formatting of the original document. Audio formatting uses synthesizer parameters such as pitch, stereo positioning, speaking…
Audio Game(also: Audiogame, Audio-Based Game, Accessible Game): A video game designed primarily or entirely around audio output rather than visual graphics, making it accessible to players who are blind or have visual impairments. Audio games use techniques such as 3D spatial audio, sound effects, text-to-speech, and musical cues to convey…
Audio Guide(also: Audio Tour, Audio Description Tour, Museum Audio Guide): A portable or installed audio system that provides spoken descriptions, narratives, or contextual information about exhibits in a museum, gallery, or cultural venue. Audio guides range from traditional handheld devices with numbered stops to smartphone apps with…
Audio-to-Haptics Translation(also: Audio-haptic translation, Audio-to-vibration conversion): A class of techniques that convert audio signals — either recordings of real-world interactions or AI-generated sounds — into vibrotactile patterns that can be rendered through actuators embedded in phones, tablets, wearables, or specialized haptic displays. Because the…
Auditory Icon(also: Audio Icon): A non-speech sound used in a user interface that represents an object, action, or event by mimicking its real-world sound — for example, the sound of crumpling paper to indicate deleting a file, or a camera shutter sound for taking a screenshot. Auditory icons rely on causal…
Bone Conduction Headphones(also: Bone Conducting Headphones, Bonephones): Audio devices that transmit sound through the bones of the skull directly to the inner ear, bypassing the outer and middle ear. Unlike traditional headphones, bone conduction headphones leave the ear canal open, allowing users to hear environmental sounds while receiving audio…
Bone-Conduction Headset(also: Bone-conduction headphones, Bone-conduction earphones): A headphone that delivers sound by vibrating the bones of the skull and jaw rather than sending sound through the air in the ear canal, leaving the wearer's ears uncovered and able to hear ambient sound. Bone-conduction headsets are widely used in blind and low-vision navigation contexts…
Causal Listening: A mode of listening, identified by composer and theorist Pierre Schaeffer, in which the listener focuses on identifying the source or cause of a sound — for example, hearing crumpling paper and recognising it as something being discarded, or hearing a camera shutter and…
Critical Listening(also: Analytical Listening, Active Listening): Critical listening is the skill of analytically evaluating audio content to identify specific qualities such as tonal balance, clarity, spatial positioning, dynamic range, and technical flaws like distortion or noise. In audio production, critical listening is a core…
Cymatics: The study of visible patterns and shapes created when sound vibrations pass through physical media such as water, sand, or metal plates. Cymatic patterns are deterministic — the same frequency produces the same pattern — creating a predictable visual representation of sound. In…
Diegetic Sound(also: In-World Sound, Source Sound): Sound that originates from a source within the narrative world of a game, film, or virtual reality environment — meaning the characters or inhabitants of that world could theoretically hear it. Examples include a phone ringing, a dog barking, footsteps, a crackling fire, or a…
Diegetic Sound(also: In-Game Sound, In-World Sound): Sound that originates from within the world of a game, film, or virtual environment—sounds that characters within that world could theoretically hear. In gaming and VR, diegetic sounds include environmental audio (footsteps, ambient noise, machinery), character dialogue, and…
Equalization(also: EQ, Audio Equalization, Adaptive Equalization): The process of adjusting the balance of frequency components in an audio signal by boosting or attenuating specific frequency bands. In accessibility contexts, adaptive equalization can be used to compensate for background noise by selectively boosting frequencies that are being…
Head-related transfer function(also: HRTF): A response function that describes how sound from a specific point in space is filtered by the shape of the outer ear, head, and torso before reaching the eardrum. HRTFs are unique to each individual and are used in spatial audio rendering to create realistic 3D sound over…
Mean Opinion Score(also: MOS, MOS Score): A standardized measure of perceived audio or video quality, rated on a scale from 1 (bad) to 5 (excellent). In telecommunications research, MOS is commonly used to assess speech quality as experienced by listeners. Participants rate samples, and scores are averaged to produce…
Mood(also: Affect, Affective State): In affective computing and music research, the emotional quality a stimulus evokes in a listener or viewer, commonly characterized along dimensions such as valence (pleasant–unpleasant) and arousal (calm–energetic). Mood is a core target for music information retrieval systems…
Movement Sonification(also: Motion Sonification): The practice of mapping qualities of physical movement (such as speed, direction, duration, or weight) to non-verbal sound cues so that movement can be perceived auditorily. In accessibility contexts, movement sonification can convey information about body motion to blind and…
Narrowband Audio(also: NB Audio, Standard Definition Voice): Audio transmission limited to the frequency range of approximately 300-3,400 Hz, which has historically been the standard for telephone networks (PSTN). While sufficient for basic speech intelligibility, narrowband audio excludes higher frequency consonant sounds that aid speech…
Non-diegetic Sound(also: Non-diegetic Audio, Extradiegetic Sound): Sound in film, television, or games that does not originate from any source within the story world and cannot be heard by the characters, such as an orchestral score, voice-over narration, or added accessibility cues. This contrasts with diegetic sound, which exists in the…
Object-Based Audio(also: OBA, Object-Based Broadcasting): An audio production and delivery paradigm in which speech, music, effects, and ambience are transmitted as discrete objects with metadata describing their role and relationships, rather than as a single mixed stream. The receiver renders the final mix, enabling per-listener…
Open Sound Control(also: OSC): An open, network-based protocol for communication between computers, sound synthesizers, and other multimedia devices, developed by Matthew Wright and Adrian Freed (1997). OSC sends human-readable address patterns and floating-point values over UDP/TCP, offering higher…
Pitch: The perceived highness or lowness of a sound, determined primarily by its fundamental frequency (measured in Hertz). Pitch is one of the primary dimensions along which music and speech are organized, underpinning melody, harmony, and prosody. In accessibility work, pitch is…
Podcast(also: Podcasting): An episodic, on-demand audio programme distributed over the internet, typically via RSS or proprietary platforms such as Spotify, Apple Podcasts, and BBC Sounds. Podcasts are a dominant form of long-form audio media — 92% of UK adults listen to some audio content weekly — but…
Psychoacoustics: The branch of perceptual psychology that studies how humans subjectively perceive sound: loudness, pitch, timbre, spatial location, foreground/background segregation, and masking. Psychoacoustic principles underpin accessible audio design: screen reader pacing, earcon and…
Rhythm-Action Game(also: Rhythm Game, Music Rhythm Game, Beat-Matching Game): A genre of video game in which players must make timed inputs (button presses, keystrokes, or physical movements) synchronised with musical beats or rhythmic patterns. Popular examples include Dance Dance Revolution, Guitar Hero, and PaRappa the Rapper. Rhythm-action games are…
Semantic Listening: A mode of listening, identified by composer and theorist Pierre Schaeffer, in which the listener focuses on decoding a coded audio signal to arrive at its intended message — for example, understanding a musical motif as representing a particular region or culture. Semantic…
Shepard Tone(also: Shepard Scale, Shepard-Risset Glissando): A psychoacoustic auditory illusion created by layering sine waves separated by octaves, producing the paradoxical perception of a tone that continuously rises (or falls) in pitch indefinitely, yet cycles back without apparent discontinuity. Named after cognitive scientist Roger…
Signal-to-Noise Ratio(also: SNR, S/N Ratio): A measure of the strength of a desired signal relative to background noise, expressed in decibels (dB). In accessibility, signal-to-noise ratio is critical for the effectiveness of auditory interfaces: if background noise is too high relative to device audio output, speech…
Sound Design(also: Audio Design): The craft of creating, selecting, and arranging audio elements (dialogue, music, ambient sound, foley, and effects) to shape the experience of a film, game, broadcast, or interactive product. For accessibility, sound design is doubly important: it carries narrative and…
Sound Visualization(also: Audio Visualization, Sound-to-Visual Mapping): The practice of representing audio information through visual means, enabling Deaf or Hard-of-hearing individuals to perceive sound-based information that would otherwise be inaccessible. Sound visualization goes beyond simple captioning to convey characteristics like loudness…
Sound localization(also: Auditory localization, Spatial hearing): The ability to identify the direction and distance of a sound source, relying on cues such as interaural time differences, intensity differences, and spectral filtering by the outer ear. Sound localization is critical for spatial awareness, safety, and immersive experiences in…
Spatial Audio(also: 3D Audio, Spatialised Sound, Binaural Audio): Audio technology that creates the perception of sound coming from specific locations in three-dimensional space around the listener, using techniques such as head-related transfer functions (HRTFs), binaural rendering, and ambisonics. In accessibility, spatial audio can convey…
Spatial audio beacon(also: Audio beacon, 3D audio waypoint): A virtual sound source placed at a specific geographic location that a user can hear through headphones, providing directional guidance by leveraging spatial audio to indicate the direction and distance of a destination. As the user turns toward the beacon, the sound appears to…
Spatialised Audio(also: Spatial Audio, 3D Audio, Directional Audio): Audio technology that places sounds in specific locations in three-dimensional space relative to the listener, creating the perception that sounds come from particular directions or distances. In accessibility applications for blind and low-vision users, spatialised audio can…
Spatialization(also: Spatialisation, Audio Spatialization, 3D Audio Spatialization): The process of rendering a sound so that it appears to originate from a specific location in three-dimensional space around the listener. Spatialization typically combines head-related transfer functions (HRTFs) to model how ears filter sound by direction, binaural or ambisonic…
Speaker Diarisation(also: Speaker Diarization, Speaker Segmentation): The automatic process of segmenting an audio recording by speaker identity — answering "who spoke when" — and labelling each segment. A critical pre-requisite for accessible transcripts of multi-voice audio such as interviews, podcasts, and meetings, since a flat transcript…
Spearcon: A spearcon is a type of auditory cue created by speeding up (time-compressing) a spoken phrase until it becomes a very brief, distinctive sound. Unlike earcons, which use abstract musical sounds, spearcons retain a connection to the original speech, making them easier to learn and associate…
Speech Gap(also: Dialogue Gap, Audio Gap): A pause or silence between spoken dialogue in a video or film where audio descriptions can be inserted without overlapping with the original soundtrack. Identifying speech gaps is a critical first step in audio description production, as descriptions must fit within these…
Structured Audio(also: Structured Digital Audio): Structured audio refers to digital audio content that has been encoded with hierarchical markers and metadata, allowing non-sequential access to specific segments such as chapters, sections, paragraphs, and phrases. Unlike linear audio recordings (such as traditional audio…
Synthesized Video Description(also: TTS Video Description, Text-to-Speech Description, Synthesized Audio Description): An audio description for video content that is generated using text-to-speech (TTS) technology rather than recorded by a human narrator. A describer writes a text script describing the visual elements of a video, and speech synthesis software converts this text into spoken…
Tempo(also: BPM, Beats Per Minute): The speed or pace of a musical piece, typically measured in beats per minute (BPM). Tempo is one of the primary features that shapes emotional perception of music — fast tempos (130+ BPM) are associated with excitement and urgency, slow tempos (60–80 BPM) with calm or solemnity.…
Text-to-Audio(also: Text-to-Audio Generation, TTA): A class of generative AI models that synthesise non-speech sound (environmental sounds, sound effects, music stems) from a text prompt, for example producing the sound of 'leaves rustling in wind' or 'church bells ringing'. Distinct from text-to-speech, which produces spoken…
Text-to-Sound(also: Text-to-Audio, TTA, Sound Generation from Text): A class of generative AI models that synthesize non-speech audio (sound effects, ambient environments, foley, or short music clips) from a natural-language description such as 'a door creaking shut' or 'cloth ruffling as a coat is removed'. Distinct from text-to-speech, which…
Text-to-Speech(also: TTS, Speech Synthesis): Technology that converts written text into spoken audio output. Text-to-speech is a fundamental component of many assistive technologies, including screen readers, audio description tools, and communication devices for people with speech disabilities. Modern TTS systems use…
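
Worked example: two of the measures defined above, Signal-to-Noise Ratio (expressed in decibels) and Mean Opinion Score (ratings from 1 to 5, averaged), reduce to short calculations. The sketch below is illustrative only. It assumes SNR is computed from signal and noise power using the standard ratio 10 * log10(P_signal / P_noise), which the glossary entry does not spell out, and the function names `snr_db` and `mean_opinion_score` are hypothetical rather than taken from any particular library.

```python
import math


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels, from two powers in the same units.

    Assumption: inputs are powers (not amplitudes), so the factor is 10.
    """
    if signal_power <= 0 or noise_power <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(signal_power / noise_power)


def mean_opinion_score(ratings: list[int]) -> float:
    """Average listener ratings on the 1 (bad) to 5 (excellent) scale."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be between 1 and 5")
    return sum(ratings) / len(ratings)


if __name__ == "__main__":
    # Speech 100 times more powerful than the background noise.
    print(snr_db(100.0, 1.0))
    # Four listeners rating the same speech sample.
    print(mean_opinion_score([4, 5, 3, 4]))
```

A higher SNR means device audio stands out more clearly from the background, and a higher MOS means listeners judged the sample to be of better quality. Equal signal and noise power gives 0 dB.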
