Glossary

Terms used in accessibility research and practice. Each entry has a definition, common aliases, and category tags.

3D Reconstruction (also: Scene Reconstruction, 3D Scene Reconstruction): The computer vision task of recovering the 3D structure of a scene - geometry, camera positions, and sometimes object trajectories - from one or more 2D images or video frames. Techniques range from classic structure-from-motion and multi-view stereo to modern learning-based…
AI Proxy (also: AI Proxying): A design pattern in which an AI system acts on a user's behalf within a social, communicative, or interpretive setting - for example, generating a facial expression, voice, or written reply that represents the user to others - rather than merely assisting the user with a…
Automatic Captions (also: Auto-Generated Captions, Auto Captions, ASR Captions): Captions produced by automatic speech recognition (ASR) systems without human transcription, typically generated by the hosting platform (e.g., YouTube, Zoom, Microsoft Teams) as an optional layer on uploaded or live video. Automatic captions have dramatically expanded caption…
Calibrated Trust (also: Appropriate Reliance, Trust Calibration): An HCI and human-factors concept, articulated by Lee and See, describing the alignment between a user's trust in an automated or AI system and the system's actual capability in a given context: trusting the system when it is reliable and being skeptical when it is not. Designing…
Chart Question Answering (also: Chart QA, ChartQA, Visual Question Answering for Charts): The task of answering natural-language questions about a data visualization, typically a chart provided as an image or structured specification. A chart question answering system must identify the chart type, extract the underlying data, interpret axes and legends, and answer…
Clarifying Question (also: Clarifying Questions, Counter-Question): A clarifying question is a follow-up query posed by a system or interlocutor to resolve ambiguity, fill missing context, or confirm intent before acting on a user's request. In conversational interfaces, clarifying questions are a core mechanism of mixed-initiative interaction:…
Confidence Indicator (also: Confidence Score, Uncertainty Indicator): An interface element that communicates how certain an AI or automated system is about a given output, helping users decide how much to trust the result. In accessibility tools for blind and low-vision users, confidence indicators are especially important because users cannot…
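For screen-reader users, a confidence indicator may need to be conveyed in words rather than as a colour or a bar. The sketch below maps a confidence score in [0, 1] to a spoken hedge placed before an AI-generated caption. The function name `confidence_phrase` and the 0.9 / 0.6 thresholds are hypothetical; in practice thresholds would have to be tuned against how well the underlying scores are calibrated.

```python
def confidence_phrase(caption: str, score: float) -> str:
    """Prefix an AI-generated caption with a verbal confidence hedge.

    The thresholds are illustrative assumptions, not a standard.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    if score >= 0.9:
        hedge = "Likely"
    elif score >= 0.6:
        hedge = "Possibly"
    else:
        hedge = "Uncertain, may be"
    return f"{hedge}: {caption} ({round(score * 100)}% confidence)"


print(confidence_phrase("a red mug on a desk", 0.72))
# Possibly: a red mug on a desk (72% confidence)
```

Putting the hedge first means a screen reader announces the uncertainty before the content, rather than after the user has already acted on it.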
Depth Estimation (also: Monocular Depth Estimation, Depth Prediction): The computer vision task of predicting the distance from the camera to each point in a scene, producing a depth map in which each pixel carries a distance value. Monocular depth estimation uses a single RGB image (no stereo cameras or LiDAR) and typically relies on deep learning…
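Once each pixel carries a distance value, a depth map can be turned into 3D points with a pinhole camera model, which is also how depth estimation feeds 3D reconstruction. This sketch assumes the map already holds metric z-depth along the camera's optical axis (many monocular models output relative depth, which would first need converting) and that the intrinsics `fx`, `fy`, `cx`, `cy` are known; the values in the usage line are hypothetical.

```python
def backproject(depth, fx, fy, cx, cy):
    """Convert a depth map (rows of z-depth values) into camera-frame 3D points."""
    points = []
    for v, row in enumerate(depth):        # v = pixel row index
        for u, z in enumerate(row):        # u = pixel column index
            if z <= 0:                     # skip missing or invalid depth
                continue
            x = (u - cx) * z / fx          # pinhole model: u = fx * x / z + cx
            y = (v - cy) * z / fy          # pinhole model: v = fy * y / z + cy
            points.append((x, y, z))
    return points


print(backproject([[2.0, 2.0], [0.0, 4.0]], fx=1.0, fy=1.0, cx=0.0, cy=0.0))
```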
Diffusion Model (also: Diffusion-based Generator, Denoising Diffusion Model): A diffusion model is a class of generative AI model that learns to produce images or videos by iteratively denoising a random noise input, reversing a forward process that gradually adds noise to training data. In accessibility work, diffusion models are used to synthesize sign…
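The forward (noising) process and a single reverse (denoising) step can be sketched in a DDPM-style formulation, one common variant among several. The linear beta schedule, the plain list of floats standing in for an image, and the function names are assumptions for illustration; a real model would obtain `eps_hat` from a trained noise-prediction network, which is not included here.

```python
import math
import random


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Noise variance added at each of T forward steps (linear schedule)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def cumulative_alphas(betas):
    """abar_t = prod over s <= t of (1 - beta_s): how much of x0 survives at step t."""
    abar, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        abar.append(prod)
    return abar


def q_sample(x0, t, abar, rng):
    """Forward process in closed form: x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = abar[t]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def p_step(xt, t, eps_hat, betas, abar, rng):
    """One reverse (denoising) step, given a noise prediction eps_hat."""
    beta = betas[t]
    coef = beta / math.sqrt(1.0 - abar[t])
    mean = [(x - coef * e) / math.sqrt(1.0 - beta) for x, e in zip(xt, eps_hat)]
    if t == 0:
        return mean                      # no fresh noise at the final step
    sigma = math.sqrt(beta)              # one common choice of reverse variance
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


rng = random.Random(0)
betas = linear_beta_schedule(1000)
abar = cumulative_alphas(betas)
x0 = [0.5, -0.25, 1.0]
x1, eps = q_sample(x0, 0, abar, rng)
print(p_step(x1, 0, eps, betas, abar, rng))  # recovers x0 when eps_hat is exact
```

Generation runs `p_step` from t = T-1 down to 0 starting from pure noise; the usage line only checks the last step with a perfect noise estimate.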
Dimensionality Reduction (also: Dimension Reduction, UMAP, t-SNE): Dimensionality reduction is a class of machine learning techniques that transform high-dimensional data - such as the vector embeddings produced by neural networks - into lower-dimensional representations (typically 2D or 3D) that can be visualised and explored by humans. Common…
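UMAP and t-SNE are nonlinear and too involved for a short example, so the sketch below swaps in principal component analysis (PCA), the classic linear method, to show the core idea of projecting high-dimensional vectors down to 2D for plotting. It uses power iteration with Gram-Schmidt orthogonalisation in plain Python; the function name `pca_2d` is hypothetical, and a real pipeline would normally use a tested library implementation.

```python
import math
import random


def pca_2d(points, iters=200, seed=0):
    """Project points (equal-length lists, dimension >= 2) onto their top-2 principal components."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    X = [[p[j] - mean[j] for j in range(d)] for p in points]          # centre the data
    C = [[sum(x[i] * x[j] for x in X) / (n - 1) for j in range(d)]    # covariance matrix
         for i in range(d)]
    rng = random.Random(seed)
    comps = []

    def orthogonalise(v):
        # Remove components along directions already found (Gram-Schmidt).
        for c in comps:
            dot = sum(a * b for a, b in zip(v, c))
            v = [a - dot * b for a, b in zip(v, c)]
        return v

    for _ in range(2):
        v = orthogonalise([rng.gauss(0.0, 1.0) for _ in range(d)])
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
        for _ in range(iters):                                        # power iteration
            w = orthogonalise([sum(C[i][j] * v[j] for j in range(d)) for i in range(d)])
            norm = math.sqrt(sum(a * a for a in w))
            if norm < 1e-12:            # no variance left in the remaining directions
                break
            v = [a / norm for a in w]
        comps.append(v)
    return [[sum(a * b for a, b in zip(x, c)) for c in comps] for x in X]


print(pca_2d([[t, 2 * t, 3 * t] for t in range(10)])[:3])
```

Because the toy points lie on a single line in 3D, all of their variance lands on the first coordinate and the second stays near zero.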
Disability-First Dataset (also: Disability-first AI dataset): An approach to AI dataset creation, articulated by Theodorou et al. and others, that treats serving a disability community as the primary objective rather than collecting disability data as a minority slice of a general-purpose dataset. Examples include VizWiz (blind…
End-User Programming (also: EUP, End-User Development, EUD): A design approach that enables people without formal programming training to create, modify, or combine software behaviors to suit their own needs. Typical end-user programming systems expose computational building blocks through accessible interfaces such as visual block…
Feature Extraction (also: Feature Engineering, Representation Learning): Feature extraction is the process of identifying and isolating measurable properties or characteristics (features) from raw data such as images, audio, or text, for use in machine learning tasks. In image processing, features may include edges, textures, colours, shapes, or…
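As a concrete instance of a hand-crafted image feature, the sketch below computes Sobel edge magnitudes on a small grayscale image. Storing the image as nested lists and leaving border pixels at zero are simplifying assumptions; the function name is illustrative.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to left-right intensity change
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to top-bottom intensity change


def sobel_magnitude(img):
    """Edge-strength feature map for a grayscale image given as nested lists."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]          # borders stay 0 for simplicity
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = math.hypot(gx, gy)       # gradient magnitude
    return out


image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]   # a vertical dark-to-bright edge
print(sobel_magnitude(image)[2])                 # strong response only near the edge
```

A learned model would discover such filters itself (representation learning), but the output is the same kind of thing: a per-pixel feature map derived from raw intensities.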
Foundation Model (also: Large Pretrained Model, General-Purpose AI Model, GPAI): A foundation model is a large AI model trained on broad, general-purpose data - typically at massive scale using self-supervised or unsupervised learning - that can be adapted (fine-tuned) for a wide range of downstream tasks. Examples include CLIP, DINOv2, GPT-4, and BLIP.…
Human-AI Co-Creation (also: Human-AI Co-Creative, Co-Creative AI, Mixed-Initiative Co-Creation): Human-AI co-creation refers to creative work in which a person and an AI system iteratively contribute to the same artifact, with each shaping the other's next move rather than the AI acting as a one-shot tool. In accessibility contexts, co-creative systems are used to scaffold…
Human-Centered AI (also: HCAI, Human-Centered Artificial Intelligence, HCXAI): Human-Centered AI (HCAI) is a design and research orientation that places human experience, context, agency, and values at the center of how AI systems are built and evaluated, rather than optimizing only for model performance. In accessibility contexts, HCAI emphasizes that AI…
Immersive Analytics (also: 3D Data Visualisation, Spatial Analytics, Immersive Visualisation): Immersive analytics is the application of interactive 3D, virtual reality (VR), or augmented reality (AR) technologies to support data exploration, analysis, and decision-making. By leveraging spatial context, immersive analytics aims to overcome the limitations of flat 2D…
Implicit Interaction (also: Implicit Input, Implicit Human-Computer Interaction): Implicit interaction refers to user input that the system infers from natural behaviors not explicitly performed for the purpose of issuing commands, such as gaze, gait, posture, physiological signals, or ambient context. It contrasts with explicit interaction, where users…
Large Vision Model (also: LVM): A large vision model is a foundation model trained on very large image (and often video) datasets to produce general-purpose visual representations - capable of object detection, segmentation, captioning, or feature extraction without task-specific retraining. Examples include…
Layer-wise Relevance Propagation (also: LRP): Layer-wise Relevance Propagation (LRP) is an explainable AI technique that attributes a neural network's prediction back to its input features by propagating relevance scores layer by layer from the output toward the input. Unlike gradient-based saliency methods, LRP…
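The layer-by-layer propagation can be sketched with the LRP-epsilon rule, one of several LRP propagation rules, on a tiny dense ReLU network. The network weights, the `W[j][k]` (input j to unit k) layout, and the function names are assumptions for illustration. With zero biases and a small epsilon, the input relevances sum approximately to the output score, the conservation property LRP is designed around.

```python
def forward(x, layers):
    """Dense network: ReLU on hidden layers, linear output. W[j][k] maps input j to unit k."""
    acts = [list(x)]
    for i, (W, b) in enumerate(layers):
        a = acts[-1]
        z = [sum(a[j] * W[j][k] for j in range(len(a))) + b[k] for k in range(len(b))]
        if i < len(layers) - 1:
            z = [max(0.0, v) for v in z]
        acts.append(z)
    return acts


def lrp_epsilon(x, layers, target, eps=1e-6):
    """Relevance of each input feature for output unit `target` (LRP-epsilon rule)."""
    acts = forward(x, layers)
    R = [0.0] * len(acts[-1])
    R[target] = acts[-1][target]                 # start from the chosen output score
    for i in range(len(layers) - 1, -1, -1):
        W, b = layers[i]
        a = acts[i]
        z = [sum(a[j] * W[j][k] for j in range(len(a))) + b[k] for k in range(len(b))]
        # Stabilised ratio R_k / (z_k + eps * sign(z_k))
        s = [R[k] / (z[k] + (eps if z[k] >= 0 else -eps)) for k in range(len(b))]
        # Input j receives relevance in proportion to its contribution a_j * w_jk
        R = [a[j] * sum(W[j][k] * s[k] for k in range(len(b))) for j in range(len(a))]
    return R


layers = [([[1.0, -0.5], [0.5, 1.0]], [0.0, 0.0]),   # 2 inputs -> 2 hidden units
          ([[1.0], [2.0]], [0.0])]                   # 2 hidden units -> 1 output
print(lrp_epsilon([1.0, 2.0], layers, target=0))     # relevances sum to about 5.0
```

Here the first input's positive and negative contributions cancel, so nearly all relevance flows to the second input.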
Local-First Software (also: Local-First): A software design philosophy, articulated by Kleppmann and colleagues in 2019, in which applications keep the user's primary data on local devices and treat cloud services as optional synchronization or backup layers rather than as the source of truth. Local-first systems aim to…
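Kleppmann and colleagues point to conflict-free replicated data types (CRDTs) as one way to let devices edit data offline and merge later without a central server acting as the source of truth. The sketch below is a grow-only counter (G-Counter), one of the simplest CRDTs; the class name and device IDs are illustrative, and real local-first apps use richer CRDTs for text and structured documents.

```python
class GCounter:
    """Grow-only counter CRDT: each replica increments only its own slot."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts: dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of the order in which they sync.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    def value(self) -> int:
        return sum(self.counts.values())


phone, laptop = GCounter("phone"), GCounter("laptop")
phone.increment(2)       # edits made while offline
laptop.increment(3)
phone.merge(laptop)      # sync in either direction, any number of times
laptop.merge(phone)
print(phone.value(), laptop.value())   # 5 5
```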
Mixture of Experts (also: MoE): Mixture of experts is a neural network architecture that routes each input through a small subset of specialist subnetworks ('experts') rather than activating the whole model. A gating network decides which experts handle a given token or query, letting the overall model be much…
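The routing idea can be sketched with a linear gating network and top-k selection, where only the k highest-scoring experts run and their outputs are mixed by softmax weights renormalised over the selected set. The toy experts, the weights, and k = 2 are assumptions for illustration; production MoE layers typically also add load-balancing losses and per-expert capacity limits, which are omitted.

```python
import math


def top_k_gate(logits, k):
    """Keep the k highest-scoring experts and softmax-normalise over just those."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)                  # subtract the max for stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, gate_weights, experts, k=2):
    """Route input x through only the k experts chosen by a linear gating network."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    gates = top_k_gate(logits, k)
    out = None
    for i, g in gates.items():
        y = experts[i](x)                            # unselected experts never run
        out = [g * v for v in y] if out is None else [o + g * v for o, v in zip(out, y)]
    return out, gates


toy_experts = [lambda x, s=s: [s * v for v in x] for s in (1, 2, 3, 4)]
out, gates = moe_forward([1.0, 0.0], [[3.0, 0.0], [1.0, 0.0], [2.0, 0.0], [0.0, 0.0]], toy_experts)
print(sorted(gates), out)   # experts 0 and 2 are selected
```

Compute per input scales with k rather than with the total number of experts, which is what lets the overall parameter count grow without a matching growth in per-token cost.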
Open-Vocabulary Detection (also: Open-Vocabulary Object Detection, OVD): A class of computer vision object detection models that accept arbitrary text queries at inference time rather than being restricted to a fixed set of pre-trained classes. Instead of only recognizing, for example, the 80 COCO categories, an open-vocabulary detector (such as…
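One common design, sketched here under simplifying assumptions, scores candidate image regions against embeddings of free-text queries in a shared image-text space and keeps the best match above a threshold. The 3-dimensional toy vectors, the 0.5 threshold, and the function names are hypothetical; real detectors learn high-dimensional embeddings and also predict the boxes themselves.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def label_regions(region_embs, query_embs, threshold=0.5):
    """Assign each region its best-matching free-text query, or None if nothing matches.

    region_embs: vectors from an image encoder (assumed precomputed)
    query_embs:  dict mapping query text to a vector from a text encoder
    """
    labels = []
    for r in region_embs:
        best_text, best_sim = None, threshold
        for text, q in query_embs.items():
            s = cosine(r, q)
            if s > best_sim:
                best_text, best_sim = text, s
        labels.append(best_text)
    return labels


regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
queries = {"white cane": [0.9, 0.1, 0.0], "guide dog": [0.0, 1.0, 0.1]}
print(label_regions(regions, queries))   # ['white cane', 'guide dog', None]
```

Adding a new category means adding a new text query, with no retraining, which is the practical difference from a fixed-class detector.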
Parameter-Efficient Fine-Tuning (also: PEFT, Lightweight Fine-Tuning): Parameter-efficient fine-tuning is a family of techniques (LoRA, adapters, prefix tuning, prompt tuning) that adapt a large pretrained model to a new task or domain by updating only a small fraction of its parameters - typically under 1% - while freezing the rest. This…
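LoRA, one member of this family, leaves a weight matrix W frozen and learns a low-rank update B·A scaled by alpha / r; B starts at zero so training begins from the pretrained behaviour. The sketch shows the forward pass and the trainable-parameter arithmetic in plain Python. The 4096 x 4096 layer and rank 8 in the usage line are hypothetical sizes chosen to show how the "under 1%" figure arises.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(x, W, A, B, alpha):
    """y = W x + (alpha / r) * B (A x); W is frozen, only A and B are trained.

    Shapes: W is d_out x d_in, A is r x d_in, B is d_out x r.
    """
    r = len(A)
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / r) * d for b, d in zip(base, delta)]


def lora_trainable_fraction(d_in, d_out, r):
    """Trainable LoRA parameters relative to the frozen weight matrix."""
    return r * (d_in + d_out) / (d_in * d_out)


print(lora_trainable_fraction(4096, 4096, 8))   # 0.00390625, i.e. about 0.4%
```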
Reassurance Robot: A term coined by Grace Barkhuff (CHI 2026) to describe generative AI systems — such as ChatGPT — that, by default, provide reassurance, confession-hearing, and decision-making on demand, thereby accommodating the compulsions of people with Obsessive-Compulsive Disorder (OCD).…
Semantic Hearing (also: Programmable Hearing, Intent-Aware Hearing): A research paradigm and class of systems that treat the user's auditory environment as something programmable: rather than uniformly amplifying or suppressing all sound, the wearable headphone or earbud uses on-device machine learning to selectively extract or attenuate specific…
Small Language Model (also: SLM): A language model, typically ranging from tens of millions to a few billion parameters, designed to run on consumer or edge devices rather than in centralized cloud data centers. Small language models sacrifice some of the broad general knowledge of frontier large language models…
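A back-of-envelope calculation shows why parameter count matters for on-device use: storing the weights takes roughly parameters x bits per parameter / 8 bytes. The sketch ignores activations, the KV cache, and runtime overhead, so real memory use is higher; the 3-billion-parameter model is an arbitrary example.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate storage for model weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


for bits in (32, 16, 8, 4):
    print(f"3B params @ {bits}-bit: {weight_memory_gb(3e9, bits):.1f} GB")
# 12.0, 6.0, 3.0 and 1.5 GB respectively
```

The same arithmetic explains why quantisation to 8 or 4 bits is commonly paired with small models on phones and laptops.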
Symbiotic Learning: Symbiotic Learning is a conceptual framing introduced by Jiang et al. (CHI 2026) describing a mode of mixed-ability family learning in which parents and children mutually enable each other's participation and development through AI-mediated communication. Rather than positioning…
Target Sound Extraction (also: Target Sound Separation, TSE): A machine-learning task in which a model isolates a specific target sound (or class of sounds) from a complex acoustic mixture, conditioned on some specification of the target - a text label, a reference recording, or an embedding. Distinct from blind source separation (which…
Text-to-Sound (also: Text-to-Audio, TTA, Sound Generation from Text): A class of generative AI models that synthesize non-speech audio - sound effects, ambient environments, foley, or short music clips - from a natural-language description such as 'a door creaking shut' or 'cloth ruffling as a coat is removed'. Distinct from text-to-speech, which…
Trust in Automation (also: Automation trust, TiA): A human factors construct describing the extent to which a person believes an automated system - a car, aircraft, medical device, AI assistant, or robot - will perform reliably and behave in their interest, typically measured via validated questionnaires such as the Trust in…
VQA (also: Visual Question Answering): VQA (Visual Question Answering) is an AI task in which a system answers natural-language questions about the content of an image. In assistive contexts, VQA systems such as Be My AI, Seeing AI, and Aira let blind and low-vision users ask about their visual surroundings - from…
Vision-and-Language Navigation (also: VLN): Vision-and-language navigation is a task setup in which an agent follows natural-language instructions to move through a visual environment, grounding words like 'turn left at the blue sofa' onto what it sees in real time. Research in VLN has moved from small indoor simulators…
Visualization Question Answering (also: Chart QA, Visualization QA, VisQA): A class of interactive systems that let users ask natural-language questions about a data visualization - a chart, graph, or map - and receive direct textual or spoken answers rather than having to interpret the visualization themselves. Visualization QA systems typically…
Wav2Vec (also: Wav2Vec2, Wav2Vec 2.0): A family of self-supervised speech representation models from Meta AI that learn rich acoustic embeddings directly from raw waveform audio without requiring transcribed training data. Wav2Vec 2.0, introduced in 2020, became a backbone for low-resource automatic speech…
