Glossary

Terms used in accessibility research and practice. Each entry has a definition, common aliases, and category tags.

Few-Shot Learning (also: N-Shot Learning, Low-Shot Learning): Few-shot learning is a machine learning approach that enables AI models to learn new concepts from only a small number of examples (typically 1 to 10) rather than the hundreds or thousands traditionally required. This is achieved through techniques like meta-learning, where…
Few-Shot Object Recognition (also: Few-Shot Recognition): A machine learning approach in which a model learns to identify a novel object from only a handful of labelled examples (commonly one to ten) rather than the hundreds or thousands typical of conventional supervised training. Few-shot object recognition underpins teachable and…
Fine-tuning (also: Model Fine-tuning, Fine-tune, Supervised Fine-tuning): A machine-learning technique that adapts a pre-trained foundation model (typically a large language model or vision model) to a specific task, domain, or individual user by continuing training on a smaller, targeted dataset. Fine-tuning preserves the broad capabilities of the…
Foundation Model (also: Large Pretrained Model, General-Purpose AI Model, GPAI): A foundation model is a large AI model trained on broad, general-purpose data, typically at massive scale using self-supervised or unsupervised learning, that can be adapted (fine-tuned) for a wide range of downstream tasks. Examples include CLIP, DINOv2, GPT-4, and BLIP.…
Frame differencing (also: Temporal differencing, Background subtraction): A computer vision technique that detects motion or changes in video by comparing consecutive frames pixel by pixel. In accessibility applications, frame differencing can identify instructor actions in presentation videos, detect gestures in sign language recognition, or track…
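As a minimal sketch of the pixel-by-pixel comparison described above, the Python below thresholds the absolute difference between two grayscale frames stored as nested lists. The function names and the default threshold of 25 are illustrative assumptions; practical systems usually read camera frames through an image library and add blurring or morphological clean-up to suppress sensor noise.

```python
def frame_difference(prev_frame, curr_frame, threshold=25):
    """Binary motion mask: 1 where |curr - prev| > threshold, else 0.

    Frames are equal-sized 2D lists of grayscale intensities (0-255).
    The default threshold is an illustrative choice, not a standard value.
    """
    if len(prev_frame) != len(curr_frame):
        raise ValueError("frames must have the same height")
    mask = []
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        if len(prev_row) != len(curr_row):
            raise ValueError("frames must have the same width")
        mask.append([1 if abs(c - p) > threshold else 0
                     for p, c in zip(prev_row, curr_row)])
    return mask


def motion_fraction(mask):
    """Share of pixels flagged as changed: a crude 'is anything moving?' score."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0


if __name__ == "__main__":
    prev = [[10, 10, 10],
            [10, 10, 10]]
    curr = [[10, 200, 10],
            [10, 10, 90]]
    mask = frame_difference(prev, curr)
    print(mask)  # [[0, 1, 0], [0, 0, 1]]
    print(motion_fraction(mask))
```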
Gaussian Mixture Model (also: GMM): A Gaussian Mixture Model (GMM) is a probabilistic model that represents data as a weighted combination of multiple Gaussian (normal) distributions. Each component Gaussian has its own mean and covariance, allowing GMMs to model complex, multimodal distributions. In speech…
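The mixture density is p(x) = sum over k of w_k · N(x | mean_k, variance_k), with non-negative weights that sum to one. The sketch below evaluates it in one dimension (so each component has a scalar variance instead of a covariance matrix) and also computes per-component responsibilities, the quantity used in the E-step when a GMM is fitted with expectation-maximisation. Function names and parameter values are illustrative.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1D normal distribution N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmm_pdf(x, weights, means, variances):
    """Density of a 1D Gaussian mixture: sum_k w_k * N(x | mean_k, var_k)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))


def responsibilities(x, weights, means, variances):
    """Posterior probability that x came from each component (EM's E-step)."""
    joint = [w * gaussian_pdf(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


if __name__ == "__main__":
    w, mu, var = [0.3, 0.7], [-2.0, 3.0], [1.0, 0.5]
    print(gmm_pdf(0.0, w, mu, var))
    print(responsibilities(-2.0, w, mu, var))  # almost entirely component 0
```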
Gesture Recognition (also: Gesture Detection): The computational process of identifying and interpreting human gestures (typically hand, arm, or body movements) using sensors and machine learning algorithms. Gesture recognition systems analyze data from cameras, accelerometers, gyroscopes, or other sensors to classify…
Grad-CAM (also: Gradient-weighted Class Activation Mapping): A widely used explainable AI technique, introduced by Selvaraju et al. in 2017, that produces a class-discriminative heat map over an input image by weighting convolutional feature maps by the spatially averaged gradients of the target class score with respect to those maps. Grad-CAM and its variants (SmoothGrad-CAM,…
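Assuming the convolutional activations and the gradients of the target class score with respect to them have already been obtained by backpropagation in some framework, the weighting step reduces to a few lines. This illustrative `grad_cam` function pools each gradient map into one weight, forms the weighted sum of feature maps and applies a ReLU; upsampling the result to the input resolution and normalising it for display are omitted.

```python
def grad_cam(feature_maps, gradients):
    """Grad-CAM heat map from precomputed activations and gradients.

    feature_maps: K maps, each an H x W nested list (activations A^k).
    gradients:    K maps of the same shape (d y_c / d A^k for target class c).
    Returns ReLU(sum_k alpha_k * A^k) where alpha_k is the mean of gradient map k.
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    # Channel weights: global-average-pool each gradient map.
    alphas = [sum(sum(row) for row in g) / (height * width) for g in gradients]
    cam = [[0.0] * width for _ in range(height)]
    for alpha, fmap in zip(alphas, feature_maps):
        for i in range(height):
            for j in range(width):
                cam[i][j] += alpha * fmap[i][j]
    # ReLU keeps only regions whose evidence increases the class score.
    return [[max(0.0, v) for v in row] for row in cam]
```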
Hidden Markov Model (also: HMM): A statistical model used extensively in pattern recognition where the system being modeled is assumed to follow a Markov process with hidden (unobserved) states. HMMs have been foundational in both automatic speech recognition and sign language recognition, as they can model…
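To make the hidden-state idea concrete, here is a sketch of the forward algorithm, which computes the probability of an observation sequence by summing over all hidden-state paths, for a toy discrete HMM. The state and observation names are placeholders; speech and sign recognisers typically use continuous emission models and work in log space or with scaling to avoid numerical underflow.

```python
def forward_likelihood(observations, start_p, trans_p, emit_p):
    """P(observations | model) for a discrete HMM via the forward algorithm.

    start_p[s]    : probability of starting in hidden state s
    trans_p[s][t] : probability of moving from state s to state t
    emit_p[s][o]  : probability that state s emits observation o
    """
    states = list(start_p)
    # alpha[s] = P(o_1..o_t, hidden state at time t = s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            t: sum(alpha[s] * trans_p[s][t] for s in states) * emit_p[t][obs]
            for t in states
        }
    return sum(alpha.values())


if __name__ == "__main__":
    start = {"A": 0.6, "B": 0.4}
    trans = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
    emit = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}}
    print(forward_likelihood(["x", "y"], start, trans, emit))
```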
Histogram of Oriented Gradients (also: HOG): A feature descriptor technique used in computer vision for object detection that counts occurrences of gradient orientations in localized portions of an image. HOG captures edge and texture information by dividing the image into cells and computing gradient direction histograms.…
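A simplified sketch of the per-cell step: compute gradients with centred differences, convert them to unsigned orientations (0 to 180 degrees), and let each pixel vote its gradient magnitude into one of nine bins. To stay self-contained, gradients are computed only on the cell's interior pixels; a full HOG pipeline computes gradients over the whole image, interpolates votes between neighbouring bins and normalises histograms over overlapping blocks, none of which is shown.

```python
import math


def cell_orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell.

    cell: 2D list of grayscale intensities. Border pixels are skipped because
    centred differences need a neighbour on each side.
    """
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for i in range(1, len(cell) - 1):
        for j in range(1, len(cell[0]) - 1):
            gx = cell[i][j + 1] - cell[i][j - 1]
            gy = cell[i + 1][j] - cell[i - 1][j]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


if __name__ == "__main__":
    vertical_edge = [[0, 0, 100, 100] for _ in range(4)]
    print(cell_orientation_histogram(vertical_edge))  # all weight in bin 0
```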
Human Activity Recognition (also: HAR, Activity Recognition): A field of machine learning and ubiquitous computing that uses sensor data (typically from accelerometers, gyroscopes, and other sensors in smartphones, smartwatches, or other wearable devices) to automatically identify and classify physical activities performed by a person.…
Image Classification (also: Visual Classification, Photo Classification): A computer vision task where a machine learning model assigns a category label to an input image based on its visual content. Image classifiers are trained on labeled example images and learn to recognize patterns associated with each category. In accessibility applications,…
Image Retrieval (also: Content-Based Image Retrieval, CBIR, Visual Search): A computer vision technique that searches a database of images to find ones similar to a query image based on visual features rather than text metadata. In accessibility applications, image retrieval enables systems that can identify specific product instances (like a particular…
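Assuming each image has already been turned into a feature vector (for example an embedding from a pretrained network), retrieval can be sketched as ranking the stored vectors by cosine similarity to the query vector. The image ids and three-dimensional vectors in the usage example are made up; large collections normally use an approximate nearest-neighbour index instead of this brute-force scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, database, top_k=3):
    """Rank database images by similarity to the query.

    database: dict mapping an image id to its precomputed feature vector.
    Returns the top_k (image_id, score) pairs, most similar first.
    """
    scored = [(image_id, cosine_similarity(query_vec, vec))
              for image_id, vec in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    db = {"cereal_box": [0.9, 0.1, 0.0],
          "soup_can": [0.1, 0.9, 0.2],
          "cereal_box_side": [0.8, 0.2, 0.1]}
    print(retrieve([1.0, 0.0, 0.0], db))
```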
ImageNet: ImageNet is a large-scale visual database containing over 14 million labeled images organized into thousands of categories, widely used for training and benchmarking computer vision models. Many object detection and image classification systems used in accessibility…
Inception-v3 (also: Inception v3): A deep convolutional neural network architecture developed by Google for image recognition, introduced in 2015. It uses "inception modules" that apply multiple convolution filter sizes in parallel to efficiently capture features at different scales, balancing recognition…
Individual Sign Language Recognition (also: ISLR, Word-Level Sign Recognition, Isolated Sign Recognition): A machine learning task focused on recognizing individual signs from a sign language, translating single signs independently without considering surrounding context. Unlike continuous sign language recognition, which attempts to interpret flowing signed sentences, ISLR identifies…
Instance-Level Recognition (also: Instance Recognition, Fine-Grained Recognition): A computer vision task that involves distinguishing between specific individual objects within the same general category, rather than just identifying broad categories. For example, while category-level recognition might identify something as "a bag of chips," instance-level…
Isolated Sign Language Recognition (also: isolated SLR, word-level sign recognition): A sign language recognition task that focuses on identifying individual, pre-segmented signs rather than continuous signing sequences. In isolated SLR, each sign is captured as a separate video clip with clear start and end points, simplifying the recognition problem compared to…
Isolated Sign Recognition (also: ISR, ISLR): A computer vision and machine learning task focused on identifying individual signs from video recordings where each video contains a single sign production, as opposed to continuous sign language recognition, which processes connected signing in sentences or conversation.…
K-Shot Learning (also: N-Way K-Shot Learning): A machine learning paradigm where a model must learn to classify objects using only k training examples per class. In the context of accessibility, k-shot learning is significant because it enables assistive technologies like personal object recognizers to be trained with…
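The sketch below shows how an N-way K-shot episode is commonly sampled (N classes, K labelled support examples each, plus held-out query examples) and classifies each query by the nearest class mean, an approach in the spirit of prototypical networks rather than the only way to solve the task. It assumes examples are already feature vectors; the class names and data are hypothetical.

```python
import math
import random


def sample_episode(dataset, n_way, k_shot, n_query, rng):
    """Draw an N-way K-shot episode from dataset (dict: label -> list of vectors).

    Returns (support, query): support maps each sampled label to k_shot vectors;
    query is a list of (vector, label) pairs disjoint from the support set.
    """
    classes = rng.sample(sorted(dataset), n_way)
    support, query = {}, []
    for label in classes:
        picks = rng.sample(dataset[label], k_shot + n_query)
        support[label] = picks[:k_shot]
        query.extend((x, label) for x in picks[k_shot:])
    return support, query


def classify_by_prototype(x, support):
    """Assign x to the class whose mean support vector (prototype) is closest."""
    def mean(vectors):
        return [sum(col) / len(vectors) for col in zip(*vectors)]
    prototypes = {label: mean(examples) for label, examples in support.items()}
    return min(prototypes, key=lambda label: math.dist(x, prototypes[label]))
```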
LIME (also: Local Interpretable Model-agnostic Explanations): An explainable AI technique, introduced by Ribeiro et al. in 2016, that approximates any black-box model's behaviour around a single prediction by fitting a simple interpretable model (usually sparse linear regression) to perturbed versions of the input. The resulting feature…
LLM-as-Judge (also: LLM as a Judge, Model-as-Judge): An evaluation methodology in which a large language model is prompted to assess the quality of some artifact (generated text, code, a UI, or a response from another model) according to a structured rubric. LLM-as-judge is attractive because it scales automated evaluation to…
LSTM (also: Long Short-Term Memory, LSTM Network): A type of recurrent neural network architecture designed to learn long-term dependencies in sequential data by using special gating mechanisms that control the flow of information through the network. LSTMs are particularly effective for processing time-series data such as…
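A single LSTM step can be sketched with a scalar input and state to show what the gates do. This follows the common formulation with forget, input and output gates; the parameter layout (a `(w_x, w_h, bias)` triple per gate) is an illustrative assumption, and real layers use vectors and weight matrices.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lstm_step(x, h_prev, c_prev, params):
    """One step of a single-unit LSTM cell; returns (h, c).

    params maps each gate name 'f', 'i', 'o', 'g' to a (w_x, w_h, bias) triple.
    """
    def pre(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("f"))    # forget gate: how much of c_prev to keep
    i = sigmoid(pre("i"))    # input gate: how much new content to write
    o = sigmoid(pre("o"))    # output gate: how much of the cell to expose
    g = math.tanh(pre("g"))  # candidate content
    c = f * c_prev + i * g   # additive update lets information persist over time
    h = o * math.tanh(c)
    return h, c
```

With a strongly positive forget bias and a strongly negative input bias, the cell state is carried forward almost unchanged, which is the mechanism that lets LSTMs keep information across long sequences.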
Large Vision Model (also: LVM): A large vision model is a foundation model trained on very large image (and often video) datasets to produce general-purpose visual representations, capable of object detection, segmentation, captioning, or feature extraction without task-specific retraining. Examples include…
Large multimodal model (also: LMM, Multimodal AI, Vision-language model): An artificial intelligence model capable of processing and generating content across multiple modalities, such as text, images, and audio. Examples include GPT-4V and Gemini. In accessibility applications, large multimodal models enable powerful new capabilities like generating…
Layer-wise Relevance Propagation (also: LRP): Layer-wise Relevance Propagation (LRP) is an explainable AI technique that attributes a neural network's prediction back to its input features by propagating relevance scores layer by layer from the output toward the input. Unlike gradient-based saliency methods, LRP…
Learning Vector Quantization (also: LVQ): A supervised machine learning algorithm used for pattern classification, commonly applied in brain-computer interface systems to classify EEG signals. LVQ works by learning a set of labelled reference vectors (a codebook) whose positions define the decision boundaries between different classes of…
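A minimal sketch of LVQ1, the basic training rule: the prototype nearest to a training sample is pulled toward it when their labels agree and pushed away when they disagree, and new inputs take the label of their nearest prototype. The two-dimensional features and the labels `rest` and `imagery` are hypothetical stand-ins for EEG features, and the linear learning-rate decay is an assumed schedule.

```python
import math


def lvq1_train(samples, prototypes, learning_rate=0.1, epochs=20):
    """Train a codebook with the LVQ1 rule (prototype vectors updated in place).

    samples:    list of (vector, label) training pairs.
    prototypes: list of [vector, label] entries (mutable lists).
    """
    for epoch in range(epochs):
        rate = learning_rate * (1 - epoch / epochs)  # decays linearly toward zero
        for x, label in samples:
            nearest = min(prototypes, key=lambda p: math.dist(x, p[0]))
            sign = 1.0 if nearest[1] == label else -1.0  # attract or repel
            nearest[0] = [w + sign * rate * (xi - w) for w, xi in zip(nearest[0], x)]
    return prototypes


def lvq_classify(x, prototypes):
    """Label of the nearest prototype."""
    return min(prototypes, key=lambda p: math.dist(x, p[0]))[1]
```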
Linear Discriminant Analysis (also: Fisher Discriminant Analysis, Fisherfaces): A statistical method used in pattern recognition and machine learning that finds a linear combination of features to best separate two or more classes of objects. In the context of face recognition, LDA (also known as the Fisherfaces method) projects face images into a…
LoRA (also: Low-Rank Adaptation): A parameter-efficient fine-tuning technique, introduced by Hu et al. in 2022, in which a large pretrained neural network is specialised by training only a pair of small low-rank matrices that modify specific weight projections, while the original weights remain frozen. LoRA…
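In the LoRA formulation the adapted layer applies W + (alpha / r) · B · A, where W is frozen, B is d_out × r, A is r × d_in, and B starts at zero so training begins from the pretrained behaviour. The pure-Python sketch below (helper names are illustrative) computes that effective weight and the fraction of a weight matrix's parameters the adapter trains.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, the weight a LoRA-adapted layer applies.

    w: frozen d_out x d_in matrix; b: d_out x r; a: r x d_in (only a, b are trained).
    """
    r = len(a)
    delta = matmul(b, a)
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


def lora_trainable_fraction(d_out, d_in, r):
    """Adapter parameters r * (d_in + d_out) relative to the full d_out * d_in matrix."""
    return r * (d_in + d_out) / (d_in * d_out)
```

For a hypothetical 4096 × 4096 projection with rank 8, `lora_trainable_fraction(4096, 4096, 8)` is 0.00390625, about 0.4% of that matrix's parameters.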
Machine Teaching (also: Interactive Machine Teaching): A paradigm in human-computer interaction where non-expert users guide the training of machine learning models through interactive feedback, such as providing examples, labels, or corrections. Unlike traditional machine learning, where data scientists prepare datasets and tune…
Mapping by Demonstration: A personalisation technique for gestural and sensor-based interfaces in which the system learns the relationship between user input (movement, breath, gaze) and output (sound, visuals, commands) from examples the user provides, rather than from designer-authored rules. The…
Markov Logic Networks (also: MLN, MLNs): A machine learning framework that combines first-order logic with probabilistic graphical models to handle uncertainty in rule-based reasoning. In assistive technology contexts, MLNs enable context-aware systems to make intelligent decisions by weighing multiple factors, such as…
Mel Spectrogram (also: Mel-frequency Spectrogram, Log Mel Spectrogram): A visual representation of sound that maps audio frequencies onto the mel scale, which approximates how humans perceive pitch, compressing higher frequencies and expanding lower ones to match the non-linear sensitivity of human hearing. Mel spectrograms convert audio signals…
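The mel mapping itself is a short formula. The sketch below uses the widely cited m = 2595 · log10(1 + f / 700) variant (other variants exist, such as the Slaney formulation that librosa uses by default) and shows how filterbank centres spaced evenly in mels crowd together at low frequencies in Hz. Building the triangular filters and applying them to a short-time power spectrum is omitted.

```python
import math


def hz_to_mel(f_hz):
    """Frequency in Hz to mels, using the 2595 * log10(1 + f / 700) variant."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centres(f_min, f_max, n_bands):
    """Centre frequencies (Hz) of n_bands filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (k + 1)) for k in range(n_bands)]


if __name__ == "__main__":
    centres = mel_band_centres(0.0, 8000.0, 10)
    print([round(c) for c in centres])  # gaps between centres widen with frequency
```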
Meta-learning (also: Learning to Learn): A branch of machine learning where models are trained to learn new tasks from very few examples by leveraging knowledge gained from previous tasks. In accessibility applications, meta-learning enables technologies like teachable object recognizers that can quickly adapt to…
Mixture of Experts (also: MoE): Mixture of experts is a neural network architecture that routes each input through a small subset of specialist subnetworks ('experts') rather than activating the whole model. A gating network decides which experts handle a given token or query, letting the overall model be much…
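A simple top-k router can be sketched as follows: score every expert, keep the k highest-scoring ones, renormalise their scores with a softmax, and combine only those experts' outputs, so the unselected experts are never evaluated. The experts and gating function here are plain callables supplied by the caller (an illustrative assumption); production MoE layers learn the gate jointly with the experts and usually add a load-balancing loss, which is not shown.

```python
import math


def softmax(logits):
    """Softmax with max-subtraction for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, experts, gate_logits_fn, top_k=2):
    """Route x through the top_k experts chosen by a gating function.

    experts:        list of callables, each mapping x to an output vector.
    gate_logits_fn: callable returning one routing score per expert for x.
    Returns (output_vector, chosen_expert_indices).
    """
    logits = gate_logits_fn(x)
    chosen = sorted(range(len(experts)), key=lambda e: logits[e], reverse=True)[:top_k]
    weights = softmax([logits[e] for e in chosen])  # renormalised over chosen experts
    outputs = [experts[e](x) for e in chosen]       # only selected experts run
    combined = [sum(w * out[d] for w, out in zip(weights, outputs))
                for d in range(len(outputs[0]))]
    return combined, chosen
```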
Motion History Image (also: MHI): A computer vision technique that represents motion in video sequences as a single grayscale image, where pixel intensity indicates recency of movement. Brighter pixels represent more recent motion while darker pixels show older movement patterns. In accessibility applications,…
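One common way to maintain an MHI, sketched below, is to set every pixel that moved in the current frame to a maximum value tau and decrement every other pixel by one per frame, flooring at zero. The binary motion masks could come from frame differencing; the function names and tau value are illustrative, and scaling the result to 0 to 255 for display is omitted.

```python
def update_mhi(mhi, motion_mask, tau):
    """One MHI update: moved pixels jump to tau, all others fade by 1 (not below 0)."""
    return [[tau if moved else max(0, h - 1) for h, moved in zip(h_row, m_row)]
            for h_row, m_row in zip(mhi, motion_mask)]


def mhi_from_masks(masks, tau):
    """Build an MHI from a sequence of binary motion masks, oldest first."""
    height, width = len(masks[0]), len(masks[0][0])
    mhi = [[0] * width for _ in range(height)]
    for mask in masks:
        mhi = update_mhi(mhi, mask, tau)
    return mhi


if __name__ == "__main__":
    # A blob moving left to right across a 1 x 3 image.
    masks = [[[1, 0, 0]], [[0, 1, 0]], [[0, 0, 1]]]
    print(mhi_from_masks(masks, tau=3))  # [[1, 2, 3]]: brightest where motion is newest
```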
Multimodal Features (also: multimodal data, multimodal fusion): Information extracted from multiple sensory channels or data types, such as combining visual (RGB), depth, audio, and skeletal data, to improve recognition accuracy. In accessibility systems, multimodal approaches often outperform single-modality methods because different data…
Named Entity Recognition (also: NER): A natural language processing technique that identifies and classifies named entities in text into predefined categories such as person names, locations, organizations, quantities, and domain-specific terms. In accessibility applications, NER can be used to extract meaningful…
Neural Network (also: Artificial Neural Network, ANN): A machine learning model inspired by the structure of biological neural networks in the brain, consisting of interconnected layers of nodes (neurons) that process information by adjusting weighted connections during training. In accessibility and assistive technology, neural…
Neural Radiance Field (also: NeRF): An implicit neural representation of a 3D scene, introduced by Mildenhall et al. in 2020, in which a small neural network is trained to map any 3D coordinate and viewing direction to a colour and density value. Rendering is performed by volumetric ray marching through this…
Neural Vocoder: A deep-learning model that synthesises audio waveforms from intermediate acoustic representations such as mel-spectrograms or discrete speech units. Examples include HiFi-GAN, WaveNet, WaveGlow, and SoundStream. Neural vocoders have largely replaced classical signal-processing…
Object recognition (also: Object detection, Image classification): A computer vision task in which a system identifies and labels objects within images or video, often using deep learning models trained on large datasets. For blind and low-vision users, object recognition is a core capability of camera-based assistive technologies like Seeing…
On-device Recognition (also: On-Device Inference, Edge Recognition): Performing pattern recognition (such as sign language recognition, speech recognition, or computer vision) locally on the user's device rather than by sending input to a remote server. On-device recognition matters for accessibility because it preserves privacy (camera or…
Open-Vocabulary Detection (also: Open-Vocabulary Object Detection, OVD): A class of computer vision object detection models that accept arbitrary text queries at inference time rather than being restricted to a fixed set of pre-trained classes. Instead of only recognizing, for example, the 80 COCO categories, an open-vocabulary detector (such as…
OpenPose: An open-source computer vision library developed by Carnegie Mellon University that detects human body, hand, facial, and foot keypoints in real time from images or video. OpenPose extracts 25 body keypoints (including six foot keypoints in its BODY_25 model), 21 keypoints per hand, and 70 facial landmarks, providing a skeletal…
Optical Flow: A computer vision method that estimates the apparent motion of objects between consecutive video frames by tracking pixel displacement patterns. Optical flow calculates velocity vectors showing movement direction and speed across an image. In assistive technology, optical flow…
POMDP (also: Partially Observable Markov Decision Process): A Partially Observable Markov Decision Process (POMDP) is a mathematical framework for modelling decision-making in situations where an agent cannot fully observe the state of its environment. In accessibility research, POMDPs are used to model how people with visual impairments…
Parameter-Efficient Fine-Tuning (also: PEFT, Lightweight Fine-Tuning): Parameter-efficient fine-tuning is a family of techniques (LoRA, adapters, prefix tuning, prompt tuning) that adapt a large pretrained model to a new task or domain by updating only a small fraction of its parameters (typically under 1%) while freezing the rest. This…
Part-of-Speech Tagging (also: POS Tagging, Grammatical Tagging): Part-of-speech tagging is the natural-language-processing task of labelling each word in a text with its grammatical category (noun, verb, adjective, and so on) using context from surrounding words. Classical approaches use hidden Markov models with the Viterbi algorithm;…
Particle Filtering (also: Sequential Monte Carlo, Particle Filter): Particle filtering is a probabilistic state-estimation technique, widely used for localization, that estimates a user's position by maintaining a cloud of weighted "particles," each representing a possible location. As new sensor data arrives (from GPS, inertial sensors, or other sources), particles are updated,…
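A minimal one-dimensional sketch of the predict, weight and resample cycle, for a hypothetical user walking along a corridor: particles are shifted by an estimated step plus noise, weighted by how well they agree with a noisy position reading, then resampled in proportion to those weights. The noise levels and the step and measurement sources are assumptions; real localisation systems track two- or three-dimensional state and often add map constraints and systematic resampling.

```python
import math
import random


def particle_filter_step(particles, step, measurement, rng,
                         motion_noise=0.3, sensor_noise=1.0):
    """One predict-weight-resample cycle for 1D position tracking.

    particles:   candidate positions (metres along a hypothetical corridor).
    step:        distance the user is believed to have moved since the last update.
    measurement: noisy absolute position reading.
    """
    # Predict: move every particle by the step estimate plus random motion noise.
    moved = [p + step + rng.gauss(0.0, motion_noise) for p in particles]
    # Weight: Gaussian likelihood of the measurement given each particle.
    weights = [math.exp(-((p - measurement) ** 2) / (2 * sensor_noise ** 2))
               for p in moved]
    if sum(weights) == 0.0:
        # Every particle is implausibly far from the reading; fall back to uniform weights.
        weights = [1.0] * len(moved)
    # Resample: draw a new particle set in proportion to the weights (multinomial).
    return rng.choices(moved, weights=weights, k=len(moved))


def estimate(particles):
    """Point estimate of position: the mean of the particle cloud."""
    return sum(particles) / len(particles)
```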
