Glossary

Terms used in accessibility research and practice. Each entry has a definition, common aliases, and category tags.

3D Reconstruction(also: Scene Reconstruction, 3D Scene Reconstruction): The computer vision task of recovering the 3D structure of a scene - geometry, camera positions, and sometimes object trajectories - from one or more 2D images or video frames. Techniques range from classic structure-from-motion and multi-view stereo to modern learning-based…
AR Marker(also: Fiducial marker, Augmented reality marker): A printed visual pattern (often a square with a distinctive black-and-white code) placed in the environment that a smartphone or AR headset camera can recognise to determine its own position and orientation with high precision. In blind-navigation research, AR markers are placed…
Adaptive Boosting(also: AdaBoost): A machine learning ensemble method that combines multiple weak classifiers to create a strong classifier, with each successive classifier focusing on the examples that previous classifiers misclassified. In computer vision and accessibility applications, AdaBoost is widely used…
ArUco Marker(also: ArUco Fiducial): A square fiducial marker composed of a black border and an inner binary pattern that encodes a unique ID, designed for fast, robust pose estimation from a single camera image. ArUco markers are widely supported by OpenCV and are used in augmented reality, robotics, and research…
Background Subtraction(also: Foreground-Background Separation, Background Modelling): A computer vision technique used to identify moving objects (the foreground) in a video by comparing each frame against a model of the static background. Common approaches include adaptive Gaussian mixture models that continuously update the background…
CLIP(also: Contrastive Language-Image Pre-Training): A vision-language model developed by OpenAI that learns to associate images with natural language descriptions through contrastive learning on large-scale image-text pairs. CLIP can compute similarity scores between images and text, enabling zero-shot classification and…
Camera Framing(also: Photo Framing, Object Framing): The act of positioning a camera so that the intended subject is properly captured within the image frame — not cropped, not too small, and centered enough for clear identification. Camera framing presents a significant accessibility challenge for blind and low-vision users who…
Camera Mouse(also: Head-Controlled Mouse Pointer, Head Tracking Mouse): A computer-vision-based mouse-replacement system that tracks a user's head motion through a standard webcam to control the mouse pointer on screen. Developed at Boston University by Margrit Betke and James Gips, Camera Mouse is freely available and enables people with severe…
Cascading classifier(also: Cascaded detection, Multi-stage classifier): A machine learning architecture that chains multiple detection stages in sequence, where each stage filters candidates before passing them to the next, progressively increasing detection precision while maintaining recall. In accessibility applications, cascading classifiers are…
Cognitive Assistance(also: Cognitive Aid, AI-Powered Assistance, Assisted Cognition): Technology that uses artificial intelligence and machine learning to supplement or expand human cognitive and perceptual abilities. In accessibility contexts, cognitive assistance systems recognise people, objects, text, and environments and convey that information through…
Collision Prediction(also: Collision risk prediction, Trajectory prediction): The task of estimating the future trajectories of surrounding pedestrians and obstacles and determining whether any of them will intersect with a user's own future position within a short prediction horizon (typically 2–4 seconds). In assistive technology for blind travellers,…
Color Histogram(also: Colour histogram, Histogram tracking): A statistical summary of the distribution of colour values across the pixels of an image or image region, often computed in a perceptual colour space such as Lab. In assistive computer-vision systems for blind users, colour histograms are used to re-identify and track a specific…
Continuous Sign Language Recognition(also: CSLR): A computer vision task that involves recognizing sign language from continuous, naturally produced signing — as opposed to isolated sign recognition, which identifies individual signs in segmented clips. Continuous sign language recognition deals with the complexities of natural…
Convolutional Neural Network(also: CNN, ConvNet): A class of deep neural network that uses convolutional filters to automatically extract spatial features from data, originally designed for image processing but now widely applied to sensor data, audio, and video analysis. CNNs identify patterns like edges, textures, and shapes…
Crosswalk detection(also: Pedestrian crossing detection, Zebra crossing detection): The automated identification and localization of marked pedestrian crossings in imagery using computer vision techniques. Crosswalk detection can be performed on satellite images, street-level photographs, or real-time camera feeds to populate navigation databases for blind…
Depth Camera(also: Depth Sensor, RGB-D Camera, 3D Camera): A device that captures both standard visual imagery and per-pixel distance information, producing a 3D representation of the scene. Technologies include structured light (projecting patterns and measuring distortion), time-of-flight (measuring how long light…
Depth Estimation(also: Monocular Depth Estimation, Depth Prediction): The computer vision task of predicting the distance from the camera to each point in a scene, producing a depth map in which each pixel carries a distance value. Monocular depth estimation uses a single RGB image (no stereo cameras or LiDAR) and typically relies on deep learning…
Depth Sensing(also: Depth Perception (computer vision), 3D Sensing): The ability of a sensor or system to measure the distance from itself to objects in the scene, producing a depth map or point cloud rather than a flat image. Common approaches include stereo vision (triangulating between two cameras), structured light (projecting a known…
Eigenfaces: A computer vision technique for face recognition that uses Principal Component Analysis to represent faces as a linear combination of standardized face components (eigenvectors derived from a training set of face images). Developed by Turk and Pentland in 1991, Eigenfaces was…
Element Detection(also: UI Element Detection, Widget Detection, Object Detection): The task of automatically identifying the locations and types of user interface components (such as buttons, text fields, images, and checkboxes) from a screenshot using computer vision models. Element detection is important for accessibility because it can identify interactive…
Face Detection(also: Face Recognition, Facial Detection): A computer vision technology that identifies and locates human faces within digital images or video frames, typically providing bounding box coordinates around each detected face. Face detection serves as the foundation for more advanced tasks like face recognition (identifying…
Face Recognition(also: Facial Recognition, Face Detection): A technology that uses computer vision and machine learning to identify or verify a person by analysing their facial features from images or video. In accessibility contexts, face recognition has significant potential as an assistive tool for blind and deafblind people, enabling…
Facial Action Coding System(also: FACS): A comprehensive, anatomically based system for describing all visually discernible facial movements, originally published by Paul Ekman and Wallace Friesen in 1978. FACS decomposes facial expressions into individual components called Action Units (AUs), each corresponding to the…
Facial Expression Analysis(also: Automated Facial Expression Analysis, Facial Coding, AFEA): The automated classification of a person's facial movements into discrete emotion categories (happy, angry, neutral, surprised, etc.) using computer vision. In hiring, facial expression analysis is embedded in AI-scored video interviews. It has been shown to systematically…
Facial Expression Recognition(also: FER, Facial Action Recognition): Computer vision technology that detects and classifies facial expressions from images or video. In sign language contexts, facial expression recognition is essential for capturing non-manual signs — the facial movements that carry grammatical meaning in ASL, such as raised…
Facial Gesture Recognition(also: Face Tracking, Facial Expression Recognition): Technology that uses cameras and computer vision algorithms to detect and interpret facial movements and expressions in real time. For accessibility, facial gestures such as opening the mouth, raising eyebrows, smiling, or nose movements can be mapped to computer commands,…
Facial Recognition(also: Face Recognition, FR): A computer vision technology that identifies or verifies a person by analyzing and comparing patterns in their facial features from digital images or video. In accessibility contexts, facial recognition has significant potential to assist blind and low…
Few-Shot Object Recognition(also: Few-Shot Recognition): A machine learning approach in which a model learns to identify a novel object from only a handful of labelled examples (commonly one to ten) rather than the hundreds or thousands typical of conventional supervised training. Few-shot object recognition underpins teachable and…
Fiducial Marker(also: ArUco Marker, Fiducial Tag): A visual pattern placed on an object or surface that can be detected and identified by computer vision systems to determine the object's position, orientation, and identity. Fiducial markers such as ArUco markers are commonly used in augmented reality and assistive technology…
Fiducial Marker(also: AR Marker, Visual Marker, Reference Marker): An artificial visual landmark placed in a physical environment to serve as a reference point for image processing systems. Fiducial markers — such as QR codes, ArUco markers, and BCH matricial markers — are designed for robust detection by cameras under varying conditions of…
Finger Tracking(also: Fingertip Tracking, Finger Detection, Hand Tracking): Computer vision or sensor-based technology that detects and follows the position and movement of a user's fingers in real time. In accessibility applications, finger tracking enables hands-free interaction with tactile graphics, touchscreens, and physical objects by monitoring…
Frame differencing(also: Temporal differencing, Background subtraction): A computer vision technique that detects motion or changes in video by comparing consecutive frames pixel by pixel. In accessibility applications, frame differencing can identify instructor actions in presentation videos, detect gestures in sign language recognition, or track…
Grad-CAM(also: Gradient-weighted Class Activation Mapping): A widely used explainable AI technique, introduced by Selvaraju et al. in 2017, that produces a class-discriminative heat map over an input image by weighting convolutional feature maps by the gradient of the target class score. Grad-CAM and its variants (SmoothGrad-CAM,…
Hand-Object Interaction(also: Hand-Object Interactions, HOI): The full range of physical actions people perform when grasping, touching, holding, manipulating, or gesturing toward objects with their hands. In accessibility research, hand-object interactions are studied as natural intent cues that can drive assistive technology: for blind…
Head Pose Estimation(also: Head Orientation Detection, Gaze Direction Estimation): A computer vision technique that determines the orientation or direction a person's head is facing, typically classifying whether someone is looking towards or away from the camera. In accessibility contexts, head pose estimation can help blind users determine whether a passerby…
Histogram of Oriented Gradients(also: HOG): A feature descriptor technique used in computer vision for object detection that counts occurrences of gradient orientations in localized portions of an image. HOG captures edge and texture information by dividing the image into cells and computing gradient direction histograms.…
Image Captioning(also: Automatic Image Description, AI Image Description): A computer vision task in which an AI model generates a natural language description of the content of an image. In accessibility contexts, image captioning technology enables visually impaired users to understand visual content by converting images into text that can be read…
Image Classification(also: Visual Classification, Photo Classification): A computer vision task where a machine learning model assigns a category label to an input image based on its visual content. Image classifiers are trained on labeled example images and learn to recognize patterns associated with each category. In accessibility applications,…
Image Obfuscation(also: Image Masking, Visual Privacy Protection): Techniques applied to images to obscure or remove sensitive visual information before sharing or processing, such as blurring, pixelation, edge filtering, or masking regions of an image. In accessibility contexts, image obfuscation is important for privacy-preserving assistive…
Image Processing(also: Digital Image Processing): The use of computational algorithms to analyze, enhance, transform, or extract information from digital images. In accessibility, image processing techniques are applied to convert visual content into accessible formats for blind and visually impaired users, including generating…
Image Retrieval(also: Content-Based Image Retrieval, CBIR, Visual Search): A computer vision technique that searches a database of images to find ones similar to a query image based on visual features rather than text metadata. In accessibility applications, image retrieval enables systems that can identify specific product instances (like a particular…
Image Segmentation(also: Region Segmentation): A computer vision technique that partitions a digital image into multiple distinct regions or segments based on shared characteristics such as color, intensity, or texture. In accessibility applications, image segmentation is used to simplify complex images for tactile…
Image Stitching(also: Photo Stitching, Panoramic Stitching): A computer vision technique that combines multiple overlapping photographs into a single wider or panoramic image. In accessibility contexts, image stitching enables blind users to capture more visual information from their environment than a single photo can provide, creating…
ImageNet: A large-scale visual database containing over 14 million labeled images organized into thousands of categories, widely used for training and benchmarking computer vision models. Many object detection and image classification systems used in accessibility…
Inception-v3(also: Inception v3): A deep convolutional neural network architecture developed by Google for image recognition, introduced in 2015. It uses "inception modules" that apply multiple convolution filter sizes in parallel to efficiently capture features at different scales, balancing recognition…
Instance Segmentation: A computer vision technique that identifies and delineates individual objects within an image at the pixel level, distinguishing separate instances even when they belong to the same category. In accessibility contexts, instance segmentation enables assistive tools to provide…
Instance-Level Recognition(also: Instance Recognition, Fine-Grained Recognition): A computer vision task that involves distinguishing between specific individual objects within the same general category, rather than just identifying broad categories. For example, while category-level recognition might identify something as "a bag of chips," instance-level…
Intersection Detection(also: Junction detection, Corridor intersection recognition): A computer-vision or sensor-fusion technique used in indoor navigation systems for blind travellers to identify where two or more walkable corridors meet, so the navigation software can update the user's position on a map and issue a turn instruction at the right moment.…
Isolated Sign Recognition(also: ISR, ISLR): A computer vision and machine learning task focused on identifying individual signs from video recordings where each video contains a single sign production, as opposed to continuous sign language recognition which processes connected signing in sentences or conversation.…
K-Shot Learning(also: N-Way K-Shot Learning): A machine learning paradigm where a model must learn to classify objects using only k training examples per class. In the context of accessibility, k-shot learning is significant because it enables assistive technologies like personal object recognizers to be trained with…
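
Illustrative sketch for K-Shot Learning: the idea of classifying from only k examples per class can be shown with a nearest-prototype classifier, which averages the k examples of each class and assigns a query to the closest average. This is one common approach, not necessarily the one used by any system in this glossary. The function names, class labels, and 2-D feature vectors below are hypothetical; a real personal object recognizer would more plausibly use features extracted by a pretrained network.

```python
from math import dist
from statistics import fmean


def fit_prototypes(support):
    """Build one prototype per class.

    support: dict mapping a class label to a list of k feature vectors.
    """
    prototypes = {}
    for label, examples in support.items():
        # Average the k examples dimension by dimension.
        prototypes[label] = [fmean(dim) for dim in zip(*examples)]
    return prototypes


def classify(prototypes, query):
    """Return the label whose prototype is nearest (Euclidean) to the query."""
    return min(prototypes, key=lambda label: dist(prototypes[label], query))


# Hypothetical 2-way, 3-shot task with made-up feature vectors.
support = {
    "keys": [[0.9, 0.1], [1.0, 0.2], [0.8, 0.0]],
    "wallet": [[0.1, 0.9], [0.2, 1.0], [0.0, 0.8]],
}
prototypes = fit_prototypes(support)
prediction = classify(prototypes, [0.7, 0.3])
```

Here "2-way" is the number of classes and "3-shot" is k. Adding a new object only requires k more labelled vectors and one more averaged prototype, with no retraining of the feature extractor.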