Glossary

Terms used in accessibility research and practice. Each entry has a definition, common aliases, and category tags.

Search results

Data Annotation (also: Data labeling, AI labeling): The process of attaching labels, transcriptions, bounding boxes, or other structured metadata to raw data so that it can be used to train, evaluate, or benchmark machine-learning models. Annotation is typically performed by human workers - in-house experts, clinicians,…
Dataset Collection (also: Data Collection Protocol): The process of gathering, curating, and documenting data used to train, evaluate, or benchmark machine learning systems. In accessibility contexts, dataset collection decisions about who contributes, what objects or scenarios are captured, how quality is assessed, how privacy is…
DementiaBank: A shared database of multimedia interactions for the study of communication in dementia, maintained as part of the TalkBank system. DementiaBank contains longitudinal recordings of people with Alzheimer's disease and matched controls performing tasks like the "cookie theft"…
Disability-First Dataset (also: Disability-first AI dataset): An approach to AI dataset creation, articulated by Theodorou et al. and others, that treats serving a disability community as the primary objective rather than collecting disability data as a minority slice of a general-purpose dataset. Examples include VizWiz (blind…
Ground Truth (also: Gold standard, Reference labels): In machine learning, the labels treated as authoritative when training or evaluating a model - typically produced by human annotators or expert consensus and assumed to represent the 'correct' answer. Critical AI scholarship has shown that ground truth is socially constructed:…
ImageNet: A large-scale visual database containing over 14 million labeled images organized into thousands of categories, widely used for training and benchmarking computer vision models. Many object detection and image classification systems used in accessibility…
Inter-Annotator Agreement (also: IAA, Inter-rater agreement, Inter-coder agreement): A statistical measure of how consistently two or more human annotators assign the same label to the same data item, widely used in NLP, computer vision, and AI dataset construction as a proxy for label quality. Common measures include Cohen's kappa, Fleiss' kappa, and…
ORBIT Dataset (also: Object Recognition for Blind Image Training): A disability-first machine learning dataset for teachable object recognition, contributed by people who are blind or have low vision. The original ORBIT dataset (Massiceti et al., 2021) contains 3,822 videos of 486 objects from 67 data collectors, predominantly in the UK and…
Sign Language Corpus (also: ASL Corpus, Signed Language Corpus): A structured collection of recorded signed-language performances (typically video, and increasingly motion-capture data) annotated by expert signers with time-stamped linguistic information such as individual signs, non-manual markers, eye gaze, grammatical boundaries, and…
Stanford Emotional Narratives Dataset (also: SEND, SEND Dataset): A publicly available dataset of short video clips of people telling emotionally significant personal stories, created by Ong et al. at Stanford (2019) to support multimodal emotion-recognition research. Each video is annotated continuously for valence, arousal, and…
UAspeech Database (also: UAspeech, UA-Speech, Universal Access Speech): A standardized corpus of dysarthric speech recordings created for research in accessible speech technology. It contains isolated word recordings from speakers with cerebral palsy exhibiting varying degrees of dysarthria, along with matched control…

11 results.
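
As a rough illustration of the Inter-Annotator Agreement entry above, the following sketch computes Cohen's kappa for two annotators who labeled the same items. It is a minimal example, not a reference implementation: the function name `cohens_kappa`, the sample labels, and the choice to return NaN when chance agreement equals 1 are all assumptions made for illustration and do not come from the glossary.

```python
import math
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items (hypothetical helper)."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("Both annotators must label the same non-empty set of items.")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal proportions, summed over all labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # Assumption: kappa is undefined when chance agreement is 1
    # (both annotators used a single identical label), so return NaN.
    if expected == 1.0:
        return math.nan

    return (observed - expected) / (1.0 - expected)


# Illustrative labels only; not drawn from any dataset named in the glossary.
annotator_1 = ["clear", "clear", "blurry", "blurry"]
annotator_2 = ["clear", "blurry", "blurry", "blurry"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

Here the annotators agree on 3 of 4 items (0.75) while chance agreement is 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5. Kappa corrects raw agreement for the agreement expected by chance, which is why it is usually preferred over a simple percentage.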
