Glossary

Terms used in accessibility research and practice. Each entry has a definition, common aliases, and category tags.

Reinforcement Learning from Human Feedback(also: RLHF): A machine learning technique used to fine-tune large language models by incorporating human judgments about response quality. Human annotators rank or rate model outputs, and this feedback trains a reward model that guides the LLM toward producing preferred responses. While RLHF…
Relevance Scoring(also: Task Relevance Score, Content Relevance Rating): The assignment of numerical scores to web page elements indicating how relevant they are to a user's specified task or goal. In systems like Task Mode, relevance scores typically range from 0 (completely irrelevant) to 100 (critical to the task), assigned by large language…
Representational harm(also: Representational bias): A category of harm caused by AI systems that perpetuate or amplify negative stereotypes, demeaning portrayals, or erasure of particular social groups, distinct from allocative harms that deny resources or opportunities. In disability contexts, representational harms occur when…
Retrieval-Augmented Generation(also: RAG): An AI technique that enhances the responses of large language models (LLMs) by first retrieving relevant information from an external knowledge base or document collection, then providing that information as context for the model to generate its response. In accessibility…
SHAP(also: SHapley Additive exPlanations): A unified framework for feature-importance explanations of machine-learning models, introduced by Lundberg and Lee in 2017, grounded in Shapley values from cooperative game theory. For any model and input, SHAP assigns each feature a value representing its contribution to that…
Scene Segmentation(also: Scene Detection, Shot Boundary Detection): Scene segmentation is the process of automatically dividing a video into discrete scenes or segments based on visual changes such as cuts, transitions, or the appearance of new elements in the frame. In the context of accessibility, scene segmentation is a foundational component…
Seeing AI: A free AI-powered app developed by Microsoft for blind and low vision users that uses computer vision and AI to describe the visual world. Features include reading short text, documents, and handwriting; identifying products via barcodes; recognizing people and their emotions;…
Self-Debiasing(also: Model Self-Debiasing, Autonomous Debiasing): A class of techniques where AI systems, particularly large language models, are prompted or configured to identify and reduce their own biased outputs without external model modification or retraining. Self-debiasing approaches include prompting models to reflect on whether…
Semantic Analysis(also: Semantic Content Analysis, Semantic Similarity): The computational process of determining meaning and relationships within text, images, or other content by analyzing their semantic properties rather than just surface-level features. In accessibility, semantic analysis enables automated tools to go beyond detecting the…
Semantic Data Extraction(also: Structured Data Extraction, Information Extraction): The process of extracting structured, meaningful data from unstructured or semi-structured sources such as images, documents, web pages, or natural language text, preserving the semantic relationships between data elements. In accessibility, semantic data extraction is used to…
Semantic Segmentation(also: Pixel-Level Classification, Scene Parsing): A computer vision technique that classifies every pixel in an image into a predefined category, producing a detailed map of what objects are present and where they are located. Unlike object detection (which draws bounding boxes around objects), semantic segmentation provides…
Sensory augmentation(also: Sensory substitution system, Sensory augmentation technology): Technology that provides information from one sensory channel through an alternative modality accessible to the user, such as converting visual scenes to audio descriptions for blind users or translating sounds to visual or haptic alerts for deaf users. AI-powered sensory…
Sentiment Analysis(also: Opinion Mining): A natural language processing technique that identifies and extracts subjective information from text, classifying it as positive, negative, or neutral. In accessibility research, sentiment analysis can be applied to social media posts, product reviews, and online discussions to…
SigLIP(also: Sigmoid Loss for Language Image Pre-Training): A vision-language model that aligns images with text descriptions using a pairwise sigmoid loss instead of the softmax-based contrastive loss used by CLIP. SigLIP improves upon CLIP with a more efficient training objective: each image-text pair is scored independently, so training does not require large batch sizes. In accessibility…
Sign Language Processing(also: SLP, Sign Language Technology): A field of artificial intelligence and computer science focused on developing computational systems that can understand, generate, and translate sign languages. Sign language processing encompasses sign language recognition (detecting and interpreting signs from video input),…
Sign Language Recognition(also: SLR, Automatic Sign Recognition): A computer vision and machine learning task focused on automatically detecting and classifying signs from video input. Sign language recognition ranges from isolated sign recognition (identifying individual signs) to continuous sign recognition (interpreting sequences of signs…
Sign Language Translation(also: SLT, Sign-to-Text Translation, Sign-to-Speech Translation): The task of converting between a sign language and a spoken or written language, in either direction. Sign-to-spoken/written translation (e.g., ASL to English) involves recognizing signs from video and producing equivalent text or speech. Spoken/written-to-sign translation…
Sign language avatar(also: Signing avatar, Virtual signer): A computer-generated animated character that produces sign language from text or speech input. While sign language avatars hold potential for scaling deaf accessibility, their premature deployment raises significant concerns: the World Federation of the Deaf and World…
Small Language Model(also: SLM): A language model, typically ranging from tens of millions to a few billion parameters, designed to run on consumer or edge devices rather than in centralized cloud data centers. Small language models sacrifice some of the broad general knowledge of frontier large language models…
Sound Classification(also: Sound Event Detection, Audio Classification): The automated process of identifying and categorizing sounds into predefined categories such as speech, music, alarms, animal sounds, or environmental noise. Sound classification is a foundational capability in sound awareness technologies for deaf and hard of hearing users,…
Sound Event Detection(also: Audio Tagging, Automatic Sound Recognition): A machine learning technique that automatically identifies and classifies sounds within an audio stream, such as music, applause, laughter, environmental noises, and other non-speech audio events. In accessibility contexts, sound event detection can complement automatic speech…
Sound awareness(also: Sound recognition, Environmental sound detection): Technology that detects and identifies sounds in the user's environment and conveys that information through alternative modalities such as visual notifications or haptic alerts. For deaf and hard-of-hearing users, sound awareness systems can identify doorbells, fire alarms,…
Speaker Diarization(also: Speaker Segmentation): The process of partitioning an audio stream into segments according to speaker identity, determining "who spoke when" in a multi-speaker recording or conversation. Speaker diarization is important for accessibility because deaf and hard of hearing individuals need to distinguish…
Speaker-dependent speech recognition(also: User-adapted ASR, Personalized speech recognition): A speech recognition approach that trains or adapts its acoustic models to a specific individual's voice characteristics, rather than relying solely on general population models. For people with cognitive disabilities, dysarthria, or other speech differences, speaker-dependent…
Speech Language Model(also: SLM, Audio Language Model, Speech Foundation Model): A class of large neural models that processes both speech and text in a single end-to-end framework, integrating tasks — automatic speech recognition, spoken language understanding, dialogue, speech generation — that traditionally required separate modular systems. Examples…
Speech Recognition(also: Voice Recognition, STT, Speech-to-Text): Technology that converts spoken language into text or commands by analyzing audio input. Speech recognition powers dictation systems, voice assistants, and voice-controlled interfaces. For accessibility, speech recognition enables text input and device control for users who…
Speech-to-Text(also: STT, Speech Recognition, Automatic Speech Recognition): Technology that converts spoken language into written text, enabling voice-based input for digital systems. In accessibility, speech-to-text serves multiple roles: it powers voice command interfaces for users who cannot use keyboard or touch input, generates real-time captions…
Stable Diffusion: An open-weights latent text-to-image diffusion model released by Stability AI in 2022. It operates by iteratively denoising a random latent tensor, conditioned on text embeddings produced by a frozen CLIP encoder, until the latent can be decoded by a VAE into a coherent image.…
Subjective Image Description(also: Subjective Visual Assessment): An image description that involves opinion, aesthetic judgment, or interpretation rather than purely factual content. Examples include assessing whether an outfit matches, whether a room setting looks nice, or whether a photograph is aesthetically pleasing. Subjective image…
Support Indicator(also: Agreement Indicator): A visual or textual cue that communicates the degree of agreement across multiple AI model responses for a particular claim. Support indicators help blind and low vision (BLV) users assess claim reliability by showing how many or which models agree. Research has explored four types: source-based ("3…
Support Vector Machine(also: SVM): A supervised machine learning algorithm used for classification and regression tasks. SVMs work by finding the optimal hyperplane that separates data points into distinct categories in a high-dimensional feature space. In accessibility research, SVMs have been used to detect…
Task Automation(also: Web Task Automation, Browser Automation): The use of software agents or scripts to automatically perform web-based tasks on behalf of users, such as filling forms, making purchases, or extracting information. Task automation in accessibility contexts promises to reduce the effort required for screen reader users to…
Teachable AI(also: Teachable Machine Learning, Interactive Machine Learning): Teachable AI refers to artificial intelligence systems that allow end users to personalize the system by providing their own training examples, high-level constraints, or prompts — without requiring programming or machine learning expertise. In the accessibility context,…
Teachable Object Recognition(also: Teachable Object Recognizer, TOR, Personalized Object Recognition): A machine learning approach that allows users to train an object recognition system to identify their own personal items by providing a small number of training examples, typically photos or videos. This technology is particularly valuable for blind and low vision users who need…
Text-to-Image(also: Text-to-Image Generation, T2I): An AI capability that generates visual images from natural language text descriptions (prompts). Text-to-image models like DALL-E, Midjourney, and Stable Diffusion have opened new creative possibilities for blind individuals by allowing them to create visual content through…
Text-to-Image Generation(also: Text-to-Image AI, Text-to-Image Synthesis): An artificial intelligence capability that creates visual images from natural language text descriptions, also known as prompts. Tools such as DALL-E, Midjourney, and Stable Diffusion use large-scale diffusion models trained on image-text pairs to generate novel images matching…
Text-to-Image Model(also: T2I Model, T2I, Text-to-Image Generator): A generative AI system that produces images from natural-language prompts. Prominent examples include DALL-E, Stable Diffusion, and Midjourney. In accessibility contexts, text-to-image models have been shown to replicate and amplify disability stereotypes — for example,…
Time-Causal Model(also: Temporal Causal Model, Sequential Logic Model): A computational model that enforces temporal coherence in predictions by ensuring that the sequence of recognized events follows a logical causal order. In recipe tracking, a time-causal model prevents the system from predicting that an earlier step is currently happening after…
Topic Segmentation(also: Text Segmentation, Topicalisation): A natural language processing technique that automatically divides a document into coherent sections based on changes in topic or subject matter. Topic segmentation algorithms detect boundaries where the semantic content of adjacent sentences or paragraphs shifts significantly,…
Toxicity detection(also: Content toxicity scoring, Toxic speech detection): An NLP-based content moderation technique that assigns scores to text indicating the likelihood it is rude, disrespectful, or likely to make someone leave a conversation. Research has shown that toxicity detection models encode disability bias, scoring innocuous sentences that…
Training Data(also: Training Set, Training Dataset): The collection of labeled examples used to teach a machine learning model to perform a specific task. The quality, quantity, and diversity of training data directly determine how well a model will perform. In accessibility contexts, training data quality is especially important…
Trajectory Analysis(also: Route Analysis, Path Analysis): The computational study of movement patterns over time and space, typically derived from GPS or other location data. Trajectory analysis involves modelling, comparing, and classifying sequences of spatial positions to identify patterns, anomalies, or behaviours. In assistive…
Transfer Learning: A machine learning technique where a model trained on a large general dataset is adapted to perform a new, more specific task using a much smaller amount of new training data. Rather than training a model from scratch, transfer learning leverages patterns already learned by an…
Transparency in AI(also: AI Transparency, Algorithmic Transparency): The principle that AI systems should clearly communicate how they work, what data they use, where processing occurs, and what their limitations are. In accessibility contexts, blind users have expressed strong desires to understand how AI-enabled privacy techniques are designed,…
Turing Test(also: Imitation Game): The Turing Test, proposed by Alan Turing in 1950, is a thought experiment for assessing whether a machine's conversational behaviour is indistinguishable from that of a human. A human evaluator engages in a text-based exchange with both a human and a machine and must decide…
UI Agent(also: User Interface Agent, Browser Agent, AI Agent): An AI-powered software system that can autonomously interact with graphical user interfaces on behalf of a user, performing tasks by interpreting natural language commands and translating them into interface actions such as clicking buttons, entering text, and navigating between…
Variation Summary: A concise presentation format for AI-generated image descriptions that explicitly organizes information into three categories: agreements (claims supported by all or most models), disagreements (claims where models conflict), and unique mentions (information provided by only one…
Variation Surfacing(also: Variation Display, Surfacing Variations): A technique for helping users assess AI reliability by generating multiple responses from one or more AI models and systematically presenting the differences, agreements, and unique mentions across those responses. In the context of image descriptions for blind and low vision…
Variation-Aware Description: A presentation format for AI-generated image descriptions that aggregates multiple model responses into a single coherent, hierarchical description while highlighting variations inline. When multiple AI models describe the same image, a variation-aware description combines their…
Video Summarization(also: Video Summary, Video Condensation): The process of creating a shortened version of a video that captures its key content, either through extractive methods (selecting key segments) or abstractive methods (generating new condensed content). Video summarization is an emerging accessibility tool that can make…
