Glossary

Terms used in accessibility research and practice. Each entry has a definition, common aliases, and category tags.

Search results

Scene Classification (also: Scene Recognition, Scene Understanding): Scene classification is a computer vision task that categorizes images or video frames into predefined scene types such as indoor/outdoor, kitchen, office, or street. For accessibility, scene classification helps automated systems provide context about environments in image…
Scene Segmentation (also: Scene Detection, Shot Boundary Detection): Scene segmentation is the process of automatically dividing a video into discrete scenes or segments based on visual changes such as cuts, transitions, or the appearance of new elements in the frame. In the context of accessibility, scene segmentation is a foundational component…
Scene Text Recognition (also: Scene Text Detection, Text in the Wild, Environmental Text Detection): The computer vision task of detecting and reading text that appears naturally in real-world environments, such as street signs, product labels, shop names, and building numbers. Unlike optical character recognition (OCR) for scanned documents where text layout is predictable,…
Screen Recognition: A computer vision feature in Apple's VoiceOver screen reader that automatically interprets the pixels of a graphical user interface to identify and label interactive elements when applications have not properly implemented accessibility APIs. Screen Recognition analyses the…
Semantic Segmentation (also: Pixel-Level Classification, Scene Parsing): A computer vision technique that classifies every pixel in an image into a predefined category, producing a detailed map of what objects are present and where they are located. Unlike object detection (which draws bounding boxes around objects), semantic segmentation provides…
SigLIP (also: Sigmoid Loss for Language Image Pre-Training): A vision-language model that aligns images with text descriptions using a pairwise sigmoid loss instead of the softmax-based contrastive loss used by CLIP. SigLIP improves upon CLIP's training efficiency: because each image-text pair is scored independently, the objective needs no batch-wide normalization and works well without very large batch sizes. In accessibility…
Sign Language Generation (also: Sign Language Synthesis, Signing Generation): The automatic production of sign language content, typically through computer-generated animations of signing avatars or video synthesis. Sign language generation systems convert text or symbolic representations of signs into visual output, often using motion-capture data,…
Sign Recognition (also: Indoor Sign Recognition, Signage Recognition): The task of automatically detecting, reading, and interpreting signs in an environment. For accessibility purposes, these are typically indoor directional signs (arrows pointing to corridors or facilities) and textual signs (room numbers, department names, wayfinding labels). Sign…
Sign Spotting (also: Sign Detection, Continuous Sign Spotting): Sign spotting is the task of automatically locating instances of specific signs within a continuous signing video, as opposed to classifying a pre-segmented isolated sign. It is a building block for search-by-sign in archive footage, automatic captioning of signed media, and…
Sign Language Detection (also: SL Detection, Signing Detection): The automated identification of whether video content contains sign language communication, using computer vision techniques to analyse motion patterns around detected faces. Sign language detection is distinct from sign language recognition (which interprets specific signs): it…
Skeleton Tracking (also: Skeletal Tracking, Body Tracking, Pose Estimation): Technology that detects and tracks the positions of human body joints (such as head, shoulders, elbows, hands) in real time from camera or depth sensor data. In accessibility applications, skeleton tracking enables gesture-based interfaces, sign language recognition, and…
Spatiotemporal Saliency (also: Spatiotemporal Saliency Estimation, Spatio-Temporal Saliency): A computer vision technique that estimates, for each pixel in a video, how visually important it is at a given moment by combining spatial contrast (features that stand out within a frame) with temporal contrast (regions that change or move differently from their recent…
Speaker Segmentation (also: Person Segmentation, Human Segmentation): The process of identifying and isolating the speaker or presenter in a video frame, separating them from the background and other visual elements. Speaker segmentation uses computer vision models to create precise masks around the speaker, enabling layout customization options…
Stereo Vision (also: Stereoscopic Vision, Stereo Camera System, Stereopsis): A computer vision technique that uses two or more cameras positioned at slightly different viewpoints to extract three-dimensional depth information from a scene, mimicking the way human binocular vision perceives depth. In assistive technology, stereo vision systems have been…
Stereoscopic Camera (also: Stereo Camera, Depth Camera, 3D Camera): A camera system that uses two or more lenses to capture images from slightly different perspectives, mimicking human binocular vision to compute depth information (disparity maps). In accessibility applications, stereoscopic cameras are used in assistive devices for visually…
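
Example (illustrative, not part of the glossary entries): the Stereo Vision and Stereoscopic Camera entries describe recovering depth from the disparity between two viewpoints. For a rectified two-camera rig this is commonly modelled as depth = focal length × baseline / disparity. The sketch below assumes that simple pinhole model; the function name and the focal length and baseline values are hypothetical and chosen only to show the relationship.

```python
import math


def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in metres for one pixel of a rectified stereo pair.

    Assumes the pinhole model Z = f * B / d, where f is the focal length
    in pixels, B the distance between the two cameras in metres, and d
    the horizontal disparity in pixels.
    """
    if disparity_px < 0:
        raise ValueError("disparity must be non-negative")
    if disparity_px == 0:
        # Zero disparity means the point is effectively infinitely far away.
        return math.inf
    return focal_length_px * baseline_m / disparity_px


# Hypothetical rig: 700 px focal length, cameras 12 cm apart.
FOCAL_LENGTH_PX = 700.0
BASELINE_M = 0.12

for disparity in (84.0, 42.0, 8.4):
    depth = depth_from_disparity(disparity, FOCAL_LENGTH_PX, BASELINE_M)
    print(f"disparity {disparity:5.1f} px -> depth {depth:.2f} m")
```

Halving the disparity doubles the estimated depth, so small disparity errors matter most for distant objects. That is one reason obstacle-detection aids built on stereo cameras tend to be most reliable at short range.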

15 results.
