Glossary

Terms used in accessibility research and practice. Each entry has a definition, common aliases, and category tags.

Search results

SHAP (also: SHapley Additive exPlanations): A unified framework for feature-importance explanations of machine-learning models, introduced by Lundberg and Lee in 2017, grounded in Shapley values from cooperative game theory. For any model and input, SHAP assigns each feature a value representing its contribution to that…
SMOTE (also: Synthetic Minority Over-sampling Technique): A data augmentation technique that addresses class imbalance in machine learning datasets by generating synthetic examples of the minority class rather than simply duplicating existing ones. SMOTE creates new instances by interpolating between existing minority class samples and…
Scene Classification (also: Scene Recognition, Scene Understanding): Scene classification is a computer vision task that categorizes images or video frames into predefined scene types such as indoor/outdoor, kitchen, office, or street. For accessibility, scene classification helps automated systems provide context about environments in image…
Semantic Segmentation (also: Pixel-Level Classification, Scene Parsing): A computer vision technique that classifies every pixel in an image into a predefined category, producing a detailed map of what objects are present and where they are located. Unlike object detection (which draws bounding boxes around objects), semantic segmentation provides…
Sequence-to-Sequence (also: Seq2Seq, Encoder-Decoder): A neural network architecture designed for tasks where both input and output are sequences of variable length, such as machine translation, speech recognition, and video captioning. A seq2seq model consists of an encoder that processes the input sequence into a fixed-length…
Sign Language Machine Translation (also: English-to-ASL Translation, Sign Language MT, Text-to-Sign Translation): The automatic translation of written or spoken text into a signed language (or vice versa) using computational methods, typically producing output as an animated signing avatar or, less commonly, as recorded video clips. Because signed languages such as American Sign Language…
Sign Language Translation (also: SLT, Sign-to-Text Translation): The automatic conversion of sign language video into written or spoken language text using machine learning. Unlike sign language recognition, which identifies individual signs or glosses, sign language translation produces fluent natural language output that accounts for the…
Signer-Independent Recognition (also: Signer-Independent SLR): A sign language recognition approach designed to work with signers whose data was not included in the training set. Similar to speaker-independent speech recognition, signer-independent systems must handle variations in signing style, hand size, speed, and regional signing…
Singular Value Decomposition (also: SVD): A mathematical technique that decomposes a matrix into three component matrices, used to reduce high-dimensional data to its most important features while preserving essential relationships. In accessibility research, SVD is a core component of Latent Semantic Analysis and has…
Sound Recognition (also: Sound Classification, Audio Event Detection, Environmental Sound Recognition): Technology that automatically identifies and classifies sounds in a user's environment, typically using machine learning models trained on audio datasets. In accessibility contexts, sound recognition systems help deaf and hard of hearing people become aware of environmental…
Speaker Adaptation (also: Voice Adaptation, Speaker-Adaptive Training, Voice Personalization): Speaker adaptation is the process of adjusting an existing automatic speech recognition (ASR) system (usually one trained on a large, demographically broad corpus of able-bodied speakers) to a particular individual's voice using a relatively small amount of that person's…
Speech Emotion Recognition (also: SER, Vocal Emotion Recognition): A class of machine-learning techniques that infers a speaker's emotional state from acoustic features of speech (pitch contour, intensity, rhythm, spectral properties, voice quality), usually producing a label (happy/sad/angry/calm) or continuous values on valence and arousal…
Speech Language Model (also: SLM, Audio Language Model, Speech Foundation Model): A class of large neural models that processes both speech and text in a single end-to-end framework, integrating tasks (automatic speech recognition, spoken language understanding, dialogue, speech generation) that traditionally required separate modular systems. Examples…
Stable Diffusion: An open-weights latent text-to-image diffusion model released by Stability AI in 2022. It operates by iteratively denoising a random latent tensor, conditioned on text embeddings produced by a frozen CLIP encoder, until the latent can be decoded by a VAE into a coherent image.…
Supervector (also: GMM Supervector): A supervector is a high-dimensional feature representation created by concatenating the mean vectors from all components of a Gaussian Mixture Model (GMM) adapted to a specific speaker or utterance. This concatenation transforms variable-length speech into a fixed-length vector…
Support Vector Machine (also: SVM): A supervised machine learning algorithm used for classification and regression tasks. SVMs work by finding the optimal hyperplane that separates data points into distinct categories in a high-dimensional feature space. In accessibility research, SVMs have been used to detect…
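
Example (SMOTE): The interpolation step described in the SMOTE entry can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name smote_sketch, its parameters, and the toy data are hypothetical, and the choice of interpolating toward one of the k nearest minority-class neighbours follows the commonly described form of the technique rather than anything stated in the truncated entry above.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class tuples.

    Hypothetical helper for illustration only. Each synthetic point lies on
    the line segment between a randomly chosen minority sample and one of
    its k nearest minority-class neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        neighbours = sorted(
            others,
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        # A random fraction of the way from the base toward the neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two features per sample (made-up values).
minority_samples = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority_samples, n_new=5)
print(len(new_points))
```

Because every synthetic point is a blend of two real minority samples, the new points stay inside the region the minority class already occupies instead of duplicating existing rows. In practice a maintained library implementation would normally be used rather than a hand-rolled sketch like this one.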

16 results.
