Glossary

Terms used in accessibility research and practice. Each entry has a definition, common aliases, and category tags.

Search results

Accessibility Metrics (also: Web Accessibility Metrics, Accessibility Scores, Accessibility Measurement): Quantitative methods for measuring and scoring the accessibility level of websites or digital content. Accessibility metrics typically work by evaluating web pages against checkpoints derived from standards like WCAG, computing pass or failure rates, and then synthesizing these…
Agreement Rate (also: AR): A statistical measure used in end-user gesture elicitation studies to quantify how much consensus participants show when proposing gestures or interactions for a given task (referent). Agreement rate ranges from 0 (no two proposals are equivalent) to 1 (all proposals are…
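A minimal sketch of how an agreement rate for one referent might be computed. The entry does not name a formula; this assumes the commonly used Vatavu and Wobbrock (2015) formulation, in which proposals judged equivalent share a label. The function name and the sample labels are illustrative.

```python
from collections import Counter


def agreement_rate(proposals):
    """Agreement rate for one referent.

    `proposals` is a list of labels; equivalent proposals carry the same label.
    Assumed formula: AR = n/(n-1) * sum((group_size/n)^2) - 1/(n-1).
    """
    n = len(proposals)
    if n < 2:
        raise ValueError("need at least two proposals")
    group_sizes = Counter(proposals).values()
    sum_of_squares = sum((size / n) ** 2 for size in group_sizes)
    return n / (n - 1) * sum_of_squares - 1 / (n - 1)


# Four participants: two propose the same gesture, two propose unique ones.
example = agreement_rate(["swipe", "swipe", "tap", "pinch"])
```

Under this formula the value equals the share of participant pairs that agree, so the example yields one agreeing pair out of six.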
Back-translation (also: Reverse translation): A validation technique in cross-linguistic instrument translation where an independently translated version (e.g., ASL video) is translated back into the source language (e.g., English) by someone who did not see the original, then compared for meaning equivalence.…
Benchmark dataset (also: Evaluation dataset, Test benchmark): A standardized dataset used to evaluate and compare the performance of AI models, algorithms, or systems against established baselines. In accessibility, the absence of benchmark datasets that include people with disabilities means disparate performance across disability…
Caption quality metric (also: ACE metric, Caption evaluation metric): A measure designed to predict how understandable automatically generated captions are for Deaf and Hard-of-Hearing users, as an alternative to standard Word Error Rate which correlates poorly with actual DHH comprehension. The Automatic Caption Evaluation (ACE) metric combines…
Cloze Test (also: Cloze Procedure, Cloze Deletion Test): A reading comprehension assessment method in which words are systematically deleted from a text and the reader must fill in the missing words based on context. Developed by Wilson Taylor in 1953, cloze tests measure how well a reader understands the language patterns and meaning…
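As an illustration of systematic deletion, the sketch below blanks every n-th word of a passage. The deletion interval, the blank marker, and the function name are assumptions for illustration; real cloze instruments may use other deletion rules.

```python
def make_cloze(text, every_nth=5, blank="_____"):
    """Replace every `every_nth` word with a blank; return the test text and answer key."""
    if every_nth < 1:
        raise ValueError("every_nth must be at least 1")
    words = text.split()
    answers = []
    for i in range(every_nth - 1, len(words), every_nth):
        answers.append(words[i])  # keep the deleted word for scoring
        words[i] = blank
    return " ".join(words), answers


cloze_text, answer_key = make_cloze("a b c d e f g h i j", every_nth=5)
```

Splitting on whitespace leaves punctuation attached to words, so a production version would likely need a proper tokenizer.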
Cognitive Walkthrough (also: Expert Walkthrough): An accessibility and usability evaluation method in which one or more experts step through a series of tasks from the perspective of a target user, identifying potential barriers and difficulties at each step. In accessibility evaluations, cognitive walkthroughs often involve…
Cross-syndrome comparison (also: Cross-disability comparison): A research methodology that evaluates a technology or intervention with participants from multiple disability groups to determine whether findings and design principles generalize across conditions. Cross-syndrome comparisons are important because assistive technologies designed…
Ecological validity (also: Real-world validity): The degree to which research findings from controlled laboratory settings accurately reflect behaviour and performance in real-world everyday contexts. In accessibility research, ecological validity is a critical concern because laboratory conditions — structured tasks, quiet…
Error-spread modelling (also: Error propagation modelling, Error radiation): An approach to evaluating the impact of speech recognition errors that accounts for how a single misrecognized word degrades comprehension of its neighbouring words, not just the word itself. For example, misrecognizing "kitchen" as "kitten" makes the subsequent word "area"…
Formative Evaluation (also: Formative Usability Testing, Formative Assessment): Usability evaluation conducted early in the design process using prototypes, mockups, or wireframes to identify design problems and inform improvements. Formative testing is qualitative and iterative, focusing on understanding user behavior and identifying issues rather than…
GOMS (also: Goals, Operators, Methods, and Selection rules, KLM, Keystroke-Level Model): A family of human-computer interaction models used to predict how long it will take a user to complete a task with a given interface. GOMS stands for Goals, Operators, Methods, and Selection rules — the four components used to describe user behavior. The simplest variant, the…
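A rough sketch of a Keystroke-Level Model estimate: task time is predicted by summing per-operator durations. The operator times below are commonly cited approximations from the KLM literature and should be treated as assumptions, not values taken from this glossary; they vary with user skill and device.

```python
# Approximate operator durations in seconds (assumed values for illustration).
KLM_SECONDS = {
    "K": 0.28,  # press a key (average typist)
    "P": 1.10,  # point at a target with a mouse
    "H": 0.40,  # move hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}


def klm_estimate(sequence, times=KLM_SECONDS):
    """Sum operator durations for a sequence such as 'MHPK'."""
    unknown = [op for op in sequence if op not in times]
    if unknown:
        raise ValueError(f"unknown operators: {unknown}")
    return sum(times[op] for op in sequence)


# Think, reach for the mouse, point at a field, press one key.
estimate = klm_estimate("MHPK")
```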
Gamified evaluation (also: Game-based assessment, Gamified testing): A research methodology that incorporates game design elements — such as challenges, scoring, progressive difficulty, and rewards — into the evaluation of technology or user performance, to increase participant engagement, motivation, and retention. In accessibility research,…
Heuristic evaluation (also: Expert review, Usability inspection): A usability and accessibility evaluation method where trained evaluators systematically assess an interface against a set of recognized principles or guidelines (heuristics) to identify potential problems. In accessibility contexts, heuristic evaluation applies principles from…
Intrinsic Motivation Inventory (also: IMI): A standardized psychometric instrument used to assess participants' subjective experience during activities, measuring dimensions such as interest/enjoyment, perceived competence, effort/importance, value/usefulness, and felt pressure/tension. The IMI is commonly used in HCI and…
NASA-TLX (also: NASA Task Load Index, Task Load Index, NASA TLX): A widely used subjective workload assessment tool developed by NASA that measures perceived workload across six dimensions: mental demand, physical demand, temporal demand, performance, effort, and frustration. In accessibility research, NASA-TLX is frequently used to evaluate…
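One common way to summarize the six dimensions is the unweighted "Raw TLX" score, the mean of the six subscale ratings. The sketch below assumes ratings on a 0 to 100 scale and uses hypothetical dictionary keys; the original weighted procedure with pairwise comparisons is not shown.

```python
DIMENSIONS = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def raw_tlx(ratings):
    """Unweighted mean of the six subscale ratings (assumed 0-100 each)."""
    missing = [d for d in DIMENSIONS if d not in ratings]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return sum(ratings[d] for d in DIMENSIONS) / len(DIMENSIONS)


# Hypothetical ratings from one participant.
score = raw_tlx({
    "mental": 60, "physical": 20, "temporal": 40,
    "performance": 30, "effort": 50, "frustration": 40,
})
```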
Participant pool bias (also: Sampling bias, Recruitment bias): Systematic distortion in research findings caused by the demographic characteristics and backgrounds of study participants, rather than by the technology or intervention being evaluated. In accessibility research, participant pool bias is especially consequential because…
Participatory Evaluation (also: PE): A research approach in which the people affected by a program, technology, or intervention are actively involved in evaluating it, rather than being passive subjects of assessment. In accessibility research, participatory evaluation means disabled people help define evaluation…
Perturbation testing (also: Counterfactual testing, Template-based testing): A bias evaluation methodology for NLP models that systematically substitutes identity-related terms (e.g., disability phrases) in otherwise identical sentences to measure whether the model produces different predictions based on the identity mention alone. By holding all other…
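The substitution idea can be sketched as follows: fill one template with a neutral phrase and with each identity phrase, then compare model scores. Everything here is hypothetical, including the template, the phrases, and `toy_score`, which merely stands in for a real model so the example runs on its own.

```python
def perturbation_gaps(template, baseline_phrase, identity_phrases, score):
    """Score difference between each perturbed sentence and the baseline sentence."""
    baseline = score(template.format(phrase=baseline_phrase))
    return {
        phrase: score(template.format(phrase=phrase)) - baseline
        for phrase in identity_phrases
    }


def toy_score(sentence):
    # Hypothetical stand-in for a model: deliberately biased against one word.
    return 0.2 if "blind" in sentence else 0.8


gaps = perturbation_gaps(
    "I am a {phrase}.",
    baseline_phrase="person",
    identity_phrases=["blind person", "tall person"],
    score=toy_score,
)
```

A gap near zero suggests the identity mention alone did not change the prediction; a large gap flags a phrase worth investigating.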
Psychometric validation (also: Psychometric evaluation, Instrument validation): The process of establishing that a measurement instrument (such as a questionnaire or scale) possesses adequate reliability (consistency of measurement), criterion validity (correlation with established measures), and construct validity (measuring the intended theoretical…
Response Bias (also: Acquiescence Bias, Yea-Saying Bias): A systematic tendency for research participants to respond in a particular way regardless of the actual content of the question, distorting data collection and analysis. In accessibility research involving people with intellectual disabilities, acquiescence bias — the tendency…
SUS (also: System Usability Scale): Abbreviation for System Usability Scale, a ten-item questionnaire developed by John Brooke in 1986 that produces a single usability score from 0 to 100 based on user ratings of agreement with statements about a system. SUS is widely used in accessibility and HCI research because…
Semantic distance (also: Semantic similarity, Word embedding distance): A computational measure of how different two words are in meaning, typically derived from word embedding models like word2vec that represent words as vectors in a high-dimensional space. In caption evaluation for DHH users, semantic distance between an ASR error and the intended…
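Cosine distance is one common way to compare embedding vectors, although the entry does not say which distance is used. The sketch below assumes that choice and uses tiny made-up vectors; real embeddings would come from a trained model and have hundreds of dimensions.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("zero vector has no direction")
    return 1 - dot / (norm_u * norm_v)


# Hypothetical 2-D vectors: same direction gives ~0, orthogonal gives 1.
same_direction = cosine_distance([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
```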
Single-case experimental design (also: SCED, N-of-1 design, ABAB design): A rigorous research methodology that evaluates intervention effects by systematically alternating between baseline and treatment conditions within individual participants, using each person as their own control. Common variants include AB, ABA, ABAB, and multiple-baseline…
Spearman correlation (also: Spearman rank correlation, Spearman's rho): A non-parametric statistical measure of the strength and direction of the monotonic relationship between two ranked variables, ranging from -1 to +1. In accessibility evaluation research, Spearman correlation is used to assess how well automated metrics (such as Word Error Rate…
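A dependency-free sketch of the computation: convert both variables to ranks (ties share their average rank), then take the Pearson correlation of the ranks. The helper names are illustrative; in practice a statistics library would normally be used.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the rank-transformed inputs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


# A monotonic but non-linear relationship still gives rho = 1.
rho = spearman_rho([1, 2, 3, 4], [1, 4, 9, 16])
```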
Summative Evaluation (also: Summative Usability Testing, Summative Assessment): Usability evaluation conducted on functional software or high-fidelity prototypes, typically later in the development process, to measure the effectiveness of specific design choices. Summative testing uses representative users performing representative tasks and often involves…
System Usability Scale (also: SUS): A widely used 10-item Likert scale questionnaire developed by John Brooke in 1986 and published in 1996 that provides a quick, reliable measure of perceived usability. Scores range from 0 to 100, with higher scores indicating better usability. The SUS has been validated across thousands of studies,…
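The 0 to 100 score comes from a fixed scoring rule, sketched here assuming the standard questionnaire layout: ten responses on a 1 to 5 scale, with odd-numbered items worded positively and even-numbered items worded negatively. The function name is illustrative.

```python
def sus_score(responses):
    """Convert ten 1-5 responses (item 1 first) into a 0-100 SUS score."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1   # positively worded items
        else:
            total += 5 - response   # negatively worded items
    return total * 2.5


# All-neutral answers land in the middle of the scale.
neutral = sus_score([3] * 10)
```

Because of the rescaling, a SUS score is not a percentage, even though it spans 0 to 100.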
Tactile Accuracy: Tactile accuracy is an evaluation criterion for measuring how well a person perceives the shape information of an object in a tactile image through touch. Unlike "naming accuracy" (whether someone can name the object), tactile accuracy captures whether the person has obtained…
Upper Baseline (also: Gold Standard, Reference Standard): In accessibility evaluation research, an upper baseline is a high-quality reference stimulus used to establish the best achievable performance against which other systems are compared. For sign language animation studies, this might be a video of a human signer or a carefully…
Video intelligibility (also: Signal intelligibility, Visual signal clarity): The degree to which a video signal can be perceived and understood by the viewer, determined by technical parameters including frame rate, bit rate, spatial resolution, and codec quality. Video intelligibility is distinct from comprehension — a viewer may perceive clear hand…
Wizard-of-Oz study (also: WoZ study, Wizard of Oz method): A research methodology in which participants interact with a system they believe is automated, but which is actually operated partially or fully by a hidden human operator (the "wizard"). This approach allows researchers to evaluate user experience, interface design, and…
Word error rate (also: WER): The standard metric for evaluating automatic speech recognition accuracy, calculated as the number of substitutions, deletions, and insertions divided by the total number of words in the reference transcript. Research with DHH users has shown that WER correlates poorly with…
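The calculation described above can be sketched with a word-level edit distance, which finds the minimum number of substitutions, deletions, and insertions. This version assumes whitespace tokenization and no text normalization (casing, punctuation), both of which real evaluations usually handle explicitly.

```python
def word_error_rate(reference, hypothesis):
    """Minimum word edits to turn the hypothesis into the reference, per reference word."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the ref prefix so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                         # deletion
                curr[j - 1] + 1,                     # insertion
                prev[j - 1] + (ref_word != hyp_word) # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of five reference words.
wer = word_error_rate("the kitchen area is clean", "the kitten area is clean")
```

Note that every error counts equally here, regardless of how much it harms understanding.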
Word importance (also: Lexical importance, Information content): A measure of how critical a specific word is to the overall meaning of a sentence, typically computed using neural language models that estimate how predictable a word is from its context. In captioning evaluation, word importance helps determine the impact of ASR errors:…

33 results.