Glossary

Terms used in accessibility research and practice. Each entry has a definition, common aliases, and category tags.

Search results

AI Auditing(also: Algorithmic Auditing, AI Audit): The systematic evaluation of an AI system's outputs, behaviour, or training data to identify harms such as bias, stereotype reproduction, or accessibility failures. Audits may be conducted by industry professionals, external researchers, regulators, or end users, and are…
Accessibility Evaluation Method(also: AEM, Accessibility Testing Method): A structured approach or procedure used to assess the accessibility of digital products, websites, or applications. Accessibility evaluation methods include conformance review (checking against standards like WCAG), barrier walkthrough (assessing barriers in the context of specific…
Accessibility Heuristics(also: Accessibility Heuristic Evaluation): A set of broad usability and accessibility principles used to evaluate digital products for barriers that may prevent people with disabilities from using them effectively. Unlike detailed technical checklists such as WCAG success criteria, accessibility heuristics provide…
Accessibility Inspection(also: Accessibility Inspection Method, Accessibility Audit): An evaluation approach in which an expert or designer reviews an interface against a set of accessibility criteria without recruiting end users, analogous to usability inspection methods such as heuristic evaluation, cognitive walkthrough, or guideline review. Common inspection…
Accessibility Metric(also: Web Accessibility Metric, Accessibility Score): A quantitative measure used to assess and compare the accessibility quality of web pages or websites. Accessibility metrics typically calculate a score (often 0-100%) based on the number and severity of WCAG violations found, weighted by conformance level (Level A weighted…
Accessibility Persona(also: Disability Persona, Inclusive Persona): A detailed, realistic description of a hypothetical user with specific disabilities, assistive technology configurations, and usage contexts, used during design and evaluation to help teams consider accessibility requirements from the perspective of real people. Accessibility…
Accessibility-in-Use(also: Accessibility in Use): A concept describing how well accessibility metrics predict the actual effects that real accessibility problems will have on the quality of interaction as perceived by real users when interacting with real pages for achieving real goals. Unlike traditional conformance testing…
BLEU Score(also: BiLingual Evaluation Understudy, BLEU): A metric for evaluating the quality of machine-generated text by comparing it to one or more reference (human-written) translations. BLEU calculates precision by counting how many n-grams (sequences of words) in the predicted text match n-grams in the reference text, with BLEU-1…
Back-Translation(also: Reverse Translation): A quality assurance method used in survey and instrument translation where a translated version is independently translated back into the original language by a different translator. The back-translated text is then compared with the original to identify meaning losses or…
Barrier Walkthrough(also: BW Method): The Barrier Walkthrough is a structured expert evaluation method for assessing web accessibility in which evaluators systematically examine a website against a predefined set of accessibility barriers rather than individual guideline success criteria. Unlike conformance-based…
Barrier Walkthrough(also: Structured Walkthrough): A systematic accessibility evaluation method that guides evaluators through the process of identifying barriers that people with disabilities may encounter on a website. Unlike a standard conformance review that checks against all WCAG success criteria, a barrier walkthrough…
Between-Subjects Design(also: Between-Groups Design, Independent-Groups Design): A between-subjects design is an experimental research design in which each participant is assigned to only one condition, and the conditions are compared across different groups of people. It contrasts with within-subjects (repeated-measures) designs, in which every participant…
Blurred Vision Simulation(also: Vision Simulation, Low Vision Simulation): A technique used in accessibility evaluation where evaluators simulate the visual experience of people with reduced visual acuity by artificially blurring their view of a website or application. Methods include using low vision simulation glasses (commercially available from…
Conformance Testing(also: Compliance Testing, Guideline Review): An accessibility evaluation method that checks whether a website or digital product meets the requirements specified by accessibility guidelines or standards such as WCAG. Conformance testing can be performed manually by human evaluators or through automated testing tools that…
Cooperative Evaluation(also: Cooperative Usability Evaluation, Modified Think-Aloud): A usability evaluation method in which the researcher and participant work together as collaborators rather than following a strict observer-subject protocol. Unlike standard controlled experiments, cooperative evaluation allows participants to think aloud, ask questions, and…
Correctness(also: Precision, Validity): In the context of accessibility evaluation, correctness (also called precision) is the proportion of reported accessibility problems that are true problems — that is, issues that genuinely affect users with disabilities rather than false positives. A high correctness rate means…
Counterfactual Explanation(also: Counterfactual XAI): An explanation technique that communicates what minimal change to the input would have produced a different output from an AI model, for example 'if the applicant's income had been $5,000 higher, the loan would have been approved'. Counterfactual explanations are legally…
Criterion Validity: A psychometric property indicating whether an instrument's scores relate to some external measurable criterion. In practice, this is assessed by comparing the instrument's results with scores from another established measurement tool administered concurrently. For example, when…
Critical Incident Questionnaire(also: CIQ): A short, open-ended reflective tool developed by Stephen Brookfield for teaching and learning contexts, typically consisting of five questions asking participants to recall moments from a recent experience that were most engaging, surprising, confusing, distancing, or affirming.…
Cursor Deviation(also: Cursor Drift, Path Deviation): The difference between the actual path taken by a cursor and the ideal straight-line path between the starting point and the target. Cursor deviation is a key performance metric in evaluating alternative input devices such as head controls, eye trackers, and adapted mice. Higher…
Disability Simulation(also: Disability Simulator, Empathy Exercise): A method of accessibility evaluation where non-disabled people experience approximations of the barriers faced by people with disabilities when using technology or navigating environments. In web accessibility, disability simulation tools like IBM's aDesigner visualize how…
Disability-Centered Evaluation(also: Disability-Centric Evaluation, Disability-First Evaluation): An approach to evaluating AI systems, tools, or research artefacts that places disabled people's lived experiences, information needs, and failure contexts at the centre of study design — including which data are collected, how ground truth is annotated, which models are tested,…
Discriminative Ability(also: Discriminative ability of a metric, Discriminability): In accessibility research methodology, the property of an evaluation metric to reveal statistically significant differences between stimuli that are known to differ along the dimension being measured. For example, a comprehension-question metric has discriminative ability for…
End-User Auditing(also: User-Led Auditing, End User Audits): An approach to AI auditing in which everyday users — rather than professional evaluators — identify problems, biases, or harms in AI outputs based on their lived experience. End-user auditing is particularly valuable for surfacing harms against minoritised communities (including…
Equal Error Rate(also: EER, Crossover Error Rate): A metric used to evaluate biometric system performance, representing the point at which the false acceptance rate (wrongly accepting unauthorized users) equals the false rejection rate (wrongly rejecting authorized users). Lower EER values indicate better system accuracy. In…
Evaluation Reliability(also: Inter-rater Reliability, Evaluator Agreement): The extent to which independent accessibility evaluations of the same content produce consistent results. High reliability means that different evaluators using the same method will identify similar sets of accessibility problems, while low reliability indicates that results…
Evaluator Effect: The phenomenon in accessibility and usability evaluation where different evaluators examining the same interface detect different sets of problems and may reach different conclusions about the same issues. The evaluator effect means that no single evaluation can achieve 100%…
F-measure(also: F-score, F1 Score): A metric that combines correctness (precision) and sensitivity (recall) into a single balanced score, calculated as the harmonic mean of the two values. In accessibility evaluation research, the F-measure provides a single number representing the overall effectiveness of an…
FaceReader(also: Noldus FaceReader): Commercial facial-expression recognition software (developed by Noldus) that uses computer vision and deep learning to automatically classify facial expressions into basic emotions (neutral, happy, sad, angry, surprised, scared, disgusted) and to estimate emotional valence and arousal in…
Feasibility Study(also: Feasibility Trial, Pilot Study): A feasibility study is a small-scale investigation conducted before a full-scale trial to determine whether a planned intervention or system can be delivered as intended in its real-world setting. Feasibility work asks practical questions — Can we recruit? Can participants…
Friedman Test(also: Friedman Rank Test): The Friedman test is a non-parametric statistical test used to detect differences across three or more related samples, for example the same participants rating three interface conditions. It ranks each participant's responses across conditions and tests whether the rank sums…
Gold-Standard Evaluation(also: Gold Standard, Reference Standard Evaluation): An evaluation methodology in natural language processing and generation where system output is compared against a set of pre-established correct or ideal responses. In text-based systems, gold-standard strings are human-produced reference outputs that serve as benchmarks.…
Haptic Experience Model(also: HX Model, HX): A framework proposed by Kim and Schneider for evaluating user experience with haptic technologies along five perceptual-experiential dimensions: autotelics (the pleasantness of the sensation), realism (fidelity to the depicted phenomenon), harmony (fit with accompanying…
Heuristic Evaluation(also: Expert Review, Heuristic Review): An accessibility or usability evaluation method in which evaluators examine an interface against a set of recognised principles (heuristics) to identify potential problems. In web accessibility, heuristic evaluation typically involves checking pages against WCAG success criteria…
Heuristic Walkthrough(also: Heuristic walk-through): A usability evaluation method proposed by Andrew Sears (1997) that combines scenario-based cognitive walkthrough with heuristic evaluation. Evaluators work through realistic user tasks using a prioritised list of heuristics, surfacing both task-specific and general usability…
Index of Difficulty(also: ID, Fitts ID): The Index of Difficulty (ID) is the central quantity in Fitts' law that captures how hard a rapid aimed pointing movement is, computed as log₂(A/W + 1) in the Shannon formulation, where A is the amplitude (distance to the target) and W is the target width along the movement…
Internal Reliability(also: Internal Consistency): A psychometric property that measures whether all items in a questionnaire or instrument contribute consistently to the overall score. It is commonly assessed using Cronbach's alpha, where values of 0.7 and above are generally considered acceptable. In accessibility research,…
Interpersonal Reactivity Index(also: IRI): A widely used multidimensional self-report measure of empathy developed by Mark H. Davis in 1980. The instrument contains four seven-item subscales: perspective taking (the tendency to adopt another's point of view), empathic concern (feelings of warmth and compassion for…
Keystroke-Level Model(also: KLM): A simplified predictive model from human-computer interaction research, originally developed by Card, Moran, and Newell, that estimates task completion time by decomposing user interactions into elementary operations such as keystrokes, pointing movements, mouse clicks, and…
LIME(also: Local Interpretable Model-agnostic Explanations): An explainable AI technique, introduced by Ribeiro et al. in 2016, that approximates any black-box model's behaviour around a single prediction by fitting a simple interpretable model (usually sparse linear regression) to perturbed versions of the input. The resulting feature…
LLM Self-Reflection(also: AI Self-Assessment, Model Self-Evaluation): A technique in which a large language model is prompted to evaluate and critique its own output, identifying errors, gaps, or areas for improvement. In the context of accessibility, LLM self-reflection involves asking the model to assess whether the code or UI it generated meets…
LLM-as-Judge(also: LLM as a Judge, Model-as-Judge): An evaluation methodology in which a large language model is prompted to assess the quality of some artifact — generated text, code, a UI, or a response from another model — according to a structured rubric. LLM-as-judge is attractive because it scales automated evaluation to…
Literacy Bias(also: Literacy bias of a metric): In accessibility research methodology, a literacy bias describes the phenomenon where an evaluation metric systematically produces different scores for participants with different reading-literacy levels, independent of the characteristic being measured. For example,…
Manual Accessibility Testing(also: Manual Testing, Manual Evaluation, Human Testing): The process of evaluating web content accessibility through direct human inspection rather than automated tools. Manual testing is essential because many WCAG success criteria cannot be fully evaluated by automated means — they require human judgment about whether content is…
N-back Task(also: N-back, 2-back Task): A working-memory paradigm in which participants view or hear a sequence of stimuli (letters, digits, positions) and, on each trial, respond when the current stimulus matches the one presented N steps earlier. Higher N levels place greater load on working memory and executive…
NGOMSL(also: Natural GOMS Language): A structured notation for writing GOMS (Goals, Operators, Methods, Selection rules) models in a program-like form that is readable by humans. NGOMSL was developed by David Kieras as a more formal variant of GOMS that includes selection rules and allows operators at the keystroke…
OPTIMAL-EM(also: Optimised Evaluation Methodology): A web accessibility evaluation methodology proposed by Hambley, Yesilada, Vigo, and Harper to complement the W3C's WCAG-EM by providing a statistically grounded, complexity-driven method for selecting representative pages from a large website. OPTIMAL-EM comprises six metrics —…
Personalized Accessibility(also: Personalized Web Accessibility, User-Tailored Accessibility): An approach to accessibility evaluation and design that considers the specific disability profile, capabilities, and needs of individual users rather than treating accessibility as a single universal property. Personalized accessibility evaluation tools filter WCAG success…
Pluralistic Walkthrough(also: Pluralistic Usability Walkthrough): A group usability inspection method, introduced by Randolph Bias in 1994, in which users, developers, and usability specialists step through a task scenario together, each writing down the actions they would take at every screen before discussing as a group. It extends the…
PrEmo(also: Product Emotion Measurement Instrument): A non-verbal self-report tool for measuring emotional responses, developed by Pieter Desmet. PrEmo presents users with 14 cartoon-like icons representing seven positive emotions (joy, admiration, pride, hope, satisfaction, fascination, desire) and seven negative emotions…
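
Worked example: several of the quantitative entries above reduce to short formulas. The sketch below illustrates three of them: correctness (precision), the F-measure as the harmonic mean of correctness and sensitivity, and the Shannon formulation of the Index of Difficulty, log₂(A/W + 1). The function names, the sensitivity helper, and the sample numbers are illustrative assumptions and do not come from any particular study or tool.

```python
import math


def correctness(true_reported: int, total_reported: int) -> float:
    """Share of reported problems that are true problems (precision)."""
    if total_reported <= 0:
        raise ValueError("total_reported must be positive")
    return true_reported / total_reported


def sensitivity(true_reported: int, total_true: int) -> float:
    """Share of all true problems that were reported (recall).

    Assumed helper: the glossary names sensitivity only inside the
    F-measure entry.
    """
    if total_true <= 0:
        raise ValueError("total_true must be positive")
    return true_reported / total_true


def f_measure(c: float, s: float) -> float:
    """Harmonic mean of correctness c and sensitivity s."""
    if c + s == 0:
        return 0.0
    return 2 * c * s / (c + s)


def index_of_difficulty(amplitude: float, width: float) -> float:
    """Fitts' law ID in bits, Shannon formulation: log2(A / W + 1)."""
    if amplitude < 0 or width <= 0:
        raise ValueError("amplitude must be >= 0 and width must be > 0")
    return math.log2(amplitude / width + 1)


# Hypothetical evaluation: 20 problems reported, 15 of them true,
# out of 30 true problems that exist on the page.
c = correctness(15, 20)   # 0.75
s = sensitivity(15, 30)   # 0.5
f = f_measure(c, s)       # 0.6

# Hypothetical pointing task: target 224 px away and 32 px wide.
bits = index_of_difficulty(224, 32)   # 3.0
```

The harmonic mean pulls the F-measure toward the lower of the two inputs, so a method that reports many false positives or misses many true problems cannot score well on the combined value.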
