Benchmark dataset

Also known as: Evaluation dataset, Test benchmark

A standardized dataset used to evaluate and compare the performance of AI models, algorithms, or systems against established baselines. In accessibility, the absence of benchmark datasets that include people with disabilities means that disparate performance across disability subgroups goes undetected, because aggregate accuracy metrics hide failures for underrepresented populations. Creating inclusive benchmarks requires balancing data collection from disabled users against privacy and ethical concerns.
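
The point about aggregate metrics can be illustrated with a minimal sketch in Python. The function name `accuracy_by_group`, the group labels, and the counts below are all hypothetical and synthetic, chosen only to show how a high overall score can coexist with poor performance on a small subgroup. They are not drawn from any real benchmark.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return (overall_accuracy, {group: accuracy}) for an iterable of
    (group, true_label, predicted_label) tuples."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, y_true, y_pred in records:
        totals[group] += 1
        correct[group] += int(y_true == y_pred)
    if not totals:
        raise ValueError("records must contain at least one example")
    overall = sum(correct.values()) / sum(totals.values())
    per_group = {g: correct[g] / totals[g] for g in totals}
    return overall, per_group


# Synthetic, illustrative data (an assumption, not a real benchmark):
# 95 examples from a well-represented group, 94 of them predicted correctly,
# and 5 examples from an underrepresented group, 1 of them predicted correctly.
records = (
    [("well_represented", 1, 1)] * 94
    + [("well_represented", 1, 0)] * 1
    + [("underrepresented", 1, 1)] * 1
    + [("underrepresented", 1, 0)] * 4
)

overall, per_group = accuracy_by_group(records)
print(f"overall: {overall:.2f}")
for group, acc in per_group.items():
    print(f"{group}: {acc:.2f}")
```

In this synthetic case the overall accuracy is 95 of 100, while the underrepresented group is correct on only 1 of 5 examples. Reporting only the aggregate figure would conceal that gap, and a benchmark containing no examples from the subgroup could not surface it at all.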

Category: artificial intelligence · data science · evaluation

Related: Algorithmic bias · Datasheets for datasets · Differential privacy

Sources

https://doi.org/10.1145/3386296.3386298