SAMBA: a semi-automatic method for measuring barriers of accessibility

Giorgio Brajnik, Raffaella Lomuscio · 2007 · Proceedings of the 9th International ACM SIGACCESS Conference on Computers and Accessibility (Assets '07) · doi:10.1145/1296843.1296853

Summary

Brajnik and Lomuscio argue that web accessibility cannot be managed without being measured, yet existing accessibility metrics are underdeveloped and almost universally tied to conformance with WCAG checkpoints rather than to the real-world barriers end users encounter. The paper surveys the state of the art, including Sullivan and Matson's Failure Rate, Zeng's Web Accessibility Barrier Score (WAB), the Unified Web Evaluation Methodology (UWEM), Arrue et al.'s Web Accessibility Quantitative Metric (WAQM), and the Accessibility Internet Rally (AIR) scoring sheet. The authors identify four open issues these metrics fail to address: measuring accessibility beyond mere conformance, combining automated results with expert judgment, accounting for the tool's error rate, and producing disability-specific scores that reflect barrier severity.

In response, they propose SAMBA (Semi-Automatic Method for measuring Barriers of Accessibility), a three-phase methodology that combines an automated testing tool with expert review via Brajnik's Barrier Walkthrough method. Phase 1 runs an automated tool (LIFT in their experiment) and maps the detected checkpoint violations to a catalogue of more than 35 barrier types, each tagged with the disability groups it affects. Phase 2 draws a stratified random sample of potential barriers and asks human judges to rate each as false positive, minor, major, or critical within plausible user scenarios. Phase 3 combines these results with a barrier-density factor (barriers per line of HTML) to produce three indexes: a Raw Accessibility Index (AIr), an Unweighted Index (AIu), and a Weighted Accessibility Index (AIw) expressed as a confidence interval. The method was tested on roughly 1,500 pages across 15 websites, including university, news, and government sites.
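
The Phase 2 sampling step can be illustrated with a minimal sketch. It assumes the strata are barrier types and that the sample is allocated proportionally to stratum size; the summary above does not say how the paper defines its strata or allocation, so both are assumptions made here for illustration. The function name, the data shape, and the barrier-type labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(potential_barriers, sample_size, seed=0):
    """Draw a proportionally allocated stratified random sample.

    potential_barriers: list of (barrier_type, item_id) tuples.
    This shape is hypothetical, not the paper's data model.
    Returns a dict mapping each barrier type to its sampled item ids.
    """
    rng = random.Random(seed)  # seeded so the draw is reproducible

    # Group the potential barriers into strata (assumed: one per barrier type).
    strata = defaultdict(list)
    for barrier_type, item_id in potential_barriers:
        strata[barrier_type].append(item_id)

    total = len(potential_barriers)
    sample = {}
    for barrier_type, items in strata.items():
        # Proportional allocation, with at least one item per stratum
        # and never more items than the stratum contains.
        k = max(1, round(sample_size * len(items) / total))
        k = min(k, len(items))
        sample[barrier_type] = rng.sample(items, k)
    return sample


if __name__ == "__main__":
    # Hypothetical tool output: 700 potential barriers of three made-up types.
    flagged = (
        [("missing_alt_text", i) for i in range(400)]
        + [("low_contrast", i) for i in range(200)]
        + [("unlabelled_form_field", i) for i in range(100)]
    )
    for barrier_type, items in stratified_sample(flagged, 70).items():
        print(barrier_type, len(items))
```

Because of rounding and the one-item minimum, the total drawn can differ slightly from the requested sample size when strata are small or uneven. Each sampled item would then go to a judge for a false positive, minor, major, or critical rating.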

Key findings

Of the 1,050 potential barriers sampled from LIFT's output, 288 (roughly 27%) were false positives once reviewed by experts, underscoring why raw tool output should not be treated as a reliable accessibility score. Error rates varied sharply by disability group: up to 16% for 'no JavaScript' users versus roughly 5% for motor-disabled users on one site. The Weighted Accessibility Index produced narrower, higher-valued confidence intervals (roughly 0.62 to 0.96 across sites) because critical barriers carried a weight of 9 and major barriers a weight of 3; the tradeoff was reduced resolution between sites. AIw correlated strongly with the simpler AIr (Pearson's r = 0.945), suggesting that a linear regression could predict AIw from automated output alone for trend monitoring, with a maximum error of 7%. Correlation between SAMBA's indexes and WAQM was only moderate (Pearson's r between 0.43 and 0.47), indicating that conformance-based metrics and severity-based metrics genuinely measure different things. Each judge needed about 90 minutes per site to review a 70-item sample (about 22.5 hours per judge across all 15 sites), a cost that keeps the method practical for periodic audits. Sensitivity analysis showed AIw is robust to small perturbations in judgments, site size, and severity weights. The authors note SAMBA still does not address false negatives (barriers the tool misses).
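
To make the headline figure concrete, the sketch below recomputes the false-positive proportion from the two counts reported above (288 of 1,050) and attaches a 95% interval using the textbook normal approximation. The interval is an illustration added here; it is not a figure from the paper, and it is not the procedure SAMBA uses to build the AIw confidence interval. The function name is hypothetical.

```python
import math


def proportion_with_ci(count, n, z=1.96):
    """Return (proportion, lower, upper) using the normal approximation.

    z=1.96 corresponds to a two-sided 95% interval. This is a standard
    approximation chosen for illustration, not the paper's method.
    """
    p = count / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


if __name__ == "__main__":
    # Counts taken from the findings above: 288 false positives in 1,050 items.
    rate, low, high = proportion_with_ci(288, 1050)
    print(f"{rate:.3f} ({low:.3f}, {high:.3f})")  # prints 0.274 (0.247, 0.301)
```

Even at the lower end of this interval, about a quarter of the sampled tool findings would be false positives, which is consistent with the point that raw tool counts overstate the number of real barriers.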

Relevance

For accessibility practitioners, this paper remains one of the clearest explanations of why conformance counts (for example, 'X WCAG violations found') are an impoverished proxy for real accessibility. The separation of sampling, investigation, and metric into distinct processes is still a useful frame when designing audit programmes or monitoring dashboards. The demonstrated 27% false-positive rate on an automated tool is a salutary reminder that automated scanners should never be used as the sole evidence of accessibility, a point practitioners still have to argue in procurement and compliance contexts. The disability-specific severity matrix anticipates modern persona-based reporting and ATAG/WCAG-EM practices. Its limitations are that SAMBA depends on the quality of the tool-to-barrier mapping table, ignores false negatives, has never been validated against sites of known accessibility, and uses WCAG 1.0-era tooling. The methodology, however, translates readily to modern tools (axe, WAVE, Lighthouse) and to WCAG 2.x.

Tags: web accessibility · accessibility metrics · accessibility evaluation · accessibility testing · conformance testing · barrier walkthrough · expert review · automated testing · quality assurance

Standards referenced: WCAG 1.0 · WCAG 2.0 · Section 508 · UWEM 1.0