Web not for all: a large scale study of web accessibility

Rui Lopes, Daniel Gomes, Luís Carriço · 2010 · Proceedings of the 2010 International Cross Disciplinary Conference on Web Accessibility (W4A) · doi:10.1145/1805986.1806001

Summary

This paper presents the first large-scale automated accessibility evaluation of nearly 30 million web pages from the Portuguese Web Archive. The researchers implemented 39 WCAG 1.0 checkpoints (priorities 1 and 2) based on the Unified Web Evaluation Methodology (UWEM) and applied them to a collection crawled primarily from the .pt domain. Each checkpoint evaluation classified HTML elements as PASS (compliance verified), FAIL (non-compliance verified), or WARN (compliance impossible to verify automatically). To account for the ambiguity of warnings, the study defined three metrics: a conservative rate (warnings treated as failures), an optimistic rate (warnings treated as passes), and a strict rate (warnings excluded entirely). The evaluation processed over 40 billion HTML elements across the collection, an average of approximately 1,451 per page, of which only about 56 (3.89%) passed, 103 (7.15%) failed, and 1,291 (89%) triggered warnings. The dominance of warnings highlights a fundamental limitation of automated accessibility testing: most checks cannot be resolved without human judgment.
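The three metrics can be illustrated with a minimal sketch. The formulas below are an assumed reading of the definitions in this summary, not the paper's published equations: the conservative rate is taken as PASS over all outcomes, the optimistic rate as PASS plus WARN over all outcomes, and the strict rate as PASS over PASS plus FAIL. The names `Rates` and `accessibility_rates` are hypothetical. The example applies the formulas to the per-page averages quoted above, which is not the same as averaging per-page rates as the study did, so the printed figures are only indicative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Three views of the same PASS/FAIL/WARN counts (hypothetical container)."""
    conservative: float  # warnings counted as failures
    optimistic: float    # warnings counted as passes
    strict: float        # warnings ignored


def accessibility_rates(passed: int, failed: int, warned: int) -> Rates:
    """Compute the three rates from element counts.

    Assumed formulas (inferred from the summary, not quoted from the paper):
      conservative = PASS / (PASS + FAIL + WARN)
      optimistic   = (PASS + WARN) / (PASS + FAIL + WARN)
      strict       = PASS / (PASS + FAIL)
    """
    total = passed + failed + warned
    if total == 0:
        raise ValueError("no evaluated elements")
    decided = passed + failed
    return Rates(
        conservative=passed / total,
        optimistic=(passed + warned) / total,
        # With only warnings there is nothing to judge strictly.
        strict=passed / decided if decided else float("nan"),
    )


if __name__ == "__main__":
    # Per-page averages quoted in the summary: 56 PASS, 103 FAIL, 1,291 WARN.
    r = accessibility_rates(passed=56, failed=103, warned=1291)
    print(f"conservative={r.conservative:.1%} "
          f"optimistic={r.optimistic:.1%} strict={r.strict:.1%}")
    # prints: conservative=3.9% optimistic=92.9% strict=35.2%
```

The gap between the conservative and optimistic figures comes entirely from how the WARN count is assigned, which is why the choice of metric changes the apparent state of accessibility so sharply.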

Key findings

The study revealed starkly different pictures of web accessibility depending on how warnings are interpreted. Under the conservative metric, accessibility quality shows exponential decay starting around 5% compliance, with virtually no pages reaching high levels. Under the optimistic metric, quality rises rapidly, with a mean around 90%, which reflects how non-expert developers might perceive their sites' accessibility when dismissing warnings. The strict rate (passes and failures only, warnings excluded) shows a near-constant distribution, meaning critical accessibility problems are encountered with roughly equal probability regardless of quality level. A key finding is the correlation between page complexity (number of HTML elements) and accessibility: simpler pages consistently had better accessibility scores, and no page in the entire collection had both a low node count and poor accessibility. The authors hypothesize that this occurs because simpler pages leave less margin for error and are more manageable. This supports a practical recommendation that developers keep page structure simple to make accessibility compliance more achievable.

Relevance

This study remains one of the largest automated accessibility evaluations ever conducted, and its findings continue to resonate. The core insight, that the perceived state of web accessibility varies dramatically depending on how evaluation warnings are interpreted, exposes a persistent problem in the field. Non-expert developers who run automated checkers and see mostly warnings (rather than clear failures) may wrongly conclude their sites are reasonably accessible. The correlation between page complexity and accessibility quality provides an evidence-based argument for simplicity in web design that practitioners can cite when advocating for cleaner markup. The three-metric framework (conservative, optimistic, strict) offers a valuable methodology for accessibility research that accounts for the inherent ambiguity of automated testing. For organizations planning accessibility remediation, the finding that simpler pages are more accessible suggests that reducing page complexity may be as effective as targeted accessibility fixes, although the relationship reported is a correlation rather than a demonstrated cause. The study also demonstrates the research value of web archives for longitudinal accessibility monitoring, an approach that has since been adopted by other national web archive initiatives.

Tags: automated testing · accessibility evaluation · WCAG compliance · web science · accessibility metrics · large-scale study

Standards referenced: WCAG 1.0 · UWEM