Transition of Accessibility Evaluation Tools to New Standards

Amaia Aizpurua, Myriam Arrue, Markel Vigo, Julio Abascal · 2009 · Proceedings of the 2009 International Cross-Disciplinary Conference on Web Accessibility (W4A) · doi:10.1145/1535654.1535662

Summary

This paper addresses a practical challenge that the web accessibility community faced after the release of WCAG 2.0 in December 2008: how to update the large ecosystem of automated evaluation tools built around WCAG 1.0. The authors present EvalAccess, a flexible evaluation framework built on three modules: the Guidelines Definition Manager (GDM), which creates and manages guideline sets; the Guidelines Pre-Processor (GPP), which transforms guidelines into executable evaluation queries; and the Guidelines Evaluation Module (GEM), which performs the evaluation and generates reports. Guidelines are written in the Unified Guidelines Language (UGL), an XML-based specification language that defines them independently of the evaluation engine. Each test case is matched to a predefined pattern and translated into an XQuery template, which is then executed against the web page source after it has been converted to XML. The key innovation is the separation of guideline definitions from evaluation logic, which makes it possible to incorporate new or updated guidelines without recoding the evaluation engine. The authors analyze WCAG 1.0 and WCAG 2.0 in detail, cataloging the checkpoints or success criteria, techniques, and test cases in each and classifying tests as automatic, semi-automatic, or manual (generic problems requiring human judgment).
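The separation of guideline definitions from evaluation logic can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper uses UGL and XQuery, whereas this sketch holds rules in a plain list of dictionaries and walks the page with the standard library's ElementTree. The field names (id, element, required_attribute) and the evaluate function are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical guideline definitions held as data, not code.
# These field names are illustrative and are NOT the UGL schema.
GUIDELINES = [
    {"id": "img-alt", "element": "img", "required_attribute": "alt"},
    {"id": "frame-title", "element": "frame", "required_attribute": "title"},
]


def evaluate(page_xml, guidelines):
    """Generic engine: report every element that lacks its required attribute.

    The engine knows nothing about specific guidelines; it only interprets
    the rule records it is given.
    """
    root = ET.fromstring(page_xml)
    failures = []
    for rule in guidelines:
        for node in root.iter(rule["element"]):
            if rule["required_attribute"] not in node.attrib:
                failures.append({"rule": rule["id"], "element": node.tag})
    return failures


# A page already converted to well-formed XML, as the framework assumes.
PAGE = """<html><body>
<img src="logo.png"/>
<img src="chart.png" alt="Quarterly sales chart"/>
</body></html>"""

report = evaluate(PAGE, GUIDELINES)
print(report)  # -> [{'rule': 'img-alt', 'element': 'img'}]
```

Under these assumptions, supporting a new guideline of the same shape means appending a record to GUIDELINES; the evaluate function stays unchanged.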

Key findings

Using its existing guideline management interface, the framework could express 55% of the automatic tests and 16% of the semi-automatic tests defined in WCAG 2.0. To close these gaps, the authors identified five new test case patterns needed for WCAG 2.0 requirements, such as verifying element-attribute relationships across sibling elements, checking nesting requirements, and validating OR conditions across element types. For WCAG 1.0, the framework covered 234 test cases across 65 checkpoints: 144 automatic and 90 semi-automatic. WCAG 2.0 defined 138 test cases across 61 success criteria; after duplicates across success criteria were removed, 73 of them could be evaluated automatically or semi-automatically (22 automatic, 51 semi-automatic). The analysis showed that WCAG 2.0 test cases are more complex than those of WCAG 1.0: they require verifying relationships between multiple HTML elements and attributes rather than simple presence or absence checks.
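To make the difference between a presence check and a relationship check concrete, here is a small Python sketch of two such checks. Both examples are assumptions chosen for illustration and are not taken from the paper's pattern catalogue: an OR condition (an input passes if it has a title attribute or a label that points at its id) and a nesting requirement (a legend must be a direct child of a fieldset). The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def inputs_without_name(root):
    """OR condition across elements: an <input> passes if it has a title
    attribute OR some <label> whose 'for' value equals the input's id."""
    label_targets = {lab.get("for") for lab in root.iter("label") if lab.get("for")}
    failing = []
    for field in root.iter("input"):
        has_title = bool(field.get("title"))
        has_label = field.get("id") in label_targets
        if not (has_title or has_label):
            failing.append(field.get("name"))
    return failing


def legends_outside_fieldset(root):
    """Nesting requirement: every <legend> must sit directly in a <fieldset>."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    stray = []
    for legend in root.iter("legend"):
        parent = parent_of.get(legend)
        if parent is None or parent.tag != "fieldset":
            stray.append(legend.text)
    return stray


FORM = """<form>
<label for="email">Email</label><input id="email" name="email"/>
<input name="phone" title="Phone number"/>
<input name="fax"/>
<div><legend>Stray legend</legend></div>
<fieldset><legend>Contact</legend></fieldset>
</form>"""

root = ET.fromstring(FORM)
print(inputs_without_name(root))       # -> ['fax']
print(legends_outside_fieldset(root))  # -> ['Stray legend']
```

Neither check can be decided by looking at one element in isolation, which is the kind of added complexity the summary attributes to WCAG 2.0 test cases.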

Relevance

This paper captures an important moment in web accessibility history, the transition from WCAG 1.0 to 2.0, and the engineering challenges it created for tool developers. The core insight remains relevant today as the community faces similar transitions with WCAG 2.1, 2.2, and the forthcoming WCAG 3.0: automated tools need flexible architectures that can adapt to evolving standards without complete rewrites. Two findings underscore a persistent reality in accessibility testing. Only 22 WCAG 2.0 test cases were classified as fully automatic, and without new patterns the existing framework could express just 55% of the automatic tests and 16% of the semi-automatic ones. Automated tools are therefore necessary but insufficient, and human judgment remains essential for comprehensive evaluation. For organizations managing accessibility testing programs, the paper reinforces the importance of selecting tools with flexible guideline management and of not relying solely on automated results for compliance claims.

Tags: automated testing · web accessibility · WCAG transition · evaluation tools · accessibility guidelines

Standards referenced: WCAG 1.0 · WCAG 2.0 · Section 508