The Transparency of Automatic Accessibility Evaluation Tools

Parvaneh Parvin, Vanessa Palumbo, Marco Manca, Fabio Paternò · 2021 · Proceedings of the 18th International Web for All Conference (W4A '21) · doi:10.1145/3430263.3452436

Summary

This paper examines a critical but often overlooked problem in automated accessibility testing: the lack of transparency in how evaluation tools operate and present their results. The authors observe that different accessibility evaluation tools frequently produce different results when applied to the same web content, which confuses users and undermines trust in automated testing. The paper proposes a set of transparency criteria that tools should meet, derived from the authors' extensive experience and preliminary user testing. These criteria cover four key areas: which standards, success criteria, and techniques a tool considers; how it categorizes accessibility issues; how reported information is organized (including whether overall accessibility metrics are provided and whether different report formats serve different user types); and whether the tool can evaluate dynamic web pages. Using these criteria, the authors analyse four representative free tools (MAUVE++, WAVE, AChecker, and QualWeb), comparing their features, capabilities, and how transparently they communicate their functioning and limitations to users. The analysis reveals significant variation in how these tools handle each criterion, with no single tool fully satisfying all of them.

Key findings

The comparative analysis reveals notable gaps across all four tools. Regarding standards coverage, MAUVE++ supports the most techniques (107 HTML, 8 CSS for WCAG 2.1), while WAVE supports only 23 HTML/CSS criteria and AChecker does not disclose which specific criteria it covers. Issue categorization varies significantly: QualWeb uses the W3C-recommended classification (passed, failed, warning, not applicable), while AChecker uses its own terminology (known, likely, and potential problems) and WAVE reports only errors and alerts. Only MAUVE++ provides quantitative accessibility metrics (an Accessibility Percentage and an Evaluation Completeness score), giving users context for interpreting results. For result presentation, MAUVE++ offers dual views for developers and non-technical users, WAVE overlays icons directly on the evaluated page (which can be confusing when page elements use absolute positioning), AChecker provides only code-oriented reports, and QualWeb blends code views with visual previews. Critically, AChecker cannot evaluate dynamic web pages at all, while the others support this through browser extensions or server-side rendering. None of the tools fully satisfies all transparency criteria.
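
As an illustration only, the findings above can be recorded as a small comparison structure. This is a minimal sketch, not something the paper provides: the class name ToolProfile, its field names, and the helpers gaps and unknowns are hypothetical, and a value of None means this summary does not state the fact for that tool.

```python
from dataclasses import dataclass, fields
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToolProfile:
    """Hypothetical record of how one tool fares on the criteria summarized above."""
    name: str
    discloses_coverage: Optional[bool]            # states which techniques/criteria it checks
    issue_categories: Optional[Tuple[str, ...]]   # labels used in its reports
    provides_metrics: Optional[bool]              # overall quantitative accessibility metrics
    supports_dynamic_pages: Optional[bool]        # can evaluate dynamic web pages


# Values transcribed from the key findings; None = not stated in this summary.
PROFILES = [
    ToolProfile("MAUVE++", True, None, True, True),
    ToolProfile("WAVE", True, ("error", "alert"), False, True),
    ToolProfile("AChecker", False, ("known", "likely", "potential"), False, False),
    ToolProfile("QualWeb", None,
                ("passed", "failed", "warning", "not applicable"), False, True),
]


def gaps(profile: ToolProfile) -> list:
    """Names of criteria explicitly recorded as not met (value is False)."""
    return [f.name for f in fields(profile) if getattr(profile, f.name) is False]


def unknowns(profile: ToolProfile) -> list:
    """Names of criteria this summary gives no information about (value is None)."""
    return [f.name for f in fields(profile) if getattr(profile, f.name) is None]


if __name__ == "__main__":
    for p in PROFILES:
        print(p.name, "gaps:", gaps(p), "unknown:", unknowns(p))
```

Keeping "not met" separate from "not stated" avoids reading a gap in this summary as a gap in the tool itself.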

Relevance

For accessibility practitioners, this paper provides a practical framework for evaluating and selecting automated testing tools with greater awareness of their limitations. The proposed transparency criteria (standards coverage disclosure, issue categorization clarity, result presentation quality, and dynamic page support) serve as a useful checklist when choosing or comparing tools for organizational use. The findings reinforce a well-known but often underappreciated point: automated tools alone cannot provide a complete accessibility assessment, and the variability between tools means that relying on a single tool gives an incomplete picture. Practitioners should understand what each tool covers and, equally important, what it does not. The paper also highlights the need for tool developers to design more user-centred evaluation experiences, particularly by providing multiple report formats for different audiences and being explicit about the scope and limitations of their analysis.

Tags: automated testing · accessibility evaluation · WCAG · transparency · web accessibility

Standards referenced: WCAG 2.0 · WCAG 2.1 · Section 508 · EARL