A Macroscopic Web Accessibility Evaluation at Different Processing Phases

Nádia Fernandes, Luís Carriço · 2012 · Proceedings of the International Cross-Disciplinary Conference on Web Accessibility (W4A) · doi:10.1145/2207016.2207025

Summary

This paper investigates a fundamental question about automated accessibility evaluation: does it matter whether you evaluate a web page as it arrives from the server (before browser processing) or as the user actually experiences it (after browser processing, including JavaScript execution and DOM manipulation)? Most automated accessibility tools evaluate the raw HTML returned by the initial HTTP request, but modern web pages are heavily modified by browser-side scripting before users interact with them. The authors evaluated over 24,000 web pages from the Portuguese Web Archive (2008 collection) using the QualWeb evaluation framework with PhantomJS (a headless WebKit browser) to simulate both processing phases. They applied 18 HTML WCAG 2.0 techniques corresponding to 15 success criteria, measuring results using three established metrics: conservative rate (warnings treated as failures), optimistic rate (warnings treated as passes), and strict rate (warnings excluded from calculation). The use of PhantomJS solved earlier methodological problems by ensuring the same page was evaluated at both phases without data injection artifacts.
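The three rates can be made concrete with a minimal sketch. The formulas below are one reading of the parenthetical definitions above (warnings counted as failures, counted as passes, or left out), not code taken from the paper or from QualWeb; the `Outcome` container and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Element-level outcome counts for one page (hypothetical container)."""
    passed: int
    failed: int
    warned: int


def conservative_rate(o: Outcome) -> float:
    # Warnings are treated as failures, so only passes count as successes.
    total = o.passed + o.failed + o.warned
    return o.passed / total if total else 0.0


def optimistic_rate(o: Outcome) -> float:
    # Warnings are treated as passes, so they join the numerator.
    total = o.passed + o.failed + o.warned
    return (o.passed + o.warned) / total if total else 0.0


def strict_rate(o: Outcome) -> float:
    # Warnings are excluded entirely: passes over passes plus failures.
    decided = o.passed + o.failed
    return o.passed / decided if decided else 0.0


# Illustrative counts only, not data from the study.
page = Outcome(passed=10, failed=30, warned=60)
rates = (conservative_rate(page), optimistic_rate(page), strict_rate(page))
```

For the illustrative page the three rates are 0.10, 0.70, and 0.25. The spread shows how strongly the treatment of warnings affects the score when warnings outnumber definite passes and failures.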

Key findings

Browser processing dramatically changed the accessibility landscape of evaluated pages. The number of HTML elements grew by approximately 68% after processing (from an average of 1010 to 1710 elements per page). Passing elements increased by approximately 867% (from an average of 9 to 87 per page), failing elements by 282% (from 46 to 176), and warnings by 72% (from 262 to 451). The most striking finding was a convergence effect: after browser processing, the distributions of accessibility quality became narrower and more uniform. The worst-performing pages improved while the best-performing pages got worse, leaving a more homogeneous middle ground. The authors attribute this to reusable code: templates, libraries, and frameworks injected during browser processing tend to have medium accessibility quality, pulling both extremes toward the center. For the strict rate, the average increased from 0.11 to 0.35 after processing, and pages scoring above 0.85 largely disappeared. The average conservative rate increased after processing while the average optimistic rate decreased, so even the direction of the measured change depends on how warnings are counted. Either way, the phase at which a page is evaluated changes the outcome.
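The arithmetic behind these figures can be checked with a short sketch. It uses only the per-page averages quoted above; the rate formulas are an assumed reading of the three metrics, and the names `percent_change`, `rates`, and `AVERAGES` are hypothetical. Rates computed from averaged counts are not the same as the paper's averages of per-page rates, so only the direction of change is comparable.

```python
def percent_change(before: float, after: float) -> float:
    """Relative growth from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def rates(passed: float, failed: float, warned: float) -> dict:
    """Assumed formulas: warnings as failures, as passes, or excluded."""
    total = passed + failed + warned
    return {
        "conservative": passed / total,
        "optimistic": (passed + warned) / total,
        "strict": passed / (passed + failed),
    }


# Average counts per page (before, after), as quoted in the text above.
AVERAGES = {
    "elements": (1010, 1710),
    "passes": (9, 87),
    "failures": (46, 176),
    "warnings": (262, 451),
}

changes = {name: percent_change(b, a) for name, (b, a) in AVERAGES.items()}

before = rates(passed=9, failed=46, warned=262)
after = rates(passed=87, failed=176, warned=451)

for metric in before:
    print(f"{metric}: {before[metric]:.2f} -> {after[metric]:.2f}")
```

With these rounded averages the changes come out near 69%, 867%, 283%, and 72%. The small gaps from the reported percentages are presumably due to rounding of the averages, which is an assumption. The rates move from about 0.03 to 0.12 (conservative), 0.85 to 0.75 (optimistic), and 0.16 to 0.33 (strict), which matches the reported directions: conservative and strict rise, optimistic falls.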

Relevance

This study has direct implications for how organizations conduct accessibility audits and monitoring. Because pre-processing and post-processing evaluations yield substantially different results, traditional automated tools that evaluate only server-delivered HTML produce incomplete and potentially misleading results. This problem has only intensified since 2012 with the rise of single-page applications, client-side rendering frameworks, and heavy JavaScript usage. The convergence effect driven by reusable templates and libraries suggests a practical intervention point: improving the accessibility of popular frameworks, component libraries, and CMS templates can have outsized impact across the web, since this shared code gets injected into many pages. For accessibility practitioners, the takeaway is clear: evaluate what users actually experience, not just what the server sends. Modern tools like axe-core integrated into browser environments reflect this principle, but many large-scale web monitoring services still primarily evaluate server-delivered content.
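A minimal sketch of why the evaluation phase matters, under stated assumptions: the two HTML strings below are invented stand-ins for the server-delivered markup and for a DOM snapshot serialized after scripts have run. Obtaining a real post-processing snapshot requires a headless browser and is not shown. The check for images lacking an `alt` attribute is a single illustrative test, not the paper's set of 18 techniques, and `SnapshotStats` and `summarize` are hypothetical names.

```python
from html.parser import HTMLParser


class SnapshotStats(HTMLParser):
    """Counts elements and <img> tags without an alt attribute."""

    def __init__(self):
        super().__init__()
        self.elements = 0
        self.images_missing_alt = 0

    def handle_starttag(self, tag, attrs):
        # Called once per opening tag, including self-closing ones.
        self.elements += 1
        if tag == "img" and "alt" not in dict(attrs):
            self.images_missing_alt += 1


def summarize(html: str) -> dict:
    parser = SnapshotStats()
    parser.feed(html)
    parser.close()
    return {
        "elements": parser.elements,
        "images_missing_alt": parser.images_missing_alt,
    }


# Invented example: what the server sends for a script-driven page.
SERVER_HTML = (
    '<html><body><div id="app"></div>'
    '<script src="app.js"></script></body></html>'
)

# Invented example: the same page after client-side rendering.
RENDERED_HTML = (
    '<html><body><div id="app"><img src="logo.png">'
    '<img src="hero.png" alt="Team photo"><p>Welcome</p></div>'
    '<script src="app.js"></script></body></html>'
)

before = summarize(SERVER_HTML)
after = summarize(RENDERED_HTML)
```

In this toy case the server-delivered markup contains 4 elements and no image problems, while the rendered snapshot contains 7 elements and one image without `alt`. An audit limited to the first snapshot would report nothing to fix.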

Tags: automated testing · large-scale evaluation · web accessibility · browser processing · dynamic content · accessibility metrics · WCAG evaluation

Standards referenced: WCAG 2.0