An Optimal Sampling Method for Web Accessibility Quantitative Metric

Mengni Zhang, Can Wang, Jiajun Bu, Zhi Yu, Yi Lu, Ruijie Zhang, Chun Chen · 2015 · Proceedings of the 12th International Web for All Conference (W4A 2015) · doi:10.1145/2745555.2746663

Summary

This short paper proposes OPS-WAQM, an optimal page sampling method designed for use with the Web Accessibility Quantitative Metric (WAQM) when evaluating large websites. Evaluating every page of a large website for accessibility is prohibitively expensive, so practitioners must sample a subset of pages. Existing sampling methods (ad hoc, uniform random, random walk, stratified) are metric-independent, however, and can produce errors of up to 20%. The authors observe that sampling accuracy depends heavily on the evaluation metric in use, so they tailor their method to WAQM. WAQM weights pages by depth from the homepage using an exponential decay function (weight = e^(-i), where i is page depth), reflecting that pages closer to the homepage are visited more often, so their accessibility issues affect more users. OPS-WAQM partitions a website into layers by page depth and uses a greedy algorithm to determine the optimal number of pages to sample from each layer. The algorithm iteratively allocates each additional sample to whichever layer would most reduce the total weighted sampling error, until the desired total sample size is reached. Pages within each layer are then sampled at random.
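The greedy allocation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `greedy_allocation`, its parameters, and the error model are assumptions made for illustration. In particular, the sketch models the total weighted sampling error as the sum over layers of w_i^2 * s_i^2 / n_i (with w_i = e^(-i), s_i^2 the within-layer variance of accessibility scores, and n_i the samples drawn from layer i), and it assumes the homepage layer has depth 0. The paper's exact error expression may differ.

```python
import math


def greedy_allocation(layer_sizes, layer_variances, total_samples):
    """Greedily split a sample budget across depth layers.

    layer_sizes[i]     -- number of pages at depth i (assumed: depth 0 = homepage)
    layer_variances[i] -- variance of accessibility scores at depth i
    total_samples      -- total number of pages to sample

    Assumed error model (hypothetical, not taken from the paper):
        error = sum_i w_i**2 * var_i / n_i,  with w_i = exp(-i)
    """
    num_layers = len(layer_sizes)
    weights = [math.exp(-depth) for depth in range(num_layers)]

    # Start with one page per non-empty layer so every layer mean is defined.
    alloc = [1 if size > 0 else 0 for size in layer_sizes]
    if sum(alloc) > total_samples:
        raise ValueError("total_samples is smaller than the number of non-empty layers")

    def gain(i):
        """Error reduction from giving layer i one more sample."""
        n = alloc[i]
        if n >= layer_sizes[i]:
            return -1.0  # layer exhausted: no more pages to draw
        c = weights[i] ** 2 * layer_variances[i]
        return c / n - c / (n + 1)

    while sum(alloc) < total_samples:
        best = max(range(num_layers), key=gain)
        if gain(best) < 0:
            break  # every layer is exhausted
        alloc[best] += 1

    return alloc
```

Calling `greedy_allocation([1, 50, 500, 5000], [0.0, 0.04, 0.04, 0.04], 30)` returns a per-layer sample count; with equal variances, shallower layers receive more samples because their weights are larger. Drawing the pages themselves would then be a uniform random draw within each layer, for example with `random.sample`.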

Key findings

The method was validated on a dataset of 20 Chinese government websites (ministries and provincial governments) totaling 365,780 web pages, ranging from 2,039 to 50,493 pages per site. The accessibility data came from the Chinese Government Website Accessibility Evaluation Campaign using China's national standard YD/T1761-2012. Compared to uniform random sampling, OPS-WAQM consistently produced lower sampling errors across all 20 websites. For sites with shallow depth (2 levels), the improvement was modest since there is less variation between layers. For deeper sites (5-6 levels), OPS-WAQM showed substantially lower and more consistent error rates. The greedy algorithm is computationally efficient, requiring only the variance of accessibility scores within each depth layer and the layer sizes, which can be estimated from a preliminary crawl. The key statistical insight is that the sampling error for WAQM is minimized when more samples are allocated to layers with higher variance and higher WAQM weight, rather than distributing samples uniformly.
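The allocation rule in the last sentence can be made concrete with a short derivation. This is a sketch under an assumed error model, not the paper's own formula: suppose the weighted sampling error is modeled as V below, where w_i = e^{-i} is the WAQM weight of layer i, s_i^2 is the within-layer variance of accessibility scores, n_i is the number of pages sampled from layer i, and n is the total sample size. The derivation ignores the constraint that n_i cannot exceed the layer size and treats n_i as continuous.

```latex
% Assumed objective (hypothetical stand-in for the paper's error expression)
V(n_1,\dots,n_L) = \sum_{i=1}^{L} \frac{w_i^{2}\, s_i^{2}}{n_i},
\qquad \text{subject to } \sum_{i=1}^{L} n_i = n .

% Lagrangian stationarity condition for each layer
\frac{\partial}{\partial n_i}\Bigl( V + \lambda \sum_{j} n_j \Bigr)
  = -\frac{w_i^{2}\, s_i^{2}}{n_i^{2}} + \lambda = 0
\;\;\Longrightarrow\;\;
n_i = \frac{w_i\, s_i}{\sqrt{\lambda}} .

% Substituting into the budget constraint
n_i = n \cdot \frac{w_i\, s_i}{\sum_{j=1}^{L} w_j\, s_j} .
```

Under this assumed model, the share of samples for a layer grows with both its weight and the standard deviation of its scores, which matches the qualitative finding reported above. Uniform allocation coincides with this optimum only when w_i s_i is the same for every layer.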

Relevance

This paper addresses a practical challenge that accessibility auditors face regularly: how to evaluate large websites efficiently without testing every page. While WCAG-EM (the W3C's evaluation methodology) recommends sampling and provides general guidance, it does not specify how to optimize sample selection for particular metrics. OPS-WAQM provides a principled statistical approach that could reduce audit costs while maintaining evaluation accuracy. The depth-weighted approach matches how sites are actually used: the homepage and top-level pages receive more traffic, so their accessibility problems affect more users. For organizations conducting regular accessibility monitoring of large sites, this method could make ongoing evaluation more feasible. The main limitations are that the method requires knowing the site structure (depth layers and page counts) in advance, which necessitates a preliminary crawl, and that the validation was limited to Chinese government sites under a single national standard.

Tags: accessibility evaluation · sampling methods · accessibility metrics · WAQM · large-scale evaluation · government websites · quantitative metrics · statistical methods

Standards referenced: WCAG 2.0 · WCAG-EM · UWEM · YD/T1761-2012