Reliability Aware Web Accessibility Experience Metric

Shuyi Song, Jiajun Bu, Chengchao Shen, Andreas Artmeier, Zhi Yu, Qin Zhou · 2018 · Proceedings of the 15th International Web for All Conference (W4A) · doi:10.1145/3192714.3192836

Summary

This paper introduces RA-WAEM (Reliability Aware Web Accessibility Experience Metric), an approach to measuring website accessibility that incorporates the actual experience of users with disabilities while accounting for the varying reliability of different evaluators. Traditional accessibility metrics derive checkpoint weights directly from WCAG priority levels, but research has shown poor correlation between these priority levels and how users actually experience accessibility barriers. RA-WAEM addresses this by using Partial User Experience Orders (PUEXOs): pairwise comparisons in which participants indicate which of two websites provided a better browsing experience. The key innovation is a reliability-aware model. Not all evaluators are equally skilled at objectively assessing accessibility barriers, and less experienced users tend to rate inaccessible sites as more accessible because they cannot detect subtle barriers. The algorithm uses Expectation Maximization (EM) to iteratively estimate both the optimal checkpoint weights and each participant's reliability level, giving more weight to comparisons from more reliable evaluators. The study was conducted in collaboration with the China Disabled Persons' Federation, using China's first whole-website accessibility evaluation system.
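To make the idea of jointly estimating checkpoint weights and evaluator reliability more concrete, here is a minimal sketch of an alternating scheme in the spirit of EM. It is not the paper's algorithm: the logistic update, the smoothed agreement rate used as "reliability", the convention that a lower score means fewer barriers, and all names (`fit`, `sites`, `comparisons`, the toy data) are assumptions made for illustration only.

```python
import math

def score(weights, features):
    # Barrier score: weighted sum of per-checkpoint violation rates.
    # Assumption: a lower score means a better browsing experience.
    return sum(w * x for w, x in zip(weights, features))

def fit(sites, comparisons, n_rounds=10, n_steps=50, lr=0.5):
    """Alternate between fitting weights and re-estimating reliability.

    sites: {site: [violation rate per checkpoint]}
    comparisons: [(evaluator, better_site, worse_site), ...]
    """
    n = len(next(iter(sites.values())))
    weights = [1.0 / n] * n
    evaluators = {e for e, _, _ in comparisons}
    reliability = {e: 1.0 for e in evaluators}
    for _ in range(n_rounds):
        # Weight step: reliability-weighted logistic gradient ascent.
        for _ in range(n_steps):
            grad = [0.0] * n
            for e, better, worse in comparisons:
                diff = [xw - xb for xw, xb in zip(sites[worse], sites[better])]
                margin = score(weights, diff)
                pull = reliability[e] / (1.0 + math.exp(margin))
                for k in range(n):
                    grad[k] += pull * diff[k]
            weights = [max(0.0, w + lr * g / len(comparisons))
                       for w, g in zip(weights, grad)]
        # Reliability step: smoothed share of each evaluator's comparisons
        # that the current weights reproduce.
        agree = {e: 0 for e in evaluators}
        total = {e: 0 for e in evaluators}
        for e, better, worse in comparisons:
            total[e] += 1
            if score(weights, sites[better]) < score(weights, sites[worse]):
                agree[e] += 1
        reliability = {e: (agree[e] + 1) / (total[e] + 2) for e in evaluators}
    return weights, reliability

# Toy data (hypothetical): checkpoint 0 is the barrier that really matters.
sites = {"a": [0.1, 0.9], "b": [0.5, 0.5], "c": [0.9, 0.1]}
consistent = [("a", "b"), ("b", "c"), ("a", "c")]
comparisons = (
    [("expert_1", x, y) for x, y in consistent]
    + [("expert_2", x, y) for x, y in consistent]
    + [("novice", "b", "a"), ("novice", "c", "b"), ("novice", "a", "c")]
)
weights, reliability = fit(sites, comparisons)
print(weights, reliability)
```

In this toy run the two consistent evaluators end up with a higher reliability value than the evaluator whose comparisons mostly contradict them, and the weight on checkpoint 0 dominates. The add-one smoothing in the reliability step is a design choice that keeps any evaluator from being assigned exactly 0 or 1 after only a few comparisons.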

Key findings

In ten-fold cross validation, RA-WAEM achieved a satisfied percentage (SP) of 86.89% in reflecting user experience, outperforming the existing metrics it was compared against: WAEM (85.79%), WAQM (76.55%), WAB (74.07%), and UWEM/A3 (70.38%). The study evaluated 46 Chinese government websites comprising 323,098 web pages against 30 WCAG checkpoints, collecting user experience data from 122 people: 49 people with disabilities (motor, hearing, visual impairment, speech disability, and color weakness), 43 trained accessibility experts, and 30 non-disabled undergraduate volunteers. Critically, when unreliable volunteer data was incrementally added, RA-WAEM maintained 82.17% SP while WAEM dropped to 74.47%, demonstrating the value of reliability weighting. The EM algorithm successfully distinguished reliable from unreliable evaluators: experts and people with disabilities showed generally higher reliability levels than untrained volunteers. The results confirm that WCAG priority levels alone are poor proxies for actual user impact, and that user experience data, properly weighted for reliability, produces more meaningful accessibility scores.
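As a rough illustration of how a figure like the satisfied percentage could be computed, the sketch below counts the share of user-reported pairwise orders that a metric's scores reproduce. This reading of SP, the convention that a lower score means fewer barriers, and the names `satisfied_percentage`, `scores`, and `puexos` are assumptions for illustration; the toy numbers are not data from the paper.

```python
def satisfied_percentage(scores, puexos):
    """Percentage of (better, worse) pairs that the scores reproduce.

    Assumption: a lower score means fewer barriers, so a pair is
    satisfied when scores[better] < scores[worse].
    """
    if not puexos:
        raise ValueError("need at least one pairwise comparison")
    satisfied = sum(1 for better, worse in puexos
                    if scores[better] < scores[worse])
    return 100.0 * satisfied / len(puexos)

# Hypothetical metric scores and user-reported orders.
scores = {"site_a": 0.10, "site_b": 0.35, "site_c": 0.20}
puexos = [
    ("site_a", "site_b"),
    ("site_a", "site_c"),
    ("site_b", "site_c"),  # contradicts the scores
    ("site_c", "site_b"),
]
sp = satisfied_percentage(scores, puexos)
print(sp)  # 75.0
```

Under cross validation, the weights would be fitted on the training folds and a function like this applied to the held-out comparisons.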

Relevance

This research has significant implications for how organizations measure and benchmark web accessibility. The finding that WCAG priority levels correlate poorly with actual user experience challenges the common practice of treating Priority 1 violations as inherently more severe than Priority 2 or 3 issues. For practitioners conducting accessibility audits, RA-WAEM suggests that incorporating real user feedback, even from evaluators with varying expertise, produces more meaningful results than purely automated or standards-based scoring, provided evaluator reliability is accounted for. The reliability-aware approach is particularly practical because it means organizations do not need exclusively expert evaluators; the algorithm can extract useful signal even from less experienced participants. The large-scale Chinese dataset also provides valuable non-Western accessibility data, highlighting that accessibility evaluation research and practice must be globally inclusive. The core limitation is the reliance on pairwise comparisons: the number of possible website pairs grows quadratically with the number of websites being evaluated.
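The quadratic growth of pairwise comparisons can be made concrete with a small calculation: n websites have n(n-1)/2 unordered pairs. The sketch below prints that upper bound for a few sizes. The function name is hypothetical, and the figure for 46 websites is only the maximum number of distinct pairs, not the number of comparisons the study actually collected.

```python
from math import comb

def max_pairwise_comparisons(n_sites):
    # Number of distinct unordered pairs among n_sites websites.
    return comb(n_sites, 2)

for n in (10, 46, 100):
    print(n, max_pairwise_comparisons(n))
# 10 45
# 46 1035
# 100 4950
```

Roughly doubling the number of websites (46 to 100) more than quadruples the number of possible pairs, which is why partial orders rather than exhaustive comparisons matter in practice.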

Tags: accessibility metrics · user experience · accessibility evaluation · machine learning · web accessibility · WCAG compliance · automated testing · crowdsourcing · reliability · optimization algorithm · China · disability inclusion · visual impairment · motor disability · hearing disability

Standards referenced: WCAG 1.0