A Task Assignment Strategy for Crowdsourcing-Based Web Accessibility Evaluation System

Liangcheng Li, Can Wang, Shuyi Song, Zhi Yu, Fenqin Zhou, Jiajun Bu · 2017 · Proceedings of the 14th International Web for All Conference (W4A) · doi:10.1145/3058555.3058573

Summary

This paper addresses a practical challenge in scaling web accessibility evaluation: how to assign manual evaluation tasks to volunteer crowdsourced workers with varying levels of expertise. Automated tools can check many accessibility checkpoints, but they cannot handle contextual or semantic checks that require human judgment, such as evaluating CAPTCHA alternatives, detecting keyboard traps, or assessing skip navigation links. Expert manual evaluation is accurate but expensive and does not scale. Crowdsourcing offers a middle path, but randomly assigning complex evaluation tasks to inexperienced volunteers produces poor accuracy.

The researchers propose Evaluator-Decision-Based Assignment (EDBA), a machine learning-driven task assignment strategy that matches evaluation tasks to volunteers based on their demonstrated abilities. The system tracks four behavioral indicators from evaluators' historical records: accuracy (results matching expert review), errors, give-ups (abandoned tasks), and timeouts. These metrics are combined into a cost model trained with a least-squares loss and gradient descent, with parameters weighted by expert-defined checkpoint weights from the China Disabled Persons' Federation that reflect each checkpoint's difficulty and importance. A greedy algorithm then produces an assignment map intended to minimize total expected cost while balancing workload across evaluators.
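The cost model and greedy step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear form of the cost, the feature order (accuracy, error, give-up, and timeout rates), the single profile per evaluator, the sample history values, the checkpoint weights, the per-evaluator `capacity` cap, and the names `fit_cost_weights` and `greedy_assign` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical history: each row holds an evaluator record's
# (accuracy, error, give-up, timeout) rates and an observed cost.
# All numbers are invented for illustration.
HISTORY: List[Tuple[List[float], float]] = [
    ([0.9, 0.1, 0.0, 0.0], 0.1),
    ([0.7, 0.2, 0.1, 0.0], 0.3),
    ([0.5, 0.3, 0.1, 0.1], 0.6),
    ([0.2, 0.4, 0.2, 0.2], 1.0),
]


def predict(w: List[float], x: List[float]) -> float:
    """Assumed linear cost: dot product of weights and indicator rates."""
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w: List[float], rows: List[Tuple[List[float], float]]) -> float:
    """Mean squared error of the predicted cost over the history rows."""
    return sum((predict(w, x) - y) ** 2 for x, y in rows) / len(rows)


def fit_cost_weights(rows, lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit the weights by batch gradient descent on the squared loss."""
    n, d = len(rows), len(rows[0][0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for x, y in rows:
            err = predict(w, x) - y
            for j in range(d):
                grad[j] += 2.0 * err * x[j] / n
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def greedy_assign(costs: Dict[Tuple[str, str], float],
                  capacity: int) -> Dict[str, str]:
    """Walk (evaluator, task) pairs from cheapest to most expensive and
    give each task to the first evaluator who still has spare capacity."""
    load: Dict[str, int] = {}
    assignment: Dict[str, str] = {}
    for (evaluator, task), _cost in sorted(costs.items(), key=lambda kv: kv[1]):
        if task in assignment or load.get(evaluator, 0) >= capacity:
            continue
        assignment[task] = evaluator
        load[evaluator] = load.get(evaluator, 0) + 1
    return assignment


weights = fit_cost_weights(HISTORY)

# Hypothetical evaluator profiles and checkpoint weights.
evaluators = {"alice": [0.9, 0.1, 0.0, 0.0], "bob": [0.5, 0.3, 0.1, 0.1]}
checkpoints = {"captcha_alternative": 1.0, "keyboard_trap": 2.0}

# Expected cost of a pair = checkpoint weight * predicted evaluator cost.
costs = {
    (name, cp): cp_weight * predict(weights, rates)
    for name, rates in evaluators.items()
    for cp, cp_weight in checkpoints.items()
}
assignment = greedy_assign(costs, capacity=1)
print(assignment)
```

Taking the cheapest feasible pair first is only one way to realize a greedy step, and in general such a pass is a heuristic that does not guarantee the globally minimal total cost. The `capacity` cap is a simple stand-in for workload balancing: it stops the lowest-cost evaluator from absorbing every task.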

Key findings

Experiments on the Chinese Web Accessibility Evaluation System using 20 checkpoint types and 7 volunteer evaluators showed that EDBA outperformed random assignment in accuracy across nearly all checkpoint categories. The system was tested against the Chinese government standard YD/T 1761-2012, which defines three accessibility levels (basic, reinforced, and high). EDBA also achieved a much more balanced task distribution, with a variance of 0.9524 compared to 5.6294 for random assignment. Spreading the workload more evenly helps prevent evaluator burnout and makes room for novice participation. The one area where EDBA showed minimal improvement was the most complex checkpoint (Skipping Over Navigation at the highest accessibility level), where both strategies produced lower accuracy, suggesting that some tasks remain inherently difficult regardless of assignment optimization. In their feedback, evaluators reported that they felt suited to their assigned checkpoints and did not experience frustration during evaluation.
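To make the variance comparison concrete, the short Python sketch below computes the variance of tasks per evaluator for two invented distributions. The counts are hypothetical and do not reproduce the paper's figures, and the choice of sample variance (rather than population variance) is an assumption, since the summary does not say which form was used.

```python
from statistics import variance

# Hypothetical tasks-per-evaluator counts for seven evaluators (invented).
balanced = [3, 3, 3, 3, 3, 3, 2]
skewed = [8, 5, 3, 2, 1, 1, 0]


def workload_variance(counts):
    """Sample variance of the number of tasks each evaluator received."""
    return variance(counts)


print(workload_variance(balanced), workload_variance(skewed))
```

Both lists distribute the same 20 tasks. The lower variance of the first reflects only how evenly they are spread.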

Relevance

This research is directly relevant to organizations trying to scale accessibility auditing beyond small expert teams. The core insight is that accessibility checks differ in difficulty and that matching evaluator expertise to task complexity improves accuracy, which has practical implications for any crowdsourced or distributed accessibility testing program. The work demonstrates that machine learning can model evaluator competence from behavioral signals, offering a path toward more reliable community-driven accessibility evaluation. For practitioners, the paper also provides a useful taxonomy of 20 manual accessibility checkpoints organized by difficulty and importance, which could inform how organizations prioritize their own manual testing efforts. One limitation is that the system was validated only with the Chinese YD/T 1761-2012 standard rather than WCAG, though the underlying approach is standard-agnostic and could be adapted to any conformance framework.

Tags: web accessibility evaluation · crowdsourcing · machine learning · automated testing · conformance testing · task assignment

Standards referenced: YD/T 1761-2012