How Much Does Expertise Matter? A Barrier Walkthrough Study with Experts and Non-Experts

Yeliz Yesilada, Giorgio Brajnik, Simon Harper · 2009 · Proceedings of the 11th International ACM SIGACCESS Conference on Computers and Accessibility (Assets '09) · doi:10.1145/1639642.1639678

Summary

This paper investigates whether expertise matters in manual web accessibility evaluation by comparing the results of 19 expert and 51 non-expert judges who used the Barrier Walkthrough (BW) method. BW is an analytical technique based on the heuristic walkthrough: evaluators assess a predefined list of barrier types against specific user categories (blind, low-vision, motor-impaired, hearing-impaired and cognitively impaired users) and rate the severity of each barrier they find. Participants evaluated four web pages, chosen to represent both popular large sites and smaller community sites: Facebook, IMDB, a craft supplies shop (Quilts) and a hand-crafted goods shop (Sams). Expert judges were experienced accessibility professionals (average self-rated knowledge of 4.6 out of 5; 47% worked as consultants and 67% had tested more than 10 sites in the previous six months), while the non-experts were students on a web accessibility and usability course (average self-rated knowledge of 2.3 out of 5). The study addresses three research questions: whether the two groups differ in the true barrier types they identify, in the severity ratings they assign, and in the validity and reliability of the BW method when they use it.
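To make the structure of a BW evaluation concrete, the sketch below models each judgment as a (page, barrier type, user category, severity) record and compares one judge's reports with a reference set of true barriers, splitting them into true positives, false positives and false negatives. It is a minimal illustration, not the authors' tooling: the names `Judgment`, `UserCategory` and `classify`, the barrier-type strings and the 1 to 3 severity scale are assumptions made for this example, and the sketch does not reproduce how the study established its set of true barriers.

```python
from dataclasses import dataclass
from enum import Enum


class UserCategory(Enum):
    # The five user categories named in the summary.
    BLIND = "blind"
    LOW_VISION = "low vision"
    MOTOR = "motor impaired"
    HEARING = "hearing impaired"
    COGNITIVE = "cognitively impaired"


@dataclass(frozen=True)
class Judgment:
    # One judge's claim that a barrier type affects a user category on a page.
    page: str
    barrier: str            # hypothetical barrier-type label
    category: UserCategory
    severity: int           # assumed scale: 1 = minor, 2 = major, 3 = critical

    def __post_init__(self):
        if self.severity not in (1, 2, 3):
            raise ValueError("severity must be 1, 2 or 3")


def classify(judgments, true_barriers):
    """Split one judge's reports into true positives, false positives and
    false negatives against a reference set of (page, barrier, category)."""
    reported = {(j.page, j.barrier, j.category) for j in judgments}
    tp = reported & true_barriers   # reported and real
    fp = reported - true_barriers   # reported but not real
    fn = true_barriers - reported   # real but missed
    return tp, fp, fn


# Hypothetical reference set and reports for one page.
truth = {
    ("Quilts", "images without alt text", UserCategory.BLIND),
    ("Quilts", "low contrast text", UserCategory.LOW_VISION),
}
reports = [
    Judgment("Quilts", "images without alt text", UserCategory.BLIND, 3),
    Judgment("Quilts", "video without captions", UserCategory.HEARING, 2),
]
tp, fp, fn = classify(reports, truth)
print(len(tp), len(fp), len(fn))  # prints: 1 1 1
```

Keeping the user category in the key reflects the BW idea that the same barrier type can matter for one group of users and not for another.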

Key findings

Expert judges were significantly faster than non-experts (107 vs 298 minutes on average; a large effect size, d = 2.22) and rated themselves as more productive (3.3 vs 2.8) and more confident (3.9 vs 2.5). Experts were also more effective: their accuracy was 86.4% versus 79.2% for non-experts. The difference is significant, yet the non-experts' figure is still remarkably high. Experts also achieved a higher F-measure (0.629 vs 0.465); since the F-measure combines the share of reported barriers that are true with the share of true barriers that are found, this indicates a better balance between finding real barriers and avoiding false positives.

Overall, experts assigned significantly higher severity ratings than non-experts (mean weighted severity of 0.344 vs 0.322). Reproducibility was substantially better for experts (mean 0.55 vs 0.38), meaning that expert evaluations were more consistent across judges. Agreement among experts, measured by the intraclass correlation, was also higher (0.31 vs 0.28), although both values are low. The effect of expertise was not uniform across user categories: it was most noticeable for blind and low-vision users and less pronounced for motor-impaired users. Non-expert judges also missed some barrier types entirely and reported false barrier types that experts did not.
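The validity figures above rest on standard information-retrieval measures, and the timing result on a standardised effect size. The sketch below shows how such quantities are commonly computed from a judge's counts of true positives, false positives and false negatives. It assumes that the paper's "accuracy" corresponds to precision (the share of reported barriers that are true) and uses the usual pooled-standard-deviation form of Cohen's d; the study's exact definitions, and how it averaged them across judges, may differ. All counts and samples in the usage lines are hypothetical.

```python
import math
from statistics import mean, stdev


def precision(tp, fp):
    # Share of reported barriers that are true (assumed to match "accuracy").
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    # Share of true barriers that were found (also called sensitivity).
    return tp / (tp + fn) if tp + fn else 0.0


def f_measure(p, r):
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r) if p + r else 0.0


def cohens_d(a, b):
    # Difference of means divided by the pooled sample standard deviation.
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


# Hypothetical judge: 9 true barriers reported, 1 false one, 9 missed.
print(round(f_measure(precision(9, 1), recall(9, 9)), 3))  # prints: 0.643

# Hypothetical samples; the sign only shows which group has the larger mean.
print(cohens_d([1, 2, 3], [3, 4, 5]))  # prints: -2.0
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a judge who reports few false barriers but misses many true ones still gets a modest F-measure. Effect sizes such as the d = 2.22 above are usually reported as magnitudes.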

Relevance

This study provides the first empirical evidence on the role of expertise in web accessibility evaluation, filling a significant gap in the field. The findings have direct implications for organisations deciding who should conduct accessibility audits: although experts are significantly more effective, efficient, and reliable, non-experts can still reach reasonable accuracy (around 79%), which suggests that, with proper training and the right framework, broader participation in accessibility evaluation is feasible. The results are particularly relevant because WCAG 2.0 relies on manual evaluation for some of its success criteria; if non-expert evaluations are less reliable, conformance claims based on them may be less trustworthy. The study also reinforces the well-documented evaluator effect from usability research: different evaluators find different problems, even when all of them are experts. For practitioners, this means that multiple evaluators should be used and that investing in evaluator training is worthwhile. The finding that expertise matters more for some user categories than for others suggests that training programmes should prioritise the areas where non-expert performance is weakest.
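The evaluator effect is the usual argument for pooling several judges. As a minimal sketch (not an analysis from the paper, and using made-up findings), the function below estimates the average share of true barriers covered by the union of k evaluators' reports, averaged over every possible group of k evaluators. The name `pooled_coverage` and the barrier identifiers are hypothetical.

```python
from itertools import combinations


def pooled_coverage(findings, truth, k):
    """Average fraction of `truth` found by the union of k evaluators.

    findings: one set of reported barrier identifiers per evaluator.
    truth: the set of true barrier identifiers.
    The average is taken over every combination of k evaluators.
    """
    if not 1 <= k <= len(findings):
        raise ValueError("k must be between 1 and the number of evaluators")
    groups = list(combinations(findings, k))
    total = 0.0
    for group in groups:
        # Intersecting with the true set discards false positives.
        found = set().union(*group) & truth
        total += len(found) / len(truth)
    return total / len(groups)


# Hypothetical data: four true barriers, three evaluators ("x" is a false positive).
truth = {"b1", "b2", "b3", "b4"}
findings = [{"b1", "b2"}, {"b2", "b3"}, {"b1", "x"}]
for k in (1, 2, 3):
    print(k, round(pooled_coverage(findings, truth, k), 3))
# prints: 1 0.417, then 2 0.667, then 3 0.75 (one pair per line)
```

Coverage rises as evaluators are added because each one finds a partly different subset of the true barriers, which is exactly what the evaluator effect describes; pooling does not, however, remove the false positives that individual judges report.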

Tags: web accessibility · accessibility evaluation · barrier walkthrough · WCAG · evaluator effect · expertise · accessibility testing · research methodology

Standards referenced: WCAG 2.0