An analysis of personalized web accessibility

Nádia Fernandes, Nikolaos Kaklanis, Konstantinos Votis, Dimitrios Tzovaras, Luís Carriço · 2014 · Proceedings of the 11th Web for All Conference (W4A) · doi:10.1145/2596695.2596698

Summary

This paper presents an experimental study comparing generic (all-disabilities) web accessibility evaluations with personalized evaluations tailored to specific disability profiles. Using the WaaT (Web Accessibility Assessment Tool) evaluator, the authors assessed 39 home pages drawn from the Alexa Top 500 websites from five perspectives: generic (all disabilities), upper limb impairment (cerebral palsy, multiple sclerosis, Parkinson's disease, arthritis, etc.), cognitive impairment (dementia, Down syndrome, ADHD, traumatic brain injury, etc.), vision impairment (blindness, colour-blindness, low vision, etc.), and hearing impairment (conductive, sensorineural, and profound hearing loss, deaf-blindness). Each page was evaluated five times, once per perspective, for a total of 195 evaluations. WaaT uses ontology-based reasoning (SWRL rules and SPARQL queries) to select which WCAG 2.0 techniques apply to each disability profile, then evaluates only those techniques.
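
The profile-based selection step can be illustrated with a minimal Python sketch. It is not WaaT's implementation: WaaT derives the selection from an ontology via SWRL rules and SPARQL queries, whereas this sketch substitutes a plain dictionary. The mapping of techniques to profiles, the function names, and the fail counts are all hypothetical and chosen only for illustration.

```python
# Hypothetical mapping from WCAG 2.0 technique IDs to the profiles they
# matter for. This is an illustrative assumption, NOT WaaT's ontology.
RELEVANT_PROFILES = {
    "H37": {"vision"},                 # alt attributes on img elements
    "G18": {"vision"},                 # minimum text contrast ratio
    "G90": {"upper_limb", "vision"},   # keyboard-triggered event handlers
    "G158": {"hearing"},               # alternative for audio-only content
}


def techniques_for(profile):
    """Return the technique IDs that apply to a profile.

    The "generic" perspective applies every known technique.
    """
    if profile == "generic":
        return set(RELEVANT_PROFILES)
    return {t for t, profiles in RELEVANT_PROFILES.items() if profile in profiles}


def personalized_fails(raw_fails, profile):
    """Sum fail counts over only the techniques relevant to the profile."""
    selected = techniques_for(profile)
    return sum(count for technique, count in raw_fails.items() if technique in selected)


# Made-up per-technique fail counts for a single page.
page_fails = {"H37": 12, "G18": 30, "G90": 5, "G158": 0}

for profile in ("generic", "upper_limb", "vision", "hearing"):
    print(profile, personalized_fails(page_fails, profile))
```

With these invented counts the hearing perspective reports no fails while the generic one reports many, which is the kind of divergence the study measures on real pages.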

Key findings

The study confirmed that personalized evaluations produce significantly different results from generic evaluations. On average, the ratio of fails detected for a specific disability profile to fails detected in the generic case was 0.38 for upper limb, 0.32 for cognitive, 0.65 for vision, and 0.70 for hearing impairment. A cognitive impairment evaluation therefore reports only about one-third of the issues a generic evaluation would flag; the rest are not relevant to that user group. ANOVA confirmed significant differences between disability groups (p < 0.001 for warnings, p = 0.001 for fails), with significant pairwise differences between upper limb and vision, upper limb and hearing, and cognitive and vision impairments. A particularly revealing example was the Yandex homepage: the hearing impairment evaluation found zero fails and zero warnings, while the generic evaluation found 528 fails and 756 warnings. The evaluator therefore detected no barriers for hearing-impaired users on a page that appears highly inaccessible in a generic report. The authors also demonstrated that standard accessibility scoring metrics are inadequate for personalized evaluation, because the normalisation process diminishes real differences between perspectives and can produce misleading scores: the NY Times homepage scored 61% generically and 59% for hearing impairments, even though the generic evaluation found about 12 times as many fails (1069 vs 87).
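
The arithmetic behind these comparisons can be sketched in a few lines of Python. The fail counts for the NY Times and Yandex pages are the ones quoted above. The `pass_rate` function is an assumption: it is a generic normalised score used only to show how normalisation can hide differences in absolute counts, not the metric the authors analysed, and the check counts passed to it are invented.

```python
def fail_ratio(profile_fails, generic_fails):
    """Ratio of fails for one profile to fails in the generic evaluation."""
    return profile_fails / generic_fails


def pass_rate(fails, applicable_checks):
    """Hypothetical normalised score: share of applicable checks that pass."""
    return (applicable_checks - fails) / applicable_checks


# Counts quoted in the summary above.
print(round(fail_ratio(87, 1069), 3))   # NY Times, hearing vs generic
print(round(1069 / 87, 1))              # generic fails per hearing fail
print(fail_ratio(0, 528))               # Yandex, hearing vs generic

# Invented counts: ten times fewer fails, yet the normalised score is the
# same because the number of applicable checks shrinks in proportion.
print(pass_rate(100, 250), pass_rate(10, 25))
```

Under this assumed metric, a perspective with a tenth of the fails gets the same score as the generic one. That is the kind of flattening the authors report when real metrics place the NY Times page at 61% and 59%.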

Relevance

This research challenges the assumption, implicit in most accessibility evaluations, that a single score or pass/fail result meaningfully represents the experience of all users with disabilities. The finding that cognitive and upper limb impairment profiles are affected by only about one-third of the failures flagged by a generic evaluation has practical implications: organisations could prioritise remediation based on their actual user demographics rather than treating all WCAG failures as equally urgent. The personalized approach also enables a fundamentally different user-facing question, "Is this page accessible to me?" rather than "Is this page accessible?", which could help users decide whether to invest time attempting to use a site. The critique of existing accessibility metrics is particularly important: if scoring systems obscure meaningful differences between disability perspectives, they may give organisations a false sense of equivalence across user groups. This work supports the broader argument that accessibility is not a single binary property but a spectrum that varies with individual capability.

Tags: accessibility testing · automated evaluation · personalization · accessibility metrics · WCAG compliance · disability profiles

Standards referenced: WCAG 2.0 · WAI-ARIA · EARL