Semantic Content Analysis Supporting Web Accessibility Evaluation
Carlos Duarte, Inês Matos, Luís Carriço · 2018 · Proceedings of the 15th International Web for All Conference (W4A) · doi:10.1145/3192714.3196828
Summary
This paper addresses a fundamental limitation of automated web accessibility evaluation tools: they cannot assess whether a text alternative actually describes the content it refers to. A tool can detect the presence of an alt attribute on an image, but it cannot judge whether the alt text meaningfully describes that image, a judgment that traditionally requires human evaluation. The authors propose the SCREW (Semantic Content analysis for Repair and Evaluation of Web accessibility) algorithm, which automatically rates the similarity between media content and its textual description. The algorithm first extracts content descriptors from the media (using image recognition APIs such as Clarifai for images, or keyword extraction for text), then computes both direct and indirect similarity measures between those descriptors and the textual description. The indirect measure uses semantic domains (persons, places, organizations) to bridge gaps that direct comparison misses, for example connecting a photo recognized as containing a "person" with an alt text mentioning "Cristiano Ronaldo." Named entity recognition further strengthens these domain-based connections. The algorithm was optimized against a survey in which 192 participants rated 30 image-description pairs from UK newspaper websites; its parameters were tuned by comparing Spearman correlations with the human ratings across 200 parameter combinations.
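To make the direct and indirect measures concrete, here is a minimal Python sketch. It is not the SCREW implementation: the domain vocabularies, the overlap formulas, the `weight` parameter, and all function names are assumptions chosen for illustration, and the image descriptors and named entities are hand-written stand-ins for what an image recognition API and a named entity recognizer would return.

```python
from typing import Dict, Set

# Hypothetical domain vocabularies. The paper names the domains (persons,
# places, organizations); the member terms here are invented for illustration.
DOMAIN_TERMS: Dict[str, Set[str]] = {
    "persons": {"person", "man", "woman", "people", "player"},
    "places": {"city", "street", "beach", "stadium"},
    "organizations": {"company", "team", "club"},
}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of punctuation-free words."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return {w for w in words if w}


def direct_similarity(descriptors: Set[str], description: str) -> float:
    """Fraction of media descriptors that appear verbatim in the description."""
    if not descriptors:
        return 0.0
    return len(descriptors & tokenize(description)) / len(descriptors)


def indirect_similarity(descriptors: Set[str], entities: Dict[str, str]) -> float:
    """Fraction of entity domains in the description that the descriptors also evoke.

    `entities` maps each named entity found in the description to its domain,
    standing in for the output of a named entity recognizer.
    """
    entity_domains = set(entities.values())
    if not entity_domains:
        return 0.0
    descriptor_domains = {
        domain for domain, terms in DOMAIN_TERMS.items() if descriptors & terms
    }
    return len(descriptor_domains & entity_domains) / len(entity_domains)


def combined_score(descriptors, description, entities, weight=0.5):
    """Weighted mix of the two measures; `weight` is an assumed tunable parameter."""
    direct = direct_similarity(descriptors, description)
    indirect = indirect_similarity(descriptors, entities)
    return weight * direct + (1 - weight) * indirect


descriptors = {"person", "football", "stadium"}  # e.g. labels from an image API
alt_text = "Cristiano Ronaldo celebrates a goal"
entities = {"Cristiano Ronaldo": "persons"}  # hand-written stand-in for NER output

print(direct_similarity(descriptors, alt_text))
print(indirect_similarity(descriptors, entities))
print(combined_score(descriptors, alt_text, entities))
```

In this toy case the alt text shares no word with the descriptors, so the direct measure is zero, while the shared "persons" domain still links the image to the description. That is the kind of gap the indirect measure is meant to bridge.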
Key findings
The optimized algorithm reached a moderate Spearman correlation of 0.581 (p<0.001) with human ratings during optimization, dropping to 0.404 in an independent evaluation on a separate set of 15 images rated by 101 participants. Combining the Persons and Places domains produced the best results, while adding the Organizations domain degraded performance. In the evaluation, the algorithm correctly classified 9 of the 15 descriptions, with recall of 0.75, precision of 0.6, and specificity of 0.43, taking a good description as the positive class. The relatively high recall means developers would seldom be told that a good description is bad, but a specificity of 0.43 means that more than half of the genuinely poor descriptions pass undetected. The algorithm was integrated into the QualWeb accessibility evaluator to automatically assess WCAG 2.0 technique H37 (alt attributes on img elements), marking the first introduction of semantic evaluation into an automated accessibility testing tool. Two notable failure cases involved images of a hotel building and a car, where the lack of corresponding domains (buildings, vehicles) prevented accurate assessment.
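The classification figures above can be checked with a short calculation. The summary does not give the underlying confusion matrix, so the counts below are an inference: 6 true positives, 2 false negatives, 4 false positives, and 3 true negatives is one split of the 15 descriptions that reproduces every reported value, with a good description treated as the positive class. The function name is hypothetical.

```python
def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "recall": tp / (tp + fn),        # good descriptions recognized as good
        "precision": tp / (tp + fp),     # accepted descriptions that really are good
        "specificity": tn / (tn + fp),   # poor descriptions recognized as poor
    }


# Inferred counts (an assumption, not taken from the paper): they sum to 15
# and match the reported recall, precision, specificity, and 9/15 accuracy.
metrics = classification_metrics(tp=6, fn=2, fp=4, tn=3)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Under these inferred counts, 4 of the 7 poor descriptions are accepted as good, which is what the low specificity expresses.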
Relevance
This research is significant because it tackles one of the most persistent gaps in automated accessibility testing: evaluating the quality, not just the presence, of text alternatives. For practitioners, the key insight is that semantic AI services can augment, though not yet replace, human judgment in accessibility evaluation. The algorithm's integration into the QualWeb evaluator demonstrates real-world applicability. However, the moderate correlation and low specificity show that fully automated assessment of alt text quality remains an open challenge. The work also raises an important point about the subjectivity of what constitutes a "good" description as opposed to a caption, suggesting that clearer evaluation criteria are needed across the industry. Image recognition and NLP technologies have advanced considerably since 2018, so the same approach would plausibly perform better with current AI services.
Tags: automated testing · alternative text · semantic analysis · machine learning · image recognition · image description · algorithm · natural language processing · text alternatives · accessibility evaluation · WCAG compliance · web accessibility
Standards referenced: WCAG 2.0 · H37