Combining Semantic Tools for Automatic Evaluation of Alternative Texts

Carlos Duarte, Carlos M. Duarte, Luís Carriço · 2019 · Proceedings of the 16th International Web for All Conference (W4A) · doi:10.1145/3315002.3317558

Summary

This paper presents SCREW v2, an updated algorithm for automatically assessing, from an accessibility perspective, the quality of alternative texts for images on web pages. The work is motivated by EU Directive 2016/2102, which requires monitoring the accessibility of public sector websites at a scale that demands automated evaluation. Existing tools can only detect whether alt text is present, not judge its quality, so every alt text ends up flagged for manual human review.

The algorithm outputs a quality score between 0 and 1 by combining three levels of semantic analysis. The first level uses the Clarifai image recognition service to extract visual descriptors from the image, then searches the alt text for those descriptors, their synonyms, or terms whose semantic relatedness exceeds a threshold. The second level extends this to domain-specific analysis: if the general descriptors are semantically related to one of five domains (Apparel, Celebrities, Food, Travel, Wedding), a domain-specific image recognition model extracts additional descriptors, which are also matched against the alt text. The third level, new in this version, incorporates image metadata (keywords, country, city, and sub-location from EXIF data) and checks whether this information appears in the alt text or relates to named entities extracted from the text using spaCy.

A bandpass filter penalizes descriptions that are too short or too long, reflecting the guideline that alt text should be succinct (around 125 characters maximum). Semantic similarity is computed using Swoogle (for descriptor-to-description matching) and Sematch (for named entity matching).
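
As a rough illustration of how three per-level scores and a length filter could be combined into a single value between 0 and 1, consider the Python sketch below. It is not the paper's formula: the weighted mean, the default weights, the lower bound of 15 characters, the cutoff of 250 characters, the linear ramps, and all function and field names are assumptions made for the example. Only the 0 to 1 output range, the three levels, and the roughly 125-character guideline come from the summary above.

```python
from dataclasses import dataclass


@dataclass
class LevelScores:
    """Per-level match scores, each assumed to already lie in [0, 1]."""
    general: float   # level 1: general visual descriptors matched in the alt text
    domain: float    # level 2: domain-specific descriptors matched in the alt text
    metadata: float  # level 3: image metadata / named-entity matches


def length_penalty(alt_text: str, low: int = 15, high: int = 125,
                   cutoff: int = 250) -> float:
    """Band-pass style factor in [0, 1].

    Ramps up linearly to `low`, stays at 1.0 up to `high`, then ramps down
    to 0.0 at `cutoff`. `low`, `cutoff`, and the ramp shape are hypothetical.
    """
    n = len(alt_text.strip())
    if n == 0 or n >= cutoff:
        return 0.0
    if n < low:
        return n / low
    if n <= high:
        return 1.0
    return (cutoff - n) / (cutoff - high)


def quality_score(levels: LevelScores, alt_text: str,
                  weights: tuple = (0.5, 0.25, 0.25)) -> float:
    """Weighted mean of the three levels, scaled by the length penalty."""
    w_general, w_domain, w_meta = weights
    total = w_general + w_domain + w_meta
    combined = (w_general * levels.general
                + w_domain * levels.domain
                + w_meta * levels.metadata) / total
    # Clamp defensively so the result always stays in [0, 1].
    return max(0.0, min(1.0, combined * length_penalty(alt_text)))
```

In this sketch a one-word alt text is scaled down even when every descriptor matches, and a description longer than the hypothetical cutoff scores 0 regardless of content.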

Key findings

The algorithm was optimized on a training set of 45 images and evaluated on a separate test set of 149 images collected from blogs, magazines, and news sites. Parameters were optimized by testing all combinations of values from 0 to 1 in 0.05 increments, achieving a correlation of 0.683 with expert classification on the training set. The test images were classified on a 1-4 scale by consensus of three expert evaluators, and on this set the algorithm achieved a strong correlation of 0.698 with the expert consensus. This is a substantial improvement over the initial version, which reached only a moderate correlation of 0.404 on 15 images.

For binary classification (good vs. bad alt text, using a 0.3 threshold), the algorithm achieved 83.2% accuracy, 85.3% precision, 79.5% sensitivity, 81.5% specificity, and an F-measure of 0.823. Cohen's Kappa between algorithm and expert classifications was 0.664, representing good agreement. The most significant improvements came from the addition of metadata-based relations and the description size penalty filter.

Current limitations include dependence on the quality of the underlying semantic services, restricted domain coverage (only five Clarifai domains), English-only support, no consideration of page context (so redundant alt text cannot be detected), and no mechanism to distinguish informative from decorative images.
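
The binary classification measures reported above follow from a standard confusion matrix. The Python sketch below shows one way to derive them from algorithm scores and expert labels. The function name, the input format, and the choice to treat a score at or above the threshold as "good" are assumptions for illustration; the summary gives only the 0.3 threshold, not the direction of the comparison.

```python
def binary_metrics(scores, expert_good, threshold=0.3):
    """Compare thresholded algorithm scores with expert good/bad labels.

    scores:      iterable of floats in [0, 1]
    expert_good: iterable of bools (True = expert judged the alt text good)
    """
    predicted = [s >= threshold for s in scores]  # assumed direction
    pairs = list(zip(predicted, expert_good))

    tp = sum(1 for p, e in pairs if p and e)
    tn = sum(1 for p, e in pairs if not p and not e)
    fp = sum(1 for p, e in pairs if p and not e)
    fn = sum(1 for p, e in pairs if not p and e)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f_measure = (2 * precision * sensitivity / (precision + sensitivity)
                 if precision + sensitivity else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = (((tp + fp) / n) * ((tp + fn) / n)
                + ((tn + fn) / n) * ((tn + fp) / n))
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "kappa": kappa,
    }


# Toy data, not from the paper.
example = binary_metrics(
    scores=[0.9, 0.8, 0.1, 0.2, 0.5, 0.05],
    expert_good=[True, True, False, False, False, True],
)
```

The F-measure is the harmonic mean of precision and sensitivity, so the reported value can be checked directly: 2 × 0.853 × 0.795 / (0.853 + 0.795) ≈ 0.823.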

Relevance

This research addresses one of the most persistent gaps in automated accessibility evaluation: the inability to assess the quality of alt text, not just its presence. Studies show that approximately 20% of accessibility issues with images involve alt text that exists but is inadequate or unhelpful, a problem no current automated tool can detect. As EU accessibility monitoring requirements scale up, the need for this capability becomes acute.

For accessibility practitioners, the SCREW algorithm offers a practical middle ground between fully automated testing (which can only check whether alt text exists) and fully manual evaluation (which is expensive and slow). Even with its limitations, the algorithm can serve as a triage tool, prioritizing the images most likely to have poor alt text for human review rather than requiring evaluators to check every image.

The three-level approach (visual content matching, domain-specific analysis, metadata matching) reflects the multi-faceted nature of good alt text: it should describe what is in the image, be relevant to the image's domain, and include contextual information such as locations or people when available. The size penalty captures the guideline that alt text should be concise, neither a single word nor an essay. Because the algorithm delegates recognition and similarity to external services, its performance should improve as those services improve, without requiring changes to its logic.
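
The triage use described above can be sketched in a few lines of Python: given one quality score per image, keep only the images below a cutoff and list the lowest-scoring ones first. The function name, the dictionary input, and reusing 0.3 as the cutoff are assumptions for illustration, not part of the paper.

```python
def triage(scores_by_image, threshold=0.3):
    """Return (image, score) pairs scoring below `threshold`, worst first."""
    flagged = [(image, score) for image, score in scores_by_image.items()
               if score < threshold]
    return sorted(flagged, key=lambda item: item[1])


# Hypothetical scores for three images on one page.
review_queue = triage({"hero.jpg": 0.10, "team.jpg": 0.80, "map.png": 0.25})
```

With these toy scores, the review queue holds `hero.jpg` followed by `map.png`, and `team.jpg` is left out of manual review.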

Tags: alternative text · automated accessibility testing · image accessibility · semantic analysis · computer vision · artificial intelligence · WCAG · web accessibility · accessibility evaluation · natural language processing

Standards referenced: WCAG · EU Directive 2016/2102