Increasing Web Accessibility by Automatically Judging Alternative Text Quality

Jeffrey P. Bigham · 2007 · Proceedings of the 12th International Conference on Intelligent User Interfaces (IUI 2007) · doi:10.1145/1216295.1216364

Summary

This paper presents a machine learning classifier that automatically judges whether the alternative (alt) text assigned to a web image is appropriate or inappropriate. It relies on contextual features rather than on visual analysis of the image itself. The research addresses a persistent web accessibility problem: nearly 40% of meaningful web images lack alt text entirely, and many images that do have it carry inaccurate or unhelpful text, such as filenames ("image45302") or generic labels ("1x1 clear pixel"). The classifier uses an AdaBoost algorithm with features derived from the surrounding context:

- the similarity of the alt text to the page content, measured by vector cosine;
- whether the alt text is more common on the specific site than on the broader web, estimated using Google Search;
- its similarity to text obtained through OCR and filename analysis;
- the proportion of dictionary words it contains;
- a word-based similarity score from a Naive Bayes bag-of-words model trained on known good and bad examples;
- simple properties such as word count.

The system is designed to improve three categories of tools: authoring tools that help developers verify their alt text, systems like WebInSight that generate alt text automatically, and screen readers that could suppress known-bad alt text.
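To make the feature list concrete, here is a minimal Python sketch of three of the simpler contextual features: cosine similarity between the alt text and the page text, the fraction of dictionary words, and word count. It assumes raw term-frequency vectors, a naive lowercase tokenizer, and a tiny hypothetical word list (`TINY_DICTIONARY`). The paper's exact weighting, tokenization, and dictionary are not given in this summary, and the Google Search, OCR, filename, and Naive Bayes features are omitted. All function names are illustrative, not taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a real English word list; the paper's dictionary is not specified here.
TINY_DICTIONARY = {"a", "the", "of", "and", "campus", "map", "logo", "photo", "home", "search", "button"}


def tokenize(text: str) -> list[str]:
    """Split text into lowercase alphanumeric tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the raw term-frequency vectors of two strings."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dictionary_fraction(alt: str, dictionary: set[str] = TINY_DICTIONARY) -> float:
    """Share of alt-text tokens found in the word list (0.0 for empty alt text)."""
    words = tokenize(alt)
    if not words:
        return 0.0
    return sum(w in dictionary for w in words) / len(words)


def alt_text_features(alt: str, page_text: str) -> dict[str, float]:
    """Feature vector for one image, computed only from its context (no pixels)."""
    return {
        "page_cosine": cosine_similarity(alt, page_text),
        "dictionary_fraction": dictionary_fraction(alt),
        "word_count": float(len(tokenize(alt))),
    }


if __name__ == "__main__":
    page = "Welcome to the campus map page. Use the campus map to find buildings."
    print(alt_text_features("campus map", page))
    print(alt_text_features("image45302", page))
```

In a full system, vectors like these (plus the omitted features) would be passed to a boosted classifier such as AdaBoost. Even this simplified version separates the two examples. A relevant label like "campus map" gets a positive page similarity and full dictionary coverage. A filename such as "image45302" scores zero on both.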

Key findings

The classifier achieved 86.8% accuracy on a manually labeled dataset of 250 images under 5-fold cross-validation. It was also applied as a filter to alt text produced by filename analysis, 60.4% of which was originally incorrect. The filter reduced the share of bad alt text from 60.4% to just 8.8% while suppressing only 13.6% of the good alt text. This makes filename-based labeling far more usable: with the filter in place, it correctly labels 36.2% of images. The classifier was also tested on a bootstrapped dataset of 8,350 images from 248 high-traffic websites. This test used page-level accessibility compliance as a proxy for alt text quality, assuming that pages which properly labeled all of their images had better alt text. The paper finds that similarity to the surrounding page content is a strong signal of quality, and that common known-bad values ("spacer," "1," "pad," "*") can be learned effectively. Together, these results challenge the assumption that judging alt text quality is too subjective to automate.
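The filtering result combines several rates that are easy to conflate. The sketch below computes them from per-image ground truth and classifier decisions on toy data. The denominators are an assumption here: "bad rate after filtering" is taken as bad labels kept over all labels kept. The toy numbers are not meant to reproduce the reported 8.8%, 13.6%, or 36.2%. `FilterStats` and `filter_stats` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FilterStats:
    bad_rate_before: float    # bad labels / all labels, before filtering
    bad_rate_after: float     # bad labels kept / all labels kept (assumed definition)
    good_suppressed: float    # good labels rejected / all good labels
    correctly_labeled: float  # good labels kept / all images


def filter_stats(is_good: list[bool], accepted: list[bool]) -> FilterStats:
    """Summarize how a quality filter changes a set of machine-generated alt texts.

    is_good[i]  -- ground truth: is the generated alt text for image i acceptable?
    accepted[i] -- did the classifier let that alt text through?
    """
    if not is_good or len(is_good) != len(accepted):
        raise ValueError("expected two non-empty lists of equal length")
    n = len(is_good)
    good = sum(is_good)
    kept_good = sum(g and a for g, a in zip(is_good, accepted))
    kept_bad = sum((not g) and a for g, a in zip(is_good, accepted))
    kept = kept_good + kept_bad
    return FilterStats(
        bad_rate_before=(n - good) / n,
        bad_rate_after=kept_bad / kept if kept else 0.0,
        good_suppressed=(good - kept_good) / good if good else 0.0,
        correctly_labeled=kept_good / n,
    )


if __name__ == "__main__":
    # Toy data (not from the paper): 4 good and 6 bad filename-derived labels.
    truth = [True] * 4 + [False] * 6
    decisions = [True, True, True, False] + [True] + [False] * 5
    print(filter_stats(truth, decisions))
```

On the toy data, the filter cuts the bad rate from 60% of all labels to 25% of the labels it keeps. It does so at the cost of suppressing one of the four good labels, leaving 30% of images correctly labeled. Suppressing a label is treated as leaving the image unlabeled, which mirrors the trade-off the paper reports.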

Relevance

This early work by Bigham (2007) was ahead of its time in applying machine learning to a core web accessibility problem that remains relevant today. Its central insight is foundational: alt text quality can be assessed from contextual features, without understanding the image visually. Modern automated accessibility testing tools still struggle to assess alt text quality. They typically check only whether alt text is present, not whether it is appropriate. The paper also references the landmark Target ADA lawsuit, connecting technical research to legal accessibility requirements. For practitioners, the work shows that alt text quality is not purely subjective. Pattern-based detection of common bad practices can substantially improve the experience of screen reader users; examples include filename-derived text, generic labels, and decorative images with non-empty alt. The three application categories the paper identifies (authoring tools, auto-generation systems, and screen readers) remain the primary contexts where alt text quality assessment is needed.
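A pattern-based check of the kind described above can be sketched in a few lines of Python. The known-bad list contains only the examples named in this summary. The filename regex (`FILENAME_LIKE`) and the 1x1-pixel rule for spotting spacer images are hypothetical heuristics, not the paper's classifier. An authoring tool or screen reader extension might run something like `flag_alt_text` as a cheap first pass before a learned model.

```python
import re
from typing import Optional

# Known-bad values taken from the examples in this summary; a real list would be much longer.
KNOWN_BAD = {"spacer", "1", "pad", "*", "1x1 clear pixel"}

# Hypothetical heuristic: one token with a run of digits (e.g. "image45302")
# or a bare filename with a common image extension (e.g. "logo.gif").
FILENAME_LIKE = re.compile(
    r"^(?:[a-z_-]*\d+[a-z0-9_-]*|[\w-]+\.(?:gif|jpe?g|png|bmp|svg))$",
    re.IGNORECASE,
)


def flag_alt_text(alt: str, width: Optional[int] = None, height: Optional[int] = None) -> list[str]:
    """Return reasons an alt text looks bad; an empty list means no pattern matched."""
    text = alt.strip()
    reasons = []
    if text.lower() in KNOWN_BAD:
        reasons.append("known-bad label")
    if FILENAME_LIKE.match(text):
        reasons.append("filename-like")
    # Hypothetical rule: a 1x1 (or smaller) image is almost certainly a spacer,
    # so it should have empty alt text rather than a description.
    if text and width is not None and height is not None and width <= 1 and height <= 1:
        reasons.append("non-empty alt on likely spacer image")
    return reasons


if __name__ == "__main__":
    for alt in ["image45302", "spacer", "logo.gif", "Campus map"]:
        print(repr(alt), flag_alt_text(alt))
    print(flag_alt_text("clear pixel", width=1, height=1))
```

Hand-written rules like these are brittle. For example, a product name written as a single token such as "iPhone15" would trip the filename rule. That brittleness is one reason to prefer a classifier that learns such patterns from labeled examples, as the paper does.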

Tags: alternative text · web accessibility · machine learning · screen readers · automated testing · image accessibility · blind and low vision

Standards referenced: ADA