Prediction of Web Page Accessibility Based on Structural and Textual Features

Sina Bahram, Debadeep Sen, Robert St. Amant · 2011 · Proceedings of the International Cross-Disciplinary Conference on Web Accessibility (W4A) · doi:10.1145/1969289.1969329

Summary

This paper from NC State University explores whether machine learning classifiers can predict the accessibility of web pages from structural and textual features of their DOM, that is, features that are independent of explicit accessibility markup such as alt text or ARIA attributes. The core research question is whether inherent properties of web page structure and content influence usability for people with vision impairment, beyond what accessibility guidelines directly address. The researchers compiled two datasets: 52 academic web pages (the top and bottom 26 from a Chronicle of Higher Education accessibility ranking) and 52 non-academic pages (from WebAIM surveys and blind user mailing lists), each labelled "Accessible" or "Inaccessible" based on user judgements. A specialised browser walked each page's DOM to extract 26 structural and textual features, which were then used to train and evaluate three classifiers in Weka: a J48 decision tree, a Bayesian network (BayesNet), and a Support Vector Machine (SVM), using stratified 10-fold cross-validation.
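As a rough illustration of what walking a DOM to collect structural and textual features can look like, the sketch below uses Python's standard-library html.parser. This summary does not list the paper's 26 features, so the six counts computed here (element_count, max_depth, link_count, table_count, word_count, words_per_element) are hypothetical stand-ins, and the names FeatureWalker and extract_features are invented for this example. It is not the authors' browser-based extractor.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not deepen the tree.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class FeatureWalker(HTMLParser):
    """Collects a few illustrative (hypothetical) structural/textual counts."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # current nesting depth of non-void elements
        self.max_depth = 0      # deepest nesting seen so far
        self.element_count = 0
        self.link_count = 0
        self.table_count = 0
        self.word_count = 0

    def handle_starttag(self, tag, attrs):
        self.element_count += 1
        if tag == "a":
            self.link_count += 1
        elif tag == "table":
            self.table_count += 1
        if tag not in VOID_TAGS:
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Whitespace-separated tokens in text nodes count as words.
        self.word_count += len(data.split())


def extract_features(html):
    """Return a feature dictionary for one page's HTML source."""
    walker = FeatureWalker()
    walker.feed(html)
    walker.close()
    elements = walker.element_count
    return {
        "element_count": elements,
        "max_depth": walker.max_depth,
        "link_count": walker.link_count,
        "table_count": walker.table_count,
        "word_count": walker.word_count,
        "words_per_element": walker.word_count / elements if elements else 0.0,
    }


sample = ("<html><body><h1>Course list</h1>"
          "<p>See the <a href='/x'>full schedule</a></p><br></body></html>")
features = extract_features(sample)
print(features)
```

Note that none of these counts looks at alt text or ARIA attributes, which matches the paper's premise of using features independent of explicit accessibility markup. The sketch parses static HTML only, and it counts text inside script and style elements as words; a real extractor working on a rendered DOM would need to handle both.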

Key findings

The Bayesian network classifier performed best overall, achieving 80% accuracy on academic pages, 48.1% on non-academic pages, and 71.2% on the combined dataset (F-measure 0.711). The academic and combined results are well above the 50% random baseline, whereas the non-academic result falls slightly below it. On the combined dataset, the decision tree achieved 64.4% and the SVM 60.6%. Performance improved on the combined dataset partly because more training data was available (104 vs 52 instances): when the combined dataset was downsampled to 52, the improvement disappeared. Surprisingly, the established time-to-reach metric from IBM's aDesigner tool performed poorly as a standalone accessibility predictor: logistic regressions using mean, median, and maximum time-to-reach values produced chi-squared values close to zero for the non-academic and combined datasets. The one exception was the academic dataset, where median time-to-reach (split at 76 seconds) achieved a 61.5% true positive rate. The Bayesian network learned a flat structure that used all 26 features to predict accessibility, which gives little insight into which specific features drive the prediction. The authors note an important open question: whether the identified correlations are causal, that is, whether changing a page's features so that it is reclassified as "accessible" would actually improve its real-world accessibility.
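The downsampling check mentioned above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the summary does not say how the 52 instances were drawn, so the class-balanced random draw, the function name downsample_balanced, and the fixed seed are all assumptions, and the 104 placeholder instances are synthetic page IDs rather than the paper's pages.

```python
import random
from collections import Counter, defaultdict


def downsample_balanced(instances, target_size, seed=0):
    """Randomly keep target_size (item, label) pairs, equally split by label.

    Assumption: a class-balanced draw; the source does not specify the
    sampling scheme that was actually used.
    """
    by_label = defaultdict(list)
    for instance in instances:
        by_label[instance[1]].append(instance)
    if not by_label:
        raise ValueError("no instances to sample from")

    per_label, remainder = divmod(target_size, len(by_label))
    if remainder:
        raise ValueError("target_size must divide evenly among the labels")

    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    subset = []
    for label in sorted(by_label):
        group = by_label[label]
        if len(group) < per_label:
            raise ValueError(f"not enough instances labelled {label!r}")
        subset.extend(rng.sample(group, per_label))  # without replacement
    rng.shuffle(subset)
    return subset


# Placeholder data: 104 (page_id, label) pairs standing in for the
# combined dataset; these are NOT the pages used in the paper.
combined = [(i, "Accessible" if i % 2 == 0 else "Inaccessible")
            for i in range(104)]
subset = downsample_balanced(combined, target_size=52)
print(Counter(label for _, label in subset))
```

Training on such a 52-instance subset and comparing against the full 104-instance result is one way to separate the effect of training-set size from the effect of mixing the two page types.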

Relevance

This early exploration of applying machine learning to accessibility prediction anticipated a direction that has become increasingly important with modern AI capabilities. For accessibility practitioners, the key insight is that web page accessibility may be influenced by structural properties beyond what explicit guidelines check, which suggests that well-structured, well-organised pages may be inherently more accessible even before specific WCAG criteria are considered. The finding that IBM's time-to-reach metric was a poor standalone predictor of user-judged accessibility challenges the assumption that any single metric can capture the complexity of real-world accessibility experience. The paper's honest acknowledgement of its limitations (small datasets, unclear causality, modest accuracy) is refreshing and sets a proper foundation for the field. With modern large language models and vastly more training data, predicting accessibility from page structure might now achieve higher accuracy and give developers actionable insight into which structural patterns correlate with better accessibility outcomes.

Tags: machine learning · accessibility evaluation · automated testing · visual impairment · web accessibility · accessibility metrics · artificial intelligence

Standards referenced: Section 508 · WCAG