Hunting for Headings: Sighted Labeling vs. Automatic Classification of Headings

Jeremy T. Brudvik, Jeffrey P. Bigham, Anna C. Cavender, Richard E. Ladner · 2008 · Assets '08: Proceedings of the 10th International ACM SIGACCESS Conference on Computers and Accessibility · doi:10.1145/1414471.1414508

Summary

This paper investigates missing heading markup in web pages, a significant accessibility barrier for blind screen reader users, who rely on heading navigation (the h1 through h6 tags) to skip through page content rather than read it linearly. Headings are among the most powerful navigation aids available to screen reader users, yet many web developers create headings through visual styling alone (large font, bold text) without applying HTML heading tags, which leaves the page structure invisible to assistive technology.

The paper makes three contributions. First, a user study asked 10 sighted participants to manually label headings on 10 popular websites (Google, Yahoo, MySpace, YouTube, eBay, Wikipedia, Craigslist, Blogger, Photobucket, IMDB) and found surprising disagreement: even informed labelers following W3C-derived heading definitions did not consistently agree on what constitutes a heading. Fleiss's kappa across all pages was only 0.58, with individual pages ranging from 0.32 (Blogger) to 0.80 (Craigslist). Participants were also internally inconsistent, sometimes labeling visually similar elements as headings on one page but not on another. Second, the authors built HeadingHunter, a machine learning classifier that automatically identifies headings from visual features of rendered text. It uses a J48 decision tree trained on four base features (font size, boldness, text length, and width-to-height ratio), each computed as a z-score relative to surrounding text in both "centered" and "forward" windows of varying sizes. Third, the paper presents a general methodology for turning visual-feature-based classifiers into JavaScript that can be deployed via Greasemonkey or bookmarklets to insert headings automatically into any web page.
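The window-relative z-scores behind HeadingHunter's features can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the use of a population standard deviation, the choice to include the node itself in both windows, and the rule that a window with no spread yields 0.0 are all assumptions made for illustration. The summary above does not specify these details.

```python
from statistics import mean, pstdev


def z_score(value, window):
    """Standard score of `value` relative to `window`.

    Assumption: a window with fewer than two values, or with zero
    spread, yields 0.0 instead of raising a division error.
    """
    if len(window) < 2:
        return 0.0
    sigma = pstdev(window)
    if sigma == 0:
        return 0.0
    return (value - mean(window)) / sigma


def window_z_scores(values, index, size):
    """Return (centered, forward) z-scores for values[index].

    `values` holds one feature (for example font size) per text node,
    in document order. Hypothetical window definitions:
      centered: the node plus up to `size` nodes on each side
      forward:  the node plus up to `size` nodes that follow it
    """
    lo = max(0, index - size)
    hi = min(len(values), index + size + 1)
    centered = values[lo:hi]
    forward = values[index:hi]
    return z_score(values[index], centered), z_score(values[index], forward)


# A 24px node surrounded by 12px body text stands out in both windows.
font_sizes = [12, 12, 24, 12, 12]
centered, forward = window_z_scores(font_sizes, index=2, size=2)
print(round(centered, 3), round(forward, 3))
```

Normalizing against neighboring text rather than using absolute values is what lets one classifier work across pages with different base font sizes: a 14px bold line is prominent on a page of 10px text but not on a page of 16px text.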

Key findings

The sighted labeling study showed that heading identification is inherently subjective and difficult, even for informed sighted users. Across more than 2,900 text nodes rated on 10 pages, participants had low false-positive rates (rarely labeling non-headings as headings) but high false-negative rates (frequently missing true headings), and combining multiple labelers produced more complete results than any individual. HeadingHunter, trained on 34 pages from top-100 websites (set A), achieved 0.92 precision and 0.81 recall on a separate set of 36 accessibility-focused pages (set B). When evaluated against the sighted labelers' combined judgments at a threshold of 5 (headings labeled by at least 5 of 10 participants), HeadingHunter achieved an f-measure of 0.65, compared to 0.75 for the average participant. The participant scores were biased upward, however, because each participant's own labels were part of the test set. Performance varied substantially by page: Wikipedia and Google achieved f-measures of 0.81 and 0.86 respectively, while Yahoo scored only 0.36 because of low recall (0.24), indicating that HeadingHunter struggled with the diverse heading styles on that page. Importantly, recall correlated significantly with heading "importance" (the level of agreement among participants): R = 0.828, p = 0.003. HeadingHunter was therefore best at identifying the headings that the most people agreed upon, which are precisely the most useful headings for navigation.
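The evaluation described above (a consensus set built from participant votes at a threshold, then precision, recall, and f-measure for a classifier's predictions) can be sketched in a few lines of Python. The node identifiers, vote counts, and predictions below are invented for illustration and are not data from the paper. The function names are hypothetical.

```python
def consensus_headings(votes, threshold):
    """Node ids labeled as headings by at least `threshold` participants.

    `votes` maps a node id to the number of participants who labeled it
    as a heading.
    """
    return {node for node, count in votes.items() if count >= threshold}


def precision_recall_f(predicted, actual):
    """Precision, recall, and f-measure (their harmonic mean) for two sets."""
    true_pos = len(predicted & actual)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Invented example: 10 participants voted on five text nodes.
votes = {"a": 10, "b": 7, "c": 5, "d": 4, "e": 0}
truth = consensus_headings(votes, threshold=5)   # nodes a, b, c
predicted = {"a", "b", "e"}                      # hypothetical classifier output
print(precision_recall_f(predicted, truth))
```

Applying the same harmonic-mean formula to the set B figures quoted above (precision 0.92, recall 0.81) gives an f-measure of roughly 0.86. That value is derived here by arithmetic and is not a number stated in this summary.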

Relevance

This paper addresses a web accessibility problem that remains widespread today: the gap between visual heading presentation and semantic HTML markup. Despite WCAG guidelines and improved developer awareness, many websites still rely on styled div and span elements rather than a proper heading hierarchy, leaving screen reader users without efficient page navigation. For accessibility practitioners, HeadingHunter demonstrates that machine learning can bridge this gap by inferring semantic structure from visual presentation, an approach that has become increasingly relevant with the rise of automated accessibility testing and repair tools. The finding that even informed sighted people disagree on what constitutes a heading has direct implications for accessibility standards and guidelines: if the concept of a heading is inherently ambiguous, it is unreasonable to expect developers to implement headings consistently. Automated approaches and better tooling, rather than stricter guidelines alone, may therefore be necessary to achieve reliable heading markup. The general methodology of training classifiers on rendered visual features and deploying them as client-side JavaScript is a template that could be applied to other accessibility problems the authors identify: matching form elements with their visual labels, identifying table row and column headers, and detecting logical page sections.

Tags: web accessibility · headings · screen readers · machine learning · semantic HTML · blind users · automated accessibility repair

Standards referenced: WCAG