Grouping Hyperlinks for Improved Voice/Mobile Accessibility

Alex Penev, Raymond K. Wong · 2008 · Proceedings of the 2008 International Cross-Disciplinary Conference on Web Accessibility (W4A) · doi:10.1145/1368044.1368055

Summary

This paper from UNSW Sydney proposes an automatic technique for clustering a web page's hyperlinks into topical groups, so that blind screen reader users and mobile device users can locate a desired link more quickly. The core problem is that sighted users can scan a page visually, in parallel, and skip irrelevant content at a glance, whereas blind users must work through links serially. That is a significant disadvantage on link-heavy pages: the authors note that while the average page has 10 links, the index pages of major sites have far more (Google 30, Microsoft Research 70, Yahoo 170, Digg 250). The approach works in three stages. First, each link is described by keywords drawn from its anchor text, the structure of its destination URL, and the content of the linked-to page (its top 30 TF-IDF terms). Second, a locality propagation step spreads keyword influence between links that are near each other in the DOM tree, using an inverse-square decay over DOM distance, which is computed from Dewey IDs. This exploits the tendency of web authors to place related links close together. Third, the enriched link descriptions are clustered using k-means with cosine similarity, producing topical groups that users can browse hierarchically rather than linearly. The technique has two applications: automatic, on-demand client-side clustering for blind users, and supervised server-side clustering for building simplified mobile portal pages.
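
The locality propagation stage can be illustrated with a minimal sketch. The summary does not give the paper's exact formula, so the following rests on assumptions: a Dewey ID is modelled as a tuple of child positions from the root, DOM distance is taken to be the number of tree edges between two nodes through their lowest common ancestor, and a neighbor's keyword weight is scaled by 1/d² before being added. The function names dewey_distance and propagate are hypothetical, not the authors'.

```python
from collections import defaultdict


def dewey_distance(a, b):
    """Tree-edge distance between two nodes given as Dewey ID tuples.

    Assumption: distance = steps up from `a` to the lowest common
    ancestor plus steps down to `b`.
    """
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


def propagate(links):
    """Spread keyword weights between links with inverse-square decay.

    `links` maps a Dewey ID tuple to a {keyword: weight} dict.
    Returns a new mapping; the input is not modified.
    """
    enriched = {}
    for target, own_keywords in links.items():
        scores = defaultdict(float, own_keywords)
        for source, keywords in links.items():
            if source == target:
                continue
            d = dewey_distance(source, target)  # >= 1 for distinct IDs
            decay = 1.0 / (d * d)
            for keyword, weight in keywords.items():
                scores[keyword] += weight * decay
        enriched[target] = dict(scores)
    return enriched


# Illustrative data only: two sibling links and one in another subtree.
links = {
    (1, 1, 1): {"pension": 1.0},
    (1, 1, 2): {"carer": 1.0},
    (1, 2, 1): {"login": 1.0},
}
enriched = propagate(links)
print(enriched[(1, 1, 1)])
```

In this toy example the sibling link is two edges away and contributes a quarter of its keyword weight, while the link in the other subtree is four edges away and contributes one sixteenth, so nearby links end up with more similar descriptions than distant ones.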

Key findings

Testing on the "About Payments" section of Australia's Centrelink government services website (111 links), the algorithm produced 12 clusters that grouped links into coherent topics, including agency information, website meta-links, welfare allowances, carers, raising children, family, study and training, health problems, farm help, and online security. The Dewey ID analysis showed that links from structurally distant parts of the DOM were grouped together on the basis of semantic similarity, demonstrating that the clustering went beyond simple structural proximity. Locality propagation was effective at grouping links that shared few words with each other but were contextually related through their DOM neighbors. Processing took approximately 3.2 seconds for 100 links on a standard PC (0.6s parsing, 0.48s locality propagation, 2.1s for 20 k-means runs). The authors acknowledged limitations: k-means requires a predetermined cluster count (set heuristically to min(ceil(links/10), 12)) and produces non-deterministic results, so multiple runs are needed. In addition, Cluster 10 in the experiment mixed three distinct topics, suggesting that the cluster count was too low for that content.
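
The cluster-count heuristic and the repeated k-means runs can be sketched as follows. Only the formula min(ceil(links/10), 12), the use of cosine similarity, and the figure of 20 runs come from the summary. The rest is assumed for illustration: sparse keyword vectors stored as dicts, random initial centroids, and selection of the run with the highest total similarity to its centroids. All function names are hypothetical.

```python
import math
import random


def cluster_count(num_links, per_cluster=10, cap=12):
    """Heuristic from the summary: min(ceil(links/10), 12)."""
    return min(math.ceil(num_links / per_cluster), cap)


def cosine(u, v):
    """Cosine similarity of two sparse {keyword: weight} vectors."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = {}
    for vec in vectors:
        for k, w in vec.items():
            total[k] = total.get(k, 0.0) + w
    return {k: w / len(vectors) for k, w in total.items()}


def kmeans_once(vectors, k, rng, iterations=20):
    """One k-means run; returns (quality, assignment)."""
    centroids = rng.sample(vectors, k)  # assumed initialisation
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            max(range(k), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean_vector(members)
    quality = sum(cosine(v, centroids[a]) for v, a in zip(vectors, assignment))
    return quality, assignment


def best_of_runs(vectors, k, runs=20, seed=0):
    """Repeat k-means and keep the assignment with the highest quality."""
    rng = random.Random(seed)
    results = (kmeans_once(vectors, k, rng) for _ in range(runs))
    return max(results, key=lambda r: r[0])[1]


# Illustrative data only.
vectors = [
    {"pension": 1.0},
    {"pension": 1.0, "age": 0.5},
    {"login": 1.0},
    {"login": 1.0, "password": 0.5},
]
print(cluster_count(111))  # the 111-link test page
print(best_of_runs(vectors, k=2))
```

For the 111-link test page the heuristic gives ceil(11.1) = 12, which equals the cap and matches the 12 clusters reported. Because any longer page is also held at 12, the cap is one plausible reason a cluster could end up mixing several topics.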

Relevance

This paper addresses a fundamental asymmetry in web navigation that persists today: sighted users process page content in parallel while screen reader users navigate serially. The link clustering concept anticipated features that modern screen readers and browser extensions now provide, such as landmark navigation and heading-based content skipping. For accessibility practitioners, the key insight is that the DOM tree structure itself encodes meaningful semantic relationships that can be exploited to improve non-visual navigation: links placed near each other in the source tend to be topically related. The dual application to both screen reader and mobile accessibility also foreshadowed the convergence of accessibility and responsive design concerns. While the specific k-means implementation is dated, the underlying principle of automatically reorganizing content to reduce the burden of serial navigation remains relevant as pages grow increasingly complex. The work also highlights the value of government service websites as accessibility research targets, given their high traffic and diverse user populations.

Tags: web accessibility · screen readers · blind users · navigation · mobile accessibility · clustering · information retrieval · algorithms