Thematic Organization of Web Content for Distraction-Free Text-to-Speech Narration
Muhammad Asiful Islam, Faisal Ahmed, Yevgen Borodin, I.V. Ramakrishnan · 2012 · Proceedings of the 14th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2012) · doi:10.1145/2384916.2384920
Summary
This paper addresses a fundamental problem in how blind users experience web content through screen readers: the serial, linear narration of complex web pages in which multiple thematic elements (news stories, ads, navigation taxonomies, and opinion pieces) are interspersed. When a text-to-speech engine reads these mixed segments sequentially, users must mentally separate relevant content from extraneous material, which creates significant cognitive stress. The authors propose a technique called ThemeSeg that segments web pages into coherent thematic pieces by combining visual features (position, size, colour), structural features (DOM hierarchy, element types), and linguistic features (word content, semantic similarity) of web page elements. The key innovation is the tight coupling of all three feature types; prior segmentation methods relied primarily on visual and structural cues. The technique adapts Latent Semantic Analysis (LSA), traditionally used to compute linguistic similarity between text documents, by extending its feature set to include visual and structural attributes. This extended LSA computes a composite similarity score between web elements, which then drives a bottom-up agglomerative clustering algorithm that groups elements into thematic segments. The authors note that the clutter-free main content produced by tools like Readability and Safari's Reader mode emerges as a special case of their more general thematic segmentation approach.
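The pipeline described in this summary (composite features per element, pairwise similarity, bottom-up merging) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it omits the SVD-based dimensionality reduction that LSA performs and uses plain cosine similarity over the combined feature counts instead. The sample elements, the attribute names such as `col:main`, the average-linkage rule, and the 0.2 threshold are all hypothetical choices made for the example.

```python
import math
from collections import Counter

# Hypothetical page elements: text plus made-up visual and structural attributes.
ELEMENTS = [
    {"text": "senate passes budget bill", "attrs": ["col:main", "tag:h2", "font:large"]},
    {"text": "budget bill vote details senate", "attrs": ["col:main", "tag:p", "font:normal"]},
    {"text": "buy cheap flights today", "attrs": ["col:right", "tag:a", "font:small"]},
    {"text": "cheap hotel deals today", "attrs": ["col:right", "tag:a", "font:small"]},
]


def feature_vector(element):
    """Combine words, visual cues, and structural cues into one count vector."""
    counts = Counter("word:" + w for w in element["text"].split())
    counts.update(element["attrs"])
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(elements, threshold=0.2):
    """Bottom-up agglomerative clustering with average linkage.

    Starts with one cluster per element and repeatedly merges the two
    most similar clusters until no pair reaches `threshold`
    (an assumed, untuned cut-off).
    """
    vectors = [feature_vector(e) for e in elements]
    clusters = [[i] for i in range(len(elements))]

    def linkage(c1, c2):
        sims = [cosine(vectors[i], vectors[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = linkage(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


segments = cluster(ELEMENTS)
print(segments)
```

On these four toy elements the two news items merge into one segment and the two ads into another, because each pair shares both words and layout attributes while the two groups share nothing. A version closer to the paper would factor the element-by-attribute matrix with a truncated SVD before comparing elements.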
Key findings
Evaluation on a dataset of 600 web pages from diverse domains (11,763 manually labelled segments) showed that combining linguistic, visual, and structural features achieved 90% recall, 86% precision, and 88% F-measure, a statistically significant improvement of roughly 10 percentage points over using visual and structural features alone (81% recall, 76% precision, 79% F-measure). A user study with 23 blind participants compared three systems: a standard screen reader (HS), VoiceOver-style segmentation (VO), and the thematic segmentation system (SS). On the first task, participants finished 1.56 times faster with SS than with VO and 2.53 times faster than with HS. On the second task, which involved finding related content, SS was 2.03 times faster than VO and 2.56 times faster than HS. Participants also needed significantly fewer keystrokes with SS. Likert-scale feedback showed participants agreed that segmentation made browsing more efficient (4.29/5) and that SS was more useful than both VO (4.21/5) and HS (3.54/5). The algorithm runs in real time and requires no training, processing an average of 376 elements and 1,062 attributes per web page.
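As a quick check on how the reported scores relate, the sketch below recomputes the F-measure from the rounded precision and recall values, assuming the paper uses the standard balanced F1 score (the harmonic mean of precision and recall). That assumption and the helper name `f_measure` are mine, not taken from the paper.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported in the summary above.
combined = f_measure(precision=0.86, recall=0.90)
baseline = f_measure(precision=0.76, recall=0.81)

print(round(combined, 2))  # 0.88
print(round(baseline, 2))  # 0.78
```

The combined-feature result reproduces the reported 88%. The baseline comes out at about 78% rather than the reported 79%, a gap that could be explained by the original figure having been computed from unrounded values or averaged per page; that explanation is a guess.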
Relevance
This research highlights a critical but often overlooked aspect of web accessibility: the cognitive burden placed on screen reader users by poorly organized content. While much accessibility work focuses on ensuring content is technically accessible (proper markup, alt text, ARIA labels), this paper demonstrates that how content is organized and presented to assistive technology users matters enormously for comprehension and efficiency. The thematic segmentation approach offers a model for how assistive technologies could go beyond reading DOM order to provide more intelligent content navigation. For web developers, the findings reinforce the importance of clear content structure and semantic markup. The paper also shows that tools like reader modes, while useful, are a limited special case of what more sophisticated content organization could achieve. The work predates widespread adoption of ARIA landmarks and HTML5 sectioning elements, which partially address similar goals through manual authoring rather than automatic detection.
Tags: screen readers · web segmentation · text-to-speech · blind users · cognitive load · web accessibility · clustering · information retrieval · content organization