Mathematical Content Browsing for Print-disabled Readers Based on Virtual-world Exploration and Audio-visual Sensory Substitution
Rynhardt Kruger, Febe De Wet, Thomas Niesler · 2023 · ACM Transactions on Accessible Computing · doi:10.1145/3584365
Summary
This research addresses a critical accessibility gap: mathematical equations in PDF documents are largely inaccessible to blind and visually impaired readers. While standards like MathML exist for encoding accessible mathematics, most scientific and technical papers are published as untagged PDFs where equations appear as graphics without semantic data. When a screen reader encounters an equation like y = x/2 + x², it reads the characters in linear order across multiple lines, so the listener cannot distinguish fractions from exponents or tell which terms belong to a numerator and which to a denominator. The researchers developed a novel browsing approach that combines two techniques: virtual-world navigation inspired by text adventure games, and audio-visual sensory substitution using a modified vOICe algorithm. The equation is represented as a document object model (DOM) in which textual elements become navigable nodes connected by spatial relationships (left, right, up, down, and diagonal variants). Users can explore in either text mode (issuing commands like "look," "right," or "play") or graphical mode (using cursor keys with automatic sonification). The vOICe algorithm (named for "Oh, I see") converts visual information to sound by scanning images from left to right and generating tone chords in which pitch corresponds to vertical position: higher pixels produce higher-pitched tones. This allows graphical elements like fraction lines, square root extents, and brackets to be "heard" as distinctive sound patterns. The system was implemented as a web application using Rust and JavaScript, with equations extracted from PDFs using the Poppler library.
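The left-to-right scan with pitch tied to vertical position can be illustrated with a minimal Python sketch. It is not the authors' implementation (their system uses Rust and JavaScript and a modified vOICe algorithm whose details are not given in this summary). The function names, the binary-image input, the 500 to 5000 Hz range, the exponential frequency spacing, and the sample rate are all assumptions chosen for illustration.

```python
import math


def row_frequency(row, n_rows, f_low, f_high):
    """Map a pixel row to a tone frequency: the top row (row 0) gets f_high.

    Exponential spacing between f_low and f_high is an assumption made here.
    """
    if n_rows == 1:
        return f_low
    frac = (n_rows - 1 - row) / (n_rows - 1)  # 1.0 at the top, 0.0 at the bottom
    return f_low * (f_high / f_low) ** frac


def sonify_column_scan(image, duration_s=1.0, sample_rate=8000,
                       f_low=500.0, f_high=5000.0):
    """Return mono samples for a left-to-right scan of a binary image.

    image: list of rows (top row first), each a list of 0/1 pixels.
    Each column becomes a short chord made of one sine tone per set pixel.
    """
    n_rows = len(image)
    n_cols = len(image[0])
    samples_per_col = int(duration_s * sample_rate / n_cols)
    freqs = [row_frequency(r, n_rows, f_low, f_high) for r in range(n_rows)]

    samples = []
    for col in range(n_cols):
        # Frequencies of all set pixels in this column form the chord.
        active = [freqs[r] for r in range(n_rows) if image[r][col]]
        for i in range(samples_per_col):
            t = (col * samples_per_col + i) / sample_rate
            value = sum(math.sin(2 * math.pi * f * t) for f in active)
            samples.append(value / n_rows)  # keep the amplitude within [-1, 1]
    return samples


# A horizontal bar with a gap, loosely resembling a broken fraction line.
bar = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
audio = sonify_column_scan(bar, duration_s=0.5)
```

In this sketch a fraction line would sound as a single steady tone for the duration of its horizontal extent, while a tall bracket would sound as a brief chord spanning many pitches, which is the kind of distinction the summary describes.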
Key findings
The system was evaluated with 25 participants (11 blind, 14 sighted) tasked with identifying 12 equations extracted from PDF documents. Overall, 78% of equations were identified completely correctly: 74% for blind participants and 83% for sighted participants. When partial correctness was considered (accounting for minor omissions or placement errors), accuracy rose to 95.4% overall (93.3% blind, 97.6% sighted). The difference between blind and sighted performance was not statistically significant (p < 0.1). Performance improved significantly from Stage 1 (text mode only) to Stage 2 (combined text and graphical modes), with p < 0.02. Most blind participants preferred graphical mode because it allowed faster navigation with single-key commands rather than typed text commands. Eight of the 11 blind participants reported that the browser gave them a clearer understanding of the spatial layout of mathematical equations. One participant noted that they had never previously understood the geometric shape of the square root symbol. The most common errors were missing textual elements (53% of blind participant errors in Stage 1) and incorrect bracket placement (32% of errors). Matrices proved particularly challenging because their two-dimensional structure is difficult to represent in linear braille formats. Notably, two blind participants achieved perfect scores on all equations, demonstrating that the approach can be fully effective when used non-visually.
Relevance
This research has immediate practical implications for STEM accessibility. The vast majority of scientific literature—textbooks, journal articles, technical reports—exists in PDF format without MathML or other accessible markup. Current screen readers simply cannot interpret this content, creating a significant barrier for blind students and professionals in mathematics, physics, engineering, and other technical fields. The approach offers a workaround that does not require publishers to retrofit existing documents with semantic markup. By extracting visual layout information directly from PDFs and presenting it through navigable audio interfaces, the system makes currently inaccessible content readable. The finding that training takes less than an hour suggests the technique is learnable without extensive investment. For accessibility practitioners and tool developers, the combination of text-adventure-style navigation with sonification offers a model for making other two-dimensional content accessible—the researchers note the approach could extend to graphs, charts, and infographics. The preference for graphical mode with automatic sonification over verbose text descriptions suggests that non-visual spatial exploration can be more efficient than purely linear descriptions, challenging assumptions that accessibility always requires complete translation to sequential text.
Tags: mathematical accessibility · sensory substitution · PDF accessibility · blind users · sonification · screen readers · STEM accessibility · audio feedback
Standards referenced: MathML · PDF/UA