Comprehensive Accessibility of Equations by Visually Impaired

Akashdeep Bansal · 2019 · Proceedings of the 16th International Web for All Conference (W4A) · doi:10.1145/3315002.3332431

Summary

This doctoral consortium paper proposes an approach to improving the audio rendering of mathematical equations for people with visual impairments by introducing a complexity metric that adapts how equations are spoken based on their structural complexity and individual user characteristics. The fundamental challenge is that mathematical equations are inherently non-linear, using 2D constructs like superscripts, subscripts, fractions, radicals, and parentheses, while audio is linear, creating unavoidable tensions between ambiguity, verbosity, and cognitive load when equations are read aloud. The paper illustrates this vividly by showing the same equation (Faà di Bruno's formula for the nth derivative of a composite function) in both its LaTeX source code and its rendered visual form: the visual representation immediately reveals high-level structure (nth derivative equals a double summation over a fraction) while the linear LaTeX is nearly incomprehensible.

The paper surveys six types of audio cues proposed in the literature for spoken mathematics and compares their trade-offs across verbosity, ambiguity, naturalness, and cognitive load: lexical (explicit verbal markers like "start fraction"), prosodic (pitch and pause changes), earcons (musical sounds), spearcons (sped-up speech fragments), audio spatialization (3D sound positioning), and auditory cues (non-speech sounds). No single cue type excels across all dimensions: lexical cues eliminate ambiguity but are verbose; prosodic cues are natural but ambiguous; earcons and spearcons reduce verbosity but increase cognitive load.
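
The trade-off between verbosity and ambiguity can be made concrete with a small sketch. The code below is not from the paper: the tuple-based expression tree, the `speak` function, and the exact cue wording are assumptions chosen for illustration. It renders two structurally different expressions with and without lexical cues.

```python
# Hypothetical expression tree: a leaf is a string, an inner node is a
# tuple (operator, operand, ...). Not the paper's representation.
def speak(node, lexical=True):
    """Flatten an expression tree into a linear string of spoken words."""
    if isinstance(node, str):
        return node
    op, *args = node
    parts = [speak(arg, lexical) for arg in args]
    if op == "add":
        return " plus ".join(parts)
    if op == "frac":
        if lexical:
            # Lexical cues mark where the 2D construct starts and ends.
            return f"start fraction {parts[0]} over {parts[1]} end fraction"
        return f"{parts[0]} over {parts[1]}"
    raise ValueError(f"unknown operator: {op}")

grouped = ("frac", ("add", "a", "b"), "c")    # (a + b) / c
ungrouped = ("add", "a", ("frac", "b", "c"))  # a + (b / c)

print(speak(grouped, lexical=False))    # a plus b over c
print(speak(ungrouped, lexical=False))  # a plus b over c
print(speak(grouped))    # start fraction a plus b over c end fraction
print(speak(ungrouped))  # a plus start fraction b over c end fraction
```

Without cues, both expressions collapse to the same word sequence, which is the ambiguity the summary describes. The lexical version separates them at the cost of four extra words per fraction.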

Key findings

The proposed research plan has four components. First, variable substitution: using the complexity metric to identify when an equation is too complex for direct audio rendering and automatically partitioning it into named sub-expressions that are read separately, reducing cognitive load while maintaining comprehension. Second, cue adaptation by complexity level: selecting the most effective type of audio cue based on the equation's complexity rather than using a one-size-fits-all approach, potentially allowing simpler equations to use natural prosodic cues while complex ones use more explicit lexical markers. Third, personalization based on user characteristics: adjusting the complexity threshold and cue selection based on the user's age, educational background, cognitive and listening abilities, and familiarity with the mathematical content — recognizing that what is "complex" differs between a graduate student in mathematics and a high school student. Fourth, semantic rendering: using contextual information to disambiguate notation (e.g., A^T as "A transpose" vs. "A to the power T", or f(x+y) as "f of x plus y" vs. "f times x plus y"), dramatically reducing verbosity compared to structural renderings like "A superscript start T superscript end." The research builds on T.V. Raman's weight metric for equation complexity, proposed in the 1990s but never empirically validated.
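
The first three components can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's method: the node-count `weight` function is a simple stand-in rather than Raman's actual metric, and `Profile`, `partition`, `choose_cue`, the `u1`-style variable names, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Profile:
    # Hypothetical per-user threshold; a real system would have to derive
    # it from age, background, listening ability, and so on.
    max_weight: int

def weight(node):
    """Stand-in complexity metric: the number of nodes in the tree."""
    if isinstance(node, str):
        return 1
    return 1 + sum(weight(child) for child in node[1:])

def partition(node, limit, bindings):
    """Variable substitution: replace every non-leaf child heavier than
    `limit` with a fresh name and record the replaced part in `bindings`."""
    if isinstance(node, str):
        return node
    op, *children = node
    new_children = []
    for child in children:
        child = partition(child, limit, bindings)  # simplify bottom-up
        if not isinstance(child, str) and weight(child) > limit:
            name = f"u{len(bindings) + 1}"
            bindings[name] = child
            child = name
        new_children.append(child)
    return (op, *new_children)

def choose_cue(node, profile):
    """Cue adaptation: prosodic cues for expressions at or under the
    user's threshold, explicit lexical cues for anything heavier."""
    return "prosodic" if weight(node) <= profile.max_weight else "lexical"

# (x^2 + y^2) / (a + b) as a tuple tree: (operator, operand, ...)
expr = ("frac",
        ("add", ("sup", "x", "2"), ("sup", "y", "2")),
        ("add", "a", "b"))

bindings = {}
simplified = partition(expr, limit=3, bindings=bindings)
print(weight(expr))   # 11
print(simplified)     # ('frac', 'u1', ('add', 'a', 'b'))
print(bindings)       # {'u1': ('add', ('sup', 'x', '2'), ('sup', 'y', '2'))}
print(choose_cue(expr, Profile(max_weight=5)))   # lexical
print(choose_cue(expr, Profile(max_weight=20)))  # prosodic
```

A listener would hear the short top-level expression first ("u1 over a plus b") and then the definition of `u1` separately. Personalization enters only through the threshold here; adapting the cue vocabulary or the metric itself per user would need more than a single number.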

Relevance

This research addresses one of the most persistent barriers in STEM accessibility: the difficulty of accessing mathematical content through audio. The comparison table of cue types provides a useful reference for anyone working on math accessibility, clearly showing that no single approach solves all problems. The concept of complexity-adaptive rendering — automatically adjusting how equations are spoken based on their structural complexity — is a promising direction that mirrors how human tutors naturally adapt their mathematical explanations to student level. For accessibility practitioners, the key insight is that equation accessibility is not just about translating notation to speech but about managing cognitive load through appropriate abstraction, decomposition, and contextual disambiguation. The personalization dimension is particularly relevant: the same equation may need entirely different audio treatments for different users, suggesting that math accessibility tools should build user profiles rather than applying uniform rendering rules. This work complements the MathJax/Speech Rule Engine paper from the same conference, which implemented multiple speech rule sets but did not address complexity-based adaptation.

Tags: mathematics accessibility · STEM accessibility · visual impairment · screen reader · speech synthesis · cognitive load · earcon · personalization · equation rendering · auditory interface