Beyond Additive Design: An Empirical Taxonomy of Multimodal STEM Accessibility Systems

Madjid Sadallah, Benoit Encelle · 2026 · Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA '26) · doi:10.1145/3772363.3799343

Summary

Sadallah and Encelle conduct a systematic review of 66 multimodal STEM accessibility systems for blind and visually impaired (BVI) users published between 2015 and 2025, drawn from the ACM Digital Library, IEEE Xplore, OpenAlex, PubMed, and Web of Science. Their central argument is that the field has largely treated 'multimodal' as a synonym for 'multiple output channels stacked together' — what they call the additive fallacy — and that this framing is fundamentally misaligned with how human perception integrates information. They develop a five-dimensional coding framework grounded in multisensory-integration theory: Temporal Synchronization (0-4 pts, weight 25%), Semantic Integration (0-4 pts, weight 25%), Topological Fidelity (0-3 pts, weight 18.75%), Modal Appropriateness (0-3 pts, weight 18.75%), and Measured Performance (0-2 pts, weight 12.5%). Two raters independently coded all 66 systems with high reliability (ICC=0.92). The corpus covers mathematics (42.4%), 2D diagrams (33.3%), data visualisation (16.7%), and 3D models (7.6%), with audio+tactile being the dominant modality combination (47.0%). Three discrete architectural regimes emerge from the data — Additive (channel stacking), Augmentative (partial coordination), and Integrative (orchestrated fusion) — separated by score breaks so sharp that they form non-overlapping 95% confidence intervals, indicating qualitatively distinct design paradigms rather than points on a continuum.
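
The reported weights match each dimension's share of a 16-point total (4 + 4 + 3 + 3 + 2). The summary does not say how the authors aggregate the dimensions, so the following is only a minimal sketch that assumes a simple normalised sum of raw points; the names `MAX_POINTS`, `dimension_weights`, and `composite_score` are hypothetical and do not come from the paper.

```python
# Maximum points per dimension, as reported in the summary.
MAX_POINTS = {
    "temporal_synchronization": 4,
    "semantic_integration": 4,
    "topological_fidelity": 3,
    "modal_appropriateness": 3,
    "measured_performance": 2,
}

TOTAL_POINTS = sum(MAX_POINTS.values())  # 16


def dimension_weights():
    """Return each dimension's share of the 16-point total."""
    return {name: pts / TOTAL_POINTS for name, pts in MAX_POINTS.items()}


def composite_score(scores):
    """Normalise a system's raw dimension scores to the range 0..1.

    Assumption: the composite is the plain sum of raw points divided by
    the 16-point maximum. The paper may aggregate differently.
    """
    if set(scores) != set(MAX_POINTS):
        raise KeyError("scores must cover exactly the five dimensions")
    for name, value in scores.items():
        if not 0 <= value <= MAX_POINTS[name]:
            raise ValueError(f"{name} must be in 0..{MAX_POINTS[name]}")
    return sum(scores.values()) / TOTAL_POINTS


# Made-up scores for one illustrative system (not a system from the corpus).
example = {
    "temporal_synchronization": 4,
    "semantic_integration": 3,
    "topological_fidelity": 2,
    "modal_appropriateness": 2,
    "measured_performance": 1,
}
print(dimension_weights())
print(composite_score(example))
```

Under this assumption the percentage weights need no separate parameter: they follow directly from the point ranges.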

Key findings

Additive systems remain the dominant design approach, accounting for 37.9% of the corpus (though declining from 48% to 28% over the decade); Augmentative systems are 36.4% (stable at 35-40%); Integrative systems are 25.8% (growing from 12% to 31%). The central empirical finding is the Differential Cognitive Yield (DCY) phenomenon: despite massive architectural progression between regimes (Temporal Synchronization scores rise 382%, Semantic Integration rises 219% from Additive to Integrative), Measured Performance shows no significant variation across regimes (F=1.30, p=0.279, eta-squared=0.04). The authors attribute this decoupling to four methodological weaknesses: only 18.2% of studies used cognitive-load instruments like NASA-TLX; 87% used simple tasks with ceiling effects; there is publication bias toward optimal-condition evaluations; and unidimensional metrics ignore transfer, retention, and concurrent-task capacity. Multinomial logistic regression yields empirical design thresholds: Augmentative status requires Sync >= 1.5 AND Integration >= 1.8; Integrative status requires Sync >= 3.2 AND Integration >= 2.9 (89% classification accuracy). Five practical design principles follow: synchronisation primacy, semantic coherence, topological progressivity, avoiding additive traps, and holistic evaluation that measures performance-load-transfer together rather than task accuracy alone.
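
The reported thresholds can be read as a simple decision rule. The sketch below is a hard-cutoff approximation, not the authors' fitted multinomial logistic regression model, which the summary does not reproduce. It assumes that Sync and Integration are scores on the 0-4 scales described above and that any system below the Augmentative thresholds counts as Additive; the function name `classify_regime` is hypothetical.

```python
# Thresholds as reported in the key findings (scores on the 0-4 scales).
AUGMENTATIVE_MIN = {"sync": 1.5, "integration": 1.8}
INTEGRATIVE_MIN = {"sync": 3.2, "integration": 2.9}


def classify_regime(sync, integration):
    """Assign an architectural regime from two dimension scores.

    Assumption: both thresholds must be met (logical AND), and anything
    that falls below the Augmentative thresholds is treated as Additive.
    """
    for value in (sync, integration):
        if not 0 <= value <= 4:
            raise ValueError("scores must lie in the range 0..4")
    if (sync >= INTEGRATIVE_MIN["sync"]
            and integration >= INTEGRATIVE_MIN["integration"]):
        return "Integrative"
    if (sync >= AUGMENTATIVE_MIN["sync"]
            and integration >= AUGMENTATIVE_MIN["integration"]):
        return "Augmentative"
    return "Additive"


# Illustrative inputs only; these are not systems from the corpus.
print(classify_regime(3.5, 3.0))
print(classify_regime(3.5, 2.0))
print(classify_regime(1.0, 3.0))
```

Because both conditions must hold, a system with excellent synchronisation but weak semantic integration (the second call) stays in the Augmentative regime, and strong integration without adequate synchronisation (the third call) stays Additive under this reading.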

Relevance

This paper is essential reading for anyone building or evaluating non-visual interfaces for mathematics, diagrams, charts, or 3D models. The DCY phenomenon names something practitioners have long suspected but struggled to quantify: that sophisticated multimodal tools can look impressive on completion-time and accuracy benchmarks while still exhausting BVI users who must manually weave together audio and tactile information streams. The empirical thresholds for Integrative status (Sync >= 3.2/4, Int >= 2.9/4) give designers measurable targets rather than vague guidance, and the five design principles translate directly into evaluation checklists for procurement, accessibility auditing, and peer review. The call to shift from 'interface engineering' to 'perceptual integration engineering' reframes what counts as good multimodal accessibility work and suggests that standards bodies (W3C, ISO, EN 301 549) may need explicit requirements for temporal coordination and semantic coherence in multimodal output, not just the presence of multiple output channels. Limitations: the analysis is retrospective and correlational, the 0-2 Performance scale may be too coarse, and most coded systems are research prototypes rather than commercially deployed products, so generalisation to the wider accessibility landscape remains to be shown.

Tags: STEM accessibility · multimodal interaction · cognitive load · sensory integration · blind and low vision · systematic review · accessibility research · sonification · tactile graphics · haptics · cognitive engineering