IMAGE: An Open-Source, Extensible Framework for Deploying Accessible Audio and Haptic Renderings of Web Graphics

Juliette Regimbal, Jeffrey R. Blum, Cyan Kuo, Jeremy R. Cooperstock · 2024 · ACM Transactions on Accessible Computing · doi:10.1145/3665223

Summary

This paper introduces IMAGE (Internet Multimodal Access to Graphical Exploration), an open-source software framework designed to help accessibility practitioners create and deploy audio and haptic representations of web graphics. The authors address a critical problem in accessibility research: many projects that aim to make visual content accessible to blind and low vision users never reach public deployment because of the significant technical overhead required to build and maintain the underlying infrastructure.

The IMAGE framework uses a microservices architecture built on Docker containers, allowing different components to be developed, updated, and deployed independently. The system consists of several key parts: a browser extension that collects graphical data from web pages, an orchestrator that manages request flow, preprocessors that extract and transform data, handlers that generate accessible renderings, and helper services for common functions like speech synthesis and sound spatialization. Currently, the framework supports photographs, embedded Google Maps, and Highcharts data visualizations.

When a user encounters a supported graphic, they can activate the IMAGE browser extension via alt-click or a button injected into the page. The extension sends the graphic data to the server, where it passes through the relevant preprocessors and handlers. Multiple renderings may be returned, such as an audio sonification of a pie chart or a tactile representation for a refreshable pin array. Importantly, IMAGE is designed to complement rather than replace existing accessibility tools like screen readers and alt text.

The architecture emphasizes extensibility and reuse. All data exchanged between components is validated against JSON schemas, which serve both as documentation and as a debugging aid. The paper illustrates how teams outside the original project could use IMAGE for different applications, describing hypothetical pipelines for accessible weather radar maps and automatic audio captioning systems.
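
The request flow described in this summary can be illustrated with a minimal sketch. It is not IMAGE's actual code or API: every function name, field name, and schema below is hypothetical, the components are plain in-process functions rather than separate Docker services communicating over HTTP, and a toy key-and-type check stands in for real JSON Schema validation.

```python
from typing import Callable

# All names below are hypothetical and chosen for illustration only.
# In IMAGE each component is a separate containerized service; here they
# are ordinary functions so the flow can be read in one place.
Preprocessor = Callable[[dict], dict]
Handler = Callable[[dict, dict], list]

# Toy "schemas": required field name -> expected Python type.
# (Assumption: a stand-in for the JSON schemas the paper describes.)
REQUEST_SCHEMA = {"graphic_type": str, "data": dict}
RENDERING_SCHEMA = {"modality": str, "description": str}


def validate(payload: dict, schema: dict) -> None:
    """Raise ValueError if a required field is missing or has the wrong type."""
    for key, expected_type in schema.items():
        if key not in payload:
            raise ValueError(f"missing field: {key}")
        if not isinstance(payload[key], expected_type):
            raise ValueError(f"field {key!r} is not {expected_type.__name__}")


def chart_summary_preprocessor(request: dict) -> dict:
    """Extract simple facts from chart data; return nothing for other graphics."""
    if request["graphic_type"] != "chart":
        return {}
    values = request["data"].get("values", [])
    return {"chart_summary": {"count": len(values), "total": sum(values)}}


def sonification_handler(request: dict, preprocessed: dict) -> list:
    """Produce an audio rendering only when chart data is available."""
    summary = preprocessed.get("chart_summary")
    if summary is None:
        return []
    return [{
        "modality": "audio",
        "description": f"Sonification of {summary['count']} chart segments",
    }]


def orchestrate(request: dict, preprocessors: list, handlers: list) -> list:
    """Validate the request, run preprocessors, then collect handler renderings."""
    validate(request, REQUEST_SCHEMA)
    preprocessed: dict = {}
    for preprocessor in preprocessors:
        preprocessed.update(preprocessor(request))
    renderings = []
    for handler in handlers:
        for rendering in handler(request, preprocessed):
            validate(rendering, RENDERING_SCHEMA)  # check every message exchanged
            renderings.append(rendering)
    return renderings


if __name__ == "__main__":
    request = {"graphic_type": "chart", "data": {"values": [30, 50, 20]}}
    print(orchestrate(request, [chart_summary_preprocessor], [sonification_handler]))
```

Because each handler decides for itself whether it can produce a rendering from the available preprocessor output, adding support for a new graphic type in this sketch means adding functions rather than changing the orchestrator, which mirrors the extensibility goal the paper describes.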

Key findings

The authors conducted retrospective interviews with six team members who worked on IMAGE in roles spanning audio design, haptic implementation, machine learning, browser extension development, and user research. These interviews revealed several recurring challenges and lessons learned.

The learning curve emerged as a significant barrier. Initial documentation focused on high-level architecture rather than practical tasks like developing and deploying new preprocessors. The team addressed this by creating task-specific guides. Internal state visibility was also problematic: team members struggled to understand which versions of handlers were active and how different preprocessors contributed to final outputs. Solutions included server-side scripts to display component status and debugging-specific handlers that visualize intermediate outputs like object detection bounding boxes.

Component reusability proved more nuanced than anticipated. Early in the project, some team members created components that were technically compatible with the IMAGE architecture but too narrowly tailored to specific use cases to be meaningfully reused. Additional instruction and hands-on experience helped developers understand how to design for broader applicability.

The framework supported development tasks better than design tasks. When team members worked on clearly defined implementation problems, the architecture helped them focus and coordinate. During early brainstorming phases, however, attempting to fit ideas into the IMAGE structure sometimes constrained thinking and led to lower-quality outcomes. The team adjusted their practices to encourage more exploration outside the architecture during initial design phases.

Despite these challenges, all interviewed team members said that the architecture benefited their work and the project overall. The modular structure facilitated component replacement, software maintenance, and progression from prototypes to production.

Relevance

For accessibility practitioners considering building infrastructure to support multimodal web content, this paper offers both a concrete tool and valuable guidance on framework design. The IMAGE architecture is available as open source (https://image.a11y.mcgill.ca), allowing teams to build on existing components rather than starting from scratch.

The interview findings provide candid insights into the human factors of working with accessibility frameworks. The observation that architectural structure can constrain early-stage design thinking is particularly relevant for teams planning similar projects. The recommendation to work "outside the architecture" during brainstorming, before fitting ideas into the technical structure, could save other teams from constraining their designs prematurely.

The paper highlights an often-overlooked barrier to accessibility progress: many promising research prototypes never reach end users because the effort required to deploy and maintain production systems exceeds project resources. By providing reusable infrastructure, IMAGE aims to reduce this overhead and enable researchers to focus on the accessibility interactions themselves rather than the underlying plumbing.

For organizations working with data visualizations, maps, or photographs that need to be made accessible, IMAGE offers a potential foundation. The framework currently has limitations, including the requirement that all processing fit within a single HTTP request-response cycle, but the modular design means these constraints could be addressed in future versions without a wholesale redesign.

Tags: web graphics · sonification · haptic feedback · multimodal interaction · open source · microservices · blind · low vision · data visualization