Making Charts Speak: LLM-Based Conversational Chart Question Answering for Blind and Low-Vision Users
Amit Kumar Das, Mohammad Tarun, Klaus Mueller · 2026 · Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA '26) · doi:10.1145/3772363.3799030
Summary
Das, Tarun, and Mueller present GraphWhisper, a conversational system that lets blind and low-vision (BLV) users explore chart images (JPEG, PNG) through natural-language questions, without requiring the chart data to be pre-structured in formats like Vega-Lite. The authors argue that existing accessible chart tools either reduce visualizations to static alt-text summaries or data tables (losing exploratory value), rely on specialized sonification or tactile hardware, or depend on structured data inputs that do not exist for the PNG and JPEG charts BLV users encounter daily in news articles, papers, and web content. GraphWhisper's core technical contribution is an enhanced Charts-of-Thought prompting methodology that first has the LLM identify which of twelve chart types is present (line, bar, stacked bar, 100% stacked bar, pie, histogram, scatter, bubble, area, stacked area, choropleth, treemap) and then applies chart-type-specific data-extraction procedures with built-in validation guardrails: for example, pie slices must sum to 360 degrees, stacked segments are computed as top-minus-bottom boundary differences, and percentage charts must sum to 100%. The system is wrapped in a conversational interface with automatic chart summaries, dynamic follow-up question suggestions, text-or-voice input, and confidence indicators expressed in plain language. Evaluation combined VLAT visualization-literacy benchmarks across Claude 4.5, GPT-5, and Gemini 2.5 with a 15-participant BLV user study run remotely via Zoom with JAWS, NVDA, and VoiceOver users recruited through the National Federation of the Blind.
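The three guardrails named above are simple arithmetic checks, and a minimal sketch may make them concrete. This is not the authors' implementation: in the paper the checks live inside the prompt, whereas here they are written as ordinary Python for illustration. The function names and the tolerance values are assumptions introduced for this example.

```python
from typing import Sequence


def stacked_segments(boundaries: Sequence[float]) -> list[float]:
    """Recover segment sizes from cumulative boundaries (top minus bottom).

    `boundaries` lists the axis values where each segment starts and ends,
    from the baseline upward, e.g. [0, 20, 55, 100].
    """
    return [top - bottom for bottom, top in zip(boundaries, boundaries[1:])]


def sums_to(values: Sequence[float], target: float, tolerance: float) -> bool:
    """True if the values add up to `target` within `tolerance`."""
    return abs(sum(values) - target) <= tolerance


def pie_angles_valid(angles_deg: Sequence[float], tolerance: float = 2.0) -> bool:
    # Assumed tolerance of 2 degrees; the source does not state one.
    return sums_to(angles_deg, 360.0, tolerance)


def percentages_valid(percentages: Sequence[float], tolerance: float = 1.0) -> bool:
    # Assumed tolerance of 1 percentage point; the source does not state one.
    return sums_to(percentages, 100.0, tolerance)


# Hypothetical readings from a 100% stacked bar and a three-slice pie chart.
segments = stacked_segments([0.0, 20.0, 55.0, 100.0])
bar_ok = percentages_valid(segments)
pie_ok = pie_angles_valid([90.0, 120.0, 150.0])
pie_bad = pie_angles_valid([90.0, 120.0, 100.0])
```

A failed check would presumably prompt a re-read of the chart before the system answers; how GraphWhisper reacts to a failed guardrail is not detailed in this summary.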
Key findings
On the VLAT benchmark, the enhanced chart-type-specific prompting drove Claude 4.5 to 98.1% accuracy (52 of 53 questions), surpassing the original Charts-of-Thought score of 50.17 and the human baseline of 28.82; GPT-5 and Gemini 2.5 also improved under the same prompting, with additional validation on ChartQA, ChartMuseum, and EncQA. In the user study, 15 BLV participants asked 185 questions across diverse charts and GraphWhisper answered 94% correctly, with consistent performance across Kim et al.'s query taxonomy: analytical 94% (73/78), visual 94% (49/52), contextual 94% (31/33), and navigation 95% (21/22). Satisfaction ratings were unusually high and tightly clustered (4.87-5.0 on a 5-point scale) across ease of use, description quality, usefulness, comparison to usual methods, and overall satisfaction, with minimal standard deviations (0.00-0.34). All participants used every feature, and automatic summaries, follow-up questions, and confidence indicators each had 100% helpfulness rates. The main documented failure mode: accuracy collapses to 68% on compressed images below 150 DPI, and complex multi-series or unusual-encoding charts remain hard.
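The per-category counts can be cross-checked against the headline figures with a few lines of arithmetic. The sketch below uses only numbers quoted above; the total of correct answers is derived here by summing the categories and is not a figure stated in the summary.

```python
# (correct, asked) per query category, as reported in the user study.
categories = {
    "analytical": (73, 78),
    "visual": (49, 52),
    "contextual": (31, 33),
    "navigation": (21, 22),
}


def percent(correct: int, asked: int, digits: int = 0) -> float:
    """Accuracy as a percentage, rounded to `digits` decimal places."""
    return round(100 * correct / asked, digits)


per_category = {name: percent(c, n) for name, (c, n) in categories.items()}

total_correct = sum(c for c, _ in categories.values())  # derived, not reported
total_asked = sum(n for _, n in categories.values())
overall = percent(total_correct, total_asked)

# VLAT result for Claude 4.5: 52 of 53 questions.
vlat_accuracy = percent(52, 53, digits=1)
```

The category denominators add up to the 185 questions reported, and the rounded per-category and overall percentages agree with the figures quoted in the text.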
Relevance
For accessibility practitioners, this paper is a concrete demonstration that a well-engineered prompting layer on top of a general-purpose multimodal LLM can deliver practical chart accessibility for the images that BLV users actually encounter, without retraining models or requiring publishers to emit Vega-Lite. Three design patterns are directly reusable: chart-type-specific extraction procedures with mathematical validation guardrails (sum-to-100%, sum-to-360 degrees, stacked boundary subtraction), plain-language confidence communication rather than numeric scores, and dynamic follow-up suggestions to scaffold exploration. Caveats: the user study is small (n=15), English-only, limited to static images, and assumes prior visualization familiarity; the ceiling-level satisfaction scores warrant scrutiny given the short 45-minute sessions. Still, the approach complements rather than replaces existing accessible-visualization work like Olli, Data Navigator, and MAIDR.
Tags: chart accessibility · data visualization · blind and low vision · large language models · conversational interface · prompt engineering · visualization literacy
Standards referenced: WCAG 2.1