AccessMenu: Enhancing Usability of Online Restaurant Menus for Screen Reader Users

Nithiya Venkatraman, Akshay Kolgar Nayak, Suyog Dahal, Yash Prakash, Hae-Na Lee, Vikas Ashok · 2025 · Proceedings of the 22nd International Web for All Conference (W4A) · doi:10.1145/3744257.3744275

Summary

This paper addresses the accessibility barriers that blind and visually impaired (BVI) screen reader users face with online restaurant menus, which are typically published as images or PDFs. The research proceeds in two phases. First, an interview study with 12 blind screen reader users (median age 49, all ordering food online at least weekly) uncovered three major pain points. OCR output from tools such as JAWS Convenient OCR followed an illogical reading order that did not match the visual layout, so participants had to mentally reconstruct the menu structure, which was cognitively taxing. AI assistants such as ChatGPT made inaccurate inferences (e.g., categorizing a Caesar salad with anchovies as vegetarian) and misinterpreted menu symbols and legends. Nearly two-thirds of participants reported relying on sighted companions to navigate menus. All participants expressed a need for an alternative interface tailored to screen readers. Second, based on these findings, the authors developed AccessMenu, a Chrome browser extension that automatically detects visual menus on restaurant websites, uses multimodal large language models (MLLMs) to extract and structure menu content into a semantic JSON model, and re-renders it as an accessible HTML accordion interface navigable with standard screen reader keyboard shortcuts. The system also supports natural language queries (e.g., "list all gluten-free items") using Chain-of-Thought prompting with few-shot examples, allowing users to filter and search menu content conversationally.
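To make the "semantic JSON model re-rendered as an accordion" idea concrete, here is a minimal sketch. The field names (`sections`, `items`, `tags`), the sample menu, and the choice of native `<details>`/`<summary>` elements are all assumptions made for illustration; the summary above does not specify the paper's schema or markup.

```python
from html import escape

# Hypothetical menu model: the schema below is an assumption,
# not the structure used by AccessMenu.
menu = {
    "restaurant": "Example Bistro",
    "sections": [
        {
            "name": "Salads",
            "items": [
                {"name": "Caesar Salad", "price": "9.50",
                 "description": "Romaine, parmesan, anchovies", "tags": []},
                {"name": "Garden Salad", "price": "8.00",
                 "description": "Mixed greens & tomato",
                 "tags": ["vegetarian", "gluten-free"]},
            ],
        },
    ],
}


def render_accordion(model: dict) -> str:
    """Render the menu model as headings plus native disclosure widgets."""
    parts = [f"<h1>{escape(model['restaurant'])} menu</h1>"]
    for section in model["sections"]:
        # One collapsible block per menu section; the heading inside
        # <summary> gives each section a landmark for heading navigation.
        parts.append("<details>")
        parts.append(f"<summary><h2>{escape(section['name'])}</h2></summary>")
        parts.append("<ul>")
        for item in section["items"]:
            tags = ", ".join(item["tags"]) or "no dietary tags"
            parts.append(
                "<li>"
                f"<h3>{escape(item['name'])}</h3>"
                f"<p>{escape(item['description'])}. "
                f"Price: ${escape(item['price'])}. "
                f"Dietary: {escape(tags)}.</p>"
                "</li>"
            )
        parts.append("</ul>")
        parts.append("</details>")
    return "\n".join(parts)


html_output = render_accordion(menu)
print(html_output)
```

The point of the sketch is that once the menu exists as structured data, the reading order is set by the data model, not by the visual layout that confuses OCR.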

Key findings

The extraction pipeline was evaluated with three MLLMs on 50 diverse restaurant menus. GPT-4o-mini outperformed Claude-3.5-Sonnet and Llama 3.2-90B-Vision on all three metrics (scores listed in that model order): Entity F1 (0.80 vs. 0.62 vs. 0.79), Relationship F1 (0.73 vs. 0.43 vs. 0.61), and Structural F1 (0.84 vs. 0.43 vs. 0.78). For query responses, manual inspection of 108 queries across 5 menus yielded precision of 0.71, recall of 0.85, and F1 of 0.77, with 82.3% of inaccuracies caused by ambiguities in voice transcription of complex menu item names. In the user study with 10 blind participants, participants perused significantly more items with AccessMenu (median 31.5) than with OCR (median 15) in the same time period (Wilcoxon Z = 2.76, p = 0.005). SUS scores were significantly higher for AccessMenu (mean 69.25) than for OCR (mean 46.25), with a one-way ANOVA showing F = 12.08, p < 0.005, and a large effect size (eta-squared = 0.40). NASA-TLX workload scores dropped substantially from 77.93 (OCR) to 48.03 (AccessMenu), with F = 161.26, p < 0.005. Participants praised AccessMenu's simplicity and the natural language query feature, and asked for cross-restaurant filtering, persistent search preferences, and allergen-based personalization.
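The reported query F1 and SUS effect size can be checked with a few lines of arithmetic. The sketch below uses the standard F1 formula and the standard conversion from a one-way ANOVA F statistic to eta-squared. The degrees of freedom (1, 18) are an assumption, corresponding to two conditions with 10 scores each treated as independent groups; the summary does not state them.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def eta_squared(f_stat: float, df_effect: int, df_error: int) -> float:
    """Eta-squared recovered from a one-way ANOVA F statistic."""
    return f_stat * df_effect / (f_stat * df_effect + df_error)


# Query-response metrics reported above.
query_f1 = f1_score(0.71, 0.85)
print(round(query_f1, 2))  # 0.77

# SUS comparison; df = (1, 18) is an assumption, not stated in the source.
sus_eta = eta_squared(12.08, 1, 18)
print(round(sus_eta, 2))  # 0.4
```

Under that degrees-of-freedom assumption, both values agree with the reported figures (F1 of 0.77, eta-squared of 0.40).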

Relevance

This research tackles a practical, everyday accessibility problem that affects blind users' independence and quality of life: the ability to browse and choose food from restaurant menus without assistance. The work demonstrates how multimodal LLMs can serve as an intermediary layer that transforms visually encoded document content into structured, accessible formats. For accessibility practitioners, the study provides a concrete example of using Chain-of-Thought prompting to handle complex visual documents with spatial relationships, legends, and symbols that traditional OCR cannot meaningfully interpret. The finding that OCR reading order does not match the logical structure of visual documents applies broadly to any domain where visual layout conveys semantic meaning. The natural language query feature represents an important interaction paradigm for screen reader users: instead of navigating linearly through content, users can ask questions to extract specific information efficiently. Future directions, including platform-wide menu filtering across restaurants and persistent dietary preference profiles, point toward a more comprehensive accessible food ordering ecosystem.
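As an illustration of how a few-shot Chain-of-Thought query prompt over a structured menu might be assembled, here is a minimal sketch. The prompt wording, the single worked example, and the menu schema are hypothetical and are not taken from the paper; the sketch only builds the prompt text and does not call any model.

```python
import json

# Hypothetical few-shot example showing step-by-step reasoning.
# The anchovy case mirrors the misclassification participants reported.
FEW_SHOT_EXAMPLES = [
    {
        "question": "Which items are vegetarian?",
        "reasoning": (
            "Check each item's ingredients. Caesar Salad lists anchovies, "
            "so it is not vegetarian. Garden Salad lists only vegetables."
        ),
        "answer": ["Garden Salad"],
    },
]


def build_query_prompt(menu: dict, question: str) -> str:
    """Assemble instructions, menu JSON, worked examples, and the new question."""
    lines = [
        "You answer questions about a restaurant menu given as JSON.",
        "Reason step by step over each item, then give the final answer "
        "as a JSON list of item names.",
        "",
        "Menu JSON:",
        json.dumps(menu, indent=2),
        "",
    ]
    for example in FEW_SHOT_EXAMPLES:
        lines += [
            f"Question: {example['question']}",
            f"Reasoning: {example['reasoning']}",
            f"Answer: {json.dumps(example['answer'])}",
            "",
        ]
    # End on an open "Reasoning:" cue so the model reasons before answering.
    lines += [f"Question: {question}", "Reasoning:"]
    return "\n".join(lines)


# Hypothetical menu model (schema is an assumption).
menu = {
    "sections": [
        {"name": "Salads", "items": [
            {"name": "Caesar Salad", "ingredients": ["romaine", "anchovies"]},
            {"name": "Garden Salad", "ingredients": ["greens", "tomato"]},
        ]},
    ],
}

prompt = build_query_prompt(menu, "List all gluten-free items")
print(prompt)
```

Requesting the final answer as a list of item names is one possible design choice: it would let an interface map each result back to an entry in the accessible menu.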

Tags: screen readers · blind users · visual document understanding · LLM accessibility · multimodal AI · browser extension · web accessibility · OCR

Standards referenced: ARIA · WCAG