ChitChatGuide: Conversational Interaction Using Large Language Models for Assisting People with Visual Impairments to Explore a Shopping Mall

Yuka Kaniwa, Masaki Kuribayashi, Seita Kayukawa, Daisuke Sato, Hironobu Takagi, Chieko Asakawa, Shigeo Morishima · 2024 · Proceedings of the ACM on Human-Computer Interaction (MobileHCI) · doi:10.1145/3676492

Summary

ChitChatGuide is a smartphone-based indoor navigation system that wraps a GPT-4-powered conversational interface around an existing BLE-beacon localisation stack (HULOP) to support something most blind-navigation research overlooks: casual, purpose-less exploration — the blind equivalent of recreational window-shopping. The authors frame exploration as a distinct accessibility need from wayfinding: wayfinding is 'take me to store X', exploration is 'show me what interesting shops are on this floor'. Existing systems handle wayfinding well but force a blind user to already know a destination, or to listen to a long VoiceOver list of store names. ChitChatGuide has two functionalities. First, tour-planning conversation: the user double-taps with two fingers to speak (a VoiceOver-compatible gesture), the LLM responds in JSON containing a natural-language reply and a structured destination ID, and the conversation iterates until a single-destination route or a predefined floor-exploration tour is selected. Second, personalised Point-of-Interest (POI) descriptions while walking: the system estimates transit time between POIs and prompts the LLM to produce a description whose word length is scaled to fit the walking window, personalised to any exclusion or inclusion preferences the user has stated (for example 'only tell me about restaurants' or 'don't mention toilets'). Users can also interrupt to ask ad-hoc questions about POIs. The system was evaluated in an in-the-wild study at COREDO Muromachi shopping mall in Tokyo (29 stores across four floors) with 11 legally blind participants compared against Inclusive Navi as a baseline.
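
The two mechanisms described above (a JSON reply carrying both a spoken message and a destination ID, and a description length scaled to walking time) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the field names `message` and `destination_id`, the walking speed, and the speech rate are all hypothetical values chosen for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical constants, not taken from the paper.
WALKING_SPEED_M_PER_S = 1.0    # assumed walking pace
SPEECH_RATE_WORDS_PER_S = 2.5  # assumed text-to-speech rate


@dataclass
class Reply:
    message: str                   # natural-language text to read aloud
    destination_id: Optional[str]  # structured ID, absent while still chatting


def parse_reply(raw: str) -> Reply:
    """Parse a JSON reply; the field names here are assumptions."""
    data = json.loads(raw)
    return Reply(message=data["message"],
                 destination_id=data.get("destination_id"))


def word_budget(distance_m: float) -> int:
    """Words that fit in the time needed to walk distance_m metres."""
    transit_s = distance_m / WALKING_SPEED_M_PER_S
    return max(0, int(transit_s * SPEECH_RATE_WORDS_PER_S))
```

Under these assumed rates, a 20-metre stretch between two POIs leaves room for about 50 words, and that figure would be passed to the LLM as the target description length.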

Key findings

Participants rated ChitChatGuide significantly higher than the baseline Inclusive Navi system for enjoyment (Q1, p < 0.001), ability to determine a destination based on interest (Q2, p = 0.042), useful walking-time information (Q3, p = 0.006), and appropriate description length (Q4, p = 0.027). The LLM answered 93.3% of general exploration questions correctly ('what is on this floor?'), 72.5% of category-specific queries ('any places to eat?'), and 85.4% of specific store queries. Of the 54 planned tours, 74% involved more than one back-and-forth turn: participants typically started with a vague question and narrowed down through follow-ups, a pattern that the baseline's fixed store list cannot support. POI personalisation was less reliable: only 66.9% of descriptions fully met the user's stated requirements, with exclusion requests for business-hours information working well (91.4%) but requests for shorter descriptions failing 83% of the time because the LLM prioritised its internal transit-time-based length calculation over the user's request. Hallucinations also occurred: 23 of 143 tour-planning responses contained false category or floor information, three introduced stores that did not exist, and five answered 'no' to store queries when the store was in fact present. Qualitatively, seven participants described the experience as 'window-shopping for the first time', six said the system motivated them to visit a mall without a specific purpose, and all eleven said the Q&A function saved them from having to ask sighted staff for prices and recommendations.
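
The length failure reported above suggests one possible mitigation: resolve the conflict between the transit-time budget and the user's request in ordinary code rather than leaving it to the LLM. The Python sketch below is an assumption about how that could look, not something the paper describes; both function names are hypothetical.

```python
from typing import Optional


def effective_word_limit(transit_budget: int,
                         user_max_words: Optional[int]) -> int:
    """Let an explicit user cap override the transit-time budget.

    transit_budget: words that fit in the walking window.
    user_max_words: cap derived from a request such as "keep it short",
                    or None if the user expressed no preference.
    """
    if user_max_words is None:
        return transit_budget
    return min(transit_budget, user_max_words)


def truncate_to_limit(text: str, limit: int) -> str:
    """Crude fallback: cut an over-long description at the word limit."""
    return " ".join(text.split()[:limit])
```

Taking the minimum means a user request can only shorten a description, so the result still fits the walking window. Word-level truncation is a blunt fallback; regenerating the description with the tighter limit would read better.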

Relevance

For accessibility practitioners working on LLM-assisted AT, public-venue wayfinding, or conversational agents generally, ChitChatGuide is a useful case study because it names three criteria for responsibly integrating LLMs into navigation systems: (1) balance of attractiveness and length of descriptions: blind users want more than a store name but less than a marketing brochure, and length must respond to user requests rather than an internal heuristic; (2) trustworthiness of response: hallucinations in a navigation context can send a user to a non-existent store, so grounding via retrieval-augmented generation (RAG) and fact-checking are not optional; (3) depth of data: the system is only as good as the POI database behind it, which raises cooperative-data questions with facility owners (for example, real-time stock or sale-price feeds). The paper also reframes blind indoor navigation research by centring exploration rather than point-to-point travel, which is analogous to the distinction between 'search' and 'browse' in web accessibility. Practitioners should note the limitations: the study is small (N=11) in a small four-floor mall in Tokyo, the baseline was a cut-down version of Inclusive Navi with voice input disabled, the system used prompt engineering only (no RAG or fine-tuning), and the LLM was prompted in English but asked to respond in the user's native language, a design choice that may introduce further translation-layer hallucinations.
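
As a concrete reading of the trustworthiness criterion, the Python sketch below shows one simple post-hoc check: before routing, confirm that any destination ID or store name returned by the LLM exists in the POI database. This is a hypothetical illustration, not the paper's method (the evaluated system used prompt engineering only), and it is a plain lookup rather than RAG. The function names and data shapes are assumptions.

```python
from typing import Iterable, List, Optional, Set


def validate_destination(destination_id: Optional[str],
                         known_ids: Set[str]) -> Optional[str]:
    """Return the ID only if it exists in the POI database, else None."""
    if destination_id is not None and destination_id in known_ids:
        return destination_id
    return None


def unknown_store_mentions(mentioned: Iterable[str],
                           known_names: Set[str]) -> List[str]:
    """List store names in a reply that match no POI (case-insensitive)."""
    known = {name.casefold() for name in known_names}
    return [m for m in mentioned if m.casefold() not in known]
```

A check like this catches invented stores and invalid destinations, but not wrong category or floor details or a false 'no' about a store that exists. Those would need the reply's claims compared against the database fields themselves.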

Tags: blindness and low vision · large language model · indoor navigation · wayfinding · conversational agent · orientation and mobility · accessible AI · mobile accessibility · exploration