WanderGuide: Indoor Map-less Robotic Guide for Exploration by Blind People
Masaki Kuribayashi, Kohei Uehara, Allan Wang, Shigeo Morishima, Chieko Asakawa · 2025 · Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI '25) · doi:10.1145/3706598.3713788
Summary
Kuribayashi and colleagues design WanderGuide, a suitcase-shaped robotic guide that supports blind people in recreational, open-ended exploration of indoor environments (wandering, browsing, window-shopping) rather than getting from A to B. The team frames a clear gap in the assistive-navigation literature: almost all prior systems require pre-built maps or BLE beacon infrastructure, which is expensive and impractical to deploy in every shopping mall, museum, or station. The few map-less systems that do exist (PathFinder, GPT-4o demo, Snap&Nav, Seeing AI) are tuned for navigation to a known destination, not for the open-ended discovery that recreational exploration requires.
WanderGuide combines a wheeled robot platform (CMU CaBot fork, 360° LiDAR, three RGB-D cameras, 1080p fisheye camera, motorised wheels) with real-time simultaneous localisation and mapping (Cartographer ROS), an autonomous waypoint detection algorithm that skeletonises the robot's cost map and clusters intersection points into navigable POIs, and a multimodal large language model (GPT-4o) that produces three selectable levels of scene description: detailed (3-4 sentences; lighting, signs, mood), balanced-length (2-3 sentences; key objects), and concise (1-2 sentences; navigation only). Users interact through five buttons in the suitcase handle (mode switch, speed up/down, description detail) and a long-press conversation mode that handles three intents: general queries, specific queries, and command queries (most importantly the Take-Me-There function, which returns the user to a previously described POI).
A formative study with ten blind participants in a museum and a shopping mall used a Wizard-of-Oz robot to elicit design requirements, surfacing three user types (Exploration-Inclined, Intermediate, Destination-Oriented). A main study with five further blind participants in the museum evaluated the full prototype with Raw-TLX, the System Usability Scale, and Likert items, plus a separate evaluation in which 56 professional museum guides rated 82 randomly sampled MLLM-generated descriptions.
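The waypoint step described above (find intersections on a skeletonised cost map, then cluster them into candidate POIs) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the skeleton has already been computed and is given as a grid of 0/1 cells, it treats any skeleton cell with three or more skeleton neighbours as an intersection pixel, and it groups those pixels with a simple distance-based grouping that behaves like DBSCAN with a minimum cluster size of one. The names `junction_pixels`, `cluster_points`, `waypoints`, and the `eps` value are hypothetical.

```python
from collections import deque


def junction_pixels(skeleton):
    """Return (row, col) skeleton cells with >= 3 skeleton neighbours (8-connectivity)."""
    rows, cols = len(skeleton), len(skeleton[0])
    found = []
    for r in range(rows):
        for c in range(cols):
            if not skeleton[r][c]:
                continue
            neighbours = sum(
                skeleton[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols
            )
            if neighbours >= 3:
                found.append((r, c))
    return found


def cluster_points(points, eps):
    """Group points linked by chains of neighbours no farther apart than eps.

    Simplified stand-in for DBSCAN (every point counts as a core point).
    """
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue, members = deque([seed]), [seed]
        while queue:
            i = queue.popleft()
            near = [
                j for j in unvisited
                if (points[i][0] - points[j][0]) ** 2
                + (points[i][1] - points[j][1]) ** 2 <= eps ** 2
            ]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                members.append(j)
        clusters.append([points[k] for k in members])
    return clusters


def waypoints(skeleton, eps=1.5):
    """Return one (row, col) centroid per cluster of intersection pixels."""
    clusters = cluster_points(junction_pixels(skeleton), eps)
    return [
        (sum(r for r, _ in cl) / len(cl), sum(c for _, c in cl) / len(cl))
        for cl in clusters
    ]


# Toy skeleton: one horizontal corridor (row 3) crossed by two vertical ones (cols 3 and 9).
grid = [[1 if (r == 3 or c in (3, 9)) else 0 for c in range(13)] for r in range(7)]
print(sorted(waypoints(grid)))  # [(3.0, 3.0), (3.0, 9.0)]
```

In the toy grid each crossing produces five adjacent intersection pixels, which is why a clustering step is needed before a crossing can be treated as a single waypoint.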
Key findings
The formative study identified three preference clusters (Destination-Oriented, Intermediate, and Exploration-Inclined) that drove the multi-level description design. The main study returned median Likert ratings of 6 or 7 on enjoyment, exploration ability, and willingness to use the system in both familiar and unfamiliar locations. SUS scores ranged from 72.5 to 90 (all above the 70 acceptability threshold), and Raw-TLX scores ranged from 15 to 28, at or below the low end of the 26-48 range typical for assistive-tech studies, which suggests that the system's autonomy offloaded much of the cognitive work of navigation. Usage patterns were highly individual: P12 spent 91.66 percent of the time in auto mode, while P15 used manual control 21 percent of the time and conversation 20 percent; conversation queries split into general (asking what's nearby), specific (asking about a particular object), and command (Take-Me-There or direction-setting). MLLM error analysis on 164 auto-mode descriptions found that 28.6 percent contained at least one error (wrong character recognition, wrong object recognition, non-existent objects); the figure rose to 60 percent for the more demanding conversation-mode responses. Museum-guide expert evaluators gave descriptions a median rating of 5/7 for naturalness and precision but only 4/7 for suitability as descriptions for blind people, indicating that off-the-shelf MLLM output still under-specifies named exhibits and shop identities. Participants strongly endorsed audio recognition as the next priority, since the museum's ambient sounds were a rich exploration cue that the system ignored.
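For readers unfamiliar with how the two questionnaire scores above are derived, the following sketch applies the standard scoring rules: SUS rescales ten 1-5 items to a 0-100 score, and Raw-TLX is the unweighted mean of six 0-100 subscale ratings. The function names and the example responses are hypothetical and are not data from the study.

```python
def sus_score(responses):
    """Standard SUS scoring for ten responses on a 1-5 scale.

    Odd-numbered items contribute (response - 1), even-numbered items
    contribute (5 - response); the total is multiplied by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly ten item responses")
    total = 0
    for index, response in enumerate(responses):
        if index % 2 == 0:  # items 1, 3, 5, 7, 9 (positively worded)
            total += response - 1
        else:               # items 2, 4, 6, 8, 10 (negatively worded)
            total += 5 - response
    return total * 2.5


def raw_tlx(subscales):
    """Raw-TLX: unweighted mean of the six NASA-TLX subscale ratings (0-100)."""
    if len(subscales) != 6:
        raise ValueError("Raw-TLX needs exactly six subscale ratings")
    return sum(subscales) / 6


# Hypothetical responses, for illustration only.
print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 5, 2]))  # 82.5
print(raw_tlx([10, 20, 30, 10, 20, 30]))          # 20.0
```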
Relevance
For accessibility practitioners and researchers building indoor navigation tools, this paper stakes out an under-served design space: recreational, intent-changing exploration rather than turn-by-turn wayfinding to a fixed destination. The argument for map-less operation is pragmatically important — beacon and pre-mapped infrastructure simply will not be installed in every venue, so scalable accessible navigation has to come from on-board perception. The engineering contribution (cost-map skeletonisation plus DBSCAN clustering for autonomous POI selection) is straightforward and reusable. The user-experience contribution (three user-selectable description levels with concrete prompt designs included as appendices) is particularly transferable to AI-driven scene-description tools beyond robots, including smart-glass and smartphone apps. Limitations are that MLLM accuracy degrades sharply for specific object names and signage, the study sample is small (n=10 formative, n=5 main), the wheeled form factor cannot handle stairs or uneven terrain, and crowded environments were explicitly avoided. The paper sits within the broader Miraikan/IBM/CMU programme on AI-Suitcase-style accessible robotics and is best read alongside the Hata et al. delegation study and earlier CaBot deployments. Practitioners should note the paper's central design claim: configurability of detail and explicit support for intent change are first-order accessibility requirements for exploration, not nice-to-have polish.
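For teams adapting the configurable-detail idea to other scene-description tools, one possible way to encode the three levels is a small lookup table that feeds a prompt instruction. The sentence ranges and focus areas below come from the summary above; the `DescriptionLevel` class, the `build_instruction` helper, and the instruction wording are hypothetical and do not reproduce the prompts in the paper's appendices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DescriptionLevel:
    name: str
    min_sentences: int
    max_sentences: int
    focus: str


# Sentence ranges and focus areas as reported in the summary.
LEVELS = {
    "detailed": DescriptionLevel("detailed", 3, 4, "lighting, signs, and mood"),
    "balanced": DescriptionLevel("balanced", 2, 3, "key objects"),
    "concise": DescriptionLevel("concise", 1, 2, "navigation-relevant information only"),
}


def build_instruction(level_name):
    """Build an illustrative instruction string for the chosen detail level."""
    level = LEVELS[level_name]
    return (
        f"Describe the scene in {level.min_sentences}-{level.max_sentences} "
        f"sentences, focusing on {level.focus}."
    )


print(build_instruction("concise"))
# Describe the scene in 1-2 sentences, focusing on navigation-relevant information only.
```

Keeping the levels in data rather than in separate hard-coded prompts makes it straightforward to add or retune a level after user testing.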
Tags: assistive robotics · indoor navigation · blindness and low vision · visual impairment · recreational exploration · map-less navigation · multimodal large language model · human-robot interaction · museum accessibility · image description