Beyond Omakase: Designing Shared Control for Navigation Robots with Blind People

Rie Kamikubo, Seita Kayukawa, Yuka Kaniwa, Allan Wang, Hernisa Kacorri, Hironobu Takagi, Chieko Asakawa · 2025 · Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI '25) · doi:10.1145/3706598.3714112

Summary

Kamikubo and colleagues investigate how autonomous navigation robots for blind users should balance robot autonomy against user control. They argue that current systems default to what they call 'omakase' (a Japanese term meaning 'I leave it to you'), in which the robot makes all navigational decisions and the blind user simply follows. The authors frame this as an agency problem: if independence for blind travellers means control, choice, and power, then fully autonomous robots may paradoxically undermine the very independence they claim to provide. The paper reports two studies with blind participants recruited from a Japanese museum-accessibility mailing list. Study 1 is a structured-interview study (N=14). It uses deductive thematic analysis to identify social-navigation challenges in crowded public spaces and surfaces two recurring scenarios: 'streams of people' (unpredictable, dynamic crowds) and 'lines' (queues, including U-shaped and split queues). Study 2 (N=13, three of whom also took part in Study 1) is a navigation-task study at the Miraikan science museum. It uses an autonomous robot derived from CaBot/AI Suitcase, with a Wizard-of-Oz overlay that let participants ask questions about their surroundings. The study is structured around three modes of autonomy, offered as culturally grounded design metaphors: Omakase (passive following), Monitor (information-seeking dialogue with the robot), and Boss (actively issuing commands). Participants switched freely between modes across four navigation tasks designed to expose them to stationary crowds, dynamic crowds, and queues. Afterwards, they took part in semi-structured follow-up interviews to reflect on their control preferences and brainstorm future interaction ideas.

Key findings

Although 9 of 13 participants agreed they would follow the robot given its current capabilities (Q1, Omakase), all 13 wanted Monitor mode to understand the environment (Q2), and all 13 wanted Boss mode to enhance the robot's capabilities (Q3). Mode-transition data showed strong situational patterning. Participants shifted from Omakase to Monitor to seek information in 89% of stationary-crowd events and 93% of line events. From Monitor, they moved on to Boss in 89% of stationary-crowd events to get the actors blocking their path to move aside. In 79% of dynamic-crowd events, however, they returned to Omakase once passers-by had cleared. Monitor-mode queries varied systematically by situation: obstacle queries dominated stationary crowds, action queries dominated dynamic crowds, and line-status queries dominated queues. For commands, participants preferred user-led control for inquiring with bystanders (77%) and following physical cues (69%), and, by a narrow margin, for alerting bystanders (54%). They preferred robot-led control for people-following (92%) and moving forward (85%). Qualitative findings highlighted three recurring concerns. The first was social acceptability: users did not want the robot to speak on their behalf in socially delicate situations. The second was fear of diminished human engagement and of reinforcing the stereotype that 'blind people can't do anything on their own'. The third was the need for the robot to proactively explain its own stops, detours, and zigzag movements, since opaque robot behaviour frequently triggered mode switching and anxiety.

Relevance

For accessibility practitioners designing assistive robots, autonomous wheelchairs, AI-driven navigation apps, or any AI system that acts on behalf of a disabled user, this paper reframes 'autonomy' as a design variable rather than a goal. The three-mode framework (Omakase, Monitor, Boss) translates directly to other AI-assistance products, such as AI features in screen readers, image-description tools, or conversational smart-home assistants. The same questions arise there: when should the system decide, when should it answer the user's questions, and when should it carry out the user's commands? The empirical mapping of crowd situations to preferred autonomy modes is a usable design input. Queues and stationary blockages call for user agency, dynamic crowds call for robot control, and transitions between the two must be legible to the user. Practitioners should also note the paper's ethical framing. It draws on Bennett, Brady and Branham's interdependence framework and foregrounds users' concerns that robot-mediated interaction may come across as performative or rude to bystanders. Limitations to flag: the navigation study took place entirely inside one museum with a Wizard-of-Oz Monitor mode, so the user experience may not carry over to a real LLM/VLM-based implementation; participants had very limited prior exposure to the robot; and all participants used white canes, with only two having guide-dog experience, which the authors acknowledge may skew interaction expectations.
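
One way to operationalize the situation-to-mode mapping and the command-level preferences reported above is as default-allocation tables that a system consults before acting. The Python sketch below does this under stated assumptions. The identifiers (`COMMAND_PREFERENCES`, `SITUATION_MODES`, `allocate`, `suggest_mode`), the 0.60 "contested" cut-off, and the one-suggested-mode-per-situation simplification are hypothetical and do not reflect the authors' implementation. The robot only suggests a mode and says why, because Study 2 participants switched modes freely and opaque robot behaviour caused anxiety.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    """The three autonomy modes described in the paper."""
    OMAKASE = "omakase"  # robot decides; user follows
    MONITOR = "monitor"  # user asks the robot about the surroundings
    BOSS = "boss"        # user issues commands to the robot


# Preferred lead per command and the reported percentage for that lead,
# taken from the summary. The dictionary keys are hypothetical names.
COMMAND_PREFERENCES = {
    "inquire_bystanders": ("user", 0.77),
    "alert_bystanders": ("user", 0.54),
    "follow_physical_cues": ("user", 0.69),
    "follow_people": ("robot", 0.92),
    "move_forward": ("robot", 0.85),
}

# Hypothetical default mode to *suggest* per situation, based on the reported
# transitions (stationary crowd: Omakase -> Monitor -> Boss; line: Omakase ->
# Monitor; dynamic crowd: back to Omakase). The user can always override it.
SITUATION_MODES = {
    "stationary_crowd": Mode.BOSS,
    "line": Mode.MONITOR,
    "dynamic_crowd": Mode.OMAKASE,
}

# Hypothetical threshold: below this share, the choice counts as contested.
CONTESTED_BELOW = 0.60


@dataclass
class Allocation:
    command: str
    default_lead: str  # "user" or "robot"
    ask_user: bool     # True when the reported preference is weak


def allocate(command: str) -> Allocation:
    """Return the default lead for a command and flag weak preferences."""
    lead, share = COMMAND_PREFERENCES[command]
    return Allocation(command, lead, share < CONTESTED_BELOW)


def suggest_mode(situation: str) -> tuple[Mode, str]:
    """Suggest a mode plus a spoken explanation so the transition is legible.

    Unknown situations fall back to Omakase (a hypothetical default).
    """
    mode = SITUATION_MODES.get(situation, Mode.OMAKASE)
    message = (
        f"{situation.replace('_', ' ').capitalize()} ahead. "
        f"Suggested mode: {mode.value}. Say a mode name to switch."
    )
    return mode, message
```

For example, `allocate("alert_bystanders")` flags the command for confirmation (`ask_user=True`) because 54% falls below the assumed cut-off. By contrast, `allocate("follow_people")` defaults to robot-led control without asking. Keeping the tables separate from the decision logic means they can be re-tuned as new study data arrives.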

Tags: assistive robotics · blindness and low vision · human-robot interaction · social robot navigation · shared control · autonomy · agency · museum accessibility · social acceptability · inclusive design