SocialCue: Exploring the Design Space of Social Wayfinding Assistants for Blind and Low Vision People
Veronica Bossio Botero, Sidharth Sharma, Ruoyu Iris Xu, Lisa Maria DiSalvo, Ritvik Sharma, Brian A. Smith · 2026 · Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA ’26) · doi:10.1145/3772363.3799011
Summary
SocialCue is a wearable technology probe that targets a problem most assistive navigation work ignores: not where to walk, but how to read and act inside an unfolding social scene. The authors argue that existing tools for blind and low vision (BLV) people focus on spatial wayfinding and object identification, while the real exclusion many BLV people experience comes from "cuelessness" around nonverbal social signals such as gaze, facial expression, posture, and turn-taking. They open with Teng et al.'s anecdote of a blind person trying to start a conversation with a garbage can, which makes the point that getting a person's location right is not the same as getting the social moment right.
To map the design space, the team ran a two-phase formative study. Phase 1 was an elicitation study with two BLV participants using static images; it identified four core social attributes: identity, social availability, facial expression, and physical description (such as clothing and hairstyle). Phase 2 consisted of co-design interviews with three more BLV participants and produced three design requirements: re-identification through physical attributes, discretion via "hidden" interactions that do not announce assistive use, and flexible noise control.
They then built SocialCue on a Meta Quest 3 with a cloud backend running four parallel modules: YOLOv10 for identity and 3D location, GPT-4o mini for physical descriptions, a Vision Transformer for action recognition, and DeepFace for facial expression. Output is delivered as spatialized audio rendered with a head-related transfer function (HRTF), and input comes through three modalities: physical buttons, finger-pinch gestures, and voice.
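The backend's four parallel modules can be pictured as a fan-out over a single camera frame. The sketch below is illustrative only and is not the authors' implementation: the function names, the `PersonCues` record, and the returned values are hypothetical stand-ins, and each stub marks where a real model call (detector, vision-language model, action recognizer, expression classifier) would go.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PersonCues:
    """Hypothetical record merging the outputs of the four modules."""
    identity: Optional[str] = None
    location_3d: Optional[Tuple[float, float, float]] = None
    description: Optional[str] = None
    action: Optional[str] = None
    expression: Optional[str] = None


# Each stub stands in for one module; the return values are placeholders.
async def detect_person(frame: bytes) -> dict:
    await asyncio.sleep(0)  # placeholder for the detector call
    return {"identity": "person_1", "location_3d": (0.5, 0.0, 2.0)}


async def describe_appearance(frame: bytes) -> dict:
    await asyncio.sleep(0)  # placeholder for the description model call
    return {"description": "short dark hair, green jacket, glasses"}


async def recognize_action(frame: bytes) -> dict:
    await asyncio.sleep(0)  # placeholder for the action recognizer call
    return {"action": "talking"}


async def read_expression(frame: bytes) -> dict:
    await asyncio.sleep(0)  # placeholder for the expression classifier call
    return {"expression": "happy"}


async def analyze_frame(frame: bytes) -> PersonCues:
    """Run all four modules concurrently and merge whatever succeeds."""
    results = await asyncio.gather(
        detect_person(frame),
        describe_appearance(frame),
        recognize_action(frame),
        read_expression(frame),
        return_exceptions=True,  # one failing module should not block the rest
    )
    cues = PersonCues()
    for result in results:
        if isinstance(result, Exception):
            continue  # leave that module's fields as None
        for key, value in result.items():
            setattr(cues, key, value)
    return cues
```

Calling `asyncio.run(analyze_frame(frame_bytes))` returns one merged `PersonCues` record. Passing `return_exceptions=True` is a design choice made for this sketch so that a slow or failing module degrades the result instead of discarding it; the paper summary does not say how the real system handles module failures.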
Key findings
Identity is the foundational attribute, but BLV participants wanted substantially more: "visual confirmation" of facial expression to interpret rapport, social availability cues so they know whether someone is reachable for conversation, and "character-based" physical descriptions (hair, clothing, glasses) so a person remains a stable referent in their mental map across reappearances even when names are unknown. Participants explicitly rejected always-on narration; one warned that constant audio would be "a little annoying." Discretion was a strong theme: P2 said hand gestures look "glitchy" to onlookers and asked for a button hidden on a frame so that queries do not signal assistive use. The design responded with three configurable delivery modes that can run in parallel: passive dwell (a 3-second gaze on a person triggers a default attribute), active query (pull-on-demand), and progressive disclosure (granularity scales with sustained head-pointing, mimicking gaze-based attention). The system itself has not yet been evaluated; quantitative outcomes await planned field studies. The contribution at this stage is a design space and a working probe.
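The three delivery modes can be sketched as small functions over dwell time. This is a minimal sketch under stated assumptions, not the probe's actual logic. Only the 3-second dwell threshold comes from the summary. The disclosure order, the 2-second step between levels, the choice of identity as the default attribute, and the decision to start progressive disclosure at the dwell threshold are all hypothetical.

```python
from typing import Dict, List, Optional

DWELL_SECONDS = 3.0  # from the summary: a 3-second gaze triggers a default attribute
STEP_SECONDS = 2.0   # assumption: extra dwell time needed per additional attribute

# Assumption: one plausible ordering of the four attributes, coarse to fine.
DISCLOSURE_ORDER = [
    "identity",
    "social availability",
    "facial expression",
    "physical description",
]


def passive_dwell(dwell: float, default_attribute: str = "identity") -> Optional[str]:
    """Return the default attribute once the dwell threshold is reached."""
    return default_attribute if dwell >= DWELL_SECONDS else None


def active_query(attribute: str, cues: Dict[str, str]) -> Optional[str]:
    """Pull-on-demand: return the requested attribute if it is available."""
    return cues.get(attribute)


def progressive_disclosure(dwell: float) -> List[str]:
    """Return more attributes the longer the user keeps pointing at a person."""
    if dwell < DWELL_SECONDS:
        return []
    extra_steps = int((dwell - DWELL_SECONDS) // STEP_SECONDS)
    count = min(1 + extra_steps, len(DISCLOSURE_ORDER))
    return DISCLOSURE_ORDER[:count]
```

With these assumed constants, `progressive_disclosure(5.0)` yields the first two attributes, and the list stops growing once all four have been disclosed. Keeping the modes as separate functions mirrors the summary's point that they are configurable and can run in parallel.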
Relevance
For accessibility practitioners, this paper is a useful reframing of what "navigation" assistance even means. The literature on BLV navigation has been dominated by sidewalk routing, obstacle avoidance, and object labeling; SocialCue insists that being included socially is a navigation problem in its own right and that current tools (PeopleLens, Person-Finder glasses, WorldScribe) only partially address it. The four-attribute decomposition (identity / social availability / facial expression / physical description) is directly usable as a checklist for any team building social-awareness features into BLV assistive tech, and the discretion and progressive-disclosure design requirements are transferable to AAC, deaf and hard of hearing, and cognitive assistance contexts where bystander perception and information overload matter. The paper flags two concerns without resolving them. First, GPT-4o mini and DeepFace will produce wrong, biased, or stereotyped descriptions of people (race, gender, age, attire), with consequences potentially worse than miscaptioning an object, since the subject is a real person standing nearby. Second, bystander privacy under a continuously streaming wearable camera is acknowledged but left open. Sample sizes are tiny (N=2 then N=3), and SocialCue itself has not yet been deployed.
Tags: blind and low vision · social navigation · wayfinding · wearable technology · computer vision · spatial audio · technology probe · nonverbal communication · assistive technology