Investigating Cursor-based Interactions to Support Non-Visual Exploration in the Real World
Anhong Guo, Saige McVea, Xu Wang, Patrick Clary, Ken Goldman, Yang Li, Yu Zhong, Jeffrey P. Bigham · 2018 · Proceedings of the 20th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2018) · doi:10.1145/3234695.3236339
Summary
This paper from Google and Carnegie Mellon University defines and compares three cursor-based interaction techniques designed to help blind and low vision people attend to specific items within complex real-world visual scenes. Computer vision systems like Seeing AI and OrCam can provide audio overviews of scenes, but blind users often need to focus on specific details: finding a particular object on a table, reading a specific sign, or operating an appliance control panel.
The three techniques are window cursor (moving the phone to scan, with the camera center as the focus point), finger cursor (pointing a finger at real-world objects to get audio feedback about what is near the fingertip), and touch cursor (dragging a finger on the touchscreen to explore a captured image of the scene). The system was implemented as an Android application using multiple computer vision recognizers, including object identification, face detection, landmark detection, food identification, and OCR. Audio feedback combines text-to-speech announcements of detected entities with earcons, brief sound cues that indicate when entities or the user's finger are in the camera's field of view.
The researchers conducted a user study with 12 participants (9 blind, 3 low vision) across four tasks representative of daily activities: locating an object on a kitchen table, reading a bulletin board schedule, operating a microwave panel, and exploring a petting zoo environment.
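The difference between the three cursors can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes, hypothetically, that the recognizers return labeled bounding boxes in image coordinates, that a separate detector supplies the fingertip position, and that the captured image fills the screen. The names `Entity`, `window_cursor`, `touch_cursor`, `entity_under_cursor`, and the `max_distance` threshold are all invented for illustration. Under those assumptions, the three techniques differ only in where the cursor point comes from:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Entity:
    """A recognized item with a bounding box in image coordinates (hypothetical)."""
    label: str
    left: float
    top: float
    right: float
    bottom: float

    def distance_to(self, p: Point) -> float:
        """Distance from point p to the box; 0.0 if p lies inside it."""
        x, y = p
        dx = max(self.left - x, 0.0, x - self.right)
        dy = max(self.top - y, 0.0, y - self.bottom)
        return (dx * dx + dy * dy) ** 0.5


def window_cursor(image_size: Tuple[float, float]) -> Point:
    """Window cursor: the focus point is the center of the camera image."""
    w, h = image_size
    return (w / 2.0, h / 2.0)


def touch_cursor(touch: Point, screen_size: Tuple[float, float],
                 image_size: Tuple[float, float]) -> Point:
    """Touch cursor: scale a screen touch into image coordinates.

    Assumes the captured image is stretched to fill the whole screen.
    """
    tx, ty = touch
    sw, sh = screen_size
    iw, ih = image_size
    return (tx * iw / sw, ty * ih / sh)


def entity_under_cursor(entities: List[Entity], cursor: Point,
                        max_distance: float = 50.0) -> Optional[Entity]:
    """Return the entity nearest the cursor, or None if none is close enough."""
    best = min(entities, key=lambda e: e.distance_to(cursor), default=None)
    if best is None or best.distance_to(cursor) > max_distance:
        return None
    return best


# Finger cursor: the cursor is simply the fingertip position reported by a
# (hypothetical) fingertip detector, already in image coordinates.
entities = [Entity("mug", 100, 100, 200, 200),
            Entity("sign", 500, 300, 620, 380)]
fingertip = (490.0, 340.0)
hit = entity_under_cursor(entities, fingertip)
print(hit.label if hit else "nothing near the cursor")
```

In this sketch the label of the returned entity is what would be passed to text-to-speech. Returning `None` when nothing is within `max_distance` corresponds to staying silent, or playing only an earcon.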
Key findings
Each cursor technique proved best suited to different tasks. Window cursor was preferred for locating objects on large surfaces (7 of 9 blind users preferred it) because it required only one hand and the least coordination, though users found the scanning window too small and had difficulty judging angular alignment. Finger cursor was preferred for accessing appliance control panels (6 preferred it) because it provided a direct physical connection to the target, making users most confident they were pressing the right button, though the finger occluding the text was a practical problem. Touch cursor was preferred for interpreting documents and understanding spatial layouts (5 preferred it) because it supported systematic exploration, though mapping screen positions to real-world locations was unintuitive.
A critical cross-cutting finding was the importance of social acceptability: multiple participants expressed concern about looking strange while pointing fingers, holding phones up in public, or hunching over screens, and some stated they would not adopt tools that drew unwanted attention. Users consistently asked to be able to freeze an image for exploration rather than work with a live camera feed, and wanted intelligent summaries rather than raw entity-by-entity output. Recognition latency (especially OCR at roughly 1 fps) severely degraded usability across all techniques.
Relevance
This study provides foundational design guidance for anyone building computer vision-based accessibility tools for blind users. Its key contribution is demonstrating that no single interaction paradigm works for all tasks: effective real-world exploration requires multiple complementary cursor modes, ideally with automatic switching based on context. The findings challenge the common assumption in accessibility product design that a single camera-based interaction will suffice. For practitioners, the detailed user feedback reveals practical barriers that technical specifications miss: the social stigma of conspicuous device use, the mismatch between screen coordinates and physical space, finger occlusion blocking the very text users want to read, and the need for sub-100 ms latency for interactions to feel natural. The study also highlights that users who lost their sight more recently found the techniques more intuitive, suggesting that familiarity with visual concepts aids adoption of vision-substitution tools.
Tags: blindness · low vision · computer vision · non-visual exploration · mobile accessibility · cursor interaction · object recognition · OCR · screen reader