From Screen Reading to "Scene Reading" in SceneVR: Touch-Based Interaction Techniques for Use in Virtual Reality by Blind and Low-Vision Users

Melanie Jo Kneitmix, Jacob O. Wobbrock · 2025 · ASSETS 2025: 27th International ACM SIGACCESS Conference on Computers and Accessibility · doi:10.1145/3663547.3746364

Summary

This paper introduces "scene reading," an interaction paradigm that extends touch-based screen reading from 2-D interfaces to 3-D virtual reality environments. The authors developed SceneVR, a system that streams the live view from a Meta Quest 2 VR headset to an iPad, allowing blind and low-vision (BLV) users to explore virtual scenes by dragging a finger across the touchscreen. As the user touches different areas, SceneVR uses ray casting to identify the virtual object beneath the finger and announces its label via spatial audio that conveys the object's position in the virtual space. The system supports three scene-reading modes: spatial scene reading (a continuous finger drag for free-form exploration), sequential scene reading (one-finger flicks to move through objects in order), and overview scene reading (a circular gesture to hear all visible objects). A split-tap gesture provides a detailed description of an object, similar to how VoiceOver and TalkBack reveal additional information. To manage the complexity of 3-D scenes, SceneVR implements progressive disclosure through two mechanisms: user-proximity-based level of detail (child objects are revealed as the avatar approaches) and object groups (hierarchical navigation in which users flick up to enter a group and down to exit). The system also supports locomotion via two-finger gestures for rotation, walking, and teleportation. The authors evaluated SceneVR with 12 BLV adults (1 low vision, 11 legally blind) aged 32 to 75 in a task-based usability study across two virtual environments: a medieval market and a fast-food restaurant.
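
The two progressive-disclosure mechanisms described above can be illustrated with a minimal Python sketch. This is not SceneVR's implementation: the names `SceneObject`, `reveal_distance`, `visible_objects`, and `GroupCursor` are hypothetical, the left/right flick directions for sequential reading are an assumption (the summary only specifies up to enter and down to exit a group), and for brevity the same `children` list serves both the proximity-based reveal and the group hierarchy.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SceneObject:
    label: str
    position: Tuple[float, float, float]
    # Hypothetical parameter: children are revealed when the avatar is
    # within this distance; None means children are never auto-revealed.
    reveal_distance: Optional[float] = None
    children: List["SceneObject"] = field(default_factory=list)


def visible_objects(objects, avatar_pos):
    """Proximity-based level of detail: list each object, followed by its
    children only if the avatar is close enough to that object."""
    result = []
    for obj in objects:
        result.append(obj)
        if (obj.reveal_distance is not None
                and math.dist(obj.position, avatar_pos) <= obj.reveal_distance):
            result.extend(visible_objects(obj.children, avatar_pos))
    return result


class GroupCursor:
    """Object groups: flick left/right moves among siblings (assumed
    directions), flick up enters a group, flick down exits it."""

    def __init__(self, roots):
        self._stack = []        # saved (siblings, index) for each parent level
        self._siblings = roots
        self._index = 0

    @property
    def current(self):
        return self._siblings[self._index]

    def flick_right(self):
        self._index = min(self._index + 1, len(self._siblings) - 1)
        return self.current.label

    def flick_left(self):
        self._index = max(self._index - 1, 0)
        return self.current.label

    def flick_up(self):
        # Enter the group only if the current object has children.
        if self.current.children:
            self._stack.append((self._siblings, self._index))
            self._siblings = self.current.children
            self._index = 0
        return self.current.label

    def flick_down(self):
        # Return to the parent level; no-op at the top level.
        if self._stack:
            self._siblings, self._index = self._stack.pop()
        return self.current.label


# Illustrative scene (made-up objects and coordinates).
stall = SceneObject(
    "fruit stall", (4.0, 0.0, 0.0), reveal_distance=2.0,
    children=[SceneObject("apples", (4.0, 1.0, 0.2)),
              SceneObject("bread", (4.3, 1.0, 0.2))],
)
fountain = SceneObject("fountain", (0.0, 0.0, 6.0))
scene = [stall, fountain]
```

In this sketch, calling `visible_objects(scene, avatar_pos)` from far away yields only the top-level objects, while moving the avatar within `reveal_distance` of the stall adds its children to the list. A `GroupCursor` returns the label that would be announced after each flick, which a real system would pass to its speech and spatial audio layer.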

Key findings

Participants completed 91 of 96 tasks successfully (94.8%), demonstrating that scene reading effectively supports BLV exploration of virtual environments. Satisfaction was high (M = 5.92 on a 7-point scale): participants described the system as enjoyable and emphasized that they could use it independently, something they could not achieve with commercial VR controllers. NASA-TLX scores indicated low overall workload; mental demand was the highest-rated dimension but still moderate (M = 2.83 on a 7-point scale). The iGroup Presence Questionnaire revealed high general presence (M = 5.50) and spatial presence (M = 5.35), though experienced realism was lower (M = 3.33). Sequential reading was the most frequently used interaction method (50.91% of scene-reading interactions), followed by overview reading (27.59%) and spatial reading (21.50%). Short object labels accounted for 97.32% of annotation usage, with longer descriptions rarely accessed. A critical finding was the tight coupling between sensory feedback and annotations: when users perceived environmental elements through residual vision, spatial audio, or contextual inference, they expected corresponding scene-reading annotations. Mismatches, such as hearing an ambient sound but finding no annotation for its source, disrupted the sense of presence and coherence. Nearly all participants had difficulty remembering the full gesture set, raising concerns about cognitive load during initial learning.

Relevance

This work addresses a significant gap in VR accessibility by demonstrating that touch-based interaction, already familiar to BLV users through mobile screen readers, can be effectively adapted for 3-D virtual environments. The concept of scene reading offers a practical framework that VR developers and platform makers can build upon to make immersive content accessible. The finding that annotations and environmental sensory feedback must be tightly coupled provides an important design principle for multi-sensory accessible VR: every perceivable element needs an annotation, and every annotation should have corresponding sensory feedback. The progressive disclosure approach through object hierarchies offers a scalable model for managing information complexity in rich virtual environments. For the broader accessibility community, this research demonstrates that BLV users can meaningfully participate in VR experiences with appropriate interaction design, challenging assumptions that VR is inherently visual and inaccessible.

Tags: virtual reality · blind and low vision · touchscreen interaction · spatial audio · scene understanding · screen reader · progressive disclosure · accessible gaming · object hierarchy