Towards Accessible Musical Performances in Virtual Reality: Designing a Conceptual Framework for Omnidirectional Audio Descriptions
Khang Dang, Grace Burke, Hamdi Korreshi, Sooyeon Lee · 2024 · Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '24) · doi:10.1145/3663548.3675618
Summary
This paper develops a conceptual framework for omnidirectional audio description (AD) designed to make musical performances in virtual reality accessible to blind and low-vision (BLV) users. Traditional AD, a monaural narration track describing visual elements, was developed for film and television, where the viewer has a fixed perspective. In VR, users can look in any direction, so a description can no longer assume which part of the scene the user is facing. The authors conducted a two-phase study: first, semi-structured interviews with 19 BLV AD users about their experiences with audio description at musical performances, and second, open-ended co-design interviews with 4 BLV AD professionals using scenario-based design to evaluate proposed omnidirectional AD concepts across different performance types. Participants described what visual information they wanted conveyed (stage layout, performer appearance and costumes, facial expressions, dance choreography, audience reactions, instrumentation), when they wanted AD delivered (pre-performance, intermission, during performance, post-performance), and what factors made AD effective (narrator voice quality, volume control, level of detail, customisation, and spatial features). The study identified three core challenges with current AD in musical settings: conflicts between music and narration (C1), limited time for detailed descriptions during performance (C2), and lack of spatial information about where things are happening on stage (C3).
Key findings
The framework proposes three complementary design concepts for omnidirectional AD, each addressing different challenges. Spatial AD uses binaural, surround sound, or ambisonic audio to position descriptions in three-dimensional space, so listeners perceive the direction from which the description originates — matching the spatial location of the visual element being described. This addresses the lack of spatial information (C3) and reduces cognitive load by eliminating the need for verbal directional narration. View-dependent AD adapts its content based on where the user is looking — if they turn toward the singer, they hear descriptions of the singer; if they shift focus to the orchestra, descriptions change accordingly. This mirrors how sighted viewers naturally explore a scene and was described by professionals as making AD interactive and immersive. Explorative AD activates when the user pauses playback, allowing them to look around and receive detailed descriptions of whatever is in their field of view without competing with the performance audio. This directly solves the conflict between music and AD (C1) and the limited time problem (C2) by decoupling descriptions from the performance timeline. Participants strongly emphasised the importance of user control — the ability to choose how much detail they receive, adjust volume independently, and decide when to pause and explore. The paper also categorises five types of musical performances (A through E) by their mix of visual and auditory elements, noting that Type E performances (musicals) with all elements present pose the greatest AD challenge.
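As a rough illustration of how the three concepts could fit together, the Python sketch below models stage elements by their horizontal direction and picks descriptions from the listener's head yaw. It is not the authors' implementation: the framework is conceptual and has not been built. Every name here (StageElement, view_dependent_ad, explorative_ad, stereo_gains), the 90-degree field of view, and the sample stage are assumptions made for illustration. The constant-power stereo pan is a deliberately crude stand-in for the binaural or ambisonic rendering the paper refers to.

```python
import math
from dataclasses import dataclass


@dataclass
class StageElement:
    name: str
    azimuth_deg: float  # direction from the listener; 0 = ahead, positive = right
    brief: str          # short line suited to delivery during the performance
    detailed: str       # longer description suited to paused exploration


def angular_offset(target_deg: float, yaw_deg: float) -> float:
    """Signed smallest angle from the head yaw to the target, in [-180, 180)."""
    return (target_deg - yaw_deg + 180.0) % 360.0 - 180.0


def elements_in_view(elements, yaw_deg, fov_deg=90.0):
    """Elements inside the (assumed) field of view, nearest to the gaze first."""
    half = fov_deg / 2.0
    visible = [e for e in elements
               if abs(angular_offset(e.azimuth_deg, yaw_deg)) <= half]
    return sorted(visible,
                  key=lambda e: abs(angular_offset(e.azimuth_deg, yaw_deg)))


def view_dependent_ad(elements, yaw_deg, fov_deg=90.0):
    """View-dependent AD: brief description of whatever the user is facing."""
    visible = elements_in_view(elements, yaw_deg, fov_deg)
    return visible[0].brief if visible else None


def explorative_ad(elements, yaw_deg, paused, fov_deg=90.0):
    """Explorative AD: detailed descriptions, only while playback is paused."""
    if not paused:
        return []
    return [e.detailed for e in elements_in_view(elements, yaw_deg, fov_deg)]


def stereo_gains(target_deg: float, yaw_deg: float):
    """Spatial AD stand-in: constant-power (left, right) gains for a description.

    Offsets beyond 90 degrees to either side are clamped, so this simple pan
    cannot distinguish front from back; real binaural rendering can.
    """
    offset = max(-90.0, min(90.0, angular_offset(target_deg, yaw_deg)))
    theta = (offset + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(theta), math.sin(theta)


# Hypothetical stage layout used only for this example.
stage = [
    StageElement("singer", 0.0, "The singer raises her arms.",
                 "The singer wears a long red gown and stands centre stage."),
    StageElement("orchestra", -60.0, "The strings begin to play.",
                 "The orchestra sits in a pit to the left, in black attire."),
    StageElement("dancers", 45.0, "The dancers spin in unison.",
                 "Six dancers in silver costumes move along the right side."),
]
```

With this layout, facing straight ahead (yaw 0) selects the singer's brief line, and turning 60 degrees to the left selects the orchestra's. Keeping the brief and detailed text as separate fields reflects the participants' emphasis on controlling the level of detail: the same element can be described tersely during the music and at length once the user pauses.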
Relevance
This paper pushes audio description beyond its traditional boundaries into immersive media, an area that is becoming increasingly important as VR adoption grows. For accessibility practitioners, the three design concepts — Spatial, View-dependent, and Explorative AD — offer a transferable vocabulary for thinking about audio description in any 360-degree or interactive environment, not just VR musical performances. The emphasis on user agency and customisation aligns with broader accessibility principles: rather than delivering a single prescriptive narration, the framework empowers BLV users to explore content at their own pace and according to their own interests. The discussion of AI-driven Explorative AD (using computer vision and NLP to generate descriptions during pauses) points toward a practical implementation path. Limitations include the conceptual nature of the framework — the three AD types have not yet been built and empirically tested with users — and the relatively small participant pool, though the inclusion of both AD users and AD professionals strengthens the design validity.
Tags: audio description · virtual reality · blind and low vision · spatial audio · musical performances · immersive media