A Generic Direct-Manipulation 3D-Auditory Environment for Hierarchical Navigation in Non-Visual Interaction
Anthony Savidis, Constantine Stephanidis, Andreas Korte, Kai Crispien, Klaus Fellbaum · 1996 · Proceedings of the Second Annual ACM Conference on Assistive Technologies (Assets '96) · doi:10.1145/228347.228366
Summary
This paper presents a generic, reusable 3D-auditory environment designed to support hierarchical navigation in non-visual user interfaces. The authors address a gap in assistive technology development: 3D audio had been used in a few specialized systems for blind users, but those implementations were tightly coupled to specific applications and could not be reused across other non-visual interaction contexts. The system combines 3D audio output with 3D pointing via a hand-glove device, hand gestures, and voice input to create a multimodal interaction environment. The core abstraction is a "ring" metaphor in which selectable objects are arranged in a circular auditory space around the user, each represented by a distinct sound positioned in 3D space. Users navigate by turning the ring left or right, and the focused object is always positioned directly ahead. The environment separates hierarchical navigation from the implementation of object dialogue, so toolkit developers can plug in different interaction behaviors for different object types. The architecture consists of three asynchronous server components connected through a well-defined communication protocol: a ring server managing the audio ring structure, a glove server handling 3D pointing and gesture recognition, and a speech input server processing voice commands.
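The ring metaphor described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `AudioRing`, the even angular spacing, and the convention that a positive turn brings the item on the right into focus are all assumptions made for illustration. The sketch only computes where each item's sound would be placed; actual 3D audio rendering is out of scope.

```python
from dataclasses import dataclass


@dataclass
class AudioRing:
    """Hypothetical ring of selectable items spaced evenly around the listener."""

    items: list[str]
    focus: int = 0  # index of the item currently placed straight ahead

    def azimuth(self, index: int) -> float:
        """Angle in degrees of an item, clockwise from straight ahead (0)."""
        step = 360.0 / len(self.items)
        return ((index - self.focus) % len(self.items)) * step

    def turn(self, steps: int) -> str:
        """Rotate the ring by `steps` items and return the newly focused item.

        Assumed convention: positive steps bring items on the right into focus.
        """
        self.focus = (self.focus + steps) % len(self.items)
        return self.items[self.focus]

    def layout(self) -> dict[str, float]:
        """Map every item to its current azimuth around the listener."""
        return {name: self.azimuth(i) for i, name in enumerate(self.items)}


ring = AudioRing(["File", "Edit", "View", "Help"])
print(ring.layout())  # the focused item sits at azimuth 0.0
print(ring.turn(1))   # rotate by one item
print(ring.layout())
```

Whatever the rotation, the focused item stays at azimuth 0, which matches the description of the focused object always being directly ahead.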
Key findings
The ring-based auditory navigation metaphor proved effective for presenting hierarchically structured selection sets in 3D audio space. The system demonstrated that separating navigation structure from object-specific dialogue implementation enables genuine reusability across different non-visual toolkits. The architecture handled key interaction challenges: keeping the focused object audibly in front of the user as the ring turns, with positions rendered through head-related transfer functions (HRTFs); giving auditory cues for object insertion, deletion, and modification events; and preserving state when navigating between hierarchy levels. The communication protocol between the three server components (ring, glove, and speech) allowed input and output manipulation functions to be combined flexibly. Because of the modular design, toolkit developers only needed to implement object-specific dialogue handlers and could rely on the generic navigation infrastructure for all hierarchical browsing tasks. The environment was developed as part of the ACCESS and GUIB European research projects and was integrated into a larger multimedia non-visual toolkit that was under development at the time.
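The separation between generic navigation and object-specific dialogue, together with state preservation across hierarchy levels, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's architecture: `Node`, `HierarchyNavigator`, the `kind` strings, and the handler signature are hypothetical names, and the three asynchronous servers are collapsed into a single synchronous object for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    """Hypothetical hierarchy node; `kind` selects the dialogue handler."""

    name: str
    kind: str = "container"
    children: list["Node"] = field(default_factory=list)


class HierarchyNavigator:
    """Generic navigation: knows rings and levels, not object dialogues."""

    def __init__(self, root: Node) -> None:
        self.path = [root]          # containers from the root to the current level
        self.focus = {id(root): 0}  # remembered focus index for each container
        self.handlers: dict[str, Callable[[Node], str]] = {}

    def register(self, kind: str, handler: Callable[[Node], str]) -> None:
        """Plug in an object-specific dialogue handler for one object type."""
        self.handlers[kind] = handler

    def focused(self) -> Node:
        level = self.path[-1]
        return level.children[self.focus.get(id(level), 0)]

    def turn(self, steps: int) -> Node:
        """Rotate the current ring (assumes the level has children)."""
        level = self.path[-1]
        index = (self.focus.get(id(level), 0) + steps) % len(level.children)
        self.focus[id(level)] = index
        return self.focused()

    def select(self) -> str:
        """Descend into a container, or hand a leaf to its dialogue handler."""
        node = self.focused()
        if node.children:
            self.path.append(node)
            self.focus.setdefault(id(node), 0)  # keep any earlier focus
            return f"entered {node.name}"
        handler = self.handlers.get(node.kind)
        return handler(node) if handler else f"no handler for {node.kind}"

    def back(self) -> Node:
        """Return to the parent level; its focus is restored as it was left."""
        if len(self.path) > 1:
            self.path.pop()
        return self.focused()
```

In this sketch a toolkit developer would only write handlers such as `nav.register("button", lambda n: f"pressed {n.name}")`; ring rotation, level changes, and remembered focus are handled by the navigator.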
Relevance
This 1996 paper is an early and important contribution to the design of reusable non-visual interaction toolkits, a challenge that persists in accessibility technology development today. The principle of separating generic navigation infrastructure from application-specific interaction logic remains highly relevant to modern assistive technology design. The multimodal approach combining spatial audio, gesture, and voice input anticipated interaction paradigms now common in virtual and augmented reality accessibility. For practitioners, the paper illustrates the value of toolkit-based approaches that lower the barrier for developers to create accessible non-visual interfaces, so that each application does not have to build accessibility features from scratch. The ring metaphor for auditory menu navigation influenced subsequent work in audio-based interfaces and screen reader design.
Tags: non-visual interaction · auditory interface · 3D audio · spatial audio · gesture recognition · voice input · assistive technology · toolkit design · hierarchical navigation