Iterative Design of Sonification Techniques to Support People with Visual Impairments in Obstacle Avoidance

Giorgio Presti, Dragan Ahmetovic, Mattia Ducci, Cristian Bernareggi, Luca A. Ludovico, Adriano Baratè, Federico Avanzini, Sergio Mascetti · 2021 · ACM Transactions on Accessible Computing · doi:10.1145/3470649

Summary

This paper presents WatchOut, a sonification system that conveys real-time obstacle information to blind or visually impaired (BVI) people through non-verbal audio cues. White canes detect obstacles only within about 1 meter and miss elevated hazards, whereas computer vision on mobile devices can identify obstacles at greater distances. The challenge addressed here is how to communicate this information through sound without imposing excessive cognitive load or requiring extensive training. The researchers conducted four iterative design cycles with 61 BVI participants through online listening tests, progressively refining how obstacle properties map to auditory dimensions. The system detects obstacles using Apple's ARKit framework, which identifies 3D feature points from camera images and clusters them into obstacle candidates. The sonification uses a synthetic base sound (a filtered sine wave with a percussive attack) that avoids confusion with environmental sounds, delivered through bone-conduction headphones that keep the ear canal open for ambient awareness. The four obstacle properties initially considered were distance (0.1 to 3 m), horizontal position (left, center, or right), height (walkable vs. must circumvent), and width (narrow vs. wide). Each was mapped to an auditory dimension chosen for "ecological validity", meaning that the sound-to-meaning relationship should feel intuitive based on real-world experience rather than rest on arbitrary convention.
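To make the "feature points clustered into obstacle candidates" step concrete, here is a minimal sketch of one way such grouping could work. It is not the authors' algorithm or an ARKit API: the single-linkage strategy, the function name `cluster_points`, and the 0.3 m gap threshold are all assumptions chosen for illustration.

```python
import math

def cluster_points(points, max_gap_m=0.3):
    """Group 3D points into candidate obstacles (hypothetical helper).

    Two points end up in the same cluster when they are connected by a
    chain of neighbours, each at most max_gap_m apart (single linkage).
    The 0.3 m default is an assumed value, not one from the paper.
    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster = [seed]
        frontier = [seed]
        while frontier:
            i = frontier.pop()
            # Collect every still-unassigned point close enough to point i.
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= max_gap_m]
            for j in near:
                unvisited.remove(j)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(sorted(cluster))
    return sorted(clusters)

# Example: three points near (0, 0, 1) and two points near (2, 0, 2),
# with coordinates in meters.
feature_points = [
    (0.0, 0.0, 1.0),
    (0.1, 0.0, 1.0),
    (0.2, 0.0, 1.1),
    (2.0, 0.0, 2.0),
    (2.1, 0.0, 2.0),
]
candidates = cluster_points(feature_points)
```

A real pipeline would also need to filter noise and floor points before clustering; the sketch leaves that out.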

Key findings

The iterative design revealed a critical insight: there is a practical limit to how many concurrent auditory dimensions users can reliably distinguish after short training. With four properties (iteration 1), global recognition accuracy was only 35%. By iteration 3, using improved mappings (pitch for height, reverberation for width), accuracy reached 71%. The breakthrough came in iteration 4, where dropping width and conveying only three properties (distance, position, and height) achieved 96% accuracy. The successful auditory mappings were distance to intermittence rate (50 to 280 pulses per minute, like sonar), position to stereo panning, and height to pitch (low pitch for walkable obstacles, high pitch for those requiring circumvention). Distance and position mappings achieved over 90% accuracy from the first iteration, which supports their ecological validity. Height recognition improved dramatically when the mapping was switched from timbre (filter cutoff frequency) to pitch, a mapping that appears cross-culturally intuitive. Musical expertise significantly affected recognition of certain dimensions, particularly timbre-based mappings, suggesting that such mappings may be too subtle for general populations. Polyphonic (chord-based) mappings for width improved accuracy but increased unpleasantness ratings. The final three-dimension design had the highest comprehensibility rating (4.06/5) and the lowest unpleasantness rating across iterations (1.72/5).
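The final three-property mapping can be sketched as a small function. The distance range (0.1 to 3 m), the pulse-rate range (50 to 280 pulses per minute), and the low/high pitch rule come from the summary above; everything else is an assumption made for illustration, including the linear interpolation, the pan values, the two pitch frequencies, and the names `SoundParams` and `sonify`.

```python
from dataclasses import dataclass

# Ranges reported in the review.
MIN_DISTANCE_M, MAX_DISTANCE_M = 0.1, 3.0
MIN_RATE_PPM, MAX_RATE_PPM = 50.0, 280.0

# Assumed values: hard left/right panning and two octave-spaced pitches.
PAN = {"left": -1.0, "center": 0.0, "right": 1.0}
PITCH_HZ = {"walkable": 220.0, "circumvent": 880.0}

@dataclass(frozen=True)
class SoundParams:
    pulses_per_minute: float  # intermittence rate, encodes distance
    pan: float                # -1.0 (left) to 1.0 (right), encodes position
    pitch_hz: float           # low = walkable, high = must circumvent

def sonify(distance_m: float, position: str, height: str) -> SoundParams:
    """Map one obstacle to sound parameters (hypothetical sketch)."""
    # Clamp to the supported distance range.
    d = min(max(distance_m, MIN_DISTANCE_M), MAX_DISTANCE_M)
    # Closer obstacles pulse faster; the linear shape is an assumption.
    closeness = (MAX_DISTANCE_M - d) / (MAX_DISTANCE_M - MIN_DISTANCE_M)
    rate = MIN_RATE_PPM + closeness * (MAX_RATE_PPM - MIN_RATE_PPM)
    return SoundParams(rate, PAN[position], PITCH_HZ[height])

far_low = sonify(3.0, "left", "walkable")
near_high = sonify(0.1, "right", "circumvent")
```

Here `far_low` pulses at the slowest rate, panned left at the low pitch, while `near_high` pulses at the fastest rate, panned right at the high pitch. Each property drives exactly one auditory dimension, so a listener can attend to them independently.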

Relevance

This research provides actionable design guidance for any assistive technology requiring non-speech audio feedback. The key finding, that concurrent sonified dimensions should be limited to approximately three, challenges systems that attempt to convey complex spatial scenes through sound alone. For practitioners building navigation aids, obstacle detectors, or augmented reality systems for BVI users, this establishes an evidence-based ceiling for auditory information density. The ecological metaphor principle, that auditory mappings should align with real-world expectations, offers concrete guidance: higher pitch for elevated/urgent obstacles (like alarms), faster pulses for closer objects (like parking sensors), and spatial panning matching physical position. These mappings are not arbitrary; they leverage existing cognitive associations. In the real-world evaluation, 13 BVI participants achieved 85%+ obstacle avoidance and gave the system a System Usability Scale score of 72.5 ("Good"). Participants specifically requested the ability to detect head-level obstacles (a common mobility hazard missed by canes) and suggested multimodal feedback combining sonification with vibration. The use of bone-conduction headphones was validated as essential for maintaining environmental awareness, a design requirement that applies broadly to any audio-based assistive technology used during navigation. The iterative, user-centered methodology, which used online listening tests to iterate rapidly on designs with BVI participants, provides a practical model for accessibility research that accommodates participants who may have difficulty traveling to lab settings.

Tags: visual impairment · blindness · sonification · obstacle avoidance · orientation and mobility · computer vision · auditory display · navigation assistance