Color-Audio Encoding Interface for Visual Substitution: See ColOr Matlab-based Demo
Juan Diego Gomez, Guido Bologna, Thierry Pun · 2010 · Proceedings of the 12th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2010) · doi:10.1145/1878803.1878853
Summary
This demonstration paper presents the See ColOr (Seeing Color with an Orchestra) system, a framework that translates colour information from images into spatialized musical instrument sounds to provide people who are blind with a form of colour perception through hearing. The system works by encoding the three components of the HSL (Hue, Saturation, Luminosity) colour model into distinct audio properties. When a user selects a pixel in an image, a horizontal row of 25 pixels centred on that point is extracted and each pixel is sonified. Hue is mapped to seven musical instruments: oboe for red, viola for orange, pizzicato violin for yellow, flute for green, trumpet for cyan, piano for blue, and saxophone for purple. Colours falling between two hue ranges produce a linear mixture of the corresponding instruments. Saturation is encoded through musical pitch using four notes (C, G, B-flat, E) across four saturation bands. Luminosity is represented by mixing in either a double bass (for darker colours, L<0.5) or a singing voice (for lighter colours, L>0.5), with extreme values producing unmixed bass or voice respectively. Spatial position along the 25-pixel row is conveyed through 2D spatialized audio using Head-Related Transfer Functions (HRTFs), so leftmost pixels sound from the left and rightmost from the right. The system requires 900 pre-computed sound files (36 base sounds times 25 spatial positions) and was implemented as an open-source Matlab demo.
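The encoding described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Matlab code, and several details are assumptions that the summary does not state: the hue centres (the standard HSL angles for each named colour), the equal-width saturation bands, the linear ramp for the bass/voice mix, and the simple pan value standing in for the HRTF filtering. The function and constant names are hypothetical. The sound-file count is also consistent with one reading, again an assumption: 7 hue instruments plus double bass and voice, each at 4 notes, gives 36 base sounds, and 36 × 25 positions gives 900.

```python
import colorsys

# Hue centres in degrees. ASSUMPTION: standard HSL angles for each named colour;
# the summary gives the instruments but not the hue boundaries.
HUE_ANCHORS = [
    (0, "oboe"), (30, "viola"), (60, "pizzicato violin"), (120, "flute"),
    (180, "trumpet"), (240, "piano"), (300, "saxophone"), (360, "oboe"),
]
SATURATION_NOTES = ("C", "G", "B-flat", "E")  # low to high saturation
ROW_WIDTH = 25                                # pixels sonified per selection


def hue_mix(hue_deg):
    """Return instrument weights: a linear mix of the two neighbouring hues."""
    hue_deg %= 360
    for (lo, first), (hi, second) in zip(HUE_ANCHORS, HUE_ANCHORS[1:]):
        if lo <= hue_deg < hi:
            t = (hue_deg - lo) / (hi - lo)
            return {first: 1.0} if t == 0 else {first: 1.0 - t, second: t}
    raise ValueError("hue must be a finite number")


def saturation_note(s):
    """Pick one of four notes. ASSUMPTION: four equal-width bands over [0, 1]."""
    return SATURATION_NOTES[min(int(s * 4), 3)]


def luminosity_mix(l):
    """Return (sound, weight). ASSUMPTION: the weight ramps linearly to the extremes."""
    if l < 0.5:
        return ("double bass", 1.0 - 2.0 * l)
    if l > 0.5:
        return ("singing voice", 2.0 * l - 1.0)
    return (None, 0.0)


def encode_pixel(r, g, b, column):
    """Describe the sound for one pixel (RGB in [0, 1]) at a row column 0..24."""
    h, l, s = colorsys.rgb_to_hls(r, g, b)  # note the H, L, S return order
    half = ROW_WIDTH // 2
    return {
        "instruments": hue_mix(h * 360.0),
        "note": saturation_note(s),
        "luminosity": luminosity_mix(l),
        "pan": (column - half) / half,  # -1 = far left, +1 = far right (HRTF stand-in)
    }
```

Calling `encode_pixel(1.0, 0.0, 0.0, 12)` describes a fully saturated red pixel at the centre of the row; a real implementation would then look up the matching pre-computed, HRTF-filtered sound files rather than return a dictionary.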
Key findings
The system demonstrates a systematic approach to encoding a theoretically infinite gamut of perceivable colours into distinguishable audio signals by mapping the three HSL dimensions to instrument type, pitch, and bass/voice mixing. Using musical instruments rather than spoken colour names increases the information transfer rate: users can hear complex colour blends and spatial arrangements simultaneously rather than waiting for sequential verbal descriptions. The HRTF-based spatial audio allows users to perceive the horizontal arrangement of colours in a scene, providing spatial context that goes beyond simple colour identification. The framework was tested with embossed images showing obstacle edges and with a stereoscopic vision mobility prototype for depth estimation, demonstrating applicability beyond static image exploration. The open-source Matlab implementation was provided for training blind users to learn the colour-to-instrument associations.
Relevance
See ColOr represents an inventive approach to a fundamental accessibility gap: the inability of people who are blind to perceive colour, which is critical for locating and identifying objects in the environment. For accessibility practitioners, the system illustrates several design principles worth noting. First, using musical instruments as colour representations leverages a rich perceptual dimension, timbre, which people can distinguish more intuitively than, say, pure tones that differ only in frequency. Second, the multi-dimensional encoding (instrument for hue, pitch for saturation, bass/voice for luminosity) shows how multiple visual properties can be mapped to distinct audio dimensions without overloading a single channel. Third, the spatial audio component adds positional information that moves the system beyond simple colour identification toward scene understanding. The practical limitations include the learning burden of mastering instrument-colour associations and the cognitive load of processing complex multi-instrument soundscapes. As a research prototype, See ColOr was a precursor to the same team's later Kinect-iPad scene understanding work, which extended these colour sonification principles into 3D object recognition and spatial mapping.
Tags: sonification · visual substitution · blindness · color perception · sensory substitution · spatial audio · HRTF · assistive technology