Presenting Information in Sound

Sara Bly · 1982 · Proceedings of the 1982 Conference on Human Factors in Computing Systems (CHI '82) · doi:10.1145/800049.801814

Summary

Sara Bly's 1982 CHI paper, written at Lawrence Livermore National Laboratory, is one of the earliest systematic investigations of using computer-generated sound, rather than graphics, to present multivariate data to human analysts. Bly opens by arguing that purely visual techniques for high-dimensional data (bar charts, pseudo-colour imagery, 3-D projections, Andrews curves, and Chernoff faces, all illustrated in the paper) have two fundamental limits: the viewer must focus visual attention on a single region of the screen at a time, and only a handful of dimensions can be mapped before the display becomes unreadable. She proposes that computer-generated sound can ease these restrictions by providing auditory cues that do not require visual focus, by increasing the number of dimensions a display can convey, and by offering an alternative perceptual 'view' of the same data. The core technical contribution is a mapping scheme in which each sample of an n-dimensional data set is rendered as a single discrete note whose pitch, volume, duration, fundamental waveshape, attack envelope, and overtone waveshape each correspond to one data variable. Pitch varied across 48 notes spanning four octaves; volume in 12 increments; duration from 50 to 1050 ms in 5 ms steps; and waveshape from a pure sine to a random buzz in ten steps. Bly then reports two empirical studies (Phase 1 and Phase 2) designed to test whether listeners can discriminate data sets by sound alone, and whether combining sound with graphics gives analysts more information than either modality alone.
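The note-per-sample mapping described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not Bly's implementation: the level counts for pitch, volume, duration, and waveshape come from the summary, while the lowest note (C3), the equal-tempered semitone spacing, the ten-level scale for attack envelope and overtone waveshape, and all function and class names are assumptions made for the example. Input values are assumed to be pre-normalised to the range 0 to 1.

```python
from dataclasses import dataclass

# Level counts taken from the summary above.
PITCH_LEVELS = 48        # 48 notes over four octaves
VOLUME_LEVELS = 12       # 12 volume increments
DURATION_MIN_MS = 50
DURATION_MAX_MS = 1050
DURATION_STEP_MS = 5
WAVESHAPE_LEVELS = 10    # pure sine ... random buzz

# Assumptions for illustration only (not stated in the summary).
BASE_FREQ_HZ = 130.81    # hypothetical lowest note (C3)
ASSUMED_LEVELS = 10      # attack envelope and overtone waveshape


def quantize(x: float, levels: int) -> int:
    """Map x in [0, 1] to an integer level in [0, levels - 1]."""
    x = min(max(x, 0.0), 1.0)  # clamp out-of-range input
    return min(int(x * levels), levels - 1)


@dataclass
class Note:
    pitch_index: int
    frequency_hz: float
    volume_level: int
    duration_ms: int
    waveshape_level: int
    attack_level: int
    overtone_level: int


def sample_to_note(sample) -> Note:
    """Turn one six-variable sample (each value in [0, 1]) into one note."""
    if len(sample) != 6:
        raise ValueError("expected exactly six variables per sample")
    p, v, d, w, a, o = sample
    pitch_index = quantize(p, PITCH_LEVELS)
    # 50..1050 ms in 5 ms steps gives 201 distinct durations.
    duration_levels = (DURATION_MAX_MS - DURATION_MIN_MS) // DURATION_STEP_MS + 1
    return Note(
        pitch_index=pitch_index,
        # Assumes 12 equal-tempered semitones per octave.
        frequency_hz=BASE_FREQ_HZ * 2 ** (pitch_index / 12),
        volume_level=quantize(v, VOLUME_LEVELS),
        duration_ms=DURATION_MIN_MS + DURATION_STEP_MS * quantize(d, duration_levels),
        waveshape_level=quantize(w, WAVESHAPE_LEVELS),
        attack_level=quantize(a, ASSUMED_LEVELS),
        overtone_level=quantize(o, ASSUMED_LEVELS),
    )
```

Calling sample_to_note on each row of a normalised data set yields a sequence of notes that a synthesiser could then play in order. Quantising every variable onto a small fixed scale mirrors the discrete steps reported in the paper, but it does nothing to address the perceptual interactions between parameters that Bly flags as a caveat.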

Key findings

Phase 1 tested whether six-dimensional data sets that differed by translation, scaling, or correlation could be distinguished aurally. Seven subjects classified samples correctly at rates that were well above chance for the larger differences and close to chance for the smallest: 92% correct for a 3-standard-deviation (SD) translation, 70% for a 1-SD translation, and 53% for a 0.5-SD translation (near chance, as expected); 69% for 8-SD scaling, 74% for 4-SD, and 55% for 2-SD; and 60% for strongly correlated sets. Five subjects who were retested on the hardest scaling condition after practice improved to 76.5%. Phase 2, with 75 subjects split into sound-only, graphics-only, and combined groups, produced the headline result: the combined sound-plus-graphics group consistently scored highest (69% correct), sound-only was roughly on par with graphics-only (64.5% vs 62%), and among the top-performing subjects the advantage of the combined modality was pronounced. A follow-up in which 10 sound-only subjects received extended training and reference samples raised correct identification to 74%. Bly concludes that sound can encode multivariate data meaningfully, that sound alone can sometimes match or exceed graphics, and that combining modalities is most powerful. She also flags important caveats: sound parameters are not perceptually independent (timbre depends on pitch and volume), and it is not yet known which parameters carry the most meaning or how many can be distinguished simultaneously.
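To make the translation condition concrete, the sketch below builds two six-dimensional data sets whose means differ by a chosen number of standard deviations. It is a hypothetical illustration only: the summary does not say how Bly generated her stimuli, so the normal distribution, the shift applied equally to every dimension, the sample size, and the helper names make_set and column_means are all assumptions.

```python
import random
from statistics import mean

DIMENSIONS = 6  # six-dimensional samples, as in Phase 1


def make_set(n, shift_sd=0.0, sd=1.0, seed=None):
    """Return n samples whose per-dimension mean is shifted by shift_sd * sd.

    Assumption: every dimension is independent and normally distributed.
    """
    rng = random.Random(seed)
    offset = shift_sd * sd
    return [[rng.gauss(offset, sd) for _ in range(DIMENSIONS)] for _ in range(n)]


def column_means(data):
    """Mean of each dimension across all samples."""
    return [mean(column) for column in zip(*data)]


base = make_set(2000, shift_sd=0.0, seed=1)
shifted = make_set(2000, shift_sd=3.0, seed=2)  # a 3-SD translation

# Per-dimension gap between the two sets; each should be close to 3.
gaps = [b - a for a, b in zip(column_means(base), column_means(shifted))]
```

With a 3-SD shift the two sets barely overlap, which is consistent with that being the easiest condition for listeners; at 0.5 SD the distributions overlap heavily, so near-chance performance is unsurprising.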

Relevance

This paper is a foundational reference for sonification, auditory display, and non-visual interaction in HCI, and it remains directly relevant to modern accessibility work, particularly data-visualisation accessibility for blind and low-vision users, auditory dashboards in screen-reader contexts, and multimodal interfaces that pair visual charts with audio tones (a technique now used in tools like Highcharts' Sonify and iOS VoiceOver audio graphs). Bly's empirical demonstration that sound-only presentation is competitive with graphics challenges the default assumption that data must be visual to be intelligible, and her caution that sound parameters are not independent (pitch, volume, and timbre interact) still shapes how modern audio-data designers choose mappings. Limitations: the experiments used small, homogeneous subject pools of scientific analysts rather than blind or low-vision users, the accessibility framing is implicit rather than explicit, and subjects heard discrete note-per-sample encodings rather than the continuous-parameter sonifications common today. Nonetheless, for any practitioner designing accessible data experiences, this paper is essential historical context.

Tags: sonification · auditory display · data visualization · multivariate data · non-visual interaction · historical · computer music · accessible data