Automatic impact sound generation for using in nonvisual interfaces

A. Darvishi, E. Munteanu, V. Guggiana, H. Schauer, M. Motavalli, M. Rauterberg · 1994 · Proceedings of the First Annual ACM Conference on Assistive Technologies (Assets '94) · doi:10.1145/191028.191055

Summary

This paper describes a joint research project between the University of Zurich and the Swiss Federal Institute of Technology (ETH) aimed at developing an audio framework for non-visual interfaces based on physically modeled impact sounds. Rather than using arbitrary audio cues or recorded sound clips, the authors propose generating sounds through physical simulation of real-world interactions — specifically, the sound of spherical objects striking flat plates or beams. The research motivation is that physically accurate sounds carry inherent semantic information: listeners can naturally infer properties of objects (size, material, weight) and interactions (force, surface type) from the sounds they produce. The paper covers three aspects of impact sound research: different synthesis approaches, the process of recording real impact sounds for reference, and spectral analysis comparing recorded sounds to those generated by the physical model. The ultimate goal is an audio framework that describes sounds at a high semantic level, where every sound is characterized as the result of interactions between objects at a particular place and in a particular environment.

Key findings

The authors demonstrate that physical modeling can generate impact sounds that approximate real recorded sounds in their spectral characteristics, validating the approach as a basis for non-visual interface audio. The physical model of spherical objects hitting flat surfaces captures key perceptual properties — listeners can distinguish object size, material density, and surface hardness from the synthesized sounds, just as they can from real-world impacts. The semantic framework approach means that sounds are not arbitrary mappings but carry intrinsic meaning derived from physical properties, potentially reducing the learning burden for users of non-visual interfaces. By parameterizing sound generation, the system allows interface designers to systematically vary audio properties to convey different object characteristics and interaction types without requiring large libraries of pre-recorded samples.
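As a rough illustration of what parameterized impact-sound generation can look like, the Python sketch below uses modal synthesis: a sum of exponentially decaying sinusoids. This is a common textbook technique, not the physical model described in the paper. The function name `impact_sound`, its parameters, and the mode ratios (placeholders, roughly those of an ideal free bar) are all assumptions made for illustration.

```python
import math

def impact_sound(base_freq_hz, damping_per_s, strike_gain,
                 duration_s=0.5, sample_rate=8000,
                 mode_ratios=(1.0, 2.76, 5.40)):
    """Return samples of a struck-object sound as a list of floats.

    Hypothetical parameter mapping (an assumption, not taken from the paper):
      base_freq_hz   lower values suggest a larger object
      damping_per_s  higher values suggest a more absorbent material
      strike_gain    higher values suggest a harder strike
    """
    n_samples = int(duration_s * sample_rate)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        value = 0.0
        for k, ratio in enumerate(mode_ratios, start=1):
            # Higher modes start quieter and decay faster.
            amplitude = strike_gain / k
            envelope = math.exp(-damping_per_s * k * t)
            value += amplitude * envelope * math.sin(
                2.0 * math.pi * base_freq_hz * ratio * t)
        samples.append(value)
    return samples

# Two settings that differ only in pitch and damping.
wood_like = impact_sound(base_freq_hz=220.0, damping_per_s=40.0, strike_gain=0.5)
metal_like = impact_sound(base_freq_hz=880.0, damping_per_s=4.0, strike_gain=0.5)
```

Changing one argument at a time is the kind of systematic variation the paragraph above refers to: the two calls share the same code path and differ only in their parameters, so no recorded samples are needed. The labels `wood_like` and `metal_like` reflect the general observation that lightly damped impacts tend to sound more metallic; they are not results from the paper.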

Relevance

This research addresses a fundamental question in non-visual interface design: how to generate meaningful, information-rich audio that helps blind users build mental models of interface objects and their properties. The physically based approach anticipated modern developments in spatial audio, physics-based sound engines in gaming and VR, and the growing interest in auditory augmented reality for accessibility. The semantic audio framework concept (describing sounds by their physical causes rather than their acoustic properties) aligns with current work on auditory scene description and sound-based object recognition. For accessibility practitioners, the paper highlights that effective non-speech audio goes beyond simple earcons and alerts; physically grounded sounds can convey rich information about object identity and state, which supports more intuitive non-visual interaction.

Tags: sonification · auditory display · non-visual interface · physical modeling · sound synthesis · impact sounds · audio framework · blind users · non-speech audio · earcon