Auditory and Tactile Interfaces for Representing the Visual Effects on the Web
Chieko Asakawa, Hironobu Takagi, Shuichi Ino, Tohru Ifukube · 2002 · Proceedings of the Fifth International ACM Conference on Assistive Technologies (Assets 02) · doi:10.1145/638249.638263
Summary
This paper from IBM Japan and Hokkaido University explores how blind users can perceive the visual structure and emphasis effects of web pages through auditory and tactile interfaces. The researchers identified a critical gap: web pages increasingly use visual effects such as background colors, font sizes, and text styling to convey semantic groupings and emphasis, but screen readers and voice browsers ignore this information entirely and present only the text content. The study first surveyed sighted users to determine which visual cues matter most for understanding page structure, finding that background and foreground colors are the primary means of recognizing content groupings. Two experimental approaches were then designed and tested with five blind participants. The "macro approach" represents the overall page layout by assigning distinct melodies (Background Color Music, or BCM) to different color-based content groups and non-speech foreground sounds (FS) to indicate links, images, and text within those groups. BCM and FS play simultaneously, BCM in the right ear and FS in the left, leveraging the cocktail party effect to allow selective attention. A custom tactile device with a 16x2 pin array driven by a piezo vibrator provided Background Color Vibration (BCV) as an alternative to BCM. The "micro approach" addresses text-level emphasis by analyzing rich text attributes (font size, style) to classify text into emphasis levels (ELs), represented by bell sounds (Audio Emphasis Levels, or AEL) or vibration patterns layered onto speech output.
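The two approaches can be pictured with a minimal sketch. It is illustrative only and is not the authors' implementation: the Element type, the sound and melody identifiers, the grouping of consecutive elements by background color, and the emphasis rule (one level per font-size step above a default of 3, plus one for bold) are all assumptions, since this summary does not give the paper's data model or thresholds.

```python
from dataclasses import dataclass
from itertools import groupby

# Every name and rule below is hypothetical, chosen only for illustration.

@dataclass
class Element:
    kind: str            # "link", "image", or "text"
    background: str      # background color of the element, e.g. "#ffffff"
    font_size: int = 3   # HTML-style relative size; 3 is assumed to be the default
    bold: bool = False

# Assumed identifiers for the non-speech foreground sounds (FS).
FOREGROUND_SOUNDS = {"link": "fs_link", "image": "fs_image", "text": "fs_text"}

def macro_plan(elements):
    """Macro approach sketch: one melody per distinct background color.

    Consecutive elements sharing a background color form one group.
    Returns a list of (melody_id, [foreground_sound, ...]) pairs in page order.
    """
    melodies = {}  # background color -> melody id
    plan = []
    for color, run in groupby(elements, key=lambda e: e.background):
        # Reuse the melody if this color was seen before, otherwise assign the next one.
        melody = melodies.setdefault(color, f"bcm_{len(melodies) + 1}")
        plan.append((melody, [FOREGROUND_SOUNDS[e.kind] for e in run]))
    return plan

def emphasis_level(element, default_size=3):
    """Micro approach sketch: derive an emphasis level from font size and style."""
    level = max(0, element.font_size - default_size)
    if element.bold:
        level += 1
    return level

page = [
    Element("link", "#003366"),
    Element("link", "#003366"),
    Element("text", "#ffffff", font_size=5, bold=True),
    Element("image", "#ffffff"),
    Element("text", "#ffffff"),
    Element("link", "#003366"),
]

plan = macro_plan(page)
levels = [emphasis_level(e) for e in page]
```

In this sketch the navigation bar color that recurs at the end of the page reuses its earlier melody, which mirrors the idea that a listener learns to associate a melody with a content group. A player would then route each melody to one ear and the foreground sounds to the other, and attach a bell or vibration cue to any text whose emphasis level is above zero.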
Key findings
In the macro approach experiments, subjects achieved high recognition rates for identifying the main content group (83 to 100% by the second trial) using BCM/BCV with foreground sounds. Recognition of color-based groupings improved with practice but was harder for small groups with short durations; groups lasting under 0.2 seconds were essentially unrecognizable. Subjects generally preferred the auditory interface over the tactile interface for the macro approach, finding it more intuitive and natural for grasping overall page structure. In the micro approach experiments on emphasis-level recognition, the added cues made the difference: no subject could identify emphasis levels from plain speech alone (even after three repetitions), but most could identify them correctly when AEL bell sounds or vibration cues were added. For the micro approach, three of five subjects preferred the tactile method and two preferred the auditory one, the opposite of the macro approach results. This suggests that the auditory sense is better suited to intuitive, holistic recognition of spatial layout, while the tactile sense may be more effective for focused, detail-oriented tasks such as recognizing text emphasis during reading. The researchers noted that using one sense for reading content while another carries secondary emphasis information may reduce cognitive overload.
Relevance
This paper, led by Chieko Asakawa (a pioneering blind researcher at IBM who had earlier developed the IBM Home Page Reader), addresses a problem that remains largely unsolved today: screen readers still primarily convey text content while ignoring the visual design semantics of web pages. The insight that color-based visual groupings carry semantic meaning that blind users miss is directly relevant to modern web accessibility. For practitioners, this research highlights that WCAG compliance alone (for example, ensuring that text alternatives exist) does not fully close the information gap, because visual layout communicates relationships and emphasis that current assistive technologies do not convey. The finding that different senses suit different types of information (auditory for spatial overview, tactile for focused detail) has implications for designing multimodal assistive technologies. While the specific implementation using melodies mapped to colors has not been widely adopted, the underlying concept of nonvisually representing visual structure continues to influence research in accessible data visualization and spatial web navigation.
Tags: sonification · tactile interface · blindness · web accessibility · nonvisual interaction · visual effects · screen reader · multimodal interaction