Augmenting Imagery with Multimodal Vibrotactile Representations: Touch, Feel, and Hear

Mazen Salous, Matthias Kramer, Wilko Heuten, Charles Hudin, Susanne Boll, Larbi Abdenebaoui · 2026 · Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26) · doi:10.1145/3772318.3791493

Summary

This CHI 2026 paper addresses a long-standing gap in image accessibility for blind and visually impaired (BVI) users: while alt text, sonification, tactile graphics, and refreshable Braille displays convey shape, layout, and identity, they rarely communicate how an object actually feels or sounds. Existing tools can tell a user an image shows a "leather sofa" or trace its silhouette in raised lines, but cannot convey the subtle cues of texture, hardness, or material substance. The authors propose augmenting material regions of photographs with multimodal vibrotactile patterns that users can touch and feel via a multi-local vibrotactile tablet, adding a "how it feels" dimension to existing image-access pipelines rather than replacing them.

The study compares four pattern-generation approaches (APs) side by side. AP1 (LLM_One-Shot) uses OpenAI's o3 with a one-shot example prompt to synthesize a vibrotactile waveform directly from a material label. AP2 (LLM_Audio) generates an intermediate audio clip from text via the MyEdit model, then converts it to a vibrotactile signal. AP3 (Real_Recording) captures real finger-material interaction audio and translates it to vibration. AP4 (HDB) uses pre-recorded patterns from the public Cluster Haptic Texture Database.

A custom multi-local vibrotactile tablet with 16 piezoelectric actuators on a 15-inch OLED screen played patterns for 10 materials (wood, stone, glass, plastic, ceramic, metal, leather, denim, felt, water). Eight BVI participants (ages 45-81, five congenitally blind) explored each image with all four patterns alongside a real material sample, ranking patterns from best to worst match using a Borda-type scoring rule and providing think-aloud justifications.
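The Borda-type ranking described above can be illustrated with a short sketch. The exact point scheme is not given in this summary, so the code assumes the common variant in which the best of n patterns earns n points and the worst earns 1, with no ties. The participant rankings are invented for illustration and are not data from the study; `borda_scores` and `summarize` are hypothetical helper names.

```python
from statistics import median, quantiles


def borda_scores(ranking):
    """Map a best-to-worst ranking to points (assumed scheme: best = n, worst = 1)."""
    n = len(ranking)
    return {pattern: n - position for position, pattern in enumerate(ranking)}


def summarize(rankings):
    """Return {pattern: (median score, interquartile range)} across participants."""
    per_pattern = {}
    for ranking in rankings:
        for pattern, points in borda_scores(ranking).items():
            per_pattern.setdefault(pattern, []).append(points)
    summary = {}
    for pattern, points in per_pattern.items():
        # method="inclusive" treats the data as the full population and
        # interpolates linearly between observed values.
        q1, _, q3 = quantiles(points, n=4, method="inclusive")
        summary[pattern] = (median(points), q3 - q1)
    return summary


# Invented rankings for one material from four hypothetical participants.
example_rankings = [
    ["AP3", "AP1", "AP4", "AP2"],
    ["AP1", "AP3", "AP2", "AP4"],
    ["AP3", "AP4", "AP1", "AP2"],
    ["AP4", "AP3", "AP1", "AP2"],
]

summary = summarize(example_rankings)
for pattern in sorted(summary):
    med, iqr = summary[pattern]
    print(f"{pattern}: median={med}, IQR={iqr}")
```

Under this scheme a higher median means a better perceived match and a smaller IQR means closer agreement between participants, which is how the Median and IQR figures in the findings are read.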

Key findings

No single generation method dominated across materials. AP3 (Real_Recording) was the most stable approach, with scores clustering between 2 and 3 (Median=3, IQR=0.81), and was most authentic for materials with rich interaction sounds such as denim and glass. AI-generated patterns (AP1, AP2) performed comparably to the HDB baseline, with AP1 (LLM_One-Shot) topping ceramic (Median=4, IQR=2.5) and AP2 (LLM_Audio) leading for wood and metal (Median=3.25). HDB excelled for rough, rigid materials: plastic, leather, and stone.

Thematic analysis of think-aloud data produced five themes: (T1) realism: participants expected rough/grainy vibrations for wood and stone, smooth/steady for glass, and wave-like dynamics for water; (T2) distinctiveness: uniform buzzes were criticized, with participants demanding separable cues between materials; (T3) personal material associations, which varied widely, with some participants expecting softness for denim and others coarseness; (T4) cognitive effort and calibration: faint patterns were "hard to find" or "almost not to feel," requiring per-material intensity tuning; (T5) preferences, including hybrid pipelines and explicit incorporation of actuator motor noise as a useful auditory cue.

Design implications include: strong, coarse, crackling signals for rough rigid textures; stable minimal-sharpness vibrations for smooth hard solids; dynamic temporal patterns (waves, ringing decays) for materials defined by bulk properties rather than surface texture; calibration of intensity per material and per user to avoid both "almost disgusting" over-intensity and imperceptibility; and hybrid pipelines combining AI-generated patterns with real recordings and user tunability.
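The per-material, per-user calibration implication can be sketched as a simple clamp. This is an illustrative assumption, not the authors' procedure: gains are taken to be normalized to the range 0 to 1, and every name and number below (`calibrate_gain`, the thresholds, the material defaults) is hypothetical.

```python
def calibrate_gain(material_gain, detection_threshold, comfort_ceiling, user_offset=0.0):
    """Clamp a material's default gain into one user's perceptible, comfortable band.

    All arguments are normalized amplitudes in [0, 1] (an assumption of this sketch).
    """
    if not 0.0 <= detection_threshold <= comfort_ceiling <= 1.0:
        raise ValueError("expected 0 <= detection_threshold <= comfort_ceiling <= 1")
    requested = material_gain + user_offset
    # Raise patterns that would be too faint to find; cap ones that would feel unpleasant.
    return min(max(requested, detection_threshold), comfort_ceiling)


# Hypothetical default gains per material and one user's measured band.
material_defaults = {"glass": 0.25, "stone": 0.90, "felt": 0.50}
user_band = {"detection_threshold": 0.30, "comfort_ceiling": 0.80}

calibrated = {
    material: calibrate_gain(gain, **user_band)
    for material, gain in material_defaults.items()
}
print(calibrated)
```

In this example the faint glass pattern is lifted to the user's detection threshold, the strong stone pattern is capped at the comfort ceiling, and the felt pattern passes through unchanged.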

Relevance

For accessibility practitioners, this paper opens a concrete research path beyond shape-and-outline tactile graphics toward representing material substance, a dimension BVI users repeatedly describe as missing from photo-based platforms such as online shopping catalogues, news imagery, and social media. The quoted participant observation ("When my sister shows me her new dress… if it could ever be on phones, would let me get a sense of it immediately") frames a real user need that existing alt-text pipelines and AI captions do not address. The paper is also methodologically useful: it demonstrates a side-by-side evaluation of AI-generated, audio-derived, and database-sourced haptics with BVI participants rather than sighted users, a rarity in haptics research. Limitations are significant: a sample of only eight participants that skewed older (ages 45-81) and toward congenital blindness, a custom piezoelectric tablet that is not yet consumer-available, and an exploratory ranking design that precludes inferential claims. Nonetheless, the clear design guidelines and the public GitHub release of patterns and prompts give HCI and accessibility researchers a tractable starting point for extending image accessibility into multimodal material perception.

Tags: image accessibility · blind and low vision · vibrotactile feedback · haptic rendering · material perception · sensory substitution · multimodal interaction · AI-generated haptics · audio-to-haptics