AR Haptic-Audio Conversion for Non-Visual Product Understanding in Smartphone AR

Satomi Tokida, Ayaka Tsutsui, Norimasa Kobori, Matthew Gillingham, Bektur Ryskeldiev · 2026 · Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA '26) · doi:10.1145/3772363.3798874

Summary

Tokida and colleagues (Mercari R4D, University of Tokyo, University of Tsukuba) tackle a specific gap in e-commerce accessibility: the AR 'view-in-room' features now common in online marketplaces (IKEA Place, Amazon AR View) assume sighted interaction and leave blind and low-vision (BLV) users with no way to independently assess a product's shape, size, or material. The paper combines a formative interview study (seven BLV participants, semi-structured Zoom interviews, reflexive thematic analysis) with a working iOS prototype and a four-participant pilot evaluation. The formative findings map directly onto design requirements: colour and pattern nuance is lost in text; material and texture information is essentially absent online; numeric dimensions do not convey spatial feel; and BLV shoppers want to 'touch' products as they would in a physical store. The prototype, built on ARKit, RealityKit, and Core Haptics, loads USDZ 3D models, places them one metre in front of the user, and drives feedback from a raycast at the screen centre. Geometry is converted into continuous vibrotactile feedback whose intensity rises as the phone approaches the object; product attributes (colour, material, texture, part name and description) are announced via speech synthesis through VoiceOver or AVSpeechSynthesizer. The pilot compared four feedback conditions (baseline text, haptics-only, audio-only, and combined haptics+audio) across four products (bag, microwave, chest of drawers, hoodie).
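The distance-to-vibration mapping described above can be illustrated with a minimal sketch. The summary only states that intensity rises as the phone approaches the object; the linear ramp, the 1.0 m range, and the function name `haptic_intensity` are assumptions made for illustration, not details taken from the paper, and the sketch is platform-neutral Python rather than the prototype's Core Haptics code.

```python
def haptic_intensity(distance_m, max_range_m=1.0):
    """Map a raycast hit distance (metres) to a vibration intensity in [0, 1].

    Assumptions (not from the paper): a linear ramp, silence at or beyond
    max_range_m, and full intensity when the phone touches the surface.
    distance_m is None when the raycast from the screen centre hits nothing.
    """
    if distance_m is None:
        return 0.0  # no object under the screen centre: no vibration
    clamped = min(max(distance_m, 0.0), max_range_m)
    return 1.0 - clamped / max_range_m  # closer surface -> stronger vibration


if __name__ == "__main__":
    for d in (None, 1.0, 0.75, 0.5, 0.25, 0.0):
        print(d, haptic_intensity(d))
```

On iOS a value like this would typically be fed to a continuous haptic pattern as an intensity parameter; any real mapping would need tuning against what users can actually discriminate.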

Key findings

This is a preliminary pilot with only four BLV participants (three totally blind using VoiceOver, one low-vision using iPhone zoom), so results are directional rather than conclusive. Across four 5-point Likert items, the combined haptics+audio condition scored highest on ease of product understanding (M=4.00, SD=0.00) and on confidence in purchase decision-making (M=3.00, SD=0.82). For shape perception, haptics-only and combined tied (M=4.00), consistent with the hypothesis that vibrotactile feedback is well-suited to conveying geometry; participants described 'tracing the outline' to grasp size. Crucially, haptics-only scored lowest on imagining material properties (M=1.25, SD=0.50); participants said vibration alone 'doesn't convey material, only size', which supports the authors' decision to route material information through speech rather than trying to render texture with commodity smartphone actuators. Qualitative feedback highlighted emotional value ('I had given up thinking it was impossible... being able to touch the product was moving') and concrete improvements: adjustable spawn distance, varied vibration patterns per component, and more natural-language audio descriptions (the current output was described as 'stiff'). The prototype's main limitation is scalability: manually authored JSON material metadata and hand-defined spherical part zones do not scale to real catalogues.
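To make concrete how little data sits behind these means, the sketch below enumerates every combination of four integer 1-to-5 responses and keeps those whose rounded mean and standard deviation match a reported pair. It assumes all four participants answered each item with a whole number and that the paper reports the sample (n-1) standard deviation; neither is stated in this summary, and the function name `matching_score_sets` is hypothetical.

```python
from itertools import combinations_with_replacement
from statistics import mean, stdev


def matching_score_sets(reported_mean, reported_sd, n=4, scale=range(1, 6)):
    """Return every multiset of n integer Likert scores whose mean and
    sample SD, rounded to two decimals, equal the reported values."""
    matches = []
    for scores in combinations_with_replacement(scale, n):
        same_mean = round(mean(scores), 2) == round(reported_mean, 2)
        same_sd = round(stdev(scores), 2) == round(reported_sd, 2)
        if same_mean and same_sd:
            matches.append(scores)
    return matches


if __name__ == "__main__":
    print(matching_score_sets(4.00, 0.00))  # [(4, 4, 4, 4)]
    print(matching_score_sets(3.00, 0.82))  # [(2, 3, 3, 4)]
    print(matching_score_sets(1.25, 0.50))  # [(1, 1, 1, 2)]
```

Under those assumptions each reported pair is compatible with exactly one set of responses, so a single participant moving one point on the scale would visibly shift the mean. This is why the scores are best read as directional.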

Relevance

For accessibility practitioners working on e-commerce or mobile AR, this paper offers a concrete blueprint on commodity hardware: an iPhone with ARKit and Core Haptics can support non-visual product exploration without specialized haptic gloves or tactile displays. The key design insight is modality-matched encoding, with haptics for spatial and shape information and speech for material and semantic information; it generalizes beyond shopping to any AR scenario where BLV users need to understand 3D content. The paper also usefully documents formative barriers (colour nuance, material absence, size imagination) that any accessible e-commerce effort should treat as baseline requirements. The caveats are significant: the pilot has n=4, two of its participants also took part in the formative study, the prototype is iOS-only, its metadata is manually authored, and it has not been deployed in a real marketplace. The authors' own future-work suggestions (vision-language models for automatic material classification, mesh segmentation for part boundaries) are the realistic path to scale.
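The speech side of that encoding, driven by hand-authored JSON metadata and spherical part zones, might look roughly like the sketch below. The JSON schema, field names, example values, and the helpers `part_at` and `describe_part` are all hypothetical; the summary says only that such metadata and zones exist and that colour, material, texture, part name and description are spoken.

```python
import json
import math

# Hypothetical hand-authored metadata for one product (a bag), in metres,
# expressed in the model's local coordinate space.
PARTS = json.loads("""
[
  {"name": "handle", "center": [0.0, 0.30, 0.0], "radius": 0.08,
   "colour": "black", "material": "leather", "texture": "smooth",
   "description": "Short carry handle."},
  {"name": "body", "center": [0.0, 0.10, 0.0], "radius": 0.20,
   "colour": "navy", "material": "canvas", "texture": "coarse weave",
   "description": "Main compartment."}
]
""")


def part_at(point, parts):
    """Return the first part whose spherical zone contains the hit point."""
    for part in parts:
        if math.dist(point, part["center"]) <= part["radius"]:
            return part
    return None  # hit point lies outside every authored zone


def describe_part(point, parts):
    """Build the sentence a speech synthesizer would read for a hit point."""
    part = part_at(point, parts)
    if part is None:
        return None
    return (f"{part['name']}: {part['colour']} {part['material']}, "
            f"{part['texture']}. {part['description']}")


if __name__ == "__main__":
    print(describe_part((0.0, 0.32, 0.0), PARTS))
    print(describe_part((1.0, 1.0, 1.0), PARTS))
```

Every zone and attribute here is typed in by hand, which makes the scaling problem visible: each new product needs its own list, and the suggested mesh segmentation and automatic material classification would be what replaces it.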

Tags: blind and low vision · augmented reality · mobile AR · haptics · vibrotactile · e-commerce accessibility · multimodal interaction · non-visual interaction