Toward Effective Communication of AI-Based Decisions in Assistive Tools: Conveying Confidence and Doubt to People with Visual Impairments at Accelerated Speech

Taslima Akter, Manohar Swaminathan, Apu Kapadia · 2024 · Proceedings of the 21st International Web for All Conference (W4A) · doi:10.1145/3677846.3677862

Summary

This paper investigates how people with visual impairments (PVIs) perceive confidence and doubt in speakers' voices when audio is played at accelerated speeds. The question matters because AI-based assistive tools (scene descriptors, facial recognition, image captioners) can mischaracterize information, yet they currently give no indication of how certain they are. PVIs routinely listen to screen readers at accelerated speeds, so the authors asked whether vocal cues conveying confidence or doubt survive the speedup. They ran two large online surveys. The first had 151 PVI participants: 97 totally blind and 54 with other visual impairments, 64.2% blind since birth, and 57.3% advanced screen reader users. The second had 170 sighted participants. Participants heard short, semantically neutral sentences (e.g., "He is on a new medication") recorded by six native English-speaking actors. Each sentence was spoken with an intended confidence level (confident or doubtful) and played at one of four speeds: default, 1.5x, 2x, or 2.5x. Participants rated perceived confidence on a 5-point Likert scale. Speed was varied between subjects and confidence level within subjects, and the ratings were analyzed with linear mixed-effects models. The researchers also examined how preferred playback speed, screen reader proficiency, age, and gender influenced perception accuracy.

Key findings

PVI participants accurately perceived confidence and doubt in speakers' voices up to 1.5x playback speed, while sighted participants maintained accuracy up to 2x. At 1.5x, PVIs actually outperformed sighted participants in perceiving confidence. At 2.5x, both groups' performance dropped significantly. Several individual factors shaped perception accuracy among PVIs. Those who habitually used accelerated playback in daily life perceived doubt significantly more accurately at higher speeds than those who preferred default speed (d = 0.6–0.87 across speeds). Advanced screen reader users perceived confidence more accurately than intermediate users at default speed (d = 0.3) and at 1.5x (d = 0.38). The most common playback preferences among PVIs were 1.5x (25.5%) and 1.25x (14.8%), and 82.1% of PVIs relied exclusively on screen reader audio. Sighted participants most often preferred default speed (36.7%) or 1.25x (27.8%). Both groups perceived voices with rising pitch as less confident. The study found no significant between-group differences in doubt perception at any speed, suggesting that doubt cues may be more robust to acceleration than confidence cues.
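The effect sizes above are reported as Cohen's d. The summary does not say which variant of d the authors computed, and analyses built on mixed-effects models sometimes standardize differently. The Python sketch below is therefore only the conventional two-group form: the difference in means divided by the pooled sample standard deviation. It is meant to make the reported magnitudes easier to read, not to reproduce the paper's analysis. By Cohen's usual rule of thumb, d is small at about 0.2, medium at about 0.5, and large at about 0.8.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Cohen's d for two independent groups, using the pooled standard deviation.

    This is the textbook formula. It is not necessarily the variant used in the paper.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # variance() is the sample variance (n - 1 denominator).
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (
        n_a + n_b - 2
    )
    # A positive d means group_a scored higher on average than group_b.
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Invented Likert ratings for illustration only. These are NOT data from the study.
fast_listeners = [4, 5, 4, 3, 5]
default_listeners = [3, 4, 3, 3, 4]
print(round(cohens_d(fast_listeners, default_listeners), 2))
```

On this scale, the reported d = 0.3 for screen reader proficiency is a small-to-medium effect. The 0.6–0.87 range for playback preference sits between medium and large.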

Relevance

As AI-powered assistive tools become ubiquitous, from smartphone scene descriptors to smart glasses, this research addresses a fundamental trust problem. These systems present every output with the same apparent certainty regardless of their actual confidence, which can cause embarrassment or harm when a description is wrong (e.g., misidentifying someone's facial expression or gender). The findings translate into concrete design guidance: assistive technologies should convey AI confidence through vocal tone, but must keep speedup at or below 1.5x for PVI users so that these cues remain perceptible. This guidance applies to screen reader design, voice assistant output, and any audio-based AI feedback system. The research also highlights the broader problem of AI bias in assistive tools. BIPOC, non-binary, and transgender users have reported being misrepresented by AI image descriptors, which makes communicating confidence even more critical for marginalized communities. The study has two main limitations. It used human-recorded rather than synthesized speech, although PVIs primarily hear synthesized voices, and its stimulus set was small (three sentences). Even so, the large participant samples (N = 321 in total) and the mixed-effects analysis lend the core findings considerable weight.
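The design guidance could be enforced in a tool's speech pipeline roughly as follows. This is a hypothetical Python sketch and does not come from the paper. The names `SpokenOutput`, `vocal_style`, `playback_rate`, and `plan_utterance` are invented, as are the 0.8 threshold and the [0, 1] confidence score. Only the 1.5x ceiling and the two-level confident/doubtful distinction come from the study.

```python
from dataclasses import dataclass

# Ceiling from the study's finding that PVIs perceived confidence/doubt cues up to 1.5x.
PVI_CUE_SAFE_RATE = 1.5


@dataclass
class SpokenOutput:
    text: str
    confidence: float  # hypothetical model confidence score in [0, 1]


def vocal_style(confidence: float, threshold: float = 0.8) -> str:
    """Map a model confidence score to one of two vocal styles (threshold is an assumption)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return "confident" if confidence >= threshold else "doubtful"


def playback_rate(preferred_rate: float, conveys_confidence_cue: bool) -> float:
    """Honor the user's preferred rate, but cap it while a confidence cue is being voiced."""
    if preferred_rate <= 0:
        raise ValueError("playback rate must be positive")
    if conveys_confidence_cue:
        return min(preferred_rate, PVI_CUE_SAFE_RATE)
    return preferred_rate


def plan_utterance(output: SpokenOutput, preferred_rate: float) -> tuple[str, float]:
    """Choose the vocal style and the (possibly capped) rate for one AI description."""
    return vocal_style(output.confidence), playback_rate(preferred_rate, True)


# A 2x listener hears a low-confidence description in a doubtful voice at 1.5x.
print(plan_utterance(SpokenOutput("A person who appears to be smiling", 0.4), 2.0))
```

The sketch caps the rate only while a confidence cue is being spoken. This leaves the rest of the screen reader output at the user's habitual speed, which matters because many PVIs in the study preferred accelerated playback.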

Tags: visual impairments · artificial intelligence · speech processing · screen readers · assistive technology · explainable AI · accelerated speech · trust