Lessons Learned in Designing AI for Autistic Adults

Andrew Begel, John Tang, Sean Andrist, Michael Barnett, Tony Carbary, Piali Choudhury, Edward Cutrell, Alberto Fung, Sasa Junuzovic, Daniel McDuff, Kael Rowan, Shibashankar Sahoo, Jennifer Frances Waldern, Jessica Wolk, Hui Zheng, Annuska Zolyomi · 2020 · ACM SIGACCESS Conference on Computers and Accessibility (ASSETS) · doi:10.1145/3373625.3418305

Summary

This short paper from Microsoft Research presents a candid account of a design failure and the lessons it produced. The team set out to build Video Calling for Autism (VC4A), a video calling application with an "Expressiveness Mirror" that used AI-driven computer vision to detect a user's facial expressions and display them as emojis representing six basic emotions (happiness, sadness, fear, surprise, anger, disgust), along with a bar indicating overall expressiveness. The goal was to help autistic adults understand how their expressions might be interpreted by neurotypical conversation partners, since difficulty reading and conveying facial expressions can create communication barriers in video calls.

The design followed a typical iterative process. The team first conducted a formative study interviewing 22 autistic adults about their video calling experiences, then used agile design sprints and co-creation with UX researchers and developers to generate concepts. They converged on the Expressiveness Mirror and evaluated it using Wizard of Oz (WOz) prototyping with five autistic adults, who responded enthusiastically; one said "1000% I want this. I want this in my life." Encouraged, the team built a working prototype using WebRTC and an AI computer vision pipeline trained on neurotypical facial expression data. However, when they conducted a user study with 21 autistic participants using the actual working prototype, the feedback was largely negative, leading them to cancel the second week of the study after just seven sessions.

Key findings

The working prototype failed for three interconnected reasons. First, the AI emotion detection was inaccurate and distracting: it was trained on neurotypical facial expression data and struggled to interpret autistic expressions correctly. The system frequently showed conflicting emotions simultaneously (e.g., happy and sad at the same time), causing participants to lose trust. Second, the six basic emotions were too simplistic; participants wanted to see complex states like anxiety, stress, frustration, confusion, and sarcasm rather than basic Ekman emotions. Third, autistic participants tended to maintain more neutral expressions, which the AI could misinterpret as anger. A prolonged neutral expression may be socially read as suppressed anger, but the AI had no access to the social context needed to make that distinction.

The paper identifies four critical design lessons. On AI for accessibility: the model was trained on neurotypical data and applied to a neurodiverse population, a fundamental mismatch. On Wizard of Oz methodology: the human wizard was far more accurate and responsive than the actual AI, creating "conceptual optimism" that masked real-world limitations. Wizards avoided false positives, kept up with real-time conversation flow, and never showed conflicting emotions, none of which the AI could replicate. On prototype fidelity: autistic participants responded very differently to low-fidelity WOz prototypes versus the working system, suggesting that this population may need higher-fidelity prototypes to provide reliable feedback. On participatory design: although autistic people were included as study participants, they were not included on the design and development team, violating the disability rights principle of "nothing about us without us."
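To make the gap between the wizard and the AI concrete, here is a minimal Python sketch of the kind of display logic a human wizard applied implicitly: show at most one emotion, stay silent when the evidence is weak or ambiguous, and wait for a reading to persist before showing it. This is an illustration only, not the VC4A pipeline. The names `pick_emotion` and `StableDisplay`, the per-frame score dictionary, and the threshold, margin, and hold values are all hypothetical assumptions.

```python
from collections import deque

# The six basic emotions the Expressiveness Mirror displayed.
EMOTIONS = ("happiness", "sadness", "fear", "surprise", "anger", "disgust")


def pick_emotion(scores, threshold=0.6, margin=0.2):
    """Return a single emotion label, or None if the frame is weak or ambiguous.

    `scores` is a hypothetical per-frame mapping of emotion -> confidence in [0, 1]
    with at least two entries. The threshold and margin values are assumptions.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (top_label, top_score), (_, runner_up_score) = ranked[0], ranked[1]
    # Suppress low-confidence readings (fewer false positives) and near-ties
    # (never show two conflicting emotions at once).
    if top_score < threshold or top_score - runner_up_score < margin:
        return None
    return top_label


class StableDisplay:
    """Show a label only after it has won `hold` consecutive frames."""

    def __init__(self, hold=3):
        self.recent = deque(maxlen=hold)

    def update(self, scores):
        self.recent.append(pick_emotion(scores))
        window_full = len(self.recent) == self.recent.maxlen
        if window_full and len(set(self.recent)) == 1:
            return self.recent[0]  # may still be None if every frame was unclear
        return None


def frame(**overrides):
    """Build a score dictionary with small default values (illustrative helper)."""
    scores = {name: 0.02 for name in EMOTIONS}
    scores.update(overrides)
    return scores
```

For example, `pick_emotion(frame(happiness=0.7, sadness=0.65))` returns `None` rather than showing happy and sad together. Note that a filter like this only hides uncertainty; it does nothing about the underlying mismatch of a model trained on neurotypical data.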

Relevance

This paper is a valuable case study in the ethics and methodology of designing AI-powered accessibility tools. It demonstrates several pitfalls that are broadly applicable: the danger of training AI on majority-population data and deploying it for minority populations; the limitations of Wizard of Oz prototyping for AI features where human performance vastly exceeds machine performance; and the critical importance of including disabled people not just as test subjects but as co-designers throughout the development process. For accessibility practitioners working with AI, the key takeaway is that well-intentioned technology can backfire when it encodes neurotypical norms as the standard against which neurodivergent people are measured. The Expressiveness Mirror implicitly asked autistic users to conform to neurotypical expression patterns rather than supporting communication on their own terms. The team's honest reflection and their decision to establish an autistic advisory board for future work provide a model for how to respond constructively when a design approach fails. This paper should be required reading for anyone developing AI-driven accessibility features, particularly those involving emotion recognition or social communication support.

Tags: autism · artificial intelligence · emotion recognition · video calling · participatory design · Wizard of Oz · neurodiversity · facial expression