Inclusive Design of AI's Explanations: Just for Those Previously Left Out?

Md Montaser Hamid, Fatima A. Moussaoui, Jimena Noa Guevara, Andrew Anderson, Puja Agarwal, Jonathan Dodge, Margaret Burnett · 2026 · ACM Transactions on Interactive Intelligent Systems · doi:10.1145/3772074

Summary

This paper investigates whether inclusive design applied to AI explanations can produce curb-cut effects: improvements designed to help underserved users that also benefit everyone. The study centres on Explainable AI (XAI), a field concerned with helping users understand how AI systems reason. The authors examine a real AI product team (Team Game) who used GenderMag, a systematic inclusive design inspection method based on five facets of problem-solving style (Motivations, Information Processing Style, Computer Self-Efficacy, Attitudes Toward Risk, and Learning Style), to find and fix inclusivity bugs in their XAI prototype. The prototype explained two AI agents playing a grid-based version of Tic-Tac-Toe using three explanation types: ranked move scores, scores through time, and scores on the board. Before the study, Team Game applied GenderMag walkthroughs to find places where their explanations failed users with Abi-like problem-solving styles, which are more commonly associated with women and are often the most underserved in software design. They made 13 fixes, the most impactful being Fix-2 (adding exact win, loss, and draw percentages via tooltips) and Fix-7 (adding a permanent Top 5 Moves display).

The authors then ran a between-subjects study with 69 participants who had no formal AI background: 34 used the original prototype and 35 used the post-GenderMag version. Participants were evaluated on two measures: their conceptual understanding of the AI agents (mental model concepts scores, based on which correct and incorrect concepts they expressed about the AI) and their ability to predict the AI's next move (prediction error). The GenderMag facets and participants' gender were used to analyse whether effects were especially strong for underserved groups.

Key findings

Four main results emerged. First, the post-GenderMag prototype significantly improved overall conceptual mental model scores: post-GenderMag participants scored higher than original participants (mean 129 vs 109, p=0.049, d=0.40), a broad benefit for everyone. Second, the improvement was especially strong for Abi-like participants (the underserved population targeted by the fixes), who scored significantly higher than Abi-like participants using the original version (mean 141 vs 113, p=0.030), confirming a curb-cut effect. Relatedly, the gender gap in mental model scores narrowed by 45%, with women in the post-GenderMag group significantly outperforming women in the original group (mean 128 vs 99, p=0.015). Third, the fixes did not improve prediction accuracy and may have harmed it: original participants predicted the AI's next move marginally more accurately than post-GenderMag participants (p=0.089, which does not reach significance at the 0.05 level). The summary attributes this to Fix-7's always-on Top 5 Moves display, which encouraged participants to base predictions on the previous move's top choices, a heuristic that misled them when the game situation had changed significantly. This constitutes a curb-fence effect: an inclusivity fix that over-benefits users by promoting overreliance. Fourth, explanation usage predicted mental model concepts scores (linear regression, p<0.001), and post-GenderMag participants engaged more with explanations.
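To make the reported effect size concrete, the sketch below shows one common way to compute Cohen's d (difference in group means divided by the pooled sample standard deviation). It is an illustration only: the summary does not say which variant of d the authors used, the function names are hypothetical, and the toy score lists are invented purely to exercise the function. The second helper simply inverts the formula, so the reported means (129 vs 109) and d=0.40 imply a pooled standard deviation of roughly 50 points, subject to rounding in the reported figures.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled sample standard deviation.

    Assumption: this is the textbook pooled-SD variant; the paper may
    have used a different one.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Weight each group's sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def implied_pooled_sd(mean_a, mean_b, d):
    """Invert d = (mean_a - mean_b) / sd to recover the implied pooled SD."""
    return (mean_a - mean_b) / d


if __name__ == "__main__":
    # Toy data (hypothetical, not from the study) to exercise cohens_d.
    print(cohens_d([2, 4, 6], [1, 3, 5]))

    # Figures reported in the summary: means 129 vs 109, d = 0.40.
    print(implied_pooled_sd(129, 109, 0.40))
```

A d of 0.40 sits between the conventional "small" (0.2) and "medium" (0.5) benchmarks, which is consistent with a p-value just under 0.05 at these group sizes.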

Relevance

This study has direct implications for accessibility professionals working on AI-powered tools and interfaces. It provides empirical evidence that inclusive design methods like GenderMag can improve AI explanations for underserved users and for everyone at the same time, producing curb-cut effects analogous to physical curb cuts, which benefit wheelchair users and the general public alike. The 45% reduction in the gender gap in AI understanding is a particularly concrete outcome for organisations committed to equitable AI. The curb-fence finding is equally important: design features intended to make AI more accessible can inadvertently encourage overreliance, degrading users' independent judgment. For accessibility practitioners deploying AI explanation systems in contexts such as assistive technology, healthcare tools, or workplace accommodation systems, the resulting tension between declarative comprehension (understanding how the AI works) and procedural prediction ability (anticipating what it will do next) is a practical design challenge. The paper also raises a methodological caution: prediction accuracy, widely used to measure XAI effectiveness, may not be appropriate when the explanation goal is conceptual understanding rather than action mimicry.

Tags: explainable AI · inclusive design · GenderMag · mental models · XAI · accessibility · curb-cut effect · gender inclusivity · AI overreliance · problem-solving styles