Designing Tools for High-Quality Alt Text Authoring

Kelly Mack, Edward Cutrell, Bongshin Lee, Meredith Ringel Morris · 2021 · Proceedings of the 23rd International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '21) · doi:10.1145/3441852.3471207

Summary

This paper investigates how to improve the quality of alternative text through better authoring interfaces and feedback mechanisms for automatic alt text, using Microsoft PowerPoint as the application context. The researchers built and tested four prototype interfaces: two for authoring alt text (a free-form interface with bulleted suggestions and a structured template interface with separate fields for each suggestion) and two for providing feedback on automatic alt text (a checkbox interface and an icon-based interface). The work was motivated by the observation that, while prior research has examined screen reader users' preferences for alt text content, very little work had explored the experience of the people who actually write alt text.

The team conducted two complementary studies. Study 1 involved combined interview and usability testing sessions with 12 sighted alt text authors at Microsoft, who tested all four interface prototypes using images from their own PowerPoint presentations. Study 2 consisted of interviews with six screen reader users (SRUs), five of whom identified as blind and one as low vision, who evaluated the quality of alt text generated through the different interfaces and shared their perspectives on what constitutes high-quality alt text.

The researchers distinguished between two broad image categories, photographs and non-photographs (graphs, tables, screenshots, diagrams), recognising that each requires different information in its alt text. The authoring interfaces provided tailored suggestions for each image type: the subject, actions, and setting for photographs, and the image type, key information, and main takeaway for non-photographs.

Key findings

The study produced three major findings with significant implications for alt text tooling.

First, authoring interfaces that suggest what to include produced higher-quality alt text than the current PowerPoint interface, as judged by both a researcher-applied quality scale and SRU rankings. The template interface generated the best-ranked alt text overall. Experienced authors preferred the free-form interface, while novices preferred the more structured template.

Second, there was a substantial mismatch between alt text authors' and screen reader users' perceptions of quality. Over half of author participants rated inaccurate automatic alt text as "acceptable," while SRUs unanimously found the same descriptions unacceptable. SRUs prioritised accuracy and completeness, while authors had a lower bar for acceptability. This gap means that author-provided feedback on automatic alt text quality may not reliably reflect SRU needs.

Third, and perhaps most strikingly, authors wrote significantly lower-quality alt text when starting from pre-populated automatic alt text (mean quality 2.38/4) than when starting from a blank text box (mean quality 2.96/4). Authors treated the automatic alt text as a "gold standard" baseline, often making only minor edits rather than writing comprehensive descriptions. As a result of this anchoring effect, important details like gender, context, and specific visual features were frequently omitted when automatic alt text was present.

Relevance

This research has immediate practical implications for any platform that supports image descriptions. The finding that pre-populated automatic alt text lowered the quality of author-written alt text is a critical warning for platforms like PowerPoint, Google Slides, and social media services that pre-populate alt text fields with AI-generated descriptions. Platform designers must carefully consider how and when to surface automatic alt text to avoid anchoring authors to low-quality baselines. The paper suggests educating authors about the purpose and limitations of automatic alt text, or prompting them to check for completeness and accuracy rather than simply accepting AI output.

The concept of image-type-aware suggestions is immediately actionable: even without AI, providing simple prompts like "describe the subject, setting, and main action" for photographs can meaningfully improve alt text quality. The SRU finding that context matters (why an image was included, not just what it shows) challenges common alt text guidance and suggests that authors who understand the document's purpose produce better descriptions than crowd workers or AI systems that lack that context. For accessibility practitioners, this paper reinforces that alt text quality is a design problem, not just a content problem.
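The image-type-aware suggestions described in this summary can be sketched in a few lines of code. The following Python example is an illustration only, not the paper's prototype: the names `SUGGESTED_FIELDS`, `suggested_fields`, `missing_fields`, and `compose_alt_text` are hypothetical, and the field labels are paraphrased from the summary rather than taken from the interfaces the authors built. It shows a template-style helper that picks prompts by image category, reports which prompts are still unanswered, and joins the answers into a single description.

```python
# Hypothetical field names, paraphrased from the summary; the exact wording
# used in the paper's prototype interfaces is not reproduced here.
SUGGESTED_FIELDS = {
    "photograph": ("subject", "actions", "setting"),
    "non-photograph": ("image type", "key information", "main takeaway"),
}


def suggested_fields(image_kind):
    """Return the prompts to show an author for the given image category."""
    try:
        return SUGGESTED_FIELDS[image_kind]
    except KeyError:
        raise ValueError(f"unknown image kind: {image_kind!r}") from None


def missing_fields(image_kind, answers):
    """List the suggested fields the author has left blank."""
    return [
        name
        for name in suggested_fields(image_kind)
        if not answers.get(name, "").strip()
    ]


def compose_alt_text(image_kind, answers):
    """Join the filled-in fields, in suggestion order, into one description."""
    parts = []
    for name in suggested_fields(image_kind):
        text = answers.get(name, "").strip().rstrip(".")
        if text:
            parts.append(text)
    return ". ".join(parts) + "." if parts else ""


if __name__ == "__main__":
    # Made-up example answers for a photograph.
    answers = {
        "subject": "A presenter stands at a whiteboard",
        "setting": "The room is a small meeting room",
    }
    print(missing_fields("photograph", answers))
    print(compose_alt_text("photograph", answers))
```

Reporting unanswered fields instead of blocking submission is one possible design choice; it nudges authors towards completeness while leaving the final wording in their hands.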

Tags: alt text · image accessibility · screen readers · authoring tools · automatic alt text · computer vision · content accessibility · user interface design

Standards referenced: WCAG 2.0