Guiding Novice Web Workers in Making Image Descriptions Using Templates

Valerie S. Morash, Yue-Ting Siu, Joshua A. Miele, Lucia Hasty, Steven Landau · 2015 · ACM Transactions on Accessible Computing (TACCESS) · doi:10.1145/2764916

Summary

This study compares two approaches for using non-expert crowdworkers to create accessible descriptions of STEM images (charts, graphs, diagrams) for people who are blind or have print-reading disabilities. In Free-Response Image Description (FRID), workers receive the NCAM accessibility guidelines and an empty text box/table in which to compose descriptions freely. In Queried Image Description (QID), workers answer structured questions about the image, and a template automatically generates a description that follows the guidelines. Twenty-two participants described six STEM images (a horizontal bar chart, a vertical bar chart, a line graph, a pie chart, a scatter plot, and a Venn diagram), three with FRID and three with QID, in randomized order. The researchers developed a question set and a description template for each image category, with conditional logic that adapts to image features (e.g., asking about each wedge only if the worker indicates that a pie chart has multiple wedges). The templates produce both prose descriptions and data tables that follow the NCAM guidelines, which were developed through expert consensus and validated with users who have visual impairments.
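
To make the QID idea concrete, the following is a minimal Python sketch of a question set with conditional logic feeding a description template for a pie chart. It is an illustration only: the question wording, the output phrasing, and the names query_pie_chart, render_description, and scripted are hypothetical and are not taken from the study's actual question sets or templates.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: takes a question, returns the worker's answer.
Ask = Callable[[str], str]


def query_pie_chart(ask: Ask) -> Dict[str, object]:
    """Collect answers; wedge questions are asked only for multi-wedge charts."""
    answers: Dict[str, object] = {
        "title": ask("What is the title of the chart?").strip(),
        "caption": ask("What is the caption?").strip(),
        "units": ask("What units are the values in?").strip(),
    }
    wedges: List[Tuple[str, str]] = []
    multiple = ask("Does the pie chart have more than one wedge? (yes/no)")
    if multiple.strip().lower() == "yes":
        # Conditional branch: per-wedge questions appear only when relevant.
        count = int(ask("How many wedges are there?"))
        for i in range(1, count + 1):
            label = ask(f"What is the label of wedge {i}?").strip()
            value = ask(f"What is the value of wedge {i}?").strip()
            wedges.append((label, value))
    answers["wedges"] = wedges
    return answers


def render_description(answers: Dict[str, object]) -> str:
    """Fill an assumed template: prose first, then a simple data table."""
    title, caption, units = answers["title"], answers["caption"], answers["units"]
    wedges = answers["wedges"]
    lines = ["This image is a pie chart."]  # image category always comes first
    if title:
        lines.append(f'The title is "{title}".')
    if caption:
        lines.append(f'The caption is "{caption}".')
    if units:
        lines.append(f"Values are given in {units}.")
    if wedges:
        lines.append(f"The chart has {len(wedges)} wedges:")
        lines.append("Label | Value")
        lines.extend(f"{label} | {value}" for label, value in wedges)
    return "\n".join(lines)


def scripted(replies: List[str]) -> Ask:
    """Stand-in for a worker: returns the prepared replies in order."""
    it = iter(replies)
    return lambda _question: next(it)


if __name__ == "__main__":
    ask = scripted(["Fruit sales", "", "percent", "yes", "2",
                    "Apples", "60", "Pears", "40"])
    print(render_description(query_pie_chart(ask)))
```

Because the template, rather than the worker, decides the order and phrasing, elements such as image category and units are emitted in a fixed position whenever an answer exists. That is the mechanism this sketch is meant to illustrate.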

Key findings

QID descriptions were significantly more complete on key accessibility elements: 100% included the image category (vs 72% for FRID), 84% included the title (vs 35%), 94% included the caption (vs 18%), and 100% included units (vs 36%). These are precisely the elements that users with visual impairments report asking about first when exploring STEM images. QID descriptions were also more standardized: 37% of six-word segments were shared between QID descriptions of the same image, compared with only 1% for FRID descriptions. This consistency matters because research shows users find image descriptions easier to understand when they follow predictable formats. Despite the structural guidance, both methods took similar time (10 to 12 minutes per image) and produced similar word counts. Workers strongly preferred QID: of the 22 participants, 16 preferred QID, 3 preferred FRID, and 3 had no preference. They also rated QID as significantly easier (1.95 vs 2.95 on a 1-4 scale, with the lower score corresponding to the easier method). This suggests that template-based approaches could increase worker willingness to describe images at scale.
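
The standardization figure rests on counting six-word segments shared between descriptions of the same image. The Python sketch below shows one plausible way to compute such an overlap for a pair of descriptions. The tokenization rule, the use of distinct segments, and the choice of denominator (all distinct segments across both descriptions) are assumptions made for illustration; this summary does not specify the study's exact procedure.

```python
import re
from typing import Set, Tuple


def word_segments(text: str, n: int = 6) -> Set[Tuple[str, ...]]:
    """Return the distinct runs of n consecutive words in text (lowercased)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_fraction(a: str, b: str, n: int = 6) -> float:
    """Assumed metric: shared segments divided by all distinct segments."""
    seg_a, seg_b = word_segments(a, n), word_segments(b, n)
    union = seg_a | seg_b
    if not union:
        return 0.0  # both texts are shorter than n words
    return len(seg_a & seg_b) / len(union)


if __name__ == "__main__":
    d1 = "This is a pie chart titled Sales by region"
    d2 = "This is a pie chart titled Costs by region"
    print(shared_fraction(d1, d2))
```

Template-generated descriptions share long fixed phrases, so a measure of this kind rises sharply for QID, while free-form descriptions rarely repeat the same six consecutive words.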

Relevance

This research has direct implications for organizations seeking cost-effective ways to make image-heavy content accessible. The key finding, that explicit guidelines alone are insufficient for novice describers, challenges common accessibility training approaches. Simply teaching people the rules does not reliably produce complete, guideline-conforming descriptions; structured workflows that extract information and apply the guidelines automatically are more effective. The templates developed for six STEM image categories provide a practical starting point for educational publishers, testing organizations, and STEM educators. The QID approach could be integrated into tools like Benetech's POET or into crowdsourcing platforms. For accessibility practitioners, this validates investing in structured description workflows rather than relying solely on training. A limitation is that the study did not evaluate description usability with actual users who have visual impairments, a critical next step for confirming that QID's technical completeness translates into a better user experience.

Tags: image description · alt text · crowdsourcing · human computation · STEM accessibility · visual impairment · data visualization · blind

Standards referenced: NCAM Guidelines for Describing STEM Images