Visual Challenges in the Everyday Lives of Blind People

Erin Brady, Meredith Ringel Morris, Yu Zhong, Samuel White, Jeffrey P. Bigham · 2013 · CHI '13: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems · doi:10.1145/2470654.2481291

Summary

This paper presents the findings of a year-long, large-scale study of VizWiz Social, an iPhone application that lets blind users take a photograph, record a spoken question about it, and receive answers from crowd workers or social contacts within about a minute. Between May 2011 and May 2012, 5,329 blind users asked 40,748 questions through the application. The researchers analyzed a random sample of 1,000 questions using a detailed taxonomy that four researchers developed through affinity diagramming. Questions fell into four major categories: Identification (asking what an object is; 41% of questions), Description (asking about visual or physical properties; 24%), Reading (asking for text to be transcribed; 17%), and Other (unanswerable or meta-questions; 17%). Each category contains multiple subcategories; for instance, Identification includes no-context queries ("What is this?"), contextual queries ("Is this diet or regular Pepsi?"), medicine identification, currency, and media. The study also examined the primary photographic subjects (76% were objects, with food/drink the largest subcategory at 28%), perceived urgency (68% needed answers within 10 minutes), subjectivity (96% of questions were objective), and photograph quality (average score 3.41 out of 5, with errors in blur, lighting, framing, and composition). Additionally, the researchers tracked behavior across 100 randomly selected users and the 25 most active "power users" to understand how adoption and usage evolved over time.
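
As a rough sanity check on the figures quoted above, the following sketch derives a few quantities the summary does not state directly: the average number of questions per user, the fraction of all questions that were coded, and an approximate sampling margin for each category share. The margin uses the textbook normal approximation for a proportion at a 95% level; that choice is an assumption made here for illustration and is not described as the authors' method. All variable and function names are hypothetical.

```python
import math

# Figures quoted in the summary above.
TOTAL_QUESTIONS = 40_748
TOTAL_USERS = 5_329
SAMPLE_SIZE = 1_000

# Category shares as quoted; they sum to 99%, presumably because of rounding
# (an assumption, the summary does not say).
CATEGORY_SHARES = {
    "Identification": 0.41,
    "Description": 0.24,
    "Reading": 0.17,
    "Other": 0.17,
}


def margin_of_error(share, n, z=1.96):
    """Half-width of a normal-approximation interval for a proportion.

    z=1.96 corresponds to a 95% level. This is an illustrative assumption,
    not a procedure taken from the paper.
    """
    return z * math.sqrt(share * (1.0 - share) / n)


# Derived quantities (simple arithmetic on the quoted figures).
questions_per_user = TOTAL_QUESTIONS / TOTAL_USERS
sample_fraction = SAMPLE_SIZE / TOTAL_QUESTIONS

if __name__ == "__main__":
    print(f"Average questions per user: {questions_per_user:.2f}")
    print(f"Fraction of questions coded: {sample_fraction:.2%}")
    for name, share in CATEGORY_SHARES.items():
        moe = margin_of_error(share, SAMPLE_SIZE)
        print(f"{name}: {share:.0%} +/- {moe:.1%}")
```

Under these assumptions, the quoted totals work out to roughly 7.6 questions per user, and a 1,000-question sample puts the 41% Identification share within about three percentage points either way.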

Key findings

The study revealed that blind users' visual information needs are overwhelmingly practical and objective: 96% of questions sought factual information rather than subjective opinions, in sharp contrast to sighted users' Q&A behavior, which tends toward subjective queries. The most common need was simple object identification (41%), often without any context ("What is this?"), suggesting that basic object recognition technology could address the largest single category of questions. Reading requests (17%) ranged from mail and digital displays to cooking instructions and bathroom products, highlighting the prevalence of inaccessible physical interfaces in everyday life. Urgency analysis showed that 68% of questions needed answers within 10 minutes, and 10% needed answers within a minute, underscoring the importance of fast response times for these services. User retention analysis revealed that 55% of first-time users returned after day one, and multi-day users averaged 15.19 questions over 120.45 days. Critically, users whose first experience went poorly, whether because of audio recording errors or low-quality crowd answers, had significantly higher abandonment rates (67% of those with errors on their first question did not return, vs. 45% overall, chi-squared p = 0.02). Power users' usage patterns evolved over time: early questions were predominantly Identification (73%), but later usage shifted toward Reading (46% of recent questions), suggesting that users discovered new practical applications as they became more comfortable with the system. Photo quality also improved slightly over time for power users, with a trend-level effect (F(1,216) = 3.15, p < 0.10).

Relevance

This paper provides one of the most comprehensive empirical accounts of the visual information challenges blind people face daily, making it an essential reference for anyone designing accessible technology or services. The taxonomy of visual questions serves as a practical roadmap for prioritizing which accessibility challenges to address: object identification, text reading, and visual description cover the vast majority of needs. For accessibility practitioners, several findings have direct implications. The dominance of objective questions suggests that computer vision and OCR could eventually automate answers to most queries, a direction later pursued by tools like Be My Eyes, Seeing AI, and modern multimodal AI. The prevalence of food, cooking, and bathroom product questions highlights that product labeling and packaging remain major accessibility barriers. The finding that inaccessible digital displays (thermostats, ovens, alarm clocks) drive many questions underscores the importance of accessible hardware design and smart home integration. The user retention data shows that first impressions strongly influence whether blind users adopt assistive technology, a lesson applicable to any accessibility service or tool launch. Overall, the paper maps the gap between what blind people need to know about their visual environment and what the technology of the time could provide.

Tags: blind users · crowdsourcing · mobile accessibility · VizWiz · visual question answering · assistive technology · human-powered access technology · information needs