Image Recognition Tools for Blind and Visually Impaired Users: An Emphasis on the Design Considerations
Sandra Fernando, Chiemela Ndukwe, Bal Virdee, Ramzi Djemai · 2025 · ACM Transactions on Accessible Computing · doi:10.1145/3702208
Summary
This research examines the current landscape of image recognition tools (IRTs) designed for blind and visually impaired users, evaluating their capabilities against user needs and ISO ergonomic design standards. The authors conducted both a comprehensive review of 21 existing tools and a primary user study with 10 blind and low-vision (BLV) participants to identify design deficiencies and establish requirements for improved technologies.
The paper catalogs features across major IRT platforms, including Microsoft's Seeing AI, Be My Eyes (with GPT-4V integration), Google Vision AI, Amazon Rekognition, TapTapSee, Envision, Sullivan+, and VizWiz. Features analyzed include voice output, text reading (OCR), object detection, currency recognition, environment description, barcode identification, color detection, face analysis, handwriting recognition, and voice commands. Voice output and text reading are nearly universal, whereas more advanced features such as environment description and face analysis remain less common despite their potential value.
The study uses the ISO 9241-110 principles for ergonomic human-system interaction as its framework for evaluating these tools: suitability for the task, self-descriptiveness, controllability, conformity with user expectations, error tolerance, suitability for individualization, and suitability for learning. The authors also review the underlying AI technologies (CNNs, ResNet-50, YOLO, LSTM, MobileNet-SSD) and their trade-offs among accuracy, speed, and the constraints of mobile devices.
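To make the ISO 9241-110 evaluation framework more concrete, the following Python sketch records a per-principle assessment of a single tool and lists the principles it does not fully meet. The 0 to 2 rating scale, the `ToolAssessment` class, and its `gaps` and `score` helpers are hypothetical illustrations, not the authors' evaluation instrument. The principle names follow this summary; later editions of the standard word some of them differently.

```python
from dataclasses import dataclass, field
from enum import Enum


class Principle(Enum):
    """ISO 9241-110 principles, named as in this summary."""
    SUITABILITY_FOR_TASK = "suitability for the task"
    SELF_DESCRIPTIVENESS = "self-descriptiveness"
    CONTROLLABILITY = "controllability"
    CONFORMITY_WITH_EXPECTATIONS = "conformity with user expectations"
    ERROR_TOLERANCE = "error tolerance"
    SUITABILITY_FOR_INDIVIDUALIZATION = "suitability for individualization"
    SUITABILITY_FOR_LEARNING = "suitability for learning"


# Hypothetical scale (not from the paper): 0 = not met, 1 = partly met, 2 = fully met.
MAX_RATING = 2


@dataclass
class ToolAssessment:
    """Hypothetical per-principle ratings for one image recognition tool."""
    tool: str
    ratings: dict[Principle, int] = field(default_factory=dict)

    def rate(self, principle: Principle, rating: int) -> None:
        if not 0 <= rating <= MAX_RATING:
            raise ValueError(f"rating must be between 0 and {MAX_RATING}")
        self.ratings[principle] = rating

    def gaps(self) -> list[Principle]:
        """Principles that are unrated or not fully met, in enum order."""
        return [p for p in Principle if self.ratings.get(p, 0) < MAX_RATING]

    def score(self) -> float:
        """Fraction of the maximum possible total, 0.0 to 1.0 (unrated counts as 0)."""
        total = sum(self.ratings.get(p, 0) for p in Principle)
        return total / (MAX_RATING * len(Principle))


# Usage with an invented tool and invented ratings.
example = ToolAssessment("ExampleApp")
for principle in Principle:
    example.rate(principle, 2)
example.rate(Principle.SUITABILITY_FOR_INDIVIDUALIZATION, 1)
example.rate(Principle.ERROR_TOLERANCE, 0)
print([p.value for p in example.gaps()])  # ['error tolerance', 'suitability for individualization']
print(round(example.score(), 3))  # 0.786
```

Listing the unmet principles, rather than reporting only an aggregate score, keeps specific weaknesses (for example, poor error tolerance) visible when several tools are compared.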
Key findings
The user study with 10 BLV participants (aged 32 to 74, all with tertiary education) revealed several important patterns; with n = 10, each 10% corresponds to a single participant.
Usage and Trust: Seeing AI was the most popular tool (50% of participants), followed by Be My Eyes (30%), TapTapSee (20%), and the JAWS screen reader (20%). These shares sum to 120%, so participants could evidently name more than one tool. Most participants (70%) expressed trust in their image recognition tools, while 10% reported partial trust and 20% expressed distrust.
Most Valued Features: Participants rated voice results and voice commands highest (average about 10/12), followed by handwriting recognition, face analysis, environment description, and currency recognition (all about 9/12). Object detection, color detection, and light detection were rated somewhat lower (between 6/12 and 8/12).
User Concerns: Concerns raised included lack of integration with mobile devices (20%), the need to orient the camera correctly without visual feedback (10%), insufficient descriptive detail (20%), privacy and confidentiality (20%), absence of facial analysis (10%), inaccuracy (10%), and battery consumption (10%).
Desired Improvements: Most participants (70%) wanted more detailed descriptions. Others requested improved accuracy, onomatopoeia (sound effects), and customizable description levels. All participants found voice feedback satisfactory and would recommend their tools to other blind users.
Relevance
The research proposes five key design characteristics for optimal image recognition tools:
(1) Inherent Suitability: tools should be designed for BLV users from the outset, without requiring significant customization;
(2) Intuitive User Interface: self-descriptive interfaces that accommodate the activity sequence of blind users;
(3) Customization: support for individual preferences, given the variability of visual impairments;
(4) Error Minimization and Correction: tolerance for errors, with easy ways to rectify them;
(5) Continuous Learning: AI that refines its responses based on user interactions over time.
For practitioners and developers, this study highlights the gap between current tools and user needs. Basic features such as text reading are mature, but environment description and contextual understanding still need improvement. The emphasis on ISO 9241-110 provides a concrete framework for evaluating accessibility beyond basic functionality. A multimodal approach, combining visual, textual, and auditory outputs, is recommended to accommodate diverse cognitive processing styles and preferences. Privacy concerns about camera and microphone access also warrant attention in design. A key limitation of the study is its small sample (n = 10), but it provides valuable qualitative insights that complement larger-scale quantitative research.
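Participants asked for more detailed descriptions and for customizable description levels, and the proposed characteristics include customization and error minimization. The following Python sketch shows one hypothetical way to combine these: a user-selected verbosity level controls how many recognized items are spoken and how much detail each carries, and low-confidence results are flagged rather than stated as fact. `Verbosity`, `Detection`, `describe`, and the 0.6 threshold are illustrative assumptions; they are not taken from the paper or from any of the reviewed apps.

```python
from dataclasses import dataclass
from enum import IntEnum


class Verbosity(IntEnum):
    """Hypothetical user-selectable description levels."""
    BRIEF = 1
    STANDARD = 2
    DETAILED = 3


@dataclass(frozen=True)
class Detection:
    """Hypothetical recognizer output: label, confidence in [0, 1], spoken position."""
    label: str
    confidence: float
    position: str  # e.g. "to the left", "straight ahead"


# Hypothetical cut-off: results below it are announced as uncertain ("possibly ...").
UNCERTAIN_BELOW = 0.6


def describe(detections: list[Detection], level: Verbosity) -> str:
    """Build a spoken-style description whose detail follows the user's chosen level."""
    if not detections:
        return "Nothing recognized. Try moving the camera slowly."
    # Most confident first, so shorter descriptions keep the likeliest items.
    ranked = sorted(detections, key=lambda d: d.confidence, reverse=True)
    limit = {Verbosity.BRIEF: 1, Verbosity.STANDARD: 3, Verbosity.DETAILED: len(ranked)}[level]
    phrases = []
    for d in ranked[:limit]:
        phrase = d.label if d.confidence >= UNCERTAIN_BELOW else f"possibly {d.label}"
        if level >= Verbosity.STANDARD:
            phrase += f" {d.position}"
        if level == Verbosity.DETAILED:
            phrase += f" ({round(d.confidence * 100)} percent)"
        phrases.append(phrase)
    return "; ".join(phrases) + "."


# Usage with an invented scene.
scene = [
    Detection("mug", 0.92, "to the left"),
    Detection("keys", 0.45, "to the right"),
    Detection("laptop", 0.81, "straight ahead"),
]
print(describe(scene, Verbosity.BRIEF))     # mug.
print(describe(scene, Verbosity.STANDARD))  # mug to the left; laptop straight ahead; possibly keys to the right.
```

Announcing low-confidence results as "possibly ..." instead of suppressing them is one possible way to respond to the inaccuracy and trust concerns participants raised, while leaving the user in control of how much is spoken.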
Tags: image recognition · computer vision · blind and low vision · assistive technology · AI · deep learning · mobile apps · design guidelines
Standards referenced: ISO 9241-110 · ISO 9241-171 · ISO 9999:2016