
People with Visual Impairment Training Personal Object Recognizers: Feasibility and Challenges

Hernisa Kacorri, Kris M. Kitani, Jeffrey P. Bigham, Chieko Asakawa · 2017 · CHI Conference on Human Factors in Computing Systems · doi:10.1145/3025453.3025899

Summary

This paper explores whether people with visual impairments can train their own personalized object recognizers using a smartphone camera and a small number of example photos. The authors address a fundamental limitation of existing object recognition tools for blind users: generic models are trained on images taken by sighted people, and photos captured by blind photographers differ enough from those images that real-world performance suffers. Their approach uses transfer learning with Google's Inception-v3 deep neural network: the final classification layer is replaced so that users can define their own object classes, with custom labels, from just a handful of training images. The research was conducted in two phases. In Phase 1, eight blind participants photographed up to 50 objects of their choosing at home over a week, showing which objects blind people want to recognize (primarily food, drinks, and hygiene products) and how they naturally take training photos. In Phase 2, participants trained and tested recognizers on 15 food and drink items in a controlled lab setting. The researchers also analyzed the VizWiz dataset of 33,543 questions from blind users to understand the types of objects and fine-grained distinctions (brand, flavor, color) that matter most to this population.
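To make the "replace the final classification layer" idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the pretrained network is frozen and has already turned each photo into a fixed feature vector; in Inception-v3 that vector has 2048 dimensions, but the toy data here uses 3 so the example stays readable. The function names, the toy features, and the hyperparameters (`epochs`, `lr`) are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(weights, biases, x):
    """Linear scores of the new final layer for one feature vector."""
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
            for w, b in zip(weights, biases)]


def train_final_layer(features, labels, num_classes, epochs=100, lr=0.5):
    """Fit only a new softmax layer; the feature extractor stays frozen."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = softmax(class_scores(weights, biases, x))
            for c in range(num_classes):
                # Gradient of cross-entropy loss with respect to score c.
                grad = probs[c] - (1.0 if c == y else 0.0)
                for j in range(dim):
                    weights[c][j] -= lr * grad * x[j]
                biases[c] -= lr * grad
    return weights, biases


def predict(weights, biases, x):
    """Return the index of the highest-scoring user-defined class."""
    scores = class_scores(weights, biases, x)
    return max(range(len(scores)), key=scores.__getitem__)


def make_toy_features(examples_per_class=5, num_classes=3, seed=0):
    """Stand-in for frozen-network features: noisy one-hot clusters (assumption)."""
    rng = random.Random(seed)
    features, labels = [], []
    for c in range(num_classes):
        for _ in range(examples_per_class):
            features.append([(1.0 if j == c else 0.0) + rng.gauss(0.0, 0.1)
                             for j in range(num_classes)])
            labels.append(c)
    return features, labels


if __name__ == "__main__":
    # Hypothetical user-chosen labels, five training photos per object.
    names = ["diet cola", "oat milk", "tomato soup"]
    feats, ids = make_toy_features()
    w, b = train_final_layer(feats, ids, num_classes=len(names))
    print(names[predict(w, b, [0.05, 0.9, 0.1])])
```

Because only the small final layer is trained, a handful of examples per class is enough to fit it, which is the property the paper relies on when asking users for only a few photos per object.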

Key findings

Personal object recognizers trained by blind participants achieved accuracies ranging from 50.7% to 92% (mean 75.95%) on a 15-way classification task, compared to 7% for random chance. Some blind participants matched the performance of a sighted person unfamiliar with the underlying technology (96.9%). The study revealed distinct and consistent photo-taking strategies among blind participants, including holding objects with one hand while photographing with the other, and introducing variation through rotation and distance changes. Key factors affecting performance included photo quality variance, the presence of the user's hand in training images, exaggerated or non-discriminative viewpoints, and object shape similarity. Models trained and tested on photos from the same blind person outperformed models trained on photos by sighted people when using 20 training samples (for 6 of 8 participants). Data augmentation (random cropping, scaling, brightness adjustment) boosted accuracy by up to 10.7%. With as few as 5 training images per object, reasonable accuracy was achievable. Participants overwhelmingly agreed (88% strongly agree, 12% agree) that training a personal object recognizer is feasible.
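The data augmentation mentioned above (random cropping, scaling, brightness adjustment) can be sketched as follows. This is an illustrative sketch only: the review does not give the paper's augmentation parameters, so the crop fraction (`min_keep`), the brightness range (`max_delta`), the nearest-neighbor rescaling, and the grayscale list-of-lists image format are all assumptions chosen to keep the example dependency-free.

```python
import random


def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def resize_nearest(image, new_h, new_w):
    """Rescale with nearest-neighbor sampling."""
    height, width = len(image), len(image[0])
    return [[image[r * height // new_h][c * width // new_w]
             for c in range(new_w)]
            for r in range(new_h)]


def adjust_brightness(image, delta):
    """Shift every pixel by delta, clipped to the [0, 1] range."""
    return [[min(1.0, max(0.0, px + delta)) for px in row] for row in image]


def augment(image, rng, min_keep=0.8, max_delta=0.2):
    """Produce one randomly cropped, rescaled, brightness-shifted copy."""
    height, width = len(image), len(image[0])
    keep = rng.uniform(min_keep, 1.0)  # fraction of each side to keep
    crop_h = max(1, int(height * keep))
    crop_w = max(1, int(width * keep))
    cropped = random_crop(image, crop_h, crop_w, rng)
    rescaled = resize_nearest(cropped, height, width)  # back to input size
    return adjust_brightness(rescaled, rng.uniform(-max_delta, max_delta))


def expand_training_set(images, copies_per_image, seed=0):
    """Return the original images plus augmented copies of each one."""
    rng = random.Random(seed)
    expanded = list(images)
    for image in images:
        expanded.extend(augment(image, rng) for _ in range(copies_per_image))
    return expanded
```

For example, calling `expand_training_set(photos, copies_per_image=3)` on five photos of one object yields twenty training images, which is one way a small set of user photos could be stretched before training.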

Relevance

This research makes a compelling case for personalization as a key strategy in assistive computer vision. Rather than trying to build one universal object recognizer, letting individuals train systems on their own objects under their own conditions produces better results and more useful labels. For accessibility practitioners and developers of assistive apps, the key takeaway is that transfer learning with deep neural networks makes it practical for blind users to create effective custom classifiers with minimal training data, on the order of 5 to 20 photos per object. The study also provides valuable design guidance: apps should provide feedback on photo quality, detect background clutter, guide camera positioning, and account for the common behavior of holding objects during photography. The finding that blind users want fine-grained labels (specific brand, flavor, variety) rather than generic categories highlights a gap that generic recognition systems cannot fill. This work laid groundwork for subsequent accessible object recognition research and influenced the design of teachable object recognition interfaces.

Tags: object recognition · computer vision · blindness · transfer learning · personalization · machine learning · assistive technology · mobile accessibility