Understanding How Blind Users Handle Object Recognition Errors: Strategies and Challenges

Jonggi Hong, Hernisa Kacorri · 2024 · ASSETS '24: Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility · doi:10.1145/3663548.3675635

Summary

This paper investigates how blind and low-vision users interact with object recognition systems, specifically focusing on how they identify and handle recognition errors. While object recognition technologies powered by computer vision and machine learning have enormous potential to help blind users identify objects in their environment, a significant gap exists between benchmark performance on curated datasets and real-world usability. Recognition errors — misidentifications, missed objects, and false positives — are inevitable, yet blind users face a unique challenge: they cannot visually verify whether the system's output is correct. The study used URCam, a pre-existing camera-based object recognition system fine-tuned for the experiment, and involved 12 blind and low-vision participants in two parts. First, in-depth interviews explored participants' everyday experiences with camera-based assistive technologies (including Seeing AI, Be My Eyes, and general-purpose AI assistants), their strategies for handling errors, and their attitudes toward misrecognitions. Second, a hands-on error identification task required participants to use URCam to recognise a set of household objects, some of which the system was designed to misrecognise, and attempt to identify which recognitions were incorrect. Participants performed the task twice to assess whether experience improved error detection. The study examined both the strategies participants employed and the success rate of their error identification efforts.

Key findings

During interviews, participants reported strong apprehension toward misrecognitions, particularly in high-stakes situations such as food identification (allergens, expiration dates) and medication identification. Most preferred to review and verify recognition results independently rather than rely on sighted assistance, reflecting a desire for autonomy, but they acknowledged that independent verification is inherently limited without visual access.

In the hands-on error identification task, participants used several strategies to test and challenge recognition results: varying the camera viewpoint (photographing objects from multiple angles), changing backgrounds (moving objects to contrasting surfaces), adjusting object distance (zooming in and out), and using contextual knowledge (questioning results that seemed implausible given the object's feel, weight, or location). Despite these strategies, participants identified only about half of the errors on average. Critically, the proportion of errors identified did not improve significantly on the second attempt, which suggests that simply repeating the recognition process is insufficient for reliable error detection. Participants had particular difficulty detecting misrecognitions where the incorrect label was plausible (e.g., a similar-looking product from the same category), whereas implausible errors (e.g., a bottle labelled as a shoe) were easier to catch through common sense.

Participants expressed a desire for confidence scores, alternative suggestions ("this might also be..."), and the ability to ask follow-up questions about specific object features to help disambiguate uncertain recognitions.
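
To make the requested confidence scores and alternative suggestions concrete, the following is a minimal sketch of how a recogniser's raw scores might be turned into a hedged spoken message. It is not URCam's implementation or anything described in the paper: the function name `describe`, the thresholds `confident` and `min_alt`, and the message wording are all assumptions chosen for illustration. Softmax probabilities are also not guaranteed to be well calibrated, so a real system would likely need calibration before presenting them as confidence.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe(labels, logits, k=2, confident=0.8, min_alt=0.1):
    """Build a hedged message from classifier scores.

    Hypothetical parameters (assumptions, not from the paper):
      k         - maximum number of alternative labels to mention
      confident - probability above which only the top label is reported
      min_alt   - minimum probability for an alternative to be mentioned
    """
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    top_label, top_p = ranked[0]

    if top_p >= confident:
        return f"This is probably {top_label}."

    # Mention only runners-up that are themselves reasonably likely.
    alternatives = [label for label, p in ranked[1:k + 1] if p >= min_alt]
    if not alternatives:
        return f"This might be {top_label}, but I am not sure."
    return (
        f"This might be {top_label}, "
        f"but it might also be {' or '.join(alternatives)}."
    )
```

For example, `describe(["a cola can", "a lemonade can", "a shoe"], [2.0, 1.8, -3.0])` returns "This might be a cola can, but it might also be a lemonade can." This is the plausible-error case that participants found hardest to catch, and it is where an explicit alternative would give them something to check against touch or prior knowledge.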

Relevance

This research addresses a critical gap in assistive AI design: most object recognition research focuses on improving accuracy, but even highly accurate systems will produce errors, and blind users need tools and strategies to handle those errors safely. The finding that participants identified only about half of the errors, with no significant improvement on the second attempt, is a sobering reality check for the field: providing recognition output without error-handling support is insufficient. For accessibility practitioners and AI developers, the implications are clear. Object recognition systems for blind users must include uncertainty communication (confidence scores, alternative labels), contextual verification support (describing visual features that users can cross-reference with touch or prior knowledge), and appropriate escalation paths for high-stakes recognitions. The strategies participants naturally developed (varying viewpoints, backgrounds, and distances) suggest that future systems could proactively guide users through these verification steps rather than leaving them to improvise. The connection to the broader "misfitting with AI" theme (Alharbi et al., also at ASSETS '24) is evident: blind users are actively developing sophisticated error-handling expertise that AI systems should support rather than ignore.
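
As one possible reading of "proactively guiding users through verification steps", the sketch below compares the labels obtained from several captures of the same object (different angles, distances, or backgrounds) and suggests what to do next. Everything here is hypothetical: the function `next_step`, the `HIGH_STAKES` categories, and the thresholds are assumptions for illustration, not a design proposed in the paper.

```python
from collections import Counter

# Hypothetical set of categories where a wrong label could cause harm.
HIGH_STAKES = {"food", "medication"}


def next_step(view_labels, category, min_views=3, min_agreement=0.75):
    """Suggest an action from the labels of several captures of one object.

    Returns an (action, label) tuple, where action is one of:
      "retake"    - too few captures; prompt for another angle, distance,
                    or background
      "uncertain" - captures disagree; report the label as unreliable
      "escalate"  - captures agree, but the category is high-stakes, so
                    suggest a second check (e.g., sighted assistance)
      "accept"    - captures agree on a low-stakes object
    """
    if len(view_labels) < min_views:
        return ("retake", None)

    # Most frequent label and the share of captures that produced it.
    label, count = Counter(view_labels).most_common(1)[0]
    agreement = count / len(view_labels)

    if agreement < min_agreement:
        return ("uncertain", label)
    if category in HIGH_STAKES:
        return ("escalate", label)
    return ("accept", label)
```

With these assumed thresholds, `next_step(["aspirin", "ibuprofen", "aspirin"], "medication")` returns `("uncertain", "aspirin")`, while three matching captures of a medication would return `("escalate", "aspirin")`. Agreement across views cannot catch a label that is consistently wrong but plausible, which is why the sketch still routes high-stakes categories to a second check instead of accepting them outright.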

Tags: blind users · object recognition · AI errors · computer vision · camera-based assistive technology · error handling · trust · machine learning