Understanding Personalized Accessibility through Teachable AI: Designing and Evaluating Find My Things for People who are Blind or Low Vision

Cecily Morrison, Martin Grayson, Rita Faia Marques, Daniela Massiceti, Camilla Longden, Linda Wen, Edward Cutrell · 2023 · Proceedings of the 25th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '23) · doi:10.1145/3597638.3608395

Summary

This paper from Microsoft Research presents Find My Things, one of the first fully realized end-to-end teachable AI applications for accessibility. The app allows people who are blind or low vision to teach their phone to recognize personal objects — keys, earbuds, lip balm, guide canes, braille styluses — and then locate them in 3D space using audio, haptic, and visual guidance. The system uses few-shot learning based on Prototypical Networks with an EfficientNetB0 feature extractor trained on the ORBIT dataset (videos collected by blind and low vision users). Users teach the app by recording four short videos of an object from different angles and backgrounds; personalization happens on-device in 3 seconds on an iPhone 12 Pro. The finding experience uses a localization pipeline that identifies objects up to 4 meters away with 100-200ms inference time, guiding users with increasing-pitch beeps and vibration as they approach. The design process was deeply collaborative, involving a citizen design team of eight blind and low vision young people (ages 14-25, recruited through the VICTA charity in the UK) who participated as co-creators over four months across three workshop sessions. The research team also worked iteratively through approximately 790 experiments on the ORBIT dataset to select the final AI model, prioritizing metrics meaningful to the user experience — not just accuracy, but personalization speed and inference time.
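The few-shot step described above can be illustrated with a minimal sketch of nearest-prototype classification, the core idea behind Prototypical Networks. This is not the paper's implementation: the function names, the toy 2-D embeddings, and the labels are all hypothetical, and in the real system the embeddings would come from the EfficientNetB0 feature extractor rather than being written by hand.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Average a list of equal-length embeddings into one prototype."""
    if not vectors:
        raise ValueError("need at least one embedding to form a prototype")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def build_prototypes(support: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """'Teaching': one prototype per object, the mean of its example embeddings."""
    return {label: mean_vector(embs) for label, embs in support.items()}


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query: Vector, prototypes: Dict[str, List[float]]) -> Dict[str, float]:
    """Softmax over negative squared distances to each prototype."""
    labels = list(prototypes)
    logits = [-squared_distance(query, prototypes[label]) for label in labels]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


# Toy embeddings (made up for illustration, not taken from ORBIT).
support = {
    "keys": [[1.0, 0.0], [0.8, 0.2]],
    "earbuds": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = build_prototypes(support)
probs = classify([0.7, 0.3], prototypes)
best = max(probs, key=probs.get)
print(best, probs)
```

Because teaching only requires averaging embeddings rather than updating network weights, this style of personalization is cheap enough to run on a phone, which is consistent with the on-device timing reported in the summary.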

Key findings

The user evaluation with 15 blind and low vision participants (recruited through CNIB) demonstrated strong real-world viability. Across 86 valid runs, the app achieved an 83% success rate, with users reporting that the app helped or somewhat helped them locate objects in 71 of those attempts. When the object was in view and within 4 meters, the localization algorithm correctly identified it 72.4% of the time. Twelve of the 15 participants reported feeling more confident finding things. Teaching an object took an average of 2.4 minutes at first, dropping to about 1 minute after 3-4 objects. The paper distills three design principles for teachable AI in accessibility: (1) understand the quantity and quality of examples needed: counterintuitively, training with "imperfect" images containing real-world issues like blur and partial framing produced more robust models (96.9% accuracy at 60% imperfect frames) than training with all-perfect images (90.5%), and adding training examples beyond 10-20 frames per video actually decreased performance by reducing the signal-to-noise ratio in the embedding space; (2) support users in providing examples through structured guidance that reduces cognitive load, using video rather than photos and building on existing embodied strategies like hand-to-hand referencing; (3) shorten the feedback loop to enable rapid testing and re-teaching, which proved more effective than showing users uncertainty metrics or model confidence scores.
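The headline percentages can be checked against the counts reported above. The short sketch below recomputes them. Reading the 83% success rate as 71 helpful runs out of 86 is an assumption based on how the two figures line up, not something the summary states outright, and the helper name `percent` is illustrative.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, 1)


# Assumption: "success" means the user said the app helped or somewhat helped.
success_rate = percent(71, 86)     # helpful runs / valid runs
confident_share = percent(12, 15)  # participants reporting more confidence

print(f"success rate: {success_rate}%")
print(f"more confident: {confident_share}%")
```

Under that reading, 71 of 86 runs comes to 82.6%, which rounds to the reported 83%, and 12 of 15 participants is 80%.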

Relevance

This paper is a landmark demonstration that teachable AI can practically address the long-tail distribution of accessibility needs — the enormous variety of personal contexts, objects, and environments that no pre-trained model can anticipate. For accessibility practitioners and organizations, it offers a compelling alternative to the one-size-fits-all approach of general object recognition: rather than trying to build ever-larger models that recognize everything, give users the tools to personalize AI for what matters to them. The design principles around the teaching loop are broadly applicable to any teachable AI system for accessibility. The citizen design team approach — treating blind young people as co-creators rather than test subjects, with apprenticeship into the design process — is a model for inclusive technology development. The finding that participants valued independence, autonomy, and discretion (avoiding having to touch the floor to find things in public) reveals how deeply personal the benefits of such tools can be.

Tags: teachable AI · object recognition · blind and low vision · personalization · few-shot learning · computer vision · mobile accessibility · participatory design