Improving the Accessibility of Mobile OCR Apps Via Interactive Modalities

Michael Cutter, Roberto Manduchi · 2017 · ACM Transactions on Accessible Computing · doi:10.1145/3075300

Summary

This paper addresses a critical usability challenge in mobile OCR for blind users: OCR technology itself works well, but it requires properly framed images at adequate resolution, which is difficult to achieve without visual feedback. Blind users commonly hold the camera too close (cropping text), too far (insufficient resolution), or at a poor angle, resulting in failed recognition attempts. The authors developed an iOS app to test three interaction modalities: (1) Manual mode, in which users decide when to take a snapshot based on their spatial awareness alone; (2) Autoshot mode, in which the system continuously analyzes camera images and automatically triggers a high-resolution capture when a "compliant" pose is detected (text visible, properly framed, adequate resolution); and (3) Guidance mode, which adds spoken directional instructions ("Move up 5", "Left 3", "Raise") to help users reach a compliant camera position. Two studies were conducted, each with a different compliance detection approach. Study 1 used augmented reality fiducials printed on documents to precisely track camera pose, enabling detailed trajectory analysis. Study 2 used actual printed text with a custom text spotting algorithm that detected text lines and white padding to determine compliance.
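The autoshot and guidance logic described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Pose fields, the tolerance and height thresholds, the units, the "Lower" instruction, and the exact instruction wording are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pose:
    """Hypothetical camera pose relative to the document centre."""
    dx: float      # positive: camera is right of centre (assumed units)
    dy: float      # positive: camera is above centre
    height: float  # distance from the document surface


# Assumed thresholds; the paper's actual compliance criteria are not reproduced here.
TOLERANCE = 1.0
MIN_HEIGHT = 20.0  # closer than this crops the text
MAX_HEIGHT = 35.0  # farther than this loses resolution


def is_compliant(pose: Pose) -> bool:
    """True when the pose is centred and within the usable height range."""
    centred = abs(pose.dx) <= TOLERANCE and abs(pose.dy) <= TOLERANCE
    return centred and MIN_HEIGHT <= pose.height <= MAX_HEIGHT


def guidance(pose: Pose) -> Optional[str]:
    """Return one short spoken instruction, or None when a snapshot can be taken."""
    if is_compliant(pose):
        return None  # autoshot would trigger the high-resolution capture here
    if pose.height < MIN_HEIGHT:
        return "Raise"
    if pose.height > MAX_HEIGHT:
        return "Lower"
    # Correct the larger of the two in-plane offsets first.
    if abs(pose.dx) >= abs(pose.dy):
        direction = "left" if pose.dx > 0 else "right"
        return f"Move {direction} {round(abs(pose.dx))}"
    direction = "down" if pose.dy > 0 else "up"
    return f"Move {direction} {round(abs(pose.dy))}"
```

In this sketch, manual mode corresponds to never calling either function, autoshot mode to polling is_compliant on every frame, and guidance mode to also speaking the string returned by guidance.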

Key findings

Study 1 (12 blind participants) found no significant difference in time to completion between guidance and autoshot modes, contrary to the hypothesis that explicit guidance would be faster. However, Study 2 (9 blind participants), which used a simplified interface, did show that guidance was significantly faster: a 33% reduction compared to autoshot (median 10 s vs 17 s). The most striking finding from both studies was an unexpected training effect: participants who experienced the interactive modalities became significantly better at taking OCR-readable pictures in manual mode without any system feedback. Readability improved from 71% to 86% (Study 1) and from 64% to 94% (Study 2) between the first and second manual sessions. This suggests that experiencing guided feedback helps blind users calibrate their proprioceptive sense of correct camera positioning. Analysis of failure cases revealed that 84% of non-compliant snapshots resulted from holding the camera too close to the document; most could have been fixed by simple repositioning (59%) or reorientation alone (16%). Font size (10pt vs 16pt) did not significantly affect performance, as both require similar minimum distances to maintain adequate resolution.

Relevance

This research offers practical guidance for mobile OCR app developers. The training effect is particularly valuable: even temporary use of guided feedback can improve a blind user's ability to capture good images manually once the feedback is removed. For guidance interfaces, the study found that simpler, shorter instructions outperformed detailed metric directions: participants found centimeter-based guidance unintuitive, and processing lengthy instructions slowed their exploration. Continuous feedback was preferred over having to request information (as in KNFB Reader's "Field of view report"). Three participants suggested that users should be able to modulate the amount of guidance based on their needs. The research also highlights that the compliance detection algorithm itself affects user experience: overly conservative systems that only trigger snapshots in a small subset of acceptable poses may paradoxically slow users down, since with a larger compliance region, users might reach an acceptable pose faster by chance. For organizations, the main practical takeaway is that adding real-time camera guidance to OCR apps can both speed up document capture and serve as implicit training for users' spatial skills.
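The point about compliance region size can be illustrated with a toy simulation. This is a hedged sketch, not a model from the paper: it assumes the camera's offset from the ideal position behaves like an unguided one-dimensional random walk, and the starting offset, region half-widths, and step cap are all hypothetical values.

```python
import random


def steps_to_compliance(half_width: int, seed: int,
                        start: int = 10, max_steps: int = 5_000) -> int:
    """Count random-walk steps until the offset falls inside the compliance region.

    The region is the interval [-half_width, half_width] around the ideal pose.
    Returns max_steps if the walk never reaches the region within the cap.
    """
    rng = random.Random(seed)  # same seed gives the same path for any half_width
    position = start
    for step in range(max_steps):
        if abs(position) <= half_width:
            return step
        position += rng.choice((-1, 1))
    return max_steps


def mean_steps(half_width: int, trials: int = 200) -> float:
    """Average hitting time over a fixed set of seeded walks."""
    total = sum(steps_to_compliance(half_width, seed) for seed in range(trials))
    return total / trials


if __name__ == "__main__":
    print("narrow region:", mean_steps(half_width=1))
    print("wide region:  ", mean_steps(half_width=4))
```

Because both regions are evaluated on the same seeded paths and the wide region contains the narrow one, each walk reaches the wide region no later than the narrow one. Real users are not random walkers, so the sketch only shows the direction of the effect, not its size.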

Tags: OCR · mobile accessibility · blindness · camera guidance · computer vision · text recognition