Text Locating in Scene Images for Reading and Navigation Aids for Visually Impaired Persons

Chucai Yi · 2010 · Proceedings of the 12th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2010) · doi:10.1145/1878803.1878894

Summary

This paper presents a computer vision algorithm for locating text in natural scene images, a key capability for camera-based reading assistants and navigation aids for people who are blind or visually impaired. Existing reading assistants could handle text in scanned documents, where character size and layout are predictable. Locating text in real-world scenes (street signs, product labels, vending machine instructions, book spines) remained an unsolved challenge because of complex backgrounds and varying fonts, colours, sizes, and orientations.

The proposed algorithm combines two established features of text: colour uniformity (text characters tend to share similar colours) and high edge density (character strokes create distinctive boundaries). The method works in four stages:

(1) Edge detection, followed by repainting edge pixels with a non-dominant colour to enhance character boundaries.
(2) Colour reduction using histogram analysis and K-means clustering to segment the image into colour layers.
(3) Boundary labelling and noise filtering in each colour layer, with a "degree" metric measuring how many colour layers contain a boundary at the same position (text characters typically appear in at least three layers).
(4) Text line fitting, where high-degree boundary centroids that fall in approximate linear alignment are grouped into text strings. False positives are removed using variance measures of boundary area, bounding box height, and inter-centroid distance.
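The text line fitting in stage (4) can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the pixel tolerance `tol`, the spacing threshold `max_cv`, and the pairwise line search are all assumptions chosen for clarity. The sketch takes boundary centroids as `(x, y)` tuples, groups those lying near a common line at any orientation, and checks that the spacing between neighbouring centroids is roughly uniform, one of the three variance measures the summary mentions.

```python
from itertools import combinations
from math import hypot
from statistics import mean, pvariance


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = hypot(bx - ax, by - ay)
    if length == 0:
        return hypot(px - ax, py - ay)
    return abs((bx - ax) * (py - ay) - (by - ay) * (px - ax)) / length


def order_along_line(group):
    """Sort centroids along the axis on which the group extends furthest."""
    xs = [x for x, _ in group]
    ys = [y for _, y in group]
    axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1
    return sorted(group, key=lambda c: c[axis])


def fit_text_lines(centroids, tol=3.0, min_chars=3):
    """Group centroids lying within tol pixels of a common line.

    tol is an assumed tolerance; min_chars=3 reflects the stated
    assumption that a text string has at least three characters.
    """
    groups = set()
    for a, b in combinations(centroids, 2):
        members = frozenset(
            c for c in centroids if point_line_distance(c, a, b) <= tol
        )
        if len(members) >= min_chars:
            groups.add(members)
    # Keep only maximal groups (drop strict subsets of larger groups).
    maximal = [g for g in groups if not any(g < other for other in groups)]
    return [order_along_line(g) for g in maximal]


def spacing_is_regular(group, max_cv=0.5):
    """Check that neighbour spacing is roughly uniform.

    Uses the coefficient of variation of inter-centroid distances;
    max_cv is an illustrative threshold, not a value from the paper.
    """
    gaps = [
        hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(group, group[1:])
    ]
    avg = mean(gaps)
    return avg > 0 and pvariance(gaps) ** 0.5 / avg <= max_cv
```

Given the centroids `[(10, 10), (20, 20), (30, 30), (40, 40), (15, 80), (90, 5)]`, `fit_text_lines` returns a single group containing the four diagonal points, and the two stray points are dropped. Because the search is over lines through pairs of centroids, it does not favour horizontal text. The pairwise search is cubic in the number of centroids, which is acceptable for a sketch but would need a cheaper strategy on cluttered images.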

Key findings

Evaluated on the ICDAR 2003/2005 Robust Reading Dataset (420 images) plus an 80-image supplemental dataset of non-horizontal text, the algorithm achieved a reported precision of 0.56 and recall of 0.65 (f-measure 0.51), placing it competitively among state-of-the-art methods from the ICDAR text locating competitions. A key advantage over prior methods is the ability to locate text strings at arbitrary orientations, not just horizontal text. This matters in real-world environments, where signs and labels appear at various angles. The algorithm operates on text strings rather than individual characters, exploiting the structural property that characters in a string share approximate linear alignment. The degree-based filtering, which requires boundary centroids to appear across multiple colour layers, provides an effective mechanism for distinguishing text characters from background noise. There are two main limitations. First, the algorithm assumes that text strings contain at least three characters with relatively uniform colour. Second, it addresses only text localisation, not recognition: the detected regions would need to be passed to an OCR system for actual reading.
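The degree-based filtering can be illustrated with a short sketch. It makes several assumptions that are not taken from the paper: each colour layer is represented as a list of boundary centroids, two boundaries count as being "at the same position" when their centroids are within `radius` pixels of each other, and the function names are hypothetical. Only the threshold of three layers comes from the summary above.

```python
from math import hypot


def boundary_degree(position, layers, radius=2.0):
    """Count the colour layers with a boundary centroid near position.

    radius is an assumed matching tolerance in pixels.
    """
    px, py = position
    return sum(
        any(hypot(cx - px, cy - py) <= radius for cx, cy in layer)
        for layer in layers
    )


def high_degree_centroids(layers, min_degree=3, radius=2.0):
    """Keep one representative centroid per position of sufficient degree."""
    kept = []
    for layer in layers:
        for c in layer:
            if boundary_degree(c, layers, radius) < min_degree:
                continue
            # Skip centroids that duplicate a position already kept.
            if any(hypot(c[0] - k[0], c[1] - k[1]) <= radius for k in kept):
                continue
            kept.append(c)
    return kept


# Four colour layers; only the boundary near (10, 10) appears in three.
layers = [
    [(10, 10), (50, 50)],
    [(10, 11), (80, 20)],
    [(11, 10)],
    [(50, 51)],
]
candidates = high_degree_centroids(layers)
```

Here the boundary near `(10, 10)` appears in three layers and is kept, while the boundaries near `(50, 50)` (two layers) and `(80, 20)` (one layer) are treated as background noise. The surviving centroids are the candidates that a line fitting step would then group into text strings.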

Relevance

Scene text localisation is a foundational component of camera-based assistive technology that helps blind and visually impaired people read environmental text, from street signs and shop names to product labels and medicine bottles. This 2010 work is an early contribution to a problem that has since been dramatically advanced by deep learning approaches, but the core accessibility motivation remains unchanged: people who are blind need automated systems to find and read the text that permeates physical environments. For accessibility practitioners, the paper illustrates that reading assistive technology requires not just OCR (which works on clean document images) but also scene text detection that handles the messiness of the real world. The arbitrary-orientation capability is particularly important for practical use, as real-world text frequently appears at angles. The work also highlights the pipeline nature of such systems: localisation must precede recognition, and both must precede the human interface challenge of conveying the detected text to the user (e.g., through speech).

Tags: computer vision · text recognition · blindness · navigation · reading assistant · scene text · optical character recognition · assistive technology