Smartphone Application for Indoor Scene Localization

Nabeel Younus Khan, Brendan McCane · 2012 · Proceedings of the 14th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2012) · doi:10.1145/2384916.2384953

Summary

This demonstration paper presents a smartphone-based assistive technology that uses computer vision to determine a blind user's location inside a building, where GPS is ineffective. The system follows a client-server model, with the client running on an Android HTC Wildfire S phone. It consists of two modules: a mapping phase, in which a large number of images covering all locations in a building are captured and stored on the server with corresponding location labels, and a localization phase, in which the user takes a photo from their current position and sends it to the server for matching. The server matches images with a Visual Bag of Words (BoW) approach based on SIFT (Scale-Invariant Feature Transform) features. Features extracted from the stored images are clustered using approximate k-means into visual words, each image is described by its visual word frequencies, and these frequencies are stored in an inverted index for fast retrieval. When a query image arrives, the server retrieves the 100 most similar images, ranks them, and determines the location in one of two ways: by consensus, if the top three images all indicate the same location, or through geometric verification using fundamental matrix estimation, if at least 20% of features match between the query and a top-ranked database image. The location is communicated to the user via voice message.
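To make the inverted-index retrieval step described above more concrete, the following is a minimal sketch in plain Python. It assumes each image has already been reduced to a list of visual-word ids (the SIFT extraction and approximate k-means quantization are not shown). The tf-idf weighting with cosine normalization is an assumption borrowed from common BoW practice, since the summary does not state which scoring the authors used, and the names `build_inverted_index` and `retrieve` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(db_words):
    """Build an inverted index from {image_id: [visual_word_id, ...]}.

    Returns (index, idf, norms), where index maps each visual word to
    {image_id: term frequency}. The tf-idf weighting is an assumption.
    """
    n_images = len(db_words)
    index = defaultdict(dict)
    for image_id, words in db_words.items():
        for word, count in Counter(words).items():
            index[word][image_id] = count

    # Words that occur in fewer images get a larger weight.
    idf = {w: math.log(n_images / len(postings)) for w, postings in index.items()}

    # Precompute the length of each image's tf-idf vector for normalization.
    squared = defaultdict(float)
    for word, postings in index.items():
        for image_id, tf in postings.items():
            squared[image_id] += (tf * idf[word]) ** 2
    norms = {image_id: math.sqrt(v) for image_id, v in squared.items()}
    return index, idf, norms


def retrieve(query_words, index, idf, norms, top_k=100):
    """Return up to top_k database image ids, most similar first.

    Only images sharing at least one visual word with the query are scored,
    which is what makes the inverted index fast.
    """
    scores = defaultdict(float)
    for word, q_tf in Counter(query_words).items():
        for image_id, tf in index.get(word, {}).items():
            scores[image_id] += (q_tf * idf[word]) * (tf * idf[word])

    def normalized(image_id):
        norm = norms.get(image_id, 0.0)
        return scores[image_id] / norm if norm > 0 else 0.0

    return sorted(scores, key=normalized, reverse=True)[:top_k]


# Toy database: visual-word ids are made up for illustration.
db = {
    "lobby_1": [1, 1, 2, 3],
    "lobby_2": [1, 2, 2, 3],
    "kitchen_1": [4, 5, 5, 6],
}
index, idf, norms = build_inverted_index(db)
ranking = retrieve([1, 1, 2, 3, 4], index, idf, norms, top_k=100)
```

The default `top_k=100` mirrors the 100-image shortlist mentioned in the summary; the ranked shortlist would then be passed to the consensus or geometric verification step.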

Key findings

The application achieves localization in 2-4 seconds on average and was evaluated on images taken from office buildings. The system can work in any building for which a suitable image database has been collected, making it a generalizable approach. The interface is designed for accessibility: blind users can launch it via a voice-powered virtual assistant, the application provides voice messages for all output, and the phone can be held in either landscape or portrait orientation. The authors note that a voice-activated button would further improve the interface. The two-stage matching approach (fast BoW retrieval followed by geometric verification) balances speed and accuracy. The vision-based approach has advantages over infrastructure-dependent solutions (like Bluetooth beacons or Wi-Fi fingerprinting) in that it only requires a pre-collected image database and no hardware installation in the building. However, the system requires wireless internet connectivity to communicate with the server, and the quality of localization depends on the comprehensiveness of the building's image database.
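The two-stage decision (consensus among the top-ranked images, otherwise geometric verification) can be illustrated with a short Python sketch. This is one reading of the rule as summarized here, not the authors' code: `decide_location`, `match_ratio`, and `max_checked` are hypothetical names, and `match_ratio` stands in for the expensive step that would estimate a fundamental matrix and report the fraction of query features that match a database image.

```python
def decide_location(ranked, match_ratio, min_ratio=0.20, max_checked=10):
    """Pick a location label from a ranked shortlist.

    ranked      -- list of (image_id, location) pairs, most similar first
    match_ratio -- callable image_id -> fraction of query features matched
                   after geometric verification (assumed, not shown here)
    min_ratio   -- the 20% threshold mentioned in the summary
    max_checked -- hypothetical cap on how many candidates get verified
    Returns the location label, or None if nothing can be confirmed.
    """
    # Stage 1: cheap consensus check on the top three retrieved images.
    top3 = [location for _, location in ranked[:3]]
    if len(top3) == 3 and len(set(top3)) == 1:
        return top3[0]

    # Stage 2: verify candidates one by one, only when consensus fails.
    for image_id, location in ranked[:max_checked]:
        if match_ratio(image_id) >= min_ratio:
            return location
    return None


# Toy usage with made-up ids, labels, and match ratios.
ratios = {"img_a": 0.08, "img_b": 0.31, "img_c": 0.12}
agreeing = [("img_a", "Room 12"), ("img_b", "Room 12"), ("img_c", "Room 12")]
mixed = [("img_a", "Room 12"), ("img_b", "Corridor B"), ("img_c", "Room 12")]

by_consensus = decide_location(agreeing, ratios.get)
by_verification = decide_location(mixed, ratios.get)
```

Running the verification lazily, only after the consensus check fails, reflects the speed and accuracy trade-off noted above: the common case is answered from the ranking alone.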

Relevance

This paper represents an early example of using smartphone cameras and computer vision for indoor localization to assist blind users — an approach that has since become much more refined with advances in deep learning and mobile computing power. For accessibility practitioners, it demonstrates the potential of leveraging existing consumer devices (smartphones) and infrastructure (building imagery) rather than requiring specialized hardware or environmental modifications. The vision-based approach complements other indoor navigation methods and addresses the critical gap between outdoor GPS navigation and indoor wayfinding. While modern solutions (like Apple's Indoor Maps or Google's Visual Positioning System) have advanced considerably, the fundamental concept of image-based indoor localization for blind navigation remains highly relevant. The client-server architecture, which was necessary in 2012 due to limited mobile processing power, would likely be implementable on-device with modern smartphones.

Tags: visual impairment · blind navigation · indoor localization · computer vision · smartphone · image matching · SIFT features · client-server · assistive technology · Android