Automated Description Generation for Indoor Floor Maps

Devi A. Paladugu, Hima Bindu Maguluri, Qiongjie Tian, Baoxin Li · 2012 · Proceedings of the 14th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2012) · doi:10.1145/2384916.2384958

Summary

This demonstration paper presents a prototype system that automatically generates verbal descriptions of indoor floor maps to support navigation by people with visual impairments. It addresses a practical gap: canes, guide dogs, and GPS devices help with outdoor mobility, but they offer little support for getting oriented in unfamiliar indoor environments such as school buildings, hotels, or public libraries.

The system is a pipeline of automated steps. The user provides the name of a building or establishment, and a web crawler searches for the building's website and downloads the corresponding floor plan. Image processing algorithms then identify key landmarks on the map (rooms, libraries, entrances, and other points of interest) using text detection, text recognition, and spatial analysis. Finally, the system generates a verbal description of the relative positions of these landmarks, giving the user an overview of the building layout before they arrive. The prototype focuses on public libraries as a proof of concept; constraining the domain in this way makes landmark detection more reliable.
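The retrieval step described above can be sketched in Python. The paper reports that the crawler finds the building's site through Google search and then parses URLs to locate floor plan images or PDFs; this minimal sketch covers only the URL-filtering part and assumes the page HTML has already been fetched. The keyword and file-extension lists, the name `find_floor_plan_urls`, and the example URLs are hypothetical and are not taken from the paper.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical heuristics: the paper does not publish the crawler's matching rules.
FLOOR_PLAN_KEYWORDS = ("floor", "plan", "map", "level")
FLOOR_PLAN_EXTENSIONS = (".pdf", ".png", ".jpg", ".jpeg", ".gif")


class LinkCollector(HTMLParser):
    """Collects link targets from <a href=...> and <img src=...> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])
        elif tag == "img" and attributes.get("src"):
            self.links.append(attributes["src"])


def find_floor_plan_urls(page_html, base_url):
    """Return absolute URLs whose path looks like a floor plan image or PDF."""
    collector = LinkCollector()
    collector.feed(page_html)
    candidates = []
    for link in collector.links:
        url = urljoin(base_url, link)  # resolve relative links against the page
        path = urlparse(url).path.lower()
        is_plan_file = path.endswith(FLOOR_PLAN_EXTENSIONS)
        mentions_plan = any(word in path for word in FLOOR_PLAN_KEYWORDS)
        if is_plan_file and mentions_plan and url not in candidates:
            candidates.append(url)
    return candidates


# Example: a fetched "Visit us" page of a hypothetical library website.
page = """
<a href="/about.html">About</a>
<a href="/maps/first-floor-plan.pdf">First floor</a>
<img src="images/floor2_map.png" alt="Second floor">
<img src="images/logo.png" alt="Logo">
"""
print(find_floor_plan_urls(page, "https://library.example.edu/visit/"))
# ['https://library.example.edu/maps/first-floor-plan.pdf',
#  'https://library.example.edu/visit/images/floor2_map.png']
```

Requiring both a plan-like file extension and a plan-related keyword keeps logos and ordinary page links out of the candidate list; a real crawler would still need some way to choose among several matching files.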

Key findings

The system demonstrates a complete end-to-end pipeline, from a building name as input to a verbal spatial description as output, and tackles three distinct technical challenges: automated floor plan retrieval from the web, image-based landmark extraction, and natural language description generation from relative spatial positions. The web crawler uses Google search to find the building's website and then parses URLs on that site to locate floor plan images or PDFs. Landmark localization applies image processing techniques, including text detection and recognition, to identify the labeled rooms and features on the map. The verbal descriptions use a relative positioning scheme, describing landmarks in relation to each other rather than in absolute coordinates, so the approach does not require a real-time tracking device. Preliminary user surveys and experiments indicated that the approach is promising: visually impaired participants found the spatial descriptions useful for building a mental model of an unfamiliar indoor space before visiting it.
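The relative positioning scheme can be illustrated with a small sketch. It assumes landmarks have already been extracted as named centroids in image pixel coordinates, treats the top of the floor plan as north, and describes every landmark relative to a single reference such as the entrance, nearest first. The `Landmark` class, the eight-way compass wording, and the sentence template are illustrative assumptions, not the paper's actual description generator.

```python
import math
from dataclasses import dataclass


@dataclass
class Landmark:
    name: str
    x: float  # centroid column on the floor plan, in pixels
    y: float  # centroid row on the floor plan, in pixels (grows downward)


# Assumed convention: the top of the floor plan image is treated as north.
DIRECTIONS = ["east", "northeast", "north", "northwest",
              "west", "southwest", "south", "southeast"]


def direction(origin, target):
    """Return the eight-way compass direction from origin to target."""
    dx = target.x - origin.x
    dy = origin.y - target.y  # flip the image y axis so "up" is positive
    angle = math.degrees(math.atan2(dy, dx)) % 360
    # Each direction covers a 45-degree sector centered on its compass angle.
    return DIRECTIONS[int((angle + 22.5) // 45) % 8]


def describe(landmarks, reference_name):
    """Describe every landmark relative to the reference, nearest first."""
    ref = next(lm for lm in landmarks if lm.name == reference_name)
    others = sorted(
        (lm for lm in landmarks if lm is not ref),
        key=lambda lm: math.hypot(lm.x - ref.x, lm.y - ref.y),
    )
    return " ".join(
        f"The {lm.name} is {direction(ref, lm)} of the {ref.name}."
        for lm in others
    )


# Hypothetical landmarks extracted from a library floor plan.
landmarks = [
    Landmark("entrance", 100, 400),
    Landmark("circulation desk", 150, 300),
    Landmark("restroom", 20, 400),
    Landmark("reading room", 300, 100),
]
print(describe(landmarks, "entrance"))
# The restroom is west of the entrance. The circulation desk is northeast
# of the entrance. The reading room is northeast of the entrance.
```

Because every sentence relates one landmark to another, the description stays valid wherever the user happens to be, which is why no tracking device is needed. Ordering by distance from the reference is only one simple choice; the paper does not specify how its descriptions are ordered.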

Relevance

This paper addresses an important and still-underserved area of accessibility: helping people with visual impairments navigate unfamiliar indoor environments. While outdoor navigation has improved dramatically through GPS and mapping apps, indoor spaces remain challenging. The concept of generating verbal descriptions from visual floor plans is a form of automated image description — a technique that has since advanced significantly with deep learning and computer vision. For accessibility practitioners, the work highlights that making floor plans available on websites is only the first step; those visual representations also need to be made accessible through alternative formats such as text descriptions or tactile maps. The paper's approach of using relative spatial descriptions is particularly relevant to current work on accessible wayfinding and spatial orientation for blind users.

Tags: visual impairment · indoor navigation · wayfinding · image processing · floor plans · verbal description · web crawling · landmark detection