Vision-and-Language Navigation

Also known as: VLN

Vision-and-language navigation (VLN) is a task in which an agent follows natural-language instructions to move through a visual environment, grounding phrases such as 'turn left at the blue sofa' in what it currently sees. VLN research has expanded from small indoor simulators toward real-world deployments powered by vision-language models. In accessibility, VLN pipelines underpin assistive navigation for blind travellers (e.g., SeeWay, WanderGuide, VLM-Drone), turning free-form spoken queries into step-by-step wayfinding guidance that accounts for visible landmarks and obstacles.
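
The instruction-following loop described above can be illustrated with a toy sketch. It is not how any of the named systems work: real VLN agents use learned vision and language models, whereas this example assumes a hand-written graph of viewpoints and plain keyword matching. All names here (Viewpoint, GRAPH, ground, follow) are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Viewpoint:
    """One node of a toy navigation graph (hypothetical structure)."""
    landmarks: set[str]                                   # what the agent "sees" here
    exits: dict[str, str] = field(default_factory=dict)   # action -> next viewpoint id


# Hypothetical three-room environment standing in for a simulator.
GRAPH = {
    "hall": Viewpoint({"door"}, {"forward": "lounge"}),
    "lounge": Viewpoint({"blue sofa"}, {"left": "kitchen", "forward": "hall"}),
    "kitchen": Viewpoint({"table"}, {}),
}

ACTIONS = ("forward", "left", "right")
ALL_LANDMARKS = set().union(*(vp.landmarks for vp in GRAPH.values()))


def ground(step: str, viewpoint: Viewpoint) -> str | None:
    """Map one instruction step to an action, or None if it cannot be grounded.

    A step is grounded only if (a) it names an action that is available
    here and (b) any landmark it mentions is visible from this viewpoint.
    """
    action = next((a for a in ACTIONS if a in step), None)
    if action is None or action not in viewpoint.exits:
        return None
    mentioned = {lm for lm in ALL_LANDMARKS if lm in step}
    if mentioned and not (mentioned & viewpoint.landmarks):
        return None
    return action


def follow(instruction: str, start: str) -> list[str]:
    """Follow the steps of an instruction; stop at the first ungrounded step."""
    path = [start]
    for step in instruction.lower().split(" then "):
        action = ground(step, GRAPH[path[-1]])
        if action is None:
            break
        path.append(GRAPH[path[-1]].exits[action])
    return path


print(follow("go forward then turn left at the blue sofa", "hall"))
# prints ['hall', 'lounge', 'kitchen']
```

The landmark check in ground is the part that corresponds to grounding: the same words 'turn left at the blue sofa' produce no movement when issued in the hall, because no blue sofa is visible there and no left exit exists.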

Category: Artificial Intelligence · Navigation and Wayfinding · AI and accessibility · Robotics

Related: Vision-Language Model · Grounding · Navigation

Sources

https://arxiv.org/abs/1711.07280