Visual Dialogue

Also known as: Visual Dialog, VisDial

Visual dialogue is an AI task in which a system holds a multi-turn natural-language conversation about visual content, such as an image or a video frame. Unlike single-turn visual question answering (VQA), a visual dialogue system carries context across exchanges: it uses the dialogue history to resolve follow-up questions and to keep its responses coherent and consistent. This capability matters for accessibility because it lets blind and low-vision users explore visual content iteratively, asking follow-up questions to build a progressively more detailed understanding of what is shown rather than relying on a single static description.
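
The difference from single-turn VQA can be illustrated with a minimal sketch of the conversation state. Everything here is hypothetical and for illustration only: `VisualDialogueSession`, `answer_fn`, and `stub_answer` are assumed names, and the stub stands in for a real vision-language model. The point it shows is that each new question is answered with the image and all earlier question-answer pairs, not the question alone.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# One past turn: (question, answer).
Turn = Tuple[str, str]


@dataclass
class VisualDialogueSession:
    """Hypothetical container for one conversation about one image."""
    image_ref: str
    # Assumed interface: (image, history so far, new question) -> answer.
    answer_fn: Callable[[str, List[Turn], str], str]
    history: List[Turn] = field(default_factory=list)

    def ask(self, question: str) -> str:
        # Pass a copy of the history so the model sees every earlier turn.
        answer = self.answer_fn(self.image_ref, list(self.history), question)
        # Record the turn so later questions can refer back to it.
        self.history.append((question, answer))
        return answer


def stub_answer(image_ref: str, history: List[Turn], question: str) -> str:
    # Placeholder for a real model: only reports how much context it received.
    return f"(answer to {question!r} using {len(history)} prior turn(s))"


session = VisualDialogueSession("photo.jpg", stub_answer)
first = session.ask("What is in the picture?")
second = session.ask("What color is it?")  # "it" depends on the first turn
```

In a single-turn VQA setup the second question would arrive with no history, so a pronoun such as "it" would have nothing to refer to; here the earlier turn is available to the answering function.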

Category: Artificial Intelligence · Computer Vision · Natural Language Processing · Visual Accessibility

Related: Visual Question Answering · Image Captioning · Audio Description

Sources

https://visualdialog.org/
https://doi.org/10.1145/3597638.3608402