Visual Grounding

Also known as: Grounded Visual Understanding

The ability of an AI model to connect its language output to specific elements actually present in the visual input, so that descriptions and answers are anchored to real objects and scenes rather than inferred from patterns learned during training or from assumptions about what a scene usually contains. Poor visual grounding leads to hallucinations, in which the model describes objects or spatial relationships that do not exist in the image. Improving visual grounding is essential for making AI-based assistive tools reliable for blind users, who depend on accurate descriptions of their surroundings.
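One simple way to quantify how well a description is grounded is to compare the objects it mentions against the objects actually annotated in the image, similar in spirit to object-hallucination metrics such as CHAIR. The sketch below is illustrative only: it assumes object mentions have already been extracted from the model's text and normalized to the same label vocabulary as the image annotations, and the function names `hallucinated_objects` and `hallucination_rate` are hypothetical.

```python
def hallucinated_objects(mentioned, present):
    """Return the distinct mentioned labels that are absent from the image annotations.

    Assumes both inputs use the same normalized label vocabulary (hypothetical setup).
    """
    present_set = set(present)
    return sorted({label for label in mentioned if label not in present_set})


def hallucination_rate(mentioned, present):
    """Fraction of distinct mentioned objects not found in the image.

    0.0 means every mentioned object is grounded; 1.0 means none are.
    An empty description is treated as fully grounded.
    """
    distinct = set(mentioned)
    if not distinct:
        return 0.0
    return len(hallucinated_objects(distinct, present)) / len(distinct)


if __name__ == "__main__":
    # Hypothetical example: objects named in a model's description of a park scene
    mentioned = ["person", "dog", "bench", "bicycle"]
    # Objects actually annotated in the image
    present = ["person", "bench", "tree"]
    print(hallucinated_objects(mentioned, present))  # ['bicycle', 'dog']
    print(hallucination_rate(mentioned, present))    # 0.5
```

Set membership only catches invented objects; errors in attributes or spatial relationships ("the dog is left of the bench") would need richer annotations, such as bounding boxes, to detect.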

Category: artificial intelligence · computer vision

Related: AI Hallucination · Voice and Video-Capable Language Model · Visual Question Answering

Sources

https://doi.org/10.1145/3663547.3749833