Visual Hallucination

Also known as: AI hallucination, MLLM hallucination

In the context of multimodal AI systems, visual hallucination refers to the generation of descriptions or responses that contain information not grounded in the actual visual input. Typical forms are fabricating non-existent objects, misattributing properties such as colour or size, and misinterpreting spatial relationships between elements in an image. Hallucinations arise in large part because multimodal large language models (MLLMs) rely on statistical language priors learned during training, and these priors can override the visual evidence, particularly when input images are degraded or unusual. Visual hallucination is a critical accessibility concern for users who are blind or have low vision (BLV): they cannot independently verify AI-generated descriptions and may act on incorrect information in ways that affect their safety and decision-making. Mitigation techniques under active research include contrastive decoding, improved visual grounding, and retrieval-augmented generation.
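
Contrastive decoding, one of the mitigation techniques named above, can be illustrated with a minimal sketch. It assumes one common formulation, in which the model scores the next token twice, once with the original image and once with a deliberately distorted copy, and then penalises tokens that stay likely even when the visual evidence is removed. The function names, the toy vocabulary, the logit values, and the settings `alpha` and `beta` are all hypothetical and chosen only for illustration; no real model is called.

```python
import math


def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_scores(logits_clean, logits_distorted, alpha=1.0, beta=0.1):
    """Score tokens by contrasting clean-image and distorted-image predictions.

    alpha: strength of the penalty on prior-driven tokens (assumed value).
    beta:  plausibility cutoff relative to the best clean-image token
           (assumed value); implausible tokens are excluded outright.
    """
    lp_clean = log_softmax(logits_clean)
    lp_dist = log_softmax(logits_distorted)
    cutoff = math.log(beta) + max(lp_clean)
    scores = []
    for clean, dist in zip(lp_clean, lp_dist):
        if clean < cutoff:
            scores.append(float("-inf"))  # too unlikely given the real image
        else:
            scores.append((1 + alpha) * clean - alpha * dist)
    return scores


def pick_token(vocab, logits_clean, logits_distorted, alpha=1.0, beta=0.1):
    """Return the vocabulary item with the highest contrastive score."""
    scores = contrastive_scores(logits_clean, logits_distorted, alpha, beta)
    best = max(range(len(vocab)), key=scores.__getitem__)
    return vocab[best]


# Toy example (made-up numbers): the image shows a dog, but the language
# prior favours "cat". With the image distorted, "cat" becomes even more
# dominant, which marks it as prior-driven rather than image-driven.
vocab = ["cat", "dog", "frisbee"]
logits_clean = [2.2, 2.0, -1.0]      # model output given the original image
logits_distorted = [2.5, 0.5, -1.0]  # model output given the distorted image

greedy = vocab[max(range(len(vocab)), key=logits_clean.__getitem__)]
contrastive = pick_token(vocab, logits_clean, logits_distorted)
print(greedy, contrastive)  # prints "cat dog"
```

In this toy case, plain greedy decoding selects the prior-driven token, while the contrastive score favours the token whose probability actually depends on the image. The plausibility cutoff keeps the subtraction from promoting tokens that were never likely under the original image.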

Category: multimodal AI · visual impairment · assistive technology · AI safety

Related: Multimodal AI · Image Captioning · Blindness and Low Vision · Contrastive Decoding · VizWiz

Sources