Visual Interpretation

Also known as: AI Visual Interpretation, Visual Interpretation Application

The task of translating visual content (a scene, object, document, interface, or image) into a form accessible to a blind or low-vision user, typically a spoken or written description, answers to questions, or actionable guidance. Visual interpretation systems may be human-powered (remote sighted assistance via apps like Aira or Be My Eyes), AI-powered (Seeing AI, Be My AI, Envision, or chat assistants built on multimodal large language models, MLLMs), or hybrid. Modern MLLM-enabled visual interpretation applications differ from earlier captioning tools in that they support goal-directed conversation with follow-up questions rather than one-shot descriptions. They also introduce new failure modes, including hallucination (describing content that is not in the image) and inconsistent abstention on sensitive content.

Category: AI Accessibility · Assistive Technology · Blindness and Low Vision

Related: Visual Question Answering · Multimodal Large Language Model · Be My AI · Seeing AI · Aira

Sources

https://doi.org/10.1145/3772318.3793266