VQA

Also known as: Visual Question Answering

VQA (Visual Question Answering) is an AI task in which a system answers natural-language questions about the content of an image. In assistive contexts, VQA systems such as Be My AI, Seeing AI, and Aira let blind and low-vision users ask about their visual surroundings, from reading labels to locating objects, and receive spoken or text responses. Modern VQA systems are built on vision-language models and large language models, which let users hold multi-turn conversations about an image. Current systems still struggle with spatial reasoning and distance estimation. They also depend on image framing that is often inaccessible to blind users, who may not know whether the camera has captured the subject of their question.

Category: AI and accessibility · Assistive Technology · Computer Vision · Blindness and Low Vision

Related: Visual Question Answering · Be My Eyes · Seeing AI · Aira · Large Language Model · Generative AI · Screen Reader

Sources

https://doi.org/10.1145/3772318.3791834
https://www.bemyeyes.com/