Visual Assistant

Also known as: Visual Assistant Skill

Proposed by Gonzalez Penuela et al. (2026), the 'visual assistant' skill is a set of behaviours that an AI visual interpretation system must exhibit, beyond simply producing accurate captions, to meaningfully support blind and low-vision (BLV) users in daily life. The nine proposed behaviours are: neutral factual communication, adaptive communication protocols, goal-oriented collaboration, content quality guidance, comprehensive information provision, contextual self-awareness, privacy protection, transparent uncertainty handling, and graceful hand-off. The concept distinguishes 'captioning skill' (describing what is in a picture) from the broader competencies needed when a BLV user is actively trying to accomplish a task with a photo, such as cooking, navigating, verifying medication, or reading correspondence.

Category: AI Accessibility · Assistive Technology · Design Principles

Related: Visual Interpretation · Multimodal Large Language Model · Visual Question Answering

Sources

https://doi.org/10.1145/3772318.3793266