"It's trained by non-disabled people": Evaluating How Image Quality Affects Product Captioning with Vision-Language Models
Kapil Garg, Xinru Tang, Jimin Heo, Dwayne R. Morgan, Darren Gergle, Erik B. Sudderth, Anne Marie Piper · 2026 · Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26)
Garg and colleagues investigate how well Vision-Language Models (VLMs) caption product images taken by blind and low-vision (BLV) people, a high-stakes everyday task that increasingly depends on tools like Be My AI, Microsoft Seeing AI, and general-purpose assistants such as…
blind and low vision · vision-language models · image captioning · product identification · hallucinations