Reviews

The literature-review database. Every paper Bob has reviewed (he has read many more), with a short summary, key findings, and tags. Browse, filter, search.

Search results

Multi-Perspective Visual Contrastive Decoding for Reliable Assistance
Bocheng Pan, Hailong Shi, Xingyu Gao · 2026 · ACM Transactions on Internet of Things
This technical paper presents MPVCD (Multi-Perspective Visual Contrastive Decoding), a framework designed to address the reliability of AI-generated visual descriptions for people who are blind or have low vision (BLV). The core problem it tackles: when BLV users photograph…
blindness and low vision · multimodal AI · image captioning · visual hallucination · assistive technology
Expanding Perspectives to Improve Access to Visual Archives through Multimodal Image Enrichment
Karina Rodriguez Echavarria, Myrsini Samaroudi · 2026 · ACM Journal on Computing and Cultural Heritage
This paper addresses a pervasive challenge in the Galleries, Libraries, Archives and Museums (GLAM) sector: large-scale visual collections that have been digitised but remain undiscoverable because they lack descriptive metadata. The authors, from the University of Brighton,…
cultural heritage · metadata enrichment · AI image classification · FAIR principles · information discovery
SceneScout: Towards AI-Driven Access to Street Level Imagery for Blind Users
Gaurav Jain, Leah Findlater, Cole Gleason · 2026 · Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26)
Jain, Findlater and Gleason present SceneScout, a prototype web interface that uses a multimodal large language model (GPT-4o) to make street level imagery — the panoramic pedestrian-height photography behind Apple Maps Look Around and Google Street View — directly usable by…
accessibility · navigation · screen readers · AI · multimodal AI
From Struggle to Success: Context-Aware Guidance for Screen Reader Users in Computer Use
Nan Chen, Jing Lu, Zilong Wang, Luna K. Qiu, Siming Chen, Yuqing Yang · 2026 · Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26)
Chen, Lu, Wang, Qiu, Chen and Yang present AskEase, an NVDA add-on that delivers on-demand, step-by-step, screen-reader-friendly guidance for blind and low-vision computer users tackling unfamiliar desktop software. The work responds to a persistent problem: mainstream tutorials…
accessibility · screen readers · AI · LLM · assistive technology
Mnemonic Tracing: Using Eye Gaze to Search for Visual Memories
Wazeer Zulfikar, Yasith Samaradivakara, Paul Pu Liang, Pattie Maes · 2026 · Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA '26)
Mnemonic Tracing is a non-verbal image-retrieval interaction in which a user, wearing eye-tracking glasses, deliberately retraces the contents of a remembered image with their gaze on a blank surface. The paper builds on gaze-reinstatement research, which shows that when people…
eye tracking · gaze interaction · gaze reinstatement · episodic memory · image retrieval
Check Now, Can You See It?: Exploring Voice and Video-Capable Language Models for Identifying and Spatially Locating Items of Interest for Blind and Low-Vision Travelers
Aziz N Zeidieh, JooYoung Seo · 2025 · ASSETS 2025: 27th International ACM SIGACCESS Conference on Computers and Accessibility
This experience report documents the lived experiences of two blind travelers — Aziz (28, blind in left eye, 20/2200 in right) and JooYoung (35, blind in right eye, limited vision in left) — as they adapted commercially available voice and video-capable language models (VVLMs)…
artificial intelligence · navigation · blindness and visual impairment · multimodal AI · large language models
Surfacing Variations to Calibrate Perceived Reliability of MLLM-generated Image Descriptions
Meng Chen, Akhil Iyer, Amy Pavel · 2025 · ASSETS 2025: 27th International ACM SIGACCESS Conference on Computers and Accessibility
This paper addresses a critical safety problem in AI-powered visual access technology: multimodal large language models (MLLMs) like GPT-4o, Gemini, and Claude produce fluent, confident image descriptions that can contain fabricated content, misinterpretations, and omissions…
blindness · low vision · image descriptions · multimodal AI · large language models
Temp access: Reflecting on multimodal GAI as an accessibility technology for temporary disability
Kate S. Glazko · 2025 · ASSETS 2025: 27th International ACM SIGACCESS Conference on Computers and Accessibility
This paper presents an autoethnographic account of using multimodal generative AI (GAI) tools as accessibility technology during a period of temporary disability. The author, an accessibility researcher, experienced an illness that simultaneously impacted verbal communication,…
generative AI · temporary disability · assistive technology · autoethnography · multimodal AI
DescribePro: Collaborative Audio Description with Human-AI Interaction
Maryam S Cheema, Sina Elahimanesh, Samuel Martin, Pooyan Fazli, Hasti Seifi · 2025 · ASSETS 2025: 27th International ACM SIGACCESS Conference on Computers and Accessibility
This paper presents DescribePro, a web-based platform that combines human expertise with AI capabilities to create and refine audio descriptions (AD) for video content. The system addresses the fundamental tension in AD production: human-crafted descriptions are high quality but…
audio description · video accessibility · human-AI collaboration · authoring tools · blind and low vision
AccessMenu: Enhancing Usability of Online Restaurant Menus for Screen Reader Users
Nithiya Venkatraman, Akshay Kolgar Nayak, Suyog Dahal, Yash Prakash, Hae-Na Lee, Vikas Ashok · 2025 · Proceedings of the 22nd International Web for All Conference (W4A)
This paper addresses the significant accessibility barriers that blind and visually impaired (BVI) screen reader users face when trying to access online restaurant menus, which are typically presented as images or PDFs. The research proceeds in two phases. First, an interview…
screen readers · blind users · visual document understanding · LLM accessibility · multimodal AI
Making Accessible Movies Easily: An Intelligent Tool for Authoring and Integrating Audio Descriptions to Movies
Ming Shen, Gang Huang, Yuxuan Wu, Shuyi Song, Sheng Zhou, Liangcheng Li, Zhi Yu, Wei Wang, Jiajun Bu · 2024 · Proceedings of the 21st International Web for All Conference (W4A)
This paper introduces EasyAD, an intelligent tool that automates the process of authoring and integrating audio descriptions (AD) into movies for blind and visually impaired (BVI) users. The traditional AD production workflow is highly labor-intensive, requiring authors to…
audio description · blind and low vision · media accessibility · multimodal AI · speech synthesis

11 results.
