Reviews

The literature-review database. Every paper Bob has reviewed (he has read many more) is listed with a short summary, key findings, and tags. Browse, filter, search.

Search results

  • Multi-Perspective Visual Contrastive Decoding for Reliable Assistance

    Bocheng Pan, Hailong Shi, Xingyu Gao · 2026 · ACM Transactions on Internet of Things

    This technical paper presents MPVCD (Multi-Perspective Visual Contrastive Decoding), a framework designed to address the reliability of AI-generated visual descriptions for people who are blind or have low vision (BLV). The core problem it tackles: when BLV users photograph…

    blindness and low vision · multimodal AI · image captioning · visual hallucination · assistive technology

  • Expanding Perspectives to Improve Access to Visual Archives through Multimodal Image Enrichment

    Karina Rodriguez Echavarria, Myrsini Samaroudi · 2026 · ACM Journal on Computing and Cultural Heritage

    This paper addresses a pervasive challenge in the Galleries, Libraries, Archives and Museums (GLAM) sector: large-scale visual collections that have been digitised but remain undiscoverable because they lack descriptive metadata. The authors, from the University of Brighton,…

    cultural heritage · metadata enrichment · AI image classification · FAIR principles · information discovery

  • SceneScout: Towards AI-Driven Access to Street Level Imagery for Blind Users

    Gaurav Jain, Leah Findlater, Cole Gleason · 2026 · Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26)

    Jain, Findlater and Gleason present SceneScout, a prototype web interface that uses a multimodal large language model (GPT-4o) to make street level imagery — the panoramic pedestrian-height photography behind Apple Maps Look Around and Google Street View — directly usable by…

    accessibility · navigation · screen readers · AI · multimodal AI

  • From Struggle to Success: Context-Aware Guidance for Screen Reader Users in Computer Use

    Nan Chen, Jing Lu, Zilong Wang, Luna K. Qiu, Siming Chen, Yuqing Yang · 2026 · Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26)

    Chen, Lu, Wang, Qiu, Chen and Yang present AskEase, an NVDA add-on that delivers on-demand, step-by-step, screen-reader-friendly guidance for blind and low-vision computer users tackling unfamiliar desktop software. The work responds to a persistent problem: mainstream tutorials…

    accessibility · screen readers · AI · LLM · assistive technology

  • Mnemonic Tracing: Using Eye Gaze to Search for Visual Memories

    Wazeer Zulfikar, Yasith Samaradivakara, Paul Pu Liang, Pattie Maes · 2026 · Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA '26)

    Mnemonic Tracing is a non-verbal image-retrieval interaction in which a user, wearing eye-tracking glasses, deliberately retraces the contents of a remembered image with their gaze on a blank surface. The paper builds on gaze-reinstatement research, which shows that when people…

    eye tracking · gaze interaction · gaze reinstatement · episodic memory · image retrieval

  • Check Now, Can You See It?: Exploring Voice and Video-Capable Language Models for Identifying and Spatially Locating Items of Interest for Blind and Low-Vision Travelers

    Aziz N Zeidieh, JooYoung Seo · 2025 · ASSETS 2025: 27th International ACM SIGACCESS Conference on Computers and Accessibility

    This experience report documents the lived experiences of two blind travelers — Aziz (28, blind in left eye, 20/2200 in right) and JooYoung (35, blind in right eye, limited vision in left) — as they adapted commercially available voice and video-capable language models (VVLMs)…

    artificial intelligence · navigation · blindness and visual impairment · multimodal AI · large language models

  • Surfacing Variations to Calibrate Perceived Reliability of MLLM-generated Image Descriptions

    Meng Chen, Akhil Iyer, Amy Pavel · 2025 · ASSETS 2025: 27th International ACM SIGACCESS Conference on Computers and Accessibility

    This paper addresses a critical safety problem in AI-powered visual access technology: multimodal large language models (MLLMs) like GPT-4o, Gemini, and Claude produce fluent, confident image descriptions that can contain fabricated content, misinterpretations, and omissions…

    blindness · low vision · image descriptions · multimodal AI · large language models

  • Temp access: Reflecting on multimodal GAI as an accessibility technology for temporary disability

    Kate S. Glazko · 2025 · ASSETS 2025: 27th International ACM SIGACCESS Conference on Computers and Accessibility

    This paper presents an autoethnographic account of using multimodal generative AI (GAI) tools as accessibility technology during a period of temporary disability. The author, an accessibility researcher, experienced an illness that simultaneously impacted verbal communication,…

    generative AI · temporary disability · assistive technology · autoethnography · multimodal AI

  • DescribePro: Collaborative Audio Description with Human-AI Interaction

    Maryam S Cheema, Sina Elahimanesh, Samuel Martin, Pooyan Fazli, Hasti Seifi · 2025 · ASSETS 2025: 27th International ACM SIGACCESS Conference on Computers and Accessibility

    This paper presents DescribePro, a web-based platform that combines human expertise with AI capabilities to create and refine audio descriptions (AD) for video content. The system addresses the fundamental tension in AD production: human-crafted descriptions are high quality but…

    audio description · video accessibility · human-AI collaboration · authoring tools · blind and low vision

  • AccessMenu: Enhancing Usability of Online Restaurant Menus for Screen Reader Users

    Nithiya Venkatraman, Akshay Kolgar Nayak, Suyog Dahal, Yash Prakash, Hae-Na Lee, Vikas Ashok · 2025 · Proceedings of the 22nd International Web for All Conference (W4A)

    This paper addresses the significant accessibility barriers that blind and visually impaired (BVI) screen reader users face when trying to access online restaurant menus, which are typically presented as images or PDFs. The research proceeds in two phases. First, an interview…

    screen readers · blind users · visual document understanding · LLM accessibility · multimodal AI

  • Making Accessible Movies Easily: An Intelligent Tool for Authoring and Integrating Audio Descriptions to Movies

    Ming Shen, Gang Huang, Yuxuan Wu, Shuyi Song, Sheng Zhou, Liangcheng Li, Zhi Yu, Wei Wang, Jiajun Bu · 2024 · Proceedings of the 21st International Web for All Conference (W4A)

    This paper introduces EasyAD, an intelligent tool that automates the process of authoring and integrating audio descriptions (AD) into movies for blind and visually impaired (BVI) users. The traditional AD production workflow is highly labor-intensive, requiring authors to…

    audio description · blind and low vision · media accessibility · multimodal AI · speech synthesis

11 results.