Making Lecture Videos Accessible for Students who are Blind or have Low Vision through AI-Assisted Navigation and Visual Question Answering

Katharina Anderer, Karin Müller, Lukas Strobel, Matthias Wölfel, Jan Niehues, Kathrin Gerling · 2025 · Proceedings of the 27th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2025) · doi:10.1145/3663547.3746349

Summary

This paper presents the design and evaluation of LectureAssistant, an AI-powered prototype that makes lecture videos more accessible to students who are blind or have low vision. The research follows a three-part human-centred design process. First, need-finding interviews with 12 blind or low-vision students from six German universities explored their challenges with current lecture materials and their visions for how AI could improve accessibility. Three key themes emerged. Students wanted AI to customise lecture materials (adapting the visual presentation and providing alternative text descriptions at varying levels of detail). They wanted it to enable interactive engagement with content (asking questions about visual elements on slides). They also wanted it to synchronise disparate information sources (connecting video, transcript, and slides so that students can track which slide is currently being discussed).

Based on these findings, the researchers developed LectureAssistant on top of the open-source Lecture Translator platform. Its interface combines three integrated windows: a video player, an AI chatbot, and a synchronised transcript. The chatbot uses Llama 3.1 with retrieval-augmented generation (RAG) to answer questions about lecture content grounded in the transcript. It uses the MiniCPM-V vision-language model to answer visual questions about the current video frame. The system is hosted locally to protect student data privacy and is built entirely on open-source frameworks. In the second phase, an iterative design process with two visually impaired users refined the prototype. In the third, a final evaluation was conducted with seven students.
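Both the synchronised transcript and the chatbot's navigation to topics depend on mapping between playback time and timestamped transcript segments. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation. `Segment`, `active_segment`, and `find_topic` are hypothetical names. Simple keyword overlap stands in for the embedding-based retrieval that a RAG pipeline around Llama 3.1 would normally use.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional
import re


@dataclass
class Segment:
    """One timestamped transcript segment (hypothetical structure)."""
    start: float  # seconds from the start of the video
    text: str


def active_segment(segments: list[Segment], t: float) -> int:
    """Index of the segment being spoken at playback time t.

    Assumes a non-empty list sorted by start time; times before the
    first segment map to index 0.
    """
    starts = [s.start for s in segments]
    return max(bisect_right(starts, t) - 1, 0)


def _tokens(text: str) -> set[str]:
    """Lower-cased alphanumeric words of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def find_topic(segments: list[Segment], query: str) -> Optional[float]:
    """Start time of the segment sharing the most words with the query.

    Keyword overlap is a stand-in for embedding-based retrieval;
    returns None when no segment shares any word with the query.
    """
    query_words = _tokens(query)
    best_score, best_start = 0, None
    for seg in segments:
        score = len(query_words & _tokens(seg.text))
        if score > best_score:
            best_score, best_start = score, seg.start
    return best_start


transcript = [
    Segment(0.0, "Welcome to today's lecture on sorting algorithms."),
    Segment(42.5, "Merge sort splits the list into halves and merges them."),
    Segment(130.0, "Quicksort picks a pivot and partitions the array."),
]

print(active_segment(transcript, 60.0))                   # 1
print(find_topic(transcript, "explain quicksort pivot"))  # 130.0
```

In an interface like LectureAssistant's, the index returned by `active_segment` could decide which transcript line to highlight during playback. The timestamp returned by `find_topic` could be passed to the video player as a seek target.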

Key findings

In the evaluation, all seven participants perceived LectureAssistant as useful, accessible, and easy to use. They were particularly enthusiastic about the AI-powered video search function (asking the chatbot to navigate to specific topics) and the ability to ask questions about visual content in the current video frame. Participants saw clear advantages over existing platforms such as YouTube, where navigating longer lecture videos is difficult and visual content is entirely inaccessible. However, the evaluation also identified several barriers. There were visibility issues (poor contrast, small UI elements) and navigation barriers (forced transcript focus interfering with screen reader navigation, unexpected jumps during navigation). There were also perception barriers (no audio or tactile feedback when slides change or when the AI generates a response). The AI-generated image descriptions received mixed reactions. Some participants found them appropriately detailed, others found them too condensed, and the descriptions occasionally hallucinated content not present in the video frame. Participants requested personalised image descriptions, notifications for slide transitions, and brief slide overviews. They also asked for the option to upload their own videos rather than depending on lecturers to do so. The study emphasises that AI assistive technology should not replace efforts to make lectures inherently inclusive. Lecturers should still be trained to describe visual content verbally and to offer accessible materials directly.
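The requested slide-transition notifications, and the missing feedback when slides change, can be illustrated with a simple frame-differencing approach. This sketch is an assumption-laden illustration, not something the paper describes. The helper names, the flat grayscale frame format, and the threshold of 20 are all hypothetical.

```python
# Hypothetical slide-transition detector: flags a change when consecutive
# sampled frames differ strongly. Frames are flat lists of grayscale pixel
# values (0-255); a real tool would decode frames with a video library.

def mean_abs_diff(a: list[int], b: list[int]) -> float:
    """Average absolute per-pixel difference between two equal-size frames."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def slide_changes(frames: list[list[int]], timestamps: list[float],
                  threshold: float = 20.0) -> list[float]:
    """Return the timestamps at which a new slide appears to start.

    threshold is an assumed, untuned value; speaker video or slide
    animations would need a smarter comparison to avoid false alarms.
    """
    changes = []
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            changes.append(timestamps[i])
    return changes


def notify(t: float) -> str:
    """Text a screen reader or audio cue could announce (hypothetical)."""
    return f"Slide changed at {t:.1f} seconds"


frames = [[200] * 16, [198] * 16, [60] * 16, [62] * 16]
timestamps = [0.0, 2.0, 4.0, 6.0]
for t in slide_changes(frames, timestamps):
    print(notify(t))  # Slide changed at 4.0 seconds
```

The announcement is returned as text so that it could be routed to whichever output channel a student prefers, such as a screen reader, an audio cue, or a braille display. Participants named this kind of multimodal feedback as a gap.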

Relevance

This research addresses a critical gap in higher education accessibility: the rapid shift to digital and video-based learning materials has created significant barriers for students with visual impairments. For practitioners and developers, the LectureAssistant prototype demonstrates how current AI technologies (LLMs and VLMs) can be combined with existing video platforms. The result is accessible learning tools that neither rely on proprietary APIs nor compromise student data privacy. The open-source approach is particularly valuable for educational institutions with limited budgets. The need-finding results provide a useful requirements catalogue for anyone building accessible educational video tools. It covers customisable alternative text descriptions, multimodal output (audio, braille, visual), transcript-video synchronisation, and interactive Q&A about visual content. The paper also raises cautionary points about AI hallucination in educational contexts. It further warns that assistive technology could increase student isolation if it reduces interpersonal interaction in lectures.

Tags: blind and low vision · lecture accessibility · higher education · large language models · vision-language models · AI-assisted accessibility · video accessibility · screen readers

Standards referenced: WCAG 2.0