An Adaptive Videos Enrichment System Based on Decision Trees for People with Sensory Disabilities

José Francisco Saray Villamizar, Benoît Encelle, Yannick Prié, Pierre-Antoine Champin · 2011 · Proceedings of the International Cross-Disciplinary Conference on Web Accessibility (W4A) · doi:10.1145/1969289.1969299

Summary

This paper from the ACAV (Collaborative Annotation for Video Accessibility) project at the University of Lyon proposes an adaptive system that personalises how video accessibility descriptions are presented to users with sensory disabilities. The core insight is that predefined, one-size-fits-all video descriptions and presentation modalities cannot satisfy users whose disabilities, preferences, and cognitive capabilities differ. For example, some visually impaired users may want detailed descriptions of characters and settings while others prefer descriptions of actions; some read Braille while others rely on text-to-speech. The system enriches videos with typed annotations (describing settings, actions, characters, etc.) at varying Levels of Detail (LoD), presented through different output modalities (text-to-speech, or a refreshable Braille display showing regular or contracted Braille). Rather than requiring users to configure their preferences manually before watching (an "adaptable" approach), the system is adaptive: it uses decision trees to learn from user behaviour during viewing. Feedback is minimal: users press the spacebar to reject an unwanted annotation presentation, and the absence of a press is treated as implicit acceptance.
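The reject-or-accept feedback rule can be sketched in Python as follows. This is an illustration, not the authors' code: the field names, the modality labels, and the timing rule are hypothetical. The sketch assumes a spacebar press rejects every presentation being rendered at that instant, and that a presentation whose time interval contains no press is implicitly accepted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Presentation:
    """One rendering of an annotation (hypothetical field names)."""
    annotation_type: str   # e.g. "setting", "action", "character"
    lod: int               # level of detail, e.g. 1 (short) .. 3 (detailed)
    modality: str          # e.g. "tts", "braille", "contracted_braille"
    start: float           # seconds into the video
    end: float             # seconds into the video


def label_feedback(presentations, spacebar_times):
    """Return (presentation, accepted) pairs.

    Assumption: a spacebar press rejects every presentation whose
    [start, end) interval contains the press; no press means acceptance.
    """
    labelled = []
    for p in presentations:
        rejected = any(p.start <= t < p.end for t in spacebar_times)
        labelled.append((p, not rejected))
    return labelled


shown = [
    Presentation("setting", 3, "tts", 0.0, 4.0),
    Presentation("action", 1, "braille", 2.0, 5.0),
    Presentation("character", 2, "tts", 6.0, 8.0),
]
for p, ok in label_feedback(shown, spacebar_times=[4.5]):
    print(p.annotation_type, "accepted" if ok else "rejected")
```

With a single press at 4.5 s, only the action description (rendered from 2 s to 5 s) is labelled as rejected; the other two count as accepted because the user stayed silent.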

Key findings

The system operates in two stages. In the learning stage, it presents annotation renderings and records a feedback tuple (accepted or rejected) for each "rendering section", a temporal segment of the video during which the set of concurrent annotation presentations stays constant. These tuples are stored in tables grouped by the number of concurrent presentations (SCard), and a J48 decision tree is induced from each table.

In the adaptation stage, the system uses these trees to predict whether the user will accept or reject each incoming rendering section. When rejection is predicted, it transforms the presentation into the most similar accepted template, using a weighted distance measure based on information gain that prioritises changes to the attribute with the most influence on user preferences (e.g., changing the modality before changing the level of detail).

A simulation with four user models showed acceptance rates converging quickly to 94% on average after a learning phase of 30 annotations, with the trees relearned every 10 presentations. Two user models reached 100% acceptance throughout, while the other two improved from 60% to 90-100% within 40-50 annotations.
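The learning and adaptation stages can be sketched in Python as below. This is not the authors' implementation: it swaps a minimal ID3-style tree (plain information gain, categorical attributes, no pruning) in for Weka's J48, and the attribute names, the flattening of a rendering section into one table row, and the transformation cost rule are all assumptions. In particular, the sketch assumes the cost of changing an attribute falls as its information gain rises, so the attribute that most influences acceptance is changed first, and it treats "templates" as presentation configurations already present in the feedback table.

```python
import math
from collections import Counter, defaultdict

# Hypothetical attribute names for one annotation presentation.
ATTRS = ("annotation_type", "lod", "modality")


def flatten(section):
    """Turn a rendering section (tuple of presentation dicts) into one table row."""
    return {f"{i}.{a}": p[a] for i, p in enumerate(section) for a in ATTRS}


def group_by_scard(feedback):
    """Split (section, accepted) tuples into one (rows, labels) table per SCard."""
    tables = defaultdict(lambda: ([], []))
    for section, accepted in feedback:
        rows, labels = tables[len(section)]  # SCard = concurrent presentations
        rows.append(flatten(section))
        labels.append(accepted)
    return dict(tables)


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the table on attr."""
    by_value = defaultdict(list)
    for row, y in zip(rows, labels):
        by_value[row[attr]].append(y)
    remainder = sum(len(ys) / len(labels) * entropy(ys) for ys in by_value.values())
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs):
    """ID3-style induction; leaves are True (accept) or False (reject)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    if info_gain(rows, labels, best) <= 0:
        return majority
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], rest)
    return (best, branches, majority)


def predict(tree, row):
    """Walk the tree; an unseen value falls back to that node's majority label."""
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        if row.get(attr) not in branches:
            return majority
        tree = branches[row[attr]]
    return tree


def change_costs(rows, labels, attrs):
    """Assumed rule: the higher an attribute's gain, the cheaper it is to change."""
    gains = {a: info_gain(rows, labels, a) for a in attrs}
    total = sum(gains.values())
    return {a: 1.0 - gains[a] / total if total else 1.0 for a in attrs}


def adapt(tree, row, templates, costs):
    """Keep a row predicted as accepted; else return the cheapest accepted template."""
    if predict(tree, row):
        return row
    accepted = [t for t in templates if predict(tree, t)]
    if not accepted:
        return row
    return min(accepted, key=lambda t: sum(costs[a] for a in costs if t[a] != row[a]))


def pres(kind, lod, modality):
    return {"annotation_type": kind, "lod": lod, "modality": modality}


# Toy SCard=1 history: this simulated user rejects every text-to-speech rendering.
history = [((pres(k, lod, m),), m == "braille")
           for m in ("tts", "braille")
           for k, lod in (("action", 1), ("setting", 3), ("character", 2))]
rows, labels = group_by_scard(history)[1]
attrs = list(rows[0])
tree = build_tree(rows, labels, attrs)
costs = change_costs(rows, labels, attrs)
incoming = flatten((pres("setting", 3, "tts"),))
print(adapt(tree, incoming, rows, costs))
```

Because the toy user rejects only text-to-speech, the tree splits on modality alone, that attribute gets the lowest change cost, and the rejected setting description (LoD 3) is re-rendered in Braille with its type and level of detail unchanged. Relearning every 10 presentations, as in the simulation described above, would amount to calling `build_tree` again on the grown tables.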

Relevance

This research addresses a significant gap in video accessibility: the assumption that a single caption track or audio description serves all users equally. For accessibility practitioners, the key takeaway is that the personalisation of accessibility features, not just their presence, significantly affects user experience. Users in the same disability category may differ sharply in their preferred description type, level of detail, and output modality. The adaptive approach learns from minimal input (a single keypress to reject) without requiring complex preference configuration, and so offers a model for making accessibility features more responsive to individual users. The decision tree implementation is a research prototype, but its underlying principle, that machine learning can personalise accessibility presentations from lightweight user feedback, anticipates later adaptive accessibility approaches and is increasingly practical with current machine learning tools. The work also highlights output modality (speech, Braille) and description granularity as dimensions of accessibility that go beyond the simple presence or absence of captions or descriptions.

Tags: video accessibility · multimedia accessibility · machine learning · personalization · adaptive systems · audio description · sensory disabilities · decision trees · blind and low vision · deaf and hard of hearing