A Saliency-Driven Video Magnifier for People with Low Vision

Ali Selman Aydin, Shirin Feiz, Vikas Ashok, I V Ramakrishnan · 2020 · Proceedings of the 17th International Web for All Conference (W4A) · doi:10.1145/3371300.3383356

Summary

This demonstration paper presents SViM (Saliency-driven Video Magnifier), a system that uses deep-learning-based visual saliency prediction to automatically steer screen magnification toward the most important regions of a video for people with low vision. Screen magnifiers are the primary assistive technology for low-vision users, but watching video under magnification is particularly challenging: because video content changes continuously, users must constantly navigate by hand to keep track of interesting content, and they risk missing important action outside the magnified viewport. SViM addresses this with DeepVS, a neural network that predicts visual saliency by combining object detection, motion analysis, and temporal relationship learning across frames. The system generates per-frame heatmaps quantifying pixel-level "interestingness," then post-processes these heatmaps with hierarchical clustering to identify distinct regions of interest (ROIs). For example, in a conversation between two people, the system identifies two ROIs, one for each person's face. ROIs are tracked over time, with a Kalman filter smoothing their trajectories to prevent the jittery viewport movement that would otherwise disrupt the viewing experience. The system offers three interface modes: SViM-Basic (automatic ROI tracking with click-to-switch between multiple ROIs), SViM-Mixed (adds free mouse navigation alongside automatic tracking), and SViM-Lens (magnifies only ROIs while keeping the background visible at normal size). Additional features include mouse-wheel zoom control and visual transformations such as contrast enhancement, sharpening, brightness adjustment, and colour inversion.
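The heatmap post-processing described above can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not state the linkage criterion, the saliency threshold, or the motion model, so the code assumes single-linkage clustering with a distance cut-off and a constant-velocity Kalman filter applied to one coordinate at a time. All names (`salient_points`, `cluster_rois`, `roi_center`, `AxisKalman`) and all parameter values are hypothetical.

```python
from math import hypot


def salient_points(heatmap, threshold=0.5):
    """Collect (x, y, saliency) for every cell at or above the threshold."""
    return [(x, y, v)
            for y, row in enumerate(heatmap)
            for x, v in enumerate(row)
            if v >= threshold]


def cluster_rois(points, max_gap=1.5):
    """Agglomerative single-linkage clustering with a distance cut-off.

    Starts with one cluster per point and repeatedly merges the two closest
    clusters until the closest remaining pair is more than max_gap apart.
    """
    clusters = [[p] for p in points]

    def gap(a, b):
        return min(hypot(p[0] - q[0], p[1] - q[1]) for p in a for q in b)

    while len(clusters) > 1:
        d, i, j = min(
            ((gap(clusters[i], clusters[j]), i, j)
             for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda t: t[0],
        )
        if d > max_gap:
            break
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


def roi_center(cluster):
    """Saliency-weighted centroid of one cluster."""
    total = sum(w for _, _, w in cluster)
    cx = sum(x * w for x, _, w in cluster) / total
    cy = sum(y * w for _, y, w in cluster) / total
    return cx, cy


class AxisKalman:
    """Constant-velocity Kalman filter for one coordinate (time step = 1 frame)."""

    def __init__(self, position, process_var=0.01, measurement_var=25.0):
        self.p, self.v = float(position), 0.0
        self.P = [[1.0, 0.0], [0.0, 1.0]]  # state covariance (assumed prior)
        self.q, self.r = process_var, measurement_var

    def step(self, z):
        # Predict: position advances by velocity, P <- F P F^T + Q.
        (a, b), (c, d) = self.P
        p_pred = self.p + self.v
        P00, P01 = a + b + c + d + self.q, b + d
        P10, P11 = c + d, d + self.q
        # Update with the measured position z (H = [1, 0]).
        s = P00 + self.r
        k0, k1 = P00 / s, P10 / s
        y = z - p_pred
        self.p, self.v = p_pred + k0 * y, self.v + k1 * y
        self.P = [[(1 - k0) * P00, (1 - k0) * P01],
                  [P10 - k1 * P00, P11 - k1 * P01]]
        return self.p


# Toy heatmap with two separate salient blobs (illustrative values only).
heatmap = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.9, 0.0, 0.0, 0.0, 0.8, 0.0],
    [0.0, 0.9, 0.9, 0.0, 0.0, 0.0, 0.8, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
centers = sorted(roi_center(c) for c in cluster_rois(salient_points(heatmap)))

# Smooth a jittery sequence of ROI x-coordinates, one filter per axis.
noisy_x = [95, 105, 95, 105, 95, 105]
kf = AxisKalman(noisy_x[0])
smoothed_x = [kf.step(z) for z in noisy_x[1:]]
```

In this sketch, `max_gap` decides when two salient areas count as separate ROIs, and the ratio of `process_var` to `measurement_var` controls how strongly the filter damps frame-to-frame movement. Both would need tuning against real heatmaps. The pairwise merge loop is written for readability and is too slow for full-resolution frames.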

Key findings

A user study with 13 participants compared the three SViM interfaces against Windows Magnifier and VLC Player's built-in magnifier. Both subjective ratings and objective measurements indicated that users had a better experience with all three SViM interfaces than with the baseline magnifiers. The system runs in real time on a standard PC once the saliency heatmaps have been computed offline. Because heatmaps need to be computed only once per video, this processing can be offloaded to GPU-equipped servers, which makes web deployment feasible. The three interface modes offer different trade-offs between automation and user control: SViM-Basic is fully guided and requires minimal interaction, SViM-Mixed lets users explore freely when they want to, and SViM-Lens preserves spatial context by magnifying only the salient regions.
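To illustrate why playback can stay lightweight once ROI positions are available, the sketch below shows the kind of per-frame step a fully guided mode might perform: choosing the source rectangle to magnify around the current ROI centre. The function `viewport`, its signature, and the clamping behaviour are assumptions made for illustration and are not taken from SViM.

```python
def viewport(center, frame_size, zoom):
    """Return (left, top, width, height) of the source region to magnify.

    The region has the frame's aspect ratio, is 1/zoom of the frame in each
    dimension, is centred on the ROI where possible, and is clamped so that
    it never extends past the frame edges.
    """
    if zoom < 1:
        raise ValueError("zoom must be at least 1")
    frame_w, frame_h = frame_size
    w, h = frame_w / zoom, frame_h / zoom
    left = min(max(center[0] - w / 2, 0.0), frame_w - w)
    top = min(max(center[1] - h / 2, 0.0), frame_h - h)
    return (left, top, w, h)


# ROI in the middle of a 1920x1080 frame at 2x zoom.
centered = viewport((960, 540), (1920, 1080), 2)
# ROI near the bottom-right corner at 4x zoom: the rectangle is clamped.
clamped = viewport((1900, 1000), (1920, 1080), 4)
```

A mouse-wheel zoom control would then only change the `zoom` argument, and switching between ROIs would only change `center`.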

Relevance

Video content is increasingly central to web experiences, from education and entertainment to news and social media, yet it remains one of the most challenging content types for people with low vision. While screen magnifiers handle static content reasonably well, the temporal and spatial complexity of video means that low-vision users frequently miss important visual information or exhaust themselves with constant manual viewport adjustments. SViM represents an important direction for video accessibility by shifting the navigation burden from the user to an intelligent system that predicts where the interesting content is. For accessibility practitioners and content platforms, this approach could be integrated into video players as an accessibility feature, similar to how captions serve deaf and hard-of-hearing users. The web-deployable architecture (offline heatmap computation on GPU servers, real-time playback on the client) makes this practical for streaming services and educational platforms. The work also highlights that low-vision accessibility extends well beyond text magnification: rich media requires fundamentally different assistive approaches.

Tags: low vision · screen magnifier · video accessibility · computer vision · deep learning · assistive technology · saliency detection