Evaluating Quality and Comprehension of Real-Time Sign Language Video on Mobile Phones

Jessica J. Tran, Joy Kim, Jaehong Chon, Eve A. Riskin, Richard E. Ladner, Jacob O. Wobbrock · 2011 · Proceedings of the 13th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2011) · doi:10.1145/2049536.2049558

Summary

This paper investigates the relationship between an objective video quality metric (PSNR, peak signal-to-noise ratio), subjective quality preferences, and actual comprehension of American Sign Language (ASL) video transmitted at very low bitrates on mobile phones. At the time of publication, most video chat applications required Wi-Fi or expensive 4G connections, and many US carriers were restricting mobile video calls. The researchers' MobileASL project aimed to enable real-time sign language communication over 3G networks, which required extremely low bitrates (10-60 kbps) and small spatial resolutions (192x144 and 320x240 pixels). In a national online survey of 103 respondents (56 ASL speakers, 39 non-ASL speakers), the study used a paired-comparison experiment to assess video quality preferences and a single-stimulus experiment to measure comprehension of ASL content. It examined whether PSNR, the standard engineering metric for evaluating video compression quality, actually predicts what matters most for sign language video: human comprehensibility.
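For readers unfamiliar with the metric, the following is a minimal sketch of the standard PSNR definition, 10·log10(MAX² / MSE), computed between a reference frame and its compressed version. It is not the authors' measurement code: the function name `psnr`, the flat-list frame representation, and the 8-bit default for `max_value` are assumptions made for illustration.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Return PSNR in dB between two equal-length sequences of pixel values.

    Illustrative only: frames are assumed to be flat sequences of
    8-bit luma samples (max_value=255), which the paper does not specify.
    """
    if len(reference) != len(distorted):
        raise ValueError("frames must have the same number of pixels")
    if len(reference) == 0:
        raise ValueError("frames must not be empty")

    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)

    # Identical frames have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    original = [100, 100, 100, 100]
    compressed = [110, 90, 110, 90]  # every pixel is off by 10
    print(f"PSNR: {psnr(original, compressed):.2f} dB")
```

A higher value means the compressed frame is numerically closer to the original. The metric says nothing about which pixels differ, which is why its relationship to sign language comprehension had to be tested empirically.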

Key findings

Both ASL and non-ASL speakers overwhelmingly preferred the 320x240 spatial resolution at bitrates of 20 kbps and higher (p<0.0001). Surprisingly, at the lowest bitrate (10 kbps), both groups instead preferred the lower 192x144 resolution. The preference for 320x240 at 20 kbps and above contradicts PSNR predictions, which showed the smaller resolution having higher objective quality at bitrates below 40 kbps (a known crossover effect in video compression). However, PSNR did correlate with perceived ease or difficulty of ASL comprehension: the crossover point at which PSNR begins to favour the larger resolution matched the point, at 50 kbps and higher, where respondents reported significantly easier comprehension with the 320x240 resolution (Z=100.0, p<0.001). A key practical finding was that at 40 kbps, transmitting at 192x144 provided intelligible sign language video while keeping computational costs low. ASL fluency did not affect quality preferences, as both groups made the same choices, suggesting that visual preference for sign language video is not specialised but follows general video quality perception.
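One way to see why a crossover can occur is to compare the bit budget per pixel at each resolution. The sketch below is a back-of-the-envelope calculation, not an analysis from the paper. The helper `bits_per_pixel` and the frame rate `ASSUMED_FPS` are hypothetical, and 1 kbps is taken as 1000 bits per second. The ratio between the two resolutions does not depend on the assumed frame rate.

```python
def bits_per_pixel(bitrate_kbps, width, height, fps):
    """Average encoded bits available per pixel per frame."""
    return bitrate_kbps * 1000 / (width * height * fps)


# Hypothetical frame rate, chosen only to make the numbers concrete;
# the summary does not state the frame rate used in the study.
ASSUMED_FPS = 10

if __name__ == "__main__":
    for bitrate in (10, 20, 40, 60):
        small = bits_per_pixel(bitrate, 192, 144, ASSUMED_FPS)
        large = bits_per_pixel(bitrate, 320, 240, ASSUMED_FPS)
        print(f"{bitrate:>2} kbps: 192x144 = {small:.4f} bpp, "
              f"320x240 = {large:.4f} bpp")

    # The larger frame has 25/9 (about 2.78) times as many pixels, so at
    # any fixed bitrate each of its pixels gets that much less data.
    ratio = (320 * 240) / (192 * 144)
    print(f"pixel-count ratio: {ratio:.2f}")
```

At very low bitrates the larger frame is starved of bits per pixel, which is consistent with the smaller frame scoring a higher PSNR there. As the bitrate rises, the added spatial detail of the larger frame can outweigh that penalty.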

Relevance

This research has direct practical implications for making mobile video communication accessible to deaf sign language users. The finding that PSNR does not predict subjective quality preferences but does correlate with comprehension is important for engineers designing video compression systems: optimising for perceived visual quality and optimising for sign language intelligibility require different approaches. The specific bitrate and resolution recommendations (192x144 at 40 kbps for intelligible ASL on constrained networks) provided actionable parameters for MobileASL and similar applications. For accessibility practitioners, the study underscores that standard metrics and assumptions about video quality may not apply when the video serves a communicative function fundamentally different from entertainment. Sign language video has unique requirements — comprehension depends on seeing hand shapes, facial expressions, and body movements — that generic compression algorithms may not prioritise. Although bandwidth constraints have eased since 2011, the underlying principles remain relevant for situations with limited connectivity and for optimising battery life on mobile devices used for sign language video calls.

Tags: deaf · American Sign Language · video compression · mobile accessibility · video quality · telecommunications · sign language