Exploring the Benefits and Applications of Video-Span Selection and Search for Real-Time Support in Sign Language Video Comprehension among ASL Learners

Saad Hassan, Caluã de Lacerda Pataca, Akhter Al Amin, Laleh Nourian, Diego Navarro, Sooyeon Lee, Alexis Gordon, Matthew Watkins, Garreth W. Tigwell, Matt Huenerfauth · 2024 · ACM Transactions on Accessible Computing · doi:10.1145/3690647

Summary

This research investigates how to better support hearing students learning American Sign Language (ASL) when they encounter unfamiliar signs in video content. Existing ASL dictionary tools present significant barriers: they require learners either to recall and enter linguistic properties of an unknown sign (handshape, movement, location) or to perform the sign into a webcam. Both tasks are difficult for someone who does not yet know the sign. Additionally, using external dictionary websites disrupts the video-watching experience and causes learners to lose context.

The researchers conducted three studies with ASL learners (all hearing students with 3+ years of formal ASL classes). Study 1 interviewed 14 participants about their experiences watching challenging ASL videos, their workarounds, and their preferences for a future integrated dictionary system. Study 2 was an observational study in which 8 participants used a Wizard-of-Oz prototype that allowed them to select spans of video and receive dictionary results for signs within that selection. Study 3 compared this integrated search tool against a baseline condition using an existing feature-based dictionary website (HandSpeak).

The prototype interface featured a video player with a timeline span-selector (allowing users to highlight portions of video), a "Search selection" button that returned potential sign matches from the American Sign Language Lexicon Video Dataset (ASLLVD), and a translation text area. Under the Wizard-of-Oz methodology, researchers manually prepared the search results to simulate automatic sign recognition, so the studies could focus on user interaction patterns rather than on technical recognition accuracy.
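As a rough illustration of the "Search selection" interaction, the sketch below returns hand-prepared results for whatever span the user selects, which is one way a Wizard-of-Oz setup could stand in for automatic recognition. It is not the authors' implementation: the `Span` class, the `search_selection` function, the time-interval annotations, and the placeholder labels are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A user-selected stretch of the video timeline, in seconds."""
    start: float
    end: float

    def __post_init__(self):
        if self.end <= self.start:
            raise ValueError("span end must come after span start")


# Hypothetical hand-prepared annotations: (start, end, label).
# In a Wizard-of-Oz setup these are written by a person, not a recognizer.
PREPARED_RESULTS = [
    (0.0, 0.8, "SIGN-A"),
    (0.8, 1.9, "SIGN-B"),
    (1.9, 3.1, "SIGN-C"),
    (3.1, 4.0, "SIGN-D"),
]


def search_selection(span, annotations=PREPARED_RESULTS):
    """Return labels of prepared entries whose interval overlaps the span."""
    return [label for start, end, label in annotations
            if start < span.end and end > span.start]


print(search_selection(Span(1.0, 2.5)))  # ['SIGN-B', 'SIGN-C']
```

Because the lookup depends only on the selected interval, narrowing the span shortens the result list, which fits the reported behavior of learners tightening their selection before searching.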

Key findings

Study 1 revealed that ASL learners face comprehension challenges from regional/dialectal variation, linguistic complexity (fingerspelling, classifiers, compound signs), and genre differences (conversational videos are faster and more casual; theatre/poetry uses more depiction). Current workarounds include pausing, replaying, slowing playback, using context clues, and asking teachers. Participants expressed frustration with switching between video and external dictionaries.

Study 2's observational findings showed learners used the span-selector in dual ways: (1) to constrain the video playhead for focused viewing of challenging segments, and (2) to initiate dictionary searches. Participants made an average of 8.29 span adjustments per video when using integrated search. They gradually narrowed spans before searching (average 2.33 seconds for search spans vs 8.17 seconds for viewing-only spans). Video genre significantly affected behavior: theatre/poetry videos required wider spans (M=16.74s) compared to conversational (M=9.86s) or educational videos (M=9.61s).

Study 3's comparative results demonstrated clear advantages for the integrated search tool:

- **Translation quality**: 8.03/10 vs 6.67/10 for baseline (p=0.0424)
- **Time taken**: 547% of video duration vs 1244% for baseline (p=0.042)
- **Workload**: Significantly lower mental demand, temporal demand, and frustration (NASA-TLX)

Participants struggled when citation forms in dictionary results differed from signs as actually produced in context (affected by coarticulation, emotion, or dialectal variation).
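The time figures in the Study 3 comparison are easier to interpret once converted to seconds. The sketch below assumes that "percent of video duration" means task time divided by video length, times 100; the summary does not spell out the formula, and the 60-second video is a made-up example rather than a clip from the study.

```python
def relative_time_percent(task_seconds, video_seconds):
    """Task time expressed as a percentage of the video's duration."""
    if video_seconds <= 0:
        raise ValueError("video duration must be positive")
    return 100.0 * task_seconds / video_seconds


def implied_task_seconds(percent, video_seconds):
    """Invert the metric: task time implied by a reported percentage."""
    return percent / 100.0 * video_seconds


# Hypothetical 60-second video (an assumption, not study data).
video = 60.0
integrated = implied_task_seconds(547, video)   # about 328 s
baseline = implied_task_seconds(1244, video)    # about 746 s
print(round(integrated, 1), round(baseline, 1), round(baseline / integrated, 2))
```

Under this reading, the baseline condition took roughly 2.3 times as long as the integrated tool, whatever the length of the video.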

Relevance

This research has direct implications for ASL educators, sign language technology developers, and accessibility researchers working on communication tools for Deaf and hard of hearing people.

For ASL educators, the integrated video-span tool offers a way to support independent learning. Students can practice comprehension with challenging videos while having on-demand dictionary access without losing context. The tool could reduce the burden on teachers of creating annotated practice materials and help identify where students struggle (through analysis of which spans they search).

For technology developers, the findings provide design guidance for integrated sign language dictionary systems: users prefer manual span selection over automatic pre-segmentation, same-page results over separate windows, two-column result layouts, and linguistic metadata accompanying each result. The discovery that span selection serves dual purposes (playback control and search input) suggests this feature should be standard in ASL learning platforms.

For computer vision researchers, the study reveals a new task: sign recognition from user-selected continuous video spans rather than isolated signs. This is more challenging because span boundaries may not align with sign boundaries and signs may be affected by coarticulation. However, it better reflects real-world learning scenarios.

The research focuses on hearing ASL learners, but the video-span selection concept could extend to other contexts: annotating ASL videos, supporting ASL interpreters with technical content, or enabling analysis of any video requiring identification of specific movements or gestures.
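To make the boundary-alignment problem in the recognition task concrete, the sketch below measures how much of each sign falls inside a user-selected span. The sign intervals, the placeholder labels, and the `coverage` and `span_coverage` helpers are all hypothetical; the summary does not describe any recognition pipeline.

```python
def coverage(span_start, span_end, sign_start, sign_end):
    """Fraction of a sign's interval that lies inside the selected span."""
    length = sign_end - sign_start
    if length <= 0:
        raise ValueError("sign interval must have positive length")
    overlap = min(span_end, sign_end) - max(span_start, sign_start)
    return max(0.0, overlap) / length


def span_coverage(span_start, span_end, signs):
    """List (label, fraction) for every sign the span touches at all."""
    result = []
    for label, start, end in signs:
        fraction = coverage(span_start, span_end, start, end)
        if fraction > 0:
            result.append((label, fraction))
    return result


# Hypothetical ground-truth sign boundaries in seconds: (label, start, end).
SIGNS = [("SIGN-A", 0.0, 1.0), ("SIGN-B", 1.0, 2.0), ("SIGN-C", 2.0, 3.0)]

# A span that starts halfway through SIGN-A and ends exactly after SIGN-B.
print(span_coverage(0.5, 2.0, SIGNS))  # [('SIGN-A', 0.5), ('SIGN-B', 1.0)]
```

In this toy case the span captures only half of the first sign, so a recognizer working on the selection would see a truncated sign alongside a complete one. That is the kind of input isolated-sign recognition does not have to handle.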

Tags: American Sign Language · sign language learning · video comprehension · dictionary lookup · Deaf and hard of hearing · human-computer interaction