Towards an AI-Driven Video-Based American Sign Language Dictionary: Exploring Design and Usage Experience with Learners
Saad Hassan, Matyas Bohacek, Chaelin Kim, Denise Crochet · 2025 · Proceedings of the 22nd International Web for All Conference (W4A) · doi:10.1145/3744257.3744258
Summary
This paper presents the design, implementation, and user evaluation of a fully automated video-based American Sign Language (ASL) dictionary that allows learners to look up unfamiliar signs by recording themselves performing the sign via webcam. Unlike traditional sign language dictionaries that require users to search by English gloss or linguistic features like handshape, this system uses isolated sign language recognition (ISLR) technology to match a user's video input against a database of known signs. The prototype is built as a web application with a React frontend and Python Flask backend, using the PopSign ASL v1.0 model trained on the ASL Citizen dataset with a vocabulary of 250 signs. When a user records a video, the system extracts hand and body landmarks using MediaPipe, processes them through the recognition model, and returns a ranked list of candidate signs with confidence scores and example videos. The design incorporates recommendations from prior Wizard-of-Oz studies, including search-by-feature filtering, handshape browsing, and displaying multiple ranked results rather than a single match. An observational study with 12 novice ASL learners explored how participants used the dictionary during video comprehension and question-answering tasks, examining their recording strategies, result navigation behaviors, and perceptions of the AI-powered lookup system.
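The last stage of the pipeline described above, turning model outputs into a ranked list of candidate signs with confidence scores, can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation. The function name `rank_candidates`, the use of a softmax to derive confidences, and the example glosses and scores are all assumptions; the summary does not say how the system computes its confidence scores.

```python
import math

def rank_candidates(logits, glosses, k=5):
    """Return the top-k (gloss, confidence) pairs, highest confidence first.

    Assumption: the recognition model emits one raw score per sign in its
    vocabulary, and confidences are obtained with a softmax.
    """
    if len(logits) != len(glosses):
        raise ValueError("need exactly one score per gloss")
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    scored = [(g, e / total) for g, e in zip(glosses, exps)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical scores for a three-sign vocabulary.
results = rank_candidates([2.0, 1.0, 0.1], ["BOOK", "HELP", "MILK"], k=2)
for gloss, confidence in results:
    print(f"{gloss}: {confidence:.2f}")
```

Returning several ranked candidates rather than a single best match mirrors the design recommendation noted above, since a learner can scan the list for the sign they saw even when the top result is wrong.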
Key findings
Participants conducted an average of 2.32 searches per continuous recording session, with most willing to resubmit at least three times when initial results were unsatisfactory. The Discounted Cumulative Gain (DCG) analysis showed significant improvement over prior work, with the correct sign appearing in the top results more frequently. Users developed diverse strategies for improving their video submissions, including adjusting camera distance based on sign location (farther for body signs, closer for face signs), managing background clutter, and slowing their signing speed. However, reproducing unfamiliar signs from memory proved challenging, representing a fundamental tension in video-based lookup: users must accurately perform a sign they do not yet know. System latency of approximately 7 seconds per query was noticeable but generally acceptable, though it disrupted workflow during video comprehension tasks. Privacy concerns emerged as a significant theme, with participants questioning webcam recording despite the system processing only extracted landmarks and discarding video data. A surprising finding was the presence of unrelated signs in results, suggesting potential biases in the AI model stemming from the pose estimation preprocessing. Participants preferred seeing all results on a single landing page rather than paginated results.
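For readers unfamiliar with the metric, Discounted Cumulative Gain can be sketched as below. The code uses the common log2-discounted definition; whether the paper uses exactly this variant is an assumption, and the example relevance lists are hypothetical.

```python
import math

def dcg(relevances):
    """DCG = sum of rel_i / log2(i + 1) over 1-indexed ranks i."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

# Binary relevance: 1 marks the position of the correct sign.
print(dcg([1, 0, 0]))  # correct sign ranked first
print(dcg([0, 0, 1]))  # correct sign ranked third
```

When only one result is correct, DCG reduces to 1 / log2(rank + 1), so the score falls from 1.0 at rank 1 to 0.5 at rank 3. A higher DCG therefore means the correct sign tends to appear nearer the top of the result list.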
Relevance
This research addresses a critical accessibility gap in sign language education technology. Traditional ASL dictionaries assume learners can identify signs by their English equivalent or linguistic features, but learners encountering unknown signs in video content have no way to look them up without these reference points. The video-based approach offers a more natural lookup paradigm analogous to how hearing learners can sound out unfamiliar words. For accessibility practitioners, the study highlights important design considerations for AI-powered language tools, including managing user expectations around recognition accuracy, providing clear guidance for video recording quality, and addressing privacy concerns transparently. The findings about model bias in results are particularly relevant as AI-driven accessibility tools become more prevalent, underscoring the need for diverse training data and transparent confidence indicators. The study's limitation to a 250-sign vocabulary points to scalability challenges that must be resolved before such tools can serve real-world learning needs.
Tags: sign language technology · ASL dictionary · sign language recognition · video-based search · language learning · AI-driven accessibility