Design and Evaluation of Hybrid Search for American Sign Language to English Dictionaries: Making the Most of Imperfect Sign Recognition

Saad Hassan, Akhter Al Amin, Alexis Gordon, Sooyeon Lee, Matt Huenerfauth · 2022 · Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (CHI '22) · doi:10.1145/3491102.3501986

Summary

This CHI 2022 paper tackles a practical problem faced by the roughly 200,000 students currently studying American Sign Language (ASL) in the US: how do you look up the meaning of a sign you just saw when you don't know its English gloss? ASL has no standard writing system, so a learner cannot simply type an unfamiliar sign into a dictionary. Two established paradigms exist. In search-by-feature, the user manually selects linguistic properties such as handshape, location, and movement from a menu; this is accurate but cumbersome, and it demands linguistic knowledge that novices lack. In search-by-video, the user performs the sign in front of a webcam and computer-vision-based sign recognition returns a ranked list of candidates; this is easier to query, but recognition is currently imperfect and the correct answer can appear well down a long list. The authors propose a hybrid-search approach that combines the two: the user first submits a video query, then optionally filters the result list by linguistic properties.

The research is presented in two linked studies. Study 1 was a formative interview study (n=32 introductory ASL students, mean age 21, recruited via ASL course instructors and interviewed remotely over Zoom during COVID) that evaluated an early prototype and surfaced design requirements. Study 2 was a between-subjects summative experiment (n=20, 10 per condition) that compared the refined hybrid prototype against a search-by-video baseline on 32 real search tasks per participant, using a Wizard-of-Oz backend that simulated sign recognition at slightly above state-of-the-art accuracy.
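
The hybrid pattern described in the summary (take the recognizer's ranked list, then narrow it by linguistic properties) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `SignResult` record, its field names, the `apply_filters` and `rank_of` helpers, and the placeholder glosses and scores are all hypothetical, and the recognizer is stood in for by a hand-written, pre-ranked list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SignResult:
    """One candidate in the ranked result list (hypothetical schema)."""
    gloss: str          # placeholder label, not a real dictionary entry
    score: float        # assumed recognizer confidence; higher is better
    handshape: str
    location: str
    two_handed: bool


def apply_filters(ranked: List[SignResult],
                  handshape: Optional[str] = None,
                  location: Optional[str] = None,
                  two_handed: Optional[bool] = None) -> List[SignResult]:
    """Keep the recognizer's order; drop candidates that contradict a chosen filter.

    A filter left as None is treated as "not selected" and is ignored.
    """
    kept = []
    for result in ranked:
        if handshape is not None and result.handshape != handshape:
            continue
        if location is not None and result.location != location:
            continue
        if two_handed is not None and result.two_handed != two_handed:
            continue
        kept.append(result)
    return kept


def rank_of(results: List[SignResult], gloss: str) -> Optional[int]:
    """Return the 1-based position of `gloss` in `results`, or None if absent."""
    for position, result in enumerate(results, start=1):
        if result.gloss == gloss:
            return position
    return None


# Invented ranked output standing in for an imperfect recognizer.
ranked = [
    SignResult("SIGN_A", 0.91, "5", "chin", False),
    SignResult("SIGN_B", 0.88, "B", "forehead", False),
    SignResult("SIGN_C", 0.80, "5", "neutral", True),
    SignResult("SIGN_D", 0.74, "B", "chin", False),  # the sign the user wants
    SignResult("SIGN_E", 0.60, "A", "chin", True),
]

before = rank_of(ranked, "SIGN_D")
after_handshape = rank_of(apply_filters(ranked, handshape="B"), "SIGN_D")
after_both = rank_of(apply_filters(ranked, handshape="B", location="chin"), "SIGN_D")
print(before, after_handshape, after_both)
```

The filters never reorder results. They only remove candidates, so the desired sign moves up the list without any change to the underlying recognition accuracy.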

Key findings

Study 1 produced design guidance that the authors then built into the Study 2 prototype. Result videos should auto-play in a looping grid rather than play on hover (17 of 32 participants preferred this for fast visual scanning of movement and handshape). Each result should show textual linguistic metadata alongside the video. 26 of 32 participants wanted handshape as a filter, shown as both a handshape image and its conventional English name because novices cannot recall handshape names. The minimal useful filter set is handshape, body-relative location, one- vs two-handed (with symmetric/asymmetric sub-options), and repeated vs non-repeated movement.

Study 2 found that hybrid-search participants were significantly more satisfied with the search experience (Mann-Whitney p = 0.004), rated results as more useful (p = 0.026), were more satisfied with result ranking (p = 0.016), and felt a greater sense of control (p = 0.03) than search-by-video participants, even though the underlying recognition accuracy was identical across conditions. Task-success rates were 84% for hybrid vs 79% for search-by-video, a difference that was not statistically significant with n=20. Search times were lower for hybrid when the desired item was in the top 50 results. Users consistently scanned the first few rows before falling back to filters, suggesting that filters should be presented as a fallback rather than a first step.
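
The between-group comparisons above are reported as Mann-Whitney tests, which suit ordinal ratings from two independent groups. The sketch below shows how the underlying U statistic is computed. The ratings are invented placeholders, not data from the paper, and the function name `mann_whitney_u` is an illustrative choice. No p-value is derived here; in practice one would likely use a statistics library such as SciPy's `scipy.stats.mannwhitneyu`.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x against sample y.

    Counts the (a, b) pairs with a from x and b from y in which a is larger;
    a tied pair contributes one half.
    """
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Invented 5-point satisfaction ratings, for illustration only.
hybrid_ratings = [5, 4, 5, 4, 5]
video_ratings = [3, 4, 2, 3, 4]

u_hybrid = mann_whitney_u(hybrid_ratings, video_ratings)
u_video = mann_whitney_u(video_ratings, hybrid_ratings)

# The two U values always sum to the number of cross-group pairs.
n_pairs = len(hybrid_ratings) * len(video_ratings)

# Share of cross-group pairs in which the hybrid rating is higher (ties count
# half); 0.5 would mean no tendency in either direction.
effect = u_hybrid / n_pairs
print(u_hybrid, u_video, effect)
```

Because the statistic depends only on which rating in each pair is larger, it does not assume that the gap between a 4 and a 5 equals the gap between a 2 and a 3, which is why it is a common choice for Likert-style responses.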

Relevance

For accessibility practitioners, this paper is a rare and instructive example of designing around imperfect AI rather than waiting for it to improve. The core lesson is that human-AI collaboration design can deliver real user-experience gains at the same AI accuracy level. That lesson generalises far beyond ASL: the same hybrid pattern would apply to sign dictionaries for BSL, Auslan, ISL and other signed languages; to searching human-movement datasets (dance moves, martial arts); and to lookup of orthographically deep spoken languages via ASR or handwriting recognition. The findings also argue for sense of control as a first-class UX metric for AI-assisted accessibility tools. Limitations include a sample restricted to university-level introductory ASL students (not Deaf users, experienced signers, or younger learners), a single query form factor (short videos recorded on a desktop webcam, not a mobile phone), Wizard-of-Oz rather than real recognition, and subjective self-reports rather than eye-tracking or longitudinal learning outcomes.

Tags: american sign language · sign language · sign language recognition · search interfaces · dictionary · deaf and hard of hearing · human-AI collaboration · user experience · qualitative research