
The Sem-Lex Benchmark: Modeling ASL Signs and their Phonemes

Lee Kezar, Jesse Thomason, Naomi Caselli, Zed Sehyr, Elana Pontecorvo · 2023 · ASSETS '23: Proceedings of the 25th International ACM SIGACCESS Conference on Computers and Accessibility · doi:10.1145/3597638.3608408

Summary

This paper introduces the Sem-Lex Benchmark, the largest curated dataset of its kind for isolated sign recognition in American Sign Language (ASL), containing over 84,000 videos of isolated sign productions from 41 deaf ASL signers. The dataset addresses two barriers in sign language technology research: the lack of large-scale, ethically sourced data, and the failure to incorporate linguistic knowledge about sign language structure into recognition models. Unlike some existing datasets, which were scraped from the internet without consent, Sem-Lex draws only on deaf contributors who gave informed consent to share their video data in a public repository and were compensated for their participation. The videos were collected through an interface called SignLab using a free semantic associations paradigm: participants were shown a cue sign and asked to produce the first three meaning-related signs that came to mind. This approach elicited signs that the signers naturally knew and used. Human experts then labeled the videos using a novel system that matches each production against reference signs from existing lexical databases (ASL-LEX and SignBank) rather than relying on English glosses, which are problematic because the mapping between English words and ASL signs is not one-to-one. The dataset covers 3,149 unique signs, and approximately 78% of videos align with ASL-LEX entries that carry detailed phonological feature annotations across 16 categories, including handshape, location, movement, and finger configuration.

Key findings

The researchers demonstrate that incorporating phonological features (the linguistic building blocks of signs, analogous to phonemes in spoken language) improves sign recognition accuracy. An SL-GCN (Sign Language Graph Convolution Network) model trained on the Sem-Lex Benchmark achieved 67.7% top-1 accuracy for isolated sign recognition (ISR), well above the 26.4% the same model reaches on the WLASL dataset. When the model was trained to recognize the gloss and all 16 phonological feature types simultaneously, ISR accuracy rose to 71.3%, a 3.6-point gain that supports phonology as a useful auxiliary task. The phonological features themselves were recognized at 85% average accuracy, with wrist twist (92.6%), thumb contact (91.7%), and thumb position (91.5%) the most accurate. The phonological approach proved especially valuable for few-shot learning: with only 4 training examples per sign, the model achieved 62.2% accuracy on Sem-Lex versus 18.4% on WLASL, and adding phonological features raised this to 68.2%. This matters because 45% of signs in the benchmark have fewer than 10 training instances, reflecting the realistic long-tailed distribution of sign frequency. The test set was intentionally composed of signers from underrepresented demographics to evaluate fairness. It revealed a slight performance drop (68.2% validation vs. 66.6% test), which suggests some reliance on signer-specific rather than sign-specific features, though using pose estimation rather than raw video mitigates this bias.
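As a rough illustration of what "phonology as an auxiliary task" means in training terms, the sketch below combines a gloss classification loss with an average over per-feature classification losses. It is not the authors' implementation: the review does not state how the paper weights or aggregates the losses, so the equal-weight average, the `phon_weight` parameter, and the function names are all assumptions made for illustration.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one example: -log softmax(logits)[target]."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target]


def multitask_loss(gloss_logits, gloss_target, phon_logits, phon_targets,
                   phon_weight=1.0):
    """Gloss loss plus a weighted mean of phonological feature losses.

    phon_logits and phon_targets are dicts keyed by feature name
    (e.g. "handshape"). The equal-weight mean and phon_weight are
    assumptions, not details taken from the paper.
    """
    gloss_loss = cross_entropy(gloss_logits, gloss_target)
    feature_losses = [
        cross_entropy(phon_logits[name], phon_targets[name])
        for name in phon_targets
    ]
    phon_loss = sum(feature_losses) / len(feature_losses) if feature_losses else 0.0
    return gloss_loss + phon_weight * phon_loss


# Hypothetical example: 3 candidate glosses, 2 of the 16 feature types.
loss = multitask_loss(
    gloss_logits=[2.0, 0.5, -1.0],
    gloss_target=0,
    phon_logits={"handshape": [0.1, 1.2, 0.3], "location": [0.9, 0.2]},
    phon_targets={"handshape": 1, "location": 0},
)
```

Setting `phon_weight=0.0` reduces this to the gloss-only objective, which is one way to picture the difference between the 67.7% baseline and the 71.3% multi-task result reported above.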

Relevance

This work is significant for deaf accessibility because sign language recognition technology could dramatically increase access for deaf signing communities, enabling applications from sign language search and translation to educational tools. The paper also models responsible research practices that the field badly needs. Many existing ASL datasets were scraped without consent, include signers of unknown hearing status who may not be native signers, and lack the demographic information needed to evaluate fairness. The Sem-Lex Benchmark addresses these ethical concerns by ensuring all contributors are deaf, compensated, and consenting. The authors also convened deaf and signing scholars to develop data-sharing guidelines, which ask users to commit to "do no harm," to work closely with deaf communities, and to recognize that even high-performing models do not replace the need for sign language interpreters and teachers. For accessibility practitioners, the phonological approach offers a linguistically grounded path toward more accurate and generalizable sign recognition, treating ASL as the structured language it is rather than reducing it to a computer vision problem.

Tags: sign language recognition · American Sign Language · machine learning · phonology · dataset · deaf accessibility · computer vision · benchmark