Exploring Collection of Sign Language Datasets: Privacy, Participation, and Model Performance

Danielle Bragg, Oscar Koller, Naomi Caselli, William Thies · 2020 · Proceedings of the 22nd International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2020) · doi:10.1145/3373625.3417024

Summary

This paper tackles a fundamental tension in building machine learning systems for marginalized communities: the need for large training datasets versus the privacy risks of collecting data from small, identifiable populations. The authors focus on sign language video collection, where advancing recognition and translation technology requires extensive video corpora of people signing. Video inherently reveals personal identity, however, and the relatively small size of the Deaf community (approximately 70 million people worldwide) makes individual identification more likely. The Deaf community's history of facing audism (discrimination based on audiological status) raises the privacy stakes further.

The authors propose Privacy-Enhancing Data Filters: visual modifications applied to training videos that obscure contributor identity while preserving the linguistic information needed for machine learning. They explore this concept through two complementary studies.

The first is a web-based user study with 61 American Sign Language users (23 deaf, 11 hard of hearing, 25 hearing, 2 other). Participants tried two filters in real time, frame cel shading (a flattened greyscale rendering) and a tiger face avatar, and were asked about their privacy concerns, their willingness to contribute filtered versus unfiltered videos to different data owners, and the types of filters they would prefer.

The second study is a computer vision experiment on the RWTH-PHOENIX-Weather 2014 dataset of German Sign Language. It tests continuous sign language recognition performance with filtered versus unfiltered training data at various dataset sizes.
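To make the "flattened greyscale rendering" idea concrete, here is a minimal sketch of luminance quantization (posterization), which is one ingredient of cel shading. It is a crude stand-in and not the filter implementation used in the paper. The function names, the `levels` parameter, and the tiny list-of-tuples frame format are all assumptions made for illustration.

```python
def to_grey(pixel):
    """Convert an (R, G, B) pixel with 0-255 channels to one grey value."""
    r, g, b = pixel
    # Standard luma weights (ITU-R BT.601).
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def posterize_frame(frame, levels=4):
    """Flatten a frame into `levels` grey bands.

    `frame` is a list of rows, each row a list of (R, G, B) tuples.
    Returns a list of rows of grey values in the range 0-255.
    Hypothetical helper, not taken from the paper.
    """
    if levels < 2:
        raise ValueError("levels must be at least 2")
    step = 256 / levels  # width of each intensity band
    flattened = []
    for row in frame:
        out_row = []
        for pixel in row:
            grey = to_grey(pixel)
            band = min(int(grey // step), levels - 1)
            # Map band index back to an evenly spaced grey value.
            out_row.append(round(band * 255 / (levels - 1)))
        flattened.append(out_row)
    return flattened


# A 2x2 toy frame: black, white, dark grey, light grey.
toy_frame = [
    [(0, 0, 0), (255, 255, 255)],
    [(100, 100, 100), (200, 200, 200)],
]
flat = posterize_frame(toy_frame, levels=4)
```

Collapsing fine shading into a few bands removes skin texture and other identifying detail while keeping coarse shapes such as hand and arm outlines. A real filter would also need to keep facial expressions legible, which this sketch makes no attempt to guarantee.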

Key findings

The user study revealed that privacy concerns are pervasive in the sign language community: 93% of participants reported some concerns about contributing videos. The most common concern was video misuse (61% overall, 68% among deaf and hard of hearing (DHH) participants), followed by being recognized (39%) and revealing surroundings (36%). Willingness to contribute varied dramatically by data recipient: 90% would contribute unfiltered video to a company and 89% to a university, but only 36% to the public. Filters shifted these numbers: frame cel shading maintained similar contribution rates to private entities while boosting public willingness to 56%. DHH participants most wanted cartoon character replacements, while hearing participants preferred face blurring. Critically, participants emphasized that any avatar filter must preserve facial expressions, which are grammatically meaningful in sign languages.

The computer vision experiment demonstrated that while filters degrade recognition accuracy at equal dataset sizes, models trained on larger filtered datasets can outperform models trained on smaller unfiltered ones. At the largest training set size (50k glossed signs), face cel shading achieved a word error rate of 44.1%, within 0.3 percentage points of the unfiltered baseline (43.8%), while frame cel shading was about 4 points worse (48.0%). This suggests that if privacy filters attract enough additional contributors, the net effect on model performance could be positive.
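For readers unfamiliar with the metric, word error rate (WER) is conventionally the token-level edit distance between the recognized sequence and the reference, divided by the reference length. The sketch below is a plain Levenshtein implementation under that standard definition. It is not the paper's evaluation script, and the gloss strings in the usage lines are made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length.

    Both arguments are whitespace-separated token strings, e.g. sign glosses.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical gloss sequences: one substitution and one deletion
# against a four-token reference.
wer = word_error_rate("MORGEN REGEN NORD WIND", "MORGEN SONNE NORD")
print(f"WER: {wer:.1%}")  # prints "WER: 50.0%"
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many insertions, so it is read as an error rate where lower is better, not as an accuracy.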

Relevance

This research has broad implications for any accessibility technology project that requires collecting sensitive data from disability communities. It establishes a framework for thinking about the privacy-participation-performance tradeoff that applies far beyond sign language: to voice data from people with speech disabilities, eye tracking data from people with motor impairments, or medical imagery for diagnostic AI. For practitioners building sign language recognition systems, the paper provides concrete evidence that privacy-preserving approaches are not only ethically sound but potentially beneficial for model performance, because they can increase the amount of data collected. The finding that any filter must preserve facial expressions stands out, as it reflects a linguistic reality of sign languages that technologists might otherwise overlook. The paper also raises questions about data ownership and community control, noting that monetization of Deaf community data by outside organizations is a significant concern.

Tags: sign language · privacy · machine learning · data collection · Deaf culture · computer vision · sign language recognition