Evaluating American Sign Language Generation Through the Participation of Native ASL Signers

Matt Huenerfauth, Liming Zhao, Erdan Gu, Jan Allbeck · 2007 · Proceedings of the 9th International ACM SIGACCESS Conference on Computers and Accessibility (Assets '07) · doi:10.1145/1296843.1296879

Summary

This paper addresses the challenge of generating animations of American Sign Language (ASL) sentences. It focuses on classifier predicates: complex spatial constructions that describe the location, movement, size, and shape of objects. These structures are frequent and important in ASL, but previous ASL generation systems had not attempted them. The authors therefore developed a prototype system that translates English input sentences into ASL animations containing classifier predicates. The system parses the English sentence and identifies spatial information about the objects it mentions. It then uses a 3D model of the arrangement of those entities to select appropriate handshapes and movement paths for the animated signer. The research starts from two observations. First, ASL is a full natural language distinct from English, used by approximately half a million people in the United States. Second, the majority of deaf 18-year-olds have an English reading level below that of a typical 10-year-old hearing student. Together these make English-to-ASL translation technology potentially transformative for information access. Beyond the technical implementation, the paper makes a significant methodological contribution by examining how evaluation studies for ASL generation systems should be designed. The authors argue that standard natural language generation evaluation methods, which compare system output to gold-standard text strings, are fundamentally unsuitable for ASL. The language has no standard written form, and users would never consume written ASL; they would watch animations instead. This necessitates user-based evaluation with native ASL signers, which introduces important cultural and linguistic considerations.
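The last stage of the pipeline summarized above maps an entity in the 3D scene model to a handshape and a movement path. It can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `Entity`, `ClassifierPredicate`, `HANDSHAPE_BY_CATEGORY`, and `plan_classifier_predicate` are hypothetical. The three category-to-handshape pairs are a tiny illustrative subset: the 3 handshape for vehicles, the 1 handshape for an upright person, and the B handshape for a flat object. The straight-line path interpolation is a simplifying assumption. English parsing and the animation itself are omitted.

```python
from dataclasses import dataclass

# A point in the (hypothetical) 3D scene model / signing space.
Point = tuple[float, float, float]

# Hypothetical, illustrative mapping from entity category to classifier
# handshape; NOT the lexicon used by the paper's system.
HANDSHAPE_BY_CATEGORY = {
    "vehicle": "3",
    "upright_person": "1",
    "flat_object": "B",
}


@dataclass
class Entity:
    name: str
    category: str
    start: Point  # where the entity begins in the scene model
    end: Point    # where the entity ends up


@dataclass
class ClassifierPredicate:
    entity: str
    handshape: str
    path: list[Point]


def plan_classifier_predicate(entity: Entity, steps: int = 3) -> ClassifierPredicate:
    """Pick a handshape from the entity's category and sample a straight-line
    movement path (an assumption for brevity) from start to end."""
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    # Raises KeyError for categories outside the illustrative mapping.
    handshape = HANDSHAPE_BY_CATEGORY[entity.category]
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # fraction of the way from start to end
        point = tuple(a + t * (b - a) for a, b in zip(entity.start, entity.end))
        path.append(point)
    return ClassifierPredicate(entity.name, handshape, path)


# Usage: a car moving left to right across the signing space.
car = Entity("car", "vehicle", start=(-0.3, 0.0, 0.2), end=(0.3, 0.0, 0.2))
print(plan_classifier_predicate(car))
```

A fuller planner would likely also have to handle hand orientation, curved or non-linear paths, and the relative placement of several entities at once. The sketch leaves these out to stay short.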

Key findings

The evaluation study with 15 native ASL signers found that the prototype system's classifier predicate animations were rated significantly higher than Signed English animations (a lower baseline). This held for grammaticality, understandability, and naturalness, each scored on a 1-10 scale. Notably, however, understandability scores and actual comprehension success (measured via matching tasks) were only moderately correlated. This suggests that perceived understandability does not reliably predict actual understanding. Evaluation studies should therefore include comprehension verification tasks, not just subjective ratings. The paper also identified specific areas for improvement: subjects wanted more realistic hand movements, better facial expressions and eye-gaze behaviour, and more natural body proportions for the animated character. On the cultural side, the authors found several practices essential:

- conducting evaluations in ASL-immersive environments;
- having a native ASL signer present as a researcher;
- providing instructions in ASL rather than English;
- engaging participants in ASL conversation before the study to establish comfort.

Overlooking any of these factors could significantly affect the validity of the results.
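To make the rating-versus-comprehension check concrete, the following sketch computes a correlation between per-participant understandability ratings and matching-task success. It assumes one rating and one success fraction per participant. It uses Pearson's r as a simple choice; the paper's exact statistical procedure is not reproduced here, and a rank correlation such as Spearman's may suit ordinal 1-10 ratings better. The function name `pearson_r` is hypothetical, and the toy values are invented for illustration. They are not data from the study.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the sums of squared deviations.
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Invented toy values for illustration only; NOT results from the paper.
ratings = [8, 9, 7, 9, 6, 8]               # understandability, 1-10 scale
matched = [1.0, 0.5, 1.0, 0.5, 0.5, 1.0]   # fraction of matching items correct
print(f"r = {pearson_r(ratings, matched):.2f}")
```

On real study data, a low or only moderate r would indicate that subjective ratings alone are an unreliable proxy for comprehension. The paper reports this pattern for its understandability scores.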

Relevance

This paper is foundational for anyone working on sign language technology or evaluating accessibility tools with Deaf users. Its methodological insights about user study design remain highly relevant. The cultural and linguistic protocols it describes should be standard practice for any research involving the Deaf community: ASL-immersive environments, Deaf researchers, and appropriate recruitment through community connections. The work also highlights a critical gap in accessibility technology. Despite decades of research, automated English-to-ASL translation remains an unsolved challenge, and the classifier predicate structures addressed here are essential for natural ASL communication. For practitioners, the finding that subjective ratings do not reliably predict actual comprehension is a crucial warning. User satisfaction surveys alone are not enough to evaluate communication accessibility tools.

Tags: sign language generation · American Sign Language · classifier predicates · animation evaluation · natural language generation · Deaf community · accessibility technology · user-based evaluation