Data-Driven Synthesis of Spatially Inflected Verbs for American Sign Language Animation
Pengfei Lu, Matt Huenerfauth · 2011 · ACM Transactions on Accessible Computing · doi:10.1145/2039339.2039343
Summary
This paper addresses a critical accessibility challenge: most deaf individuals in the US have limited English literacy (the majority of deaf high school graduates read at a fourth-grade level), which makes written web content difficult for them to access. Sign language animation offers a potential solution, but existing systems struggle with a fundamental linguistic feature of ASL: spatial verb inflection, in which a verb's movement path changes with the 3D locations assigned to its subject and object in the signing space. The authors developed a data-driven approach to synthesizing spatially inflected ASL verbs automatically. They collected motion data from native ASL signers producing each verb across different subject/object spatial arrangements, then fit polynomial models that predict hand location (x, y, z coordinates) and orientation (quaternion values) from input parameters specifying where the subject and object are positioned on an arc in front of the signer. The models are 3rd-order polynomials trained on approximately 42 examples per verb, capturing how hand movement varies across spatial configurations. The research progressed through four phases: initial model development and validation; a user evaluation with native signers; an exploration of split modeling for complex verbs (separate models for clockwise and counterclockwise arrangements); and a test of model robustness with reduced training data. Eight inflecting ASL verbs were studied: three one-handed (ASK, TELL, SCOLD) and five two-handed (GIVE, MEET, COPY, EMAIL, SEND).
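The polynomial model described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each arc position is encoded as a single number, that a 3rd-order model uses every monomial subj^i * obj^j with i + j <= 3, and that the fit is ordinary least squares. The function names (poly_features, fit_verb_model, predict, normalize_quaternions) are hypothetical. The final renormalization of predicted quaternion values is also an assumption, added because independently predicted components will not in general have unit length.

```python
import numpy as np

def poly_features(subj, obj, order=3):
    """Monomials subj**i * obj**j with i + j <= order (10 columns for order 3).

    subj and obj are arc positions encoded as numbers (an assumed encoding).
    """
    subj = np.asarray(subj, dtype=float)
    obj = np.asarray(obj, dtype=float)
    cols = [subj**i * obj**j
            for i in range(order + 1)
            for j in range(order + 1 - i)]
    return np.stack(cols, axis=-1)

def fit_verb_model(subj, obj, targets, order=3):
    """Least-squares fit. targets has one column per predicted value,
    for example x, y, z of the hand at one keyframe."""
    X = poly_features(subj, obj, order)
    coeffs, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return coeffs

def predict(coeffs, subj, obj, order=3):
    """Evaluate the fitted polynomials at new subject/object positions."""
    return poly_features(subj, obj, order) @ coeffs

def normalize_quaternions(q):
    """Rescale predicted quaternion values to unit length (assumed post-step)."""
    q = np.asarray(q, dtype=float)
    return q / np.linalg.norm(q, axis=-1, keepdims=True)

# Illustrative input only: seven arc positions give 7 * 6 = 42 ordered
# subject/object pairs, chosen here to match the "approximately 42" examples.
positions = np.linspace(-0.9, 0.9, 7)
pairs = np.array([(s, o) for s in positions for o in positions if s != o])
subj, obj = pairs[:, 0], pairs[:, 1]

# Made-up hand locations standing in for recorded motion data.
hand_xyz = np.stack([0.3 * subj, 0.2 * obj, 0.1 * subj * obj], axis=-1)

coeffs = fit_verb_model(subj, obj, hand_xyz)   # shape (10, 3)
new_xyz = predict(coeffs, 0.25, -0.5)          # an arrangement not in the data
```

Under these assumptions, one fitted coefficient matrix yields a hand location for any subject/object pair on the arc, including pairs that were never recorded.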
Key findings
In the authors' evaluations, the polynomial modeling approach produced verb animations that were not distinguishable in quality from a human animator's output. In comprehension studies with 18 native ASL signers, model-generated verbs were statistically equivalent to human-animated versions, and both scored significantly higher than uninflected baseline verbs on comprehension questions (p < 0.05). Side-by-side comparisons showed no significant quality differences between model output and human animator work on 1-to-10 Likert scales for grammaticality, understandability, and naturalness. The split modeling approach (separate models for clockwise and counterclockwise spatial arrangements) showed no improvement over unified models, suggesting that a single polynomial fit handles these variations adequately. Critically, the models proved robust to reduced training data: quality remained stable with as few as 20 training examples (roughly half the original dataset) and degraded noticeably only below that threshold. This finding substantially reduces the data collection burden for future implementations. Natural variation between multiple performances by human animators provided a baseline, confirming that differences between model and human output fall within the range of that natural variation.
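The reduced-training-data test described above can be mimicked numerically with a short sketch. Everything here is a stand-in: the data are synthetic, the seven-position arc encoding is an assumption, and held-out RMSE is a substitute metric chosen for illustration, whereas the summary reports quality as judged in the authors' own evaluation. The helper name heldout_rmse is hypothetical.

```python
import numpy as np

def poly_features(subj, obj, order=3):
    """Monomials subj**i * obj**j with i + j <= order."""
    cols = [subj**i * obj**j
            for i in range(order + 1)
            for j in range(order + 1 - i)]
    return np.stack(cols, axis=-1)

def heldout_rmse(subj, obj, targets, n_train, rng, order=3):
    """Fit on n_train randomly chosen examples; return RMSE on the rest."""
    idx = rng.permutation(len(subj))
    train, test = idx[:n_train], idx[n_train:]
    coeffs, *_ = np.linalg.lstsq(
        poly_features(subj[train], obj[train], order),
        targets[train], rcond=None)
    pred = poly_features(subj[test], obj[test], order) @ coeffs
    return float(np.sqrt(np.mean((pred - targets[test]) ** 2)))

# Synthetic stand-in for recorded data: 42 ordered subject/object pairs
# from seven assumed arc positions, plus a little noise.
positions = np.linspace(-0.9, 0.9, 7)
pairs = np.array([(s, o) for s in positions for o in positions if s != o])
subj, obj = pairs[:, 0], pairs[:, 1]
rng = np.random.default_rng(0)
clean = np.stack([0.3 * subj - 0.2 * obj**2,
                  0.1 * subj * obj,
                  0.05 * obj**3], axis=-1)
targets = clean + rng.normal(scale=0.01, size=clean.shape)

# Shrink the training set and watch how held-out error changes.
for n_train in (36, 30, 20, 12):
    print(n_train, heldout_rmse(subj, obj, targets, n_train, rng))
```

A 3rd-order model in two inputs has 10 coefficients per output, so a training set close to that size leaves little margin for noise. This is a property of the sketch and is consistent with, but does not establish, the threshold reported above.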
Relevance
This research supports more accessible web content for deaf users by advancing sign language animation technology. Rather than requiring human animators to create every verb variation by hand, the polynomial modeling approach can generate a spatially appropriate verb form for any subject/object arrangement on the arc from limited training data, which makes sign language content production more scalable. For accessibility practitioners, the key insight is that proper ASL grammar requires spatial inflection; uninflected "citation form" verbs are significantly harder to understand, so a sign language animation system must support this feature to be truly accessible. The finding that 20 training examples per verb suffice makes the methodology practical for expanding to more verbs and potentially to other sign languages. The work also highlights the importance of involving native signers throughout development, both as data providers and as evaluators. The researchers note that signer proficiency with animation tools affects model quality, which underscores that linguistic expertise must guide technical implementation.
Tags: sign language · ASL · animation · deaf accessibility · natural language processing · avatar technology