Modeling and Synthesizing Spatially Inflected Verbs for American Sign Language Animations
Matt Huenerfauth, Pengfei Lu · 2010 · Proceedings of the 12th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2010) · doi:10.1145/1878803.1878823
Summary
This paper presents a novel computational method for automatically synthesising animations of American Sign Language (ASL) verbs that undergo spatial inflection, a grammatical process in which a verb's motion path changes according to the 3D locations in space that have been assigned to its subject and/or object during the discourse. In ASL, signers associate the entities under discussion with specific locations around their body; many verbs then modify their hand path and orientation to move from the subject's location toward the object's location, conveying who is doing what to whom. Because these locations vary continuously rather than being drawn from a finite set, it is impossible to pre-record every inflected form in a dictionary. The researchers addressed this by having a native signer produce multiple examples of five ASL verbs (ASK, GIVE, MEET, SCOLD, TELL) in VCom3D Gesture Builder software, with subjects and objects placed at seven discrete positions on an arc around the signer. They then fit third-order polynomial models to the hand location (x, y, z) and orientation (quaternion) data for each keyframe, parameterised on the arc positions of the subject and object. These models can synthesise a new inflected instance of the verb for arbitrary subject/object positions, producing infinitely many versions from finite training data.
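The fitting step can be illustrated with a minimal, dependency-free sketch. It assumes that subject and object arc positions are encoded as scalars scaled to [-1, 1], that every coordinate of every keyframe (x, y, z and the four quaternion components) gets its own independent polynomial including subject/object cross terms, and that fitted quaternion components are renormalised afterwards. All of these choices, the function names (`monomials`, `fit`, `predict`, `normalize_quaternion`) and the made-up target function `true_x` are hypothetical illustrations, not the authors' code or exact parameterisation.

```python
# Hypothetical sketch (not the authors' code): fit one keyframe coordinate
# as a third-order polynomial in the subject (s) and object (o) arc
# positions by least squares, then synthesise it at an unseen (s, o).
from itertools import product

DEGREE = 3  # third-order polynomial, as in the summary


def monomials(s, o, degree=DEGREE):
    """Terms s**i * o**j with i + j <= degree (10 terms for degree 3)."""
    return [s ** i * o ** j
            for i in range(degree + 1)
            for j in range(degree + 1 - i)]


def solve(a, b):
    """Solve the square system a x = b (Gauss-Jordan, partial pivoting)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(samples):
    """Least squares via the normal equations (A^T A) c = A^T y.

    samples: list of ((s, o), value) pairs for one coordinate of one keyframe.
    """
    rows = [monomials(s, o) for (s, o), _ in samples]
    ys = [v for _, v in samples]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(ata, aty)


def predict(coeffs, s, o):
    """Evaluate the fitted polynomial at a (possibly unseen) position."""
    return sum(c * t for c, t in zip(coeffs, monomials(s, o)))


def normalize_quaternion(q):
    """Independently fitted (w, x, y, z) values need not have unit norm."""
    norm = sum(c * c for c in q) ** 0.5
    return tuple(c / norm for c in q)


# Hypothetical training data: seven arc positions scaled to [-1, 1] and a
# made-up "true" coordinate function standing in for the animator's keyframes.
positions = [-1 + i / 3 for i in range(7)]


def true_x(s, o):
    return 0.2 + 0.5 * s - 0.3 * o + 0.1 * s * o * o


# Subject and object never share a location in this made-up data set.
samples = [((s, o), true_x(s, o))
           for s, o in product(positions, repeat=2) if s != o]
coeffs = fit(samples)
print(round(predict(coeffs, 0.5, -0.25), 6))  # a position not in the training grid
```

Solving the normal equations by hand keeps the sketch free of third-party packages; with NumPy, `numpy.linalg.lstsq` on the same design matrix would be the more numerically robust choice.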
Key findings
The evaluation had two components. First, model-generated verb instances were compared with three separate performances by the human animator. There were no significant differences in hand location or orientation, and the model's deviation from each human sample was comparable to the variation among the animator's own repeated performances. Second, a study with 18 native ASL signers compared animations containing (1) model-synthesised inflected verbs, (2) inflected verbs created by the human animator, and (3) uninflected dictionary verbs. In a side-by-side quality comparison, model-produced verbs received Likert-scale scores similar to those of human-produced inflected verbs. Critically, comprehension-question scores for both model-produced and human-produced inflected verbs were significantly higher than for uninflected verbs, confirming prior research showing that spatial inflection roughly doubles comprehension scores.
The methodology is generalisable to other ASL verbs and to other sign languages that use spatial inflection. The approach relies on relatively simple mathematical techniques (third-order polynomial least-squares fitting) that other researchers can replicate, and it can partially automate the labour-intensive process of creating custom signs in ASL scripting software.
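The logic of the first check (is the model about as close to each human performance as those performances are to one another?) can be sketched as follows. This assumes each performance is a list of per-keyframe (x, y, z) hand positions; the positions, the function names `mean_keyframe_distance` and `compare`, and the use of mean Euclidean distance are hypothetical. The sketch covers position only (orientation would need a quaternion distance) and computes just the two averages, whereas the study itself reported a statistical test for significance.

```python
# Hypothetical sketch of the comparison logic only: is the model about as
# close to each human performance as the performances are to each other?
from itertools import combinations
from math import dist
from statistics import mean


def mean_keyframe_distance(a, b):
    """Mean Euclidean distance between corresponding keyframe hand positions."""
    return mean(dist(p, q) for p, q in zip(a, b))


def compare(model, humans):
    """Return (mean model-to-human distance, mean human-to-human distance)."""
    model_to_human = [mean_keyframe_distance(model, h) for h in humans]
    human_to_human = [mean_keyframe_distance(h1, h2)
                      for h1, h2 in combinations(humans, 2)]
    return mean(model_to_human), mean(human_to_human)


# Made-up (x, y, z) hand positions for a two-keyframe verb.
humans = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0)],
    [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)],
]
model = [(0.0, 0.05, 0.05), (1.0, 0.05, 0.05)]

model_gap, human_gap = compare(model, humans)
print(f"model-to-human {model_gap:.4f}, human-to-human {human_gap:.4f}")
```

In this toy data the model lies between the three human performances, so its average distance to them is smaller than their average distance to each other; a real analysis would repeat this over many verb instances before testing for significance.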
Relevance
This paper addresses a fundamental challenge in making digital content accessible to deaf signers: the majority of deaf high school graduates in the U.S. read English at or below a fourth-grade level (age 10), which makes ASL animation a critical accessibility technology. However, the quality of ASL animation systems has been limited by their inability to produce spatially inflected verbs, a pervasive grammatical feature without which ASL sentences appear stilted and are harder to understand. The roughly twofold gain in comprehension scores for inflected over uninflected verbs underscores the real-world impact of this work. For accessibility practitioners, the key takeaway is that sign language accessibility requires far more than stringing together dictionary signs: the spatial grammar of sign languages is essential to comprehension and must be modelled computationally. Because the approach is data-driven, it can also accommodate regional dialect variation, an important consideration given the diversity within the deaf community. The methodology represents a significant step toward making ASL animation systems practical for content delivery, education, and communication access.
Tags: American Sign Language · sign language animation · deaf accessibility · natural language generation · avatar · spatial inflection · machine learning · computer animation