A Parametric Approach to Sign Language Synthesis

Amanda Irving, Richard Foulds · 2005 · Proceedings of the 7th International ACM SIGACCESS Conference on Computers and Accessibility (Assets '05) · doi:10.1145/1090785.1090835

Summary

This paper describes a parametric approach to synthesizing American Sign Language (ASL) using a commercially available human avatar (UGS Jack) driven by kinematic parameters. The system addresses a basic communication barrier: signed and spoken languages are not mutually intelligible, and human interpreting services are often in short supply. Concatenated video clips and motion capture data produce lifelike results but require enormous resources to build inventories of 5,000 to 10,000 signs. The authors instead encode each sign as a compact parameter set (approximately 600 bytes) based on William Stokoe's foundational categorization of signs by handshape, location, and movement. The system includes a graphical sign editor that lets users enter sign descriptions from printed dictionaries or personal experience, specifying dominant and non-dominant hand locations, handshapes, movements, orientations, and end-effector positions. The avatar's 20 phalangeal degrees of freedom define handshapes, while Jack's inverse kinematics engine handles the primary arm movements between the specified target locations and orientations. The system accepts text input in real time from a keyboard, speech recognizer, or text file; each word is matched against the sign dictionary and fingerspelled if no entry is found.
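The dictionary lookup with fingerspelling fallback described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `SignParams` fields, the `FS-` gloss prefix, the punctuation handling, and the sample entry are all hypothetical, and the real system stores far richer kinematic parameters per sign.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SignParams:
    """Hypothetical stand-in for one parametric sign entry."""
    gloss: str
    handshape: str
    location: str
    movement: str


def fingerspell(word: str) -> List[str]:
    """Return one letter/digit token per character (assumed 'FS-' naming)."""
    return [f"FS-{ch.upper()}" for ch in word if ch.isalnum()]


def text_to_sign_sequence(text: str, dictionary: Dict[str, SignParams]) -> List[str]:
    """Map each input word to a sign gloss, or fingerspell it if not found."""
    sequence: List[str] = []
    for raw in text.split():
        # Assumed normalization: drop surrounding punctuation, ignore case.
        word = raw.strip(".,;:!?\"'").lower()
        if not word:
            continue
        sign = dictionary.get(word)
        if sign is not None:
            sequence.append(sign.gloss)
        else:
            sequence.extend(fingerspell(word))
    return sequence


# Illustrative one-entry dictionary; the parameter values are placeholders.
demo_dictionary = {"hello": SignParams("HELLO", "B", "forehead", "outward")}
print(text_to_sign_sequence("Hello Bob!", demo_dictionary))
# prints ['HELLO', 'FS-B', 'FS-O', 'FS-B']
```

Keeping the fallback at the word level means an unknown word never blocks the rest of the sentence, which matters when input arrives in real time.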

Key findings

The system produces signs that the authors describe as intelligible and reasonably human-like, and each new sign takes approximately 5 minutes to create and test using the sign editor. This efficiency is a significant advantage over motion capture approaches, which require access to experienced human signers and substantial storage for each sign. At the time of writing, the team was building an initial inventory of the 5,000 signs defined in Random House Webster's Concise American Sign Language Dictionary, plus the 26 fingerspelling letters and the digits 0-9. Coarticulation between signs (the natural blending of movement from one sign to the next) is handled by using the final end-effector position and orientation of the previous sign as the starting point of the following sign. The authors note that a formal intelligibility study comparing their synthetic signs with motion-captured animations and videos of human signers was planned, in order to determine whether more sophisticated coarticulation would be necessary. The compact parametric representation (approximately 600 bytes per sign, versus megabytes for motion capture data) keeps storage requirements small even for large sign inventories.
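The coarticulation rule described above, in which each sign starts where the previous one ended, can be sketched as a simple chaining step. The `Pose`, `Sign`, and `Segment` types, the three-number orientation, and the `rest_pose` argument are assumptions made for illustration; in the actual system the motion between poses is produced by Jack's inverse kinematics engine, which is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Pose:
    """End-effector position and orientation (representation is assumed)."""
    position: Vec3
    orientation: Vec3


@dataclass(frozen=True)
class Sign:
    gloss: str
    targets: Tuple[Pose, ...]  # ordered target poses; must be non-empty


@dataclass(frozen=True)
class Segment:
    """A sign together with the pose the hand starts from."""
    gloss: str
    start: Pose
    targets: Tuple[Pose, ...]


def chain_signs(signs: Sequence[Sign], rest_pose: Pose) -> List[Segment]:
    """Give each sign the previous sign's final pose as its starting pose."""
    segments: List[Segment] = []
    current = rest_pose  # assumed neutral pose before the first sign
    for sign in signs:
        if not sign.targets:
            raise ValueError(f"sign {sign.gloss!r} has no target poses")
        segments.append(Segment(sign.gloss, current, sign.targets))
        current = sign.targets[-1]  # inherited by the next sign
    return segments
```

Under this scheme no per-pair transition data is stored: the transition between any two signs is simply the movement from the inherited pose to the next sign's first target.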

Relevance

This work represents an important step in the ongoing effort to make signed language content available through technology, supplementing the limited availability of human interpreters. The parametric approach offers a practical trade-off: motion capture produces more lifelike signing, but the ability to create a new sign in approximately 5 minutes from a dictionary description makes it feasible to build comprehensive sign inventories without access to a native signer for each recording session. For accessibility practitioners, this research highlights both the promise and the challenges of avatar-based sign language systems. The compact parametric encoding means sign databases can be distributed easily, and the text-to-sign pipeline (keyboard or speech input, dictionary lookup, fingerspelling fallback) provides a model for real-time translation tools. However, the intelligibility of synthesized signs relative to human signing remains an open question, one that must be resolved through studies with Deaf users before such systems can be deployed as reliable communication aids.
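As a rough check on the claim that the sign database is easy to distribute, the planned inventory size can be estimated from the figures in the summary. This is back-of-envelope arithmetic only: it assumes that fingerspelling letters and digits take about the same space as a sign, and that the 600-byte figure, which the paper gives as approximate, holds on average.

```python
# Figures from the summary; the per-entry size is approximate.
BYTES_PER_ENTRY = 600
DICTIONARY_SIGNS = 5_000
FINGERSPELLING_LETTERS = 26
DIGITS = 10


def inventory_bytes(entries: int, bytes_per_entry: int = BYTES_PER_ENTRY) -> int:
    """Estimated storage for a parametric inventory of the given size."""
    return entries * bytes_per_entry


total_entries = DICTIONARY_SIGNS + FINGERSPELLING_LETTERS + DIGITS
total_bytes = inventory_bytes(total_entries)
print(f"{total_entries} entries, about {total_bytes / 1_000_000:.1f} MB")
# prints "5036 entries, about 3.0 MB"
```

On these assumptions the whole planned inventory occupies roughly 3 MB, on the order of what the summary attributes to motion capture data for a single sign.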

Tags: sign language · sign language synthesis · signing avatar · American Sign Language · animation · deaf accessibility · fingerspelling · motion capture