Representing Coordination and Non-Coordination in an American Sign Language Animation

Matt Huenerfauth · 2005 · Proceedings of the 7th International ACM SIGACCESS Conference on Computers and Accessibility (Assets '05) · doi:10.1145/1090785.1090796

Summary

This paper introduces the Partition/Constitute (P/C) Formalism, a new computational representation for encoding the multichannel structure of American Sign Language (ASL) performances in support of English-to-ASL machine translation and animation. ASL is a full natural language distinct from English and is used by approximately 500,000 deaf people in the United States. Many of them have limited English literacy: the majority of deaf U.S. high school graduates read at only a fourth-grade level. ASL performances are inherently multichannel: a signer produces manual signs with both hands while simultaneously conveying meaning through non-manual signals (NMS), including facial expressions, eye gaze, head movements, and body posture. These channels may be coordinated (temporally aligned and semantically linked) or non-coordinated (operating independently). Traditional NLP representations, namely linear strings and syntax trees, fail to capture these multichannel relationships. The paper shows that previous approaches (feature propagation, naive 3D trees, bracketing diagrams) each have significant limitations, particularly that they over-specify coordination relationships or cannot represent non-coordination between channels.

Key findings

The P/C Formalism represents an ASL signal as a tree in which each node either partitions or constitutes. A partitioning node splits a signal across time or channels in a non-overlapping way and delegates responsibility for each part to its children; a constituting node specifies the temporal sequence of sub-phenomena that together make up that node. A partitioning node can optionally specify whether its children on different channels are coordinated or non-coordinated. The formalism can therefore encode precisely when two channels must be temporally aligned (coordination) and when their timing is independent (non-coordination), without over-specifying relationships that do not matter. The paper demonstrates P/C's expressiveness through detailed examples, including a sentence with negative-headshake and eye-gaze channels, and classifier predicates, an important ASL phenomenon in which signers use handshapes to describe the position, movement, shape, or contour of objects in space. The P/C representation of classifier predicates encodes complex multichannel relationships (dominant-hand location, non-dominant-hand location, eye gaze, handshapes) that previous formalisms could not adequately represent. Compared with SYNC trees (used in speech/gesture generation) and BEAT (a gesture animation toolkit), P/C better encodes temporal relationships and supports the hierarchical structure that ASL's multichannel nature requires.
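
The tree described above can be made concrete with a small sketch. The Python below is an illustration only, not the paper's notation or implementation: the class names (Leaf, Constitute, Partition), the helper functions, the channel names, the sign glosses, and the three-valued coordinated flag are all hypothetical. As a simplifying assumption, it models only the case where a partitioning node splits the signal across channels, and it treats a constituting node as a plain left-to-right sequence.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Union


@dataclass
class Leaf:
    """A primitive unit on one channel (hypothetical: one sign, one gaze target)."""
    channel: str
    label: str


@dataclass
class Constitute:
    """Children follow one another in time and together make up this node."""
    children: List["Node"]


@dataclass
class Partition:
    """Children each take responsibility for a disjoint set of channels."""
    children: List["Node"]
    # True = coordinated, False = non-coordinated, None = left unspecified.
    coordinated: Optional[bool] = None


Node = Union[Leaf, Constitute, Partition]


def channels(node: Node) -> Set[str]:
    """Return every channel that the subtree rooted at node writes to."""
    if isinstance(node, Leaf):
        return {node.channel}
    used: Set[str] = set()
    for child in node.children:
        used |= channels(child)
    return used


def is_well_formed(node: Node) -> bool:
    """Check that no two children of any Partition claim the same channel."""
    if isinstance(node, Leaf):
        return True
    if isinstance(node, Partition):
        claimed: Set[str] = set()
        for child in node.children:
            owned = channels(child)
            if claimed & owned:
                return False  # two siblings both try to drive one channel
            claimed |= owned
    return all(is_well_formed(child) for child in node.children)


# Hypothetical negated sentence: the hands sign a sequence of glosses while
# the head and eye-gaze channels each carry one non-manual signal.
sentence = Partition(
    children=[
        Constitute([
            Leaf("hands", "JOHN"),
            Leaf("hands", "NOT"),
            Leaf("hands", "ARRIVE"),
        ]),
        Leaf("head", "negative-headshake"),
        Leaf("eye_gaze", "gaze-at-audience"),
    ],
    coordinated=True,
)

print(sorted(channels(sentence)))
print(is_well_formed(sentence))
```

Leaving coordinated as None mirrors the point that a relationship between channels need not be specified when it does not matter; a real system would also need timing information, which this sketch omits.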

Relevance

This work addresses a fundamental accessibility challenge: making information available in ASL for deaf individuals who have limited English literacy. English-to-ASL machine translation could make TV closed captions, teletype phones, and computer interfaces accessible in a user's primary language. The significance of the P/C Formalism extends beyond ASL to any sign language, since all sign languages share the multichannel property of simultaneous manual and non-manual signals. For accessibility practitioners, this research highlights that sign language is not simply a gestural encoding of English: it has its own grammar, spatial structure, and simultaneous information channels, which require fundamentally different computational representations. The work also shows why generating natural-looking sign language animation is far more complex than text-to-speech synthesis: it requires coordinating multiple body channels with precise temporal relationships rather than producing a single sequential output stream.

Tags: American Sign Language · sign language animation · machine translation · natural language processing · deaf accessibility · signing avatar · multimodal generation · gesture generation