Towards Fully Automated Motion Capture of Signs -- Development and Evaluation of a Key Word Signing Avatar
Simon Alexanderson, Jonas Beskow · 2015 · ACM Transactions on Accessible Computing (TACCESS) · doi:10.1145/2764918
Summary
This research develops a cost-effective method for capturing sign language motion to animate a signing avatar for the Tivoli project, a game-based learning environment that teaches Key Word Signing (KWS) to children with communication disabilities, including developmental disorders, language disorders, and autism. Unlike native sign language users, KWS users sign to complement spoken language, so clear isolated signs are particularly important for learning. The technical challenge is that sign language requires high-fidelity motion of the body, face, hands, and fingers to be captured simultaneously. Traditional approaches use either expensive instrumented gloves (which require lengthy calibration) or optical motion capture (which is prone to finger marker occlusion). The researchers propose a dual-sensor approach that combines a 16-camera OptiTrack system with low-cost bend-sensing gloves carrying five sensors each. Optical markers capture the body and proximal hand joints reliably, while the glove sensors estimate the distal finger joints, where marker occlusion is most severe. A corpus of 150 Swedish Sign Language signs was recorded from a fluent hearing signer, with each sign performed slowly (about 4 seconds on average) for pedagogical clarity. The data was transferred to a highly stylized cartoon avatar designed to appeal to the young target audience, with special attention to the proportions of the hands and mouth, which are key reference points for signing.
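To illustrate the division of labor between markers and gloves, the sketch below maps one raw bend-sensor reading to flexion angles for the two distal joints of a finger. It is a hypothetical illustration, not the authors' calibration procedure: the two-pose linear calibration (flat hand, closed fist), the function names, the 100-degree maximum flexion, and the 2/3 coupling between the distal and proximal interphalangeal joints (a heuristic often used in hand animation) are all assumptions.

```python
def normalize_bend(raw, flat, fist):
    """Map a raw sensor reading to the range 0..1.

    flat and fist are the readings recorded in two assumed calibration
    poses: hand held flat (0.0) and hand closed into a fist (1.0).
    """
    span = fist - flat
    if span == 0:
        raise ValueError("calibration poses produced identical readings")
    t = (raw - flat) / span
    return min(1.0, max(0.0, t))  # clamp readings outside the calibrated range


def distal_angles(raw, flat, fist, max_pip_deg=100.0, dip_ratio=2.0 / 3.0):
    """Estimate (PIP, DIP) flexion in degrees from one bend sensor.

    max_pip_deg and dip_ratio are illustrative assumptions, not values
    taken from the paper.
    """
    pip = normalize_bend(raw, flat, fist) * max_pip_deg
    dip = dip_ratio * pip
    return pip, dip


# Hypothetical 10-bit sensor reading halfway between the calibration poses.
pip, dip = distal_angles(raw=512, flat=200, fist=824)
```

In a pipeline of this kind, the proximal joint rotations would still come from the optical markers, and only the joints that markers cannot track reliably would be driven by values like `pip` and `dip`.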
Key findings
The occlusion analysis showed that marker gaps occur primarily at the distal finger joints: proximal markers had gap rates of 0.03-0.58%, while distal markers and fingertips had gap rates of 1.4-12.8%. Two-handed signs suffered more occlusion than one-handed signs. This detailed characterization of occlusion patterns, rarely reported in the literature, supports the dual-sensor approach of using optical capture where it works well and glove sensors where it does not. The proposed 11-marker method (using only the markers least prone to occlusion, plus glove data) achieved intelligibility comparable to that of the full 21-marker method, which requires extensive manual cleanup. In a user evaluation with 25 sign-language-fluent participants (19 deaf, 6 hearing), 73% of signs were correctly identified, 14% were identified as homonyms (signs with the same hand shape but a different meaning conveyed by mouth movement), and the remaining 13% were incorrect. There was no significant difference in accuracy or perceived clarity between the reduced and full marker conditions. Manual postprocessing of the 10-minute dataset took approximately one week, substantially less than in commercial setups, where hand data is often skipped entirely and animated manually afterward.
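A gap rate of this kind is the percentage of frames in which a marker has no reconstructed position. The sketch below shows one way to compute it per marker, assuming the trajectories are stored as a NumPy array of shape (frames, markers, 3) with NaN marking occluded frames. The array layout, the function name, and the toy data are assumptions for illustration, not the authors' tooling or results.

```python
import numpy as np


def gap_rates(trajectories):
    """Return the percentage of missing frames for each marker.

    trajectories: float array of shape (frames, markers, 3), with NaN
    coordinates wherever a marker was occluded (assumed convention).
    """
    missing = np.isnan(trajectories).any(axis=2)  # shape (frames, markers)
    return 100.0 * missing.mean(axis=0)


# Toy data: 1000 frames, two markers, all positions initially present.
traj = np.zeros((1000, 2, 3))
traj[:5, 0, :] = np.nan    # marker 0 is missing in 5 frames
traj[:100, 1, :] = np.nan  # marker 1 is missing in 100 frames

rates = gap_rates(traj)    # one percentage per marker
```

Ranking markers by such a rate is one plausible way to pick a reduced marker set: markers with low gap rates stay optical, and the joints behind high-gap markers are handed to the glove sensors.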
Relevance
This work addresses a practical bottleneck in signing avatar development: the prohibitive cost and labor of capturing accurate hand and finger motion. The dual-sensor approach uses commodity hardware (at a total system cost far below that of specialized gloves alone), which makes data-driven sign animation accessible to more projects. The finding that reduced marker sets with automatic glove calibration achieve comparable intelligibility suggests that perfect accuracy may not be necessary for many AAC applications. For practitioners developing signing avatars or sign language learning tools, the detailed occlusion analysis provides guidance on camera placement and marker configuration. The evaluation methodology, which tests isolated sign identification with both deaf and hearing sign-fluent users, offers a replicable framework for assessing animated sign quality. Participants stressed the importance of facial expression and mouth movement in sign language; the current system captures minimal facial data, a key limitation that caused the homonym confusions. Future AAC avatar work should prioritize non-manual features alongside hand motion.
Tags: sign language · motion capture · signing avatar · augmentative and alternative communication · Key Word Signing · Swedish Sign Language · animation · virtual characters · children