Implementation and Evaluation of Animation Controls Sufficient for Conveying ASL Facial Expressions

Hernisa Kacorri, Matt Huenerfauth · 2014 · Proceedings of the 16th International ACM SIGACCESS Conference on Computers & Accessibility (ASSETS '14) · doi:10.1145/2661334.2661387

Summary

This ASSETS 2014 short paper (2 pages) reports an infrastructure contribution to sign-language-animation research. The authors extended an existing virtual human character (Max, on the open-source EMBR animation platform) with a full set of MPEG-4 Facial Action Parameter (FAP) controls, then ran a user study to confirm that the controls were expressive enough to convey linguistically meaningful ASL facial expressions.

The motivating problem is that ASL facial expressions are grammatically required. Raising the eyebrows and tilting the head forward turns a declarative statement into a yes-no question; furrowing the brows and tilting forward marks a wh-question; a brow furrow plus a head-shake marks negation. A signing-avatar system that cannot render these markers cannot produce grammatically valid ASL. The authors wanted a face-control parameterisation that was (a) proportion-invariant, so it could be reused across characters with different face shapes, (b) sufficient to control eyes, eyebrows, mouth, and head, and (c) based on a published standard, so other researchers could reuse their work. MPEG-4 FAP, whose 68 parameters displace facial feature points in units normalised to the face's own proportions, met all three criteria.

A Visage face tracker extracted MPEG-4-compatible data from video of a human signer, and that data was replayed on Max. The evaluation used a between-subjects design in which 14 native ASL signers watched 18 short-story animations (6 yes-no questions, 6 wh-questions, 6 negations) presented in two conditions: driven (face, head, and torso driven by a human recording) and neutral (only the hand signs animated; face, head, and torso static).
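The proportion-invariance in criterion (a) comes from how MPEG-4 scales parameter values: each value is multiplied by a facial animation parameter unit (FAPU), which is a key distance on the character's neutral face divided by 1024. A minimal Python sketch of that idea follows. The `NeutralFace` class, the function names, and all measurements are hypothetical illustrations rather than the authors' EMBR code, and only four of the standard's units are shown.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class NeutralFace:
    """Key distances on a character's neutral face, in that model's own units.

    Hypothetical container for illustration; not part of EMBR or the paper.
    """
    eye_separation: float
    eye_nose_separation: float
    mouth_nose_separation: float
    mouth_width: float


def fapu(face: NeutralFace) -> Dict[str, float]:
    """MPEG-4 style units: each key distance divided by 1024."""
    return {
        "ES": face.eye_separation / 1024.0,
        "ENS": face.eye_nose_separation / 1024.0,
        "MNS": face.mouth_nose_separation / 1024.0,
        "MW": face.mouth_width / 1024.0,
    }


def displacement(fap_value: float, unit: str, face: NeutralFace) -> float:
    """Convert a unitless FAP value into a displacement for one character."""
    return fap_value * fapu(face)[unit]


# Two invented characters with different proportions.
narrow = NeutralFace(6.0, 4.0, 2.0, 5.0)
wide = NeutralFace(8.0, 5.0, 2.5, 6.5)

# Assumed example: a brow raise of 256 units, i.e. a quarter of the
# eye-nose separation, expressed in ENS units.
brow_raise = 256
d_narrow = displacement(brow_raise, "ENS", narrow)
d_wide = displacement(brow_raise, "ENS", wide)

# The absolute displacement differs per character, but the displacement
# relative to each face's eye-nose separation is the same.
rel_narrow = d_narrow / narrow.eye_nose_separation
rel_wide = d_wide / wide.eye_nose_separation
```

The same recorded FAP stream can therefore drive faces of different sizes: only the per-character neutral-face measurements change, not the recorded values.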

Key findings

The new MPEG-4 FAP face controls proved sufficient: driven animations scored significantly higher than neutral animations both on whether participants noticed the correct facial expression (Mann-Whitney, p < 0.00014) and on comprehension-question accuracy (t-test, p < 0.000001). The neutral condition (a static face during signing) failed to convey the grammatical facial markers. This supports two conclusions: facial expressions are essential to ASL comprehension, and the new FAP parameterisation of Max is capable of carrying that signal. A professional artist also modified Max's surface mesh so that skin wrinkles form automatically under FAP displacement, and a lighting scheme was tuned to make those wrinkles visible. This matters because lip, cheek, and brow detail is what viewers track when decoding ASL facial grammar. The stimuli and evaluation questions were publicly released (Huenerfauth & Kacorri 2014, LREC SLTAT workshop) so that other researchers can evaluate their own facial-expression synthesis methods against the same corpus.
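The "noticed the facial expression" result is a comparison of ratings between two groups, which is the situation a Mann-Whitney U test addresses. The Python sketch below computes U and a two-sided p-value using the normal approximation with a tie correction and no continuity correction. The ratings are invented for illustration and are not the study's data, and the function name is hypothetical.

```python
import math
from collections import Counter


def mann_whitney_u(a, b):
    """Return (U for sample a, two-sided p-value).

    Uses the normal approximation with a tie correction and no
    continuity correction, so p is only approximate for small samples.
    """
    n1, n2 = len(a), len(b)
    n = n1 + n2
    counts = Counter(list(a) + list(b))

    # Tied values share the mean of the ranks they occupy.
    rank_of = {}
    next_rank = 1
    for value in sorted(counts):
        c = counts[value]
        rank_of[value] = next_rank + (c - 1) / 2.0
        next_rank += c

    rank_sum_a = sum(rank_of[x] for x in a)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2.0

    # Variance of U under the null hypothesis, corrected for ties.
    tie_term = sum(c ** 3 - c for c in counts.values())
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))

    z = (u_a - n1 * n2 / 2.0) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u_a, p_two_sided


# Invented ratings, NOT the paper's data.
driven = [8, 9, 7, 10, 9, 8, 9]
neutral = [3, 5, 4, 2, 6, 4, 5]

# Every driven rating exceeds every neutral rating here,
# so U reaches its maximum of n1 * n2 = 49.
u, p = mann_whitney_u(driven, neutral)
```

A rank-based test suits ordinal rating scales because it makes no assumption that the ratings are normally distributed. For real analyses an established statistics package would be the safer choice than this hand-rolled version.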

Relevance

For accessibility practitioners and researchers building signing-avatar content, the contribution is twofold: an empirical demonstration that MPEG-4 FAP is sufficient (not just necessary) for ASL facial grammar, and a reusable open-source testbed (Max plus public stimuli and questions) that lowers the barrier to further work on automatic facial-expression synthesis. The methodological approach, which drives a virtual signer from human face-tracker recordings as a gold standard and compares it against a neutral baseline with the same hand signs, became a template for later evaluation studies in this subfield. The limitations are substantial, as expected for a two-page paper: a sample of 14 participants, a single virtual character, human-driven facial data rather than automatic synthesis (the harder problem), and no comparison against video of a human signer. The paper is most useful as a reference pointer for anyone evaluating ASL signing avatars or specifying face-control requirements in a generation pipeline.

Tags: american sign language · sign language animation · signing avatar · deaf and hard of hearing · facial expression · animation · accessibility research

Standards referenced: MPEG-4 · ISO/IEC 14496-2