Effect of Presenting Video as a Baseline During an American Sign Language Animation User Study

Pengfei Lu, Hernisa Kacorri · 2012 · Proceedings of the 14th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '12) · doi:10.1145/2384916.2384949

Summary

This paper investigates a methodological question central to sign language animation research: what type of upper baseline should be used when evaluating synthesized American Sign Language (ASL) animations? The authors' laboratory at CUNY had previously conducted multiple studies evaluating ASL animations by asking native Deaf signers to watch different animation versions and answer comprehension and subjective quality questions. In that prior work, the upper baseline was an animation of a virtual human character carefully created by a skilled human animator who was a native ASL signer. Other researchers, however, had questioned why the lab did not use video of a real human signer instead. To quantify the effect of baseline choice, the authors replicated a 2010 study with one controlled change: the animator-produced upper baseline was replaced with a video recording of a native ASL signer performing the same content. Both studies used 18 native ASL signers as participants. The replicated study evaluated a mathematical model for synthesizing inflected ASL verb signs, in which a verb's movement depends on the spatial locations assigned to its subject and object around the signer. Participants watched animated stories containing inflected verbs, answered comprehension questions, and gave Likert-scale ratings for grammaticality, understandability, and naturalness. In a separate side-by-side comparison task, participants rated three versions of an individual sentence shown simultaneously.

Key findings

The study tested four hypotheses, with nuanced results. H1 (video gets higher comprehension than the animation baseline) was not supported: videos did not achieve higher comprehension scores than the carefully animated upper baseline, indicating that well-crafted animation can be equally comprehensible. H2 (baseline type does not affect comprehension of other stimuli) was partially supported: changing the upper baseline did not significantly affect comprehension scores for the synthesized animations. H3 (video gets higher subjective scores than the animation baseline) was supported: human videos received significantly higher Likert-scale scores for grammaticality, understandability, and naturalness. Most notably, H4 (a video baseline depresses subjective scores of other stimuli) was supported in the side-by-side comparison condition: when participants directly compared animations to human video, the synthesized animations received subjective scores 10-20% lower than when they were compared to an animated baseline. This depressing effect was not observed when stimuli were viewed sequentially with comprehension tasks, which suggests that it depends on the evaluation context. Participant feedback also indicated that seeing a real human signer made the limitations of virtual characters more apparent, particularly in facial expressions and naturalness of movement.
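To make the "10-20% lower" figure concrete, the following is a minimal sketch of one way such a relative drop in mean rating could be computed. It is not the authors' analysis: the function name `relative_drop` and the rating values are hypothetical placeholders, not data from the paper, and the summary does not say how the percentage was derived.

```python
from statistics import mean


def relative_drop(ratings_with_animated_baseline, ratings_with_video_baseline):
    """Fractional decrease in the mean rating of the same synthesized
    animation when the upper baseline changes from animation to video."""
    before = mean(ratings_with_animated_baseline)
    after = mean(ratings_with_video_baseline)
    return (before - after) / before


# Hypothetical placeholder ratings for one synthesized animation
# (illustrative values only, NOT taken from the paper).
animated_baseline_study = [6, 7, 5, 6]  # mean 6.0
video_baseline_study = [5, 6, 4, 5]     # mean 5.0

drop = relative_drop(animated_baseline_study, video_baseline_study)
print(f"Relative drop in mean rating: {drop:.1%}")
```

With these placeholder values the mean falls from 6.0 to 5.0, a relative drop of about 17%, which would sit inside the range reported for the side-by-side condition.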

Relevance

This paper provides methodological guidance for anyone conducting user studies of sign language technology, including animation synthesis, avatar-based interpretation, and sign language recognition systems. The key practical takeaway is that the choice of upper baseline materially affects how participants evaluate the technology being tested, particularly for subjective quality ratings in side-by-side comparisons. Researchers should therefore choose a baseline that matches their goals: video baselines better communicate current quality levels to general audiences, while animated baselines may be more appropriate for isolating specific linguistic features. The finding that comprehension scores for the synthesized animations did not change significantly with baseline type is reassuring for researchers focused on understandability metrics. For the broader accessibility community, this work underscores that ASL animation, while not yet matching human signing in subjective quality, can reach comparable comprehension levels when carefully produced, an important milestone for making digital content accessible to Deaf individuals with lower English literacy.

Tags: sign language animation · American Sign Language · user study methodology · evaluation baselines · deaf accessibility · virtual humans