Beyond Accuracy: Auditing Allocative Harms in Facial-Gesture Recognition for People with Motor Impairments

Siyu Zhang, Yelu Gu, Kirsten Cater, Oussama Metatla · 2026 · Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26) · doi:10.1145/3772318.3791927

Summary

This paper challenges the conventional framing of facial-gesture recognition accuracy as a purely technical property and reframes it as a sensorimotor alignment problem between user intention and algorithmic interpretation. The authors conducted a mixed-methods empirical study with 11 people with upper-limb motor impairments (PwM; conditions included SMA II, spinal cord injuries, cerebral palsy, stroke hemiplegia, poliomyelitis, and transverse myelitis) and 11 non-impaired controls. Each participant performed 37 above-neck gestures (head, eyebrow, eyelid, eyeball, cheek, mouth, tongue, teeth, and jaw) in front of a 30 fps webcam, gave a short interview after every trial, and rated perceived success on an 8-point self-evaluation scale. Recognition was measured with two widely deployed pipelines, MediaPipe Face Mesh and OpenFace 3.0 (using FACS Action Units), neither of which was fine-tuned. From this, the authors introduce FairGesture, a diagnostic auditing method that combines (1) a Perception Gap (PG) metric, defined as normalised self-evaluation minus model confidence, to surface allocative harm, (2) landmark-based trajectory analysis along four motion dimensions (amplitude, direction, temporal dynamics, and activation region), and (3) qualitative coding of participant accounts of proprioception, directional certainty, and compensatory muscle use. They release open-source PG computation scripts, heatmap tools, a trajectory viewer, and an interpretive checklist.
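As an illustration of the Perception Gap definition above (normalised self-evaluation minus model confidence), here is a minimal sketch. It is not the authors' released script. The 1-to-8 scale endpoints, the min-max normalisation, the [0, 1] confidence range, and the `tolerance` band used to call a trial "aligned" are all assumptions made for this example.

```python
def normalise_self_eval(score, lo=1, hi=8):
    """Map a self-evaluation score onto [0, 1].

    Assumption: the 8-point scale runs from `lo`=1 to `hi`=8 and is
    min-max normalised; the paper's exact normalisation may differ.
    """
    if not lo <= score <= hi:
        raise ValueError(f"score {score} outside [{lo}, {hi}]")
    return (score - lo) / (hi - lo)


def perception_gap(self_eval, model_confidence, lo=1, hi=8):
    """Perception Gap = normalised self-evaluation - model confidence.

    Assumption: model_confidence is already a value in [0, 1].
    Positive values mean the user rated the attempt higher than the model did.
    """
    if not 0.0 <= model_confidence <= 1.0:
        raise ValueError("model_confidence must be in [0, 1]")
    return normalise_self_eval(self_eval, lo, hi) - model_confidence


def classify_gap(pg, tolerance=0.25):
    """Label a trial by the sign of its gap.

    `tolerance` is a hypothetical band for illustration only,
    not a threshold taken from the paper.
    """
    if pg > tolerance:
        return "overestimation"   # user believed success, model did not detect it
    if pg < -tolerance:
        return "underestimation"  # model detected more than the user perceived
    return "aligned"


if __name__ == "__main__":
    # A user rates the attempt 8/8 but the model reports low confidence.
    pg = perception_gap(self_eval=8, model_confidence=0.2)
    print(round(pg, 3), classify_gap(pg))
```

Under these assumptions, a trial in which the user is sure the gesture succeeded and the model barely responds yields a large positive gap, which is the overestimation case the summary describes.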

Key findings

MediaPipe recognition accuracy averaged 56.2% (SD 26.8) for the motor-impaired group versus 74.8% (SD 16.5) for controls (p<0.0001), with statistically significant per-gesture disparities on 26 of 37 gestures after Benjamini-Hochberg correction. OpenFace 3.0 showed the same pattern (53.9% vs 77.0%, p=0.0049), which the authors take to indicate that the problem is architectural rather than model-specific. Among PwM, self-evaluations aligned with the model's output in 75.8% of trials, and 87.1% of the mismatches were overestimations, where the user believed they had executed the gesture but the system failed to detect it; control-group mismatches were more evenly split. Failures concentrated in low-amplitude, asymmetric, directional, and slow-onset gestures (winks, eye-look directions, mouth-corner raises, jaw-forward). Trajectory analysis identified four recurring mismatch modes: reduced amplitude below detection thresholds, directional drift and left-right confusion, delayed or unstable timing, and compensatory region shifts (e.g., arching the back to approximate a head tilt). Qualitatively, many participants could not confirm whether they had executed a gesture at all, so proprioceptive uncertainty compounds the model failure. The authors argue these are ability-sensitive allocative harms: valid user inputs are systematically rejected, limiting access to interfaces, authentication, and assistive functionality.
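The per-gesture comparison above relies on Benjamini-Hochberg correction across 37 tests. The sketch below shows the standard step-up procedure in plain Python. The p-values in the usage example are invented, and the summary does not state which per-gesture test or significance level the authors used, so `alpha=0.05` is an assumption.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the null is rejected.

    Standard BH step-up procedure: sort p-values ascending, find the
    largest rank k with p_(k) <= (k / m) * alpha, and reject every
    hypothesis whose rank is <= k.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank (1-based) that satisfies the BH condition.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical per-gesture p-values, for illustration only.
    p = [0.001, 0.04, 0.03, 0.5]
    print(benjamini_hochberg(p))
```

Because the procedure is step-up, a p-value that misses its own rank threshold can still be rejected when a larger p-value further down the ordering meets its threshold.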

Relevance

For teams building hands-free, face-based, or gesture-based accessibility interfaces, this is a concrete call to move past aggregate accuracy and audit for ability-sensitive allocative harm. The Perception Gap metric and the four motion dimensions (amplitude, direction, temporal dynamics, activation region) give developers actionable diagnostic axes. The proposed remediation strategies are practical and avoid demanding that users conform to normative motion templates: targeted personalisation via small per-user micro-datasets (5-10 samples), strategic sampling informed by Perception Gap clusters, and tiering datasets by motor complexity. Interface-side recommendations include mirror feedback widgets so users can see what the system sees, and intent-centric personalisation that treats a user's characteristic asymmetry as their canonical gesture. Limitations: the PwM sample is small (11 participants, all Asian adults; older users and people with cognitive impairments were excluded), gestures are limited to discrete one-bit and zero-bit actions, and inter-rater agreement on qualitative coding was moderate (kappa 0.69). Still, the released auditing tooling lowers the barrier for other teams to replicate and extend the audit.
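To show how three of the diagnostic axes named above (amplitude, direction, temporal dynamics) might be read off a single landmark trajectory, here is a minimal sketch. It is not the authors' trajectory viewer. The input format (a list of `(x, y)` positions whose first frame is the neutral pose), the `onset_fraction` parameter, and the feature definitions are assumptions for illustration. Activation region is omitted because it requires comparing several landmarks.

```python
import math


def motion_features(trajectory, fps=30.0, onset_fraction=0.5):
    """Summarise one landmark's motion relative to its first (neutral) frame.

    trajectory: list of (x, y) positions, one per video frame (assumed format).
    Returns peak displacement, its direction in degrees, and the time in
    seconds at which displacement first reaches onset_fraction of the peak.
    """
    if not trajectory:
        raise ValueError("trajectory must contain at least one frame")

    x0, y0 = trajectory[0]
    displacements = [(x - x0, y - y0) for x, y in trajectory]
    magnitudes = [math.hypot(dx, dy) for dx, dy in displacements]

    # Amplitude: largest displacement from the neutral frame.
    peak_idx = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    amplitude = magnitudes[peak_idx]

    direction_deg = None
    onset_s = None
    if amplitude > 0:
        # Direction: angle of the displacement vector at the peak frame.
        dx, dy = displacements[peak_idx]
        direction_deg = math.degrees(math.atan2(dy, dx))
        # Temporal dynamics: first frame reaching a fraction of the peak.
        threshold = onset_fraction * amplitude
        onset_idx = next(i for i, m in enumerate(magnitudes) if m >= threshold)
        onset_s = onset_idx / fps

    return {
        "amplitude": amplitude,
        "direction_deg": direction_deg,
        "onset_s": onset_s,
    }


if __name__ == "__main__":
    # Hypothetical landmark path: still for two frames, then moving along +x.
    path = [(0, 0), (0, 0), (1, 0), (2, 0), (2, 0)]
    print(motion_features(path))
```

Comparing these values between a user's attempts and the range a model responds to is one way to check whether a failure looks like reduced amplitude, directional drift, or delayed onset.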

Tags: facial gesture recognition · motor impairment · algorithmic fairness · allocative harm · accessibility · computer vision · sensorimotor · hands-free interaction · algorithmic audit