Fast Human-Computer Interaction by Combining Gaze Pointing and Face Gestures

David Rozado, Jason Niu, Martin Lochner · 2017 · ACM Transactions on Accessible Computing · doi:10.1145/3075301

Summary

This paper presents FaceSwitch, an open-source multimodal accessibility system that combines eye gaze tracking for cursor positioning with facial gesture recognition for triggering actions. The system addresses limitations of existing gaze-based interaction methods: gaze-only interfaces suffer from false activations and long dwell times, while gaze with a single mechanical switch provides only one degree of freedom and requires physical hardware that some users cannot operate. FaceSwitch uses a Tobii eye tracker for gaze positioning and a standard webcam with the Beyond Reality Face Tracker to detect four facial gestures: opening the mouth, raising eyebrows, smiling, and twitching the nose. Each gesture can be mapped to different commands (left click, right click, scroll down, page up, etc.) through a customizable GUI. The system tracks 66 facial landmark points and monitors distances between specific point pairs to detect gestures, using a calibration procedure to normalize for varying distances between face and camera. The target population includes people with motor disabilities from conditions such as spinal cord injury, cerebral palsy, muscular dystrophy, multiple sclerosis, and motor neuron disease. Because eye movements and facial musculature are often preserved even when limb function is severely affected, this approach offers an alternative for users who cannot operate mechanical switches. The software is freely available on GitHub to maximize accessibility impact.
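The distance-based detection described above can be illustrated with a minimal sketch. This is not the FaceSwitch implementation: the landmark indices, the idea of dividing by a reference face distance to cancel out camera distance, the 1.5 threshold, and the `GestureDetector` name are all assumptions made for illustration. The summary states only that distances between point pairs are monitored and that a calibration step normalizes for the distance between face and camera.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Landmarks = Dict[int, Point]  # landmark index -> (x, y) in pixels


def distance(a: Point, b: Point) -> float:
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


@dataclass
class GestureDetector:
    """Hypothetical detector for one gesture; all parameters are illustrative."""
    pair: Tuple[int, int]        # landmarks whose separation changes with the gesture
    reference: Tuple[int, int]   # landmarks used as a face-size scale (assumed stable)
    threshold: float = 1.5       # required growth relative to the neutral face (assumed)
    neutral_ratio: Optional[float] = None

    def _ratio(self, lm: Landmarks) -> float:
        # Dividing by a reference distance makes the measure scale-invariant,
        # so moving closer to or further from the camera does not change it.
        scale = distance(lm[self.reference[0]], lm[self.reference[1]])
        if scale == 0:
            raise ValueError("reference landmarks coincide")
        return distance(lm[self.pair[0]], lm[self.pair[1]]) / scale

    def calibrate(self, neutral_frames: List[Landmarks]) -> None:
        # Average the normalized distance over frames of a relaxed face.
        ratios = [self._ratio(f) for f in neutral_frames]
        self.neutral_ratio = sum(ratios) / len(ratios)

    def is_active(self, lm: Landmarks) -> bool:
        if self.neutral_ratio is None:
            raise RuntimeError("call calibrate() first")
        return self._ratio(lm) > self.threshold * self.neutral_ratio


# Toy data: points 0 and 1 stand in for a reference pair, 2 and 3 for the lips.
neutral = {0: (0.0, 0.0), 1: (10.0, 0.0), 2: (5.0, 5.0), 3: (5.0, 6.0)}
neutral_far = {0: (0.0, 0.0), 1: (5.0, 0.0), 2: (2.5, 2.5), 3: (2.5, 3.0)}
mouth_open_near = {0: (0.0, 0.0), 1: (20.0, 0.0), 2: (10.0, 10.0), 3: (10.0, 16.0)}

mouth = GestureDetector(pair=(2, 3), reference=(0, 1))
mouth.calibrate([neutral])
print(mouth.is_active(neutral_far))      # relaxed face, further from the camera
print(mouth.is_active(mouth_open_near))  # open mouth, closer to the camera
```

In this toy data the relaxed face seen from further away does not trigger the gesture, while the open mouth does, even though the absolute pixel distances differ in both cases. A real system would also need to handle tracking noise and several gestures competing at once, which this sketch ignores.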

Key findings

A rigorous user study with 20 participants (all able-bodied, which the authors acknowledge as a limitation) yielded quantitative performance data across five experiments. Individual gesture recognition accuracy was 93–96% for mouth opening and nose twitching and 81–87% for raising eyebrows and smiling. However, monitoring multiple gestures simultaneously reduced accuracy and increased false positives; three gestures proved to be the practical maximum for reliable recognition. For target acquisition tasks, average times were: mouse 1,600 ms, gaze with FaceSwitch 2,500 ms, gaze with mechanical switch 2,500 ms, and gaze-only 3,900 ms. When completing a sequence of 15 realistic desktop tasks (opening applications, clicking menus, scrolling pages), completion times were: mouse 72 seconds, gaze with FaceSwitch 93 seconds, gaze with mechanical switch 116 seconds, and gaze-only 171 seconds. On this task sequence, the FaceSwitch modality was significantly faster than both the gaze-only and the gaze with single switch approaches. A learning study over eight sessions showed significant improvement with FaceSwitch, with task completion times approaching mouse performance by the fifth session. No learning effect occurred with mouse interaction, which indicates that the improvement was specific to mastering the novel FaceSwitch modality. After eight sessions, participants averaged 83 seconds with FaceSwitch versus 68 seconds with the mouse.
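To put the task-sequence times in proportion, the short calculation below expresses each modality relative to the mouse and to the other gaze modalities. It uses only the mean times reported above; the function and variable names are illustrative, and the ratios say nothing about statistical significance, which depends on variance data not reproduced in this summary.

```python
# Mean completion times in seconds for the 15-task sequence, as reported above.
task_seconds = {
    "mouse": 72,
    "gaze + FaceSwitch": 93,
    "gaze + mechanical switch": 116,
    "gaze only": 171,
}


def ratio_to_baseline(times, baseline="mouse"):
    """Return each modality's time as a multiple of the baseline's time."""
    base = times[baseline]
    return {name: t / base for name, t in times.items()}


def time_saved(times, faster, slower):
    """Fraction of the slower modality's time that the faster one saves."""
    return 1 - times[faster] / times[slower]


slowdown = ratio_to_baseline(task_seconds)
saved_vs_switch = time_saved(task_seconds, "gaze + FaceSwitch", "gaze + mechanical switch")
saved_vs_gaze_only = time_saved(task_seconds, "gaze + FaceSwitch", "gaze only")

# After eight practice sessions: 83 s with FaceSwitch versus 68 s with the mouse.
final_session_ratio = 83 / 68

for name, r in slowdown.items():
    print(f"{name}: {r:.1f}x mouse time")
print(f"FaceSwitch vs mechanical switch: {saved_vs_switch:.0%} less time")
print(f"FaceSwitch vs gaze only: {saved_vs_gaze_only:.0%} less time")
print(f"FaceSwitch after eight sessions: {final_session_ratio:.2f}x mouse time")
```

On these means, FaceSwitch takes roughly 1.3 times as long as the mouse in the task sequence, about 20% less time than gaze with a mechanical switch, and about 46% less time than gaze only. After eight sessions the gap to the mouse narrows to roughly 1.2 times.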

Relevance

FaceSwitch demonstrates that facial gestures detected via standard webcams can effectively replace mechanical switches in gaze-based accessibility systems, offering multiple advantages: no physical hardware to position or maintain, multiple degrees of freedom (three gestures versus one switch), and comparable performance to traditional gaze-with-switch interaction. For practitioners and organizations serving users with motor disabilities, FaceSwitch represents a practical, cost-effective option. The software is open source (GitHub: accessibilitysoftwarehub/FaceSwitch), requires only a webcam and compatible eye tracker, and can be customized through a graphical interface. The finding that users approach mouse-level performance after about five practice sessions suggests that initial learning curves should not discourage adoption. The study's main limitation, testing only able-bodied participants, is acknowledged by the authors, who argue that most forms of below-neck motor impairment should not affect gaze or facial muscle control. However, future research with the actual target population is needed to validate these results. The system may require adaptation for users with conditions affecting facial musculature or ocular motor control.

Tags: eye tracking · gaze interaction · face tracking · facial gestures · motor disabilities · multimodal interaction · alternative input · open source · assistive technology