Computer Vision-based Methodology to Support AAC
Rúbia Eliza de Oliveira Schultz Ascari, Roberto Pereira, Luciano Silva · 2020 · ACM Transactions on Accessible Computing · doi:10.1145/3408300
Summary
This paper presents a methodology for supporting augmentative and alternative communication (AAC) through personalized gestural interaction using computer vision and machine learning. The authors developed the PGCA (Personal Gesture Communication Assistant) system, which enables people with motor and speech impairments to communicate through customized gestures captured with a standard webcam. The system consists of three main areas: a Caregiver Area for creating personalized gesture datasets, a User Area for gesture-based interaction, and a Communication Boards Area for configuring communication symbols. Two machine learning approaches were evaluated: a Support Vector Machine (SVM) with Histogram of Oriented Gradients (HOG) descriptors, and a Convolutional Neural Network (CNN) using Transfer Learning with Inception V3. The research also compared two motion representation techniques: the conventional Motion History Image (MHI) and a novel Optical Flow-based MHI (OF-MHI). The evaluation had three steps: testing with volunteers without disabilities, validation using the public Keck Gesture Dataset, and real-world testing in a school environment with students who have motor and speech impairments.
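To make the conventional MHI mentioned above concrete, here is a minimal sketch of the textbook formulation, in which each pixel is set to a maximum value tau when motion is detected and otherwise decays by one per frame. It is not the paper's implementation and does not cover OF-MHI. The function names, the frame-differencing motion test, and the values of `tau` and `diff_threshold` are assumptions chosen for illustration.

```python
def update_mhi(mhi, prev_frame, frame, tau=5, diff_threshold=30):
    """Return the next motion history image (MHI).

    Frames are 2D lists of grayscale intensities. A pixel counts as
    "moving" when its intensity changes by at least diff_threshold
    between consecutive frames (an assumed, simple motion test).
    """
    new_mhi = []
    for y, row in enumerate(frame):
        new_row = []
        for x, pixel in enumerate(row):
            moved = abs(pixel - prev_frame[y][x]) >= diff_threshold
            # Moving pixels are reset to tau; all others fade by one step.
            new_row.append(tau if moved else max(0, mhi[y][x] - 1))
        new_mhi.append(new_row)
    return new_mhi


def motion_history(frames, tau=5, diff_threshold=30):
    """Fold a sequence of frames into a single MHI."""
    height, width = len(frames[0]), len(frames[0][0])
    mhi = [[0] * width for _ in range(height)]
    for prev_frame, frame in zip(frames, frames[1:]):
        mhi = update_mhi(mhi, prev_frame, frame, tau, diff_threshold)
    return mhi


# A bright dot moving left to right across a 1x4 grayscale strip.
frames = [
    [[255, 0, 0, 0]],
    [[0, 255, 0, 0]],
    [[0, 0, 255, 0]],
    [[0, 0, 0, 255]],
]
mhi = motion_history(frames, tau=5)
print(mhi)  # [[3, 4, 5, 5]]
```

The result is a single image in which more recent motion has higher values, so the direction of a gesture is encoded as an intensity gradient. A descriptor such as HOG could then be computed on that image and passed to a classifier.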
Key findings
The SVM-based classifier combined with OF-MHI motion representation outperformed the CNN approach in most test scenarios. Both classifiers achieved over 94% accuracy when trained on personalized datasets from volunteers without disabilities. In the school-based evaluation with students who have cerebral palsy, four of seven selected students successfully created usable personalized datasets, while three could not participate due to comprehension limitations, lack of voluntary gesture control, or disinterest. Results varied widely between students: some achieved perfect recognition accuracy while others showed 68% to 87% accuracy, reflecting the heterogeneous nature of motor disabilities. The research demonstrated that effective gesture recognition is possible with relatively small training datasets (as few as 10 to 17 samples per gesture class). Involuntary movements and spastic gestures presented challenges, with some students requiring gesture sets that avoided similar motion patterns to prevent classifier confusion. The personalized approach proved essential, as each student's unique motor capabilities required individually tailored gesture vocabularies.
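The point about avoiding similar motion patterns can be illustrated with a short sketch that takes a classifier's predictions for one user and reports overall accuracy and which gesture pairs are mistaken for each other most often. This is an illustrative aid, not a procedure described in the paper. The gesture names and labels below are made up, and the helper functions are hypothetical.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted gesture matches the true one."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def most_confused_pairs(true_labels, predicted_labels):
    """Count misclassifications per unordered gesture pair, most frequent first."""
    pairs = Counter()
    for true, pred in zip(true_labels, predicted_labels):
        if true != pred:
            # frozenset merges A->B and B->A errors into one pair.
            pairs[frozenset((true, pred))] += 1
    return [(tuple(sorted(pair)), n) for pair, n in pairs.most_common()]


# Made-up labels for illustration only; not data from the paper.
true_labels = ["raise_arm", "raise_arm", "wave", "wave", "wave", "nod", "nod", "nod"]
predicted = ["raise_arm", "wave", "wave", "raise_arm", "wave", "nod", "nod", "nod"]

overall = accuracy(true_labels, predicted)
confused = most_confused_pairs(true_labels, predicted)
print(overall)   # 0.75
print(confused)  # [(('raise_arm', 'wave'), 2)]
```

In this toy case all errors fall on one pair, which would suggest replacing one of those two gestures with a more distinct movement in that user's vocabulary.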
Relevance
This research demonstrates the feasibility of low-cost, personalized AAC solutions using standard webcams and machine learning, making gesture-based communication accessible without expensive specialized hardware. The work is particularly relevant for practitioners supporting students with cerebral palsy and similar conditions affecting motor and speech abilities. The methodology acknowledges that motor disabilities create highly individual profiles of capabilities and limitations, requiring personalized rather than one-size-fits-all approaches. The findings highlight important practical considerations: caregiver involvement is essential for dataset creation, users need sufficient comprehension to understand how their gestures map to system actions, and environmental factors (lighting, background, camera positioning) significantly affect recognition accuracy. The system's reliance on explicit gesture input limits its applicability for users with severe cognitive impairments. The authors suggest this limitation could be addressed through future integration with brain-computer interfaces and more autonomous interpretation of movement patterns.
Tags: AAC · augmentative and alternative communication · computer vision · machine learning · gesture recognition · cerebral palsy · motor impairment · speech impairment · personalization