Extraction of Emotional Information via Visual Scanning Patterns: A Feasibility Study of Participants with Schizophrenia and Neurotypical Individuals

Joshua Wade, Heathman S. Nichols, Megan Ichinose, Dayi Bian, Esube Bekele, Matthew Snodgress, Ashwaq Zaini Amat, Eric Granholm, Sohee Park, Nilanjan Sarkar · 2018 · ACM Transactions on Accessible Computing · doi:10.1145/3282434

Summary

This paper presents MASI-VR (Multimodal Adaptive Social Intervention in Virtual Reality), a gaze-sensitive social skills training system designed to improve emotion recognition in adults with schizophrenia (SZ). Emotion recognition impairment is a core feature of schizophrenia that persists throughout the condition and significantly impacts social functioning, employment, and relationships. Previous research has established that individuals with SZ demonstrate atypical visual attention patterns when viewing faces, particularly reduced attention to emotionally salient regions such as the eyes and mouth, yet existing interventions have not fully leveraged eye tracking technology for real-time adaptation. The research team developed two complementary systems: ERRATA (Emotion Recognition and RATing Application), an assessment tool using 12 diverse animated avatars displaying seven emotions (joy, surprise, sadness, fear, contempt, anger, disgust) at varying intensities, and an extended version of MASI-VR featuring virtual social environments (cafeteria, bus stop, grocery store) where users practice conversations with avatars. The key innovation is gaze sensitivity: avatar faces remain masked with a green overlay until the user directs their gaze toward the face, ensuring that participants actually attend to facial expressions rather than avoiding them or focusing only on dialog boxes. A feasibility study compared 10 adults with a confirmed SZ diagnosis (mean age 45.2, mean condition duration 24.78 years) to 10 neurotypical controls on emotion recognition accuracy, confidence, deliberation time, and visual attention patterns measured via eye tracking.
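
The gaze-contingent masking described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Rect` and `GazeContingentMask` names, the pixel coordinates, the dwell threshold, and the choice to keep the face revealed once unmasked are all assumptions made for illustration. The summary states only that the face stays masked until the user looks at it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rect:
    """Axis-aligned screen region in pixels (hypothetical face bounding box)."""
    left: float
    top: float
    width: float
    height: float

    def contains(self, x: float, y: float) -> bool:
        return (self.left <= x <= self.left + self.width
                and self.top <= y <= self.top + self.height)


class GazeContingentMask:
    """Keeps a face masked until gaze has rested on it for `dwell_s` seconds."""

    def __init__(self, face_region: Rect, dwell_s: float = 0.2) -> None:
        self.face_region = face_region
        self.dwell_s = dwell_s  # assumed threshold, not taken from the paper
        self.masked = True
        self._dwell_start: Optional[float] = None  # when gaze entered the region

    def update(self, x: float, y: float, t: float) -> bool:
        """Feed one gaze sample (screen x, y, timestamp in seconds).

        Returns True while the mask should still be drawn.
        """
        if not self.masked:
            return False  # assumption: once revealed, the face stays revealed
        if self.face_region.contains(x, y):
            if self._dwell_start is None:
                self._dwell_start = t
            if t - self._dwell_start >= self.dwell_s:
                self.masked = False
        else:
            self._dwell_start = None  # gaze left the face: restart the dwell timer
        return self.masked


# Usage: one sample off the face, then three samples on it.
mask = GazeContingentMask(Rect(400, 100, 200, 250), dwell_s=0.2)
samples = [(50, 600, 0.00), (450, 200, 0.10), (460, 210, 0.20), (470, 220, 0.35)]
states = [mask.update(x, y, t) for x, y, t in samples]
print(states)  # the mask is drawn until the dwell threshold is met
```

A dwell threshold is one common way to avoid unmasking on a stray saccade across the face. A real system would also need to map eye tracker output into the same coordinate space as the rendered avatar.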

Key findings

Baseline comparisons confirmed patterns from prior literature: the SZ group demonstrated significantly lower recognition of medium-intensity fearful faces (37.5% vs. 75% for controls, p=.040), spent significantly more time deliberating about emotions (9.2 seconds vs. 7.5 seconds, p=.018), and showed significantly fewer fixations overall (131.95 vs. 174.28, p=.023). Gaze heatmaps visually illustrated the attention disparity, with SZ participants showing reduced focus on the eye region compared to controls. Following five MASI-VR training sessions over three weeks, participants with SZ showed a significant improvement specifically in recognition of fearful faces, rising from 50% to 58.33% accuracy (p=.027). That is a gain of 8.33 percentage points, or 16.67% relative to the pre-training accuracy. While overall emotion recognition improved only nominally, the specific gain in fear recognition is notable given the well-documented neurobiological basis for fear recognition deficits in SZ. Post-training gaze metrics showed trends toward more neurotypical patterns, including decreased fixation duration and an increased number of fixations, though these did not reach statistical significance with the small sample. Emotion intensity significantly affected both accuracy and confidence across groups, with low-intensity emotions being harder to recognize. Joy and surprise were identified most accurately, while anger and fear showed the lowest recognition rates.
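
The fear-recognition figures above can be read in two ways, as a change in percentage points or as a change relative to the pre-training accuracy. The short Python check below shows how the two relate for the reported 50% and 58.33% values. The function name is hypothetical, and the inputs are the rounded percentages from the summary, not raw trial data.

```python
from typing import Tuple


def accuracy_change(pre: float, post: float) -> Tuple[float, float]:
    """Return (change in percentage points, change relative to `pre` in percent)."""
    absolute = post - pre
    relative = absolute / pre * 100.0
    return absolute, relative


# Rounded values as reported in the summary.
abs_pp, rel_pct = accuracy_change(50.0, 58.33)
print(f"{abs_pp:.2f} percentage points, about {rel_pct:.1f}% relative to baseline")

# With the rounded 58.33 input the relative change comes out just under 16.67;
# the reported 16.67% is what an unrounded 58.333...% would give.
```

Reading the 16.67% as a relative improvement is the only interpretation consistent with the reported 50% and 58.33% accuracies.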

Relevance

This research demonstrates the feasibility of using gaze-sensitive VR systems for cognitive rehabilitation in serious mental illness—an underexplored application of accessible technology. The work has implications beyond schizophrenia; the underlying principle that forcing visual attention to salient facial regions can improve emotion recognition could apply to other populations with social cognition difficulties, including autism spectrum disorder (for which MASI-VR was originally developed). For accessibility practitioners, this study illustrates how eye tracking can serve not just as an input modality for motor-impaired users, but as a therapeutic tool that shapes attention and behavior. The gaze-contingent masking approach—revealing content only when the user looks at the appropriate location—represents a design pattern applicable to attention training across various contexts. Limitations include the small sample size (n=20 total) and short training duration (3 weeks vs. the 5-10 weeks common in the literature). Future work should examine whether improvements transfer to real-world social functioning using established measures like the Bell Lysaker Emotion Recognition Task.

Tags: schizophrenia · emotion recognition · eye tracking · virtual reality · social cognition · gaze-sensitive systems · social skills training · mental health