iSET: Enabling In Situ & Post Hoc Video Labeling
Mish Madsen, Abdelrahman N. Mahmoud, Youssef Kashef · 2009 · Proceedings of the 11th International ACM SIGACCESS Conference on Computers and Accessibility (Assets '09) · doi:10.1145/1639642.1639710
Summary
This demonstration paper presents the interactive Social-Emotional Toolkit (iSET), a portable video recording and annotation system designed to support behavioral interventions for people with Autism Spectrum Disorders (ASD). People with ASD frequently struggle to recognize and interpret facial expressions and emotional states during social interactions, and behavioral interventions often use video of unfamiliar actors to teach affect recognition. iSET takes a different approach by enabling real-time capture and labeling of naturalistic social interactions in educational settings. The system consists of a Samsung ultramobile computer with an outward-facing camera running a custom Visual C++ interface that allows caregivers or observers to press color-coded buttons to label emotional expressions (such as "happy," "sad," "surprised," "agreeing," "concentrating") as they occur during live recording. The tool was developed iteratively through participatory design sessions with students with ASD and their caregivers in school settings. Videos can also be reviewed and re-labeled post hoc on any portable or desktop platform, with a timeline view showing color-coded markers for each labeled expression.
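The labeling workflow described above can be pictured as a log of timestamped button presses that feeds a color-coded timeline. The following is a minimal Python sketch of that idea, not the authors' Visual C++ implementation: the names `LabelEvent` and `LabelLog` and the specific label-to-color mapping are assumptions made for illustration, since the source does not state which color belongs to which expression.

```python
from dataclasses import dataclass

# Hypothetical color assignments; the source only says buttons are color-coded.
LABEL_COLORS = {
    "happy": "yellow",
    "sad": "blue",
    "surprised": "orange",
    "agreeing": "green",
    "concentrating": "purple",
}


@dataclass(frozen=True)
class LabelEvent:
    time_s: float  # seconds from the start of the recording
    label: str     # which button was pressed


class LabelLog:
    """Collects button presses made live or during post hoc review."""

    def __init__(self):
        self.events = []

    def press(self, time_s, label):
        if label not in LABEL_COLORS:
            raise ValueError(f"unknown label: {label}")
        self.events.append(LabelEvent(time_s, label))
        # Post hoc labels can arrive out of order, so keep the log sorted by time.
        self.events.sort(key=lambda e: e.time_s)

    def timeline(self):
        """Return (time, label, color) markers in playback order."""
        return [(e.time_s, e.label, LABEL_COLORS[e.label]) for e in self.events]


log = LabelLog()
log.press(12.5, "happy")          # pressed during live recording
log.press(40.0, "surprised")      # pressed during live recording
log.press(27.0, "concentrating")  # added later during review
for marker in log.timeline():
    print(marker)
```

Keeping live and post hoc labels in one time-ordered log is one simple way to let the same timeline view serve both modes of use.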
Key findings
The iSET system was used to collect over 15 hours of video containing more than 4,000 short single-emotion segments, automatically segmented through use of the labeling system. Both students with ASD and their caregivers were enthusiastic users of the system. The tool was used in a 15-week intervention in which students labeled their own and others' emotional expressions during social interactions each week. The final evaluation used clips demonstrating each of 8 emotions (validated by inter-rater consensus) for each of the 21 participants, with pre- and post-testing to measure improvement in identifying emotions in themselves, teachers, and peers. The post hoc review feature proved particularly valuable: the recorded participant could watch alongside labelers and discuss how their facial expressions were perceived by others, creating an opportunity to reflect on the function of emotional expression in social interaction.
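The summary does not say how label presses were turned into short single-emotion segments. As a rough illustration only, the Python sketch below assumes one plausible rule: each segment starts at a button press and ends at the next press, at the end of the video, or after a fixed maximum length, whichever comes first. The function name `segment_by_labels` and the 10-second cap are hypothetical and are not taken from the paper.

```python
def segment_by_labels(events, video_length_s, max_len_s=10.0):
    """Turn (time_s, label) presses into (start_s, end_s, label) segments.

    Assumed rule (not from the paper): a segment starts at its press and
    ends at the next press, the end of the video, or max_len_s seconds
    later, whichever comes first.
    """
    ordered = sorted(events)
    segments = []
    for i, (start, label) in enumerate(ordered):
        if i + 1 < len(ordered):
            next_start = ordered[i + 1][0]
        else:
            next_start = video_length_s
        end = min(next_start, start + max_len_s, video_length_s)
        if end > start:  # skip presses at or past the end of the video
            segments.append((start, end, label))
    return segments


presses = [(12.5, "happy"), (27.0, "concentrating"), (40.0, "surprised")]
segments = segment_by_labels(presses, video_length_s=45.0)
for seg in segments:
    print(seg)
```

Under this assumed rule, the cost of building a labeled clip corpus is a single button press per clip, which is consistent with the reduced annotation effort the summary describes.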
Relevance
iSET addresses a real gap in autism intervention tools: existing video recording systems were not designed for the naturalistic, in-context capture needed for social-emotional interventions. By making annotation possible during live interactions rather than only through painstaking post hoc review, the tool reduces the effort required to build annotated video corpora for behavioral research and intervention. The participatory design approach, in which the interface was developed with students with ASD and their caregivers, helped make the system intuitive and appropriate for its users. For accessibility practitioners, iSET demonstrates how relatively simple technology (a portable computer with a camera and customizable buttons) can enable powerful interventions when designed around actual use contexts. The tool also has broader applications beyond autism, including usability assessments and interaction studies where real-time event labeling of video is valuable.
Tags: autism spectrum disorder · video annotation · affective computing · social-emotional learning · behavioral intervention · participatory design · assistive technology