FEPS: A Sensory Substitution System for the Blind to Perceive Facial Expressions
Md. Iftekhar Tanveer, A.S.M. Iftekhar Anam, A.K.M Mahbubur Rahman, Sreya Ghosh, Mohammed Yeasin · 2012 · Proceedings of the 14th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2012) · doi:10.1145/2384916.2384956
Summary
This demonstration paper presents FEPS (Facial Expression Perception through Sound), a visual-to-auditory sensory substitution system that enables blind individuals to perceive their conversation partner's facial expressions through sound. The inability to perceive facial expressions is a significant barrier for visually impaired people in social settings, since over 80% of communication is carried through non-verbal channels. The system uses a cellphone worn around the neck that captures video of the interlocutor's face and transmits frames to a server for analysis. The server tracks the face and extracts landmark points, from which it calculates four key facial features: eyebrow height (BrowH), eye openness (EyeO), lip height (LipH), and lip corner distance (LipCD). Together these features capture movements of the eyelids, eyebrows, lips, and mouth. Each feature is then mapped to its own distinct sound; this sonification converts the detected facial movements into audio cues that the blind user hears through the phone. The entire processing pipeline takes approximately 500 milliseconds with a stable network connection.
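The summary names the four features but does not say which landmarks define them or how each one is turned into sound. The sketch below is one plausible reading, not the FEPS implementation. It assumes a hypothetical dict of named 2D landmarks, normalizes every distance by the inter-ocular distance so the features are scale-invariant, and gives each feature its own base tone whose pitch shifts up or down with the feature's relative change from a neutral-face baseline. All landmark names, base frequencies, and the clamping constant are illustrative assumptions.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float]  # (x, y) pixel coordinates of a landmark

def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two landmark points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

@dataclass
class FaceFeatures:
    brow_h: float  # BrowH: eyebrow height above the eye center
    eye_o: float   # EyeO: eye openness (upper to lower eyelid)
    lip_h: float   # LipH: top of upper lip to bottom of lower lip
    lip_cd: float  # LipCD: distance between the two lip corners

def extract_features(lm: dict[str, Point]) -> FaceFeatures:
    """Compute the four features from hypothetical named landmarks.

    Distances are divided by the inter-ocular distance (an assumption)
    so features stay stable as the face moves closer or farther away.
    """
    scale = dist(lm["left_eye_center"], lm["right_eye_center"])
    brow_h = (dist(lm["left_brow_mid"], lm["left_eye_center"])
              + dist(lm["right_brow_mid"], lm["right_eye_center"])) / 2
    eye_o = (dist(lm["left_eye_top"], lm["left_eye_bottom"])
             + dist(lm["right_eye_top"], lm["right_eye_bottom"])) / 2
    lip_h = dist(lm["upper_lip_top"], lm["lower_lip_bottom"])
    lip_cd = dist(lm["left_lip_corner"], lm["right_lip_corner"])
    return FaceFeatures(brow_h / scale, eye_o / scale, lip_h / scale, lip_cd / scale)

# Hypothetical sonification parameters: one distinct base tone per feature.
BASE_FREQ_HZ = {"BrowH": 440.0, "EyeO": 554.37, "LipH": 659.25, "LipCD": 880.0}
MAX_REL_CHANGE = 0.5  # relative changes beyond +/-50% are clamped

def sonify(current: FaceFeatures, neutral: FaceFeatures) -> dict[str, tuple[float, float]]:
    """Map each feature's change from a neutral face to (frequency_hz, gain).

    Pitch rises when a feature grows (e.g. raised brows) and falls when it
    shrinks; gain scales with the size of the change, so a neutral face is silent.
    """
    pairs = {
        "BrowH": (current.brow_h, neutral.brow_h),
        "EyeO": (current.eye_o, neutral.eye_o),
        "LipH": (current.lip_h, neutral.lip_h),
        "LipCD": (current.lip_cd, neutral.lip_cd),
    }
    cues = {}
    for name, (cur, ref) in pairs.items():
        rel = (cur - ref) / ref
        rel = max(-MAX_REL_CHANGE, min(MAX_REL_CHANGE, rel))
        freq = BASE_FREQ_HZ[name] * 2.0 ** rel  # shift of at most half an octave
        gain = abs(rel) / MAX_REL_CHANGE        # 0.0 (no change) to 1.0 (max)
        cues[name] = (freq, gain)
    return cues

if __name__ == "__main__":
    neutral_lm = {
        "left_eye_center": (30, 50), "right_eye_center": (70, 50),
        "left_brow_mid": (30, 40), "right_brow_mid": (70, 40),
        "left_eye_top": (30, 48), "left_eye_bottom": (30, 52),
        "right_eye_top": (70, 48), "right_eye_bottom": (70, 52),
        "upper_lip_top": (50, 80), "lower_lip_bottom": (50, 90),
        "left_lip_corner": (38, 85), "right_lip_corner": (62, 85),
    }
    # Same face with both eyebrows raised by 4 pixels.
    raised_lm = dict(neutral_lm, left_brow_mid=(30, 36), right_brow_mid=(70, 36))
    neutral = extract_features(neutral_lm)
    for name, (freq, gain) in sonify(extract_features(raised_lm), neutral).items():
        print(f"{name}: {freq:.1f} Hz, gain {gain:.2f}")
```

Feeding each (frequency, gain) pair to a tone generator would give four simultaneous, separately pitched channels. The summary does not state whether FEPS used continuous tones, discrete sound cues, or another encoding, so the pitch-and-gain scheme here is only one illustrative choice.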
Key findings
FEPS makes a critical design decision that distinguishes it from earlier systems such as iFeeling and Team F.A.C.E.: it sonifies facial movements directly rather than attempting to infer and communicate emotions. The authors argue this is superior for three reasons: (1) the mapping from expressions to emotions varies across cultures, which introduces bias; (2) a limited set of facial movements can signal a multitude of emotions, so giving feedback on only a subset of inferred emotions under-utilizes the information the system captures; and (3) building a robust natural emotion prediction system is extremely difficult because ground-truth data is lacking. By providing direct feedback on facial muscle movements rather than interpreted emotions, FEPS gives users richer, less biased information and lets them draw their own conclusions about the person's emotional state. The system's usability was validated through a user study that confirmed blind users' ability to understand facial expressions through the auditory feedback. The design builds on the finding that focused sensory substitution devices (which extract specific information) are more commercially successful and usable than systems that attempt to map the entire visual field.
Relevance
This paper addresses a profoundly important but often overlooked aspect of accessibility: enabling blind people to access the non-verbal social cues that sighted people take for granted. Most accessibility research focuses on navigation, reading, or task completion, but social communication and emotional connection are equally vital to quality of life. For accessibility practitioners, the design philosophy of conveying raw perceptual data rather than interpreted meanings is instructive: it respects user autonomy by letting individuals form their own judgments instead of relying on potentially biased algorithmic interpretations. The mobile-based architecture (phone camera plus server processing) anticipated the modern paradigm of smartphone-based assistive technology. As face tracking and expression analysis have advanced significantly since 2012, the core concept of sonifying facial movements for blind users remains relevant and is increasingly feasible for real-time mobile applications.
Tags: visual impairment · blind users · sensory substitution · sonification · facial expression recognition · non-verbal communication · social interaction · face tracking · computer vision · assistive technology