A Gesture Recognition Architecture for Sign Language
Annelies Braffort · 1996 · Proceedings of the Second Annual ACM Conference on Assistive Technologies (Assets '96) · doi:10.1145/228347.228364
Summary
This paper from LIMSI-CNRS in France presents a gesture recognition architecture designed specifically for sign languages and grounded in a detailed linguistic analysis of French Sign Language (LSF). The key insight is that sign languages differ fundamentally from oral languages in how they organize information: speech is sequential, whereas signs convey several streams of meaning at once through five co-occurring parameters, namely configuration (hand shape), movement, orientation, location (relative to the body), and facial expression. Prior gesture recognition systems treated signs as items in a predefined lexicon, much like words in speech recognition. That approach fails for sign languages because many signs, called classifiers, are created dynamically during discourse from context rather than drawn from a fixed dictionary.
The paper analyzes each parameter in detail. Configuration represents the shape of the object acted on (changing hand shape alone turns "drinking from a glass" into "drinking from a bowl" or "drinking from a cup"). Movement differentiates verb aspects (brief = "see briefly", slow and large = "see duratively", repeated = "see often"). Orientation conveys spatial relationships and verb conjugation (changing the direction of the movement changes who gives to whom). Location integrates body parts into verbs and allocates spatial positions to discourse participants. Facial expression marks discourse mode and manner.
Key findings
The proposed architecture has two main modules: a Classifier and an Interpreter. The Classifier runs three classification tools in parallel, one for conventional signs (from a predefined lexicon), one for hand configurations, and one for movement trajectories, because sign language parameters co-occur and cannot be processed sequentially. Conventional signs are recognized with Hidden Markov Models. The Interpreter has two sub-tools: an Analyser, which determines the type of gesture (directional verb, classifier, or conventional sign) and retrieves the relevant spatial data, and an Integrater, which constructs a Virtual Scene representing the spatial arrangement of discourse participants and objects. The Virtual Scene is built progressively during recognition and stores the locations and orientations of the entities mentioned in the discourse, which is essential for interpreting classifier signs and directional verbs that refer to spatial context.
The paper walks through a complete example, recognizing the sentence "I give an apple to the boy at my right." It shows how four signs (garçon/boy, a vertical classifier, pomme/apple, and donner/give) pass through classification, analysis, and integration to produce a final interpretation with the correct spatial relationships.
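The data flow described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `Gesture`, `VirtualScene`, `LEXICON`, the hand-shape and location labels, and the rule for choosing the object of the verb are all hypothetical, and the three classification tools are stubs that read pre-labelled features instead of running Hidden Markov Models on sensor data. The sketch only shows how three parallel classifiers, an Analyser, and an Integrater might cooperate on the four-sign example.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical toy lexicon; the paper's real lexicon and HMMs are not reproduced.
LEXICON = {"garçon": "boy", "pomme": "apple", "donner": "give"}
DIRECTIONAL_VERBS = {"give"}


@dataclass
class Gesture:
    """Pre-labelled features for one gesture (assumed stand-in for sensor data)."""
    sign: Optional[str]   # lexical form, or None if the gesture is not in the lexicon
    hand_shape: str
    start: str            # location where the movement starts
    end: str              # location where the movement ends


@dataclass
class VirtualScene:
    """Spatial model of the discourse: which entity sits at which location."""
    locations: dict = field(default_factory=lambda: {"signer": "signer"})
    mentioned: list = field(default_factory=list)  # entities in order of mention
    events: list = field(default_factory=list)

    def entity_at(self, location):
        for entity, loc in self.locations.items():
            if loc == location:
                return entity
        return None


# Three stub classification tools; each would be a trained model in practice.
def classify_sign(g): return LEXICON.get(g.sign)
def classify_configuration(g): return g.hand_shape
def classify_trajectory(g): return (g.start, g.end)


def classify(g):
    """Classifier module: run the three tools in parallel on the same gesture."""
    tools = (classify_sign, classify_configuration, classify_trajectory)
    with ThreadPoolExecutor(max_workers=3) as pool:
        sign, shape, path = pool.map(lambda tool: tool(g), tools)
    return {"sign": sign, "shape": shape, "path": path}


def analyse(result):
    """Analyser: decide the gesture type from the classification results."""
    if result["sign"] in DIRECTIONAL_VERBS:
        return "directional_verb"
    if result["sign"] is not None:
        return "conventional_sign"
    return "classifier"


def integrate(scene, kind, result):
    """Integrater: update the Virtual Scene according to the gesture type."""
    start, end = result["path"]
    if kind == "conventional_sign":
        scene.mentioned.append(result["sign"])
    elif kind == "classifier" and scene.mentioned:
        # Assumed rule: a classifier places the most recently mentioned entity.
        scene.locations[scene.mentioned[-1]] = end
    elif kind == "directional_verb":
        # Assumed rule: the object is the latest mentioned entity not yet placed.
        objects = [e for e in scene.mentioned if e not in scene.locations]
        scene.events.append({
            "verb": result["sign"],
            "agent": scene.entity_at(start),
            "recipient": scene.entity_at(end),
            "object": objects[-1] if objects else None,
        })


def interpret(gestures):
    scene = VirtualScene()
    for g in gestures:
        result = classify(g)
        integrate(scene, analyse(result), result)
    return scene


# "I give an apple to the boy at my right" as four pre-labelled gestures.
sentence = [
    Gesture("garçon", "flat", "neutral", "neutral"),
    Gesture(None, "vertical_index", "neutral", "right"),
    Gesture("pomme", "curved", "neutral", "neutral"),
    Gesture("donner", "flat", "signer", "right"),
]
scene = interpret(sentence)
print(scene.events)
```

In this sketch the directional verb is resolved purely by looking up which entities occupy the start and end locations of the movement, which is why the Virtual Scene has to be filled in before the verb arrives.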
Relevance
This paper is an early and linguistically sophisticated approach to sign language recognition that remains relevant to current research. Its key contribution is the recognition that speech recognition techniques cannot simply be transferred to sign languages, which operate on fundamentally different linguistic principles: simultaneity rather than sequentiality, spatial grammar, and productive (context-created) signs alongside conventional vocabulary. For accessibility practitioners, the work highlights how complex sign language translation technology is to build and why progress has been slow compared with speech recognition. The Virtual Scene concept, which maintains a spatial model of the discourse context, anticipates approaches now being explored with deep learning systems. The distinction between conventional signs (learnable from dictionaries) and classifiers (created in context) remains a central challenge in sign language technology, as current systems still struggle with the productive, context-dependent aspects of sign language communication.
Tags: sign language · gesture recognition · French Sign Language · data glove · sign language recognition · machine learning · Hidden Markov Model · deaf accessibility · linguistics