Using Convolutional Neural Networks for Visual Sign Language Recognition: Towards a system that provides instant feedback to learners of sign language

Rami Aldahir, Ronald R. Grau · 2024 · Proceedings of the 21st International Web for All Conference (W4A) · doi:10.1145/3677846.3677848

Summary

This short paper presents a prototype system that uses computer vision and a convolutional neural network (CNN) to recognize finger-spelled letters in British Sign Language (BSL) and give learners real-time feedback. The system addresses a gap in sign language instruction: online videos and tutorials have made learning materials more accessible, but they cannot provide the real-time feedback that is essential for developing correct psychomotor skills in signing. The authors target BSL rather than the more commonly researched American Sign Language (ASL), noting that BSL fingerspelling uses both hands whereas ASL uses one. The system applies transfer learning to the YOLOv5 (You Only Look Once) object detection model, trained on a custom dataset of 934 images of BSL letter gestures. Images were augmented with variations in rotation, brightness, noise, and exposure so that the model copes with different skin tones, hand sizes, and environments. The resulting tool has a Tkinter-based GUI in which users see their webcam feed alongside reference images, attempt to sign a randomly selected or user-chosen letter before a 10-second timer runs out, and receive a feedback score on screen.
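
The practice round described above (pick a letter, watch detections for 10 seconds, report a score) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the `Detection` tuple, the function names `pick_target` and `score_round`, and the rule that the score is the highest matching confidence are all assumptions, since the summary does not say how the feedback score is computed.

```python
import random
import string
from typing import Iterable, Optional, Tuple

# Hypothetical detection record: (seconds since the round began,
# predicted letter, detector confidence in [0, 1]).
Detection = Tuple[float, str, float]

ROUND_SECONDS = 10.0  # the summary states a 10-second timer


def pick_target(rng: random.Random, chosen: Optional[str] = None) -> str:
    """Return the user's chosen letter, or a random letter from A-Z."""
    if chosen is not None:
        return chosen.upper()
    return rng.choice(string.ascii_uppercase)


def score_round(target: str, detections: Iterable[Detection],
                limit: float = ROUND_SECONDS) -> float:
    """Best confidence for the target letter seen before the timer expires.

    Assumption: the score is the maximum matching confidence. The actual
    scoring rule of the prototype is not described in this summary.
    """
    best = 0.0
    for elapsed, letter, confidence in detections:
        if elapsed <= limit and letter.upper() == target:
            best = max(best, confidence)
    return best


# Made-up detections for illustration; the last one arrives after the timer.
frames = [(1.2, "B", 0.41), (3.5, "A", 0.68), (6.0, "A", 0.74), (11.0, "A", 0.95)]
print(score_round("A", frames))  # 0.74
```

In a real tool the detections would come from running the trained detector on webcam frames; here they are supplied as plain tuples so the timing and scoring logic can be read on its own.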

Key findings

A small pilot study with two participants, one BSL expert (Person A) and one complete novice (Person B), tested the system across two phases. In Phase 1, participants signed all 26 letters, with scores recorded across three tests. In Phase 2, after 30 minutes of practice using the tool, they repeated the tests. The expert signer maintained consistent scores of around 0.73-0.76 across both phases, which is consistent with the system reliably recognizing correct signing. The novice improved from an average accuracy of 45% in Phase 1 to 70% in Phase 2, approaching the expert's scores and exceeding expectations for such a short practice period. Skin tone did not affect recognition performance, which the authors attribute to the brightness augmentations applied during training. Key challenges included balancing model speed and accuracy while avoiding overfitting on the small dataset, and ensuring the model could generalize across different users' hand shapes. The system is open-source and available on GitHub. The authors note that most current research focuses on ASL, highlighting the need for more work supporting other sign languages used worldwide.
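
To make the reported figures concrete, the sketch below shows one way a per-phase average and the novice's gain could be computed. The helper names `phase_average` and `improvement` are hypothetical, and treating the reported 45% and 70% as 0.45 and 0.70 on the same 0-1 scale as the expert's scores is an assumption; the summary does not state how the paper aggregates its results.

```python
from statistics import mean
from typing import Dict, List


def phase_average(tests: List[Dict[str, float]]) -> float:
    """Mean score over every letter in every test of one phase.

    Each test is assumed to map a letter to the score recorded for it.
    """
    return mean(score for test in tests for score in test.values())


def improvement(before: float, after: float) -> Dict[str, float]:
    """Absolute and relative change between two phase averages."""
    change = after - before
    return {"absolute": change, "relative": change / before}


# Novice averages as reported in the summary (45% and 70%).
gain = improvement(0.45, 0.70)
print(round(gain["absolute"], 2), round(gain["relative"], 2))  # 0.25 0.56
```

On these figures the novice gained 25 percentage points, a relative improvement of roughly 56% over the Phase 1 average.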

Relevance

This prototype demonstrates the feasibility of using commodity hardware (a standard webcam) and transfer learning to create affordable, self-paced sign language learning tools. This matters because sign language instruction traditionally requires access to qualified human trainers or deaf educators, which can be a barrier to wider adoption. For accessibility practitioners, the work highlights the often-overlooked need for tools that help hearing people learn to communicate with deaf individuals, not just tools for deaf people to access hearing content. The study is preliminary: it involved only two participants, offers a basic GUI, and is limited to individual letter recognition rather than words or phrases. Planned future work includes word prediction from letter sequences (combining CV with NLP), support for additional sign languages, and larger-scale user studies. Despite its early stage, the project provides a replicable template for developing similar tools for other sign languages and could evolve into a practical learning aid that supports greater inclusion of deaf communities.

Tags: sign language · British Sign Language · computer vision · deep learning · fingerspelling · deaf accessibility · education · assistive technology