A Portable Hong Kong Sign Language Translation Platform with Deep Learning and Jetson Nano

Zhenxing Zhou, Yisiang Neo, King-Shan Lui, Vincent W.L. Tam, Edmund Y. Lam, Ngai Wong · 2020 · Proceedings of the 22nd International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS) · doi:10.1145/3373625.3418046

Summary

This demonstration paper presents a portable platform for translating Hong Kong Sign Language (HKSL) into spoken language using deep learning and edge computing hardware. The system addresses a significant communication gap: Hong Kong has over 155,000 deaf or hard of hearing residents who use HKSL, yet fewer than 100 registered sign language interpreters serve the entire region. HKSL follows linguistic rules distinct from those of Chinese and Cantonese, so hearing residents find it difficult to understand without professional training. The platform consists of two components: a mobile iOS application that captures and preprocesses sign language video, and an NVIDIA Jetson Nano device that runs a pre-trained deep learning model for recognition. The mobile app selects the clearest frames from the recorded video, assembles them into video clips, and transmits the clips over a local WiFi network to the Jetson Nano. The Jetson Nano then processes the clips using a (3+2+1)D ResNet model, a hybrid architecture combining 2D and 3D convolutional neural network layers, and returns the recognized sign word to the mobile app for display. The authors created their own HKSL dataset containing 1,500 videos across 45 common sign words, with at least 30 videos per word recorded from different signers. They chose edge computing over cloud-based processing deliberately, citing network latency, data costs, and the privacy implications of uploading personal sign language videos to remote servers.
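
The frame-selection step on the phone can be illustrated with a small sketch. The summary does not say how the app measures frame clarity, so the variance-of-Laplacian sharpness score used here is an assumption, as are the function names `laplacian_variance` and `select_clearest`; the real app runs on iOS, and Python is used only for illustration.

```python
from statistics import pvariance


def laplacian_variance(frame):
    """Sharpness score for a grayscale frame (list of rows of pixel values).

    ASSUMPTION: variance of a 4-neighbour Laplacian is a common blur measure;
    the paper summary does not state which measure the app actually uses.
    The frame must be at least 3x3 so that interior pixels exist.
    """
    height, width = len(frame), len(frame[0])
    responses = [
        frame[y - 1][x] + frame[y + 1][x] + frame[y][x - 1] + frame[y][x + 1]
        - 4 * frame[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)


def select_clearest(frames, k):
    """Return the indices of the k sharpest frames, kept in temporal order."""
    ranked = sorted(
        range(len(frames)),
        key=lambda i: laplacian_variance(frames[i]),
        reverse=True,
    )
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy data: a uniform (featureless) frame and a high-contrast checkerboard.
    flat = [[10] * 4 for _ in range(4)]
    checker = [[0 if (x + y) % 2 == 0 else 255 for x in range(4)] for y in range(4)]
    print(select_clearest([flat, checker, flat], k=1))
```

Keeping the selected indices in temporal order matters because the recognizer consumes clips, so the motion sequence has to survive the selection.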

Key findings

The platform achieved 93.3% recognition accuracy on a test set of 30 randomly selected sign videos (28 of 30 correct), with an average total translation time of 5.82 seconds from button press to result display. The processing pipeline breaks down as follows: 0.22 seconds for phone-side preprocessing, 0.54 seconds for round-trip network transmission, and 5.06 seconds for recognition on the Jetson Nano. The recognition step therefore accounts for roughly 87% of the total time, reflecting a trade-off between accuracy and speed inherent in using 3D convolution operations. The underlying (3+2+1)D ResNet model achieved 94.6% accuracy on the full HKSL dataset in prior work. The authors note that processing time could be reduced by using more powerful edge hardware, or by personalizing the model to a single signer rather than building a generic recognizer. The system currently operates at word level, recognizing individual sign words rather than continuous sentences.
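
The reported figures can be checked with a few lines of arithmetic. This is only a sanity check on the numbers quoted above; the dictionary keys and variable names are illustrative and do not come from the paper.

```python
# Stage timings in seconds, as reported in the summary above.
STAGE_SECONDS = {
    "phone-side preprocessing": 0.22,
    "round-trip network transmission": 0.54,
    "recognition on Jetson Nano": 5.06,
}


def stage_shares(stages):
    """Return each stage's share of the total time as a percentage."""
    total = sum(stages.values())
    return {name: 100.0 * seconds / total for name, seconds in stages.items()}


total_seconds = sum(STAGE_SECONDS.values())
shares = stage_shares(STAGE_SECONDS)

# 93.3% accuracy on 30 test videos corresponds to 28 correct predictions.
correct_out_of_30 = round(0.933 * 30)

if __name__ == "__main__":
    print(f"total: {total_seconds:.2f} s")
    for name, share in shares.items():
        print(f"{name}: {share:.1f}%")
    print(f"correct predictions: {correct_out_of_30} of 30")
```

The three stages sum to the reported 5.82 seconds, and recognition dominates the total, which is why the authors point to faster edge hardware or a personalized model as the main levers for reducing latency.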

Relevance

This work demonstrates a practical approach to bridging the communication gap between deaf sign language users and the hearing population, moving beyond theoretical recognition research into a deployable system. The edge computing approach is particularly relevant for accessibility applications where privacy matters, because users do not need to upload sensitive video data to the cloud. However, the system has notable limitations: a 45-word vocabulary is extremely small for real-world use, the 5.82-second translation time is too slow for fluid conversation, and the platform currently requires a separate Jetson Nano device. The one-directional translation model (sign-to-spoken language only) also reflects a common bias in sign language technology research that positions hearing people as the audience rather than empowering deaf users. Despite these constraints, the work provides a useful proof of concept for portable, privacy-preserving sign language recognition and highlights the potential for edge AI in accessibility applications.

Tags: sign language recognition · deep learning · edge computing · mobile accessibility · deaf and hard of hearing · assistive technology · Hong Kong Sign Language