Humsher: A Predictive Keyboard Operated by Humming

Ondrej Polacek, Zdenek Mikovec, Adam J. Sporka, Pavel Slavik · 2011 · The Proceedings of the 13th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2011) · doi:10.1145/2049536.2049552

Summary

This paper presents Humsher, a novel text entry system controlled entirely by humming — short melodic and rhythmic vocal gestures distinguished by pitch (high/low) and duration (short/long). The system targets people with severe upper-limb motor impairments such as quadriplegia from spinal cord injury, stroke, or cerebral palsy, who cannot use conventional keyboards but retain the ability to produce vocal sounds. Unlike speech recognition, which struggles with accented speech and speech impairments, humming requires only the ability to produce tones at different pitches, making it language-independent and more robust. Humsher combines these vocal gestures with an adaptive language model based on Prediction by Partial Match (PPM) that learns from the user's input history and offers n-gram predictions — sequences of probable next characters — sorted by probability. The authors designed and compared four user interfaces. Three use dynamic layouts where character predictions change based on context: the Direct interface presents a 4-cell active column with the most probable n-grams selected via six vocal gestures; the Matrix interface arranges n-grams in a 4x4 grid where users select a column then a row; and the List interface presents eight n-grams in a scrollable list navigated with just three gestures (up, down, confirm). The fourth interface, Binary, uses a static alphabetical layout where characters are located via a modified binary search algorithm that splits the alphabet into probability-balanced halves at each step, requiring only three gestures.
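The probability-balanced split behind the Binary interface can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `balanced_split` and `selection_path`, the toy four-letter alphabet, and its probabilities are assumptions made for illustration, and the mapping of each "left"/"right" choice onto a specific humming gesture is not specified in this summary.

```python
from typing import Dict, List, Tuple


def balanced_split(chars: List[str], probs: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split an ordered character list into two contiguous, non-empty parts
    whose total probabilities are as close to equal as possible."""
    total = sum(probs[c] for c in chars)
    best_index, best_gap = 1, float("inf")
    running = 0.0
    for i in range(1, len(chars)):
        running += probs[chars[i - 1]]          # mass of chars[:i]
        gap = abs(running - (total - running))  # imbalance between the two parts
        if gap < best_gap:
            best_index, best_gap = i, gap
    return chars[:best_index], chars[best_index:]


def selection_path(target: str, chars: List[str], probs: Dict[str, float]) -> List[str]:
    """Return the sequence of 'left'/'right' choices that isolates `target`."""
    steps: List[str] = []
    while len(chars) > 1:
        left, right = balanced_split(chars, probs)
        if target in left:
            steps.append("left")
            chars = left
        else:
            steps.append("right")
            chars = right
    return steps


# Hypothetical toy alphabet; in Humsher the probabilities would come from the language model.
alphabet = ["a", "b", "c", "d"]
probabilities = {"a": 0.5, "b": 0.2, "c": 0.2, "d": 0.1}

print(selection_path("a", alphabet, probabilities))
print(selection_path("d", alphabet, probabilities))
```

Because the split follows probability mass rather than position, a likely character such as "a" in this toy example is isolated in fewer steps than an unlikely one such as "d", while the alphabetical order of the layout never changes.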

Key findings

In a study with 17 able-bodied participants over four sessions, the Direct interface was significantly faster than the others, with a mean of 14.4 characters per minute (CPM); the fastest user reached 30 CPM. The List interface achieved 13.0 CPM, Matrix 11.8 CPM, and Binary 11.7 CPM. The Direct interface also required the fewest vocal gestures per character (GPC), at 1.8, followed by Matrix (1.9), Binary (3.4), and List (3.5). However, most novice users preferred the Binary interface for its static, predictable layout and simple gestures, even though it was slower. Expert users preferred the dynamic layouts, where multiple characters could be entered at once. Four case studies with motor-impaired participants validated the system's accessibility: Participant 1 (30-year-old IT specialist, quadriplegic) reached 15-30 CPM across interfaces and found Humsher faster and more responsive than his cell phone; Participant 2 (19-year-old, quadriplegic) reached 21 CPM with the Direct interface; Participant 3 (58-year-old woman, cerebral palsy, declining motor function) reached 8-15 CPM across interfaces and expressed interest in purchasing the system, commenting "This is much better than speech for me"; Participant 4 (51-year-old, quadriplegic, 22 years post-accident) reached 14 CPM with the modified List interface. Notably, the system requires no additional hardware beyond a standard PC with a microphone, and participants reported minimal fatigue even after extended use.
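Since CPM counts characters per minute and GPC counts gestures per character, their product gives a rough estimate of how many gestures per minute each interface demands. The sketch below multiplies the mean values reported above. These gesture rates are not reported in the paper, and a product of means only approximates the true mean rate, so the figures should be read as illustrative. The names `reported_means` and `gestures_per_minute` are assumptions introduced here.

```python
# Mean (CPM, GPC) per interface, as reported in the able-bodied study above.
reported_means = {
    "Direct": (14.4, 1.8),
    "Matrix": (11.8, 1.9),
    "List": (13.0, 3.5),
    "Binary": (11.7, 3.4),
}


def gestures_per_minute(cpm: float, gpc: float) -> float:
    """Rough gesture rate: characters per minute times gestures per character."""
    return cpm * gpc


gesture_rates = {
    name: gestures_per_minute(cpm, gpc) for name, (cpm, gpc) in reported_means.items()
}

# Print interfaces from lowest to highest estimated gesture rate.
for name, rate in sorted(gesture_rates.items(), key=lambda item: item[1]):
    print(f"{name}: about {rate:.1f} gestures per minute")
```

Under this approximation, the List and Binary interfaces call for noticeably more gestures per minute than Direct and Matrix, even though their typing speeds are similar or lower.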

Relevance

Humsher fills an important niche in the assistive technology landscape for people who cannot use their hands but can produce vocal sounds. While other text entry systems for this population exist, such as Dasher (up to 100 CPM but requires eye trackers), the NVVI Keyboard (16 CPM), and CHANTI (15 CPM), Humsher achieves comparable or better speeds (14-22 CPM for disabled users) with no specialised hardware. For accessibility practitioners, the key design insight is that different users benefit from different interfaces: novices and those with limited vocal control prefer the predictable Binary interface, while experienced users prefer the faster dynamic layouts. Because the language model is adaptive, the system improves with use as it learns individual writing patterns. The participant feedback highlights that even modest typing speeds can represent a significant improvement over a user's current methods, and that fatigue management is a critical factor: one participant noted vocal cord tiredness after 40 minutes. The approach could be extended to other non-verbal vocal interactions beyond text entry.

Tags: text entry · alternative input · assistive technology · motor disability · non-verbal vocal interface · predictive keyboard · adaptive language model · quadriplegia · cerebral palsy · humming