Non-speech Input and Speech Recognition for Real-time Control of Computer Games
Adam J. Sporka, Sri H. Kurniawan, Murni Mahmud, Pavel Slavik · 2006 · Proceedings of the 8th International ACM SIGACCESS Conference on Computers and Accessibility (Assets '06) · doi:10.1145/1168987.1169023
Summary
This paper compares two acoustic input methods, speech recognition and non-speech humming, for controlling the arcade game Tetris in real time. The motivation is that turn-based and strategy games remain largely playable for people with motor impairments because they allow time for deliberation, whereas arcade games require rapid, time-critical commands that disadvantage people who cannot use keyboards. Speech recognition and non-speech input differ in two key ways: response delay (speech recognisers must wait for an utterance to finish before processing, while non-speech sounds can be interpreted on the fly) and domain size (speech can trigger a wider range of commands, while non-speech features like pitch, volume, and timbre offer fewer distinct signals). The humming interface maps pitch changes to game commands: a falling tone means "left," a rising tone means "right," sustained flat tones of different lengths map to "turn" and "drop," and silence means "stop." These gestures are detected by a sound analyser and gesture recogniser using pitch autocorrelation, with a minimum pitch change of two semitones required to filter out unintentional fluctuations. The study comprised a qualitative pilot (7 participants) followed by quantitative tests (12 participants) measuring lateral speed and episode accuracy.
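The gesture mapping described above can be illustrated with a minimal sketch. It is not the authors' implementation: it classifies one completed tone from a list of per-frame pitch estimates, whereas the summary says the real recogniser interprets sounds on the fly. The two-semitone threshold comes from the summary. The names `semitones` and `classify`, the frame length, the 0.5-second boundary, and the choice that a short flat tone means "turn" and a long one means "drop" are all assumptions made for illustration.

```python
import math

MIN_CHANGE_SEMITONES = 2.0  # from the summary: smaller pitch changes are ignored
LONG_TONE_SECONDS = 0.5     # assumption: boundary between short and long flat tones


def semitones(f_start, f_end):
    """Signed pitch interval in semitones between two frequencies in Hz."""
    return 12.0 * math.log2(f_end / f_start)


def classify(pitch_track, frame_seconds=0.02):
    """Classify one hummed tone from per-frame pitch estimates in Hz.

    An empty track stands for silence. The frame length is an assumption.
    """
    if not pitch_track:
        return "stop"
    change = semitones(pitch_track[0], pitch_track[-1])
    if change <= -MIN_CHANGE_SEMITONES:
        return "left"   # falling tone
    if change >= MIN_CHANGE_SEMITONES:
        return "right"  # rising tone
    duration = len(pitch_track) * frame_seconds
    # Assumption: short flat tone = turn, long flat tone = drop.
    return "drop" if duration >= LONG_TONE_SECONDS else "turn"


if __name__ == "__main__":
    print(classify([220.0, 200.0, 180.0]))  # falling by more than two semitones
    print(classify([220.0] * 40))           # flat tone lasting 0.8 s
```

Measuring the change in semitones rather than in Hz keeps the threshold independent of the user's vocal range, since a semitone is a fixed frequency ratio.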
Key findings
In the lateral speed test, humming-controlled movement averaged 3.5 cells per second, about 2.5 times the 1.4 cells per second achieved with speech. Speech recognition performance degraded significantly with increasing distance (ANOVA F=32.1, p<0.001), while humming was faster at longer distances and showed no degradation, a finding the authors attribute to the continuous nature of humming versus the repeated discrete commands needed for speech. In the episode accuracy test across three difficulty levels (slow, medium, fast), humming was significantly more accurate than speech at all levels (p<0.05). At the fast level, humming accuracy averaged 0.50 while speech averaged 0.22 (keyboard baseline: 0.90). The accuracy advantage of humming over speech grew more pronounced at higher difficulty levels. The pilot study revealed that users needed visual feedback showing their humming pitch and the recognised gesture, and that they required more training time for humming than for speech. After a pitch profile display and gesture recognition feedback were added, the revised interface was positively received. Participants commented that the game would be entertaining and highly appreciated by people with motor impairments, and that humming provided more precise control than speech.
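The headline comparisons can be checked with simple arithmetic on the averages reported above. The snippet below is only a worked calculation; the variable names are illustrative, and the keyboard-relative figures are derived here rather than reported in the summary.

```python
# Averages reported in the summary.
hum_speed, speech_speed = 3.5, 1.4               # cells per second, lateral speed test
hum_acc, speech_acc, key_acc = 0.50, 0.22, 0.90  # episode accuracy, fast level

speed_ratio = hum_speed / speech_speed     # humming speed relative to speech
acc_ratio = hum_acc / speech_acc           # humming accuracy relative to speech
hum_vs_keyboard = hum_acc / key_acc        # humming as a fraction of the keyboard baseline
speech_vs_keyboard = speech_acc / key_acc  # speech as a fraction of the keyboard baseline

print(f"speed ratio:        {speed_ratio:.2f}")         # 2.50
print(f"accuracy ratio:     {acc_ratio:.2f}")           # 2.27
print(f"humming / keyboard: {hum_vs_keyboard:.2f}")     # 0.56
print(f"speech / keyboard:  {speech_vs_keyboard:.2f}")  # 0.24
```

At the fast level, humming therefore reaches a little over half of the keyboard baseline accuracy, while speech reaches roughly a quarter.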
Relevance
This research makes two important contributions to accessible gaming and voice-based interaction. First, it provides empirical evidence that non-speech acoustic input (humming) outperforms speech recognition for real-time, continuous control tasks — a finding with implications beyond gaming for any interface requiring rapid, precise cursor or object movement. The theoretical advantage of non-speech input (no waiting for utterance completion) is confirmed experimentally and quantified. Second, the paper highlights game accessibility as a legitimate and important area of assistive technology research. Computer games are significant for social participation, entertainment, and quality of life, and excluding people with motor impairments from arcade and action games represents a real accessibility gap. The humming interface is particularly interesting because it requires no specialised hardware (just a standard microphone), produces no recognisable words (preserving privacy in shared spaces), and can be sustained comfortably over extended periods. The finding that visual feedback of the acoustic signal was essential for usability is an important design lesson for any non-speech acoustic interface.
Tags: game accessibility · speech recognition · non-speech input · motor impairment · voice control · real-time interaction · arcade games · alternative input
Standards referenced: WCAG