A system for teaching speech to profoundly deaf children using synthesized acoustic and articulatory patterns
E. Keate, H. Javkin, N. Antonanzas-Barroso, R. Zou · 1994 · Proceedings of the First Annual ACM Conference on Assistive Technologies (Assets '94) · doi:10.1145/191028.191032
Summary
This paper describes a PC-based computer-assisted speech training system for profoundly deaf children. The system integrates a text-to-speech (TTS) synthesizer to generate both acoustic and articulatory models for any typed utterance. It addresses a fundamental limitation of existing speech training tools: they depend on a teacher to provide production models, which restricts practice to scheduled sessions and pre-stored utterances. Because the STLTalk TTS system is built in, children can type any word or sentence and receive synthesized speech together with visual models of the correct articulatory patterns. This lets them practice independently with an unlimited range of utterances. The system extracts training parameters, including tongue-palate contact patterns, nasalization, pitch, voicing, amplitude, frication, and spectral shape, from several sensor inputs: a dynamic palatograph (an artificial palate with electrodes that detects tongue contact), a nasal accelerometer, a neck-mounted voicing sensor, an airflow meter, and a headset microphone. The paper explains why neither a purely acoustic nor a purely articulatory approach is sufficient on its own. Acoustic feedback can lead deaf speakers toward incorrect articulations that happen to sound similar, while articulatory patterns vary with individual physiology and cannot capture every relevant vocal tract configuration.
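The following rough illustration shows how some microphone-derived parameters could be computed for one frame of samples. It estimates amplitude as an RMS level in dB, gives a crude frication cue as the zero-crossing rate, and estimates pitch from an autocorrelation peak. These are standard textbook measures chosen only for illustration. The summary does not say which algorithms the system actually used, and the system also had dedicated sensors for voicing and nasalization. The function names, the 75 to 500 Hz pitch range, and the 0.5 voicing threshold are all assumptions.

```python
import math
import random
from typing import Optional, Sequence

def amplitude_db(frame: Sequence[float], floor: float = 1e-10) -> float:
    """RMS level of one frame in dB relative to a full-scale value of 1.0."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20.0 * math.log10(max(rms, floor))

def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs that change sign.

    Noisy, fricative-like frames cross zero far more often than voiced ones,
    so this serves as a crude frication cue (an illustrative assumption).
    """
    if len(frame) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0))
    return changes / (len(frame) - 1)

def autocorr(frame: Sequence[float], lag: int) -> float:
    """Unnormalized autocorrelation of the frame at one lag."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))

def pitch_hz(frame: Sequence[float], sample_rate: int,
             fmin: float = 75.0, fmax: float = 500.0,
             voicing_threshold: float = 0.5) -> Optional[float]:
    """Autocorrelation pitch estimate, or None if the frame looks unvoiced.

    The search range (fmin..fmax) and the threshold are assumed values.
    """
    energy = autocorr(frame, 0)
    if energy == 0.0:
        return None
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    if max_lag < min_lag:
        return None  # frame too short to cover the pitch range
    # Pick the lag with the largest autocorrelation inside the pitch range.
    best_lag = max(range(min_lag, max_lag + 1), key=lambda k: autocorr(frame, k))
    if autocorr(frame, best_lag) / energy < voicing_threshold:
        return None  # weak periodicity: treat the frame as unvoiced
    return sample_rate / best_lag

if __name__ == "__main__":
    sr = 8000
    voiced = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(400)]
    rng = random.Random(0)
    hiss = [rng.uniform(-0.3, 0.3) for _ in range(400)]  # fricative-like noise
    print(pitch_hz(voiced, sr), pitch_hz(hiss, sr))
    print(zero_crossing_rate(voiced), zero_crossing_rate(hiss))
```

A real-time display would apply these functions to successive short frames from the headset microphone and map each value onto a game graphic.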
Key findings
The system's key innovation is synthesizing tongue-palate contact patterns from TTS output. Timing information from the synthesizer (onset, maximum amplitude, decay, and offset) is coordinated with stored palatographic images for each sound in context. Children can then compare these visual articulatory targets against their own real-time palatographic data. The system includes thirteen training programs, presented as motivational video games for children as young as 3 years old. They include palatographic display games; a multi-parameter program that shows pitch through the eyebrows, amplitude through mouth size, nasalization through nose size, and voicing through expansion of the Adam's apple; and pitch-contour and airflow programs. A significant finding is that the optimal contact pattern depends on each individual's palate and teeth configuration. The system addresses this by storing good productions from each student as personalized models. Earlier testing with deaf children learning Japanese had shown computer-based training to be effective for individual phonemes. The authors expected the synthesis-based system to offer additional benefits for connected speech.
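The timing-driven synthesis of contact patterns can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions:

- The synthesizer reports onset, maximum-amplitude, decay, and offset times in milliseconds for each sound.
- Stored images are keyed by a hypothetical (phone, phase) pair, with phases named "closing", "hold", and "release".
- Jaccard overlap serves as a stand-in score for comparing the child's real-time frame with the target.

A student's stored good production, when one exists, replaces the default image, mirroring the personalized models described above. All names (`Segment`, `target_frames`, `agreement`, and so on) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple

# One palatographic frame: the set of (row, col) electrodes the tongue touches.
Pattern = FrozenSet[Tuple[int, int]]
EMPTY: Pattern = frozenset()

# Hypothetical store of images keyed by (phone, phase).
Models = Dict[Tuple[str, str], Pattern]

@dataclass(frozen=True)
class Segment:
    """Assumed per-sound timing landmarks (ms) reported by the synthesizer."""
    phone: str
    onset: float
    peak: float    # time of maximum amplitude
    decay: float   # start of the decay
    offset: float

def phase_at(seg: Segment, t: float) -> Optional[str]:
    """Assumed mapping from the four landmarks to articulatory phases."""
    if seg.onset <= t < seg.peak:
        return "closing"
    if seg.peak <= t < seg.decay:
        return "hold"
    if seg.decay <= t < seg.offset:
        return "release"
    return None

def lookup(key: Tuple[str, str], student: Models, default: Models) -> Pattern:
    """Prefer the student's own stored good production over the default image."""
    return student.get(key, default.get(key, EMPTY))

def target_frames(segments: List[Segment], student: Models, default: Models,
                  step_ms: float = 10.0) -> List[Pattern]:
    """Sample the target contact pattern every step_ms across the utterance."""
    if not segments:
        return []
    start = min(s.onset for s in segments)
    end = max(s.offset for s in segments)
    frames: List[Pattern] = []
    i = 0
    while start + i * step_ms < end:
        t = start + i * step_ms
        contact = EMPTY
        for seg in segments:
            phase = phase_at(seg, t)
            if phase is not None:
                # Overlapping sounds: take the union of their contacts (assumption).
                contact = contact | lookup((seg.phone, phase), student, default)
        frames.append(contact)
        i += 1
    return frames

def agreement(target: Pattern, produced: Pattern) -> float:
    """Jaccard overlap of target and produced contacts (1.0 means identical)."""
    union = target | produced
    return 1.0 if not union else len(target & produced) / len(union)

if __name__ == "__main__":
    sides = frozenset({(r, 0) for r in range(4)} | {(r, 7) for r in range(4)})
    closure = sides | frozenset((0, c) for c in range(8))  # illustrative stop closure
    default = {("t", "closing"): sides, ("t", "hold"): closure, ("t", "release"): sides}
    frames = target_frames([Segment("t", 0.0, 20.0, 60.0, 80.0)], {}, default)
    child_frame = sides  # e.g., the child never completes the front closure
    print([round(agreement(f, child_frame), 2) for f in frames])
```

Taking the union of contacts when sounds overlap is only one simple way to handle coarticulation. The paper stores images for each sound in context, so a fuller version would also key the lookup on neighboring sounds.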
Relevance
This paper represents early work in multimodal, computer-assisted speech training for deaf children. It combines visual representations of both acoustic and articulatory information with gamification to sustain motivation in young learners. Using TTS to generate unlimited training utterances was a significant advance over systems limited to pre-recorded teacher models. The core insight, that effective speech training requires coordinated acoustic and articulatory feedback personalized to the individual learner, remains relevant to modern speech therapy technologies. The gamification approach, with age-appropriate visual metaphors for abstract speech parameters, anticipates current trends in therapeutic gaming and app-based speech therapy tools. The work also highlights a persistent challenge in accessibility technology: the gap between what hearing children learn incidentally through constant acoustic exposure and the focused but limited training time available to deaf children.
Tags: deaf education · speech training · text-to-speech · palatography · visual feedback · acoustic analysis · articulation · speech synthesis · deaf children · computer-aided instruction · formant analysis