Toward the Use of Speech and Natural Language Technology in Intervention for a Language-Disordered Population
Jill Fain Lehman · 1998 · Proceedings of the Third International ACM Conference on Assistive Technologies (Assets '98) · doi:10.1145/274497.274503
Summary
This paper from Carnegie Mellon University describes the design of Simone Says, an interactive software environment for language remediation in children with Autism Spectrum Disorder (ASD). The system brings together speech recognition, natural language processing, and computer-aided instruction to create an intervention tool that uses speech as the primary input modality — a significant departure from existing language intervention software that relied on mouse- or keyboard-based interaction. The design is grounded in the clinical reality that children with ASD typically have characteristic delays in lexical, syntactic, and semantic language development, with particular difficulty in pragmatic aspects of language — understanding when, how, and why language is used to achieve communicative goals. The basic interactive loop consists of presenting visually engaging animated stimuli, using speech recognition to accept the child's spoken response, employing natural language understanding to assess whether the utterance is referentially meaningful, and rewarding appropriate responses with consequent animations. The system progressively challenges children along four developmental dimensions: vocabulary, syntax, semantics, and pragmatics. For example, a child might begin by simply labelling objects ("apple"), progress to plurals ("apples"), then noun-verb pairs ("eat apple"), and eventually to pragmatically complex utterances requiring theory of mind ("Eat the apple!" directed at an animated character). The system uses the CMU Sphinx-II speech recognition engine, with a language model trained to handle the specific speech patterns of children with ASD.
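The interactive loop and the staged progression described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `LEVELS`, `normalise`, `is_meaningful`, and `run_session`, and the rule of advancing after two consecutive successes, are hypothetical. Speech recognition is stubbed out by passing in already-transcribed utterances, and exact matching against a target set stands in for the natural language understanding step.

```python
# Hypothetical progression, taken from the "apple" example in the summary.
# Each level pairs a label with the utterances accepted at that stage.
LEVELS = [
    ("label", {"apple"}),
    ("plural", {"apples"}),
    ("noun-verb pair", {"eat apple"}),
    ("directive", {"eat the apple"}),
]


def normalise(utterance: str) -> str:
    """Lower-case and drop punctuation so "Eat the apple!" matches its target."""
    kept = "".join(ch for ch in utterance.lower() if ch.isalpha() or ch.isspace())
    return " ".join(kept.split())


def is_meaningful(utterance: str, level: int) -> bool:
    """Stand-in for the NLU check: is the utterance a target for this level?"""
    return normalise(utterance) in LEVELS[level][1]


def run_session(utterances, successes_to_advance=2):
    """Walk through transcribed utterances, rewarding hits and prompting on misses.

    Returns the final level index and a log of (level name, utterance, outcome).
    The advancement rule is an assumption made for this sketch.
    """
    level, streak, log = 0, 0, []
    for heard in utterances:
        name = LEVELS[level][0]
        if is_meaningful(heard, level):
            streak += 1
            log.append((name, heard, "reward"))  # would trigger the consequent animation
            if streak >= successes_to_advance and level < len(LEVELS) - 1:
                level, streak = level + 1, 0
        else:
            streak = 0
            log.append((name, heard, "prompt"))  # would re-present the stimulus with a cue
    return level, log
```

Calling `run_session(["apple", "apple", "apples"])` moves the child from the labelling level to the plural level after two correct labels, and the log records which utterances earned a reward and which led to a prompt.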
Key findings
The paper identifies three critical design principles for technology-based language intervention: every interaction must be rewarding, so that the game itself reinforces engagement; children must be active participants rather than passive observers; and the system must balance realism with fun. The progressive scaffolding approach, which reuses the same visual stimuli across increasingly complex linguistic contexts, allows a single scene to teach vocabulary, morphology, syntax, and pragmatics in sequence. The theory-of-mind modelling component is particularly innovative: thought bubbles above animated characters make explicit the connection between mental state and communication, teaching children that language expresses intentions to others. The natural language understanding component uses CHAMP, an adaptive parsing system that learns user-specific grammar through interaction, allowing the system to model each child's developing linguistic competence. The system handles imperfect speech recognition by compensating for mispronunciations at the NLU stage rather than requiring perfect acoustic recognition. The author planned a longitudinal Wizard-of-Oz evaluation to establish whether the technology could support the intervention before full deployment, with measures including acquisition of appropriate responses, decreased reliance on prompts, and generalisation of language across contexts.
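One way to picture compensation at the NLU stage is to repair each recognised token against the small vocabulary that is relevant to the current scene. The sketch below is an assumption-laden illustration, not CHAMP and not the paper's method: it swaps in plain string similarity from Python's standard `difflib` module, assumes the recogniser emits approximate spellings of what it heard, and uses a hypothetical `SCENE_VOCABULARY`, `repair_token`, `repair_utterance`, and cutoff of 0.7.

```python
import difflib

# Hypothetical vocabulary for a single scene; a real system would derive
# this from the stimuli currently on screen.
SCENE_VOCABULARY = ["apple", "apples", "eat", "the", "banana"]


def repair_token(token, vocabulary=SCENE_VOCABULARY, cutoff=0.7):
    """Return the closest in-scene word for a token, or None if nothing is close.

    String similarity is a stand-in for whatever acoustic or grammatical
    evidence a real NLU component would use.
    """
    matches = difflib.get_close_matches(token.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def repair_utterance(hypothesis):
    """Repair each token of a recogniser hypothesis, dropping unrecoverable ones."""
    repaired = (repair_token(token) for token in hypothesis.split())
    return [word for word in repaired if word is not None]
```

With this sketch, a hypothesis such as `"eat the aple"` is repaired to the in-scene words `eat`, `the`, and `apple`, while a token with no close match is dropped rather than forced onto a target. The cutoff controls how forgiving the repair is: a lower value accepts more distorted forms but also raises the risk of rewarding an unrelated word.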
Relevance
This paper represents an early and sophisticated attempt to apply speech and NLP technology to language intervention for children with autism — a research direction that has become increasingly relevant as speech recognition and natural language processing have matured dramatically. The core design principles remain directly applicable to modern language intervention apps and AAC tools: making interaction inherently rewarding, treating the child as an active communicator rather than a passive recipient, and progressively scaffolding linguistic complexity. The approach of modelling each child's developing grammar through adaptive parsing anticipates modern personalised learning systems that adjust to individual users. For accessibility practitioners, the paper highlights that speech technology in assistive contexts must handle non-standard speech patterns — children with ASD may have atypical pronunciation, prosody, and language structure — a challenge that remains relevant for inclusive speech recognition design. The theory-of-mind scaffolding through visual thought bubbles offers a design pattern that could be adapted for modern social communication apps targeting pragmatic language development.
Tags: autism spectrum disorder · language remediation · speech recognition · natural language processing · computer-aided instruction · language development · pragmatic language · theory of mind · assistive technology
Standards referenced: DSM-IV