Wizard-of-Oz Test of ARTUR: a Computer-Based Speech Training System with Articulation Correction
Olle Bälter, Olov Engwall, Anne-Marie Öster, Hedvig Kjellström · 2005 · Proceedings of the 7th International ACM SIGACCESS Conference on Computers and Accessibility (Assets '05) · doi:10.1145/1090785.1090795
Summary
This paper from KTH Royal Institute of Technology in Stockholm presents a Wizard-of-Oz evaluation of ARTUR (the ARticulation TUtoR), a computer-based speech training system designed to help children with language disorders improve their articulation. ARTUR's distinguishing feature is that it gives specific corrective feedback on how to improve a pronunciation, not just on whether the pronunciation was correct. The system uses a 3D animated talking head that can display internal articulatory features (tongue position, palate, and jaw) that are normally hidden during speech. This is significant because many articulatory and acoustic features of speech are not easily accessible from visual observation alone: acoustically each speech sound is unique, but visually many sounds are difficult to discriminate. The system pipeline comprises audio-visual detection of mispronounced speech, marker-less tracking of facial features from video, articulatory inversion (recovering vocal tract shape from the speech signal, using MRI and Electromagnetic Articulography data), speaker model adaptation, and feedback display through the animated talking head. In the Wizard-of-Oz setup, a phonetically trained human wizard stood in for the automatic mispronunciation detection and articulatory inversion components, selecting from ten pre-generated feedback options based on tongue height and position, plus three encouragement utterances.
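The pipeline and the wizard's role in it can be laid out as a small data structure. This is only an illustrative sketch: the `Stage` class, the `wizard` flag, and the `wizard_stages` helper are hypothetical names introduced here, and a `False` flag means only that this summary does not describe the stage as replaced by the wizard, not that the stage ran automatically during the test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    # True if the summary says a human wizard stood in for this stage.
    # False means "not stated as replaced", which is an assumption of this sketch.
    wizard: bool


# Stage names follow the pipeline as listed in the summary above.
PIPELINE = [
    Stage("audio-visual mispronunciation detection", wizard=True),
    Stage("marker-less facial feature tracking", wizard=False),
    Stage("articulatory inversion", wizard=True),
    Stage("speaker model adaptation", wizard=False),
    Stage("feedback display via talking head", wizard=False),
]


def wizard_stages(pipeline):
    """Return the names of the stages handled by the human wizard."""
    return [stage.name for stage in pipeline if stage.wizard]


if __name__ == "__main__":
    for name in wizard_stages(PIPELINE):
        print("wizard replaces:", name)
```

Laying the stages out this way makes the point of the Wizard-of-Oz design visible: the two hardest recognition steps are simulated by a person, so the feedback interface can be tested before those components exist.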
Key findings
Six children participated in the study: three older children (ages 9-14) with extensive CBST experience and ICD-10 F80.1 ABC classifications (expressive and impressive language disorders, largely reduced by the time of the study), and three younger children (age 6) with ICD-10 F80.2B (general language disorder) and limited CBST experience. All older children were very positive about ARTUR, describing it as 'very intelligent and good,' and they valued the corrective feedback on pronunciation most. One child rated it 'twice as good as SpeechViewer and Box-of-Tricks.' All three older children followed ARTUR's instructions without any prior training. The younger children also responded positively but exposed a design problem: since they could not read, the text-only button labels were unusable, and they needed the therapist's help. The study identified several key design recommendations: feedback should be based on the current and preceding words rather than given after a specific error; the system needs a confidence score rather than a deterministic feedback matrix to handle classification uncertainty; game-like features would increase engagement for younger children; the hard palate drawing needs improvement, as children had difficulty interpreting it; and the animation speed of articulatory feedback needs adjustment to separate the practiced articulation from the rest of the word.
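The recommendation to use a confidence score instead of a deterministic feedback matrix can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the paper's actual feedback options and any scoring method are not given in this summary, so the `FEEDBACK` table, the message texts, the `Diagnosis` class, and the 0.7 threshold are all hypothetical. The idea shown is one possible reading: give a specific correction only when the classifier is reasonably sure, and fall back to neutral encouragement otherwise.

```python
from dataclasses import dataclass

# Hypothetical feedback table keyed by (tongue height, tongue position).
# The messages are placeholders, not the study's actual feedback options.
FEEDBACK = {
    ("too_low", "ok"): "Raise your tongue.",
    ("too_high", "ok"): "Lower your tongue.",
    ("ok", "too_far_back"): "Move your tongue forward.",
    ("ok", "too_far_forward"): "Move your tongue back.",
}
ENCOURAGEMENT = "Good, try once more."  # placeholder encouragement utterance


@dataclass
class Diagnosis:
    height: str        # e.g. "too_low", "too_high", "ok"
    position: str      # e.g. "too_far_back", "too_far_forward", "ok"
    confidence: float  # classifier confidence in [0.0, 1.0]


def select_feedback(d: Diagnosis, threshold: float = 0.7) -> str:
    """Return a specific correction only when confidence meets the threshold."""
    if d.confidence < threshold:
        # Uncertain classification: avoid a possibly wrong correction.
        return ENCOURAGEMENT
    # Confident classification: look up the correction, with a safe default.
    return FEEDBACK.get((d.height, d.position), ENCOURAGEMENT)


if __name__ == "__main__":
    print(select_feedback(Diagnosis("too_low", "ok", 0.9)))
    print(select_feedback(Diagnosis("too_low", "ok", 0.4)))
```

A pure matrix lookup would return "Raise your tongue." in both calls; with the threshold, the low-confidence case yields encouragement instead, so the child is not told to fix an error that may not have occurred.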
Relevance
This study contributes to accessible speech training technology by demonstrating that children with language disorders can use a sophisticated articulatory feedback system without prior training, a critical usability threshold for assistive technology aimed at children. The use of a 3D talking head with visible internal articulators addresses a genuine gap in speech therapy: children who are born with severe auditory deficits have limited acoustic speech targets to imitate, and visual articulatory feedback through a talking head offers an alternative sensory channel. For accessibility practitioners, the study highlights important design considerations for child-facing assistive technology: the need for non-text interfaces for pre-literate users, the value of game-like elements for engagement, and the importance of graduated feedback that handles uncertainty gracefully rather than presenting potentially incorrect binary classifications. The Wizard-of-Oz methodology itself is instructive: by testing the interface before building the full automatic system, the researchers identified critical design issues early and avoided costly development of features that would not work for the target users.
Tags: speech technology · speech training · articulation · language disorder · child development · assistive technology · Wizard-of-Oz · computer vision · talking head · speech and language therapy
Standards referenced: ICD-10