Automatic Assessment of Speech Capability Loss in Disordered Speech

Thomas Pellegrini, Lionel Fontan, Julie Mauclair, Jérôme Farinas, Charlotte Alazard-Guiu, Marina Robert, Peggy Gatignol · 2015 · ACM Transactions on Accessible Computing · doi:10.1145/2739051

Summary

This paper investigates whether the Goodness of Pronunciation (GOP) algorithm, originally developed for computer-assisted language learning to detect mispronunciations by non-native speakers, can be repurposed to assess speech capability loss in people with speech disorders. The researchers conducted two experiments: a primary study with 32 French speakers with unilateral facial palsy (UFP) at varying clinical severity grades, and a secondary exploratory study with six speakers who had speech impairments from cancer surgery or neurological disorders. The GOP algorithm compares the outputs of two automatic speech recognition passes: a "free" recognition, in which the system determines the most likely phone sequence, and a "forced" alignment constrained to the expected pronunciation. The difference between the two log-likelihoods, normalized by duration, gives a GOP score; higher values indicate greater deviation from standard pronunciation. The key insight is that GOP relies solely on native phone models and needs no model of the non-standard speech itself, so it should generalize to any atypical speech production. For the UFP experiment, speakers were graded using the House-Brackmann (H&B) scale, which assesses facial muscle mobility but does not directly measure speech production. UFP particularly affects bilabial consonants (/p/, /b/, /m/) and sibilants (/s/) because it impairs lip control and airflow management.
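The GOP scoring step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `PhoneSegment` type, its field names, and the use of a frame count as the duration normalizer are assumptions made for the example, and the log-likelihoods are assumed to come from an external recognizer that has already run both passes.

```python
from dataclasses import dataclass


@dataclass
class PhoneSegment:
    """One expected phone with the scores from both recognition passes.

    All field names are hypothetical; a real system would fill them from
    its own forced-alignment and free phone-recognition outputs.
    """
    phone: str            # expected phone label, e.g. "p"
    forced_loglik: float  # log-likelihood under forced alignment
    free_loglik: float    # log-likelihood of the free recognition over the same frames
    n_frames: int         # segment duration in frames (assumed normalizer)


def gop_score(seg: PhoneSegment) -> float:
    """Absolute log-likelihood difference, normalized by duration."""
    if seg.n_frames <= 0:
        raise ValueError("segment must span at least one frame")
    return abs(seg.forced_loglik - seg.free_loglik) / seg.n_frames


def mean_gop(segments: list[PhoneSegment]) -> float:
    """Average GOP over a speaker's segments (higher = more deviation)."""
    if not segments:
        raise ValueError("no segments to average")
    return sum(gop_score(s) for s in segments) / len(segments)


# Toy values, invented for illustration only.
segments = [
    PhoneSegment("p", forced_loglik=-120.0, free_loglik=-100.0, n_frames=10),
    PhoneSegment("a", forced_loglik=-80.0, free_loglik=-80.0, n_frames=8),
]
print([gop_score(s) for s in segments], mean_gop(segments))
```

A score of zero means the free pass found nothing more likely than the expected phone; the larger the gap per frame, the less the audio resembles the expected phone model.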

Key findings

In the UFP study, manual transcription of the most severely impaired speakers (grades V-VI) revealed that 8.3% of phones differed from standard pronunciations, primarily substitutions of bilabial /p/ with labiodental /f/ (a compensation strategy). The GOP algorithm detected 70.2% of these mispronunciations when operating at equal false rejection and false acceptance rates of approximately 30%. When thresholds were set to limit false rejections to 10% on control speech, correct acceptance reached 84.6% with correct rejection at 49.6%. Average GOP scores increased with clinical severity: the control group (G1) averaged 1.62, while the most impaired group (G4) averaged 2.24. However, the relationship between GOP scores and H&B grades was not strictly monotonic: speakers rated at grade III sometimes had better (lower) pronunciation scores than grade II speakers, suggesting that GOP captures different information than the physiological H&B assessment. In the second experiment with diverse speech disorders, GOP scores showed strong correlations with both objective measures (reaction times to oral commands, r = 0.786) and subjective comprehensibility ratings from speech pathologists (r = -0.684). Higher GOP scores corresponded to longer listener reaction times and lower comprehensibility judgments.
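The two operating points mentioned above (equal error rates, and a cap on false rejections) both come down to choosing a GOP threshold. The sketch below shows one way to pick such thresholds from labeled scores. It assumes a single global threshold and the usual convention that a phone is rejected when its GOP score exceeds the threshold; the function names and toy scores are invented for illustration, and the paper's own procedure may differ in detail.

```python
def error_rates(correct, mispronounced, threshold):
    """Return (false_rejection_rate, false_acceptance_rate).

    A phone is rejected when its GOP score is above the threshold.
    False rejection: a correctly pronounced phone is rejected.
    False acceptance: a mispronounced phone is accepted.
    """
    fr = sum(s > threshold for s in correct) / len(correct)
    fa = sum(s <= threshold for s in mispronounced) / len(mispronounced)
    return fr, fa


def equal_error_threshold(correct, mispronounced):
    """Candidate threshold where the two error rates are closest."""
    candidates = sorted(set(correct) | set(mispronounced))

    def gap(t):
        fr, fa = error_rates(correct, mispronounced, t)
        return abs(fr - fa)

    return min(candidates, key=gap)


def threshold_for_max_false_rejection(correct, max_fr=0.10):
    """Smallest observed score that keeps false rejections at or below max_fr."""
    for t in sorted(set(correct)):
        fr = sum(s > t for s in correct) / len(correct)
        if fr <= max_fr:
            return t
    raise ValueError("no threshold satisfies the false-rejection limit")


# Toy GOP scores, invented for illustration only.
correct_scores = [0.5, 1.0, 1.2, 1.5, 1.6, 1.8, 2.0, 2.1, 2.5, 3.0]
mispronounced_scores = [1.4, 2.2, 2.6, 3.1, 3.5]

t_eer = equal_error_threshold(correct_scores, mispronounced_scores)
t_fr10 = threshold_for_max_false_rejection(correct_scores, max_fr=0.10)
print(t_eer, error_rates(correct_scores, mispronounced_scores, t_eer))
print(t_fr10, error_rates(correct_scores, mispronounced_scores, t_fr10))
```

Raising the threshold to protect correctly pronounced phones lets more mispronunciations through, which is the trade-off between the two operating points reported above.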

Relevance

This research demonstrates that automatic pronunciation assessment tools can provide objective, scalable measures of speech capability loss that complement traditional clinical evaluations. For accessibility practitioners, this has several implications. First, GOP-based assessment could enable remote monitoring of speech rehabilitation progress without requiring in-person clinical visits, which is particularly valuable for people with mobility limitations or in underserved areas. The technique requires only standard speech recordings and automatic processing, making it suitable for telehealth applications. Second, the finding that GOP captures different information than physiological assessments (like the H&B scale) suggests that communication-focused measures should be used alongside traditional clinical scales. A person may have significant facial muscle impairment but develop compensatory strategies that maintain comprehensibility, or vice versa. For speech-based assistive technology developers, these findings indicate that pronunciation quality metrics could help predict ASR accuracy for individual users, inform when to suggest alternative communication methods, or guide personalized speech recognition adaptation. The strong correlation with comprehensibility also suggests GOP could help identify when speech has degraded to the point where augmentative communication support should be considered.

Tags: disordered speech · speech assessment · automatic speech recognition · facial palsy · pronunciation assessment · speech pathology · rehabilitation