Intelligibility Assessment and Speech Recognizer Word Accuracy Rate Prediction for Dysarthric Speakers in a Factor Analysis Subspace

David Martínez, Phil Green, Heidi Christensen · 2015 · ACM Transactions on Accessible Computing · doi:10.1145/2746405

Summary

This paper presents a novel approach to assessing speech intelligibility and predicting automatic speech recognition (ASR) accuracy for speakers with dysarthria using iVectors, a technique from speaker verification research. The authors address a critical challenge in assistive technology: dysarthric speech varies enormously between individuals, making it difficult to develop universal systems that work well across all users. The research uses the UAspeech database, which contains recordings from 15 speakers with cerebral palsy whose intelligibility ranges from 2% to 95% as rated by naive listeners. The core innovation is applying factor analysis to compress the acoustic information in an entire utterance into a single 400-dimensional iVector. These compact representations capture speaker-specific characteristics that correlate with both human intelligibility judgments and ASR performance. The study explores two experimental conditions: one where some data from the target speaker is available during training (simulating an enrollment process), and one that is completely speaker-independent. The methodology involves training a Universal Background Model (UBM) on healthy speech, then using factor analysis to estimate a total variability matrix that projects speech features into a low-dimensional subspace. Support vector regression then maps each iVector to an intelligibility or word accuracy prediction.
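
For orientation, the total variability model that underlies iVector extraction is usually written as shown below. This is the standard formulation from the speaker verification literature; the symbols M, m, T, and w are conventional notation assumed here, and the paper's exact formulation may differ.

```latex
M = m + T w, \qquad w \sim \mathcal{N}(0, I)
```

Here M is the utterance-dependent GMM mean supervector, m is the UBM mean supervector, T is the low-rank total variability matrix, and the iVector is the posterior mean of the latent factor w given the utterance (400-dimensional in this study).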

Key findings

The results demonstrate that iVectors are highly effective for both intelligibility assessment and ASR accuracy prediction, particularly when speaker-specific data is available. With user enrollment data, the system achieved correlations of r=0.91 for intelligibility prediction and r=0.89 for ASR accuracy prediction. In speaker-independent scenarios, correlations dropped to r=0.74 for intelligibility and r=0.55 for ASR accuracy. Binary classification into high and low intelligibility groups achieved 80% precision and recall in speaker-independent conditions. Importantly, iVectors consistently outperformed simpler acoustic features (PLP means) and more complex supervector representations when no speaker-specific data was available. The research revealed that predicting ASR word accuracy is more challenging than predicting intelligibility, likely because ASR errors depend on specific acoustic-phonetic patterns that vary across speakers. The authors also found that intelligibility ratings from unfamiliar listeners correlate better with ASR performance than ratings from familiar listeners, suggesting naive listener judgments better reflect the challenges faced by automated systems.
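
The figures above are Pearson correlations and precision/recall values. As a minimal sketch of how such metrics are computed, the following uses made-up scores for five hypothetical speakers; the data, the function names, and the 50% threshold for splitting high from low intelligibility are all assumptions for illustration and are not taken from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def precision_recall(true_labels, pred_labels, positive="high"):
    """Precision and recall for one class, treated as the positive class."""
    pairs = list(zip(true_labels, pred_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def to_class(score, threshold=50.0):
    """Binarize a percentage score; the 50% threshold is an assumption."""
    return "high" if score >= threshold else "low"


# Illustrative (made-up) intelligibility percentages for five speakers.
rated = [5.0, 20.0, 45.0, 70.0, 95.0]       # listener ratings
predicted = [15.0, 10.0, 55.0, 60.0, 90.0]  # regressor outputs

r = pearson_r(rated, predicted)
true_cls = [to_class(s) for s in rated]
pred_cls = [to_class(s) for s in predicted]
precision, recall = precision_recall(true_cls, pred_cls)

print(f"r = {r:.2f}, precision = {precision:.2f}, recall = {recall:.2f}")
```

Correlation measures how well the predicted scores track the ratings across speakers, while precision and recall only reflect which side of the threshold each speaker falls on, so the two kinds of metric can diverge for the same predictions.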

Relevance

This research has direct implications for designing adaptive AAC systems and speech-based assistive technologies. The finding that even brief enrollment data dramatically improves prediction accuracy suggests that systems should incorporate user-specific calibration rather than relying solely on universal models. For practitioners developing speech interfaces for people with dysarthria, this work provides a framework for automatically assessing whether ASR will work reliably for a given user before deployment. The iVector approach could enable "intelligibility screening" that helps users and clinicians set appropriate expectations for speech technology performance. The speaker-independent results, while less accurate, demonstrate that useful predictions can be made without extensive user data collection, which is important for reducing barriers to adoption. Future AAC systems might use similar techniques to switch dynamically between speech-based and alternative input methods based on predicted recognition accuracy.

Tags: dysarthric speech · speech recognition · intelligibility assessment · iVectors · factor analysis · cerebral palsy · AAC