Cognitively Motivated Features for Readability Assessment

Lijun Feng, Noémie Elhadad, Matt Huenerfauth · 2009 · Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2009) · doi:10.5555/1609067.1609092

Summary

Feng, Elhadad, and Huenerfauth (2009) develop and evaluate an automatic readability-assessment tool targeted at adults with intellectual disabilities (ID), specifically people in the 'mild' category (IQ 55-70), who comprise roughly 3% of the U.S. population. Rather than reusing readability formulas designed for children, the authors ground their feature design in the cognitive literacy profile of this user group: adults with mild ID decode words reasonably well but struggle with comprehension because of limitations in working memory and in building cohesive discourse representations. They hypothesise that 'entity density', the number of distinct entities a reader must track across a text, is therefore the critical predictor of readability for this audience.

To test this, they assembled a four-part corpus: Britannica and LiteracyNet articles in paired complex/simplified forms, the Weekly Reader corpus graded for school grades 2-5, and a new 'LocalNews' corpus of 20 local-news articles rated for comprehension by 14 adults with mild ID in a reading study where each article was presented on screen and read aloud via text-to-speech. The authors implement 28 features spanning shallow metrics (Flesch-Kincaid, FOG), parse-tree features (noun phrases, verb phrases, and SBARs from the Charniak parser), and novel discourse-level features based on entity mentions, unique entities, and lexical-chain statistics. They use paired t-tests on the Britannica and LiteracyNet corpora to identify statistically significant features, then combine those features in linear regression models trained on Weekly Reader.
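
To make the entity-density idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the text has already been split into sentences and part-of-speech tagged, and it treats every noun token (Penn Treebank tags NN, NNS, NNP, NNPS) as an entity mention, which is a simplifying assumption; the summary above does not specify how the paper extracts entities. The names `entity_density`, `EntityDensity`, and `NOUN_TAGS` are hypothetical.

```python
from dataclasses import dataclass

# Assumption: noun tags stand in for "entity mentions". A real system
# would likely also use a named-entity recogniser.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


@dataclass
class EntityDensity:
    mentions: int                 # total entity mentions in the document
    unique_entities: int          # distinct entities (case-insensitive)
    mentions_per_sentence: float  # mentions averaged over sentences
    unique_per_sentence: float    # unique entities averaged over sentences


def entity_density(tagged_sentences):
    """Compute entity counts from a list of sentences.

    Each sentence is a list of (token, pos_tag) pairs.
    """
    mentions = [
        token.lower()
        for sentence in tagged_sentences
        for token, tag in sentence
        if tag in NOUN_TAGS
    ]
    unique = set(mentions)
    n_sentences = max(len(tagged_sentences), 1)  # avoid division by zero
    return EntityDensity(
        mentions=len(mentions),
        unique_entities=len(unique),
        mentions_per_sentence=len(mentions) / n_sentences,
        unique_per_sentence=len(unique) / n_sentences,
    )


# Invented two-sentence document, for illustration only.
doc = [
    [("The", "DT"), ("mayor", "NN"), ("visited", "VBD"),
     ("the", "DT"), ("school", "NN"), (".", ".")],
    [("The", "DT"), ("mayor", "NN"), ("met", "VBD"), ("teachers", "NNS"),
     ("and", "CC"), ("parents", "NNS"), (".", ".")],
]
features = entity_density(doc)
print(features)
```

Here "mayor" counts twice as a mention but once as a unique entity, which is the distinction between the mention-based and unique-entity features described above.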

Key findings

Entity-density and lexical-chain features separated complex from simplified text at a highly significant level (p < 0.00001) on both paired corpora, supporting the cognitive hypothesis that the number of entities a reader must track reflects reading difficulty. The authors evaluate three models against a Flesch-Kincaid baseline on two test sets: grade-level prediction on Weekly Reader, and correlation with actual reader-comprehension scores on LocalNews. On grade-level prediction the combined 'Basic + Cognitively-Motivated Features' model performed best, predicting grade level to within 0.565 grade levels on average versus 2.569 for Flesch-Kincaid. The more striking result came from the LocalNews correlation with adult-ID comprehension. Because higher predicted difficulty should accompany lower comprehension, a stronger negative correlation indicates a better model. The model trained only on cognitively-motivated features had the strongest negative correlation with comprehension scores (Pearson's R = −0.352), ahead of the combined model (−0.342), the basic-features-only model (−0.283), and the Flesch-Kincaid baseline (−0.270). In other words, features designed for children's grade-level prediction transfer less well to assessing readability for adults with ID than features grounded in the cognitive profile of this population. Entity density averaged per sentence and named-entity counts were the most consistently significant discriminators. Two features, average lexical-chain length and the number of lexical chains spanning more than half the document, showed no significant effect.
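
The two evaluation measures reported above, average grade-level error and Pearson correlation with comprehension scores, can be sketched as follows. This is an illustrative sketch under the assumption that the grade-level figure is a mean absolute error; the function names and all numbers in the example are invented and are not data from the paper.

```python
import math


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy values, invented for illustration only.
predicted_grade = [2.5, 3.0, 4.5, 5.0]
actual_grade = [2, 3, 4, 5]
mae = mean_absolute_error(predicted_grade, actual_grade)

# Predicted difficulty per article versus the fraction of comprehension
# questions readers answered correctly (again, toy values).
predicted_difficulty = [2.0, 3.0, 4.0, 5.0]
comprehension = [0.9, 0.7, 0.8, 0.4]
r = pearson_r(predicted_difficulty, comprehension)

print(f"mean absolute error: {mae:.3f}")
print(f"Pearson r: {r:.3f}")
```

A useful model yields a low error on the first measure and a negative r on the second, since texts predicted to be harder should be understood less well.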

Relevance

This paper is a milestone for cognitive-accessibility research because it shows empirically that one-size-fits-all readability formulas underserve readers with intellectual disabilities, and that audience-specific cognitive profiles should drive feature engineering. Three takeaways for practitioners: (1) Flesch-Kincaid and similar formulas should not be used on their own to decide whether content is accessible to readers with ID, because they correlate more weakly with actual comprehension than cognitively motivated features do; (2) entity density, meaning how many people, places, organisations, and concepts a reader must track, is a concrete, actionable lever for content simplification, and the correlational results suggest that reducing entity count may improve comprehension; (3) user-comprehension data from the target population, not grade-level proxies, is the defensible ground truth when designing or evaluating simplification tools. Limitations: the LocalNews test corpus is small (20 articles, 14 readers), the work focuses on mild ID rather than other cognitive profiles (aphasia, dementia, dyslexia), and the features do not capture visual layout, pacing, or interactive supports.