Comparing Evaluation Techniques for Text Readability Software for Adults with Intellectual Disabilities

Matt Huenerfauth, Lijun Feng, Noémie Elhadad · 2009 · Proceedings of the 11th International ACM SIGACCESS Conference on Computers and Accessibility (Assets '09) · doi:10.1145/1639642.1639646

Summary

This paper investigates how to properly evaluate text simplification software designed for adults with mild intellectual disabilities (ID). The authors are developing a system that automatically simplifies news articles, displays them, and reads them aloud using text-to-speech — aiming to give adults with ID access to current news they would otherwise find too difficult to read. Most available reading material at their level is written for children and does not match their interests as adults. The central research question is methodological: what types of comprehension questions actually measure whether a simplified text is easier to understand than a complex one? Using a Wizard-of-Oz prototype (where a human editor performed the simplifications rather than software), 20 adults with mild ID from a day-habilitation program in New York City read both Simple and Complex versions of 12 news articles. The simplification process reduced articles from 300-400 words to 150-250 words by replacing infrequent vocabulary with common terms, splitting complex sentences, reordering information, and removing non-essential content. Participants answered six comprehension questions per article across three question types — ClipArt (multiple-choice with illustrated answers), MultipleChoice (text-only multiple choice), and TrueFalse — plus three Likert-scale self-assessment questions about perceived difficulty.

Key findings

ClipArt questions (multiple-choice questions with three answer options, each illustrated by clip art or a photograph) were the most effective at distinguishing between Simple and Complex texts. The ratio of correct answers on the two versions was 1.17 for ClipArt questions, compared to 1.06 for MultipleChoice and 1.01 for TrueFalse; a ratio near 1 means participants scored about the same on both versions, so that question type reveals little about text difficulty. The differences for ClipArt questions were statistically significant, while TrueFalse questions showed almost no ability to distinguish text difficulty levels. A refined variant called "ClipArtOptional" (combining ClipArt and MultipleChoice responses according to whether the answer involved numbers or non-referring words) produced an even larger, statistically significant difference between Simple and Complex articles. Likert-scale self-assessment questions performed poorly: participants consistently rated articles as "Easy" to understand regardless of actual complexity, and the differences between Simple and Complex versions were not statistically significant. Yes/no questions also proved ineffective. The findings indicate that adults with ID have difficulty accurately self-reporting comprehension difficulty, and that illustrated multiple-choice questions were the most reliable of the evaluation instruments tested for this population.
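As a rough illustration of the ratio metric reported above, the following Python sketch computes a per-question-type ratio of correct-answer rates. It is not the authors' analysis code. The function name `accuracy_ratios`, the record layout, and the toy responses are all hypothetical, and the direction of the ratio (Simple accuracy divided by Complex accuracy) is an assumption made here for illustration.

```python
from collections import defaultdict


def accuracy_ratios(responses):
    """Return {question_type: simple_accuracy / complex_accuracy}.

    `responses` is an iterable of (question_type, version, is_correct)
    tuples, where version is "Simple" or "Complex". This layout is a
    hypothetical one chosen for the sketch, not taken from the paper.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for qtype, version, is_correct in responses:
        total[(qtype, version)] += 1
        correct[(qtype, version)] += int(is_correct)

    ratios = {}
    for qtype in sorted({q for q, _ in total}):
        simple_n = total.get((qtype, "Simple"), 0)
        complex_n = total.get((qtype, "Complex"), 0)
        if simple_n == 0 or complex_n == 0:
            continue  # need answers on both versions to form a ratio
        simple_acc = correct[(qtype, "Simple")] / simple_n
        complex_acc = correct[(qtype, "Complex")] / complex_n
        if complex_acc == 0:
            continue  # ratio undefined when no Complex answer is correct
        ratios[qtype] = simple_acc / complex_acc
    return ratios


# Toy, made-up responses (NOT data from the study).
toy_responses = [
    ("ClipArt", "Simple", True), ("ClipArt", "Simple", True),
    ("ClipArt", "Simple", True), ("ClipArt", "Simple", False),
    ("ClipArt", "Complex", True), ("ClipArt", "Complex", True),
    ("ClipArt", "Complex", False), ("ClipArt", "Complex", False),
    ("TrueFalse", "Simple", True), ("TrueFalse", "Simple", False),
    ("TrueFalse", "Simple", True), ("TrueFalse", "Simple", False),
    ("TrueFalse", "Complex", True), ("TrueFalse", "Complex", False),
    ("TrueFalse", "Complex", False), ("TrueFalse", "Complex", True),
]

toy_ratios = accuracy_ratios(toy_responses)
```

Under this assumed definition, a question type whose ratio sits close to 1 is not separating the two versions, which is the pattern the summary describes for TrueFalse. A ratio by itself says nothing about statistical significance; that would need a separate test appropriate to the study design.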

Relevance

This paper makes an important methodological contribution to accessibility research involving people with intellectual disabilities. Researchers developing text simplification, easy-read content, or other cognitive accessibility tools need valid ways to measure whether their interventions actually improve comprehension, and this study shows that common evaluation approaches (self-report scales, true/false questions) did not reliably distinguish easier from harder texts for this population. The finding that Likert scales fail is particularly important because self-report measures are ubiquitous in usability research; researchers should not assume they transfer reliably to studies with adults with ID. The practical recommendation, to use illustrated multiple-choice questions with three options, gives future researchers a concrete, empirically supported instrument. Beyond text simplification, these findings are likely relevant to other usability studies or surveys involving adults with cognitive disabilities, including evaluations of websites, apps, or other assistive technologies where comprehension assessment is needed.

Tags: intellectual disability · text simplification · readability · natural language processing · evaluation methodology · comprehension · cognitive accessibility · assistive technology