
Making It Simplext: Implementation and Evaluation of a Text Simplification System for Spanish

Horacio Saggion, Sanja Štajner, Stefan Bott, Simon Mille, Luz Rello, Biljana Drndarevic · 2015 · ACM Transactions on Accessible Computing · doi:10.1145/2738046

Summary

This paper presents Simplext, the first comprehensive automatic text simplification (ATS) system for Spanish. The research addresses a fundamental accessibility barrier: textual content written in complex language excludes people with cognitive disabilities, people with low literacy, and non-native speakers. ATS research has focused primarily on English, leaving languages with fewer resources underserved. Simplext uses a modular architecture combining three approaches: syntactic simplification using graph transduction grammars, synonym-based lexical simplification (LexSiS) using distributional semantics, and rule-based lexical simplification for specific patterns. The system was developed empirically, grounded in analysis of a parallel corpus of 200 news texts manually simplified according to "Make it Simple" guidelines for people with learning disabilities. The corpus analysis identified common simplification operations: sentence splitting, vocabulary substitution, parenthetical deletion, reporting verb normalization, and numerical expression clarification. The syntactic module operates on dependency parse trees, splitting complex sentences that contain relative clauses, gerund constructions, or coordinated verbs into simpler units. LexSiS performs word sense disambiguation using word vectors and selects simpler synonyms based on frequency and length. Rule-based components handle consistent patterns, such as replacing diverse reporting verbs with the universal "decir" (say) and transforming numerical expressions into more readable formats.
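The lexical steps described above can be illustrated with a minimal Python sketch. It is not the Simplext or LexSiS implementation: the synonym lists, the frequency counts, the scoring formula, and the function names are all hypothetical, and the word sense disambiguation step is omitted entirely. It only shows the general idea of preferring frequent, short synonyms and of mapping reporting verbs to a form of "decir".

```python
import math

# Hypothetical toy resources. A real system would draw on a thesaurus
# and corpus frequency counts; these values are invented for illustration.
SYNONYMS = {
    "adquirir": ["comprar", "obtener"],
}
FREQUENCY = {
    "adquirir": 120,
    "comprar": 900,
    "obtener": 700,
}

# Hypothetical mapping of inflected reporting verbs to a form of "decir".
REPORTING_VERBS = {
    "afirmó": "dijo",
    "manifestó": "dijo",
    "señaló": "dijo",
}


def simplicity_score(word, frequency):
    """Assumed scoring: frequent and short words score higher."""
    return math.log(frequency.get(word, 0) + 1) / len(word)


def pick_simpler(word, synonyms, frequency):
    """Return the candidate (the word itself or a synonym) with the best score."""
    candidates = [word] + synonyms.get(word, [])
    return max(candidates, key=lambda w: simplicity_score(w, frequency))


def normalize_reporting_verbs(tokens):
    """Replace listed reporting verbs and leave all other tokens unchanged."""
    return [REPORTING_VERBS.get(token, token) for token in tokens]


if __name__ == "__main__":
    print(pick_simpler("adquirir", SYNONYMS, FREQUENCY))
    print(normalize_reporting_verbs(["El", "ministro", "afirmó", "que", "sí"]))
```

Keeping the original word among the candidates means a substitution happens only when a synonym actually scores as simpler under the assumed formula.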

Key findings

Automatic evaluation using seven Spanish readability metrics showed statistically significant reductions in complexity across all measures (p < 0.001 for six of seven metrics). Lexical Complexity dropped 17%, Average Sentence Length decreased 22.3%, and syntactic embedding depth fell 13.1%. The system produces measurably simpler text, though it does not reach the simplicity levels of human editors, who apply paraphrasing and content deletion that are computationally difficult to replicate. Human evaluation with 25 native Spanish speakers found that 70% of automatically simplified sentences preserved meaning, and simplified sentences were rated significantly simpler than originals (median 3.5 vs. 3, p < 0.001). However, original sentences were rated more grammatical, a common tradeoff in ATS systems, where simplification operations can introduce errors. A target user evaluation with 44 adults with Down syndrome showed positive qualitative reception. Participants appreciated the tool's accessibility across devices and perceived differences between original and simplified versions, though the small number of texts tested (three) prevented statistical confirmation of comprehension improvements. Comparison with English ATS systems showed that Simplext achieves competitive results: meaning preservation of 3.86 on a 5-point scale ranks among the best, while simplicity (3.20) and fluency (3.25) are comparable to mid-tier English systems.
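To make the percentage reductions above concrete, the following Python sketch computes an average sentence length and the relative reduction between an original and a simplified text. The sentence splitter, the whitespace tokenization, the function names, and the example sentences are assumptions made for illustration; the paper's own metric definitions may differ, and the numbers produced here are unrelated to its reported results.

```python
import re


def average_sentence_length(text):
    """Mean number of whitespace-separated tokens per sentence.

    Sentences are split naively on '.', '!' and '?', which is an
    assumption of this sketch rather than the paper's definition.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    total_words = sum(len(s.split()) for s in sentences)
    return total_words / len(sentences)


def relative_reduction(before, after):
    """Percentage by which a metric fell from `before` to `after`."""
    return (before - after) / before * 100


if __name__ == "__main__":
    # Invented example pair, not taken from the Simplext corpus.
    original = "El ministro, que llegó ayer, afirmó que habrá cambios."
    simplified = "El ministro llegó ayer. El ministro dijo que habrá cambios."

    before = average_sentence_length(original)
    after = average_sentence_length(simplified)
    print(f"{before:.2f} -> {after:.2f} "
          f"({relative_reduction(before, after):.1f}% shorter)")
```

Splitting a sentence lowers this metric even when the total word count rises, which is one reason the evaluation pairs readability scores with human judgments of meaning and grammaticality.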

Relevance

This research demonstrates that robust text simplification is achievable for languages beyond English, providing a template for extending accessibility tools to the 500+ million Spanish speakers worldwide. The modular architecture, which separates syntactic, semantic, and rule-based components, allows adaptation to other languages by modifying language-specific resources while retaining the processing pipeline. For accessibility practitioners, the study validates the "Make it Simple" European guidelines as a foundation for computational simplification, while revealing the gap between automated and human simplification. The finding that 30% of simplifications did not preserve meaning highlights the need for human oversight in high-stakes accessibility contexts: automatic simplification is a tool to assist, not replace, human editors. The error analysis offers practical insights: syntactic simplification errors stem primarily from parser failures on complex constructions, while lexical errors arise from word sense disambiguation limitations. Organizations considering text simplification tools should expect these tradeoffs and plan for post-editing workflows. The research also establishes Spanish-specific readability metrics that can guide evaluation of other Spanish accessibility tools.

Tags: text simplification · natural language processing · cognitive accessibility · readability · Spanish · intellectual disability · Down syndrome · automatic text simplification

Standards referenced: WCAG 2.0