
A Spellchecker for Dyslexia

Luz Rello, Miguel Ballesteros, Jeffrey P. Bigham · 2015 · ASSETS '15: Proceedings of the 17th International ACM SIGACCESS Conference on Computers & Accessibility · doi:10.1145/2700648.2809850

Summary

This paper introduces Real Check, a spellchecker designed specifically to detect real-word errors: spelling mistakes that produce an unintended but valid word (e.g., "form" instead of "from"). Dyslexia affects approximately 10% of the population and is the most frequent language-based learning disability. Because people with dyslexia do not consciously detect their own spelling errors, spellcheckers are crucial tools for them. However, conventional spellcheckers (Microsoft Word, LibreOffice, Pages) largely miss real-word errors, which constitute 17-21% of the errors made by people with dyslexia. For example, the sentence "We *sow *quit a *big *miss *take we *maid" contains six spelling errors (each marked with an asterisk), but standard spellcheckers flag none of them because each misspelling is itself a valid word.
Real Check, developed for Spanish, employs a multi-stage algorithm. First, it generates "confusion sets": groups of words that are commonly confounded because they differ by one or two characters or in character order (e.g., casa/saca, meaning "house"/"take out"). Because Spanish has a shallow orthography (a transparent spelling-to-pronunciation mapping), the authors could generate 1,250,781 confusion sets automatically using Levenshtein distance calculations on a Spanish dictionary. Next, for each word in a sentence, the system generates candidate sentences with all confusion set substitutions, then matches these against the Google Books Ngram Corpus (over 8 million books, 84 billion Spanish tokens) to find statistically more frequent alternatives. Finally, two filters reduce false positives: a probabilistic language model trained on web text filters out ungrammatical suggestions, and a dependency parser applies morphosyntactic rules (e.g., discarding suggestions that create subject-verb number disagreement).
To develop and evaluate the system, the authors crowdsourced real-word errors by collecting texts written by people with dyslexia (notebooks, school essays, and computer-written texts produced without spellcheckers), yielding 366 sentences with at least one real-word error each (averaging 1.12 errors per sentence).
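The first two stages described above (confusion sets from edit distance, then n-gram lookup over candidate substitutions) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the six-word vocabulary, the trigram counts, the distance threshold of 2, and the function names are all hypothetical stand-ins for the Spanish dictionary and the Google Books Ngram Corpus, and the language-model and dependency-parser filters are omitted entirely.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def build_confusion_sets(vocabulary, max_distance=2):
    """Map each word to the other vocabulary words within max_distance edits.

    The threshold is an assumption for illustration; the review only says the
    words differ by one or two characters or in character order.
    """
    sets = defaultdict(set)
    words = sorted(vocabulary)
    for i, w in enumerate(words):
        for v in words[i + 1:]:
            if levenshtein(w, v) <= max_distance:
                sets[w].add(v)
                sets[v].add(w)
    return sets


def trigram(tokens, i, word):
    """Trigram centred on position i, with `word` substituted at that position."""
    left = tokens[i - 1] if i > 0 else "<s>"
    right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return (left, word, right)


def suggest(tokens, confusion_sets, counts):
    """Return (position, original, replacement) wherever a confusable word
    yields a strictly more frequent trigram than the word as written."""
    suggestions = []
    for i, word in enumerate(tokens):
        best, best_count = word, counts.get(trigram(tokens, i, word), 0)
        for candidate in sorted(confusion_sets.get(word, ())):
            c = counts.get(trigram(tokens, i, candidate), 0)
            if c > best_count:
                best, best_count = candidate, c
        if best != word:
            suggestions.append((i, word, best))
    return suggestions


# Hypothetical toy data, not taken from the paper.
vocabulary = ["casa", "saca", "cosa", "grande", "nuestra", "bonita"]
toy_counts = {("nuestra", "casa", "grande"): 120}

confusion_sets = build_confusion_sets(vocabulary)
result = suggest(["nuestra", "saca", "grande"], confusion_sets, toy_counts)
print(result)  # [(1, 'saca', 'casa')]
```

In this toy run, "saca" is flagged because substituting its confusable "casa" produces a trigram that appears in the (made-up) counts while the original does not. A pure frequency comparison like this will also propose ungrammatical replacements, which is presumably why the system adds the two filtering stages.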

Key findings

In a system evaluation on 344 sentences, Real Check achieved 50.42% precision and 65.93% recall (F1 = 57.14) for error detection, significantly outperforming commercial spellcheckers in recall. Microsoft Word detected only 5.82% of real-word errors (with 100% precision on those it did detect); Google Docs performed best among the commercial tools, with 37.40% recall. For error correction, Real Check achieved 25.64% precision and 49.59% recall. The key tradeoff is that Real Check finds far more errors but produces more false positives, while commercial tools find few errors but rarely flag correct words.
A user evaluation with 34 participants (17 with dyslexia, 17 "strong readers") tested three conditions: no spellchecker, error detection only (highlighting), and error suggestions. Participants with dyslexia showed significantly higher writing accuracy when using suggestions (median accuracy 100 vs. 78 at baseline) and completed corrections significantly faster (median 8.4 seconds vs. 10.3 seconds without assistance). Critically, even detection-only highlighting improved accuracy for people with dyslexia (p=0.004 compared to no assistance): simply knowing where errors are helps, even without suggestions. Strong readers showed no significant accuracy improvement in any condition, though they corrected sentences faster with suggestions. Participants with dyslexia also perceived that they wrote more accurately and faster when using suggestions, with subjective ratings significantly higher than in the no-assistance condition.
Limitations include the inability to detect semantically plausible but unintended substitutions (where context does not disambiguate meaning) and word boundary errors ("más cara" vs. "máscara"). In addition, some dyslexic writing patterns were not covered in the Google n-gram corpus.
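As a quick arithmetic check on the detection figures quoted in this section, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it from the reported percentages; the function name is my own, and the check assumes the paper uses the standard balanced F1 definition.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Detection precision and recall as quoted in this review (percentages).
detection_f1 = f1_score(50.42, 65.93)
print(round(detection_f1, 2))  # 57.14
```

The result matches the reported F1 of 57.14, so the three detection figures are internally consistent.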

Relevance

This work demonstrates that assistive technology for dyslexia requires understanding the specific error patterns of the target population: generic spellcheckers optimized for typical users systematically miss the error types most common in dyslexic writing. The finding that real-word errors constitute nearly 20% of dyslexic spelling errors, yet are the category commercial tools detect least, reveals a significant gap in writing support. The decision to evaluate on actual text written by people with dyslexia, rather than on artificially generated error corpora, strengthens ecological validity. For practitioners, the key insight is that error detection alone provides substantial benefit for people with dyslexia. Because they cannot consciously detect their own errors, simply highlighting problem areas improves accuracy even without correction suggestions. This suggests that even imperfect detection systems with higher false positive rates may be preferable to high-precision systems that miss most errors. The underlying NLP techniques (n-gram language models, confusion sets, dependency parsing) are applicable to other languages, though the authors note that morphologically rich languages like Spanish require additional filters for grammatical agreement that morphologically simpler languages might not need.

Tags: dyslexia · spelling · natural language processing · real-word errors · assistive technology · writing support · machine learning