Estimating dyslexia in the web

Ricardo Baeza-Yates, Luz Rello · 2011 · Proceedings of the International Cross-Disciplinary Conference on Web Accessibility (W4A) · doi:10.1145/1969289.1969300

Summary

This paper presents the first attempt to estimate the prevalence of dyslexic writing errors in English-language web content. The authors developed a classification system distinguishing five types of lexical errors: dyslexic errors (multi-character reversals, transpositions, and substitutions characteristic of dysphonetic dyslexia), regular spelling errors by non-impaired native speakers, keyboard typos caused by adjacent key presses, OCR errors from similar-looking characters, and errors by non-native English speakers. The methodology focused on dysphonetic dyslexia, the largest subtype, which is characterised by difficulty associating symbols with sounds, particularly for confusable letters such as b/d, p/q, m/n, and u/w. To avoid ambiguity between error types, the authors selected only "multi-errors" (words differing from the intended word by more than one letter) and generated all possible error variants for each of ten sample words (e.g., "comparison" yielding *comaprsion as the dyslexic variant versus *vomparison as a typo). They then used document frequency counts from Bing, Google, and Yahoo to estimate error prevalence across the English web.
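The multi-error filter described above can be illustrated with a small sketch. It assumes that "differing by more than one letter" can be approximated by an edit distance greater than 1, where an adjacent transposition counts as a single edit (the optimal string alignment distance). The summary does not say which distance measure the authors used, so the function names `osa_distance` and `is_multi_error` and the choice of metric are illustrative assumptions, not the paper's implementation.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (assumed metric, not the paper's).

    Insertions, deletions, substitutions, and adjacent transpositions
    each cost one edit.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition, e.g. "pa" -> "ap"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def is_multi_error(intended: str, observed: str) -> bool:
    """Hypothetical filter: keep only variants more than one edit away."""
    return osa_distance(intended, observed) > 1


# The two variants of "comparison" quoted in the summary
for variant in ("comaprsion", "vomparison"):
    print(variant, osa_distance("comparison", variant),
          is_multi_error("comparison", variant))
```

Under this assumed metric, *comaprsion (two transpositions) is two edits from "comparison" and passes the filter, while *vomparison (one substituted letter) is one edit away and is excluded.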

Key findings

Among all lexical errors found on the web, regular spelling errors dominated at 63.7% on average, followed by typos at 28.2%, non-native speaker errors at 6.6%, OCR errors at 0.8%, and dyslexic errors at 0.7%. The estimated lower bound for the share of web pages containing dyslexic errors was 0.005%, far below the 10-17% prevalence of dyslexia in the US population. Even at this conservative rate, that amounts to at least one million pages with dyslexic errors for every 20 billion web pages. The authors attribute the relatively low presence of dyslexic errors to the widespread use of spell checkers, which correct many dyslexic misspellings before publication. Because the methodology captured only multi-errors (39% of all dyslexic errors in a reference corpus), the figure deliberately underestimates the true prevalence. The study also noted that dyslexic error patterns are language-specific because they depend on orthographic depth: English, with its deep orthography (inconsistent letter-to-sound mapping), produces more widespread dyslexic errors than languages with transparent orthography.
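The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers reported above. The names `SHARES` and `pages_with_dyslexic_errors` are hypothetical, and exact decimal and fraction types are used to avoid floating-point rounding.

```python
from decimal import Decimal
from fractions import Fraction

# Average share of each error class among all lexical errors (percent),
# as reported in the summary above.
SHARES = {
    "regular spelling": Decimal("63.7"),
    "typo": Decimal("28.2"),
    "non-native speaker": Decimal("6.6"),
    "OCR": Decimal("0.8"),
    "dyslexic": Decimal("0.7"),
}

# Reported lower bound: 0.005% of web pages contain dyslexic errors.
LOWER_BOUND_PERCENT = Fraction(5, 1000)


def pages_with_dyslexic_errors(total_pages: int) -> int:
    """Lower-bound page count implied by the reported 0.005% rate."""
    return int(total_pages * LOWER_BOUND_PERCENT / 100)


print(sum(SHARES.values()))                        # shares sum to 100.0
print(pages_with_dyslexic_errors(20_000_000_000))  # 1000000
```

The five shares sum to exactly 100%, and 0.005% of 20 billion pages is one million pages, matching the figure quoted in the text.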

Relevance

This study makes two important contributions to accessibility practice. First, it establishes that dyslexic-authored content exists on the web at measurable scale, demonstrating that web accessibility for dyslexia must address both interface design (readability, font choice, layout) and content quality (error detection and correction). Second, it shows that specific error patterns can identify likely dyslexic text, opening the door to NLP tools that could automatically detect dyslexic writing and offer targeted spell-checking. The error classification framework — distinguishing dyslexic errors from typos, OCR artifacts, and non-native speaker errors — remains a useful analytical tool. The finding that spell checkers substantially reduce visible dyslexic errors online highlights how assistive tools embedded in everyday software can have population-level accessibility impact without requiring users to self-identify as having a disability.

Tags: dyslexia · cognitive accessibility · web content quality · natural language processing · spelling errors · readability

Standards referenced: WCAG 2.0