Steering Generative Models for Accessibility: EasyRead Image Generation

Nicolas Dickenmann, Yanis Merzouki, Sonia Laguna, Thy Nowak-Tran, Emanuele Palumbo, Julia E. Vogt, Gerda Binder · 2026 · Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA '26) · doi:10.1145/3772363.3798916

Summary

Dickenmann and colleagues (ETH Zurich and UNICEF Digital Impact Division) present the first open-source pipeline for generating Easy Read-style pictograms using a fine-tuned diffusion model. Easy Read pairs short sentences with simple, high-contrast pictograms to support comprehension for people with intellectual disabilities, cognitive impairments, or limited literacy, but hand-drawing new pictograms is slow and expensive. Off-the-shelf text-to-image models like Stable Diffusion produce images that are too visually complex and stylistically unstable to serve as pictograms. The authors curate a training corpus from three public pictogram datasets — ARASAAC (11,972 AAC symbols), OpenMoji (4,295 icons), and LDS (927 pictograms used in accessibility contexts) — and address the label-sparsity problem by re-captioning every image with the BLIP vision-language model to extract concrete visual content. They additionally augment ARASAAC by exploiting its built-in customisation API, rendering each pictogram across combinations of background, skin, and hair colours so the model learns to treat these as controllable attributes rather than baked-in style. Stable Diffusion v1.5 is then fine-tuned using rank-16 LoRA adapters on the cross- and self-attention projection matrices of the UNet, with all other weights frozen. The paper also contributes the EasyRead Score (ERS), a weighted aggregate of six pixel-based metrics — palette complexity, edge density, saliency concentration, foreground-background contrast, relative stroke thickness, and centering error — designed to quantify how well an image conforms to Easy Read design principles without requiring costly expert review.
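
To make the idea of a weighted aggregate concrete, the following is a minimal sketch of how a score of this kind could be assembled. It is not the authors' implementation. The six metric names come from the summary above, but the equal weights, the normalisation of each sub-score to [0, 1], and the particular definition of centering error (centroid offset divided by the half-diagonal) are assumptions made purely for illustration.

```python
from math import hypot

# Metric names follow the summary; the equal weights are placeholders,
# because the paper's actual weighting is not reproduced here.
HYPOTHETICAL_WEIGHTS = {
    "palette_complexity": 1.0,
    "edge_density": 1.0,
    "saliency_concentration": 1.0,
    "foreground_background_contrast": 1.0,
    "relative_stroke_thickness": 1.0,
    "centering_error": 1.0,
}


def centering_error(mask):
    """Offset of the foreground centroid from the image centre.

    `mask` is a list of rows of 0/1 values (1 = foreground). The result is
    normalised by the half-diagonal, so 0.0 means perfectly centred and
    1.0 means the centroid sits in a corner. This definition is assumed.
    """
    h, w = len(mask), len(mask[0])
    points = [(y, x) for y in range(h) for x in range(w) if mask[y][x]]
    if not points:
        return 1.0  # no foreground found: treated as worst case (assumption)
    cy = sum(y for y, _ in points) / len(points)
    cx = sum(x for _, x in points) / len(points)
    mid_y, mid_x = (h - 1) / 2, (w - 1) / 2
    half_diag = hypot(mid_y, mid_x)
    return hypot(cy - mid_y, cx - mid_x) / half_diag if half_diag else 0.0


def easyread_score(sub_scores, weights=HYPOTHETICAL_WEIGHTS):
    """Weighted mean of sub-scores already mapped to [0, 1], where 1 is best."""
    missing = set(weights) - set(sub_scores)
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    total = sum(weights.values())
    return sum(weights[name] * sub_scores[name] for name in weights) / total


# Usage: a 5x5 mask with a single centred foreground pixel.
mask = [[0] * 5 for _ in range(5)]
mask[2][2] = 1
sub_scores = dict.fromkeys(HYPOTHETICAL_WEIGHTS, 0.5)  # dummy values
sub_scores["centering_error"] = 1.0 - centering_error(mask)  # lower error = higher score
print(round(easyread_score(sub_scores), 3))  # 0.583
```

Metrics where a lower raw value is better (palette complexity, edge density, centering error) need to be inverted before aggregation, as done for centering error in the usage lines. How the paper handles that mapping is not stated in this summary.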

Key findings

Evaluated on 55 validation prompts × 5 random seeds (275 images per model), the LoRA-fine-tuned model raises the EasyRead Score from 0.40 to 0.47 over the Stable Diffusion v1.5 baseline and raises CLIP text-image alignment from 24.33 to 31.15, suggesting that accessibility-aligned output and semantic fidelity can improve together rather than trade off. Qualitative comparisons against two closed-source systems (Global Symbols and Google's Nano Banana Pro) indicate that the fine-tuned open model produces more stylistically uniform pictograms across seeds and follows attribute constraints (background colour, skin tone, hair colour) more consistently, though general-purpose commercial models sometimes render complex scenes with more detail. The model reliably disregards skin and hair attributes when no human figure is present, indicating that it has learned when those properties apply. Remaining limitations include difficulty preserving fine details in crowded scenes (for example, a firefighter on a ladder), reliance on ARASAAC's legacy and non-inclusive ethnicity categories ('White', 'Black', 'Asian', 'Mulatto', 'Aztec'), and the absence of human-subjects validation with Easy Read end users.
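
The arithmetic behind these figures can be checked with a short sketch. The prompt names below are placeholders, not the paper's validation prompts, and the relative gains are derived here from the reported scores rather than quoted from the paper.

```python
from itertools import product

N_PROMPTS, N_SEEDS = 55, 5  # evaluation sizes reported in the summary


def evaluation_grid(prompts, seeds):
    """Every (prompt, seed) pair; one image is generated per pair and model."""
    return list(product(prompts, seeds))


def relative_gain(baseline, finetuned):
    """Improvement expressed as a fraction of the baseline value."""
    return (finetuned - baseline) / baseline


prompts = [f"prompt_{i:02d}" for i in range(N_PROMPTS)]  # placeholder names
seeds = list(range(N_SEEDS))
grid = evaluation_grid(prompts, seeds)
print(len(grid))  # 275

# (baseline, fine-tuned) scores as reported above.
reported = {"EasyRead Score": (0.40, 0.47), "CLIP alignment": (24.33, 31.15)}
for name, (base, tuned) in reported.items():
    print(f"{name}: {relative_gain(base, tuned):+.1%}")
# EasyRead Score: +17.5%
# CLIP alignment: +28.0%
```

The two relative gains are not directly comparable, since the metrics sit on different scales, but they show that both measures move in the same direction.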

Relevance

Easy Read content is legally mandated or formally recommended in many jurisdictions (for example, EU Web Accessibility Directive guidance and UK health-service practice), but its production is chronically under-resourced because every new document needs bespoke pictograms. A reliable generative pipeline could scale Easy Read from high-visibility documents to everyday web content, signage, healthcare leaflets, and AAC boards, substantially expanding access for people with intellectual disabilities, aphasia, or early-stage dementia, and for non-native speakers. The release of the EasyRead Score as a transparent, automated metric is as significant as the model itself: it gives the community a way to benchmark future pictogram generators without relying on expert review, and it operationalises design principles (low clutter, strong separation, centered content) that were previously described only qualitatively. Practitioners should nonetheless treat AI-generated pictograms as draft material requiring review by Easy Read specialists and, ideally, co-design validation with users who have intellectual disabilities. The paper itself notes that human-centric validation remains a critical frontier.

Tags: diffusion models · stable diffusion · LoRA · generative AI · cognitive accessibility · intellectual disability · pictograms · Easy Read · AAC · symbol communication · image generation