Collaborative AI Scaffolding for Structured Drawing in Dementia Care: A Feasibility Study

Hui-Lien Huang, I-Ping Chen · 2026 · Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA ’26) · doi:10.1145/3772363.3798912

Summary

This CHI 2026 Extended Abstract reports a one-on-one feasibility study of a tablet-based AI drawing tool for older adults with dementia, conducted at a day-care centre in Taiwan. The system pairs a Google Gemini vision-language model (VLM) with Azure text-to-speech (TTS) to deliver short, canvas-aware voice cues in the persona of an art therapist. The cues accompany a low-clutter UI (a large canvas, eight high-contrast colour buttons, an eraser, and a clear-canvas button) and a 2D animated companion whose mouth moves while it speaks. The authors frame the system as human-in-the-loop (HITL) collaborative scaffolding rather than a labour-saving device: the AI handles pacing and structure, while a human facilitator translates abstract semantic prompts into executable micro-actions when breakdowns occur. A key engineering contribution is a turn-taking voice mechanism. New TTS triggers are locked while audio is playing, strokes drawn in the meantime are marked as pending, and those strokes are batched into a single post-playback update so the AI does not talk over itself.

Eleven participants were recruited: three took part in a pilot, two were excluded because of physical limitations, and the core study analysed the remaining six older adults with dementia (CDR 1–4). Each session lasted 20–50 minutes and comprised a warm-up, the AI-guided flower-drawing task, and a simplified SUS-based interview.
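The turn-taking voice mechanism amounts to a small state machine. The Python sketch below illustrates it under stated assumptions and is not the authors' code: `request_cue` is a hypothetical callback standing in for the Gemini-plus-Azure round trip, strokes are identified by plain strings, and the end of audio playback is signalled by calling `on_playback_finished` by hand rather than by a real audio event.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TurnTakingVoice:
    """Hypothetical sketch of lock / mark-pending / batch turn-taking for voice cues."""

    # Assumed callback: takes the strokes to react to and returns the cue text to speak.
    request_cue: Callable[[List[str]], str]
    speaking: bool = False                               # True while TTS audio is playing
    pending: List[str] = field(default_factory=list)     # strokes drawn during playback
    spoken: List[str] = field(default_factory=list)      # log of cues issued so far

    def on_stroke(self, stroke_id: str) -> None:
        if self.speaking:
            # New TTS triggers are locked: record the stroke instead of requesting a cue.
            self.pending.append(stroke_id)
        else:
            self._speak([stroke_id])

    def on_playback_finished(self) -> None:
        self.speaking = False
        if self.pending:
            # Flush all pending strokes as ONE update, so the AI never talks over itself.
            batch, self.pending = self.pending, []
            self._speak(batch)

    def _speak(self, strokes: List[str]) -> None:
        self.speaking = True  # lock further triggers until playback ends
        self.spoken.append(self.request_cue(strokes))


voice = TurnTakingVoice(request_cue=lambda strokes: f"cue for {len(strokes)} stroke(s)")
voice.on_stroke("s1")          # idle, so a cue is requested immediately
voice.on_stroke("s2")          # locked: marked pending
voice.on_stroke("s3")          # locked: marked pending
voice.on_playback_finished()   # s2 and s3 are batched into one cue
voice.on_playback_finished()   # nothing pending, so the voice goes idle
print(voice.spoken)            # ['cue for 1 stroke(s)', 'cue for 2 stroke(s)']
```

Batching, rather than queueing one cue per stroke, keeps a backlog of outdated cues from building up: however many strokes arrive during playback, the participant hears a single response covering all of them.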

Key findings

Under a lenient completion criterion (an element counts if it is visually identifiable or explicitly labelled), 5/6 participants completed all four flower elements (petals, leaves, stem, pistil) when AI guidance was paired with optional human facilitation. Under a stricter, visual-only criterion, element-level completion was 17/24 (~71%). Without any human help, only 2/6 participants (P04, P08) completed the elements with AI voice guidance alone. Task times ranged from 1:20 to 13:43 (mm:ss; median 6:46).

Three behavioural pathways emerged. AI-independent participants (2/6) treated the AI as a collaborator. Human-supported participants (3/6) needed a facilitator to convert prompts such as "add a petal" into pointing gestures or instructions such as "tap green here". One task-divergent participant (1/6) got stuck in repetitive scribbling that voice scaffolding could not interrupt.

The central qualitative finding is a "semantics-to-action gap": participants reliably followed concrete cues (shapes, colour keywords, deictic actions such as "tap here") but struggled with abstract part labels such as "petal" and abstract instructions such as "change colour". The team also observed state misalignment, in which the VLM repeated prompts for elements that already appeared complete; the authors attribute this to screenshot and queueing latency. Subjective ratings were positive (satisfaction M = 3.83, enjoyment M = 4.17 on a 5-point scale), although enjoyment did not predict willingness to reuse the tool independently.
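The paper attributes the repeated prompts to screenshot and queueing latency but reports no latency logs, so what follows is only one possible mitigation, not something the authors describe. The sketch assumes a hypothetical revision counter that increments on every canvas edit: each screenshot sent to the VLM carries the revision at capture time, and a reply is discarded if the canvas has changed since then. `CanvasState`, `VlmReply`, and `accept_reply` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CanvasState:
    revision: int = 0  # hypothetical counter, bumped on every stroke, erase, or clear

    def on_edit(self) -> None:
        self.revision += 1


@dataclass
class VlmReply:
    revision: int  # canvas revision at which the screenshot was captured
    cue: str       # cue text the VLM proposed for that screenshot


def accept_reply(state: CanvasState, reply: VlmReply) -> Optional[str]:
    """Return the cue only if the canvas is unchanged since the screenshot; else None."""
    if reply.revision < state.revision:
        return None  # stale: the VLM judged an older canvas, so do not speak it
    return reply.cue


state = CanvasState()
snapshot_rev = state.revision   # screenshot taken at revision 0 and sent to the VLM
state.on_edit()                 # participant draws the stem while the request is in flight
print(accept_reply(state, VlmReply(snapshot_rev, "Now draw the stem")))  # None
```

A real system would presumably re-capture the canvas after dropping a stale reply rather than stay silent; that step is omitted here to keep the check itself visible.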

Relevance

For accessibility practitioners working with people with cognitive disabilities, this paper is a useful counterweight to "AI replaces the caregiver" narratives. The authors take a HITL stance and treat the human facilitator as a designable collaboration layer, which yields concrete implications: scaffolding prompts should default to a concrete verb + target ("tap green and add a leaf") rather than part labels; systems should expose state so the facilitator can repair mismatches between the AI and the user; and a single caregiver may be able to supervise multiple semi-autonomous stations because the AI carries the pacing load. The paper also contributes a generalisable framing, the semantics-to-action gap, that applies well beyond dementia (e.g., aphasia, intellectual disability, early literacy).

The limitations are stated upfront: N = 6, a single coder for completion (so no inter-rater reliability), no latency logs, and a single structured task (open-ended art may behave very differently). CDR 1–4 is also a wide spread, with too few participants to compare stages. The flower-drawing constraint enables clean coding but limits generalisation to free creative expression.
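The verb + target guideline can be illustrated with a small cue builder that names the colour button to tap before the drawing action, following the pattern of the paper's "tap green and add a leaf" example. The element-to-colour mapping and the action phrasings below are hypothetical, chosen for illustration; they are not the prompt set used in the study.

```python
from typing import Dict, Tuple

# Hypothetical element -> (colour button, concrete action) mapping; illustrative only.
ELEMENT_ACTIONS: Dict[str, Tuple[str, str]] = {
    "petal": ("pink", "draw a round shape next to the middle"),
    "leaf": ("green", "add a leaf"),
    "stem": ("green", "draw a line down from the flower"),
    "pistil": ("yellow", "draw a dot in the middle"),
}


def abstract_cue(element: str) -> str:
    """Part-label cue of the kind participants struggled with."""
    return f"Add a {element}."


def concrete_cue(element: str) -> str:
    """Verb + target cue: name the colour button to tap, then the drawing action."""
    colour, action = ELEMENT_ACTIONS[element]
    return f"Tap {colour} and {action}."


print(abstract_cue("petal"))   # Add a petal.
print(concrete_cue("petal"))   # Tap pink and draw a round shape next to the middle.
```

Keeping the mapping as explicit data, rather than leaving phrasing entirely to the VLM, would also let a facilitator see and repeat exactly what the system asked for.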

Tags: dementia · older adults · human-AI collaboration · human-in-the-loop · vision-language model · text-to-speech · art therapy · cognitive accessibility · feasibility study

Standards referenced: Clinical Dementia Rating (CDR) · System Usability Scale (SUS)