Look Here, Click Me: Improving Older Adults’ Perception of Manipulable User Interface Components through AI-Based Perceptual Guidance

Sera Park, Seoyeon Kim, Sangyeon Kim · 2026 · Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA ’26) · doi:10.1145/3772363.3798878

Summary

This CHI 2026 Extended Abstracts paper from Sookmyung Women’s University (Seoul) tackles a concrete gap in older-adult digital literacy: existing programs teach step-by-step procedures ("tap here, then here, then here") that collapse the moment an app updates its layout. The authors argue that older users instead need a foundational perceptual skill, recognizing *which* on-screen elements are manipulable in the first place, that transfers across apps and devices.

They build an AI pipeline that ingests mobile UI screenshots, runs a YOLO v12 detector trained on the VINS dataset (14 UI classes, mAP 0.650), and extracts perceptual features for each bounding box: dominant color via k-means, text via PaddleOCR, and a location label from a 3×3 spatial grid. From these features it auto-generates two kinds of textual instruction: a plain baseline ("Click the icon labeled MY") and a perceptual guidance form ("Click the icon at the top right of the screen that is black labeled MY."). The design is grounded in perceptual learning theory, which holds that repeated exposure to task-relevant features (color, location) under top-down attention produces durable detection gains.

The authors ran a within-subjects study with 13 Korean smartphone users (mean age 65.6, 9F/4M) using 130 real-app screenshots, which yielded 1,305 tasks after filtering. Each participant completed 120 training tasks (60 per condition, counterbalanced), with detection tests and semi-structured interviews both before and after training. For analysis, participants were split by self-rated confidence on a 10-point scale (low ≤5, high ≥6).
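The location-labeling and instruction-templating steps described above can be illustrated with a minimal sketch. It assumes a detector has already produced a pixel bounding box and that OCR text and a color name are available. The `Component` class, the function names, the grid wording ("top right", "center"), and the rule of labeling a box by its center point are all illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

# Assumed wording for the 3x3 grid; the paper's exact labels may differ.
ROWS = ("top", "middle", "bottom")
COLS = ("left", "center", "right")


@dataclass
class Component:
    ui_class: str  # detector class name, e.g. "icon"
    text: str      # text recognized inside the box
    color: str     # name of the dominant color
    box: tuple     # (x_min, y_min, x_max, y_max) in pixels


def grid_location(box, screen_w, screen_h):
    """Map a bounding box to one of nine grid cells using its center point."""
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2
    cy = (y_min + y_max) / 2
    # Clamp to 2 so a center lying exactly on the far edge stays in the last cell.
    col = min(int(3 * cx / screen_w), 2)
    row = min(int(3 * cy / screen_h), 2)
    if row == 1 and col == 1:
        return "center"
    return f"{ROWS[row]} {COLS[col]}"


def baseline_instruction(c):
    """Plain instruction: component class and label only."""
    return f"Click the {c.ui_class} labeled {c.text}."


def guidance_instruction(c, screen_w, screen_h):
    """Perceptual guidance: adds location and color cues to the baseline."""
    location = grid_location(c.box, screen_w, screen_h)
    return (
        f"Click the {c.ui_class} at the {location} of the screen "
        f"that is {c.color} labeled {c.text}."
    )


# Hypothetical detection on a 1080x1920 screenshot.
example = Component("icon", "MY", "black", (950, 40, 1050, 140))
print(baseline_instruction(example))
print(guidance_instruction(example, 1080, 1920))
```

Using the box center keeps the label unambiguous for small components; a large element spanning several cells would need a different rule, which this sketch does not attempt.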

Key findings

Overall detection accuracy improved from 6.08 to 7.08 out of 9 (one-tailed paired t-test, p=0.045); the effect did not survive a two-tailed test, a limitation the authors flag. The headline result is an interaction between instruction type and confidence level (F(1,11)=12.03, p=0.005): perceptual guidance showed a marginal benefit for low-confidence older adults (M=52.3 vs 49.7; Tukey p=0.053, just above the conventional 0.05 threshold) but none for high-confidence ones.

The qualitative data pointed the same way. All 6 low-confidence participants preferred perceptual guidance ("since my eyesight is not good, it was easier when the guidance pointed to the bottom-left area"), whereas high-confidence participants were divided: some found the extra words an obstacle that slowed their responses, and two top performers reported that they could not perceive any difference between conditions ("it felt the same whether or not information was provided"). Within the perceptual guidance, low-confidence users valued *location* cues (for focusing attention on a screen region), while high-confidence users preferred *color* cues (red/blue "stood out immediately"). Interview themes surfaced additional accessibility-relevant friction: unfamiliar positional terminology (terms such as up/above/top/upper increased response time), difficulty distinguishing red from yellow due to reduced eyesight, and privacy anxiety around authentication flows.

The pipeline itself came with caveats: 5 of the 14 YOLO classes (Card, Spinner, CheckBox, BottomNavigation, Modal) were excluded because of misclassification, and the generated tasks needed manual validation for OCR errors and ambiguous colors (black vs dark gray).
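The one-tailed versus two-tailed caveat in the findings above comes down to simple arithmetic. For a symmetric test statistic such as the paired t, when the observed effect lies in the predicted direction, the two-tailed p-value is twice the one-tailed one. The sketch below applies that relationship to the reported p=0.045. The function name is illustrative, and the calculation assumes the two-tailed test was run on the same t statistic; the paper's exact two-tailed value is not given in this summary.

```python
def two_tailed_from_one_tailed(p_one_tailed: float) -> float:
    """Convert a one-tailed p-value to its two-tailed counterpart.

    Valid for a symmetric statistic (such as t) when the observed effect
    is in the hypothesized direction.
    """
    if not 0.0 <= p_one_tailed <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    return min(1.0, 2.0 * p_one_tailed)


ALPHA = 0.05   # conventional significance threshold
p_one = 0.045  # one-tailed value reported for the pre/post accuracy gain
p_two = two_tailed_from_one_tailed(p_one)

print(f"one-tailed p = {p_one:.3f}, below alpha: {p_one < ALPHA}")
print(f"two-tailed p = {p_two:.3f}, below alpha: {p_two < ALPHA}")
```

Under these assumptions the two-tailed value lands above 0.05, which is consistent with the authors' note that the overall improvement holds only under a directional hypothesis.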

Relevance

The paper is useful for accessibility teams building training content, onboarding, or assistive overlays for older users: it supplies preliminary evidence that one-size-fits-all instruction can hinder proficient users and that guidance may need to be tailored by confidence level, with *location cues for novices* and *color cues for more confident users*. It also offers a concrete automation recipe (YOLO on VINS, plus k-means color, OCR, and a 3×3 grid) for generating per-screenshot accessible instructions at scale, which is directly applicable to computer-vision-assisted accessibility tooling such as screen-reader augmentation, digital-literacy curricula, and help systems for older users. The limitations are significant: only 13 participants, all Korean; short-term measurement only (no longitudinal retention data); static screenshots rather than live interactive apps; color and location as the only cues (not size, shape, edge, or orientation); and a core improvement that may be partly a repeated-exposure effect rather than a result of perceptual guidance per se. Read this as a design-direction paper, not a validated intervention.
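For readers who want to prototype the color step of that recipe, here is a minimal pure-Python sketch of dominant-color extraction with k-means. It assumes the pixels of a cropped bounding box are already available as RGB tuples. The small named-color palette, the `ratio` threshold, and the ambiguity flag are hypothetical additions meant to mirror the manual check for near-ties such as black vs dark gray; the paper does not describe its color naming in this detail.

```python
import random

# Hypothetical palette; a real system would use a richer, validated set.
NAMED_COLORS = {
    "black": (0, 0, 0),
    "gray": (128, 128, 128),
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "yellow": (255, 255, 0),
    "blue": (0, 0, 255),
}


def sq_dist(a, b):
    """Squared Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dominant_color(pixels, k=2, iterations=10, seed=0):
    """Return the rounded centroid of the largest k-means cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(pixels, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in pixels:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # leave the centroid unchanged if its cluster is empty
                centroids[i] = tuple(sum(ch) / len(members) for ch in zip(*members))
    largest = max(range(k), key=lambda i: len(clusters[i]))
    return tuple(round(c) for c in centroids[largest])


def name_color(rgb, ratio=1.5):
    """Return (nearest color name, ambiguous flag).

    The flag is set when the runner-up is almost as close as the winner,
    which marks the task for manual review.
    """
    ranked = sorted(NAMED_COLORS, key=lambda n: sq_dist(rgb, NAMED_COLORS[n]))
    d_best = sq_dist(rgb, NAMED_COLORS[ranked[0]])
    d_next = sq_dist(rgb, NAMED_COLORS[ranked[1]])
    return ranked[0], d_next <= ratio * d_best


# Synthetic crop: mostly near-black pixels with some near-white text.
crop = [(10, 10, 10)] * 80 + [(250, 250, 250)] * 20
rgb = dominant_color(crop)
print(rgb, name_color(rgb))
print(name_color((64, 64, 64)))  # dark gray sits midway between black and gray
```

Plain RGB distance is a rough stand-in for perceived color difference; a perceptually uniform color space would likely be a better basis for both the clustering and the naming.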

Tags: older adults · digital literacy · perceptual learning · accessibility · UI components · computer vision · YOLO · mobile accessibility · aging · digital divide