Mnemonic Tracing: Using Eye Gaze to Search for Visual Memories

Wazeer Zulfikar, Yasith Samaradivakara, Paul Pu Liang, Pattie Maes · 2026 · Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA ’26) · doi:10.1145/3772363.3799025

Summary

Mnemonic Tracing is a non-verbal image-retrieval interaction in which a user, wearing eye-tracking glasses, deliberately retraces the contents of a remembered image with their gaze on a blank surface. The paper builds on gaze-reinstatement research, which shows that when people mentally visualize a previously seen scene, their eye movements partially recreate the spatiotemporal pattern produced when the scene was first encoded. Prior work treated this reinstatement as an involuntary signal; the authors instead make it intentional, asking users to actively trace remembered objects, layout, and detail. The system represents each eye-tracking trial as a sequence of fixations, projects them onto a duration-weighted Gaussian attention map, computes a prefix-based map from the first K fixations to capture early scene structure, and compares encoding and recall maps using the Bhattacharyya coefficient. No supervised training, no per-user calibration, and no language input are required. A pilot study with 11 sighted participants used Pupil Labs Neon eye-tracking glasses and 30 AI-generated images of people, objects, and locations across six experimental sets. Each trial had a 7-second encoding phase, a 1-minute distractor task, and a 7-second recall phase in which participants traced the remembered image onto a blank screen. The Vividness of Visual Imagery Questionnaire (VVIQ), NASA-TLX, and a Technology Acceptance Model survey were administered. The authors explicitly position the technique as a private, hands-free, inclusive retrieval channel for users who cannot or do not want to use speech or text input.
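
The matching pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the grid size, the Gaussian width `sigma`, the prefix length `k`, and the equal-weight blend of the full-map and prefix-map scores are all assumptions, since the summary does not state them. Fixations are assumed to be `(x, y, duration)` tuples with coordinates normalized to [0, 1].

```python
import numpy as np

def attention_map(fixations, grid=(32, 32), sigma=0.08):
    """Duration-weighted Gaussian map built from (x, y, duration) fixations.

    The result is normalized to sum to 1 so it can be compared as a
    probability distribution. Grid size and sigma are assumed values.
    """
    h, w = grid
    ys, xs = np.mgrid[0:h, 0:w]
    xs = (xs + 0.5) / w  # cell centers in normalized coordinates
    ys = (ys + 0.5) / h
    amap = np.zeros(grid, dtype=float)
    for x, y, dur in fixations:
        amap += dur * np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    total = amap.sum()
    # Fall back to a uniform map if there is no fixation mass at all.
    return amap / total if total > 0 else np.full(grid, 1.0 / (h * w))

def bhattacharyya(p, q):
    """Bhattacharyya coefficient: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return float(np.sum(np.sqrt(p * q)))

def similarity(encoding, recall, k=5, prefix_weight=0.5):
    """Blend full-map and first-k-fixation similarity (the blend is an assumption)."""
    full = bhattacharyya(attention_map(encoding), attention_map(recall))
    prefix = bhattacharyya(attention_map(encoding[:k]), attention_map(recall[:k]))
    return (1 - prefix_weight) * full + prefix_weight * prefix

def rank_candidates(recall, encodings, k=5):
    """Return candidate image names sorted from most to least similar."""
    scores = {name: similarity(enc, recall, k) for name, enc in encodings.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical data: two encoded images and one recall trace resembling "a".
enc_a = [(0.2, 0.2, 0.3), (0.3, 0.25, 0.5), (0.8, 0.7, 0.2)]
enc_b = [(0.8, 0.2, 0.4), (0.7, 0.3, 0.3), (0.2, 0.8, 0.3)]
recall_trace = [(0.22, 0.18, 0.3), (0.31, 0.27, 0.4), (0.78, 0.72, 0.3)]
ranking = rank_candidates(recall_trace, {"a": enc_a, "b": enc_b})
```

Because every step is a fixed computation on fixation data, nothing in this sketch is trained or fitted per user, which is consistent with the training-free, calibration-free claim.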

Key findings

On 295 valid trials out of 330, the gaze-based retrieval algorithm achieved Top-1 accuracy of 30.5%, Top-3 of 51.2%, and Top-5 of 60.0% against a 30-image candidate set, well above the 3.3%, 10.0%, and 16.7% chance baselines. Within-set retrieval (5 images per set) reached 53.2% Top-1 against a 20% chance baseline. The proposed Gaussian Attentional Map Retrieval method outperformed an Earth Mover's Distance baseline (Top-1 30.5% vs 24.7%; Top-3 51.2% vs 42.3%) because voluntary tracing produces denser, more structured fixation patterns than involuntary recall gaze. Performance correlated positively with VVIQ (r=0.25; more vivid imagers retrieved better) and, surprisingly, with NASA-TLX (r=0.44; higher subjective workload tracked higher accuracy, which is interpreted as a sign of task engagement). Per-participant Top-3 accuracy ranged from 34.5% to 70.4%. TAM perceived usefulness averaged 3.58/5 (SD=0.82), with most participants rating intentional eye movement as helpful. Qualitative themes included a tension between intentional control and involuntary saccades, descriptions of the process as a "mindful activity," and four envisioned applications: memory aids, photo/video retrieval, spatial navigation in remembered scenes, and accessibility uses such as retrieval for people who are "mute and unable to use your hands" or for forensic facial recall.
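
The chance baselines quoted above follow from uniform guessing: with N candidates, the probability that the target lands in the top k is k/N. The short sketch below reproduces those figures and shows one way Top-k accuracy could be computed from ranked outputs. The function names and the toy rankings are illustrative assumptions, not taken from the paper.

```python
def chance_top_k(k, n_candidates):
    """Probability that a uniformly random ranking places the target in the top k."""
    return min(k, n_candidates) / n_candidates

def top_k_accuracy(rankings, targets, k):
    """Fraction of trials whose true image appears among the first k ranked candidates."""
    hits = sum(target in ranking[:k] for ranking, target in zip(rankings, targets))
    return hits / len(targets)

# Chance levels for the 30-image pool and the 5-image within-set condition.
pool_chance = [round(100 * chance_top_k(k, 30), 1) for k in (1, 3, 5)]
within_set_chance = round(100 * chance_top_k(1, 5), 1)

# Toy example with made-up rankings for three trials.
toy_rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
toy_targets = ["a", "a", "a"]
toy_top1 = top_k_accuracy(toy_rankings, toy_targets, 1)
toy_top2 = top_k_accuracy(toy_rankings, toy_targets, 2)
```

Read against these baselines, the reported 30.5% Top-1 result is roughly nine times chance on the 30-image pool, while the 53.2% within-set result is a little under three times chance.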

Relevance

Although the pilot was run with non-disabled participants, the authors frame Mnemonic Tracing primarily as an accessibility technique, and the framing is well grounded. Verbal image retrieval excludes anyone whose speech or language production is impaired (aphasia, anomia, locked-in syndrome, late-stage ALS, severe motor speech disorders), and existing eye-tracking interfaces typically rely on dwell-and-click pointer mechanics that are slow and visually busy. Mnemonic Tracing replaces both with a single 7-second gesture executed on blank space, which could matter for augmentative and alternative communication (AAC) users searching personal photo collections, for memory-support tools for people with dementia or aphasia, and for hands-free retrieval in lifelogging contexts. The training-free, calibration-free design lowers deployment cost, and using only fixation sequences sidesteps the privacy cost of streaming raw video to the cloud. The limitations are substantial: the study used a 4K display with a chin rest, sighted young adults (mean age 27.3), short encoding-to-recall delays, and AI-generated stimuli, none of which reflects the conditions or the target accessibility population the authors invoke. Whether the technique holds up under naturalistic head movement, longer memory delays, or age-related microsaccade drift, or works for people with aphantasia (addressed only indirectly through the VVIQ correlation), remains unanswered.

Tags: eye tracking · gaze interaction · gaze reinstatement · episodic memory · image retrieval · hands-free interaction · implicit interaction · wearable technology · multimodal AI