Developing usable CAPTCHAs for blind users
Jonathan Holman, Jonathan Lazar, Jinjuan Heidi Feng, John D'Arcy · 2007 · Proceedings of the 9th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '07) · doi:10.1145/1296843.1296894
Summary
This short ASSETS 2007 poster from a Towson University team led by Jonathan Lazar (joined by Jonathan Holman, Jinjuan Heidi Feng, and John D'Arcy of Notre Dame) addresses a long-standing accessibility problem: text-based CAPTCHAs (the distorted-letter image puzzles used to keep web bots out of forms) completely exclude blind users, while audio CAPTCHAs read out distorted spoken text that becomes unusable once enough background noise is added to defeat speech recognition. The authors first interviewed a group of blind users to identify their biggest security and privacy challenges, and CAPTCHA emerged as a leading concern. They then prototyped a new picture-plus-sound CAPTCHA. Each puzzle presents a randomly chosen real-world object as both an image and an audio clip of the sound the object makes (e.g. a photograph of a train paired with a chugging-train sound; a bird, cat, drum, or piano), and the user picks the object's name from a drop-down list seeded with plausible distractors. Fifteen picture/sound pairs across four categories (transportation, animals, weather, and musical instruments) were chosen for cross-cultural recognisability. The prototype was tested for screen-reader compatibility with both Window-Eyes and JAWS, and the authors note that, in principle, only the drop-down labels would need translating to make the system multilingual. The paper positions itself relative to CMU's ESP-game-style image-CAPTCHA work, which it extends with an audio modality.
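The puzzle flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the catalogue is a partial, hypothetical one (train, bird, cat, drum, and piano are named in the summary; "thunder" is an assumed stand-in for the weather category), and the file names, the `Challenge` type, the number of distractors, and the function names are all invented for the example.

```python
import random
from dataclasses import dataclass

# Hypothetical catalogue mapping object name -> category.
# "thunder" is an assumed example; the summary does not list the weather items.
CATALOGUE = {
    "train": "transportation",
    "bird": "animals",
    "cat": "animals",
    "thunder": "weather",
    "drum": "musical instruments",
    "piano": "musical instruments",
}


@dataclass(frozen=True)
class Challenge:
    answer: str        # correct object name (kept server-side)
    image_file: str    # assumed naming scheme, e.g. "train.jpg"
    sound_file: str    # assumed naming scheme, e.g. "train.wav"
    options: tuple     # labels shown in the drop-down list


def make_challenge(rng, n_distractors=3):
    """Pick a random object and build a shuffled drop-down of answer plus distractors."""
    names = sorted(CATALOGUE)
    answer = rng.choice(names)
    others = [name for name in names if name != answer]
    options = rng.sample(others, n_distractors) + [answer]
    rng.shuffle(options)
    return Challenge(answer, f"{answer}.jpg", f"{answer}.wav", tuple(options))


def check(challenge, selected):
    """Return True if the label chosen in the drop-down matches the object."""
    return selected == challenge.answer


if __name__ == "__main__":
    # A real deployment would want an unpredictable source such as
    # random.SystemRandom(); a plain Random is used here for simplicity.
    puzzle = make_challenge(random.Random())
    print(puzzle.image_file, puzzle.sound_file, puzzle.options)
```

Because the image and the sound both map to the same label, a sighted user and a screen-reader user answer the same drop-down, and only the option labels would need translating for another language.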
Key findings
A preliminary evaluation with five blind and five sighted participants found the new CAPTCHA usable for both groups. Sighted users averaged 3.5 minutes (SD 35 s) to solve all fifteen puzzles with no errors; notably, most chose to listen to the sound clips even though they could see the images. Blind users averaged 8.8 minutes (SD 88 s); three participants made one error and two made two errors, and every error was resolved on the second attempt. Subjective feedback was strongly positive from both groups and especially from blind participants, all of whom said they would prefer the new CAPTCHA to existing text-based ones. The authors acknowledge two security limitations. First, a finite catalogue of easily identifiable image/sound pairs makes brute-force enumeration plausible (mitigated, as with text CAPTCHAs, by attempt-count lockouts). Second, the puzzles are cognitively undemanding, and increasing their cognitive demand is identified as future work. They also note that the image-recognition and sound-recognition software available at the time could not reliably identify natural objects from photos and ambient sounds, which provides the human-machine asymmetry that CAPTCHAs depend on.
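The brute-force concern and the lockout mitigation can be made concrete with a small sketch. Everything here is an assumption for illustration: the summary does not say how many options the drop-down shows, how many attempts a lockout would allow, or how the authors would implement it. The estimate assumes a bot guesses uniformly at random and faces a fresh puzzle on each attempt, and `AttemptLimiter` is a hypothetical helper.

```python
def blind_guess_success(n_options, attempts):
    """Chance that uniform random guessing solves at least one of
    `attempts` independent puzzles with `n_options` choices each."""
    return 1 - (1 - 1 / n_options) ** attempts


class AttemptLimiter:
    """Hypothetical attempt-count lockout keyed by a client identifier."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self._failures = {}  # client_id -> consecutive wrong answers

    def is_locked(self, client_id):
        return self._failures.get(client_id, 0) >= self.max_failures

    def record(self, client_id, correct):
        """Register an answer; return True only if it is accepted."""
        if self.is_locked(client_id):
            return False  # locked clients are rejected even if correct
        if correct:
            self._failures.pop(client_id, None)
            return True
        self._failures[client_id] = self._failures.get(client_id, 0) + 1
        return False


if __name__ == "__main__":
    # Assumed numbers: 15 drop-down options, lockout after 3 failures.
    print(round(blind_guess_success(15, 3), 3))
```

Under these assumed numbers, a bot limited to three guesses succeeds roughly 19% of the time, which illustrates why a lockout alone is a weak defence when the catalogue is small and why the authors flag catalogue size as a limitation.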
Relevance
For accessibility practitioners, this paper belongs to the critical pre-history of multimodal-CAPTCHA design and is one of the early ASSETS contributions on usable security for blind users, a research thread that Lazar and colleagues have continued in later work. The picture-plus-sound design anticipates the direction the wider CAPTCHA field eventually took (e.g. Google reCAPTCHA's image-grid challenges and audio-fallback combinations). Its practical takeaway, that any security challenge presented to users must offer non-visual paths that are not just bolted on as a degraded fallback, remains directly relevant to today's accessible-security work. The limitations are substantial, and the authors are honest about them: only ten participants in total, no longitudinal use, no measurement of bot resistance against real adversarial attacks, and a small fixed catalogue of fifteen objects. Modern advances in computer vision and audio classification have largely closed the human-machine asymmetry the paper relies on, so the specific design is now obsolete. The underlying insight, that accessible CAPTCHAs must be designed in from the beginning rather than retrofitted, is more important than ever as machine learning erodes every CAPTCHA modality in turn.
Tags: CAPTCHA · audio CAPTCHA · blindness and low vision · screen readers · web accessibility · security · usability · universal usability · Turing test · visual impairment · multimodal · JAWS · Window-Eyes