Deaf Users' Preferences Among Wake-Up Approaches during Sign-Language Interaction with Personal Assistant Devices

Vaishnavi Mande, Abraham Glasser, Becca Dingman, Matt Huenerfauth · 2021 · Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems (CHI EA '21) · doi:10.1145/3411763.3451592

Summary

This CHI 2021 Extended Abstract investigates a narrow but previously unexplored question: if future personal-assistant devices (Alexa, Google Assistant, etc.) could recognise sign language, how should Deaf users wake them up? Current wake-up mechanisms, which rely on speaking a wake-word such as 'Alexa' or 'Ok Google', are inaccessible to Deaf and Hard-of-Hearing (DHH) users, and text-input workarounds on these devices do not offer the hands-free experience that hearing users enjoy. Because computer-vision-based sign-language recognition is maturing, the authors (from Rochester Institute of Technology's ASL-fluent research lab) frame the work as a design-space exploration rather than a product evaluation.

The work is structured as two linked studies. Study 1 was a formative interview study with 21 DHH ASL signers (aged 18-25, mostly college-educated, recruited via university poster advertisements, interviewed face-to-face in ASL) that elicited wake-up ideas, analysed via affinity mapping. Six approaches emerged: four 'talk-to-talk' methods (signing an ASL sign-name for the device, fingerspelling its English name, waving in its direction, clapping) and two 'push-to-talk' methods (a smartphone app, a physical remote). Study 2 used a Wizard-of-Oz protocol to film video prototypes of all six approaches with a DHH actor and an Amazon Echo Show. The videos were then shown to 12 DHH ASL signers (aged 21-29; within-subjects design with Latin-Square counterbalancing; $40 compensation) for ranking and open-ended discussion, again analysed via affinity mapping.
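The Latin-Square counterbalancing used in Study 2 can be illustrated with a short sketch. The summary does not say which square the authors used or how participants were mapped to presentation orders, so everything here is an assumption: a balanced (Williams-style) square over the six approaches, with each of the six orders reused twice to cover 12 participants. The condition labels and function names are illustrative and do not come from the paper.

```python
# Illustrative sketch only: the paper's actual counterbalancing scheme
# is not described in this summary.

# Hypothetical labels for the six wake-up approaches.
CONDITIONS = [
    "sign-name",
    "fingerspelling",
    "waving",
    "clapping",
    "phone app",
    "remote",
]


def balanced_latin_square(n):
    """Return n orders (lists of condition indices) for an even n.

    The first row follows the pattern 0, 1, n-1, 2, n-2, ...; every
    later row adds 1 (mod n) to each entry of the row above it.
    """
    if n < 2 or n % 2 != 0:
        raise ValueError("this construction needs an even n >= 2")
    first = [0]
    lo, hi = 1, n - 1
    while len(first) < n:
        first.append(lo)
        lo += 1
        if len(first) < n:
            first.append(hi)
            hi -= 1
    return [[(c + r) % n for c in first] for r in range(n)]


def assign_orders(conditions, n_participants):
    """Give participant p the order in row p mod n of the square.

    Cycling through the rows is an assumed assignment rule.
    """
    square = balanced_latin_square(len(conditions))
    return [
        [conditions[i] for i in square[p % len(square)]]
        for p in range(n_participants)
    ]


if __name__ == "__main__":
    for p, order in enumerate(assign_orders(CONDITIONS, 12), start=1):
        print(f"P{p:02d}: {', '.join(order)}")
```

With this construction, each approach appears once in every serial position, and each approach immediately precedes every other approach exactly once across the six orders, which is why designs of this kind are used to spread order effects evenly in within-subjects studies.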

Key findings

Ranking from most to least preferred: (1) ASL sign-name for the device, (2) waving in its direction, (3) clapping, (4) physical remote, (5) phone app, (6) fingerspelling the English name. Sign-name won because it feels fast, is specific enough to avoid accidental wake-ups (unlike generic waving), is culturally natural for Deaf users, and does not require carrying an extra device; its main downside is its dependence on the environment (lighting, camera distance). Waving was appealing because it mirrors the culturally acceptable Deaf convention for getting attention and because users' hands stay in position to sign the command, but it was seen as prone to false positives from background motion. Fingerspelling ranked lowest because it is slower, vulnerable to misspellings of the device name, and uncomfortable for signers with finger-motor difficulties. Two cross-cutting factors dominated preferences: convenience (hands-free, no extra device, comparable to hearing users' wake-word experience) and reliability and privacy (avoiding accidental wake-ups, and concern about an always-on camera watching ASL conversations). Users prioritised convenience over the privacy and reliability benefits of push-to-talk methods.

Relevance

For accessibility practitioners and designers of conversational AI, this paper is a rare worked example of what it takes to genuinely match DHH users' expectations of equivalent access: not just a text-input retrofit but a full-stack sign-first interaction, including the attention-getting phase. The findings reinforce long-standing HCI wisdom that users will tolerate small reliability losses for large convenience gains, but also surface a distinctly Deaf-cultural concern: an always-on camera is a much more invasive surveillance vector than an always-on microphone for people whose primary language is visual. Limitations are significant: the samples are small (21 and 12 participants), participants were homogeneous (young, college-educated, mostly RIT-affiliated), and the prototypes were Wizard-of-Oz videos rather than working systems, so preferences may shift once users encounter real recognition errors. Future designers of sign-aware assistant devices should treat sign-name registration and a talk-to-talk default as the baseline expectation.

Tags: deaf and hard of hearing · sign language · personal assistants · voice interface · conversational user interfaces · american sign language · deaf culture · privacy · qualitative research