
Exploring Smart Speaker User Experience for People Who Stammer

Anna Bleakley, Daniel Rough, Abi Roper, Stephen Lindsay, Martin Porcheron, Minha Lee, Stuart Alan Nicholson, Benjamin R. Cowan, Leigh Clark · 2022 · Proceedings of the 24th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '22) · doi:10.1145/3517428.3544823

Summary

This paper investigates how people who stammer experience smart speakers in their daily lives over a three-week period. The researchers deployed Google Nest devices in the homes of 11 participants (mean age 33, recruited through the STAMMA charity in the UK) and collected data through a two-phase diary study followed by semi-structured reflective interviews. Phase one (11 days) focused on barriers to interaction, while phase two (10 days) explored how devices integrated into daily routines. Midway through, participants were assigned unmoderated tasks — adding calendar events, playing interactive games, and managing shopping lists — to probe differences across interaction types. The study used inductive thematic analysis via NVivo and Miro to identify three core themes: how participants used devices in daily routines and speech therapy practice, the interplay between contextual and technological barriers, and strategies participants developed to adapt to device limitations. The research builds on prior work highlighting that speech-enabled technologies have not been designed with diverse speech patterns in mind, and that people who stammer are largely excluded from the design and testing processes that shape these products.

Key findings

Participants quickly integrated smart speakers into morning routines for weather, news, music, and home appliance control, but faced significant accessibility barriers. The presence of other people increased stammering during interactions, with some participants avoiding the device in communal spaces entirely. Stress, fatigue, and anxiety negatively impacted fluency, creating "good speech days" and "bad speech days" that affected interaction success. Shorter commands were more successful, while longer utterances increased the likelihood of stammering. Specific sounds and letters caused consistent difficulties, particularly in wake words beginning with hard consonants. The device's timeout behaviour was especially problematic during blocks (involuntary stoppages in which no sound comes out), cutting users off mid-utterance and creating additional pressure. Past negative experiences with the device created a feedback loop, increasing anticipation of stammering and worsening subsequent interactions. Error recovery was critically lacking: the device did not show what it had interpreted, leaving users unable to diagnose whether failures were caused by their speech pattern or the device's limitations. Participants tended to blame themselves rather than the device. Crucially, several participants identified potential for smart speakers as speech therapy tools, using them to practise difficult sounds in a controlled, non-judgmental environment that eliminated the social pressures of human conversation.

Relevance

This study provides evidence that voice-only interfaces must be designed with speech diversity in mind. The findings have direct implications for smart speaker manufacturers and automatic speech recognition (ASR) developers: extending timeout durations, providing transparent error feedback showing what the device heard, offering a "what went wrong" diagnostic command, and detecting stammering events rather than smoothing over them would significantly improve accessibility. The research challenges the assumption that speech is inherently a more accessible input modality: for the estimated 8% of children and 2% of adults who stammer, voice-first interfaces can create new barriers. The study also highlights an underexplored opportunity: smart speakers as accessible speech therapy tools that could complement clinical practice, which is particularly valuable given limited access to speech and language therapists. For accessibility practitioners, this work underscores that inclusive voice interface design requires involving people with diverse speech patterns in design and testing, not just retrofitting accommodations after launch.

Tags: stammering · stuttering · speech disfluency · smart speakers · voice interface · speech recognition · conversational user interface · user experience · diary study · speech and language therapy