ScratchThat: Supporting Command-Agnostic Speech Repair in Voice-Driven Assistants

Jason Wu, Karan Ahuja, Richard Li, Victor Chen, Jeffrey Bigham · 2019 · Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies · doi:10.1145/3328934

Summary

This paper presents ScratchThat, a command-agnostic speech repair system for voice-driven virtual assistants that lets users correct a voice command naturally without repeating the entire query. Current voice assistants such as Alexa, Google Assistant, and Siri lack natural methods for correcting mistakes: users must cancel and restate their entire command. ScratchThat addresses this by letting users append a corrective clause to the original command using natural language phrases like "actually make that" or "I meant." The system first identifies replaceable entities (parameters) in the original query through part-of-speech chunking and named entity recognition. It then matches each entity in the correction clause to the most appropriate original parameter using a combination of semantic similarity (via the Universal Sentence Encoder) and syntactic similarity (via dependency tree analysis). The matching is formulated as a minimum-cost assignment problem and solved with the Hungarian algorithm. Crucially, ScratchThat is command-agnostic: it works as a layer on top of any voice assistant command, so developers do not need to program correction handling into each skill individually.
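The matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation. The similarity scores are hand-picked placeholders standing in for Universal Sentence Encoder and dependency-tree comparisons, and the equal weighting of the two scores, the function names, and the example command are all assumptions made for illustration. Only the overall shape (combine two similarities into a cost, then solve a minimum-cost assignment with the Hungarian algorithm) comes from the summary.

```python
def combined_cost(semantic, syntactic, weight=0.5):
    """Turn two similarity matrices into one cost matrix.

    The 50/50 weighting is an assumption, not a value from the paper.
    Rows are correction entities, columns are original entities.
    """
    return [
        [1.0 - (weight * s + (1.0 - weight) * t) for s, t in zip(sem_row, syn_row)]
        for sem_row, syn_row in zip(semantic, syntactic)
    ]


def hungarian(cost):
    """Minimum-cost assignment (Hungarian algorithm, O(n^2 * m)).

    Requires rows <= columns. Returns {row_index: column_index}.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j] = row matched to column j (1-based, 0 = free)
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return {p[j] - 1: j - 1 for j in range(1, m + 1) if p[j] != 0}


def apply_repair(command, original_entities, correction_entities, assignment):
    """Substitute each matched original entity with its correction."""
    for corr_idx, orig_idx in assignment.items():
        command = command.replace(
            original_entities[orig_idx], correction_entities[corr_idx], 1
        )
    return command


# Hypothetical example: "set an alarm for 7 am tomorrow ... actually make that 8 am Friday"
command = "set an alarm for 7 am tomorrow"
original_entities = ["an alarm", "7 am", "tomorrow"]
correction_entities = ["8 am", "Friday"]

# Placeholder similarity scores (invented for illustration only).
semantic = [[0.2, 0.9, 0.5], [0.1, 0.4, 0.8]]
syntactic = [[0.3, 1.0, 0.5], [0.3, 0.5, 1.0]]

assignment = hungarian(combined_cost(semantic, syntactic))
repaired = apply_repair(command, original_entities, correction_entities, assignment)
print(repaired)  # set an alarm for 8 am Friday
```

Framing the match as an assignment problem means each correction entity claims at most one original parameter, so two corrections cannot both overwrite the same slot even when their individual best matches coincide.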

Key findings

The researchers conducted three evaluations. First, an elicitation study with 120 crowdworkers (573 valid responses) revealed that speech repair patterns in voice commands differ significantly from conversational speech repair models: 40% of responses contained no edit terms at all, and edit terms appeared at the beginning of correction clauses only 49% of the time, meaning traditional models that rely on fixed edit term placement would misinterpret at least 30% of corrections. Second, a user study with 10 participants compared four correction methods (repeat, delete, replace, ScratchThat) and found that ScratchThat was significantly faster than repeat (p < 0.05), replace (p < 0.01), and delete (p < 0.01), and that it scored lower than the delete method on the NASA TLX measures of effort and frustration. Third, the algorithm achieved 77% human-rated accuracy and a 0.94 BLEU score on corrected commands, demonstrating effective automated repair despite using no training data or command-specific logic.

Relevance

This research directly addresses a significant usability barrier in voice interfaces that disproportionately affects people who rely on speech as their primary input method, including people with motor disabilities who cannot easily use touchscreens or keyboards to edit misrecognized commands. The inability to naturally correct voice commands forces users to repeat entire utterances, which is particularly burdensome for people with speech disabilities, fatigue-related conditions, or those using voice control in accessibility contexts like screen reader navigation. ScratchThat's command-agnostic design is especially valuable because it could improve error correction across all voice assistant skills without requiring individual developers to implement correction handling. For practitioners building voice interfaces, the finding that traditional speech repair models fail for command interactions suggests that correction mechanisms need to be specifically designed for the voice assistant context rather than borrowed from conversational speech research.

Tags: voice interface · speech recognition · conversational agents · error correction · natural language processing · smart speakers · virtual assistants · usability

Standards referenced: W3C Web Accessibility Initiative