Chorus: A Crowd-Powered Conversational Assistant

Walter S. Lasecki, Rachel Wesley, Jeffrey Nichols, Anand Kulkarni, James F. Allen, Jeffrey P. Bigham · 2013 · UIST '13: Proceedings of the 26th Annual ACM Symposium on User Interface Software and Technology · doi:10.1145/2501988.2502057

Summary

This paper presents Chorus, a crowd-powered conversational assistant that lets users hold natural, continuous conversations with what appears to be a single partner but is actually backed by multiple crowd workers operating in real time. The system addresses a fundamental limitation of automated dialogue systems of the era: their inability to handle the full complexity of human conversation, including context-dependent references, memory of past interactions, common-sense knowledge, and incomplete or ambiguous statements. Chorus tackles three core challenges in making crowd-powered conversation work. First, a collaborative reasoning system has workers both propose and vote on responses. A majority-vote mechanism determines when a response is "locked in" and forwarded to the user, which filters out off-topic, unhelpful, or redundant responses. Second, a game-theoretic incentive mechanism uses a multi-tiered point system (20 points per interaction, 1000 for a vote toward an accepted response, 3000 for proposing the accepted response) that rewards quality contributions and discourages spam by giving diminishing returns to repeated inputs made without user feedback. Third, a curated memory system maintains a shared "Working Memory" space in which a separate set of crowd workers highlights important facts and conversation history, so that workers joining mid-conversation can quickly get up to speed. The system was evaluated through a formative study with a single user and 33 workers, followed by a larger experiment with 12 end users and 100 crowd workers across four conditions: Filtered (full Chorus), Unfiltered (no voting), Solo (single worker), and a Google baseline.
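The propose, vote, and reward loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes "majority" means a strict majority of the currently active workers, that each proposal or first vote counts as one 20-point interaction, and that the 1000- and 3000-point bonuses are paid at the moment a response locks in. The names ChorusRound, propose, and vote are hypothetical, and the diminishing-returns rule for repeated inputs is not modeled beyond ignoring duplicate votes.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field

# Point values from the summary; the rest of this model is an assumption.
INTERACTION_POINTS = 20   # per proposal or (first) vote
VOTE_POINTS = 1000        # per vote toward the accepted response
PROPOSAL_POINTS = 3000    # for proposing the accepted response


@dataclass
class Proposal:
    text: str
    proposer: str
    voters: set[str] = field(default_factory=set)


class ChorusRound:
    """Hypothetical model of one conversational turn in Chorus."""

    def __init__(self, active_workers: int) -> None:
        self.active_workers = active_workers
        self.proposals: list[Proposal] = []
        self.points: defaultdict[str, int] = defaultdict(int)
        self.accepted: Proposal | None = None

    def propose(self, worker: str, text: str) -> int:
        """Record a candidate response and return its id."""
        self.points[worker] += INTERACTION_POINTS
        self.proposals.append(Proposal(text, worker))
        return len(self.proposals) - 1

    def vote(self, worker: str, proposal_id: int) -> None:
        """Record a vote; lock the response in once a strict majority agrees."""
        proposal = self.proposals[proposal_id]
        if self.accepted is not None or worker in proposal.voters:
            return  # turn already decided, or a duplicate vote: no points
        self.points[worker] += INTERACTION_POINTS
        proposal.voters.add(worker)
        if len(proposal.voters) > self.active_workers / 2:
            self._accept(proposal)

    def _accept(self, proposal: Proposal) -> None:
        """Forward the response to the user and pay out the bonus points."""
        self.accepted = proposal
        self.points[proposal.proposer] += PROPOSAL_POINTS
        for voter in proposal.voters:
            self.points[voter] += VOTE_POINTS


if __name__ == "__main__":
    turn = ChorusRound(active_workers=5)
    pid = turn.propose("w1", "Try the restaurant two blocks north.")
    for worker in ("w1", "w2", "w3"):  # 3 of 5 workers is a strict majority
        turn.vote(worker, pid)
    print(turn.accepted.text)
    print(dict(turn.points))  # {'w1': 4040, 'w2': 1020, 'w3': 1020}
```

In this run, w1 both proposes and votes, so it collects the proposal bonus and a vote bonus; the summary does not say whether Chorus paid both in that case.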

Key findings

Chorus demonstrated strong performance across multiple metrics. In consistency experiments, the system answered 92.85% of user queries appropriately, and 95.60% of the responses visible to the user were on-topic after the voting filter removed 37.5% of crowd-generated messages. Only 3.77% of questions went fully unaddressed. Response times dropped sharply with multiple workers: the Filtered condition averaged 50.13 seconds per response, a 51.5% reduction relative to the Solo condition's 103.4 seconds (p < 0.05), while the Unfiltered condition was fastest at 44.6 seconds (a 56.9% reduction, p < 0.01). The crowd-powered conditions met 100% of information goals across all 24 trials, whereas the Solo condition failed in a third of cases: in 2 tests the single worker was unresponsive, and in 2 more it failed to finish within the 30-minute bound. Memory experiments showed that the crowd could recall facts from working memory in 8 of 10 conversations; prompted crowds displayed a cascade effect, in which highlighting forgotten facts helped workers recover previously missed information. Users preferred Chorus over Google search, noting that the conversational format surfaced suggestions they would not have found through keyword queries alone, and that workers asked clarifying questions and proactively offered related information.
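The speed percentages are relative reductions in mean response time against the Solo baseline, computed as (Solo - condition) / Solo. A quick Python check using only the means reported above reproduces both figures; the helper name percent_reduction is ours, not the paper's.

```python
def percent_reduction(baseline: float, treatment: float) -> float:
    """Relative reduction of `treatment` versus `baseline`, in percent."""
    return 100 * (baseline - treatment) / baseline


SOLO_MEAN_S = 103.4  # Solo condition, mean seconds per response
for name, mean_s in [("Filtered", 50.13), ("Unfiltered", 44.6)]:
    print(f"{name}: {percent_reduction(SOLO_MEAN_S, mean_s):.1f}% reduction vs. Solo")
# Filtered: 51.5% reduction vs. Solo
# Unfiltered: 56.9% reduction vs. Solo
```

Expressed as a rate instead, the Filtered condition delivered responses about 2.06 times as fast as Solo (103.4 / 50.13), which is why "51.5% faster" would be an ambiguous way to state the result.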

Relevance

Although Chorus is not designed as an accessibility tool, it has significant implications for accessible technology design. The paper explicitly discusses situational disability (the driving scenario, in which a user cannot use their hands or eyes) and envisions Chorus as a voice-enabled assistant for such contexts. The concept of combining human intelligence with machine scalability to create conversational interfaces prefigures modern AI assistants, but with humans providing the understanding that automated systems of 2013 could not. For accessibility practitioners, several insights are transferable: the memory curation system addresses a challenge common to any assistive service that involves rotating support personnel (such as relay services or remote support); the incentive design for maintaining quality in real-time crowd work applies to crowdsourced accessibility services such as audio description or captioning; and the finding that conversational interfaces surface information that keyword search misses is relevant to designing accessible search alternatives. The paper also raises important privacy considerations, noting that crowd-powered systems require users to share personal information with strangers, a concern that is especially relevant when the information involves disability-related needs or medical details.

Tags: crowdsourcing · conversational assistants · human computation · dialog systems · real-time systems · situational disability