Adaptive Time Windows for Real-Time Crowd Captioning

Matthew J. Murphy, Christopher D. Miller, Walter S. Lasecki, Jeffrey P. Bigham · 2013 · CHI EA '13: CHI '13 Extended Abstracts on Human Factors in Computing Systems · doi:10.1145/2468356.2468360

Summary

This paper addresses a key barrier to real-time captioning for deaf and hard of hearing people: the high cost of professional stenographers, who can charge up to $200 per hour. It builds on the Legion:Scribe system, which demonstrated that groups of non-expert crowd workers can collectively caption speech in real time, with each worker transcribing a different portion of the audio and the partial transcripts then merged automatically. The authors tackle a specific limitation of that approach: it assigns fixed-length audio segments to every worker, regardless of individual typing ability or fatigue. The paper introduces an adaptive time window method that adjusts the duration of the audio segments presented to each crowd worker based on that worker's ongoing performance. After every segment, a weight-learning algorithm calculates a performance score for the worker, factoring in typing speed, accuracy, and an error penalty. This weight is then used to compute personalized in-periods (when the audio is louder and the worker should type) and out-periods (when the audio is quieter and the worker rests). Faster, more accurate typists receive longer in-periods with more content to transcribe, while workers who are slower or showing signs of fatigue receive shorter, more manageable segments. The system operates within the existing Legion:Scribe web interface, in which changes in audio volume signal when a worker should be actively captioning.
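A minimal Python sketch of how such an adaptive loop could work. The summary does not give the paper's actual formulas, so everything here is an assumption for illustration: the score function `segment_score`, the penalty value, the discount factor `ALPHA`, the cycle length, the in-period bounds, and the linear mapping from weight to in-period are all hypothetical.

```python
from dataclasses import dataclass

ALPHA = 0.5                # assumed discount factor: share of the old weight kept
ERROR_PENALTY = 1.0        # assumed extra cost, in words, of each wrong word
CYCLE_SECONDS = 16.0       # assumed length of one in-period plus one out-period
MIN_IN, MAX_IN = 2.0, 8.0  # assumed bounds on the in-period, in seconds


@dataclass
class Worker:
    weight: float = 0.5    # assumed neutral starting weight in [0, 1]


def segment_score(words_typed: int, words_wrong: int, words_spoken: int) -> float:
    """Hypothetical per-segment score, clamped to [0, 1].

    Wrong words do not count as captured and are additionally penalised.
    """
    if words_spoken <= 0:
        return 0.0
    correct = words_typed - words_wrong
    raw = (correct - ERROR_PENALTY * words_wrong) / words_spoken
    return max(0.0, min(1.0, raw))


def update_weight(worker: Worker, score: float) -> float:
    """Blend the previous weight with the newest score (moving average)."""
    worker.weight = ALPHA * worker.weight + (1.0 - ALPHA) * score
    return worker.weight


def periods(worker: Worker) -> tuple[float, float]:
    """Map the weight linearly onto an (in_period, out_period) pair in seconds."""
    in_period = MIN_IN + worker.weight * (MAX_IN - MIN_IN)
    return in_period, CYCLE_SECONDS - in_period


worker = Worker()
update_weight(worker, segment_score(words_typed=20, words_wrong=0, words_spoken=20))
in_period, out_period = periods(worker)
```

Under these assumed constants, a worker who starts at weight 0.5 and transcribes all 20 words of a segment correctly moves to weight 0.75, which gives a 6.5 second in-period and a 9.5 second out-period. A later segment with fewer words and some errors would pull the weight, and so the in-period, back down.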

Key findings

In experiments with 24 Mechanical Turk workers (12 per condition), the adaptive segment approach produced substantial improvements over fixed segments. Average coverage (the proportion of spoken words successfully captured) rose from 14.76% to 22.76%, a relative increase of 54.15% (p < 0.05). The F1 score, the harmonic mean of precision and recall, rose from 0.242 to 0.349, a relative increase of 44.33% (p < 0.05). Accuracy declined slightly from 84.33% to 80.11%, though this change was not statistically significant, and latency improved marginally from 5.05 to 4.98 seconds. The authors note that while 22.76% coverage may appear low in absolute terms, the expected coverage for any single worker is only about 25% of the content because of variations in speaking rate, so the adaptive approach reached roughly 91% of the theoretical per-worker coverage goal. The weight-learning formula uses an exponentially weighted moving average (controlled by a discount factor, alpha) to balance historical and recent performance, allowing the system to respond to fatigue or changes in speaking pace.
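The percentage gains quoted here are relative changes, not percentage-point differences, and can be checked against the rounded figures with a few lines of Python. The helper name `relative_change` is ours. The results differ slightly from the reported 54.15% and 44.33%, which is consistent with the inputs being rounded, though the source does not say how the reported values were computed.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a percentage."""
    return (after - before) / before * 100.0


coverage_gain = relative_change(14.76, 22.76)  # about 54.2 (reported: 54.15)
f1_gain = relative_change(0.242, 0.349)        # about 44.2 (reported: 44.33)
share_of_goal = 22.76 / 25.0 * 100.0           # about 91.0, the "roughly 91%"
```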

Relevance

This research represents an early and important exploration of making real-time captioning more affordable and accessible through crowdsourcing. While professional CART services remain the gold standard, their cost creates significant access barriers, particularly in informal settings such as classroom discussions, meetings, or social events, where booking a stenographer is impractical. The adaptive approach demonstrates that even non-expert typists can contribute meaningfully to captioning when the system accommodates individual differences in skill and stamina. For accessibility practitioners, this work highlights the potential of human-computation approaches to supplement or replace expensive professional services, democratizing access to real-time text alternatives for spoken content. The paper is a 2013 work-in-progress poster, so the sample size is small, and the system relied on ground-truth transcripts to measure worker performance, a limitation the authors acknowledge. Nevertheless, the core insight that adaptive task allocation improves crowd worker performance has broad implications for the design of any accessibility system that relies on distributed human effort.

Tags: real-time captioning · crowdsourcing · deaf and hard of hearing · assistive technology · human computation · CART alternatives