Scopist: Building a Skill Ladder into Crowd Transcription
Jeffrey P. Bigham, Kristin Williams, Nila Banerjee, John Zimmerman · 2017 · Proceedings of the 14th International Web for All Conference (W4A) · doi:10.1145/3058555.3058562
Summary
This paper introduces Scopist, a JavaScript application designed to teach crowd workers stenotype (a chording-based text entry method used by professional real-time captioners) while they perform audio transcription microtasks. The research addresses a fundamental problem in crowd work platforms: traditional employment provides skill ladders (progressive opportunities to acquire new skills that lead to advancement), but crowd transcription work typically does not. Audio transcription is one of the most common tasks on platforms like Amazon Mechanical Turk, comprising 26% of all microtasks, yet workers earn only $0.01-0.02 per sentence with no path to advancement. In contrast, professional real-time captioners using stenotype can earn up to $300 per hour, but learning stenotype traditionally requires 1-3 years of dedicated training. Scopist aims to bridge this gap by introducing stenotype chords one at a time during regular transcription microtasks, allowing workers to gradually incorporate chording into their touch-typing workflow. The system intercepts keyboard events and uses an algorithm to distinguish touch-typing from stenotype input, accepting both as valid entries. Scopist maps stenotype chords to a standard QWERTY keyboard using the Open Steno Project dictionaries and provides visual guidance through an on-screen keyboard showing which keys to press for each chord. The tool is built as an in-browser application on Express/Node.js and is designed to be embedded in existing crowdsourcing platforms.
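The keystroke handling described above can be sketched as follows. This is a minimal TypeScript illustration under stated assumptions, not Scopist's actual algorithm: it assumes that a burst of keys all pressed before any is released, with key-downs falling within a short window (a hypothetical 50 ms `maxSpreadMs`), is a chord, and that everything else is touch-typing. The QWERTY-to-steno layout is an assumed Plover-style mapping (number bar omitted), `DICTIONARY` holds a few hypothetical entries in the `{stroke: translation}` shape used by Open Steno dictionaries, and the function names `splitBursts`, `classifyBurst`, and `toStroke` are illustrative.

```typescript
// Hypothetical sketch: split key events into bursts and classify each burst
// as a stenotype chord or as ordinary touch-typing. Not Scopist's real code.

type KeyEvt = { key: string; type: "down" | "up"; t: number }; // t in ms

type Interpretation =
  | { kind: "chord"; stroke: string; text: string | undefined }
  | { kind: "typing"; text: string };

// Assumed Plover-style QWERTY-to-steno layout (number bar omitted).
const QWERTY_TO_STENO = new Map<string, string>([
  ["q", "S-"], ["a", "S-"], ["w", "T-"], ["s", "K-"], ["e", "P-"], ["d", "W-"],
  ["r", "H-"], ["f", "R-"], ["c", "A-"], ["v", "O-"], ["t", "*"], ["y", "*"],
  ["g", "*"], ["h", "*"], ["n", "-E"], ["m", "-U"], ["u", "-F"], ["j", "-R"],
  ["i", "-P"], ["k", "-B"], ["o", "-L"], ["l", "-G"], ["p", "-T"], [";", "-S"],
  ["[", "-D"], ["'", "-Z"],
]);

// Canonical steno order, used to spell a stroke such as "KAT".
const STENO_ORDER = ["S-", "T-", "K-", "P-", "W-", "H-", "R-", "A-", "O-", "*",
  "-E", "-U", "-F", "-R", "-P", "-B", "-L", "-G", "-T", "-S", "-D", "-Z"];
const MIDDLE = new Set(["A-", "O-", "*", "-E", "-U"]);

// Hypothetical entries in the {stroke: translation} shape of Open Steno dictionaries.
const DICTIONARY = new Map<string, string>([["KAT", "cat"], ["SAT", "sat"], ["-T", "the"]]);

// Spell a set of steno keys as a stroke; a lone "-" marks right-hand keys
// when no vowel or star separates them from the left hand.
function toStroke(keys: Set<string>): string {
  let out = "";
  let pastMiddle = false;
  for (const k of STENO_ORDER) {
    if (!keys.has(k)) continue;
    if (MIDDLE.has(k)) {
      pastMiddle = true;
      out += k.replace("-", "");
    } else if (k.startsWith("-")) {
      if (!pastMiddle) { out += "-"; pastMiddle = true; }
      out += k.slice(1);
    } else {
      out += k.slice(0, -1);
    }
  }
  return out;
}

// A burst runs from a key-down until no keys are held.
function splitBursts(events: KeyEvt[]): KeyEvt[][] {
  const bursts: KeyEvt[][] = [];
  const held = new Set<string>();
  let current: KeyEvt[] = [];
  for (const e of events) {
    current.push(e);
    if (e.type === "down") held.add(e.key); else held.delete(e.key);
    if (held.size === 0) { bursts.push(current); current = []; }
  }
  if (current.length > 0) bursts.push(current);
  return bursts;
}

// Chord if: >= 2 steno keys, all pressed before any release, key-downs within maxSpreadMs.
function classifyBurst(burst: KeyEvt[], maxSpreadMs = 50): Interpretation {
  const downs = burst.filter((e) => e.type === "down");
  const firstUp = burst.findIndex((e) => e.type === "up");
  const lastDown = burst.map((e) => e.type).lastIndexOf("down");
  const allDownFirst = firstUp === -1 || lastDown < firstUp;
  const spread = downs.length > 0 ? downs[downs.length - 1].t - downs[0].t : 0;
  const allSteno = downs.every((e) => QWERTY_TO_STENO.has(e.key));
  if (downs.length >= 2 && allDownFirst && spread <= maxSpreadMs && allSteno) {
    const stroke = toStroke(new Set(downs.map((e) => QWERTY_TO_STENO.get(e.key)!)));
    return { kind: "chord", stroke, text: DICTIONARY.get(stroke) };
  }
  return { kind: "typing", text: downs.map((e) => e.key).join("") };
}
```

The timing window is the interesting design choice here: a fast typist rolling from `t` to `h` also holds two keys at once, so a pure "held together" test would misread that as a chord. Any real threshold would need tuning against data such as the keysets collected in Study 1.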
Key findings
Three crowd studies were conducted on Amazon Mechanical Turk. Study 1 (150 crowdworkers) showed that Scopist can distinguish touch-typing from stenotype input with 94% accuracy: only 9 of the 150 collected keysets contained errors. Studies 2 and 3 (20 and 30 participants, respectively) examined how learning a new chord affects transcription performance. Results showed a significant learning cost. In the text transcription study, workers completed tasks faster in the baseline condition (M=30.0s) than in the chording condition (M=36.8s), and with chording they used more backspace corrections (M=7.0 vs M=4.9) and pressed more keys in total (M=175.2 vs M=167.4). The audio transcription study showed an even greater performance cost, with baseline completion at M=53.4s versus M=80.0s for chording. All differences were statistically significant (p<0.025). These results confirm that learning a new skill during work imposes a short-term performance cost, consistent with prior research on keyboard shortcut adoption. However, this initial slowdown is expected to diminish as workers develop muscle memory, and the long-term efficiency gains from stenotype (200-300 WPM versus 90 WPM for touch-typing) represent a substantial career advancement opportunity. Drawing on established principles for learning keyboard shortcuts, the researchers designed an improved UI that keeps chord mappings visible, supports physical rehearsal, leverages spatial memory through visual cues, and keeps shortcuts stable.
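The reported figures can be turned into relative costs with simple arithmetic. The TypeScript sketch below (the name `percentIncrease` is illustrative) computes ratios from the reported means only; these are rough indications of effect size, not statistics from the paper, and they ignore per-participant variation.

```typescript
// Back-of-the-envelope ratios from the reported means (not the paper's own statistics).

// Percentage increase of the chording condition over the baseline condition.
function percentIncrease(baseline: number, chording: number): number {
  return ((chording - baseline) / baseline) * 100;
}

// Study 1: 9 of 150 keysets contained classification errors.
const study1Accuracy = 1 - 9 / 150;

const learningCost = {
  textTimeSeconds: percentIncrease(30.0, 36.8),
  textBackspaces: percentIncrease(4.9, 7.0),
  textTotalKeys: percentIncrease(167.4, 175.2),
  audioTimeSeconds: percentIncrease(53.4, 80.0),
};

// Potential long-run speed ratio: stenotype at 200-300 WPM vs touch-typing at 90 WPM.
const speedup = [200 / 90, 300 / 90];

for (const [measure, pct] of Object.entries(learningCost)) {
  console.log(`${measure}: +${pct.toFixed(1)}%`);
}
console.log(`Study 1 accuracy: ${(study1Accuracy * 100).toFixed(0)}%`);
console.log(`stenotype speedup: ${speedup.map((x) => x.toFixed(1)).join("-")}x`);
```

Run as-is, this prints increases of about 22.7% (text task time), 42.9% (backspaces), 4.7% (total keys), and 49.8% (audio task time), an accuracy of 94%, and a 2.2-3.3x speed ratio. In other words, the relative time penalty in the audio task was roughly double that of the text task, while the potential long-run payoff is a two- to threefold increase in entry speed.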
Relevance
This research sits at the intersection of accessibility and labour economics in a way that benefits both communities. For people who are deaf or hard of hearing, real-time captioning is critical for accessing live events, education, and workplace communication, yet there is a chronic shortage of qualified captioners. By creating a pathway from low-skilled crowd transcription to professional real-time captioning, Scopist could help address this supply gap. The skill ladder concept also has broader implications for crowd work design: rather than treating microtasks as dead-end labour, platforms could be designed to progressively build workers' capabilities toward valuable accessibility careers, including captioning, audio description, and image description. The honest acknowledgment that learning on the job imposes performance costs is important for platform designers: workers need information about predicted long-term gains to make informed decisions about investing effort in skill development. Building on the open-source Open Steno Project dictionaries also makes the approach easier for others to reuse and extend.
Tags: crowdsourcing · captioning · stenography · deaf accessibility · transcription · skill development · text entry · real-time captioning