Zensors: Adaptive, Rapidly Deployable, Human-Intelligent Sensor Feeds

Gierad Laput, Walter S. Lasecki, Jason Wiese, Robert Xiao, Jeffrey P. Bigham, Chris Harrison · 2015 · Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI 2015) · doi:10.1145/2702123.2702416

Summary

Zensors presents a novel sensing platform that combines real-time human intelligence from online crowd workers with machine learning to create adaptive, rapidly deployable intelligent sensors. The system addresses a fundamental gap in smart environment technology: traditional sensors are expensive to install, produce raw data that does not align with the natural language questions people actually want to ask, and computer vision systems require significant development effort while remaining brittle in changing environments. Zensors allows users to repurpose old smartphones, tablets, or inexpensive WiFi cameras as sensor hosts. Through a mobile app, users select a region of the camera view, type a natural language question (such as "how many glasses need a refill?" or "is the door open?"), and choose a data type (yes/no, number, scale, or multiple choice). The system dispatches camera images to Mechanical Turk crowd workers who provide immediate human-level answers. These crowd-labeled responses simultaneously serve as training data for machine learning classifiers that gradually take over the sensing task. When ML accuracy reaches a sufficient threshold, the system hands off to automated classification, with periodic crowd validation to detect accuracy degradation from environmental changes. The researchers conducted an application space study with 13 interaction designers, interviewed nine freelance developers to estimate CV development costs (averaging ,044 and 4.5 weeks per sensor), and deployed 16 sensors across four real-world environments over periods of 10 days to 3 weeks.
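The crowd-to-ML handoff described above can be sketched as a small controller. This is a minimal illustration and not the authors' implementation: the class name `HandoffController`, the agreement threshold, and the sliding-window size are all assumptions, since the summary says only that the system hands off at "a sufficient threshold" and reverts when periodic crowd validation shows accuracy has degraded.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class HandoffController:
    """Tracks crowd-ML agreement and decides who answers the next frame.

    All names and default values here are hypothetical, chosen for illustration.
    """
    threshold: float = 0.9   # assumed agreement level required for handoff
    window: int = 50         # assumed number of recent comparisons to keep
    mode: str = "crowd"      # every sensor starts fully crowd-powered
    history: deque = field(default_factory=deque)

    def record(self, crowd_answer, ml_answer) -> str:
        """Compare one crowd label with the classifier's prediction.

        Called for every frame while in crowd mode, and only for the
        periodic validation frames while in ML mode.
        """
        self.history.append(crowd_answer == ml_answer)
        if len(self.history) > self.window:
            self.history.popleft()
        # Only switch modes once a full window of comparisons is available.
        if len(self.history) == self.window:
            agreement = sum(self.history) / self.window
            self.mode = "ml" if agreement >= self.threshold else "crowd"
        return self.mode
```

With a window of 4 and a threshold of 0.75, four consecutive agreements switch the sensor to "ml", and two later disagreements on validation frames drop agreement to 0.5 and return it to "crowd". This mirrors the reversion behavior the summary reports for environmental changes.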

Key findings

The crowd-ML hybrid approach achieved mean crowd accuracy of 77.4% (median 76.0%), with the best sensor reaching 96.8% accuracy (kappa 0.859). The image similarity rejection mechanism filtered out an average of 61.2% of redundant images, significantly reducing crowd costs. Quantitative, context-free questions ("is the door closed?") performed substantially better than subjective ones ("how orderly is the line?"), highlighting the importance of question design. ML classifiers could approximate crowd-level accuracy using roughly one week of training data, enabling automated sensors costing as little as .40 to train. A fully crowd-powered sensor taking images every 10 minutes cost approximately /month, but costs could be reduced by 67% by scaling down from three workers to one as crowd-ML agreement increased. Sensors that failed did so primarily due to poorly designed questions or subjective judgments, not computer vision limitations. The system demonstrated resilience to environmental changes by automatically reverting from ML to crowd power when accuracy dropped, such as after the first snowfall in a parking lot.
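The image similarity rejection step mentioned above can be illustrated with a simple frame filter. This is a sketch under stated assumptions: the summary does not say which similarity measure Zensors uses, so mean absolute pixel difference, the `min_diff` threshold, and the function names `mean_abs_diff` and `select_frames` are hypothetical stand-ins. Frames are represented as flat lists of grayscale values to keep the example self-contained.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized grayscale frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_frames(frames, min_diff=10.0):
    """Return indices of frames worth sending to crowd workers.

    A frame is kept only if it differs from the last kept frame by at
    least min_diff (an assumed threshold); near-duplicates are rejected.
    """
    kept = []
    last = None
    for i, frame in enumerate(frames):
        if last is None or mean_abs_diff(frame, last) >= min_diff:
            kept.append(i)
            last = frame
    return kept


frames = [
    [0, 0, 0, 0],      # first frame, always kept
    [1, 0, 0, 1],      # nearly identical to frame 0, rejected
    [50, 50, 50, 50],  # scene changed, kept
    [52, 50, 49, 50],  # nearly identical to frame 2, rejected
    [0, 0, 0, 0],      # scene changed back, kept
]
kept = select_frames(frames)
rejected_fraction = 1 - len(kept) / len(frames)
```

Here two of the five frames are rejected, so only three would incur crowd cost. Comparing against the last kept frame rather than the immediately preceding one is a design choice in this sketch: it prevents slow drift from slipping through as a long run of small changes.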

Relevance

This research is relevant to accessibility on two counts: it was co-authored by Jeffrey P. Bigham, a leading accessibility researcher, and it builds on concepts from VizWiz, which lets blind users get visual questions answered by crowd workers. Zensors demonstrates how human-in-the-loop AI systems can bridge the gap between what automated systems can perceive and what people actually need to know about their environments. For accessibility practitioners, the approach offers a model for adaptive assistive technologies that combine human judgment with machine learning: start with reliable human intelligence, then automate gradually where accuracy allows. The system also raises privacy concerns inherent in camera-based sensing and offers subregion masking and image obfuscation as mitigation strategies. The finding that question design significantly affects accuracy has direct implications for accessible interfaces that rely on crowd-generated or AI-generated descriptions of visual content.

Tags: crowdsourcing · human computation · computer vision · smart environments · machine learning · assistive technology · sensor systems