A Conversational Interface to Web Automation

Tessa Lau, Julian Cerruti, Guillermo Manzato, Mateo Bengualid, Jeffrey P. Bigham, Jeffrey Nichols · 2010 · UIST '10: Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology · doi:10.1145/1866029.1866067

Summary

This paper presents CoCo (CoScripter Concierge), a system that automates web tasks on a user's behalf through short textual commands delivered via low-bandwidth channels such as Twitter, SMS, email, or a command line. Given a natural language command like "get road conditions for highway 88" or "forward phone calls to home," CoCo synthesizes a plan by searching two repositories of web interaction knowledge: the CoScripter public script repository (human-created web automation scripts in a sloppy natural language) and the user's personal CoScripter Reusable History (CRH) logs of their actual web browsing activity. CoCo uses TF/IDF vector-space ranking to find relevant scripts, executes them in a pool of real Firefox browsers on a server using the Highlight browser automation framework, and returns extracted results to the user. The system features a hybrid parser that balances structured syntax for parameters (using keywords like "for," "using," "as") with freeform text for the core command, allowing graceful degradation when users don't follow exact syntax. For mining scripts from browsing history, CoCo segments the continuous stream of actions by time gaps (5-minute threshold) and URL changes, then ranks segments against the user's query. A novel auto-clipping algorithm automatically extracts the most relevant content from the final web page visited during script execution, using geometric clustering of DOM nodes and bag-of-words scoring against the original command. CoCo also maintains conversational context: it confirms unfamiliar scripts before execution, remembers previously approved scripts to skip confirmation next time, and recalls parameters from prior interactions to streamline repeated tasks.
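The history-mining step described above (segment the action stream, then rank segments against the query) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the `Action` fields, the use of a host change as the "URL change" boundary, the smoothed IDF formula, and the cosine scoring are all assumptions made for the example. Only the 5-minute gap and the use of TF/IDF ranking come from the summary.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

GAP_SECONDS = 5 * 60  # the 5-minute threshold mentioned in the summary


@dataclass
class Action:
    """One logged browsing action (hypothetical record layout)."""
    timestamp: float  # seconds since some epoch
    host: str         # host of the page the action occurred on (assumed field)
    text: str         # link text or button label recorded for the action


def segment(actions):
    """Split a time-ordered log at long pauses or host changes."""
    segments = []
    for action in actions:
        if segments:
            prev = segments[-1][-1]
            same_task = (action.timestamp - prev.timestamp <= GAP_SECONDS
                         and action.host == prev.host)
            if same_task:
                segments[-1].append(action)
                continue
        segments.append([action])
    return segments


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, segments):
    """Return (score, segment_index) pairs, best match first.

    Scores are cosine similarities between TF-IDF vectors; the smoothed
    IDF used here is an assumption, not taken from the paper.
    """
    docs = [Counter(tokenize(" ".join(a.text for a in seg))) for seg in segments]
    n = len(docs)
    # Document frequency: number of segments containing each term.
    df = Counter(term for doc in docs for term in doc)

    def weigh(counts):
        return {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                for t, c in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = (math.sqrt(sum(w * w for w in a.values()))
                * math.sqrt(sum(w * w for w in b.values())))
        return dot / norm if norm else 0.0

    q = weigh(Counter(tokenize(query)))
    scored = [(cosine(q, weigh(doc)), i) for i, doc in enumerate(docs)]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))
```

Because the only text available per action is a link or button label, a segment scores above zero only when the query happens to share words with those labels, which is consistent with the poor recall reported for history mining.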

Key findings

A Mechanical Turk study with 1,219 trials evaluated whether users could determine if CoCo's textual script representations correctly matched intended tasks, a key trust question because CoCo executes automation server-side without visual feedback. Across 20 tasks with matched and mismatched task/visualization pairings, participants were correct 77.3% of the time overall. The script-only condition (a text description of steps, which is what CoCo would provide) achieved a 76.0% success rate, comparable to conditions that included screenshots (75.1-77.2%) but roughly nine points below the video-only condition (85.1%). This suggests that CoCo's textual confirmation interface gives users about as much information for judging automation correctness as screenshot-based alternatives, although video remained the most reliable format. The system completed most of the described usage scenarios in practice, including forwarding phone calls via a VOIP web application, retrieving highway conditions, and logging bugs in an enterprise tracker. The auto-clipping algorithm extracted relevant information from web pages, including cases where the most useful content appeared on intermediate pages rather than the final page (handled by incremental auto-clipping). However, the history-mining feature saw limited use: CRH logs lack descriptive titles, so the search algorithm could match only on link text and button labels, which made recall poor.

Relevance

CoCo anticipated the rise of mainstream conversational assistants by several years, predating Siri (2011), Google Now (2012), and Alexa (2014). Although it is not explicitly an accessibility tool, it has significant accessibility implications. By enabling web tasks through short text commands on low-bandwidth channels, CoCo can make complex multi-step web interactions available to people who cannot easily use a visual browser, including blind users navigating inaccessible websites, people with motor impairments who find multi-step form filling difficult, and situationally disabled users (driving, hands busy). The connection to the TrailBlazer system, which guided blind users through web tasks using the same CoScripter script repository, makes this lineage explicit. For accessibility practitioners, CoCo demonstrates several enduring principles: natural language interfaces can abstract away inaccessible visual interfaces; repositories of human-demonstrated procedures can power automation without requiring AI to understand arbitrary web pages from scratch; and trust is a critical design concern when systems act on users' behalf, particularly for people who cannot visually verify that automation is proceeding correctly. The security concerns the authors raise about automation via public channels such as Twitter foreshadow ongoing debates about AI assistant safety and authorization.

Tags: web automation · conversational interfaces · natural language · intelligent assistants · programming by demonstration · task automation