Automation of Repetitive Web Browsing Tasks with Voice-Enabled Macros
Yevgen Borodin · 2008 · Proceedings of the 10th International ACM SIGACCESS Conference on Computers and Accessibility (Assets '08) · doi:10.1145/1414471.1414552
Summary
This paper proposes an approach for automating repetitive web browsing tasks through personalized macros with a speech-enabled interface, implemented within the HearSay non-visual web browser at Stony Brook University. The core problem is that non-visual aural web browsing remains significantly less efficient than visual browsing, and many everyday web tasks (checking email, weather, and news, paying bills, shopping) are performed repeatedly. While macro tools existed for visual interfaces and for productivity software such as Microsoft Office, no comparable solutions existed for automating these tasks with screen readers. The macro recorder tracks keyboard shortcuts, mouse clicks, and content changes in form elements, and stores the actions in XML, associating each action with an HTML DOM tree element identified by an XPath expression. Rather than using an absolute XPath (which breaks when page structure changes), the system learns the shortest relative XPath that uniquely addresses a given DOM element, which provides fault tolerance against structural changes in web pages.
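The idea of a shortest relative XPath can be illustrated with a minimal Python sketch. This is not the HearSay implementation, and the paper's learning procedure is not reproduced here. The function names (`step_candidates`, `shortest_relative_xpath`), the choice of `id` and `name` as distinguishing attributes, and the sample page are all assumptions made for illustration. The search starts at the target element and adds ancestor steps only until the path matches exactly one element. It assumes attribute values contain no single quotes and that the target is not the document root.

```python
import xml.etree.ElementTree as ET


def step_candidates(el):
    """Yield XPath steps for one element, least specific first.

    Assumption: 'id' and 'name' are the attributes worth trying.
    """
    yield el.tag
    for name in ("id", "name"):
        value = el.get(name)
        if value is not None:
            yield f"{el.tag}[@{name}='{value}']"


def shortest_relative_xpath(root, target):
    """Return a relative XPath with the fewest steps that matches only target."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    suffixes = [""]  # path fragments already built below the current node
    node = target
    while node is not None:
        next_suffixes = []
        for step in step_candidates(node):
            for suffix in suffixes:
                path = step + suffix
                if root.findall(".//" + path) == [target]:
                    return "//" + path
                next_suffixes.append("/" + path)
        # Not unique yet: move one level up and prepend an ancestor step.
        suffixes = next_suffixes
        node = parent_of.get(node)
    return None  # no unique path found with these step candidates


# Hypothetical page with two forms that both contain an input named "q".
PAGE = """<html><body>
<div id="nav"><form><input name="q"/></form></div>
<div id="pay"><form><input name="amount"/><input name="q"/></form></div>
</body></html>"""

root = ET.fromstring(PAGE)
amount, pay_q = root.findall(".//div[@id='pay']/form/input")

print(shortest_relative_xpath(root, amount))  # //input[@name='amount']
print(shortest_relative_xpath(root, pay_q))   # //div[@id='pay']/form/input[@name='q']
```

Neither result mentions `html` or `body`, so wrapping the page content in an additional container would not invalidate the paths, whereas an absolute path such as `/html/body/div[2]/form/input[1]` would stop resolving. The candidate set grows with every ancestor level, so a practical version would need to bound the search.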
Key findings
The system supports two types of macro recordings: page-specific (tied to a particular URL) and page-independent (replayable on any page with the same structure). For example, a macro recorded for purchasing from Amazon.com starting at a product description page can be replayed for any Amazon product. Macros can handle multi-step transactions end-to-end without user intervention, or they can pause at interaction points where human input is needed, such as entering a username and password or confirming payment details. Users can customize macros by marking page content as "variables" to be filled in at replay time, and can add custom prompts or specify which parts of pages should be read aloud during replay. The paper illustrates the approach through a detailed use case of a legally blind user paying a cell phone bill at AT&T Wireless, where a recorded macro automatically opens a new tab, navigates the payment flow, and reads back confirmation details. Macros can also be shared among users contributing to the Social Accessibility project, creating a shared repository that increases the chances of common tasks being automated for the broader blind user community.
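As a rough illustration of how variables and pause points might fit together, the sketch below parses a recorded macro and plans its replay. The XML schema (element and attribute names such as `action`, `variable`, and `prompt`), the example URL, and the `plan_replay` function are hypothetical; the paper states only that actions are stored in XML and tied to DOM elements by XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical macro file; the schema is an assumption, not the paper's format.
MACRO_XML = """
<macro name="pay-bill" scope="page-specific" url="https://example.com/billing">
  <action type="click" xpath="//a[@id='pay']"/>
  <action type="input" xpath="//input[@name='amount']" variable="amount"/>
  <action type="input" xpath="//input[@name='password']" variable="password"
          prompt="Please say or type your password"/>
  <action type="click" xpath="//button[@id='submit']"/>
  <action type="read" xpath="//div[@id='confirmation']"/>
</macro>
"""


def plan_replay(macro_xml, values):
    """Turn a recorded macro into replay steps.

    An action whose variable has a value in `values` is filled in
    automatically; one whose variable is missing becomes a pause step
    that carries the prompt to speak to the user.
    """
    steps = []
    for action in ET.fromstring(macro_xml.strip()).iter("action"):
        kind = action.get("type")
        xpath = action.get("xpath")
        variable = action.get("variable")
        if variable is None:
            steps.append((kind, xpath, None))
        elif variable in values:
            steps.append((kind, xpath, values[variable]))
        else:
            prompt = action.get("prompt", f"Please provide {variable}")
            steps.append(("pause", xpath, prompt))
    return steps


# The amount is supplied up front; the password is not, so replay pauses there.
for step in plan_replay(MACRO_XML, {"amount": "42.50"}):
    print(step)
```

Separating planning from execution keeps the sketch independent of any particular browser. A replay engine would walk the steps in order, perform each one on the element its XPath resolves to, and stop to listen whenever it reaches a `pause` step.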
Relevance
This work addresses the significant productivity gap between visual and non-visual web browsing by recognizing that automation of routine tasks is one of the most practical ways to improve efficiency. For blind users who may spend many times longer than sighted users completing the same web transactions, macros that reduce multi-minute tasks to a single voice command represent a substantial quality-of-life improvement. For accessibility practitioners, the approach demonstrates that accessibility solutions need not always focus on making every element of every page accessible — sometimes the most impactful intervention is to bypass the page-by-page navigation entirely for known tasks. The integration with the Social Accessibility project's shared repository is particularly notable, as it enables a community-driven library of automated tasks that any blind user can benefit from. The relative XPath addressing strategy also provides a practical solution to the fragility of web automation tools when pages are updated.
Tags: screen reader · web accessibility · non-visual browsing · web macro · voice interface · web automation · blind users · assistive technology