Morae: Proactively Pausing UI Agents for User Choices

Yi-Hao Peng, Dingzeyu Li, Jeffrey P. Bigham, Amy Pavel · 2025 · Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology (UIST '25) · doi:10.1145/3746059.3747797

Summary

This paper introduces Morae, a UI agent that proactively pauses during automated task execution to involve blind and low-vision (BLV) users in critical decisions, rather than completing tasks end-to-end without user input. The work is motivated by a one-week field study in which four BLV participants used UI agents for everyday web tasks. The study revealed that while agents dramatically improved task efficiency (completing 40% of tasks versus 25% with screen readers alone, five times faster), they frequently made arbitrary choices without consulting users: in 95% of cases where multiple valid options existed, participants were not informed of the alternatives. Among 576 feasible automation queries across 40 platforms, 19% were underspecified, and in 13% the agent silently chose among multiple options. Morae addresses this through Dynamic Verification of Ambiguous Choices, a three-stage framework that at each automation step: (1) identifies critical actions requiring user preferences, (2) generates and answers ambiguity-verification questions using a self-ask-then-answer strategy based on the user query, current UI state, and action history, and (3) combines these assessments in a decision function that determines whether to proceed, pause for clarification, or gather more UI details. When pausing, Morae dynamically generates accessible UI components (radio buttons, text fields) with proper heading levels for screen reader navigation, allowing users to express preferences in a structured manner.
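The three-way outcome of the decision function (proceed, pause, or gather more UI details) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the summary does not specify how the assessments are combined, so the names `StepAssessment`, `Decision`, and `decide`, the answer labels ("resolved", "ambiguous", "unknown"), and the combination rule itself are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"
    PAUSE = "pause_for_clarification"
    GATHER = "gather_more_ui_details"


@dataclass
class StepAssessment:
    # Stage 1 (assumed encoding): does this action depend on a user preference?
    is_critical: bool
    # Stage 2 (assumed encoding): one label per self-asked verification question:
    # "resolved"  - the query, UI state, or history already answers it
    # "ambiguous" - several valid options remain
    # "unknown"   - the visible UI does not give enough information yet
    answers: list[str] = field(default_factory=list)


def decide(step: StepAssessment) -> Decision:
    """Stage 3 (hypothetical rule): combine the two assessments into one decision."""
    if not step.is_critical:
        return Decision.PROCEED   # routine action, no user preference involved
    if "unknown" in step.answers:
        return Decision.GATHER    # inspect more of the UI before asking the user
    if "ambiguous" in step.answers:
        return Decision.PAUSE     # hand the choice to the user
    return Decision.PROCEED       # every question is already resolved
```

Under this assumed rule, a critical step with any unanswerable question triggers more UI inspection before the user is interrupted, which keeps pauses limited to choices the agent can actually present as concrete options.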

Key findings

In a technical evaluation across 256 tasks covering 8 UI types, Morae achieved the highest task success rate (55.2%), outperforming OpenAI Operator (53.1%) and all other baselines. The key improvement came from pause-required tasks, where Morae increased success from 50.8% (Operator) to 65.6%. In a user study with 10 BLV participants across Target, Google Calendar, and Google Docs tasks, Morae significantly outperformed both TaxyAI and OpenAI Operator across all eight subjective measures, including Satisfaction with Choices (6.80/7), Awareness of Choices (6.50/7), Control over Choices (6.70/7), and Perceived Usefulness (6.50/7). Participants made significantly more preference-aligned choices with Morae (mean 4.03) than with Operator (2.98) or TaxyAI (1.92). Decision entropy was notably higher with Morae (1.58) than with Operator (0.86) or TaxyAI (0.22), indicating greater diversity and autonomy in user selections. Morae provided real-time audio feedback (distinct sounds for clicking, typing, encountering ambiguity, and completing actions) plus screen-reader-specific task guidance (e.g., NVDA shortcuts for navigating Gmail). One participant successfully used the agent to interact with a Chinese website while issuing commands in Polish, demonstrating cross-language potential. Participants also observed that making interfaces accessible to screen reader users simultaneously makes them easier for AI agents to navigate.
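To make the decision entropy figures concrete, the sketch below computes Shannon entropy over a list of recorded choices. It assumes the metric is standard Shannon entropy with a base-2 logarithm; the summary does not state the exact definition or log base, and the function name `decision_entropy` and the sample data are hypothetical.

```python
from collections import Counter
from math import log2


def decision_entropy(choices: list[str]) -> float:
    """Shannon entropy (in bits, assumed base 2) of the observed choice distribution."""
    total = len(choices)
    counts = Counter(choices)
    # Sum of -p * log2(p) over every distinct option that was actually chosen.
    return sum(-(n / total) * log2(n / total) for n in counts.values())


# Hypothetical data: an agent that always picks the first listed item,
# versus users who spread their selections over four options.
always_first = ["item_1"] * 8
user_driven = ["item_1", "item_2", "item_3", "item_4"] * 2
```

Under this assumption, `always_first` has zero entropy because every selection is identical, while `user_driven` reaches 2 bits, the maximum for four equally likely options. Higher values therefore reflect selections spread across more options rather than a single default.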

Relevance

This paper addresses a fundamental tension in AI-powered accessibility: the trade-off between automation efficiency and user agency. For accessibility practitioners, it demonstrates that simply automating away inaccessible interfaces is insufficient — users need to remain active decision-makers, not passive recipients of agent choices. The five design opportunities identified (active choice at decision points, clear preference input, real-time feedback, result verification, and task literacy) provide a practical framework for designing any accessible AI agent. The finding that 19% of real-world user commands are underspecified is a crucial data point for anyone building AI assistive tools — agents must handle ambiguity gracefully rather than making silent assumptions. The connection between screen reader accessibility and AI agent navigability ("if you make an interface accessible to screen reader users, you are also likely making it easier for AI to navigate") has significant strategic implications for the business case for web accessibility. The paper also contributes the first dataset of real-world BLV interactions with UI agents (638 queries), enabling future research.

Tags: UI agents · blind and low vision · large language models · human-agent interaction · user agency · mixed-initiative interaction · web accessibility · screen readers · artificial intelligence