Voice over Workplace (VoWP): Voice Navigation in a Complex Business GUI
Frankie James, Jeff Roelands · 2002 · Proceedings of the Fifth International ACM Conference on Assistive Technologies (Assets 02) · doi:10.1145/638249.638285
Summary
This paper explores the design of voice navigation interfaces for complex business GUIs, specifically SAP Workplace, to support users with physical disabilities who cannot use a mouse or keyboard. The authors conducted two user studies examining a fundamental trade-off in voice interfaces between user efficiency (keeping commands short and interactions fast) and ambiguity handling (dealing with spoken commands that could refer to more than one interface target). The SAP Workplace GUI is particularly challenging because it contains multiple screen areas (LaunchPad, WorkSpace, mini-apps) in which elements can share identical labels. The first study compared two common approaches: rigid scoping, which requires users to navigate explicitly through a hierarchy to reach a target, and relaxed scoping, which lets users name a target directly and leaves the system to resolve any ambiguity. Ten participants with mixed SAP experience completed tasks using both interfaces. The second study built on the first by introducing implicit scoping with a prioritization scheme and RESOLV (Representational Enumerated Semi-transparent Overlaid Labels for Voice) icons: visual overlays that appear when ambiguous targets exist and show users the number and location of matching items to help them disambiguate. The research was motivated by the requirements of the Americans with Disabilities Act (ADA) and by the growing number of people with physical disabilities in the workforce.
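The difference between rigid and relaxed scoping can be illustrated with a minimal sketch. It is not the authors' implementation: the `Target` type, the example labels, and both lookup functions are hypothetical names introduced here, and the only thing taken from the summary is that labels can repeat across screen areas such as LaunchPad and WorkSpace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    area: str   # screen area, e.g. "LaunchPad" or "WorkSpace"
    label: str  # visible label the user can speak


# Hypothetical screen contents; the "Reports" label repeats across areas.
TARGETS = [
    Target("LaunchPad", "Reports"),
    Target("WorkSpace", "Reports"),
    Target("WorkSpace", "Search"),
]


def rigid_lookup(area, label):
    """Rigid scoping: the user names the area first, then the label."""
    return [t for t in TARGETS if t.area == area and t.label == label]


def relaxed_lookup(label):
    """Relaxed scoping: the user names only the label, so several targets may match."""
    return [t for t in TARGETS if t.label == label]


rigid = rigid_lookup("WorkSpace", "Reports")  # two spoken steps, one match
relaxed = relaxed_lookup("Reports")           # one spoken step, two matches
```

In this sketch the rigid lookup costs an extra spoken step but always narrows to one area, while the relaxed lookup is shorter but returns two candidates for "Reports", which is the ambiguity the second study set out to handle.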
Key findings
In the first study, users strongly preferred relaxed scoping over rigid scoping for its efficiency, even though rigid scoping was more effective at preventing ambiguities. Users found rigid scoping too time-consuming because it required navigating through hierarchical levels before reaching a target. However, relaxed scoping introduced ambiguity problems that users found confusing: the interface would sometimes open duplicate copies of transactions when ambiguous commands were not properly resolved. The second study found that implicit scoping with prioritized focus resolution maintained the efficiency of relaxed scoping while substantially improving ambiguity handling. RESOLV icons were rated highly for learnability (mean 7.6/10), visibility (3.76/5), and effectiveness (3.67/5), though their attractiveness and size received neutral ratings. Users appreciated that RESOLV icons clearly indicated when ambiguities existed and how many alternative targets were available. A key finding was that most users did not notice ambiguous situations on their own; they needed explicit visual cues to recognize when their commands could match multiple targets. Users also requested audio feedback to confirm that the system had heard and understood their commands, highlighting the importance of multimodal feedback even in voice-primary interfaces.
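One way to picture implicit scoping with focus-based prioritization and enumerated disambiguation labels is the sketch below. It rests on assumptions: the summary does not describe the actual prioritization scheme, so the rule used here (prefer a unique match in the currently focused area, otherwise number the candidates) is a guess, and `resolve`, the tuple layout, and the example labels are all hypothetical.

```python
def resolve(label, targets, focus_area):
    """Resolve a spoken label against (area, label) tuples.

    Assumed rule, not taken from the paper: a unique match inside the
    focused area wins; otherwise ambiguous candidates are numbered so an
    overlay could show them and the user could say the number.
    """
    matches = [t for t in targets if t[1] == label]
    if not matches:
        return ("no_match", None)
    in_focus = [t for t in matches if t[0] == focus_area]
    if len(in_focus) == 1:
        return ("activate", in_focus[0])
    if len(matches) == 1:
        return ("activate", matches[0])
    # Still ambiguous: enumerate the narrowest candidate set available.
    candidates = in_focus or matches
    return ("disambiguate", dict(enumerate(candidates, start=1)))


# Hypothetical screen contents with a label repeated in three areas.
targets = [
    ("LaunchPad", "Reports"),
    ("WorkSpace", "Reports"),
    ("WorkSpace", "Search"),
    ("MiniApp", "Reports"),
]

focused = resolve("Reports", targets, "WorkSpace")
unfocused = resolve("Reports", targets, "Inbox")
```

With focus on the WorkSpace, "Reports" activates the WorkSpace target directly, so the command stays as short as under relaxed scoping. With focus elsewhere, the call returns three numbered candidates, which is the kind of count-and-location information the summary attributes to RESOLV icons.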
Relevance
This paper addresses a critical gap in accessibility: making complex enterprise software usable through voice alone. While much accessibility work focuses on screen readers for blind users, this research targets people with physical disabilities who can see the screen but cannot use traditional input devices. The findings are directly relevant to modern voice control systems such as Dragon NaturallySpeaking, macOS Voice Control, and Windows Speech Recognition, all of which face the same fundamental trade-off between efficiency and ambiguity handling. RESOLV icons, as visual disambiguation overlays, anticipated a feature now common in voice control systems, where numbered labels appear next to clickable targets. The research demonstrates that voice interfaces for complex GUIs require thoughtful scoping strategies rather than simple command-and-control approaches. For practitioners building accessible enterprise applications, the key lesson is that voice navigation must balance discoverability (users need to know what they can say) with efficiency (commands must be concise), and that visual feedback for voice interactions is essential for usability.
Tags: voice interface · speech recognition · physical disability · GUI accessibility · navigation · enterprise software · user study
Standards referenced: ADA