A Comparison of Voice Controlled and Mouse Controlled Web Browsing
Kevin Christian, Bill Kules, Ben Shneiderman, Adel Youssef · 2000 · Proceedings of the Fourth International ACM Conference on Assistive Technologies (Assets '00) · doi:10.1145/354324.354345
Summary
This within-subjects study from the University of Maryland compared voice-controlled web browsing with traditional mouse-based browsing using 18 participants. The researchers used Conversa, a commercial voice browser from Conversational Computing that renders pages visually while accepting spoken navigation commands — users could speak the text of a link or an associated number to follow it. The study tested three common hypertext navigation structures: a linear slide show (sequential pages with First/Next/Last/Previous links), a grid/tiled map (4x4 geographic map navigated by cardinal directions), and a hierarchical tree menu (64 pages about Cyprus organized in a 4x3 branching structure). Each structure was tested under three input conditions: mouse only, voice with textual links, and voice with numbered links. The experiment was carefully designed with counterbalanced treatment sequences to compensate for order effects, custom-built web pages to evoke specific navigation patterns, and paper-based test administration to avoid interference with task performance. A pilot study led to procedure refinements including explicitly directing participants to read all questions before beginning tasks, as users tended to start before fully understanding instructions.
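The counterbalancing described above can be illustrated with a minimal sketch. It assumes full counterbalancing, in which all six orderings of the three input conditions are spread evenly over the 18 participants. The summary does not say which scheme the authors used, so the function name `counterbalanced_orders` and the round-robin assignment are illustrative assumptions, not the study's procedure.

```python
from itertools import permutations

# Condition labels follow the summary; the assignment scheme is an assumption.
CONDITIONS = ("mouse", "voice_text", "voice_numbered")


def counterbalanced_orders(n_participants, conditions=CONDITIONS):
    """Give each participant a treatment order, cycling through every permutation."""
    orders = list(permutations(conditions))  # 3! = 6 possible orders
    return [orders[i % len(orders)] for i in range(n_participants)]


# With 18 participants, each of the 6 orders is used by exactly 3 people,
# so every condition appears equally often in first, second, and third position.
assignments = counterbalanced_orders(18)
for participant, order in enumerate(assignments, start=1):
    print(participant, " -> ".join(order))
```

Because 18 is a multiple of 6, this layout balances position effects exactly. A Latin square with three orders would be a lighter alternative if fewer distinct sequences were wanted.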
Key findings
Voice control added approximately 50% to task completion times relative to mouse navigation for the slide show and hierarchical menu tasks, and both differences were statistically significant (slide show: p=.011; menu: p<.001). For the tiled map task, however, there was no significant difference between voice and mouse (p=.76), which suggests that voice can be competitive with the mouse when navigation relies on a small, predictable vocabulary (North, South, East, West). Completion times did not differ significantly between the two voice treatments (textual links vs. numbered links), but subjective satisfaction ratings strongly favored textual links (p<.001 for all three satisfaction questions). Users found that numbered links required an extra cognitive step (finding the desired link text, reading its associated number, then speaking that number), whereas textual links let them simply say the name of the destination. Error rates for both voice treatments were low, and misinterpreted commands were negligible. The researchers observed that users adapted their speech during voice browsing, pausing between words and occasionally repeating commands. They also found that the "Go Back" command was frequently confused with navigating to the previous slide, revealing the dual meaning of "back" in different navigation contexts.
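The two voice treatments and the "back" ambiguity can be made concrete with a small sketch of command resolution. This is not Conversa's actual algorithm. The function `resolve_command`, the action names, and the slide file names are hypothetical, and the sketch assumes the speech recognizer has already produced text, with spoken numbers delivered as digits (for example "3").

```python
def resolve_command(utterance, links):
    """Map a recognized utterance to an (action, target) pair.

    links: list of (label, url) pairs in page order; spoken numbers are 1-based.
    """
    spoken = " ".join(utterance.lower().split())  # normalize case and spacing

    # Browser-level command: return to the previously visited page.
    if spoken == "go back":
        return ("history_back", None)

    # Numbered-link treatment: the user speaks the number shown beside a link.
    if spoken.isdecimal():
        index = int(spoken) - 1
        if 0 <= index < len(links):
            return ("follow", links[index][1])
        return ("unrecognized", None)

    # Textual-link treatment: the user speaks the link text itself.
    for label, url in links:
        if label.lower() == spoken:
            return ("follow", url)
    return ("unrecognized", None)


# Hypothetical links on the third page of a slide show.
slide_links = [
    ("First", "slide1.html"),
    ("Previous", "slide2.html"),
    ("Next", "slide4.html"),
    ("Last", "slide9.html"),
]

by_text = resolve_command("Next", slide_links)
by_number = resolve_command("3", slide_links)
go_back = resolve_command("Go Back", slide_links)
previous = resolve_command("Previous", slide_links)
```

In this sketch "Go Back" and "Previous" resolve to different actions. The first returns to whichever page was visited last, while the second follows the link to the preceding slide, which is one plausible reading of the confusion participants showed.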
Relevance
This early empirical study of voice-controlled web browsing produced insights that remain relevant to modern voice interface design. The finding that voice navigation works best with small, predictable command vocabularies anticipates the design principles behind successful modern voice assistants. The strong user preference for textual over numbered links is particularly important for practitioners designing voice-navigable interfaces today — natural language commands aligned with visible content are more intuitive than arbitrary numbering schemes. The 50% time overhead for voice versus mouse is a useful baseline for understanding the performance cost of voice interaction, though modern speech recognition has likely narrowed this gap. For web developers, the paper reinforces that navigation structure matters for voice usability: flat, predictable structures with clear directional commands work well, while deep hierarchies with many similarly-named links create friction. The observation about the ambiguity of "back" in different contexts remains a persistent challenge in voice interface design.
Tags: voice browser · speech recognition · web navigation · voice control · hypertext · user research · usability testing · web accessibility · input methods
Standards referenced: VoiceXML