Speech-Based Cursor Control
Azfar S. Karimullah, Andrew Sears · 2002 · Proceedings of the Fifth International ACM Conference on Assistive Technologies (Assets '02) · doi:10.1145/638249.638282
Summary
This paper from UMBC investigates the effectiveness of speech-based cursor control for navigating graphical user interfaces, comparing a standard cursor with a predictive cursor designed to compensate for speech recognition delays. While speech recognition is well studied for dictation, relatively little research had examined its use for cursor control: moving and clicking a pointing cursor by voice, which is essential for people with physical disabilities who cannot use a mouse or keyboard. The study used continuous direction-based navigation, in which users speak a directional command ("Move left," "Move right," "Move up," "Move down") to start the cursor moving, "Stop" to halt it, and then "Click" to select a target. Three inherent delays affect this interaction: the user must initiate a verbal response, the speech recognition system must process the utterance, and the system must recognise the specific word. The predictive cursor attempted to address these delays by displaying an arrow ahead of the actual cursor position, offset by the average distance the cursor would travel during the recognition delay, so users could see where the cursor would likely be when their "Stop" command was actually processed.
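The predictive arrow's offset amounts to cursor speed multiplied by the mean recognition delay, applied along the current direction of travel. Below is a minimal Python sketch, assuming a constant cursor speed and a known average delay. `CursorState`, `predicted_position` and the 1.5 s delay in the usage line are hypothetical illustrations, not names or values taken from the paper:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Unit vectors for the four directional commands (screen coordinates: y grows downward).
DIRECTIONS = {
    "left": (-1.0, 0.0),
    "right": (1.0, 0.0),
    "up": (0.0, -1.0),
    "down": (0.0, 1.0),
}


@dataclass
class CursorState:
    """Hypothetical cursor state: position in pixels plus the active direction."""
    x: float
    y: float
    direction: Optional[str]  # None while the cursor is stopped


def predicted_position(state: CursorState, speed_px_per_s: float,
                       mean_delay_s: float) -> Tuple[float, float]:
    """Where the cursor is expected to halt if the user says "Stop" right now.

    Assumes constant speed and that "Stop" takes effect after mean_delay_s.
    """
    if state.direction is None:
        return (state.x, state.y)
    dx, dy = DIRECTIONS[state.direction]
    offset = speed_px_per_s * mean_delay_s  # distance travelled during the delay
    return (state.x + dx * offset, state.y + dy * offset)


# Usage: moving right at 20 px/s with an assumed 1.5 s mean delay.
print(predicted_position(CursorState(100, 200, "right"), 20, 1.5))  # (130.0, 200.0)
```

Because the offset is a fixed multiple of the mean delay, the arrow is only as accurate as that mean: the more the real delay varies from one utterance to the next, the further the cursor can halt from the arrow.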
Key findings
Twenty-eight participants (no prior speech recognition experience, no disabilities) completed target selection tasks across three target sizes (32mm, 64mm, 128mm), three distances (per side), and eight directions. The predictive cursor did not provide a statistically significant benefit over the standard cursor, likely because the recognition delays were short and consistent enough for users to compensate naturally. However, the study revealed critical relationships between target size, distance, and performance. For large targets, selection time was strongly linearly related to distance (r²=0.95), with a slope of 5.56 seconds per centimetre. As targets became smaller, this relationship weakened (r²=0.58 for small targets) because positioning time, the time spent getting the cursor precisely over the target, began to dominate movement time. Errors increased sharply for small targets (0.71 errors/target vs. 0.06 for large), and "Stop" command usage rose from 0.24 commands/target for large targets to 3.32 for small ones. Diagonal movements required two positioning activities (one horizontal, one vertical) and became more difficult for smaller targets, yet they did not increase error rates; users instead spent more time to maintain accuracy. The cursor moved at 20 pixels/second (approximately 53mm/second).
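The r² values above come from regressing selection time on distance. Below is a minimal ordinary least-squares sketch in plain Python. `fit_line` is a hypothetical helper, and the usage data are synthetic, not the study's measurements:

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centred sums of squares and cross-products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    if syy == 0:
        raise ValueError("y values must vary for r^2 to be defined")
    b = sxy / sxx                     # slope, e.g. seconds per centimetre
    a = mean_y - b * mean_x           # intercept, e.g. fixed overhead in seconds
    r_squared = sxy * sxy / (sxx * syy)
    return a, b, r_squared


# Usage with synthetic (distance, time) pairs that do not lie on a line.
print(fit_line([1, 2, 3], [1, 3, 2]))  # (1.0, 0.5, 0.25)
```

Positioning effort that varies from trial to trial, independently of distance, adds scatter that the distance term cannot explain. That is one way a lower r² can arise when targets shrink.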
Relevance
This research provided foundational empirical data on the factors affecting speech-based cursor control, and it remains relevant for voice-controlled interfaces. The key practical insight is that a minimum reliable target size is set by the interaction between cursor speed and speech recognition delay: if the cursor moves too fast relative to the delay, users cannot stop it accurately on small targets. This creates a design trade-off. Faster cursor speeds reduce movement time but make positioning harder, while slower speeds improve accuracy at the cost of longer task completion times. The finding that users compensated for the difficulty of diagonal movements by trading speed for accuracy (rather than accepting more errors) suggests that speech-based cursor users prioritise precision when they perceive a task as difficult. For interface designers, the results argue for larger clickable targets when speech-based navigation is expected, and for exploring adaptive cursor speeds that slow down as the cursor approaches potential targets. The work also highlighted the fundamental challenge of continuous speech-based cursor control: unlike a mouse, which stops the instant the user stops moving it, a speech-driven cursor keeps moving during the delay between saying "Stop" and the cursor halting, which creates an inherent overshoot problem.
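The speed/delay trade-off can be made concrete with simple kinematics. This sketch rests on several assumptions: a constant cursor speed, a recognition delay that varies between a hypothetical minimum and maximum, and a user who times "Stop" perfectly against the mean delay. All function names and numbers are illustrative. `adaptive_speed` in particular is one hypothetical slow-down policy, not something evaluated in the paper:

```python
def overshoot(speed: float, delay_s: float) -> float:
    """Distance travelled after "Stop" is spoken, before the cursor halts."""
    return speed * delay_s


def stop_spread(speed: float, min_delay_s: float, max_delay_s: float) -> float:
    """Width of the band in which the cursor may halt when the delay varies.

    Even a perfectly timed "Stop" lands somewhere in this band, so a target
    narrower than the spread cannot be hit reliably on the first attempt.
    """
    if not 0 <= min_delay_s <= max_delay_s:
        raise ValueError("expected 0 <= min_delay_s <= max_delay_s")
    return speed * (max_delay_s - min_delay_s)


def movement_time(distance: float, speed: float) -> float:
    """Time to cover a distance at constant speed (ignores positioning)."""
    return distance / speed


def adaptive_speed(base_speed: float, distance_to_target: float,
                   slow_radius: float, min_fraction: float = 0.25) -> float:
    """Hypothetical policy: full speed far from targets, slowing linearly
    inside slow_radius, but never below min_fraction of base_speed."""
    if distance_to_target < 0 or slow_radius <= 0:
        raise ValueError("distance must be >= 0 and slow_radius > 0")
    if distance_to_target >= slow_radius:
        return base_speed
    return base_speed * max(min_fraction, distance_to_target / slow_radius)


# Usage with assumed numbers: delay between 1.0 s and 1.5 s, 300 px to travel.
for speed in (20, 10):
    print(speed, stop_spread(speed, 1.0, 1.5), movement_time(300, speed))
# 20 10.0 15.0
# 10 5.0 30.0
```

Halving the speed halves the landing spread but doubles the travel time, which is the trade-off described above. An adaptive policy aims for the short travel time far from targets and the small spread near them.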
Tags: speech recognition · cursor control · motor disability · pointing devices · assistive technology · voice interface · human-computer interaction