Computer-Using Agent
Also known as: CUA
An AI agent, typically built on a Large Multimodal Model, that perceives a computer's graphical user interface through screenshots, reasons about on-screen context, and directly manipulates the interface by clicking, typing, scrolling, and navigating between applications. Unlike script-based or template-driven automation, a CUA interprets layouts it has never seen before and adapts its actions in real time, so tasks such as online shopping, form-filling, or booking can be delegated through natural language. For accessibility, a CUA offers the potential to replace inaccessible visual interfaces rather than narrate them, but it also introduces new risks around hallucination, verification, and user oversight when disabled users cannot directly see what the agent did.
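The perceive, reason, act cycle described above can be sketched as a short loop. This is an illustrative toy under stated assumptions, not any vendor's API: `FakeScreen`, `stub_policy`, `Action`, and `run_agent` are hypothetical names, the "screenshot" is a text summary rather than an image, and the policy is a hand-written stand-in for the multimodal model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """One GUI action; `kind` is 'type', 'click', or 'done' in this sketch."""
    kind: str
    target: Optional[str] = None  # hypothetical element name
    text: Optional[str] = None


@dataclass
class FakeScreen:
    """Stand-in for a real GUI (assumption: a one-field form with a submit button)."""
    fields: dict = field(default_factory=dict)
    submitted: bool = False

    def screenshot(self) -> str:
        # A real agent would capture pixels; text keeps the sketch runnable.
        return f"fields={self.fields} submitted={self.submitted}"

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def stub_policy(goal: str, observation: str) -> Action:
    """Placeholder for the model: chooses the next action from the observation."""
    if "'name'" not in observation:
        return Action("type", target="name", text="Ada")
    if "submitted=False" in observation:
        return Action("click", target="submit")
    return Action("done")


def run_agent(
    goal: str,
    screen: FakeScreen,
    policy: Callable[[str, str], Action],
    max_steps: int = 10,
) -> List[Action]:
    """Observe, decide, act, and repeat; return a log of every action chosen."""
    log: List[Action] = []
    for _ in range(max_steps):
        observation = screen.screenshot()      # perceive
        action = policy(goal, observation)     # reason
        log.append(action)
        if action.kind == "done":
            break
        screen.apply(action)                   # act
    return log
```

Calling `run_agent("fill in the form", FakeScreen(), stub_policy)` returns the list of actions the agent chose. Keeping such a log is one possible way to support the verification and oversight concerns noted above, since it could be read back to a user who cannot see the screen.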
Category: AI · Artificial Intelligence · Generative AI · Assistive Technology · Accessibility
Related: Large Multimodal Model · Large Language Model · Voice User Interface · Generative AI