Agent S: an open agentic framework that uses computers like a human
Updated Oct 31, 2025 (Python)
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
ScaleCUA provides open-source computer-use agents that can operate in cross-platform environments (Windows, macOS, Ubuntu, Android).
[ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents
[AAAI 2026] Test-Time Reinforcement Learning for GUI Grounding via Region Consistency https://arxiv.org/abs/2508.05615
Official implementation of UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning
[CVPR 2025] Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents
Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding"
Curated resources about automated GUI computer use via LLMs. Highly opinionated; the focus is on quality over quantity.
[ACL'25 (Findings)] Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents
A GUI agent application based on UI-TARS (a vision-language model) that lets you control your computer using natural language.
💻 Control AI agents to automate tasks on computers, enabling true autonomy with browser, terminal, and desktop interaction. Perfect for developers.