List of AI tools that can interact with user interfaces. PRs welcome.
These are VLMs that support pointing / bounding boxes for user interaction; a minimal usage sketch follows the list.
- Qwen 2.5-VL (Jan 2025): Open-weight VLM with grounding support; it can return bounding boxes and points for objects in absolute pixel coordinates.
- Moondream: Small open-weight VLM with pointing and object detection capabilities.
- Llama 3.2 (Sep 2024): The two largest models in the Llama 3.2 collection, 11B and 90B, support image reasoning use cases such as document-level understanding (including charts and graphs), image captioning, and visual grounding tasks such as directionally pinpointing objects in images based on natural language descriptions.
- Molmo (Sep 2024): Open VLM family from Ai2 that matches GPT-4V performance and can point at objects in images.
- CogAgent (Dec 2023): Open-source visual language model that can identify regions and points of UIs to interact with.
- Florence-2 (Nov 2023): Vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks, including producing bounding boxes.
- OpenAI Operator (Jan 2025): Backed by the Computer-Using Agent (CUA) model.
- Claude 3.5 Computer Use (Oct 2024): Version of Claude 3.5 that supports computer use: it takes structured text and screenshot tool inputs and returns actionable text outputs such as mouse and keyboard commands.
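To make the pointing capability concrete, here is a minimal sketch of asking Qwen 2.5-VL for the bounding box of a UI element via Hugging Face transformers. The preprocessing follows the official model card; the screenshot path, prompt wording, and the exact JSON shape of the answer are illustrative assumptions.

```python
# Minimal sketch: asking a pointing-capable VLM (Qwen 2.5-VL) for the
# bounding box of a UI element. Screenshot path, prompt, and output shape
# are illustrative; preprocessing follows the official model card.
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "screenshot.png"},  # hypothetical screenshot
        {"type": "text",
         "text": "Locate the 'Submit' button and output its bounding box in JSON format."},
    ],
}]

# Render the chat template, batch the image with the prompt, and generate.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)

# Strip the prompt tokens and decode only the newly generated answer.
answer = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(answer)  # e.g. {"bbox_2d": [x1, y1, x2, y2], "label": "Submit button"}
```

The returned coordinates can then be fed directly to a mouse/keyboard automation layer, which is the pattern most of the agent frameworks below build on.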
These are tools, agent frameworks, and cookbooks built on top of such models.
- Qwen 2.5-VL Cookbook: Official examples for the model, including grounding and agent use cases.
- OpenAdapt.AI: AI-first process automation with large language (LLM), action (LAM), multimodal (LMM), and visual language (VLM) models.
- ScreenAgent: VLM-driven agent for controlling a real computer screen.
- Mobile-Agent: Autonomous multimodal agent for operating mobile device UIs.
- UI-ACT: An AI agent for interacting with a computer using the graphical user interface
- OpenInterpreter: Uses code to interact with the operating system.
- AIOS: LLM agent operating system; can interact with the operating system as a backend.
- Manus AI (Mar 2025): General-purpose autonomous agent.
- Claude 3.5 Computer Use Cookbook: Anthropic's reference demo for the computer use tools (see the sketch after this list).
- Adept: Company looking to automate user interface interaction through ML
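As a companion to the Claude 3.5 Computer Use entries above, here is a hedged sketch of one turn of the computer-use loop: the model replies with tool_use blocks describing concrete actions, and the client executes them and responds with fresh screenshots. The tool type and beta flag follow Anthropic's October 2024 documentation; the display size and prompt are illustrative, and the action executor is left to you.

```python
# Sketch of one turn of the Anthropic computer-use loop (Oct 2024 beta).
# Display size and prompt are illustrative; executing the returned actions
# (and sending back a tool_result with a screenshot) is up to your client.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[{
        "type": "computer_20241022",   # structured computer-use tool
        "name": "computer",
        "display_width_px": 1280,      # assumed virtual display size
        "display_height_px": 800,
    }],
    messages=[{"role": "user", "content": "Open the settings menu."}],
    betas=["computer-use-2024-10-22"],
)

# The model emits tool_use blocks describing concrete actions, e.g.
# {"action": "screenshot"} or {"action": "left_click", "coordinate": [x, y]}.
for block in response.content:
    if block.type == "tool_use":
        print(block.input)  # hand this to your own executor, then return
                            # a tool_result (usually a fresh screenshot)
```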
These agents are still mostly text-based.
- OpenAI Operator: A system that uses the Computer-Using Agent (CUA) model to interact with the user interface in your browser, asking the user for clarification when needed (a sketch of the underlying API follows this list).
- Google Project Mariner: Browser extension that interacts with web pages.
- HyperWrite AI Agent: Browser assistant that can operate websites on the user's behalf.
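OpenAI has also exposed the CUA model behind Operator as computer-use-preview in the Responses API. A minimal sketch of one request, assuming preview access; the display size, environment, and prompt are illustrative, and your client is responsible for executing each returned action and replying with a screenshot until the task completes.

```python
# Sketch of driving OpenAI's Computer-Using Agent (CUA) model via the
# Responses API. Model and tool names follow the computer-use preview docs;
# access is gated, and display size / prompt are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY

response = client.responses.create(
    model="computer-use-preview",
    tools=[{
        "type": "computer_use_preview",
        "display_width": 1024,
        "display_height": 768,
        "environment": "browser",  # the agent assumes a browser it can drive
    }],
    input=[{"role": "user", "content": "Find the pricing page on example.com."}],
    truncation="auto",  # required for computer use
)

# The output contains computer_call items (click, type, screenshot, ...);
# the client executes each action and sends back a computer_call_output
# with a fresh screenshot until the task completes.
for item in response.output:
    if item.type == "computer_call":
        print(item.action)
```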