lectrician1/awesome-interface-agents

awesome-interface-agents

List of AI tools that can interact with user interfaces. PRs welcome.

Models

VLMs

These VLMs can output points or bounding boxes, which an agent can use to locate the UI elements it interacts with.

Open source

  • Qwen 2.5-VL (Jan 2025)
  • Moondream
  • Llama 3.2 (Sep 2024): The two largest models in the collection, 11B and 90B, support image reasoning: document-level understanding (including charts and graphs), image captioning, and visual grounding tasks such as pinpointing objects in an image from a natural-language description.
  • Molmo (Sep 2024): VLM with pointing ability, reported to match GPT-4V performance.
  • CogAgent (Dec 2023): Visual language model that identifies the regions and points of a UI to interact with.
  • Florence 2 (Nov 2023): Vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks including producing bounding boxes.

Closed source

  • OpenAI Operator (Jan 2025): Backed by a Computer-Using Model.
  • Claude 3.5 Computer Use (Oct 2024): Version of the Claude 3.5 model that supports computer use, taking structured text and image tool inputs and producing actionable text outputs.
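
To illustrate how a bounding box from one of the VLMs above might drive a UI interaction, here is a minimal sketch that turns a box into a pixel coordinate to click. It assumes the model returns a box as normalized `(x_min, y_min, x_max, y_max)` values in the range 0 to 1; coordinate conventions vary by model, so check the model's documentation before relying on this. The function name `bbox_to_click_point` is hypothetical and not part of any listed project.

```python
from typing import Tuple


def bbox_to_click_point(
    bbox: Tuple[float, float, float, float],
    image_width: int,
    image_height: int,
) -> Tuple[int, int]:
    """Return the pixel at the center of a normalized bounding box.

    Assumption: bbox is (x_min, y_min, x_max, y_max) with every value in
    [0, 1], relative to the screenshot size. This is an illustrative
    convention, not the output format of any specific model.
    """
    x_min, y_min, x_max, y_max = bbox
    if not (0.0 <= x_min <= x_max <= 1.0 and 0.0 <= y_min <= y_max <= 1.0):
        raise ValueError(f"bbox is not a valid normalized box: {bbox}")
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")

    # Center of the box, scaled to pixel units.
    center_x = (x_min + x_max) / 2 * image_width
    center_y = (y_min + y_max) / 2 * image_height

    # Clamp so a box touching the right or bottom edge stays in bounds.
    pixel_x = min(int(center_x), image_width - 1)
    pixel_y = min(int(center_y), image_height - 1)
    return pixel_x, pixel_y


if __name__ == "__main__":
    # A box covering the lower-middle part of a 1920x1080 screenshot.
    print(bbox_to_click_point((0.25, 0.5, 0.75, 1.0), 1920, 1080))
```

The resulting point could then be passed to whatever input-automation layer the agent uses. Clicking the box center is a simple default; a real agent may prefer a model that emits points directly.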

Segmenters

Complete solutions

Operating system

Open source

Closed source

Web browser

These are still mostly text-based.

Open source

Closed source

Papers
