# Glimmer: Desktop and Mobile AI Automation Agent

Built on Zhipu's Open-AutoGLM, with cross-platform automation support for desktop, Android, iOS, and HarmonyOS.
- 🖥️ Desktop Automation - Screenshot recognition, mouse clicks, keyboard input
- 📱 Mobile Support - Android (ADB), iOS (XCTest), HarmonyOS (HDC)
- 🤖 AI-Driven - Screen understanding based on GLM-4V / GPT-4o vision models
- 🎯 Goal-Oriented - Describe tasks in natural language; execution steps are planned automatically
- 📸 Real-time Screenshots - Auto-capture screen state after each action
- 🔄 Step Control - Support single-step execution and continuous run modes
- 🌐 Web UI - Vue 3 visual control panel
| Layer | Technology | Version | Purpose |
|---|---|---|---|
| Frontend UI | Vue + TypeScript + Vite | 3.5+ | Visual Control Panel |
| Backend API | Python + HTTP Server | 3.10+ | Agent Service |
| Desktop Ops | PyAutoGUI + Pillow | - | Screenshots and Input Simulation |
| Android | ADB | - | Device Control |
| iOS | XCTest | - | Device Control |
| HarmonyOS | HDC | - | Device Control |
| AI Model | Zhipu GLM-4V | - | Visual Understanding |
```text
Glimmer/
├── Glimmer-UI/                  # Vue 3 Frontend
│   ├── src/
│   │   ├── components/
│   │   │   ├── ChatPanel.vue        # Chat Panel
│   │   │   ├── InputBar.vue         # Input Bar
│   │   │   ├── ScreenshotViewer.vue # Screenshot Display
│   │   │   └── StatusIndicator.vue  # Status Indicator
│   │   ├── App.vue
│   │   └── main.ts
│   └── package.json
│
├── Glimmer-Web/                 # Python Backend API
│   ├── core/
│   │   ├── actions/             # Action Handlers
│   │   ├── config/              # Config and Prompts
│   │   ├── desktop/             # Desktop Operations Module
│   │   ├── model/               # Model Client
│   │   └── agent.py             # Agent Core
│   ├── server.py                # HTTP API Service
│   └── requirements.txt
│
└── Open-AutoGLM/                # Open Source Automation Library
    ├── glimmer/                 # Desktop Agent
    ├── phone_agent/             # Mobile Agent
    │   ├── adb/                 # Android Control
    │   ├── xctest/              # iOS Control
    │   └── hdc/                 # HarmonyOS Control
    ├── glimmer_ui/              # Original UI
    └── examples/                # Usage Examples
```
- Node.js 18+
- Python 3.10+
- Zhipu AI GLM-4V API Key
Backend:

```bash
cd Glimmer-Web
pip install -r requirements.txt
python server.py --host localhost --port 5000
```

Frontend:

```bash
cd Glimmer-UI
npm install
npm run dev
```

Or use the one-click startup scripts:

Windows:

```bat
cd Glimmer-UI
.\start.bat
```

Linux/macOS:

```bash
cd Glimmer-UI
chmod +x start.sh && ./start.sh
```

Once running:

- Frontend UI: http://localhost:5173
- Backend API: http://localhost:5000
- Health Check: http://localhost:5000/api/health
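Once both services are up, the backend can be verified from Python as well. A minimal sketch using only the standard library (the shape of the health endpoint's JSON response is not specified here, so the helper simply returns whatever the server sends):

```python
import json
import urllib.request


def health_url(base_url: str = "http://localhost:5000") -> str:
    # Build the health-check URL from a base URL, tolerating a trailing slash.
    return base_url.rstrip("/") + "/api/health"


def check_health(base_url: str = "http://localhost:5000") -> dict:
    # GET /api/health and decode the JSON body.
    with urllib.request.urlopen(health_url(base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the backend running, `check_health()` returns the decoded status payload; any connection error means the server is not listening on that address.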
| Method | Path | Description |
|---|---|---|
| GET | /api/health | Check service status |
| GET | /api/screenshot | Get current screenshot |
| POST | /api/execute | Execute an agent step |
| POST | /api/reset | Reset agent state |
| POST | /api/config | Update configuration |
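A small Python client for these endpoints might look like the following. This is an illustrative sketch using only the standard library; the class and method names are not part of the project, and the `/api/execute` payload fields follow the request example elsewhere in this README:

```python
import json
import urllib.request


class GlimmerClient:
    """Illustrative client for the Glimmer HTTP API; not part of the project itself."""

    def __init__(self, base_url: str = "http://localhost:5000"):
        self.base_url = base_url.rstrip("/")

    def url(self, path: str) -> str:
        return self.base_url + path

    def post(self, path: str, payload: dict) -> dict:
        # POST a JSON payload and decode the JSON response.
        req = urllib.request.Request(
            self.url(path),
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))

    def execute(self, goal: str, model_url: str, model_name: str) -> dict:
        # Run one agent step toward the given natural-language goal.
        return self.post("/api/execute", {
            "goal": goal,
            "model_url": model_url,
            "model_name": model_name,
        })

    def reset(self) -> dict:
        # Clear the agent's state before starting a new task.
        return self.post("/api/reset", {})
```

Calling `execute()` repeatedly mirrors the step-by-step loop of the Python API example further down.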
Request:

```json
{
  "goal": "Open Notepad and type Hello World",
  "model_url": "http://localhost:8000/v1",
  "model_name": "glm-4v"
}
```

Response:

```json
{
  "ui_thought": "I see the desktop, need to open the start menu first",
  "ui_focus_box": [100, 200, 150, 250],
  "status": "WORKING",
  "operation": {
    "action": "click",
    "params": {"x": 125, "y": 225}
  },
  "screenshot": "base64...",
  "confidence": 0.95
}
```

Configure via the /api/config endpoint or at startup:
```json
{
  "model_url": "https://open.bigmodel.cn/api/paas/v4",
  "model_name": "glm-4v",
  "api_key": "your-zhipu-api-key",
  "lang": "en"
}
```

| Model | Provider | Description |
|---|---|---|
| GLM-4V | Zhipu AI | Recommended; strong Chinese-language understanding |
| GLM-4V-Plus | Zhipu AI | Enhanced version; better for complex scenarios |
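On the client side, a step response like the /api/execute example above can be mapped to a concrete action. A sketch of that idea (field names follow the response example; the set of handled actions is an assumption for illustration, not the project's actual dispatcher):

```python
def describe_step(resp: dict) -> str:
    # Summarize one step response as a human-readable string.
    # Field names follow the /api/execute response example; the
    # "click"/"type" action names here are illustrative assumptions.
    op = resp.get("operation") or {}
    action = op.get("action")
    params = op.get("params", {})
    if action == "click":
        return f"click at ({params['x']}, {params['y']})"
    if action == "type":
        return f"type text: {params.get('text', '')!r}"
    return f"unhandled action: {action!r}"


sample = {
    "ui_thought": "I see the desktop, need to open the start menu first",
    "status": "WORKING",
    "operation": {"action": "click", "params": {"x": 125, "y": 225}},
}
print(describe_step(sample))  # click at (125, 225)
```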
Android:

```bash
# Ensure an adb device is connected
adb devices

# Use phone_agent
cd Open-AutoGLM
python main.py --device android
```

iOS: see the iOS Setup Guide.
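`adb devices` prints a header line followed by one `serial<TAB>state` line per device. A small helper to extract the serials of devices that are actually ready (an illustrative parser, not part of phone_agent's API):

```python
def parse_adb_devices(output: str) -> list[str]:
    # Skip the "List of devices attached" header, keep only lines whose
    # state column is "device" (i.e. connected and authorized), and
    # return their serial numbers.
    serials = []
    for line in output.strip().splitlines()[1:]:
        parts = line.split()
        if len(parts) >= 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


sample = """List of devices attached
emulator-5554\tdevice
0123456789ABCDEF\tunauthorized
"""
print(parse_adb_devices(sample))  # ['emulator-5554']
```

Devices reported as `unauthorized` or `offline` are excluded, which is usually what an automation harness wants before attempting control.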
HarmonyOS:

```bash
# Ensure an hdc device is connected
hdc list targets

python main.py --device harmonyos
```

Python API:

```python
from glimmer import GlimmerAgent, AgentConfig
from glimmer.model.client import ModelConfig

# Configure the model
model_config = ModelConfig(
    base_url="https://open.bigmodel.cn/api/paas/v4",
    model_name="glm-4v",
    api_key="your-zhipu-api-key",
)

# Create the agent
agent = GlimmerAgent(model_config, AgentConfig())

# Execute the task step by step until the agent reports completion
while True:
    result = agent.step("Open browser and search for weather")
    print(f"Thought: {result.thought}")
    print(f"Action: {result.action_type}")
    if result.finished:
        break
```

This project is licensed under the MIT License.
Made with ❤️ using Vue 3, Python and Vision AI