Description
Is your feature request related to a problem? Please describe.
Currently, vLLM Semantic Router provides intelligent routing capabilities, including BERT-based classification, semantic caching, PII detection, and tool selection, through OpenAI-compatible APIs. However, users must interact with these features through curl commands or direct API calls, which creates barriers for:
- Non-technical users who want to benefit from semantic routing
- Teams that need to visualize routing decisions and model selections
- Administrators who want to configure routing rules through a GUI
- Users who need to monitor cache hit rates and performance metrics
- Organizations that want to manage multiple model endpoints and their weights easily
The lack of a modern web interface limits adoption and makes it difficult to fully leverage the semantic router's capabilities.
Describe the solution you'd like
I would like to integrate vLLM Semantic Router with OpenWebUI to provide a comprehensive web-based interface for its semantic routing capabilities. The solution should include:
Core Integration Features:
- Configure OpenWebUI to use vLLM Semantic Router as the backend API endpoint
- Enable model discovery through the /v1/models endpoint, including the special "auto" model
- Support all existing OpenAI-compatible API functionality through the web interface
- Docker Compose setup for easy deployment of the integrated stack
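As a starting point, a minimal Docker Compose sketch along these lines could wire the two services together. The semantic-router image name and volume layout are placeholders, not the project's published artifacts; OPENAI_API_BASE_URL is OpenWebUI's standard setting for pointing at an OpenAI-compatible backend, and port 8801 matches the existing deployment:

```yaml
services:
  semantic-router:
    # Placeholder image name; substitute the actual vLLM Semantic Router
    # image or a local build context.
    image: vllm-semantic-router:latest
    ports:
      - "8801:8801"          # OpenAI-compatible API (existing port)
    volumes:
      - ./config:/app/config # routing rules, categories, model endpoints

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"          # OpenWebUI serves on 8080 inside the container
    environment:
      # Point OpenWebUI's OpenAI-compatible connection at the router.
      - OPENAI_API_BASE_URL=http://semantic-router:8801/v1
      - OPENAI_API_KEY=not-needed # the router may not enforce a real key
    depends_on:
      - semantic-router
```

With the stack up, OpenWebUI's model picker should populate from the router's /v1/models response, including the special "auto" model.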
Enhanced User Interface:
- Visual indicators showing which model was selected for each request (see the sketch after this list)
- Display of classification confidence scores and routing decisions
- Cache hit/miss indicators in the chat interface
- Model selection interface showing routing capabilities and weights
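Some of this visibility may be achievable before any custom UI work: responses follow the OpenAI schema, so the standard model field can expose which backend served a request routed via "auto". A minimal sketch with the official openai Python client follows; the base URL assumes the local deployment above, and whether the router rewrites the model field to the selected backend's name is an assumption worth verifying:

```python
from openai import OpenAI

# Point the standard OpenAI client at the router's local endpoint.
client = OpenAI(base_url="http://localhost:8801/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="auto",  # let the semantic router pick the backend
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
)

# The OpenAI response schema carries a `model` field; if the router
# propagates the chosen backend's name here (an assumption to verify),
# the UI can surface it as the routing decision.
print("Routed to:", resp.model)
print(resp.choices[0].message.content)
```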
Configuration and Management:
- Web-based GUI for editing semantic routing rules and intent categories (see the config sketch after this list)
- Threshold adjustment controls for classification confidence
- Model weight and priority management interface
- Filter configuration for PII detection and prompt guard settings
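For illustration, the GUI could read and write a declarative config along these lines. This schema and the model names in it are hypothetical and do not reflect the router's actual configuration format:

```yaml
# Hypothetical schema for illustration only; field names do not
# reflect the router's real config format.
categories:
  - name: code-generation
    threshold: 0.75          # minimum classification confidence
    models:
      - name: coder-model-a  # placeholder model names
        weight: 0.7
      - name: coder-model-b
        weight: 0.3
filters:
  pii_detection:
    enabled: true
    action: block
  prompt_guard:
    enabled: true
```

A web editor over a file like this would cover routing rules, confidence thresholds, model weights, and filter toggles in one place.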
Monitoring and Analytics:
- Dashboard widgets showing cache hit rates and performance metrics (see the scrape config after this list)
- Model usage statistics and response time analytics
- Classification accuracy metrics and routing decision logs
- Real-time health monitoring of model endpoints
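If the router exposes Prometheus-style metrics (an assumption; the port and path below are placeholders), the dashboard could be fed by a standard scrape job:

```yaml
scrape_configs:
  - job_name: semantic-router
    # Placeholder target and path; confirm the router's actual
    # metrics endpoint before use.
    metrics_path: /metrics
    static_configs:
      - targets: ["semantic-router:9090"]
```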
Additional context
Existing Foundation:
- vLLM Semantic Router already provides OpenAI-compatible endpoints at port 8801
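This means any OpenAI client can already exercise the integration point. For example, with the official openai Python client, assuming a local deployment and that no real API key is enforced:

```python
from openai import OpenAI

# Port 8801 per the existing deployment.
client = OpenAI(base_url="http://localhost:8801/v1", api_key="not-needed")

# List available models; the response should include the special "auto" model.
for model in client.models.list():
    print(model.id)
```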
Expected Benefits:
- Lower barrier to entry for non-technical users
- Visual feedback on routing decisions and performance optimization
- Centralized management of multiple model endpoints
- Cost optimization through visible cache efficiency metrics
- Enhanced security through integrated PII detection and prompt guard features
Implementation Phases:
1. Basic Integration: OpenWebUI configuration, Docker setup, model discovery
2. Enhanced Features: Routing visualization, configuration interface, metrics dashboard
3. Advanced Integration: Custom UI components, admin panel, advanced monitoring