Integrate vLLM Semantic Router with OpenWebUI #231

@Xunzhuo

Is your feature request related to a problem? Please describe.

Currently, vLLM Semantic Router provides powerful intelligent routing capabilities, including BERT-based classification, semantic caching, PII detection, and tool selection, through OpenAI-compatible APIs. However, users must interact with these features through curl commands or direct API calls, which creates barriers for:

  • Non-technical users who want to benefit from semantic routing
  • Teams that need to visualize routing decisions and model selections
  • Administrators who want to configure routing rules through a GUI
  • Users who need to monitor cache hit rates and performance metrics
  • Organizations that want to manage multiple model endpoints and their weights easily

The lack of a modern web interface limits adoption and makes it difficult to fully leverage the semantic router's capabilities.

Describe the solution you'd like

I would like to integrate vLLM Semantic Router with OpenWebUI to provide a comprehensive web-based interface for its semantic routing capabilities. The solution should include:

Core Integration Features:

  • Configure OpenWebUI to use vLLM Semantic Router as the backend API endpoint
  • Enable model discovery through the /v1/models endpoint, including the special "auto" model
  • Support for all existing OpenAI-compatible API functionality through the web interface
  • Docker Compose setup for easy deployment of the integrated stack (a compose sketch and a model-discovery check follow this list)
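A minimal Docker Compose sketch of what the integrated stack could look like, assuming the router listens on port 8801 as noted under "Existing Foundation" below. The semantic-router image name is a placeholder; OPENAI_API_BASE_URL is the standard OpenWebUI variable for pointing it at an OpenAI-compatible backend:

```yaml
# Illustrative compose file; the semantic-router image tag is a placeholder.
services:
  semantic-router:
    image: ghcr.io/vllm-project/semantic-router:latest  # placeholder image name
    ports:
      - "8801:8801"  # OpenAI-compatible API port
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      # Route all OpenWebUI traffic through the semantic router.
      - OPENAI_API_BASE_URL=http://semantic-router:8801/v1
    depends_on:
      - semantic-router
```

Model discovery could then be verified against the /v1/models endpoint, assuming the router advertises the special "auto" model alongside the concrete backends:

```python
import requests

# List models from the router's OpenAI-compatible endpoint.
resp = requests.get("http://localhost:8801/v1/models", timeout=10)
resp.raise_for_status()

model_ids = [m["id"] for m in resp.json()["data"]]
print(model_ids)
assert "auto" in model_ids, "router should advertise the special 'auto' model"
```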

Enhanced User Interface:

  • Visual indicators showing which model was selected for each request (see the response-metadata sketch after this list)
  • Display of classification confidence scores and routing decisions
  • Cache hit/miss indicators in the chat interface
  • Model selection interface showing routing capabilities and weights
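How the chat interface surfaces these indicators depends on what the router returns. As a sketch: the `model` field of an OpenAI-compatible chat completion already identifies the backend that served the request, and any routing metadata the router exposes could drive the confidence and cache indicators. The `x-vsr-*` header names below are hypothetical placeholders, not confirmed router behavior:

```python
import requests

resp = requests.post(
    "http://localhost:8801/v1/chat/completions",
    json={"model": "auto", "messages": [{"role": "user", "content": "What is 2+2?"}]},
    timeout=60,
)
body = resp.json()

selected_model = body["model"]  # standard OpenAI response field: which model served the request
# Hypothetical metadata headers; actual names depend on what the router exposes.
category = resp.headers.get("x-vsr-selected-category")
cache_hit = resp.headers.get("x-vsr-cache-hit")
print(f"served by {selected_model}, category={category}, cache_hit={cache_hit}")
```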

Configuration and Management:

  • Web-based GUI for editing semantic routing rules and intent categories (an illustrative config sketch follows this list)
  • Threshold adjustment controls for classification confidence
  • Model weight and priority management interface
  • Filter configuration for PII detection and prompt guard settings
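To make concrete what the GUI would be editing, a routing configuration might look roughly like this; every field and model name here is an assumption chosen for illustration, not the router's actual schema:

```yaml
# Illustrative routing config; field and model names are assumptions.
default_model: general-purpose-model
classifier:
  threshold: 0.6        # classification confidence cutoff the GUI would adjust
categories:
  - name: math
    model_scores:       # per-category model weights the GUI would manage
      - model: math-specialist-model
        score: 0.9
pii_detection:
  enabled: true         # filter toggle surfaced in the GUI
prompt_guard:
  enabled: true
```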

Monitoring and Analytics:

  • Dashboard widgets showing cache hit rates and performance metrics (see the metrics-scrape sketch after this list)
  • Model usage statistics and response time analytics
  • Classification accuracy metrics and routing decision logs
  • Real-time health monitoring of model endpoints
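If the router exports Prometheus-style metrics (the port and the metric-name filter below are assumptions), the dashboard widgets could be prototyped with a simple scrape:

```python
import requests

# Assumed metrics endpoint; adjust to wherever the router actually exports metrics.
METRICS_URL = "http://localhost:9190/metrics"

text = requests.get(METRICS_URL, timeout=10).text
# Print cache-related samples; the "cache" substring is a stand-in for the
# real metric names, which depend on what the router exports.
for line in text.splitlines():
    if "cache" in line and not line.startswith("#"):
        print(line)
```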

Additional context

Existing Foundation:

  • vLLM Semantic Router already provides OpenAI-compatible endpoints at port 8801 (example request below)
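For reference, the existing endpoint can already be exercised with the standard openai client; the api_key value is a placeholder, since the router may not require one:

```python
from openai import OpenAI

# Point the standard OpenAI client at the router instead of a model server.
client = OpenAI(base_url="http://localhost:8801/v1", api_key="not-needed")  # placeholder key

completion = client.chat.completions.create(
    model="auto",  # let the semantic router pick the backend model
    messages=[{"role": "user", "content": "Explain semantic caching in one sentence."}],
)
print(completion.model)  # the model the router actually selected
print(completion.choices[0].message.content)
```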

Expected Benefits:

  • Lower barrier to entry for non-technical users
  • Visual feedback on routing decisions and performance optimization
  • Centralized management of multiple model endpoints
  • Cost optimization through visible cache efficiency metrics
  • Enhanced security through integrated PII detection and prompt guard features

Implementation Phases:

  1. Basic Integration: OpenWebUI configuration, Docker setup, model discovery
  2. Enhanced Features: Routing visualization, configuration interface, metrics dashboard
  3. Advanced Integration: Custom UI components, admin panel, advanced monitoring
