feat:support for two long-context embedding models (Qwen3 and Gemma) #453
feat:support for two long-context embedding models (Qwen3-Embedding-0.6B and EmbeddingGemma-300M) Signed-off-by: OneZero-Y <aukovyps@163.com>
Test failure is fixed in main branch, merging it for testing.
Merged commit fa4f5c7 into vllm-project:feat-candle-refactoring
feat:support for two long-context embedding models (Qwen3-Embedding-0.6B and EmbeddingGemma-300M) (vllm-project#453) Signed-off-by: OneZero-Y <aukovyps@163.com> Signed-off-by: Huamin Chen <hchen@redhat.com>
What type of PR is this?
support for two long-context embedding models (Qwen3-Embedding-0.6B and EmbeddingGemma-300M)
What this PR does / why we need it:
1. Qwen3-Embedding-0.6B
Model Specifications:
Key Features:
2. EmbeddingGemma-300M
Model Specifications:
Key Features:
3. Intelligent Routing System
Routing Logic:
quality_priority > 0.7
latency_priority > 0.7
latency ≤ 0.7
512 < seq_len ≤ 4096
+ Balanced
seq_len > 4096
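The conditions above can be read as an ordered rule check. The following is only a minimal sketch under stated assumptions: the thresholds (0.7, 512, 4096) come from the routing conditions listed here, but the evaluation order, the function name `routing_rule`, and the returned labels are hypothetical. The sketch deliberately does not name a model for each rule, since that mapping is not recoverable from this description.

```python
# Thresholds taken from the routing conditions in this PR description.
PRIORITY_THRESHOLD = 0.7
SHORT_MAX_TOKENS = 512
MEDIUM_MAX_TOKENS = 4096


def routing_rule(quality_priority: float, latency_priority: float, seq_len: int) -> str:
    """Return a label for the first routing rule that matches.

    Assumption: priorities are checked before sequence length, quality first.
    The labels are illustrative and are not identifiers from the PR.
    """
    if quality_priority > PRIORITY_THRESHOLD:
        return "quality_first"
    if latency_priority > PRIORITY_THRESHOLD:
        return "latency_first"
    # Neither priority dominates, so fall back to sequence-length buckets.
    if seq_len <= SHORT_MAX_TOKENS:
        return "short_sequence"
    if seq_len <= MEDIUM_MAX_TOKENS:
        return "medium_sequence"
    return "long_sequence"
```

A caller would then map each label to whichever of the two embedding models the actual router selects for that case.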
Features:
4. Enhanced API Endpoints
4.1 Embedding Generation API
Endpoint:
POST /api/v1/embeddings
Request:
Response:
4.2 Cosine Similarity Calculation API
Endpoint:
POST /api/v1/similarity
Request:
Response:
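For reference, the quantity this endpoint is named after is the standard cosine similarity between two embedding vectors. The sketch below shows only that calculation in plain Python; it is not the endpoint's request or response format, and the function name `cosine_similarity` is chosen here for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Values range from -1 (opposite direction) to 1 (same direction); embeddings of semantically close texts are expected to score near the top of that range.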
4.3 Batch Similarity Matching API
Endpoint:
POST /api/v1/similarity/batch
Request:
Response:
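Batch similarity matching can be pictured as scoring one query embedding against many candidates and keeping the best few. The sketch below illustrates that idea only, assuming cosine similarity as the score; the names `top_k_matches` and `top_k` are hypothetical and do not describe the endpoint's actual parameters or response shape.

```python
import math
from typing import List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_matches(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (candidate_index, score) pairs for the top_k best matches."""
    scored = [(i, _cosine(query, c)) for i, c in enumerate(candidates)]
    # Highest similarity first; sorted() is stable, so ties keep input order.
    scored = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

In practice the query and candidate vectors would be embeddings produced by one of the two models, all generated with the same model so that their dimensions match.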
4.4 Embedding Models Information API
Endpoint:
GET /api/v1/embeddings/models
Request:
Response:
Configuration
config.yaml
Which issue(s) this PR fixes:
part of #266
Release Notes: Yes/No