fix: update response model field to match routing decision #431
Conversation
The response JSON `model` field now correctly reflects the semantic router's decision instead of using the model name from the vLLM endpoint.

Changes:
- Parse the response JSON and update the `model` field to `ctx.RequestModel`
- Re-marshal the modified response for the cache and the client
- Only modify non-streaming responses
- Fall back to the original response on marshal errors

This ensures API consumers can determine which model was selected by examining the standard `model` field, rather than requiring custom headers or log inspection.

Fixes vllm-project#430

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
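A minimal sketch of that logic (the helper name and surrounding types are hypothetical; only the `ctx.RequestModel` value is taken from the PR description):

```go
package extproc

import "encoding/json"

// overwriteResponseModel rewrites the "model" field of a non-streaming
// completion response body so it reflects the router's routing decision.
// On any parse or marshal error it falls back to the original body.
func overwriteResponseModel(body []byte, selectedModel string) []byte {
	var resp map[string]interface{}
	if err := json.Unmarshal(body, &resp); err != nil {
		return body // not parseable JSON; leave the response untouched
	}
	resp["model"] = selectedModel // e.g. ctx.RequestModel
	updated, err := json.Marshal(resp)
	if err != nil {
		return body // fall back to the original response
	}
	return updated
}
```

Per the description, streaming responses are left untouched by this path.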
This is weird, why does this happen? The response model should be the same as the model that was actually called.
observability.Errorf("Error parsing tokens from response: %v", err)
metrics.RecordRequestError(ctx.RequestModel, "parse_error")
}
I don't think we should update the model field; it is returned by vLLM.
I will validate whether there was a configuration issue in my OpenShift deployment (maybe I had a mismatch between Model-A/B).
OK! Good news: the issue was an incorrect Envoy configuration. I will document it here for the future. The OpenShift Envoy config I used (deploy/openshift/envoy-openshift.yaml) was using static clusters with header-based routing.

The Correct Fix: Use ORIGINAL_DST Cluster
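A hedged sketch of such a cluster, assuming the ext_proc router signals the selected endpoint via a request header (the cluster name and header name here are assumptions, not confirmed in this thread):

```yaml
clusters:
- name: vllm_models
  type: ORIGINAL_DST
  lb_policy: CLUSTER_PROVIDED
  connect_timeout: 5s
  original_dst_lb_config:
    use_http_header: true
    # Assumed header name; the router must set this header to the
    # selected endpoint (e.g. 127.0.0.1:8000) on each request.
    http_header_name: x-gateway-destination-endpoint
```

With ORIGINAL_DST, Envoy forwards each request to whatever endpoint the router chose, so the response really does come from the selected model.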
Ran tests with and without the fix using the same configuration to demonstrate effectiveness. From the testing session on 2025-10-14 at 14:32:

Without the fix:
{"msg":"routing_decision","selected_model":"Model-A","category":"economics","selected_endpoint":"127.0.0.1:8000"}
{"msg":"routing_decision","selected_model":"Model-B","category":"psychology","selected_endpoint":"127.0.0.1:8001"}
- Router correctly routes economics to Model-A (endpoint 127.0.0.1:8000)
- Router correctly routes psychology to Model-B (endpoint 127.0.0.1:8001)
- BUT no log line showing "Updating response model field" (fix not present)

API Response Evidence:
- Economics query returned: "model": "Model-B" ❌ (WRONG - should be Model-A)
- Psychology query returned: "model": "Model-A" ❌ (WRONG - should be Model-B)
With the fix:
{"msg":"routing_decision","selected_model":"Model-A","category":"economics","selected_endpoint":"127.0.0.1:8000"}
{"msg":"Updating response model field from 'Model-B' to 'Model-A' (routing decision)"}
{"msg":"routing_decision","selected_model":"Model-B","category":"psychology","selected_endpoint":"127.0.0.1:8001"}
{"msg":"Updating response model field from 'Model-A' to 'Model-B' (routing decision)"}
API Response Evidence:
- Economics query returned: "model": "Model-A" ✅ (CORRECT)
- Psychology query returned: "model": "Model-B" ✅ (CORRECT)
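For future reproduction, a hedged sketch of the kind of check behind the evidence above (the listener address and the "auto" model name are assumptions about the local test setup):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Assumed Envoy listener for the router; adjust to your deployment.
	url := "http://127.0.0.1:8801/v1/chat/completions"
	payload := []byte(`{"model":"auto","messages":[{"role":"user","content":"Explain supply and demand."}]}`)

	resp, err := http.Post(url, "application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Decode only the field under test: the reported model name.
	var out struct {
		Model string `json:"model"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println("model:", out.Model) // expect Model-A for an economics query
}
```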