Conversation

yossiovadia (Collaborator)

Ran tests with and without the fix using the same configuration to prove effectiveness:

Without the fix (testing session on 2025-10-14 at 14:32):

```
{"msg":"routing_decision","selected_model":"Model-A","category":"economics","selected_endpoint":"127.0.0.1:8000"}
{"msg":"routing_decision","selected_model":"Model-B","category":"psychology","selected_endpoint":"127.0.0.1:8001"}
```

The router correctly routes economics to Model-A (endpoint 127.0.0.1:8000) and psychology to Model-B (endpoint 127.0.0.1:8001), BUT there is no log line showing "Updating response model field" (fix not present).

API response evidence:
Economics query returned: "model": "Model-B" ❌ (WRONG - should be Model-A)
Psychology query returned: "model": "Model-A" ❌ (WRONG - should be Model-B)

With the fix:

```
{"msg":"routing_decision","selected_model":"Model-A","category":"economics","selected_endpoint":"127.0.0.1:8000"}
{"msg":"Updating response model field from 'Model-B' to 'Model-A' (routing decision)"}

{"msg":"routing_decision","selected_model":"Model-B","category":"psychology","selected_endpoint":"127.0.0.1:8001"}
{"msg":"Updating response model field from 'Model-A' to 'Model-B' (routing decision)"}
```

API response evidence:
Economics query returned: "model": "Model-A" ✅ (CORRECT)
Psychology query returned: "model": "Model-B" ✅ (CORRECT)

The response JSON model field now correctly reflects the semantic router's
decision instead of using the model name from the vLLM endpoint.

Changes:
- Parse the response JSON and update the model field to ctx.RequestModel
- Re-marshal the modified response for both the cache and the client
- Only modify non-streaming responses
- Fall back to the original response on marshal errors

This ensures API consumers can determine which model was selected by
examining the standard model field, rather than requiring custom headers
or log inspection.
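
For illustration, a minimal sketch of the logic described in the changes above. The helper name `updateResponseModel` and its signature are hypothetical; the actual change lives in src/semantic-router/pkg/extproc/response_handler.go and reads the selected model from ctx.RequestModel:

```go
package extproc

import (
	"encoding/json"
	"log"
)

// updateResponseModel rewrites the "model" field of a non-streaming
// response body so it reflects the router's decision. On any parse or
// marshal error it falls back to the original body unchanged.
func updateResponseModel(body []byte, selectedModel string, isStreaming bool) []byte {
	if isStreaming {
		return body // only non-streaming responses are modified
	}

	var resp map[string]interface{}
	if err := json.Unmarshal(body, &resp); err != nil {
		return body // fallback: leave the response untouched
	}

	if current, ok := resp["model"].(string); ok && current != selectedModel {
		log.Printf("Updating response model field from '%s' to '%s' (routing decision)",
			current, selectedModel)
	}
	resp["model"] = selectedModel

	updated, err := json.Marshal(resp)
	if err != nil {
		return body // fallback on marshal errors
	}
	return updated // re-marshaled body is used for both the cache and the client
}
```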

Fixes vllm-project#430

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Yossi Ovadia <yovadia@redhat.com>

netlify bot commented Oct 14, 2025

Deploy Preview for vllm-semantic-router ready!

🔨 Latest commit: 4203082
🔍 Latest deploy log: https://app.netlify.com/projects/vllm-semantic-router/deploys/68eef21693b9a30008c47a8e
😎 Deploy Preview: https://deploy-preview-431--vllm-semantic-router.netlify.app


github-actions bot commented Oct 14, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/semantic-router/pkg/extproc/response_handler.go

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@Xunzhuo (Member)

Xunzhuo commented Oct 15, 2025

This is weird, why does this happen? Shouldn't the response model be the same as the model that was actually called?

```go
observability.Errorf("Error parsing tokens from response: %v", err)
metrics.RecordRequestError(ctx.RequestModel, "parse_error")
}
```

Collaborator

I don't think we should update the model field; it is returned by vLLM.

@yossiovadia (Collaborator, Author)

I will check whether there was a configuration issue in my OpenShift deployment (maybe I had a mismatch between Model-A/B).
I'll update.

@yossiovadia (Collaborator, Author)

OK! Good news: the issue was an incorrect Envoy configuration. I will document it here for future reference.

The OpenShift Envoy config I used (deploy/openshift/envoy-openshift.yaml) was using static clusters with header-based routing:

```yaml
routes:
  - match: { prefix: "/", headers: [{ name: "x-gateway-destination-endpoint", exact_match: "127.0.0.1:8000" }] }
    route: { cluster: model_a_cluster }
  - match: { prefix: "/", headers: [{ name: "x-gateway-destination-endpoint", exact_match: "127.0.0.1:8001" }] }
    route: { cluster: model_b_cluster }
  - match: { prefix: "/" }
    route: { cluster: model_b_cluster }  # DEFAULT - always hit!
```

Problem: in Envoy's processing pipeline, route selection happens FIRST (based on the headers as the request arrives), and the ExtProc filter runs SECOND (only then setting the x-gateway-destination-endpoint header). So the header was never set when routing was evaluated → requests always fell through to the default route (model_b_cluster) → everything went to Model-B!

The Correct Fix: Use an ORIGINAL_DST Cluster

The local development config (config/envoy.yaml) already had the correct approach:

```yaml
routes:
  - match: { prefix: "/" }
    route: { cluster: vllm_dynamic_cluster }

clusters:
  - name: vllm_dynamic_cluster
    type: ORIGINAL_DST
    lb_policy: CLUSTER_PROVIDED
    original_dst_lb_config:
      use_http_header: true
      http_header_name: "x-gateway-destination-endpoint"
```

With ORIGINAL_DST, the cluster reads the header value after ExtProc sets it and routes to the correct endpoint dynamically.
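
To make the ordering concrete, this is roughly how an ExtProc server injects that header on the request path, sketched with the go-control-plane ext_proc types (an illustrative sketch under those assumptions, not the actual semantic-router code):

```go
package extproc

import (
	corev3 "github.com/envoyproxy/go-control-plane/envoy/config/core/v3"
	extprocv3 "github.com/envoyproxy/go-control-plane/envoy/service/ext_proc/v3"
)

// endpointHeaderResponse builds a ProcessingResponse asking Envoy to set
// the routing header on the upstream request. With an ORIGINAL_DST
// cluster configured as above, Envoy then dials the address carried in
// this header.
func endpointHeaderResponse(endpoint string) *extprocv3.ProcessingResponse {
	return &extprocv3.ProcessingResponse{
		Response: &extprocv3.ProcessingResponse_RequestHeaders{
			RequestHeaders: &extprocv3.HeadersResponse{
				Response: &extprocv3.CommonResponse{
					HeaderMutation: &extprocv3.HeaderMutation{
						SetHeaders: []*corev3.HeaderValueOption{{
							Header: &corev3.HeaderValue{
								Key:      "x-gateway-destination-endpoint",
								RawValue: []byte(endpoint), // e.g. "127.0.0.1:8000"
							},
						}},
					},
				},
			},
		},
	}
}
```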

