[Bugfix][Qwen][Multimodal] Move Qwen2_5_vl sdpa to custom op and reenable compile #27764

Lucaskabela · 2025-10-29T18:51:13Z

Purpose

See title: this PR fixes an error, introduced by #23207, where the SDPA backend could not be compiled.

This happened because a fix for tracing SliceVariables containing single-element tensors has not landed yet (it should be in PyTorch 2.10).

Until then, we use a custom op to prevent a graph break.

Test Plan

python examples/offline_inference/vision_language.py -m qwen2_5_vl

With an edit to the vision_language.py file to use TORCH_SDPA (as this is not supported on the command line)

E2E:

with-proxy vllm serve Qwen/Qwen2.5-VL-3B-Instruct --gpu_memory_utilization=.85 --mm-encoder-attn-backend TORCH_SDPA

and

with-proxy vllm bench serve   --backend openai-chat   --model Qwen/Qwen2.5-VL-3B-Instruct   --endpoint /v1/chat/completions   --dataset-name hf   --dataset-path lmarena-ai/VisionArena-Chat   --hf-split train   --num-prompts 1000

Test Results

--------------------------------------------------
The image depicts the Tokyo Skytree, a tall broadcasting tower located in Sumida, Tokyo, Japan. The tower is surrounded by cherry blossom trees in full bloom, creating a picturesque and vibrant scene. The cherry blossoms are in various stages of bloom, with pink and white petals covering the branches, and the sky is
--------------------------------------------------
This image depicts the Tokyo Skytree, a tall broadcasting tower located in Sumida, Tokyo, Japan. The tower is surrounded by cherry blossom trees in full bloom, creating a picturesque and vibrant scene. The cherry blossoms are in various stages of bloom, with pink and white flowers covering the branches. The sky is clear
--------------------------------------------------
The image depicts the Tokyo Skytree, a tall broadcasting tower located in the Odaiba district of Tokyo, Japan. The tower is surrounded by cherry blossom trees in full bloom, creating a picturesque and vibrant scene. The cherry blossoms are in various stages of bloom, with pink and white petals covering the branches, adding
--------------------------------------------------
The image depicts the Tokyo Skytree, a tall broadcasting tower located in Sumida, Tokyo, Japan. The tower is surrounded by cherry blossom trees, which are in full bloom, creating a picturesque scene against the clear blue sky. The cherry blossoms are in various stages of bloom, with some fully open and others still
--------------------------------------------------

E2E

============ Serving Benchmark Result ============
Successful requests:                     1000      
Failed requests:                         0         
Benchmark duration (s):                  144.15    
Total input tokens:                      94327     
Total generated tokens:                  106396    
Request throughput (req/s):              6.94      
Output token throughput (tok/s):         738.11    
Peak output token throughput (tok/s):    14908.00  
Peak concurrent requests:                1000.00   
Total Token throughput (tok/s):          1392.49   
---------------Time to First Token----------------
Mean TTFT (ms):                          71754.34  
Median TTFT (ms):                        71486.27  
P99 TTFT (ms):                           138089.08 
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          694.75    
Median TPOT (ms):                        665.18    
P99 TPOT (ms):                           1462.31   
---------------Inter-token Latency----------------
Mean ITL (ms):                           603.27    
Median ITL (ms):                         58.55     
P99 ITL (ms):                            1621.47   
==================================================
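As a quick consistency check on the table, the throughput rows can be recomputed from the totals. This sketch divides by the printed (already rounded) duration, so its results land slightly below the reported 738.11 and 1392.49 tok/s; the benchmark most likely divides by the unrounded duration.

```python
# Totals copied from the benchmark output above.
duration_s = 144.15  # printed value, rounded to 2 decimals
successful_requests = 1000
input_tokens = 94327
generated_tokens = 106396

request_throughput = successful_requests / duration_s
output_throughput = generated_tokens / duration_s
total_throughput = (input_tokens + generated_tokens) / duration_s

print(f"{request_throughput:.2f} req/s")  # 6.94
print(f"{output_throughput:.2f} tok/s")   # 738.09
print(f"{total_throughput:.2f} tok/s")    # 1392.46
```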

mergify · 2025-10-29T18:51:57Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Lucaskabela.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

huachenheli · 2025-10-29T18:54:38Z

Can you update the PR description? I think if you put "```" and text on the same line it won't show any more.

Signed-off-by: Lucas Kabela <lucaskabela@meta.com>

huachenheli · 2025-10-30T03:05:05Z

vllm/attention/ops/vit_attn_wrappers.py

+    for i in range(1, len(cu_seqlens)):
+        start_idx = cu_seqlens[i - 1]
+        end_idx = cu_seqlens[i]
+        q_i = q[:, start_idx:end_idx]
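In the diff above, `cu_seqlens` holds cumulative boundaries for several sequences (for example, images or attention windows) packed into one tensor, and the loop slices out one sequence at a time so attention never crosses a boundary. A pure-Python sketch of just the slicing logic, with plain lists standing in for tensors; the helper names `cu_seqlens_from_lengths` and `split_packed` are hypothetical:

```python
from itertools import accumulate
from typing import List, Sequence


def cu_seqlens_from_lengths(lengths: Sequence[int]) -> List[int]:
    """Cumulative boundaries: [0, l0, l0 + l1, ...]; the last entry is the total length."""
    return [0, *accumulate(lengths)]


def split_packed(tokens: Sequence[str], cu_seqlens: Sequence[int]) -> List[List[str]]:
    """Mirror of the loop in the diff: one slice per packed sequence."""
    chunks = []
    for i in range(1, len(cu_seqlens)):
        start_idx = cu_seqlens[i - 1]
        end_idx = cu_seqlens[i]
        chunks.append(list(tokens[start_idx:end_idx]))
    return chunks


# Three sequences with 4, 2 and 3 tokens, packed into one stream.
tokens = ["a0", "a1", "a2", "a3", "b0", "b1", "c0", "c1", "c2"]
cu_seqlens = cu_seqlens_from_lengths([4, 2, 3])
print(cu_seqlens)  # [0, 4, 6, 9]
print(split_packed(tokens, cu_seqlens))
# [['a0', 'a1', 'a2', 'a3'], ['b0', 'b1'], ['c0', 'c1', 'c2']]
```

In the real code each slice feeds its own SDPA call, which is why the bounds come from a tensor at runtime rather than being known at trace time.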


A dumb question: How does this fix the tensor slicing issue? It seems that you simply put the whole SDPA into a separate function but the code remains identical?

The problem arises when torch.compile tries to trace this code. When the code is wrapped in a custom_op and called through it, torch.compile does not trace the function's internals, so the tracing bug is never triggered.

So it is not just that we move the code into a function; we also use the custom_op mechanism to make it opaque to the compiler.

Lucaskabela · 2025-10-30T21:28:55Z

@ywang96 @ProExpertProg @zou3519 for review

ProExpertProg · 2025-10-30T21:47:39Z

This was because a fix for tracing SliceVariables containing single element tensors is not yet landed (but should be in Pytorch2.10)

Could we add this fix to the 2.9.1 list?

ProExpertProg

LGTM

Lucaskabela · 2025-10-30T22:01:34Z

This was because a fix for tracing SliceVariables containing single element tensors is not yet landed (but should be in Pytorch2.10)

Could we add this fix to the 2.9.1 list?

The fix in question is from @laithsakka; see pytorch/pytorch#165074.

Let me check whether this can be cherry-picked. It falls into a gray area: it improves traceability but, AFAIK, does not fix a critical bug introduced in 2.9, so it may be out of scope.


mergify bot added the qwen Related to Qwen models label Oct 29, 2025

mergify bot added the needs-rebase label Oct 29, 2025

Lucaskabela mentioned this pull request Oct 29, 2025

[Temp fix] Disable torch.compile for Qwen2.5 VL's VisionBlock temporarily. #27760

Merged

5 tasks

Lucaskabela added 2 commits October 29, 2025 17:00

Move sdpa to custom op until tensor slicing supported

13454ad

Signed-off-by: Lucas Kabela <lucaskabela@meta.com>

Reenable compiling vision block

50ffd98

Signed-off-by: Lucas Kabela <lucaskabela@meta.com>

Lucaskabela force-pushed the lucaskabela/qwen2_5_compile_sdpa_fix branch from 7c50ed1 to 50ffd98 Compare October 30, 2025 00:01

Lucaskabela marked this pull request as ready for review October 30, 2025 00:01

Lucaskabela requested review from LucasWilkinson and sighingnow as code owners October 30, 2025 00:01

mergify bot removed the needs-rebase label Oct 30, 2025

huachenheli reviewed Oct 30, 2025


tjtanaa mentioned this pull request Oct 30, 2025

[Bugfix][ROCm] Fix ViT rotary embeddings for torch.compile compatibility on ROCm #27748

Merged

5 tasks

Lucaskabela changed the title ~~[Qwen][Multimodal] Move Qwen2_5_vl sdpa to custom op until tensor slicing supported~~ [Bugfix][Qwen][Multimodal] Move Qwen2_5_vl sdpa to custom op until tensor slicing supported and reenable compile Oct 30, 2025

Lucaskabela changed the title ~~[Bugfix][Qwen][Multimodal] Move Qwen2_5_vl sdpa to custom op until tensor slicing supported and reenable compile~~ [Bugfix][Qwen][Multimodal] Move Qwen2_5_vl sdpa to custom op and reenable compile Oct 30, 2025

ProExpertProg approved these changes Oct 30, 2025


ProExpertProg added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 30, 2025

zou3519 approved these changes Oct 31, 2025


Merge branch 'main' into lucaskabela/qwen2_5_compile_sdpa_fix

27be675

simon-mo merged commit 55011ae into vllm-project:main Nov 3, 2025
55 checks passed

zhaozuy pushed a commit to zhaozuy/vllm that referenced this pull request Nov 4, 2025

[Bugfix][Qwen][Multimodal] Move Qwen2_5_vl sdpa to custom op and reen…

6c0186f

…able compile (vllm-project#27764) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>

omerpaz95 pushed a commit to omerpaz95/vllm that referenced this pull request Nov 4, 2025

[Bugfix][Qwen][Multimodal] Move Qwen2_5_vl sdpa to custom op and reen…

54b8d5d

…able compile (vllm-project#27764) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>

Kay-Tian mentioned this pull request Nov 4, 2025

vLLM PR #27764 core file change reminder Kay-Tian/vllm#83

Open

5 participants
