[Temp fix] Disable torch.compile for Qwen2.5 VL's VisionBlock temporarily. #27760
Conversation
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Code Review
This pull request provides a temporary fix for a `torch.compile` issue with the Qwen2.5 VL vision model. The change comments out the `@support_torch_compile` decorator on `Qwen2_5_VisionBlock`, which disables compilation for this block and avoids the `Unsupported: Dynamic slicing with Tensor arguments` error. This is a reasonable and effective short-term solution to unblock users while a more permanent fix for the underlying issue is investigated. The change is correct and I approve it.
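The fix pattern can be sketched in miniature (a stand-in only: `support_torch_compile` below is a stub, not vLLM's actual decorator, and the block body is a placeholder):

```python
# Stub standing in for vLLM's support_torch_compile decorator (assumption:
# the real decorator registers the class for torch.compile; this stub just
# marks the class so the effect of removing it is visible).
def support_torch_compile(cls):
    cls.uses_torch_compile = True
    return cls


# The temporary fix simply comments out the decorator, so the vision block
# falls back to eager execution:
# @support_torch_compile   # disabled until dynamic tensor slicing is supported
class Qwen2_5_VisionBlock:
    uses_torch_compile = False  # eager fallback while the decorator is off

    def forward(self, x):
        return x  # placeholder body for the sketch
```

Re-enabling compilation later is then a one-line change: uncomment the decorator.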
Signed-off-by: Roger Wang <hey@rogerw.io>
cc @huachenheli I tried testing this pretty extensively, but it is my first major feature work in vLLM, so I am not surprised I missed something.
That said, I never observed this error in my testing, so I think more context is needed on this PR.
Specifically:
- What versions (torch/vllm primarily) are you using?
- What command are you running to get this error?
Once those are provided on the PR, please re-ping me so I can get to work on fixing :) Thanks!
Updated my PR description with more details. PTAL.
You should also be able to do this without modifying the code by passing
I should have a fix ready pretty soon (within the hour). The issue here is that torch.compile doesn't support slicing with tensor arguments yet (@laithsakka has a PR adding support on nightly; see pytorch/pytorch#165074). So for now, we can move this into a custom op, and once we upgrade the torch version to include Laith's fix we can move it back out of the custom op :)
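The custom-op workaround can be sketched in plain Python (an analogue only: the real PR registers a torch custom op, and the names `dynamic_slice` and `cu_seqlens` here are invented for illustration):

```python
# Idea: torch.compile cannot trace a slice whose bounds come from a tensor,
# so the slice is moved behind an opaque call boundary that the compiler
# treats as a black box and runs eagerly.

def dynamic_slice(seq, start, end):
    # In the real fix this body would live inside a registered custom op,
    # hidden from the tracer.
    return seq[start:end]


def split_by_cu_seqlens(tokens, cu_seqlens):
    # cu_seqlens mimics the cumulative sequence lengths that drive the
    # tensor-argument slicing in the vision block.
    return [
        dynamic_slice(tokens, cu_seqlens[i], cu_seqlens[i + 1])
        for i in range(len(cu_seqlens) - 1)
    ]
```

Only the opaque boundary matters here; everything outside `dynamic_slice` remains traceable.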
Please see #27764 @huachenheli @ywang96
That said, we should land and cherry-pick this PR into the release: the compile integration on the VisionBlock specifically needs more hardening before we push it to a general release.
…rily. (vllm-project#27760) Signed-off-by: Chenheli Hua <huachenheli@outlook.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io>
@Lucaskabela @ywang96 I think we also need a way to safeguard this, as it requires a very new version of PyTorch with that specific PR to be able to handle the dynamic slicing.
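One possible safeguard shape (a hypothetical sketch: the function name and the version cutoff are assumptions, not the actual requirement in vLLM or PyTorch):

```python
def torch_supports_tensor_slicing(version: str) -> bool:
    """Gate the compiled path on a minimum torch version.

    The (2, 10) cutoff below is a placeholder assumption; the real guard
    would use whichever release first includes pytorch/pytorch#165074.
    """
    # Strip a local build suffix like "+cu121", then compare major.minor.
    major, minor = (int(p) for p in version.split("+")[0].split(".")[:2])
    return (major, minor) >= (2, 10)
```

A guard like this would let the compiled VisionBlock path turn itself off automatically on older torch builds instead of failing at runtime.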
Purpose
After #23207, Qwen2.5 VL's vision model has a dynamic slicing issue on CUDA with torch.compile. Temporarily disabling it for now.
cc. @Lucaskabela
Repro:
Command:
with forced SDPA backend in layer.py:
vllm & torch versions:
Test Plan
local vllm
Test Result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.