[Bugfix][Qwen][Multimodal] Move Qwen2_5_vl sdpa to custom op and reenable compile #27764
Conversation
This pull request has merge conflicts that must be resolved before it can be merged.
Can you update the PR description? I think if you put "```" and text on the same line it won't show any more.
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
Force-pushed 7c50ed1 to 50ffd98
```python
for i in range(1, len(cu_seqlens)):
    start_idx = cu_seqlens[i - 1]
    end_idx = cu_seqlens[i]
    q_i = q[:, start_idx:end_idx]
```
A dumb question: How does this fix the tensor slicing issue? It seems that you simply put the whole SDPA into a separate function but the code remains identical?
The problem occurs when torch.compile tries to trace this code. By wrapping it in a custom_op and calling that, the internals of the function are not traced, so the tracing bug does not trigger.
So it is not just that we move it into a function; we also leverage the custom_op mechanism here to make it opaque to the compiler.
@ywang96 @ProExpertProg @zou3519 for review
Could we add this fix to the 2.9.1 list?
LGTM
The particular fix is from @laithsakka; see pytorch/pytorch#165074. Let me check on the ability to cherry-pick this. It falls in a gray area, since it improves traceability but does not necessarily fix a critical bug introduced by 2.9, AFAIK, so it may be out of scope.
…able compile (vllm-project#27764) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
Purpose
See title: this PR fixes an error caused by #23207 where the SDPA backend could not be compiled.
This is because a fix for tracing SliceVariables containing single-element tensors has not yet landed (it should be in PyTorch 2.10).
Until then, we use a custom op to prevent the graph break.
Test Plan
With an edit to the vision_language.py file to use TORCH_SDPA (as this is not supported on the command line), E2E:
Test Results
E2E
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.