[Bugfix][ROCm] Fix ViT rotary embeddings for torch.compile compatibility on ROCm #27748
Merged (+8 −5)

6 commits, all by vllmellm:
- 002d181 fix vit rotary embedding on ROCm platform to be torch compile compatible
- e723dc2 pre-commit fixes
- bc6e808 remove rotary embedding func from flash_attn lib and retrieve the ori…
- 725f07d keep triton rotary embedding for models that are not torch compile co…
- d4d130e fix dropout_p dtype for glm vision models when aiter package enabled
- f0efc00 Merge remote-tracking branch 'origin/main' into vit-rotary-embed-fix
@vllmellm @DarkLight1337 What do you think? Should we keep the Triton rotary embedding for backwards compatibility at this moment, by instead adding a `torch.compiler.is_compiling()` condition, since there are other model definition files that are not using `torch.compile` yet? E.g. model definition files that are not using `torch.compile`:
- vllm/model_executor/models/dots_ocr.py
- vllm/model_executor/models/ernie45_vl.py
- vllm/model_executor/models/glm4_1v.py
- vllm/model_executor/models/qwen2_vl.py
- vllm/model_executor/models/siglip2navit.py
Another thing to note at this point is that `torch.compile` has been disabled, as there are issues with dynamic slicing. So if we add this `torch.compiler.is_compiling()` condition, then we don't need to postpone this PR, while ensuring no performance regression whether `torch.compile` is enabled or disabled. Right now, they are exploring new fixes, e.g. #27764.
My two cents: I see the benefit of gating with `torch.compiler.is_compiling()`, especially for models we don't have covered with `torch.compile` yet, so I am in favor of that approach as opposed to completely removing it.
+1 on adding a condition checking whether it's torch compiled.
On a side note: is this PR really considered a bugfix? I thought that after #27760 things should already work on AMD. Did I miss something here?
@ywang96 PR #27760 is a bugfix that reverts the `torch.compile` PR. If `torch.compile` is re-enabled, AMD needs this bugfix PR to address the incompatibility of the `flash_attn.rotary_embedding` Triton kernel with `torch.compile`. The discussion in this comment thread is about the case where other models still don't have `torch.compile` support.
And in this PR we found that the `torch.compile`-ed rotary embedding is faster than the Triton implementation on ROCm. I think it will be the same case on CUDA.
Yeah, I was just confirming whether the current main branch is broken on AMD (because it'll affect our release), and it seems that it's not, because we currently disabled it, correct?
@ywang96 On the current main branch, the unit tests are healthy and Qwen3 VL accuracy is normal. 👍
@tjtanaa @Lucaskabela those model paths are not using this dispatching function; it is only used in `qwen2_vl.py`. Meanwhile, `qwen2.5_vl.py` and `glm4_1v.py` import the `apply_rotary_pos_emb_vision` function from `qwen2_vl.py`, and inside that function the dispatch is used.
vllm/model_executor/models/qwen2_vl.py, lines 314 to 320 in 3696050
@DarkLight1337 @Lucaskabela @tjtanaa shall we keep the Triton embedding function here only for the one remaining case, `glm4_1v.py`, or not?