[Multi-modality][performance] enable DP for ViT in Qwen-2.5-VL #2709
Conversation
Signed-off-by: Junhong <liujunhong11@huawei.com>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request enables Data Parallelism (DP) for the Vision Transformer (ViT) in Qwen-2.5-VL, which can improve performance for smaller models by avoiding tensor-parallelism overhead. The changes introduce a use_data_parallel flag and a new execution path for DP. My review found a critical issue in the implementation where an incorrect attribute access would lead to a runtime error. I've provided a code suggestion to fix this.
vllm_ascend/models/qwen2_5_vl.py
Outdated
def _normalize_grid_thw(self, grid_thw: Union[torch.Tensor, list[list[int]]]) -> torch.Tensor:
    if isinstance(grid_thw, list):
        grid_thw = torch.tensor(grid_thw, device=self.device)
    elif not isinstance(grid_thw, torch.Tensor):
        raise TypeError(f"Expected input type is torch.Tensor or list of lists, got {type(grid_thw)}")
    return grid_thw
torch.nn.Module does not have a .device attribute, so calling self.device will raise an AttributeError at runtime. A more robust approach to get the module's device is to inspect one of its parameters, for example, by using next(self.parameters()).device.
Suggested change:
 def _normalize_grid_thw(self, grid_thw: Union[torch.Tensor, list[list[int]]]) -> torch.Tensor:
     if isinstance(grid_thw, list):
-        grid_thw = torch.tensor(grid_thw, device=self.device)
+        device = next(self.parameters()).device
+        grid_thw = torch.tensor(grid_thw, device=device)
     elif not isinstance(grid_thw, torch.Tensor):
         raise TypeError(f"Expected input type is torch.Tensor or list of lists, got {type(grid_thw)}")
     return grid_thw
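To make the failure mode concrete, here is a minimal, self-contained sketch (illustration only, not part of the PR) showing that a plain nn.Module has no .device attribute and that reading the device off a parameter works:

import torch.nn as nn

class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

demo = Demo()
# demo.device would raise AttributeError: 'Demo' object has no attribute 'device'
device = next(demo.parameters()).device  # device of the first registered parameter
print(device)  # cpu, or whichever device the module was moved to

Note that next(self.parameters()) raises StopIteration on a module with no parameters, so this pattern assumes the ViT module always owns at least one parameter.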
Signed-off-by: Junhong <liujunhong11@huawei.com>
Issue 2607 fix bug in test
This pull request has conflicts, please resolve those before we can evaluate the pull request.
What this PR does / why we need it?
This PR is associated with #2607, which enables DP for the ViT in Qwen-2.5-VL.
There are multiple reasons to run the ViT under DP (a conceptual sketch follows this list):
The ViT is a small model; the all-reduce required by TP costs more than the speedup TP provides.
The ViT is not captured in CUDA graphs or the torch.compile graph, so its kernel-launch and all-reduce overheads are higher.
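As a conceptual, hypothetical sketch (not the PR's actual code): under encoder TP every rank holds a shard of the ViT weights and must all-reduce activations at each layer, whereas under encoder DP every rank holds the full ViT weights and simply runs its own slice of the image batch, with no communication inside the encoder:

def split_images_for_dp(images: list, dp_size: int, dp_rank: int) -> list:
    """Hypothetical helper: assign each DP rank a contiguous slice of
    the image batch; each rank runs the full ViT on its slice, so the
    encoder needs no cross-rank communication."""
    per_rank = (len(images) + dp_size - 1) // dp_size  # ceiling division
    return images[dp_rank * per_rank:(dp_rank + 1) * per_rank]

# 10 images across 4 DP ranks -> slices of size 3, 3, 3, 1
batch = list(range(10))
print([split_images_for_dp(batch, 4, r) for r in range(4)])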
Does this PR introduce any user-facing change?
Adds the --mm-encoder-tp-mode argument for selecting data parallelism for the multimodal encoder. Below is an example running DP for the ViT and TP4 for the LLM backbone:
vllm serve \
    /workspace/models/Qwen2.5-VL-3B-Instruct \
    --port 5580 --host 0.0.0.0 \
    --max-num-seqs 128 --dtype bfloat16 --max-model-len=8192 \
    --no-enable-prefix-caching --trust-remote-code -tp 4 \
    --allowed-local-media-path /workspace/l00807937/ \
    --gpu-memory-utilization=0.93 \
    --enforce-eager \
    --mm-encoder-tp-mode data
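Once the server is up, one way to exercise the encoder path is vLLM's OpenAI-compatible chat endpoint. The sketch below is a hedged example: the image URL is a placeholder, and the model name mirrors the path passed to vllm serve:

import requests

resp = requests.post(
    "http://localhost:5580/v1/chat/completions",
    json={
        "model": "/workspace/models/Qwen2.5-VL-3B-Instruct",
        "messages": [{
            "role": "user",
            "content": [
                # Placeholder URL; local files additionally require --allowed-local-media-path.
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }],
        "max_tokens": 64,
    },
)
print(resp.json()["choices"][0]["message"]["content"])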
How was this patch tested?
vllm: 0.10.0RC1
vllm-ascend: 0.10.0RC1
Benchmark test