[v0.9.1] add data preprocess functions to qwen2.5_vl_without_padding #1705
Conversation
Force-pushed from 31d0981 to 812ec8c (compare)
Ready for review. @wangxiyuan @leo-pony @as12138
```diff
- q = torch_npu.npu_rotary_mul(q, cos, sin)
- k = torch_npu.npu_rotary_mul(k, cos, sin)
+ q = torch_npu.npu_rotary_mul(q, cos.to(q.device), sin.to(q.device))
```
Why not move this `Tensor.to(device)` out of the model run?
To fix:

```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, npu:0 and cpu! (when checking argument for argument r1 in method wrapper__npu_rotary_mul)
```
OK, but since you already customize the modeling, can you make the cos/sin cache an NPU tensor, to avoid performing the H2D copy layer by layer?
> OK, but since you already customize the modeling, can you make the cos/sin cache an NPU tensor, to avoid performing the H2D copy layer by layer?
Thanks for your advice. To keep this PR concise, it only supplements the functions missing from vl_without_padding.py. The `Tensor.to(device)` change will be submitted in a later PR, after the experimental results are reproduced.
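The reviewer's suggestion amounts to building the rotary cos/sin tables once, on the device where they will be consumed, instead of calling `.to(device)` inside every layer's forward pass. Below is a minimal plain-Python sketch of that pattern; the class name `RotaryCache` and its methods are hypothetical, and the real code would use torch tensors on NPU with `torch_npu.npu_rotary_mul` rather than Python lists.

```python
import math


class RotaryCache:
    """Sketch: precompute rotary cos/sin tables a single time at init
    (analogous to building them directly as NPU tensors), so that no
    per-layer host-to-device copy is needed at apply time."""

    def __init__(self, dim: int, max_pos: int, base: float = 10000.0):
        half = dim // 2
        # Standard rotary inverse frequencies: base^(-2i/dim).
        inv_freq = [base ** (-2 * i / dim) for i in range(half)]
        # cos/sin tables indexed by position; built once, reused by all layers.
        self.cos = [[math.cos(p * f) for f in inv_freq] for p in range(max_pos)]
        self.sin = [[math.sin(p * f) for f in inv_freq] for p in range(max_pos)]

    def apply(self, q, pos: int):
        """Rotate consecutive pairs (q[2i], q[2i+1]) by the cached angle
        for position `pos` (the rotate-half formulation)."""
        cos, sin = self.cos[pos], self.sin[pos]
        out = list(q)
        for i in range(len(q) // 2):
            x, y = q[2 * i], q[2 * i + 1]
            out[2 * i] = x * cos[i] - y * sin[i]
            out[2 * i + 1] = x * sin[i] + y * cos[i]
        return out
```

With the tables held on the target device from the start, each layer's forward only indexes into the cache, which is the effect the review comment asks for.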
I'm wondering how this PR has been tested. Just reuse the UTs of qwenvl?
Yes, we reuse the UTs of qwenvl.
Signed-off-by: zheliuyu <15750543867@163.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
…2148)

### What this PR does / why we need it?

Cherry-pick #1705 from v0.9.1-dev. Compared with qwen2_5_vl.py, qwen2_5_vl_without_padding.py is missing some functions. The purpose of this PR is to supplement these.

add:
- rot_pos_emb(self, grid_thw: torch.Tensor)
- get_window_index(self, grid_thw)
- _process_image_input(self, image_input)
- _process_video_input(self, video_input)

Co-authored-by: zheliuyu <15750543867@163.com>
Co-authored-by: wangli <wangli858794774@gmail.com>

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@207b750

Signed-off-by: wangli <wangli858794774@gmail.com>
Bug report @as12138
### What this PR does / why we need it?

Compared with qwen2_5_vl.py, qwen2_5_vl_without_padding.py is missing some functions. The purpose of this PR is to supplement these.

add:
- rot_pos_emb(self, grid_thw: torch.Tensor)
- get_window_index(self, grid_thw)
- _process_image_input(self, image_input)
- _process_video_input(self, video_input)
### Does this PR introduce any user-facing change?

N/A. Same as qwen2_5_vl.py.

### How was this patch tested?

N/A. Same as qwen2_5_vl.py.