chris668899
Contributor

What this PR does / why we need it?

This PR adds a new feature: npugraph_batch_size is adjusted dynamically to suit the model. Before this PR, the npugraph_batch_sizes passed from vllm to vllm-ascend were always too large, which could cause an error when running different models, with the message: "The resources are insufficient".
With this PR, the code adjusts npugraph_batch_sizes dynamically based on the model's hidden_layer_nums and the parallel config. For example:
a. for Qwen2.5-7B, the npugraph_batch_size list has 33 entries in total;
b. for Qwen2.5-72B, the npugraph_batch_size list has 11 entries in total.
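
To illustrate the idea, here is a minimal sketch of how a list of candidate batch sizes could be trimmed according to layer count and parallelism. It is not the code in this PR: the function name `limit_capture_sizes`, the `parallel_factor` and `resource_budget` parameters, and the cost model (each captured size costs `(num_hidden_layers + 1) * parallel_factor` units) are all assumptions made for the example.

```python
from typing import List


def limit_capture_sizes(
    candidate_sizes: List[int],
    num_hidden_layers: int,
    parallel_factor: int,
    resource_budget: int,
) -> List[int]:
    """Downsample candidate_sizes so the number of graphs fits a budget.

    Hypothetical helper: the cost model and the budget are illustrative
    assumptions, not values taken from vllm or vllm-ascend.
    """
    if not candidate_sizes:
        return []

    # Assumed cost model: deeper models and more parallel groups make
    # each captured batch size more expensive.
    cost_per_size = (num_hidden_layers + 1) * parallel_factor
    max_sizes = max(1, resource_budget // cost_per_size)

    sizes = sorted(candidate_sizes)
    if len(sizes) <= max_sizes:
        return sizes  # everything already fits
    if max_sizes == 1:
        return [sizes[-1]]  # keep only the largest size

    # Pick evenly spaced indices, always keeping the smallest and largest.
    last = len(sizes) - 1
    indices = [i * last // (max_sizes - 1) for i in range(max_sizes)]
    return [sizes[i] for i in indices]


# Example call with made-up inputs: a shallow and a deep model share the
# same candidate list, and the deeper one ends up with fewer sizes.
candidates = [1, 2, 4] + [8 * i for i in range(1, 65)]
shallow = limit_capture_sizes(candidates, num_hidden_layers=28,
                              parallel_factor=2, resource_budget=1000)
deep = limit_capture_sizes(candidates, num_hidden_layers=80,
                           parallel_factor=2, resource_budget=1000)
```

Sampling evenly across the sorted list keeps both the smallest and the largest batch size, so the trimmed list still spans the original range; only the granularity in between is reduced.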

@chris668899 chris668899 changed the title Add Func: npugraph_batch_size auto-adjust to different model [WIP]Add Func: npugraph_batch_size auto-adjust to different model Apr 29, 2025
@chris668899 chris668899 force-pushed the main branch 4 times, most recently from 89b6a0f to d800e95 Compare April 30, 2025 06:49