1 parent f1fe025 commit 0b80c6a
examples/run_dp_attention_etp16.sh
@@ -18,6 +18,6 @@ nohup python -m vllm.entrypoints.openai.api_server --model=/mnt/deepseek/DeepSee
     --max-num-batched-tokens 32768 \
     --block-size 128 \
     --no-enable-prefix-caching \
-    --additional-config '{"torchair_graph_batch_sizes":[24],"expert_tensor_parallel_size":16,"use_cached_npu_graph":true,"ascend_scheduler_config":{},"enable_graph_mode":true}' \
+    --additional-config '{"torchair_graph_config":{"enabled":true,"use_cached_graph":true,"graph_batch_sizes":[24]},"ascend_scheduler_config":{"enabled":true},"expert_tensor_parallel_size":16}' \
     --gpu-memory-utilization 0.96 &> run.log &
 disown
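For reference, the updated --additional-config value from the added line above, pretty-printed so the nested torchair_graph_config and ascend_scheduler_config sections are easier to read (same keys and values as the one-line JSON in the diff):

{
  "torchair_graph_config": {
    "enabled": true,
    "use_cached_graph": true,
    "graph_batch_sizes": [24]
  },
  "ascend_scheduler_config": {
    "enabled": true
  },
  "expert_tensor_parallel_size": 16
}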