
Commit 090c4b0

add back chunked_prefill_for_mla param
Signed-off-by: whx-sjtu <2952154980@qq.com>
1 parent 75060b6 commit 090c4b0


3 files changed: +6 -0 lines changed


docs/source/locale/zh_CN/LC_MESSAGES/user_guide/configuration/additional_config.po

Lines changed: 3 additions & 0 deletions
```diff
@@ -148,6 +148,9 @@ msgid ""
 " to be passed in."
 msgstr "在为MOE模型使用专家负载均衡时,需要传入专家映射路径。"
 
+#: ../../user_guide/configuration/additional_config.md
+msgid "`chunked_prefill_for_mla`"
+msgstr "`chunked_prefill_for_mla`"
 
 #: ../../user_guide/configuration/additional_config.md
 msgid "`False`"
```

docs/source/user_guide/configuration/additional_config.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -30,6 +30,7 @@ The following table lists the additional configuration options available in vLLM
 | `ascend_scheduler_config` | dict | `{}` | The config options for ascend scheduler |
 | `refresh` | bool | `false` | Whether to refresh global ascend config content. This value is usually used by rlhf or ut/e2e test case. |
 | `expert_map_path` | str | `None` | When using expert load balancing for the MOE model, an expert map path needs to be passed in. |
+| `chunked_prefill_for_mla` | bool | `False` | Whether to enable the fused operator-like chunked_prefill. |
 | `kv_cache_dtype` | str | `None` | When using the kv cache quantization method, kv cache dtype needs to be set, currently only int8 is supported. |
 | `enable_shared_expert_dp` | bool | `False` | When the shared expert in DP, it has better performance but consumes more memory. Currently only DeepSeek series models are supported to use. |
 
```
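
The new table row declares `chunked_prefill_for_mla` as a bool with a default of `False`. The following is a minimal sketch of what that default means for a consumer of `additional_config`. It assumes the option is read from the `additional_config` dict with a `.get(..., False)`-style lookup. The `AscendOptions` class and its `from_additional_config` method are hypothetical and are not vllm-ascend's actual config code. Rejecting non-bool values is a design choice made only for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class AscendOptions:
    """Hypothetical subset of the options listed in additional_config.md."""

    # Defaults mirror the documentation table.
    expert_map_path: Optional[str] = None
    chunked_prefill_for_mla: bool = False
    kv_cache_dtype: Optional[str] = None

    @classmethod
    def from_additional_config(
        cls, cfg: Optional[Mapping[str, Any]]
    ) -> "AscendOptions":
        # A missing or empty additional_config means "all defaults".
        cfg = cfg or {}
        flag = cfg.get("chunked_prefill_for_mla", False)
        if not isinstance(flag, bool):
            # Assumption of this sketch: reject strings like "true" or ints
            # like 1 instead of silently coercing them.
            raise TypeError(
                f"chunked_prefill_for_mla must be a bool, got {type(flag).__name__}"
            )
        return cls(
            expert_map_path=cfg.get("expert_map_path"),
            chunked_prefill_for_mla=flag,
            kv_cache_dtype=cfg.get("kv_cache_dtype"),
        )


# The option is off unless explicitly enabled.
print(AscendOptions.from_additional_config(None).chunked_prefill_for_mla)  # False
print(
    AscendOptions.from_additional_config(
        {"chunked_prefill_for_mla": True}
    ).chunked_prefill_for_mla
)  # True
```

A frozen dataclass keeps the parsed options immutable after startup. The JSON value `true` passed on the command line arrives as a Python `True` once it has been decoded.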

examples/disaggregated_prefill_v1/README.md

Lines changed: 2 additions & 0 deletions
````diff
@@ -71,6 +71,8 @@ vllm serve /models/deepseek_r1_w8a8 \
 "engine_id": "0",
 "kv_connector_module_path": "vllm_ascend.distributed.llmdatadist_c_mgr_connector"
 }' \
+--additional-config \
+'{"chunked_prefill_for_mla":true}'
 ```
 
 Run prefill server P2 on second node:
````
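
The README change appends `--additional-config '{"chunked_prefill_for_mla":true}'` to the prefill server command. The value is JSON passed through a shell, so it has to remain a single quoted argument. The sketch below builds the same argument using only the Python standard library (`json.dumps` and `shlex.join`, Python 3.8+). The `build_serve_argv` helper is hypothetical. The model path comes from the README hunk header, and the other options in the README command are omitted.

```python
import json
import shlex
from typing import Any, Dict, List, Optional, Sequence


def build_serve_argv(
    model: str,
    additional_config: Optional[Dict[str, Any]] = None,
    extra_args: Sequence[str] = (),
) -> List[str]:
    """Hypothetical helper: assemble argv for `vllm serve` (nothing is executed)."""
    argv = ["vllm", "serve", model, *extra_args]
    if additional_config:
        # Compact separators reproduce the README's {"chunked_prefill_for_mla":true};
        # json.dumps maps Python True to the JSON literal true.
        argv += [
            "--additional-config",
            json.dumps(additional_config, separators=(",", ":")),
        ]
    return argv


argv = build_serve_argv("/models/deepseek_r1_w8a8", {"chunked_prefill_for_mla": True})
# shlex.join single-quotes the JSON so a POSIX shell sees it as one argument.
print(shlex.join(argv))
# vllm serve /models/deepseek_r1_w8a8 --additional-config '{"chunked_prefill_for_mla":true}'
```

If the argv list is passed directly to `subprocess.run`, no shell quoting is needed. `shlex.join` only matters when the command is pasted into a shell script like the one in the README.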
