add back chunked_prefill_for_mla param

whx-sjtu · whx-sjtu · commit 090c4b0a68f1 · 2025-08-27T16:28:09.000+08:00
Signed-off-by: whx-sjtu <2952154980@qq.com>
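
Note for reviewers: the README change in this commit passes the option as a JSON string after `--additional-config`. The sketch below shows one way such a string could be generated from Python rather than typed by hand. It is illustrative only: `additional_config_args` is a hypothetical helper that is not part of vllm-ascend, and the snippet only formats the arguments; it does not start a server.

```python
import json
import shlex

# Option name taken from the documentation table changed in this commit.
additional_config = {"chunked_prefill_for_mla": True}


def additional_config_args(config: dict) -> list:
    """Hypothetical helper: format a dict as `--additional-config` arguments."""
    # Compact separators mirror the whitespace-free JSON used in the README.
    return ["--additional-config", json.dumps(config, separators=(",", ":"))]


args = additional_config_args(additional_config)

# shlex.join quotes the JSON so it can be pasted into a shell command line.
print(shlex.join(args))
```

Serializing with `json.dumps` means the Python `True` is emitted as JSON `true`, which avoids the common mistake of writing `True` in the hand-typed string.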
diff --git a/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/configuration/additional_config.po b/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/configuration/additional_config.po
@@ -148,6 +148,9 @@ msgid ""
 " to be passed in."
 msgstr "在为MOE模型使用专家负载均衡时，需要传入专家映射路径。"
 
+#: ../../user_guide/configuration/additional_config.md
+msgid "`chunked_prefill_for_mla`"
+msgstr "`chunked_prefill_for_mla`"
 
 #: ../../user_guide/configuration/additional_config.md
 msgid "`False`"
diff --git a/docs/source/user_guide/configuration/additional_config.md b/docs/source/user_guide/configuration/additional_config.md
@@ -30,6 +30,7 @@ The following table lists the additional configuration options available in vLLM
 | `ascend_scheduler_config`     | dict | `{}` | The config options for ascend scheduler                                                       |
 | `refresh`                     | bool | `false` | Whether to refresh global ascend config content. This value is usually used by rlhf or ut/e2e test case.     |
 | `expert_map_path`             | str  | `None` | When using expert load balancing for the MOE model, an expert map path needs to be passed in. |
+| `chunked_prefill_for_mla`     | bool | `False` | Whether to enable the fused operator-like chunked_prefill. |
 | `kv_cache_dtype`     | str | `None` | When using the kv cache quantization method, kv cache dtype needs to be set, currently only int8 is supported. |
 | `enable_shared_expert_dp`     | bool | `False` | When the shared expert in DP, it has better performance but consumes more memory. Currently only DeepSeek series models are supported to use. |
 
diff --git a/examples/disaggregated_prefill_v1/README.md b/examples/disaggregated_prefill_v1/README.md
@@ -71,6 +71,8 @@ vllm serve /models/deepseek_r1_w8a8 \
   "engine_id": "0",
   "kv_connector_module_path": "vllm_ascend.distributed.llmdatadist_c_mgr_connector"
   }'  \
+  --additional-config \
+  '{"chunked_prefill_for_mla":true}'
 ```
 
 Run prefill server P2 on second node: