
Conversation

whx-sjtu
Collaborator

@whx-sjtu whx-sjtu commented Sep 12, 2025

Fix world size bug in model_runner.

Signed-off-by: whx-sjtu <2952154980@qq.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly fixes a bug in the MoE communication method selection by using world_size_across_dp instead of the total world_size, which is appropriate for data-parallel scenarios. My review identifies a critical oversight: the corresponding unit test has not been updated to reflect this logic change, leaving the fix unverified. An update to the test is required to ensure the correctness of this change.
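For context, here is a rough, self-contained sketch of the kind of selection logic being discussed, reconstructed from the diff in the inline comment below. The function name, the enum definition, and any return value other than "allgather" are illustrative assumptions, not the actual vllm-ascend source.

```python
from enum import Enum


class AscendSocVersion(Enum):
    # Stand-in for the real enum in vllm-ascend; only A2 appears in the diff.
    A2 = "a2"
    OTHER = "other"


def select_moe_comm_method(num_tokens: int,
                           soc_version: AscendSocVersion,
                           mc2_tokens_capacity: int,
                           world_size_across_dp: int) -> str:
    """Pick a MoE communication method (sketch, not the real implementation)."""
    if soc_version is AscendSocVersion.A2:
        # MC2 only pays off when the batch fits its capacity and there are
        # enough ranks across all data-parallel groups; the fix reads
        # world_size_across_dp here instead of the total world_size.
        if num_tokens <= mc2_tokens_capacity and world_size_across_dp >= 16:
            return "mc2"
        return "allgather"
    # All other SoC versions fall back to allgather in this sketch.
    return "allgather"
```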

        moe_comm_method = "allgather"
    elif soc_version in {AscendSocVersion.A2}:
-       if num_tokens <= self.mc2_tokens_capacity and self.parallel_config.world_size >= 16:
+       if num_tokens <= self.mc2_tokens_capacity and self.parallel_config.world_size_across_dp >= 16:
Contributor

high

While the change to use world_size_across_dp is correct for selecting the MoE communication method in a data-parallel setup, the corresponding unit test needs to be updated.

The test test_select_moe_comm_method in tests/ut/worker/test_model_runner_v1.py still mocks parallel_config.world_size and will likely fail or pass incorrectly after this change.

To ensure this bug fix is properly tested, please update the unit test to mock parallel_config.world_size_across_dp instead.
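As a hypothetical illustration of the suggested test change, a minimal pytest-style check against the sketch above could mock only the attribute the fixed code now reads. The real test_select_moe_comm_method in tests/ut/worker/test_model_runner_v1.py will differ in its fixtures and patching.

```python
from unittest.mock import MagicMock


def test_select_moe_comm_method_uses_world_size_across_dp():
    # Fake parallel config exposing only the attribute the fixed code reads.
    parallel_config = MagicMock()
    parallel_config.world_size_across_dp = 16

    # Small batch on an A2 SoC with >= 16 ranks across DP groups: the sketch
    # above should pick MC2 rather than allgather.
    assert select_moe_comm_method(
        num_tokens=128,
        soc_version=AscendSocVersion.A2,
        mc2_tokens_capacity=512,
        world_size_across_dp=parallel_config.world_size_across_dp,
    ) == "mc2"

    # Below the rank threshold the method should fall back to allgather.
    parallel_config.world_size_across_dp = 8
    assert select_moe_comm_method(
        num_tokens=128,
        soc_version=AscendSocVersion.A2,
        mc2_tokens_capacity=512,
        world_size_across_dp=parallel_config.world_size_across_dp,
    ) == "allgather"
```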

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write a clear commit message and fill out the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

Signed-off-by: whx-sjtu <2952154980@qq.com>
@Yikun
Collaborator

Yikun commented Sep 13, 2025

(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720] EngineCore encountered a fatal error.
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720] Traceback (most recent call last):
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]   File "/__w/vllm-ascend/vllm-ascend/vllm-empty/vllm/v1/executor/multiproc_executor.py", line 259, in collective_rpc
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]     result = get_response(w, dequeue_timeout,
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]   File "/__w/vllm-ascend/vllm-ascend/vllm-empty/vllm/v1/executor/multiproc_executor.py", line 239, in get_response
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]     status, result = w.worker_response_mq.dequeue(
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]   File "/__w/vllm-ascend/vllm-ascend/vllm-empty/vllm/distributed/device_communicators/shm_broadcast.py", line 507, in dequeue
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]     with self.acquire_read(timeout, cancel) as buf:
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]   File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 137, in __enter__
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]     return next(self.gen)
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]            ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]   File "/__w/vllm-ascend/vllm-ascend/vllm-empty/vllm/distributed/device_communicators/shm_broadcast.py", line 469, in acquire_read
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]     raise TimeoutError
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720] TimeoutError
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720] 
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720] The above exception was the direct cause of the following exception:
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720] 
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720] Traceback (most recent call last):
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]   File "/__w/vllm-ascend/vllm-ascend/vllm-empty/vllm/v1/engine/core.py", line 711, in run_engine_core
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]     engine_core.run_busy_loop()
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]   File "/__w/vllm-ascend/vllm-ascend/vllm-empty/vllm/v1/engine/core.py", line 738, in run_busy_loop
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]     self._process_engine_step()
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]   File "/__w/vllm-ascend/vllm-ascend/vllm-empty/vllm/v1/engine/core.py", line 764, in _process_engine_step
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]     outputs, model_executed = self.step_fn()
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]                               ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]   File "/__w/vllm-ascend/vllm-ascend/vllm-empty/vllm/v1/engine/core.py", line 292, in step
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]     model_output = self.execute_model_with_error_logging(
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]   File "/__w/vllm-ascend/vllm-ascend/vllm-empty/vllm/v1/engine/core.py", line 278, in execute_model_with_error_logging
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]     raise err
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]   File "/__w/vllm-ascend/vllm-ascend/vllm-empty/vllm/v1/engine/core.py", line 269, in execute_model_with_error_logging
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]     return model_fn(scheduler_output)
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]            ^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]   File "/__w/vllm-ascend/vllm-ascend/vllm-empty/vllm/v1/executor/multiproc_executor.py", line 176, in execute_model
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]     (output, ) = self.collective_rpc(
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]                  ^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]   File "/__w/vllm-ascend/vllm-ascend/vllm-empty/vllm/v1/executor/multiproc_executor.py", line 268, in collective_rpc
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720]     raise TimeoutError(f"RPC call to {method} timed out.") from e
(EngineCore_DP0 pid=3023) ERROR 09-13 04:19:06 [core.py:720] TimeoutError: RPC call to execute_model timed out.
Error: ed prompts:   0%|          | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
Error:  TBE Subprocess[task_distribute] raise error[], main process disappeared!
Error:  TBE Subprocess[task_distribute] raise error[], main process disappeared!
Error:  TBE Subprocess[task_distribute] raise error[], main process disappeared!
Error:  TBE Subprocess[task_distribute] raise error[], main process disappeared!
Error:  TBE Subprocess[task_distribute] raise error[], main process disappeared!
Error:  TBE Subprocess[task_distribute] raise error[], main process disappeared!
Error:  TBE Subprocess[task_distribute] raise error[], main process disappeared!
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
FAILED

=================================== FAILURES ===================================
________________ test_models_distributed_Qwen3_MOE_TP2_WITH_EP _________________

    def test_models_distributed_Qwen3_MOE_TP2_WITH_EP():
        example_prompts = [
            "Hello, my name is",
        ]
        max_tokens = 5
        with VllmRunner(
                "Qwen/Qwen3-30B-A3B",
                tensor_parallel_size=2,
                enable_expert_parallel=True,
                distributed_executor_backend="mp",
                enforce_eager=False,
        ) as vllm_model:
>           vllm_model.generate_greedy(example_prompts, max_tokens)

tests/e2e/multicard/test_qwen3_moe.py:56: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/e2e/conftest.py:230: in generate_greedy
    outputs = self.generate(prompts,
tests/e2e/conftest.py:165: in generate
    req_outputs = self.model.generate(inputs,
vllm-empty/vllm/entrypoints/llm.py:396: in generate
    outputs = self._run_engine(use_tqdm=use_tqdm)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-empty/vllm/entrypoints/llm.py:1512: in _run_engine
    step_outputs = self.llm_engine.step()
                   ^^^^^^^^^^^^^^^^^^^^^^
vllm-empty/vllm/v1/engine/llm_engine.py:248: in step
    outputs = self.engine_core.get_output()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <vllm.v1.engine.core_client.SyncMPClient object at 0xfffd3474a890>

    def get_output(self) -> EngineCoreOutputs:
        # If an exception arises in process_outputs_socket task,
        # it is forwarded to the outputs_queue so we can raise it
        # from this (run_output_handler) task to shut down the server.
        outputs = self.outputs_queue.get()
        if isinstance(outputs, Exception):
>           raise self._format_exception(outputs) from None
E           vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.

vllm-empty/vllm/v1/engine/core_client.py:670: EngineDeadError
------------------------------ Captured log call -------------------------------
WARNING  transformers.configuration_utils:configuration_utils.py:697 The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
WARNING  transformers.configuration_utils:logging.py:328 `torch_dtype` is deprecated! Use `dtype` instead!
=============================== warnings summary ===============================
<frozen importlib._bootstrap>:241
  <frozen importlib._bootstrap>:241: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:241
  <frozen importlib._bootstrap>:241: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

tests/e2e/multicard/test_qwen3_moe.py::test_models_distributed_Qwen3_MOE_TP2_WITH_EP
  /usr/local/python3.11.13/lib/python3.11/site-packages/pydantic/_internal/_dataclasses.py:123: DeprecationWarning: The 'task' option has been deprecated and will be removed in v0.13.0 or v1.0, whichever comes first. Please remove this option.
    s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/e2e/multicard/test_qwen3_moe.py::test_models_distributed_Qwen3_MOE_TP2_WITH_EP - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
================== 1 failed, 3 warnings in 467.98s (0:07:47) ===================
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute

Processed prompts:   0%|          | 0/1 [05:14<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Error: Error: failed to run script step: command terminated with non-zero exit code: Error executing in Docker Container: 1
Error: Process completed with exit code 1.
Error: Executing the custom container implementation failed. Please contact your self hosted runner administrator.

@Yikun
Collaborator

Yikun commented Sep 13, 2025

Not sure whether this is a real bug or not; just retriggering the CI.

@Yikun Yikun added the ready (read for review) and ready-for-test (start test by label for PR) labels Sep 13, 2025
@Yikun
Collaborator

Yikun commented Sep 13, 2025

Note that the current CI is running on CANN 8.2.RC1, but main is already on 8.3.RC1.alpha001.

@wangxiyuan
Collaborator

merged by #2915

@wangxiyuan wangxiyuan closed this Sep 14, 2025
Labels: module:tests, ready (read for review), ready-for-test (start test by label for PR)