Add super kernel in moe #1877

NNUCJ · 2025-07-18T07:44:52Z

What this PR does / why we need it?

Reduce the scheduling overhead of the MoE operator on devices by using a super kernel (super_kernel).

Does this PR introduce any user-facing change?

How was this patch tested?

The superkernel feature is also controlled by the "enable_multistream_moe" option: setting it to True enables the superkernel as well.
For example:

from vllm import LLM

# `model` (path to the MoE checkpoint) and `GPUs_per_dp_rank` are assumed to be
# defined earlier in the test script.
llm = LLM(
    model=model,
    tensor_parallel_size=GPUs_per_dp_rank,
    # enforce_eager=True,  # keep commented out so torchair graph mode stays enabled
    max_num_seqs=12,
    max_model_len=4600,
    max_num_batched_tokens=4600,
    gpu_memory_utilization=0.85,
    enable_expert_parallel=True,
    trust_remote_code=True,
    enable_prefix_caching=False,
    additional_config={
        'ascend_scheduler_config': {'enabled': True},
        'torchair_graph_config': {
            'enabled': True,
            'enable_multistream_moe': True,  # also enables the MoE superkernel
            'enable_multistream_mla': True,
            'graph_batch_sizes': [12],
            'enable_kv_nz': True,
        },
    },
)
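To compare a run with the superkernel against a baseline, the only option that needs to change is "enable_multistream_moe". The sketch below builds both variants of the additional_config dict from the example above. The helper name build_additional_config is hypothetical (it is not part of vLLM or vllm-ascend); all option values other than "enable_multistream_moe" are copied unchanged from the example.

```python
from typing import Any, Dict, List


def build_additional_config(enable_superkernel: bool,
                            graph_batch_sizes: List[int]) -> Dict[str, Any]:
    """Hypothetical helper returning the additional_config used in the example.

    Only 'enable_multistream_moe' is toggled; every other option is copied
    from the example so the two runs differ in that single flag.
    """
    return {
        "ascend_scheduler_config": {"enabled": True},
        "torchair_graph_config": {
            "enabled": True,
            # Per this PR, True also turns on the MoE superkernel.
            "enable_multistream_moe": enable_superkernel,
            "enable_multistream_mla": True,
            "graph_batch_sizes": list(graph_batch_sizes),
            "enable_kv_nz": True,
        },
    }


baseline = build_additional_config(enable_superkernel=False, graph_batch_sizes=[12])
superkernel = build_additional_config(enable_superkernel=True, graph_batch_sizes=[12])
```

Either dict can be passed as additional_config to LLM(...) with the remaining arguments from the example. Note that turning the flag off also disables multistream MoE, so the baseline differs from the superkernel run in both respects.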

Signed-off-by: NNUCJ <616151263@qq.com>

github-actions bot added module:ops module:core module:quantization labels Jul 18, 2025

NNUCJ force-pushed the sk_model_091 branch 7 times, most recently from 611aa57 to c018c84, July 18, 2025 09:08

Add super kernel in moe

dfc5935

Signed-off-by: NNUCJ <616151263@qq.com>

NNUCJ force-pushed the sk_model_091 branch from c018c84 to dfc5935, July 18, 2025 11:51
