
Conversation

@danielafrimi
Contributor

@danielafrimi commented Nov 30, 2025

This PR adds support for FlashInfer MoE FP8 kernels that require the intermediate (gated) dimension to be aligned.
Some models break when the intermediate size isn’t divisible by 16, especially under certain tensor-parallel (TP) configurations.

This PR introduces _maybe_pad_intermediate_for_flashinfer, which pads w13 and w2 along the intermediate dim so the weights meet FlashInfer’s alignment constraints.

If padding is needed, we zero-pad the up/gate and down projection weights and update intermediate_size_per_partition accordingly. This keeps the model numerically correct while avoiding kernel launch failures.
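
For illustration, here is a minimal sketch of how such zero-padding can work. It is not the actual vLLM implementation: the weight layouts (`[num_experts, 2 * intermediate, hidden]` for `w13` with the gate and up halves stacked along dim 1, and `[num_experts, hidden, intermediate]` for `w2`), the alignment value of 16, and the omission of FP8 scale handling are all assumptions made for this example.

```python
import torch
import torch.nn.functional as F


def pad_intermediate_dim(
    w13: torch.Tensor,          # assumed shape [num_experts, 2 * I, hidden]
    w2: torch.Tensor,           # assumed shape [num_experts, hidden, I]
    intermediate_size: int,     # I, the per-partition intermediate size
    alignment: int = 16,        # assumed FlashInfer alignment requirement
):
    """Zero-pad the intermediate (gated) dim up to a multiple of `alignment`."""
    pad = (-intermediate_size) % alignment
    if pad == 0:
        return w13, w2, intermediate_size

    # Pad the gate and up halves of w13 separately so the split point
    # between them remains at the (new) intermediate size.
    gate, up = w13.split(intermediate_size, dim=1)
    gate = F.pad(gate, (0, 0, 0, pad))   # pad dim 1 (intermediate rows) with zeros
    up = F.pad(up, (0, 0, 0, pad))
    w13_padded = torch.cat([gate, up], dim=1)

    # w2 consumes the intermediate activations, so pad its last dim to match.
    w2_padded = F.pad(w2, (0, pad))

    return w13_padded, w2_padded, intermediate_size + pad
```

Because the padded gate/up rows are zero, the extra intermediate channels produce zero activations, and the padded `w2` columns only ever multiply those zeros, so the output is numerically unchanged; the layer just has to report the updated `intermediate_size_per_partition` so the kernel sees the aligned size.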

Daniel Afrimi and others added 3 commits November 27, 2025 05:56
Signed-off-by: Daniel Afrimi <dafrimi@pool0-00589.cm.cluster>
Signed-off-by: Daniel Afrimi <dafrimi@pool0-00589.cm.cluster>
Signed-off-by: dafrimi <dafrimi@nvidia.com>
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

Copy link
Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request introduces padding for the intermediate dimension of MoE weights to meet the alignment requirements of FlashInfer FP8 kernels. This is a necessary fix to prevent kernel launch failures for certain models and tensor parallelism configurations. The changes are well-contained within the ModelOptFp8MoEMethod, and the new _maybe_pad_intermediate_for_flashinfer function correctly pads the w13 and w2 weights with zeros and updates the intermediate_size_per_partition attribute on the layer. The implementation appears correct and robust.

danielafrimi and others added 2 commits November 30, 2025 15:47
Signed-off-by: dafrimi <dafrimi@nvidia.com>