
Conversation

@mxz297
Contributor

@mxz297 mxz297 commented Nov 3, 2025

Summary: #26443 adds a check for nvcc availability as a condition for enabling FlashInfer MoE. On devgpus we may have nvcc, so there is no issue there, but our deployment environment (tw jobs) has no nvcc, so FlashInfer MoE gets disabled.

Differential Revision: D86104899
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to enable FlashInfer in environments where nvcc is not available by removing the nvcc availability check. While this addresses the issue for environments with pre-compiled kernels, it could introduce runtime crashes for users who lack both nvcc and pre-compiled kernels. I've suggested a safer alternative that makes the nvcc check conditional on the VLLM_HAS_FLASHINFER_CUBIN environment variable. This approach provides the desired flexibility for production environments while preserving the safeguard for other users.
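The safer alternative described above can be sketched roughly as follows. This is a minimal illustration, not vLLM's actual implementation: the `VLLM_HAS_FLASHINFER_CUBIN` environment variable comes from the review comment, while the helper names (`has_nvcc`, `can_enable_flashinfer_moe`) are hypothetical.

```python
import os
import shutil


def has_nvcc() -> bool:
    """Return True if the nvcc compiler is on PATH, i.e. JIT compilation is possible."""
    return shutil.which("nvcc") is not None


def can_enable_flashinfer_moe() -> bool:
    """Decide whether FlashInfer MoE can be enabled.

    If the deployment ships pre-compiled kernels (AOT), signalled here by
    VLLM_HAS_FLASHINFER_CUBIN=1, the nvcc check is skipped entirely.
    Otherwise we still require nvcc so that users without pre-compiled
    kernels do not hit a runtime crash when JIT compilation is attempted.
    """
    if os.environ.get("VLLM_HAS_FLASHINFER_CUBIN", "0") == "1":
        return True
    return has_nvcc()
```

Under this scheme, production environments without nvcc opt in explicitly via the environment variable, while the default behavior keeps the safeguard for everyone else.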

@mxz297
Contributor Author

mxz297 commented Nov 3, 2025

@mgoin our internal prod environment uses FlashInfer in an AOT fashion and does not have nvcc. So right now we are seeing FlashInfer MoE being disabled internally, causing a perf regression.

@alecsolder
Contributor

Is there a way we can add unit tests to ensure this doesn't get turned off accidentally again for the model?

@mxz297 mxz297 changed the title do not check nvcc availability [flashinfer][fix] do not check nvcc availability Nov 3, 2025
@heheda12345
Collaborator

Is nvcc required for the JIT compilation of FlashInfer?

