Fix PyTorch CUDACachingAllocator::snapshot() API compatibility for conda builds #65
+14 −1
Summary:

Problem

The torchcomms build in the xlformers_llama4_flagship_conda environment is failing to compile. D86907322 introduced a new method registerMemPreHook() that calls c10::cuda::CUDACachingAllocator::snapshot({device, 0}) with arguments. The failure is caused by a breaking change to the PyTorch API for c10::cuda::CUDACachingAllocator::snapshot():

- Old API: snapshot({device, pool_id})
- New API: snapshot() - no arguments
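For illustration, a minimal sketch of the mismatch; the wrapper function and variable names are hypothetical, not the actual torchcomms code, and as written it compiles only against the older API:

```cpp
#include <c10/cuda/CUDACachingAllocator.h>

void takeSnapshot(int device) {
  // Written against the older API, where snapshot() accepts a
  // {device, pool_id}-style argument, mirroring the call in
  // registerMemPreHook() described above.
  auto snap = c10::cuda::CUDACachingAllocator::snapshot({device, 0});

  // Under the newer, no-argument API this call no longer compiles,
  // which is the conda build failure; there it must be written as:
  //   auto snap = c10::cuda::CUDACachingAllocator::snapshot();
  (void)snap;  // the returned snapshot would normally be inspected
}
```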
Root Cause

Torchcomms code was written for the older PyTorch API, where snapshot() accepted arguments, but conda environments use a newer PyTorch version with the updated API.
Solution

Added conditional compilation to handle both API versions:

- Added a TORCHCOMMS_CONDA_BUILD macro in the CMakeLists.txt files for conda builds
- Updated comms/torchcomms/ncclx/TorchCommNCCLXCCA.cpp
- Updated comms/torchcomms/nccl/TorchCommNCCLCCA.cpp

This follows the same pattern used in ProcessGroupNCCLX.cpp with the NCCLX_CONDA_BUILD macro.
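A minimal sketch of that conditional-compilation pattern, assuming TORCHCOMMS_CONDA_BUILD is defined only for conda builds (e.g. via a target_compile_definitions() entry in the CMakeLists.txt files mentioned above); the wrapper function is hypothetical, not the actual torchcomms code:

```cpp
#include <c10/cuda/CUDACachingAllocator.h>

void takeSnapshot(int device) {
#ifdef TORCHCOMMS_CONDA_BUILD
  // Conda builds ship the newer PyTorch, where snapshot() takes no arguments.
  (void)device;  // unused under the no-argument API
  auto snap = c10::cuda::CUDACachingAllocator::snapshot();
#else
  // Internal builds use the older PyTorch, where snapshot() takes a
  // {device, pool_id} argument.
  auto snap = c10::cuda::CUDACachingAllocator::snapshot({device, 0});
#endif
  (void)snap;  // the snapshot would normally be consumed by the mem hook
}
```

Branching on a build-system macro rather than on a PyTorch version number keeps both build flavors compiling from a single source tree, the same trade-off the existing NCCLX_CONDA_BUILD handling makes in ProcessGroupNCCLX.cpp.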
Differential Revision: D87413411