[MIOpen] Reintroduce workspace size limit for GEMM convolution solver#4418
Open
amd-bartgips wants to merge 8 commits intodevelopfrom
…solver
Add configurable workspace size threshold for GEMM convolution solvers to prevent excessive memory allocation. Introduces MIOPEN_CONV_GEMM_MAX_SIZE environment variable (default: 7287183769 bytes) as a temporary workaround that essentially reverts #2393.
Changes:
- Add IsGEMMProblemTooLarge() to estimate and check GEMM workspace size against configured limit using the same formula as GemmFwdRest solver
- Overload IsAlgorithmDisabled() to accept ProblemDescription parameter for problem-specific constraint checking
- Update GEMMSolverFinder::IsEnabled() to check workspace size limits
- Add detailed logging when GEMM solver is enabled/disabled based on workspace size
The workspace calculation accounts for spatial dimensions, channel count, group count, data type size, and doubles the size for Int8 operations.
- Fix typo in comment: "revertin" -> "reverting"
- Change max_size type from unsigned long long to std::size_t
- Add MIOPEN_USE_GEMM guards around IsGEMMProblemTooLarge function
- Remove verbose logging when workspace is within limits
- Refactor IsAlgorithmDisabled to check global disable first, then problem-specific constraints
- Simplify algorithm-specific logic: only GEMM has problem-specific constraints, others rely on global disable check
averinevg
reviewed
Feb 11, 2026
Relocate `IsGEMMProblemTooLarge()` function from `conv::detail` to `solver::conv::gemm` namespace for better code organization.
This change:
- Moves the GEMM workspace size validation logic to the solver-specific header where it logically belongs
- Removes the temporary environment variable declaration from solver_finders.cpp (now in gemm_common.hpp)
- Updates the call site to use the new namespace path
- Also wrapped the workspace size limiting in a flag that refers to the original ticket
No functional changes
This PR essentially reverts #2393, but in a way that is itself easier to revert, and it makes the exclusion threshold configurable.
Note this is a temporary stop-gap solution until the root cause of the failure of the GEMM solvers is found and fixed.
Motivation
When the GemmFwdRest solver is used to evaluate particular shapes, it intermittently fails, either because the result exceeds tolerance or because it produces NaNs.
This behaviour has been noticed after #2393 was merged, hence this PR attempts to reintroduce the exclusion threshold based on workspace size that was removed in that PR.
Technical Details
Add configurable workspace size threshold for GEMM convolution solvers to prevent excessive memory allocation. Introduces MIOPEN_DEBUG_CONV_GEMM_MAX_WORKSPACE_SIZE environment variable (default: 7287183769 bytes, following the original/legacy limit) as a temporary workaround that essentially reverts #2393.
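As a rough illustration of the configurable threshold described above, the following standalone sketch reads the limit from the environment and falls back to the default quoted in this PR. It is not the MIOpen implementation: `ParseLimit`, `GetMaxGemmWorkspace` and `ExceedsLimit` are hypothetical names, `std::getenv` stands in for whatever environment-variable mechanism the library uses, and a 64-bit `std::size_t` is assumed. The strict `>` comparison follows the log message shown under Test Result.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Default limit quoted in the PR description, in bytes.
// Assumes a 64-bit std::size_t (the value does not fit in 32 bits).
constexpr std::size_t kDefaultMaxWorkspace = 7287183769ULL;

// Hypothetical helper: parse an unsigned byte count, returning `fallback`
// when the text is missing, empty, not a number, or has trailing characters.
std::size_t ParseLimit(const char* text, std::size_t fallback)
{
    if(text == nullptr || *text == '\0')
        return fallback;
    try
    {
        const std::string s(text);
        std::size_t pos = 0;
        const unsigned long long value = std::stoull(s, &pos);
        if(pos != s.size())
            return fallback; // trailing garbage such as "12x"
        return static_cast<std::size_t>(value);
    }
    catch(const std::exception&)
    {
        return fallback; // std::invalid_argument or std::out_of_range
    }
}

// Hypothetical helper: read the limit from the variable named in this PR.
std::size_t GetMaxGemmWorkspace()
{
    return ParseLimit(std::getenv("MIOPEN_DEBUG_CONV_GEMM_MAX_WORKSPACE_SIZE"),
                      kDefaultMaxWorkspace);
}

// A problem is excluded only when its workspace is strictly above the limit.
bool ExceedsLimit(std::size_t workspace_bytes, std::size_t limit)
{
    return workspace_bytes > limit;
}
```

With the default limit, the 8663334912-byte workspace reported in the log below would be excluded, while a workspace of exactly 7287183769 bytes would still be allowed.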
Changes:
The workspace calculation accounts for spatial dimensions, channel count, group count, data type size, and doubles the size for Int8 operations.
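The estimate described above can be sketched as a plain product of the listed factors. This is a minimal illustration under stated assumptions, not the code in this PR: `GemmProblemSketch`, `Product` and `EstimateGemmWorkspace` are hypothetical names, and the choice of filter and output spatial extents (an im2col-style buffer) is an assumption about which "spatial dimensions" enter the formula.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical problem description; field names are illustrative only.
struct GemmProblemSketch
{
    std::size_t channels;                    // channel count of the weight tensor
    std::vector<std::size_t> filter_spatial; // e.g. {3, 3} (assumed to be part of the formula)
    std::vector<std::size_t> output_spatial; // e.g. {56, 56} (assumed to be part of the formula)
    std::size_t group_count;                 // number of convolution groups
    std::size_t element_size;                // data type size in bytes
    bool is_int8;                            // Int8 problems get twice the workspace
};

// Product of all extents; an empty list yields 1.
std::size_t Product(const std::vector<std::size_t>& extents)
{
    return std::accumulate(
        extents.begin(), extents.end(), std::size_t{1}, std::multiplies<std::size_t>());
}

// Estimated workspace in bytes: channels * spatial extents * type size * groups,
// doubled for Int8 as stated in the PR description.
std::size_t EstimateGemmWorkspace(const GemmProblemSketch& p)
{
    std::size_t bytes = p.channels * Product(p.filter_spatial) * Product(p.output_spatial) *
                        p.element_size * p.group_count;
    if(p.is_int8)
        bytes *= 2;
    return bytes;
}
```

The resulting byte count is what would be compared against the configured limit. Overflow of `std::size_t` is not handled in this sketch.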
Test Plan
Run all tests.
Run the MIOpenDriver command that initially triggered the erroneous GEMM call to make sure the GEMM solver is now excluded.
Use:
To force GEMM solver to be selected.
Test Result
New branch states:
MIOpen(HIP): Info2 [IsGEMMProblemTooLarge] GEMMSolverFinder disabled for workspace size 8663334912 bytes > 7287183769 bytes (MIOPEN_DEBUG_CONV_GEMM_MAX_WORKSPACE_SIZE)
And as a result MIOpenDriver fails to find a suitable algo:
Submission Checklist