
[MIOpen] Reintroduce workspace size limit for GEMM convolution solver#4418

Open
amd-bartgips wants to merge 8 commits into develop from users/bartgips/gemm_vram_limit

@amd-bartgips commented Feb 9, 2026

This essentially reverts #2393, but in a way that is easier to revert later, and makes the exclusion threshold configurable.
Note that this is a temporary stop-gap until the root cause of the GEMM solver failures is found and fixed.

Motivation

When the GemmFwdRest solver is used to evaluate certain shapes, it intermittently fails, either by exceeding the verification tolerance or by producing NaNs.

This behaviour was noticed after #2393 was merged, so this PR reintroduces the workspace-size-based exclusion threshold that #2393 removed.

Technical Details

Adds a configurable workspace size threshold for the GEMM convolution solvers to prevent excessive memory allocation. It introduces the MIOPEN_DEBUG_CONV_GEMM_MAX_WORKSPACE_SIZE environment variable (default: 7287183769 bytes, matching the original legacy limit) as a temporary workaround that essentially reverts #2393.
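The threshold lookup could look roughly like the sketch below. It assumes plain `std::getenv`/`std::strtoull` parsing rather than MIOpen's own environment-variable helpers, and the function name `GetGemmMaxWorkspaceSize` is hypothetical; anything that is not a plain decimal integer falls back to the legacy default.

```cpp
#include <cctype>
#include <cerrno>
#include <cstddef>
#include <cstdlib>

// Legacy default quoted in the PR description (bytes).
constexpr std::size_t kDefaultGemmMaxWorkspaceSize = 7287183769ULL;

// Hypothetical helper: reads MIOPEN_DEBUG_CONV_GEMM_MAX_WORKSPACE_SIZE and
// falls back to the legacy default when the variable is unset, does not
// start with a digit, contains trailing junk, or overflows.
std::size_t GetGemmMaxWorkspaceSize()
{
    const char* raw = std::getenv("MIOPEN_DEBUG_CONV_GEMM_MAX_WORKSPACE_SIZE");
    if(raw == nullptr || !std::isdigit(static_cast<unsigned char>(*raw)))
        return kDefaultGemmMaxWorkspaceSize;

    errno = 0;
    char* end = nullptr;
    const unsigned long long value = std::strtoull(raw, &end, 10);
    if(errno == ERANGE || *end != '\0')
        return kDefaultGemmMaxWorkspaceSize; // e.g. "8G" or an overflowing value
    return static_cast<std::size_t>(value);
}
```

Rejecting suffixed values such as "8G" keeps a typo from silently producing a tiny limit that would disable GEMM for every problem.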

Changes:

  • Add IsGEMMProblemTooLarge() to estimate the GEMM workspace size and check it against the configured limit, using the same formula as the GemmFwdRest solver
  • Overload IsAlgorithmDisabled() to accept ProblemDescription parameter for problem-specific constraint checking
  • Update GEMMSolverFinder::IsEnabled() to check workspace size limits
  • Add detailed logging when GEMM solver is enabled/disabled based on workspace size
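A simplified sketch of the size check and its log line follows. The signature is hypothetical (it takes a precomputed estimate instead of a `ProblemDescription`), the message imitates the `Info2` line shown under Test Result, and it writes to `std::clog` rather than MIOpen's logger. It assumes a problem counts as too large only when the estimate strictly exceeds the limit, as the `>` in that log line suggests.

```cpp
#include <cstddef>
#include <iostream>

// Hypothetical, simplified signature: the function in the PR works on a
// ProblemDescription; here the workspace estimate and the limit are passed in.
bool IsGEMMProblemTooLarge(std::size_t estimated_ws_bytes, std::size_t max_ws_bytes)
{
    if(estimated_ws_bytes <= max_ws_bytes)
        return false; // within the limit: no log line

    // Mirrors the Info2 message from the test result, written to std::clog.
    std::clog << "GEMMSolverFinder disabled for workspace size " << estimated_ws_bytes
              << " bytes > " << max_ws_bytes
              << " bytes (MIOPEN_DEBUG_CONV_GEMM_MAX_WORKSPACE_SIZE)\n";
    return true;
}
```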

The workspace calculation accounts for spatial dimensions, channel count, group count, data type size, and doubles the size for Int8 operations.
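The estimate described above can be written as a product of those factors. This is a hedged sketch: the helper name `EstimateGemmWorkspaceBytes` is hypothetical, and it assumes the channel factor is the per-group input channel count (the weight tensor's C dimension), multiplied by the group count. For the Test Plan's fp16 problem (256 channels, 3x3x3 filter, 17x144x256 output, 2-byte elements, 1 group) it yields 8663334912 bytes, the same value reported in the log under Test Result.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper combining the factors listed in the PR description:
// channels per group x filter spatial size x output spatial size
// x element size x group count, doubled for Int8.
std::size_t EstimateGemmWorkspaceBytes(std::size_t channels_per_group,
                                       const std::vector<std::size_t>& filter_spatial,
                                       const std::vector<std::size_t>& output_spatial,
                                       std::size_t type_size_bytes,
                                       std::size_t group_count,
                                       bool is_int8)
{
    // Product of all spatial extents (D*H*W for 3D, H*W for 2D).
    const auto product = [](const std::vector<std::size_t>& dims) {
        return std::accumulate(
            dims.begin(), dims.end(), std::size_t{1}, std::multiplies<std::size_t>{});
    };

    std::size_t bytes = channels_per_group * product(filter_spatial) *
                        product(output_spatial) * type_size_bytes * group_count;
    if(is_int8)
        bytes *= 2; // the Int8 path is counted twice
    return bytes;
}
```

Worked through by hand: 256 x 27 = 6912, 17 x 144 x 256 = 626688, and 6912 x 626688 x 2 = 8663334912, which exceeds the 7287183769-byte default.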

Test Plan

Run all tests.
Run the MIOpenDriver command that originally triggered the erroneous GEMM call to make sure it is now excluded:

MIOpenDriver convfp16 -n 1 -c 256 --in_d 19 -H 146 -W 258 -k 256 --fil_d 3 -y 3 -x 3 --pad_d 0 -p 0 -q 0 --conv_stride_d 1 -u 1 -v 1 --dilation_d 1 -l 1 -j 1 --spatial_dim 3 -m conv -g 1 -F 1 -t 1 --iter 1

Use the following settings to force the GEMM solver to be selected:

export MIOPEN_DEBUG_CONV_FFT=0
export MIOPEN_DEBUG_CONV_DIRECT=0
export MIOPEN_DEBUG_CONV_GEMM=1
export MIOPEN_DEBUG_CONV_WINOGRAD=0
export MIOPEN_DEBUG_CONV_IMPLICIT_GEMM=0
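The output spatial sizes that enter the workspace estimate follow from the driver flags through the usual convolution output-size formula. In this sketch (`ConvOutputSize` is a hypothetical name), the command above maps depth 19 to 17, height 146 to 144 and width 258 to 256; every pad is 0 and every stride and dilation is 1, so the exact per-axis mapping of the short flags does not affect the result.

```cpp
#include <cstddef>
#include <stdexcept>

// Output length along one spatial axis:
// (in + 2*pad - (dilation*(filter-1) + 1)) / stride + 1
std::size_t ConvOutputSize(std::size_t in, std::size_t filter, std::size_t pad,
                           std::size_t stride, std::size_t dilation)
{
    if(filter == 0 || stride == 0 || dilation == 0)
        throw std::invalid_argument("filter, stride and dilation must be positive");

    const std::size_t effective_filter = dilation * (filter - 1) + 1;
    const std::size_t padded_in        = in + 2 * pad;
    if(padded_in < effective_filter)
        throw std::invalid_argument("filter does not fit into the padded input");

    return (padded_in - effective_filter) / stride + 1;
}
```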

Test Result

On the new branch, MIOpen logs:
MIOpen(HIP): Info2 [IsGEMMProblemTooLarge] GEMMSolverFinder disabled for workspace size 8663334912 bytes > 7287183769 bytes (MIOPEN_DEBUG_CONV_GEMM_MAX_WORKSPACE_SIZE)
As a result, MIOpenDriver fails to find a suitable algorithm:

MIOpen Error: ctr-cx66-mi300x-02.amd.com:/home/AMD/bartgips/code/rocm-libraries/projects/miopen/src/ocl/convolutionocl.cpp:618: No suitable algorithm was found to execute the required convolution
RunForwardGPU() FAILED, rc = 0x7
Forward Convolution FAILED: 1.79769e+308 > 0.0082

Submission Checklist

…solver

Add configurable workspace size threshold for GEMM convolution solvers to
prevent excessive memory allocation. Introduces MIOPEN_CONV_GEMM_MAX_SIZE
environment variable (default: 7287183769 bytes) as a temporary workaround
that essentially reverts #2393.

Changes:
- Add IsGEMMProblemTooLarge() to estimate and check GEMM workspace size
  against configured limit using the same formula as GemmFwdRest solver
- Overload IsAlgorithmDisabled() to accept ProblemDescription parameter
  for problem-specific constraint checking
- Update GEMMSolverFinder::IsEnabled() to check workspace size limits
- Add detailed logging when GEMM solver is enabled/disabled based on
  workspace size

The workspace calculation accounts for spatial dimensions, channel count,
group count, data type size, and doubles the size for Int8 operations.
- Fix typo in comment: "revertin" -> "reverting"
- Change max_size type from unsigned long long to std::size_t
- Add MIOPEN_USE_GEMM guards around IsGEMMProblemTooLarge function
- Remove verbose logging when workspace is within limits
- Refactor IsAlgorithmDisabled to check global disable first, then problem-specific constraints
- Simplify algorithm-specific logic: only GEMM has problem-specific constraints, others rely on global disable check
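The ordering described in this commit (global disable check first, then problem-specific constraints, which only GEMM currently has) can be sketched as two overloads. The enum, `ProblemSketch`, and the explicit `globally_disabled` set are hypothetical stand-ins for MIOpen's `MIOPEN_DEBUG_CONV_*` switches and `ProblemDescription`, chosen so the sketch is testable without environment variables.

```cpp
#include <cstddef>
#include <set>

// Hypothetical algorithm tags.
enum class ConvAlgo { Direct, Gemm, Winograd, ImplicitGemm, Fft };

// Hypothetical stand-in for ProblemDescription: only the field used here.
struct ProblemSketch
{
    std::size_t estimated_gemm_ws_bytes;
};

// Problem-independent check: is the algorithm switched off globally?
bool IsAlgorithmDisabled(ConvAlgo algo, const std::set<ConvAlgo>& globally_disabled)
{
    return globally_disabled.count(algo) != 0;
}

// Problem-aware overload: global switch first, then per-problem constraints.
bool IsAlgorithmDisabled(ConvAlgo algo,
                         const std::set<ConvAlgo>& globally_disabled,
                         const ProblemSketch& problem,
                         std::size_t gemm_max_ws_bytes)
{
    if(IsAlgorithmDisabled(algo, globally_disabled))
        return true;
    // Only GEMM has a problem-specific constraint; the others rely on the global check.
    if(algo == ConvAlgo::Gemm)
        return problem.estimated_gemm_ws_bytes > gemm_max_ws_bytes;
    return false;
}
```

Checking the global switch first means a user who has already disabled GEMM never pays for the workspace estimate.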
@amd-bartgips amd-bartgips marked this pull request as ready for review February 11, 2026 13:11
@amd-bartgips amd-bartgips requested a review from a team as a code owner February 11, 2026 13:11
@amd-bartgips amd-bartgips enabled auto-merge (squash) February 11, 2026 13:14
@amd-bartgips amd-bartgips self-assigned this Feb 11, 2026
@amd-bartgips amd-bartgips changed the title [MIOpen]Reintroduced workspace size limit for GEMM convolution solver [MIOpen] Reintroduce workspace size limit for GEMM convolution solver Feb 11, 2026
Relocate `IsGEMMProblemTooLarge()` function from `conv::detail` to `solver::conv::gemm` namespace for better code organization. This change:

- Moves the GEMM workspace size validation logic to the solver-specific header where it logically belongs
- Removes the temporary environment variable declaration from solver_finders.cpp (now in gemm_common.hpp)
- Updates the call site to use the new namespace path
- Wraps the workspace size limit in a flag that refers to the original ticket

No functional changes
