Fix Libfabric MR caching issues #13327

bwbarrett · 2025-07-09T19:25:11Z

Fix a set of bugs in both the OFI BTL and OFI MTL around caching MRs. The Libfabric EFA provider used to (erroneously) cache explicitly created MRs and will stop doing so in Libfabric 2.2. This causes a performance regression over EFA in both the OFI MTL (with HMEM) and the OFI BTL (always), because OMPI had quietly come to assume that the provider caches MRs. So we stop disabling the BTL rcache for EFA and add an rcache for HMEM MRs in the OFI MTL. The OFI MTL requires that the provider not require FI_MR_LOCAL, so we don't need to worry about caching general MRs there.

While fixing that, I noticed two other issues in the OFI code and cleaned them up. First, we should use the state of FI_MR_HMEM rather than a provider name to decide whether to avoid creating HMEM MRs in the OFI MTL. Second, when the OFI BTL is used without the OFI MTL, we were not properly coupling the OMPI memory monitor with the Libfabric memory monitor.

Disabling the BTL rcache for EFA was an optimization around a bug in the EFA provider. The EFA provider shouldn't be caching explicit registrations anyway, so avoiding the double cache is silly (and breaks when EFA fixes the explicit registration cache bug). Signed-off-by: Brian Barrett <bbarrett@amazon.com>

The OFI MTL exports a memory monitor to Libfabric (so that OMPI's patcher wins), but in cases where OB1 is directly selected, that code won't run. So make sure to also configure Libfabric so that it won't try to use a suboptimal memory monitor when only the OFI BTL is used. Signed-off-by: Brian Barrett <bbarrett@amazon.com>

Rather than use the CXI provider name to disable explicit hmem registration, use the FI_MR_HMEM flag. Signed-off-by: Brian Barrett <bbarrett@amazon.com>
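The flag-based check described in this commit can be sketched roughly as below. This is an illustration, not the code from the commit: `SKETCH_FI_MR_HMEM` and `sketch_needs_explicit_hmem_reg` are hypothetical names, and the bit value is a stand-in so the snippet compiles without Libfabric installed. Real code would include `<rdma/fabric.h>`, use the `FI_MR_HMEM` constant defined there, and read the mode bits from `domain_attr->mr_mode` of the `fi_info` returned by `fi_getinfo`.

```c
#include <stdbool.h>

/* Stand-in for Libfabric's FI_MR_HMEM mode bit (assumption: the numeric
 * value is illustrative only; real code takes it from <rdma/fabric.h>). */
#define SKETCH_FI_MR_HMEM (1 << 10)

/* Hypothetical helper: decide from the provider's MR mode bits, rather
 * than from its name, whether device (HMEM) buffers need an explicit
 * registration before they can be used in a transfer. */
bool sketch_needs_explicit_hmem_reg(int mr_mode)
{
    return (mr_mode & SKETCH_FI_MR_HMEM) != 0;
}
```

Keying the decision on the mode bit means any provider that reports the same requirement gets the same behavior, without maintaining a list of provider names.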

The OFI MTL was creating a registration for every operation that used HMEM when FI_MR_HMEM is required. This is very inefficient, since creating registrations is expensive. So put an rcache in front of the registrations. Signed-off-by: Brian Barrett <bbarrett@amazon.com>
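To show why a cache in front of the registrations helps, here is a minimal sketch of a find-or-register lookup. It is not OMPI's rcache framework: every `sketch_` name is hypothetical, the expensive provider call (something like `fi_mr_regattr` in real code) is replaced by a counter, and eviction, reference counting, and invalidation through a memory monitor are left out.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_SLOTS 8

/* One cached registration: the covered address range and its handle. */
typedef struct {
    uintptr_t base;
    size_t    len;
    uint64_t  handle;
    int       valid;
} sketch_reg_t;

typedef struct {
    sketch_reg_t  slots[SKETCH_CACHE_SLOTS];
    unsigned      next_victim;    /* round-robin replacement index */
    unsigned long registrations;  /* how many expensive calls were made */
} sketch_rcache_t;

/* Stand-in for the expensive provider registration call; here it only
 * counts invocations and returns the count as a fake handle. */
uint64_t sketch_register(sketch_rcache_t *c, uintptr_t base, size_t len)
{
    (void) base;
    (void) len;
    return (uint64_t) ++c->registrations;
}

/* Return a handle covering [buf, buf + len), registering only on a miss. */
uint64_t sketch_rcache_find_or_register(sketch_rcache_t *c,
                                        const void *buf, size_t len)
{
    uintptr_t lo = (uintptr_t) buf;

    for (unsigned i = 0; i < SKETCH_CACHE_SLOTS; i++) {
        sketch_reg_t *r = &c->slots[i];
        if (r->valid && r->base <= lo && lo + len <= r->base + r->len) {
            return r->handle;  /* hit: reuse the existing registration */
        }
    }

    /* Miss: overwrite the next slot. A real cache would release the
     * evicted registration before reusing the slot. */
    sketch_reg_t *victim = &c->slots[c->next_victim];
    c->next_victim = (c->next_victim + 1) % SKETCH_CACHE_SLOTS;
    victim->base   = lo;
    victim->len    = len;
    victim->handle = sketch_register(c, lo, len);
    victim->valid  = 1;
    return victim->handle;
}
```

With this shape, repeated sends from the same device buffer pay the registration cost once instead of on every operation, which is the behavior the commit is after.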

bwbarrett · 2025-07-09T19:28:08Z

@hppritcha can you give this a whirl on a CXI system with --mca mtl_base_verbose 100 and make sure you see a line like:

Support for device buffers enabled with implicit registration

hppritcha

This PR doesn't seem to change the behavior of the CXI provider on CUDA systems. There is some problem with one-sided, but it's present on main as well.

bwbarrett added 4 commits July 9, 2025 04:15

mtl/ofi: Use FI_MR_HMEM for explicit reg check

edf3634


bwbarrett requested a review from hppritcha July 9, 2025 19:25

github-actions bot added the Target: main label Jul 9, 2025

bwbarrett requested a review from sunkuamzn July 9, 2025 19:29

hppritcha approved these changes Jul 10, 2025


shijin-aws approved these changes Jul 14, 2025


bwbarrett merged commit 8f3c171 into open-mpi:main Jul 14, 2025
15 checks passed

bwbarrett deleted the ofi-hmem branch July 14, 2025 19:41

bwbarrett mentioned this pull request Jul 14, 2025

v5.0.x: Fix Libfabric MR caching issues #13327 #13332

Open
