Skip to content

Conversation

TomAugspurger
Copy link
Contributor

Description

This notes that buffers need to be freed before the MR itself is freed, motivated by some debugging in rapidsai/cudf@958face.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

This notes that buffers need to be freed before the MR itself is freed.
Copy link

copy-pr-bot bot commented Sep 18, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@TomAugspurger
Copy link
Contributor Author

/ok to test

@TomAugspurger TomAugspurger added doc Documentation non-breaking Non-breaking change labels Sep 18, 2025
@TomAugspurger TomAugspurger marked this pull request as ready for review September 18, 2025 17:49
Comment on lines +165 to +168
> Note that a Python exception that's raised in a scope that allocates
> buffers can keep references to the buffers that will keep them alive.
> Use ``traceback.clear_frames`` to ensure those buffers are freed
> before destroying the memory resource.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From this description alone, I don't think I would be able to understand how to avoid this problem.

If you want to link to an issue or include an example of the bad code inline, that is fine.

@TomAugspurger
Copy link
Contributor Author

I might close this PR. rapidsai/cudf#20028 fixed the issue. AFAICT, rmm's Python API isn't susceptible to the same issue since its DeviceBuffer (apparently) already keeps a reference to the MR. So this is strange, but won't cause a segfault:

>>> tmp_mr = rmm.mr.CudaMemoryResource()
>>> rmm.mr.set_current_device_resource(tmp_mr)
>>> buf = rmm.DeviceBuffer(size=1024)  # alloacted with tmp_mr
>>> del tmp_mr

@vyasr
Copy link
Contributor

vyasr commented Sep 26, 2025

Correct, the Python API should protect against this. The C++ API will as well once we fully get onto the newer CCCL versions that support what we need for resources.

@TomAugspurger
Copy link
Contributor Author

Thanks, I'll close this then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc Documentation non-breaking Non-breaking change
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants