Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow specifying mr in DeviceBuffer construction, and document ownership requirements in Python/C++ interfacing #1552

Merged
merged 7 commits into from
May 16, 2024
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
92 changes: 91 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -73,7 +73,7 @@ Python requirements:
* `cuda-python`
* `cython`

For more details, see [pyproject.toml](python/pyproject.toml)
For more details, see [pyproject.toml](python/rmm/pyproject.toml)


### Script to build RMM from source
Expand Down Expand Up @@ -855,3 +855,93 @@ Out[6]:
'total_bytes': 16,
'total_count': 1}
```

## Taking ownership of C++ objects from Python.

When interacting with a C++ library that uses RMM from Python, one
must be careful when taking ownership of `rmm::device_buffer` objects
on the Python side. The `rmm::device_buffer` contains a _raw_ pointer
to the memory resource used for its allocation, and the allocating
wence- marked this conversation as resolved.
Show resolved Hide resolved
user is expected to keep this memory resource alive for at least the
lifetime of the buffer. When taking ownership of such a buffer in
Python, we have no way (in the general case) of ensuring that the
memory resource will outlive the buffer we are now holding.

To avoid any issues, we need two things:

1. The C++ library we are interfacing with should accept a memory
resource that is used for allocations that are returned to the
user.
2. When calling into the library from python, we should provide a
memory resource whose lifetime we control. This memory resource
should then be provided when we take ownership of any allocated
`rmm::device_buffer`s.

For example, suppose we have a C++ function that allocates
`device_buffer`s, which has a utility overload that defaults the
memory resource to the current device resource:

```c++
std::unique_ptr<rmm::device_buffer> allocate(
std::size_t size,
rmm::mr::device_async_resource_ref mr = get_current_device_resource())
{
return std::make_unique<rmm::device_buffer>(size, rmm::cuda_stream_default, mr);
}
```

The Python `DeviceBuffer` class has a convenience Cython function,
`c_from_unique_ptr` to construct a `DeviceBuffer` from a
`unique_ptr<rmm::device_buffer>`, taking ownership of it. To do this
safely, we must ensure that the allocation that was done on the C++
side uses a memory resource we control. So:

```cython
# Bad, don't control lifetime
buffer_bad = DeviceBuffer.c_from_unique_ptr(allocate(10))

# Good, allocation happens with an MR we control
# mr is a DeviceMemoryResource
buffer_good = DeviceBuffer.c_from_unique_ptr(
allocate(10, mr.get_mr()),
mr=mr,
)
```

Note two differences between the bad and good cases:

1. In the good case we pass the memory resource to the allocation
function.
2. In the good case, we pass _the same_ memory resource to the
`DeviceBuffer` constructor so that its lifetime is tied to the
lifetime of the buffer.

### Potential pitfalls if relying on `get_current_device_resource`
wence- marked this conversation as resolved.
Show resolved Hide resolved

Functions in both the C++ and Python APIs that perform allocation
typically default the memory resource argument to the value of
`get_current_device_resource`. This is to simplify the interface for
callers. When using a C++ library from Python, this defaulting is
safe, _as long as_ it is only the Python process that ever calls
`set_current_device_resource`.

This is because the current device resource on the C++ side has a
lifetime which is expected to be managed by the user. The resources
set by `rmm::mr::set_current_device_resource` are stored in a static
`std::map` whose keys are device ids and values are raw pointers to
the memory resources. Consequently,
`rmm::mr::get_current_device_resource` returns a raw pointer with no
lifetime provenance. This is, for the reasons discussed above, not
usable from Python. To handle this on the Python side, the
Python-level `set_current_device_resource` sets the C++ resource _and_
stores the Python object in a static global dictionary. The Python
`get_current_device_resource` then _does not use_
`rmm::mr::get_current_device_resource` and instead looks up the
current device resource in this global dictionary.

Hence, if the C++ library we are interfacing with calls
`rmm::mr::set_current_device_resource`, the C++ and Python sides of
the program can disagree on what `get_current_device_resource`
returns. The only safe thing to do if using the simplified interfaces
is therefore to ensure that `set_current_device_resource` is only ever
called on the Python side.
vyasr marked this conversation as resolved.
Show resolved Hide resolved
3 changes: 2 additions & 1 deletion python/rmm/rmm/_lib/device_buffer.pxd
Original file line number Diff line number Diff line change
Expand Up @@ -65,7 +65,8 @@ cdef class DeviceBuffer:
@staticmethod
cdef DeviceBuffer c_from_unique_ptr(
unique_ptr[device_buffer] ptr,
Stream stream=*
Stream stream=*,
DeviceMemoryResource mr=*,
)

@staticmethod
Expand Down
14 changes: 10 additions & 4 deletions python/rmm/rmm/_lib/device_buffer.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@ from cuda.ccudart cimport (
)

from rmm._lib.memory_resource cimport (
DeviceMemoryResource,
device_memory_resource,
get_current_device_resource,
)
Expand All @@ -48,7 +49,8 @@ cdef class DeviceBuffer:
def __cinit__(self, *,
uintptr_t ptr=0,
size_t size=0,
Stream stream=DEFAULT_STREAM):
Stream stream=DEFAULT_STREAM,
DeviceMemoryResource mr=None):
"""Construct a ``DeviceBuffer`` with optional size and data pointer

Parameters
Expand All @@ -65,6 +67,9 @@ cdef class DeviceBuffer:
scope while the DeviceBuffer is in use. Destroying the
underlying stream while the DeviceBuffer is in use will
result in undefined behavior.
mr : optional
DeviceMemoryResource for the allocation, if not provided
defaults to the current device resource.

Note
----
Expand All @@ -80,7 +85,7 @@ cdef class DeviceBuffer:
cdef const void* c_ptr
cdef device_memory_resource * mr_ptr
# Save a reference to the MR and stream used for allocation
self.mr = get_current_device_resource()
self.mr = get_current_device_resource() if mr is None else mr
self.stream = stream

mr_ptr = self.mr.get_mr()
Expand Down Expand Up @@ -162,13 +167,14 @@ cdef class DeviceBuffer:
@staticmethod
cdef DeviceBuffer c_from_unique_ptr(
unique_ptr[device_buffer] ptr,
Stream stream=DEFAULT_STREAM
Stream stream=DEFAULT_STREAM,
DeviceMemoryResource mr=None,
):
cdef DeviceBuffer buf = DeviceBuffer.__new__(DeviceBuffer)
if stream.c_is_default():
stream.c_synchronize()
buf.c_obj = move(ptr)
buf.mr = get_current_device_resource()
buf.mr = get_current_device_resource() if mr is None else mr
buf.stream = stream
return buf

Expand Down
27 changes: 27 additions & 0 deletions python/rmm/rmm/tests/test_rmm.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
# limitations under the License.

import copy
import functools
import gc
import os
import pickle
Expand Down Expand Up @@ -498,6 +499,32 @@ def test_mr_devicebuffer_lifetime():
del a


def test_device_buffer_with_mr():
allocations = []
base = rmm.mr.CudaMemoryResource()
rmm.mr.set_current_device_resource(base)

def alloc_cb(size, stream, *, base):
allocations.append(size)
return base.allocate(size, stream)

def dealloc_cb(ptr, size, stream, *, base):
return base.deallocate(ptr, size, stream)

cb_mr = rmm.mr.CallbackMemoryResource(
functools.partial(alloc_cb, base=base),
functools.partial(dealloc_cb, base=base),
wence- marked this conversation as resolved.
Show resolved Hide resolved
)
rmm.DeviceBuffer(size=10)
assert len(allocations) == 0
buf = rmm.DeviceBuffer(size=256, mr=cb_mr)
assert len(allocations) == 1
assert allocations[0] == 256
del cb_mr
gc.collect()
del buf


def test_mr_upstream_lifetime():
# Simple test to ensure upstream MRs are deallocated before downstream MR
cuda_mr = rmm.mr.CudaMemoryResource()
Expand Down
Loading