Add a host-pinned memory resource that can be used as upstream for pool_memory_resource #1392
Conversation
I have some minor suggestions
I am a bit dismayed about the amount of documentation boilerplate.
Maybe we could work with defaulted arguments rather than redefining the function each time?
I was able to consolidate the non-async functions into a single allocate and deallocate (eliminating two functions). But for the async versions, we have existing calls to …. Also, suppose we did consolidate these: what should we use for the default alignment? Actually, this default alignment problem applies to the non-async versions as well. Thoughts?
- Consolidate allocate/deallocate functions using a default alignment argument.
- Add missing includes.
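A minimal sketch of what that consolidation looks like (signatures are illustrative, not RMM's exact API; the defaulted alignment value is an assumption and is exactly what the rest of this thread debates):

```cpp
#include <cstddef>

// Sketch: one allocate/deallocate pair with a defaulted alignment argument
// replaces the separate aligned and unaligned overloads.
class pinned_host_memory_resource_sketch {
 public:
  void* allocate(std::size_t bytes,
                 std::size_t alignment = alignof(std::max_align_t));
  void deallocate(void* ptr,
                  std::size_t bytes,
                  std::size_t alignment = alignof(std::max_align_t)) noexcept;
};
```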
We could define a free helper function that returns 256 on device and … on host.
What is a minimum safe value (given we don't know the type of the pointer we're allocating for)? On host, I believe the answer is …. Should memory resources that perform concrete allocations advertise their default alignment as a property, and then wrapping resources can query that?
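A hedged sketch of the free helper being discussed (the concrete values are assumptions: 256 bytes is CUDA's documented device allocation alignment, and alignof(std::max_align_t) is the strictest fundamental alignment on the host):

```cpp
#include <cstddef>

// Returns a conservative default alignment depending on where the memory
// will be accessed. The values are assumptions, not an established RMM API.
constexpr std::size_t default_alignment(bool device_accessible) noexcept
{
  return device_accessible ? std::size_t{256} : alignof(std::max_align_t);
}
```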
We can make this a property. The one potential design issue is that our …
Can you explain a bit more? Maybe an example so I can understand what you mean?
The issue is that properties are awesome if you have them around. But it's the difference between:
template<class T>
  requires resource_with<T, cuda::mr::device_accessible>
void* special_allocate(T& memory_resource, size_t size)
and
void* special_allocate(cuda::mr::resource_ref<cuda::mr::device_accessible>& memory_resource, size_t size)
The latter is a streamlined implementation that reduces binary size considerably and generally simplifies the interfaces a lot. However, there is no …. It's definitely a wart.
Also, the former requires C++20, right?
Yes, but you can also just write:
template<class T, cuda::std::enable_if_t<resource_with<T, cuda::mr::device_accessible>, int> = 0>
void* special_allocate(T& memory_resource, size_t size)
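To illustrate the property idea raised above, here is a hypothetical sketch. The default_alignment property tag is invented for illustration and is not part of cuda::mr or RMM; it assumes the cuda::mr stateful-property convention where a property type with a nested value_type is queried via get_property:

```cpp
#include <cstddef>

// Hypothetical property tag (illustration only).
struct default_alignment {
  using value_type = std::size_t;
};

class example_resource {
 public:
  void* allocate(std::size_t bytes, std::size_t alignment = alignof(std::max_align_t));
  void deallocate(void* ptr, std::size_t bytes,
                  std::size_t alignment = alignof(std::max_align_t)) noexcept;

  // Advertise the default alignment so a wrapping resource (e.g. a pool)
  // can query it instead of guessing.
  friend constexpr std::size_t get_property(example_resource const&, default_alignment) noexcept
  {
    return alignof(std::max_align_t);
  }
};

// A wrapper would query it roughly like:
//   std::size_t align = get_property(upstream, default_alignment{});
```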
…e_Device_memory utility.
… that require an initial pool size.
This LGTM; the alignment parameters not being used obviously means we will need to align on our own. Our pinned code looks to be using std::max_align_t right now, just FYI.
@abellina you are right, we should probably fix this to actually align. I think cudaHostAlloc leaves alignment up to the caller.
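For illustration, a minimal way a caller could guarantee a stricter alignment on top of cudaHostAlloc (a sketch only, not how RMM implements it; a real resource would also need to keep the original pointer around for cudaFreeHost):

```cpp
#include <cuda_runtime_api.h>
#include <cstddef>
#include <memory>

// Over-allocate, then align the returned pointer. `original` receives the
// pointer that must later be passed to cudaFreeHost.
void* allocate_pinned_aligned(std::size_t bytes, std::size_t alignment, void** original)
{
  std::size_t padded = bytes + alignment;
  void* ptr{nullptr};
  if (cudaHostAlloc(&ptr, padded, cudaHostAllocDefault) != cudaSuccess) { return nullptr; }
  *original         = ptr;
  std::size_t space = padded;
  return std::align(alignment, bytes, ptr, space);
}
```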
Done.
/ok to test
/ok to test
Minor documentation nits, otherwise looks great!
include/rmm/aligned.hpp
 *
 * @return Whether the input a power of two with non-negative exponent
 * @return True if the input a power of two with non-negative exponent, false otherwise.
Suggested change:
 * @return True if the input a power of two with non-negative exponent, false otherwise.
 * @return True if the input is a power of two with non-negative exponent, false otherwise.
Double-nit (no need to act on it): non-negative integer exponent (all positive integers can be expressed as powers of two if we admit real exponents).
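For reference, the usual bit trick behind such a check (a sketch; not necessarily the exact implementation in rmm/aligned.hpp):

```cpp
#include <cstddef>

// A value is a power of two (with non-negative integer exponent) iff it is
// non-zero and has exactly one bit set.
constexpr bool is_pow2(std::size_t value) noexcept
{
  return (value != 0) && ((value & (value - 1)) == 0);
}
```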
/ok to test
For me this still looks good. I was able to replace our pinned pool with a ….
Two questions, then I'm happy to approve.
 * @briefreturn{true if the specified resource is the same type as this resource, otherwise
 * false.}
This docstring implies it's possible to compare with another type of resource and get false, but the implementation doesn't allow that. Do we need to update the implementation or the docstrings?
Oh yeah, I had that thought. Is there a blanket "false" implementation in the base class somehow?
I think this is how comparison works with cuda::mr. Basically, if you try to compare with another type of resource, compilation will fail. Note that refactoring to cuda::mr will necessitate changing the semantics RMM currently (mostly) has for MR equality comparison. #1402
Note also that pinned_host_memory_resource is NOT a device_memory_resource. It simply implements the cuda::mr::memory_resource and cuda::mr::async_memory_resource concepts.
(also note there is no base class)
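A compile-time way to see this (a sketch; the header path and the exact libcu++ concept spellings are assumptions):

```cpp
#include <cuda/memory_resource>
#include <rmm/mr/pinned_host_memory_resource.hpp>

// pinned_host_memory_resource satisfies the cuda::mr concepts directly,
// without deriving from rmm::mr::device_memory_resource.
static_assert(cuda::mr::resource<rmm::mr::pinned_host_memory_resource>);
static_assert(cuda::mr::async_resource<rmm::mr::pinned_host_memory_resource>);
static_assert(
  cuda::mr::resource_with<rmm::mr::pinned_host_memory_resource, cuda::mr::host_accessible>);
```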
I changed the docstring so it doesn't say that false can be returned. Note that we should probably follow up with more explicit tests of this MR and future MRs like it. Right now, though, our test machinery for MRs assumes they are all device_memory_resources, so while I can pass a pool_memory_resource<pinned_host_memory_resource> to all the MR tests, I can't pass just pinned_host_memory_resource currently. (It does get tested as the upstream in the former case, though, including its operator==.)
Okay. If there's no base class, I've just lost track of how the class hierarchy works. I don't have any further comments here but I'll need to refresh myself on how things are supposed to work someday.
/ok to test
@harrism asked me to merge once I approve, so I'll do that.
/merge
Description
Depends on #1417
Adds a new pinned_host_memory_resource that implements the new cuda::mr::memory_resource and cuda::mr::async_memory_resource concepts, which makes it usable as an upstream MR for rmm::mr::device_memory_resource. Also tests a pool made with this new MR as the upstream.
Note that the tests explicitly set the initial and maximum pool sizes, as using the defaults does not currently work. See #1388.
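A rough usage sketch of the combination described above (the header path, constructor signature, and pool sizes are assumptions and may differ between RMM versions):

```cpp
#include <rmm/mr/device/pool_memory_resource.hpp>
#include <rmm/mr/pinned_host_memory_resource.hpp>

int main()
{
  rmm::mr::pinned_host_memory_resource pinned_mr;

  // Initial and maximum pool sizes are given explicitly (see #1388).
  rmm::mr::pool_memory_resource<rmm::mr::pinned_host_memory_resource> pool{
    &pinned_mr, 4u << 20 /* 4 MiB initial */, 64u << 20 /* 64 MiB maximum */};

  void* ptr = pool.allocate(1024);
  pool.deallocate(ptr, 1024);
  return 0;
}
```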
Closes #618
Checklist