
[DOC] Add best practices/advice with respect to using pool allocators #1694

Open · wence- opened this issue Oct 4, 2024 · 7 comments
Labels: doc Documentation

wence- (Contributor) commented Oct 4, 2024

RMM has multiple pool-like allocators:

  • a pool_memory_resource that wraps a coalescing best-fit suballocator around an upstream resource;
  • an arena_memory_resource that similarly wraps around an upstream resource but divides the global allocation into size-binned arenas to mitigate fragmentation when allocating/deallocating;
  • and a cuda_async_memory_resource that uses the memory pool implementation provided by cudaMallocAsync. This one can avoid fragmentation because it is in control of the virtual address space.

Since these are all composable, one can happily wrap a pool_memory_resource around a cuda_async_memory_resource (or an arena, ...). But should one?
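
As a rough illustration (assuming a recent RMM API; the pool_memory_resource constructor signature has changed across releases, and the 1 GiB initial size is an arbitrary placeholder), composing the two might look like:

```cpp
#include <rmm/mr/device/cuda_async_memory_resource.hpp>
#include <rmm/mr/device/per_device_resource.hpp>
#include <rmm/mr/device/pool_memory_resource.hpp>

#include <cstddef>

int main()
{
  // An async resource backed by a cudaMallocAsync pool that RMM creates and owns.
  rmm::mr::cuda_async_memory_resource async_mr{};

  // Layer a coalescing best-fit suballocator on top of it. Recent RMM releases
  // require an explicit initial pool size; 1 GiB here is an arbitrary placeholder.
  rmm::mr::pool_memory_resource<rmm::mr::cuda_async_memory_resource> pool_mr{
      &async_mr, std::size_t{1} << 30};

  // Route all RMM device allocations on the current device through the pool.
  rmm::mr::set_current_device_resource(&pool_mr);
}
```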

It would be useful if the documentation provided some guidance on which combinations make sense, and what typical allocation scenarios best fit a particular pool.

We should also recommend best practices for picking initial pool sizes: a bad choice here can lead to excessive fragmentation.
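
One hedged sketch of a sizing heuristic (the helper name below is made up for illustration, not an RMM API): start the pool at a fraction of the memory currently free on the device, aligned down to RMM's 256-byte allocation granularity.

```cpp
#include <cuda_runtime_api.h>

#include <cstddef>

// Illustrative helper (not part of RMM): size the initial pool to roughly half
// of the memory currently free on this device, rounded down to a 256-byte multiple.
std::size_t half_of_free_device_memory()
{
  std::size_t free_bytes{};
  std::size_t total_bytes{};
  cudaMemGetInfo(&free_bytes, &total_bytes);
  std::size_t initial = free_bytes / 2;
  return initial - (initial % 256);
}
```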

wence- added the doc Documentation label and removed the ? - Needs Triage label Oct 4, 2024
harrism (Member) commented Oct 9, 2024

Side thought: Maybe we should experiment with replacing the cuda_memory_resource used for initial_resource with a cuda_async_memory_resource...

wence- (Contributor, Author) commented Oct 10, 2024

> Side thought: Maybe we should experiment with replacing the cuda_memory_resource used for initial_resource with a cuda_async_memory_resource...

Maybe, though we'd have the usual static destruction problems, so we'd never explicitly free that memory pool.

It might also be problematic in the multiple-library case: if one library is not configured with a specific pool, it allocates from the initial_resource, which then builds a pool, and we are potentially conflicting with other libraries that subsequently set up their own pools.

harrism (Member) commented Oct 10, 2024

I was thinking that, by default, the async resource uses the default pool, which we would not own. Maybe I'm misremembering how it's implemented.

vyasr (Contributor) commented Oct 10, 2024

The async resource will use the pool managed by the CUDA driver, which we do not own, and that would probably be fine. Ideally everyone would use that, and then all pooling would be handled by the driver. If we use the async MR by default and a different library does not, but instead constructs its own pool manually using a different underlying allocation routine (e.g. cudaMalloc instead of cudaMallocAsync), then we could conflict.

wence- (Contributor, Author) commented Oct 11, 2024

In cuda_async_memory_resource we call cudaMemPoolCreate, get a handle to a pool, and use that to make our allocations. So it sounds like we own that pool.
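
Roughly, at the CUDA runtime level (a sketch of the pool lifecycle, not RMM's actual implementation), owning the pool looks like:

```cpp
#include <cuda_runtime_api.h>

// Sketch of an explicitly created, owned mempool: create, allocate from, destroy.
void pool_lifecycle_sketch(int device_id, cudaStream_t stream)
{
  cudaMemPoolProps props{};
  props.allocType     = cudaMemAllocationTypePinned;
  props.location.type = cudaMemLocationTypeDevice;
  props.location.id   = device_id;

  cudaMemPool_t pool{};
  cudaMemPoolCreate(&pool, &props);  // the resource owns this pool...

  void* ptr{};
  cudaMallocFromPoolAsync(&ptr, 1024, pool, stream);
  cudaFreeAsync(ptr, stream);

  cudaMemPoolDestroy(pool);  // ...and is responsible for destroying it on teardown
}
```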

vyasr (Contributor) commented Oct 11, 2024

My mistake, I didn't realize that we were allocating from a specific pool that we created. The failure mode should still be relatively graceful if two processes both use the async allocation routines and one pool blocks another's growth. I don't think it will be as graceful if you mix and match async with non-async allocation, but I could be wrong there.

harrism (Member) commented Oct 14, 2024

I believe the reason that cuda_async_memory_resource owns its pool is that we also provide a non-owning MR: cuda_async_view_memory_resource. We could use the latter as the default resource with the default pool. That MR requires passing a cudaMemPool_t pool handle, and the default pool handle can be retrieved using cudaDeviceGetDefaultMemPool.
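
For illustration (a sketch assuming cuda_async_view_memory_resource's current constructor, which takes a cudaMemPool_t), that default could look something like:

```cpp
#include <rmm/mr/device/cuda_async_view_memory_resource.hpp>
#include <rmm/mr/device/per_device_resource.hpp>

#include <cuda_runtime_api.h>

int main()
{
  // Get the device's default mempool, which the CUDA driver (not RMM) owns.
  int device{};
  cudaGetDevice(&device);
  cudaMemPool_t default_pool{};
  cudaDeviceGetDefaultMemPool(&default_pool, device);

  // Wrap it in the non-owning view resource and make it the current device resource.
  rmm::mr::cuda_async_view_memory_resource view_mr{default_pool};
  rmm::mr::set_current_device_resource(&view_mr);
}
```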

Perhaps, however, we should wait to make this default change until we can start using the cuda_async_memory_resource from libcu++, which has a different design.
