Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-37364: [C++][GPU] Add CUDA impl of Device Event/Stream #37365

Merged
merged 9 commits into from
Aug 30, 2023

Conversation

zeroshade
Copy link
Member

@zeroshade zeroshade commented Aug 24, 2023

What changes are included in this PR?

Adding CudaDevice::SyncEvent and CudaDevice::Stream implementations which provide more idiomatic handling of Events and Streams.

Are these changes tested?

Simple SyncEvent test added. More stream tests still being added.

@zeroshade
Copy link
Member Author

CC @kkraus14

@github-actions
Copy link

⚠️ GitHub issue #37364 has been automatically assigned in GitHub to PR creator.

cpp/src/arrow/gpu/cuda_context.cc Outdated Show resolved Hide resolved
cpp/src/arrow/gpu/cuda_context.cc Outdated Show resolved Hide resolved
cpp/src/arrow/gpu/cuda_context.cc Outdated Show resolved Hide resolved
cpp/src/arrow/gpu/cuda_context.cc Outdated Show resolved Hide resolved
Comment on lines 157 to 158
explicit Stream(std::shared_ptr<CudaContext> ctx, CUstream stream) noexcept
: context_{std::move(ctx)}, stream_{stream} {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have a constructor that just works on the default CUDA context similar to how MakeDeviceSyncEvent and WrapDeviceSyncEvent work. Should we follow the same pattern for Streams here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about that and was trying to think about if/how we wanted to handle Stream lifetime stuff and whether we wanted to go the same route we did with events in terms of the whole unique_ptr and custom deleter etc. to allow either the CudaDevice::Stream owning the lifetime or not.

Do you think it makes more sense for the constructing function to be part of the memory manager or the device object?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The device object. You could have different memory managers of the same device using the same stream, so I think it logically should be associated to the device.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the methods to the Device object, following the same pattern I set up for the SyncEvent, but for streams. Let me know what you think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I missed something: why does Stream need to own instead of just wrap? It seems that stream lifetime management will always be handled elsewhere. For example, say there's an application built with CUDA which is adding an arrow integration: the application will already have a pool of CUstreams which only need to be wrapped in arrow Device::Streams when calling an arrow function. I don't think an arrow function will ever take ownership of the stream away from the hypothetical application, so to me it seems we don't need arrow functions to wrap them in smart pointers for dynamic lifetimes or functions for producing new streams (MakeStream).

To put this another way: would we ever need to produce a vector<shared_ptr<Stream>> where the wrapped streams are a mix of CUstream and hipStream_t? Even if an application were using both ROCM and CUDA at the same time, the streams would (I'm certain) be maintained in separate pools which obviates the need for polymorphic lifetime management.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree with both of you and lean towards the synchronicity and convenience side of things. It's nice having the APIs be similar between the events and streams and so I lean more towards the ownership of the stream itself.

That said, I'm not opposed to having it just wrap and not own. What do you think @bkietz?

Copy link
Member

@bkietz bkietz Aug 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the potential aggravation due to arrow's failure to integrate seamlessly with a user's existing stream pool is greater than the potential due to a user needing to implement their own stream pool.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the issue with someone using Stream::WrapStream and passing nullptr if they don't want any lifetime management or handling something like a shared pointer in a release function if desired?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I'd agree with @kkraus14 that if they are trying to integrate seamlessly with an existing stream pool they could use Device::WrapStream and pass nullptr or whatever other releasing back to the pool they want in the release function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's not a fundamental issue, it's just an IMHO: avoiding ownership entirely here will be the more user-friendly API. Certainly using Stream::WrapStream as you describe will also work

cpp/src/arrow/gpu/cuda_test.cc Show resolved Hide resolved
cpp/src/arrow/gpu/cuda_test.cc Show resolved Hide resolved
cpp/src/arrow/gpu/cuda_test.cc Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Aug 25, 2023
cpp/src/arrow/device.h Outdated Show resolved Hide resolved
cpp/src/arrow/gpu/cuda_context.cc Outdated Show resolved Hide resolved
cpp/src/arrow/gpu/cuda_context.cc Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels Aug 28, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Aug 28, 2023
Copy link
Contributor

@kkraus14 kkraus14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor comment issue, otherwise LGTM

cpp/src/arrow/device.h Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review awaiting changes Awaiting changes labels Aug 29, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Aug 29, 2023
@zeroshade zeroshade requested a review from bkietz August 29, 2023 16:01
@@ -109,19 +109,50 @@ class ARROW_EXPORT Device : public std::enable_shared_from_this<Device>,
/// should be trivially constructible from it's device-specific counterparts.
class ARROW_EXPORT Stream {
public:
virtual const void* get_raw() const { return NULLPTR; }
using release_fn_t = void (*)(void*);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider making this std::function instead of a raw function pointer here? Using the former would allow for passing lambda captures at the cost of some overhead. Lambda captures would be useful for something like if someone has a smart pointer managing the lifetime of a stream.

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting changes Awaiting changes labels Aug 30, 2023
@zeroshade zeroshade merged commit 3b8ab8e into apache:main Aug 30, 2023
31 of 33 checks passed
@zeroshade zeroshade removed the awaiting merge Awaiting merge label Aug 30, 2023
@zeroshade zeroshade deleted the cuda-event-stream-impl branch August 31, 2023 15:22
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 3b8ab8e.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

zeroshade added a commit that referenced this pull request Sep 5, 2023
…pendencies (#37497)

### Rationale for this change
#37365 was built locally using a newer version of the CUDA driver API than the crossbow builds run with causing the crossbow build to fail due to a change in macro names. This puts the macro name back to allow it to build with the older version of the cuda driver api.

* Closes: #37523

Authored-by: Matt Topol <[email protected]>
Signed-off-by: Matt Topol <[email protected]>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…he#37365)

### What changes are included in this PR?
Adding `CudaDevice::SyncEvent` and `CudaDevice::Stream` implementations which provide more idiomatic handling of Events and Streams.

### Are these changes tested?
Simple SyncEvent test added. More stream tests still being added.

* Closes: apache#37364

Authored-by: Matt Topol <[email protected]>
Signed-off-by: Matt Topol <[email protected]>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…UDA dependencies (apache#37497)

### Rationale for this change
apache#37365 was built locally using a newer version of the CUDA driver API than the crossbow builds run with causing the crossbow build to fail due to a change in macro names. This puts the macro name back to allow it to build with the older version of the cuda driver api.

* Closes: apache#37523

Authored-by: Matt Topol <[email protected]>
Signed-off-by: Matt Topol <[email protected]>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…he#37365)

### What changes are included in this PR?
Adding `CudaDevice::SyncEvent` and `CudaDevice::Stream` implementations which provide more idiomatic handling of Events and Streams.

### Are these changes tested?
Simple SyncEvent test added. More stream tests still being added.

* Closes: apache#37364

Authored-by: Matt Topol <[email protected]>
Signed-off-by: Matt Topol <[email protected]>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…UDA dependencies (apache#37497)

### Rationale for this change
apache#37365 was built locally using a newer version of the CUDA driver API than the crossbow builds run with causing the crossbow build to fail due to a change in macro names. This puts the macro name back to allow it to build with the older version of the cuda driver api.

* Closes: apache#37523

Authored-by: Matt Topol <[email protected]>
Signed-off-by: Matt Topol <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++][GPU] Add CUDA Implementation of Device Event/Stream
4 participants