GH-37364: [C++][GPU] Add CUDA impl of Device Event/Stream #37365

zeroshade · 2023-08-24T16:28:35Z

What changes are included in this PR?

Adding CudaDevice::SyncEvent and CudaDevice::Stream implementations which provide more idiomatic handling of Events and Streams.

Are these changes tested?

Simple SyncEvent test added. More stream tests still being added.

Closes: [C++][GPU] Add CUDA Implementation of Device Event/Stream #37364

zeroshade · 2023-08-24T16:28:43Z

CC @kkraus14

github-actions · 2023-08-24T16:29:06Z

⚠️ GitHub issue #37364 has been automatically assigned in GitHub to PR creator.

cpp/src/arrow/gpu/cuda_context.cc

kkraus14 · 2023-08-24T23:41:10Z

cpp/src/arrow/gpu/cuda_context.h

+    explicit Stream(std::shared_ptr<CudaContext> ctx, CUstream stream) noexcept
+        : context_{std::move(ctx)}, stream_{stream} {}


It would be nice to have a constructor that just works on the default CUDA context, similar to how MakeDeviceSyncEvent and WrapDeviceSyncEvent work. Should we follow the same pattern for Streams here?

zeroshade · Aug 25, 2023

I was thinking about that, and about if/how we want to handle Stream lifetimes: whether to go the same route we did with events (a unique_ptr with a custom deleter) so that CudaDevice::Stream can either own the stream's lifetime or not.

Do you think it makes more sense for the constructing function to be part of the memory manager or the device object?

kkraus14 · Aug 25, 2023

The device object. You could have different memory managers of the same device using the same stream, so I think it should logically be associated with the device.

zeroshade · Aug 28, 2023

Added the methods to the Device object, following the same pattern I set up for SyncEvent, but for streams. Let me know what you think.

bkietz · Aug 29, 2023

I think I missed something: why does Stream need to own instead of just wrap? It seems that stream lifetime management will always be handled elsewhere. For example, say there's an application built with CUDA which is adding an Arrow integration: the application will already have a pool of CUstreams, which only need to be wrapped in Arrow Device::Streams when calling an Arrow function. I don't think an Arrow function will ever take ownership of the stream away from the hypothetical application, so to me it seems we don't need Arrow functions to wrap them in smart pointers for dynamic lifetimes, or functions for producing new streams (MakeStream).

To put this another way: would we ever need to produce a vector<shared_ptr<Stream>> where the wrapped streams are a mix of CUstream and hipStream_t? Even if an application were using both ROCm and CUDA at the same time, the streams would (I'm certain) be maintained in separate pools, which obviates the need for polymorphic lifetime management.

zeroshade · Aug 29, 2023

I tend to agree with both of you, and lean towards the consistency and convenience side of things. It's nice having the APIs be similar between events and streams, so I lean more towards the Stream owning the stream itself.

That said, I'm not opposed to having it just wrap and not own. What do you think @bkietz?

bkietz · Aug 30, 2023

I think the potential aggravation due to Arrow's failure to integrate seamlessly with a user's existing stream pool is greater than the potential aggravation due to a user needing to implement their own stream pool.

kkraus14 · Aug 30, 2023

What's the issue with someone using Stream::WrapStream and passing nullptr if they don't want any lifetime management, or handling something like a shared pointer in a release function if desired?

zeroshade · Aug 30, 2023

Yeah, I'd agree with @kkraus14: if they are trying to integrate seamlessly with an existing stream pool, they could use Device::WrapStream and pass nullptr, or do whatever releasing back to the pool they want in the release function.

bkietz · Aug 30, 2023

There's not a fundamental issue, it's just my opinion: avoiding ownership entirely here will be the more user-friendly API. Certainly using Stream::WrapStream as you describe will also work.
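For readers following the wrap-versus-own discussion, here is a minimal sketch of the two usages mentioned in the thread: wrapping with nullptr (no lifetime management) and wrapping with a release function that hands the stream back to an application pool. Everything in it is a stand-in chosen for illustration, not Arrow's actual API: `FakeStream` replaces `CUstream` so the code builds without CUDA, and `StreamPool`, `WrappedStream`, `AppPool` and `ReturnToAppPool` are hypothetical names. Only the `void (*)(void*)` shape of `release_fn_t` comes from the diff in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for a native handle such as CUstream (hypothetical, so this
// compiles without the CUDA toolkit).
struct FakeStream {
  int id;
};

// Hypothetical application-side pool that owns every stream it hands out.
class StreamPool {
 public:
  explicit StreamPool(int n) {
    for (int i = 0; i < n; ++i) {
      storage_.push_back(std::make_unique<FakeStream>(FakeStream{i}));
      free_.push_back(storage_.back().get());
    }
  }
  FakeStream* Acquire() {
    assert(!free_.empty());
    FakeStream* s = free_.back();
    free_.pop_back();
    return s;
  }
  void Release(FakeStream* s) { free_.push_back(s); }
  std::size_t available() const { return free_.size(); }

 private:
  std::vector<std::unique_ptr<FakeStream>> storage_;  // owns the streams
  std::vector<FakeStream*> free_;                     // currently unused ones
};

using release_fn_t = void (*)(void*);  // same shape as the alias in the diff

// Illustrative wrapper only; this is not Arrow's Device::Stream.
class WrappedStream {
 public:
  // A null release function means "wrap, do not manage lifetime".
  WrappedStream(void* handle, release_fn_t release)
      : handle_(handle, release != nullptr ? release : &NoOp) {}
  void* get_raw() const { return handle_.get(); }

 private:
  static void NoOp(void*) {}
  std::unique_ptr<void, release_fn_t> handle_;  // custom deleter runs on destruction
};

// A plain function pointer cannot capture state, so the pool must be
// reachable globally in this variant.
StreamPool& AppPool() {
  static StreamPool pool(2);
  return pool;
}
void ReturnToAppPool(void* raw) { AppPool().Release(static_cast<FakeStream*>(raw)); }

// Non-owning wrap: destroying the wrapper leaves the stream checked out.
std::size_t AvailableAfterNonOwningWrap() {
  FakeStream* s = AppPool().Acquire();
  { WrappedStream w(s, nullptr); }
  std::size_t n = AppPool().available();
  AppPool().Release(s);  // the application returns it explicitly
  return n;
}

// Releasing wrap: destroying the wrapper returns the stream to the pool.
std::size_t AvailableAfterReleasingWrap() {
  FakeStream* s = AppPool().Acquire();
  { WrappedStream w(s, &ReturnToAppPool); }
  return AppPool().available();
}
```

Under these assumptions both styles coexist behind one constructor, which is roughly the point made in the thread; the cost is that a non-capturing release function has to find the pool through global state.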

cpp/src/arrow/gpu/cuda_test.cc

cpp/src/arrow/gpu/cuda_context.cc

cpp/src/arrow/device.h

cpp/src/arrow/gpu/cuda_context.cc

kkraus14

One minor comment issue, otherwise LGTM

cpp/src/arrow/device.h

kkraus14 · 2023-08-30T15:31:25Z

cpp/src/arrow/device.h

@@ -109,19 +109,50 @@ class ARROW_EXPORT Device : public std::enable_shared_from_this<Device>,
  /// should be trivially constructible from it's device-specific counterparts.
  class ARROW_EXPORT Stream {
   public:
-    virtual const void* get_raw() const { return NULLPTR; }
+    using release_fn_t = void (*)(void*);


Should we consider making this a std::function instead of a raw function pointer? That would allow passing capturing lambdas, at the cost of some overhead. Captures would be useful when, for example, someone has a smart pointer managing the lifetime of a stream.
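To make the trade-off concrete, the sketch below shows why a capturing lambda needs `std::function`: it does not convert to `void (*)(void*)`, but it can carry a `std::shared_ptr` that keeps the stream alive for as long as the wrapper exists. As before, this is an illustration under stated assumptions rather than the code merged in this PR: `FakeStream`, `WrappedStream` and `CaptureKeepsStreamAlive` are hypothetical names, and `FakeStream` stands in for a real stream handle.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <type_traits>
#include <utility>

// Stand-in for a native stream object (hypothetical).
struct FakeStream {
  int id;
};

using release_fn_t = std::function<void(void*)>;  // assumed std::function variant

// Illustrative wrapper only; this is not Arrow's Device::Stream.
class WrappedStream {
 public:
  WrappedStream(void* handle, release_fn_t release)
      : handle_(handle), release_(std::move(release)) {}
  WrappedStream(const WrappedStream&) = delete;
  WrappedStream& operator=(const WrappedStream&) = delete;
  ~WrappedStream() {
    if (release_) release_(handle_);  // an empty std::function means "do nothing"
  }
  void* get_raw() const { return handle_; }

 private:
  void* handle_;
  release_fn_t release_;
};

// Returns true if the captured shared_ptr kept the stream alive while it was
// wrapped, and the stream was freed once the wrapper went away.
bool CaptureKeepsStreamAlive() {
  auto owner = std::make_shared<FakeStream>(FakeStream{7});
  std::weak_ptr<FakeStream> watch = owner;
  void* raw = owner.get();

  // The lambda body is empty on purpose: holding the capture is the point.
  auto keep_alive = [owner](void*) {};
  static_assert(!std::is_convertible<decltype(keep_alive), void (*)(void*)>::value,
                "capturing lambdas do not convert to function pointers");

  bool alive_while_wrapped = false;
  {
    WrappedStream w(raw, std::move(keep_alive));
    assert(w.get_raw() == raw);
    owner.reset();  // the application drops its own reference
    alive_while_wrapped = !watch.expired();
  }
  // Destroying the wrapper destroyed the std::function and its capture.
  return alive_while_wrapped && watch.expired();
}
```

The overhead mentioned in the comment comes from `std::function`'s type erasure, which may allocate and adds an indirect call; whether that matters for a per-stream release callback is a judgment call.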

conbench-apache-arrow · 2023-09-02T10:46:33Z

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 3b8ab8e.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

…pendencies (#37497)

### Rationale for this change

#37365 was built locally using a newer version of the CUDA driver API than the crossbow builds run with, causing the crossbow build to fail due to a change in macro names. This puts the macro name back to allow it to build with the older version of the CUDA driver API.

* Closes: #37523

Authored-by: Matt Topol <[email protected]>
Signed-off-by: Matt Topol <[email protected]>

GH-37364: [C++][GPU] Add CUDA impl of Device Event/Stream

78f6698

zeroshade requested review from felipecrv, bkietz and pitrou August 24, 2023 16:28

github-actions bot added "Component: C++" and "awaiting committer review" labels Aug 24, 2023

zeroshade added 4 commits August 24, 2023 12:50

linting

8ad843a

proper context handling

6ba42c2

add another stream test

38daa6d

more linting

12051d1

kkraus14 reviewed Aug 24, 2023

felipecrv reviewed Aug 25, 2023

cpp/src/arrow/gpu/cuda_context.cc (outdated, resolved)

github-actions bot added "awaiting changes" and removed "awaiting committer review" labels Aug 25, 2023

bkietz requested changes Aug 28, 2023

cpp/src/arrow/device.h (outdated, resolved)

cpp/src/arrow/gpu/cuda_context.cc (outdated, resolved)

cpp/src/arrow/gpu/cuda_context.cc (outdated, resolved)

updates from review comments

3c4b5ad

github-actions bot added "awaiting change review" and "awaiting changes" and removed "awaiting changes" and "awaiting change review" labels Aug 28, 2023

linting

ef298a6

github-actions bot added "awaiting change review" and removed "awaiting changes" labels Aug 28, 2023

kkraus14 approved these changes Aug 29, 2023

cpp/src/arrow/device.h (outdated, resolved)

fix comment

1ee4a4c

github-actions bot added "awaiting changes" and removed "awaiting change review" and "awaiting changes" labels Aug 29, 2023

github-actions bot added "awaiting change review" and "awaiting changes" and removed "awaiting change review" labels Aug 29, 2023

zeroshade requested a review from bkietz August 29, 2023 16:01

kkraus14 reviewed Aug 30, 2023

bkietz approved these changes Aug 30, 2023

github-actions bot added "awaiting merge" and removed "awaiting changes" labels Aug 30, 2023

switch to std::function

5b00630

zeroshade merged commit 3b8ab8e into apache:main Aug 30, 2023
31 of 33 checks passed

zeroshade removed the "awaiting merge" label Aug 30, 2023

zeroshade mentioned this pull request Aug 31, 2023

GH-37523: [C++][CI][CUDA] Don't use newer API and add missing CUDA dependencies #37497

Merged

zeroshade deleted the cuda-event-stream-impl branch August 31, 2023 15:22

zeroshade mentioned this pull request Sep 1, 2023

[C++][CI][CUDA] CUDA Crossbow build failure #37523

Closed

kkraus14 Aug 24, 2023
zeroshade Aug 25, 2023
kkraus14 Aug 25, 2023
zeroshade Aug 28, 2023
bkietz Aug 29, 2023
zeroshade Aug 29, 2023
bkietz Aug 30, 2023 (edited)
kkraus14 Aug 30, 2023
zeroshade Aug 30, 2023
bkietz Aug 30, 2023
