Use stream pool for gather/scatter. #14162
base: branch-23.10
Conversation
Gather (scatter) summary:

This is probably not going to do much for the listed issue. The issue there is the number of raw Thrust and kernel calls stemming from nesting (think thousands of columns underneath the top-level table). The fix for that is going to be smarter parallelization of what is currently the recursive CPU-side approach. We have some ideas here.
```cpp
// When the table has only a single column, the fork/join overhead should be avoided.
auto streams = std::vector<rmm::cuda_stream_view>{};
if (num_columns > 1) {
  streams = cudf::detail::fork_streams(stream, num_columns);
}
```
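For context, a rough sketch of the fork/join pattern these lines are part of, assuming the `fork_streams`/`join_streams` helpers from #13922 (the header path and `join_streams` usage are my reading of that PR) and a hypothetical `gather_column` stand-in for the per-column work:

```cpp
// Sketch only: fork per-column work onto a stream pool, then join back to the caller's stream.
#include <cudf/detail/utilities/stream_pool.hpp>

#include <rmm/cuda_stream_view.hpp>

#include <cstddef>
#include <vector>

// Hypothetical per-column work; stands in for the actual gather of one column.
void gather_column(std::size_t column_index, rmm::cuda_stream_view stream);

void gather_all_columns(std::size_t num_columns, rmm::cuda_stream_view stream)
{
  // Avoid fork/join overhead when there is only a single column.
  auto streams = std::vector<rmm::cuda_stream_view>{};
  if (num_columns > 1) { streams = cudf::detail::fork_streams(stream, num_columns); }

  for (std::size_t i = 0; i < num_columns; ++i) {
    // Each column runs on its own pool stream, or on the caller's stream if we did not fork.
    auto col_stream = streams.empty() ? stream : streams[i];
    gather_column(i, col_stream);
  }

  // Join: the caller's stream waits (via events) on all pool streams before running new work.
  if (!streams.empty()) { cudf::detail::join_streams(streams, stream); }
}
```

When `num_columns == 1`, `streams` stays empty and everything runs on the caller's stream, which is what the single-column check above is for.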
I am curious how this works with the per-thread default stream. For Spark, we build cuDF with PTDS. Will `streams` be a vector of num_columns copies of the PTDS stream?
The `stream` passed in would be the PTDS stream. Then a stream pool (for that thread) would be created (or reused), the work would be executed across that stream pool, and then the join step would insert events for all the elements of `streams` to be synchronized with `stream` before new work on `stream` (the PTDS stream in Spark's case) would be runnable.
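To make the join step more concrete, here is a minimal sketch of the kind of event-based synchronization described above, written directly against the CUDA runtime API rather than the actual `join_streams` implementation (which may differ in detail):

```cpp
#include <cuda_runtime_api.h>

#include <vector>

// Sketch: make `stream` (e.g. the PTDS stream) wait on work previously issued to each
// pool stream, without blocking the host.
void join_streams_sketch(std::vector<cudaStream_t> const& pool_streams, cudaStream_t stream)
{
  for (auto pool_stream : pool_streams) {
    cudaEvent_t event{};
    cudaEventCreateWithFlags(&event, cudaEventDisableTiming);
    // Record an event capturing all work submitted to the pool stream so far.
    cudaEventRecord(event, pool_stream);
    // Any work submitted to `stream` after this call waits for that event to complete.
    cudaStreamWaitEvent(stream, event, 0);
    // Safe to destroy immediately; the stream-side wait already captured the event.
    cudaEventDestroy(event);
  }
}
```

Because the wait is recorded on the stream rather than on the host, the PTDS stream simply queues behind the pool streams' work; no host-side blocking is involved.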
Description
This PR uses the stream pool introduced in #13922 to gather/scatter each column in a table on a separate stream.
Related: #13509, which this might resolve (need to verify).
Checklist