-
Notifications
You must be signed in to change notification settings - Fork 902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add bytes_per_second
to transpose benchmark
#14170
Add bytes_per_second
to transpose benchmark
#14170
Conversation
/ok to test |
ded8fa9
to
d2676b5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for contributing this. Just one request for maintainability.
@@ -40,16 +40,29 @@ static void BM_transpose(benchmark::State& state) | |||
cuda_event_timer raii(state, true); | |||
auto output = cudf::transpose(input); | |||
} | |||
|
|||
// Collect memory statistics. | |||
auto const bytes_read = input.num_columns() * input.num_rows() * (sizeof(int32_t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to avoid potential future type mismatches that result in wrong bytes/s reports. So I think you should stash the type_id
in a variable above:
constexpr auto column_type = cudf::type_id::INT32;
And then here use CUDF's id_to_type
utility:
auto const bytes_read = input.num_columns() * input.num_rows() * (sizeof(int32_t)); | |
auto const bytes_read = input.num_columns() * input.num_rows() * (sizeof(cudf::id_to_type(column_type))); |
@@ -42,7 +44,7 @@ static void BM_transpose(benchmark::State& state) | |||
} | |||
|
|||
// Collect memory statistics. | |||
auto const bytes_read = input.num_columns() * input.num_rows() * (sizeof(int32_t)); | |||
auto const bytes_read = input.num_columns() * input.num_rows() * cudf::size_of(column_type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 suggestion: This is one way to do it. But I think the way I suggested is a bit better because it all happens at compile time, whereas cudf::size_of()
invokes the type dispatcher at run time. Not that it will affect benchmarks, but it just seems cleaner to use sizeof(cudf::id_to_type<column_type>)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Blonck !
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Blonck !
@Blonck Can you please rebase with the latest |
auto const bytes_written = bytes_read; | ||
// Account for nullability in input and output. | ||
auto const null_bytes = | ||
2 * input.num_columns() * cudf::bitmask_allocation_size_bytes(input.num_rows()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: This one could also overflow, I think, perhaps:
2 * input.num_columns() * cudf::bitmask_allocation_size_bytes(input.num_rows()); | |
2 * static_cast<uint64_t>(input.num_columns()) * cudf::bitmask_allocation_size_bytes(input.num_rows()); |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure about this one? Since the return type of cudf::bitmask_allocation_size_bytes
is std::size_t
which is either unsigned long
or unsigned long long
so for reasonable input sizes the integer type promotion will avoid the overflow (https://cppinsights.io/s/26f977cb).
That said, just having this discussion indicates I should've included an explicit cast upfront to clear up any potential confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left-to-right associativity means that this is evaluated as (2 * ncol) * nrow
, the first multiplication is performed in size_type
(AKA, int32_t
), so that could overflow, no? Although I think these benchmarks are generally run with fewer than
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just always put the thing that returns size_t
(sizeof
or bitmask_allocation_size_bytes
) first in the arithmetic in all of these PRs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I would keep the cast explicit to make visible what is happening, but I don't have a strong stance on it.
/ok to test |
Co-authored-by: Lawrence Mitchell <[email protected]>
/ok to test |
/ok to test |
/ok to test |
/merge |
/ok to test |
/merge |
1 similar comment
/merge |
Wonder why my merge commands weren't accepted. |
It means now github starts to realize that you are no longer cudf developer 😆 |
This patch relates to #13735.
Benchmark: transpose_benchmark.txt
Checklist