Add validation to reader benchmarks #14137

vuule · 2023-09-20T06:39:36Z

Description

Validate that cuIO readers successfully round trip the input tables.
Validation is only done in the first iteration, and it is not included in the timing.
If there is a difference, a warning is logged.
Setting the CUDF_BENCH_OUTPUT_DIFF environment variable also writes the diff to standard output. Valid values are FIRST_ERROR and ALL_ERRORS.
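The flow described above (validate only the first iteration, keep validation out of the timed region, warn on mismatch, optionally print diffs) can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not the cuIO benchmark code: `diff_mode`, `parse_diff_mode`, `diff_mode_from_env`, `validate`, `run_benchmark`, and the one-column `table` alias are all hypothetical names. Only the CUDF_BENCH_OUTPUT_DIFF variable and its FIRST_ERROR / ALL_ERRORS values come from the description. Treating an unset or unrecognized value as "no diff output" is also an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical diff modes named after the CUDF_BENCH_OUTPUT_DIFF values.
enum class diff_mode { NONE, FIRST_ERROR, ALL_ERRORS };

// Assumption: unset or unrecognized values disable diff output.
diff_mode parse_diff_mode(char const* value)
{
  if (value == nullptr) { return diff_mode::NONE; }
  std::string const v{value};
  if (v == "FIRST_ERROR") { return diff_mode::FIRST_ERROR; }
  if (v == "ALL_ERRORS") { return diff_mode::ALL_ERRORS; }
  return diff_mode::NONE;
}

// Reads the mode from the environment variable named in the PR.
inline diff_mode diff_mode_from_env()
{
  return parse_diff_mode(std::getenv("CUDF_BENCH_OUTPUT_DIFF"));
}

// Hypothetical stand-in for a table: a single int column (not cudf::table).
using table = std::vector<int>;

struct validation_result {
  std::size_t mismatches = 0;
  std::vector<std::string> diff_lines;  // rows selected for output by the diff mode
};

// Row-by-row comparison; rows missing from either side count as mismatches.
validation_result validate(table const& expected, table const& actual, diff_mode mode)
{
  validation_result res;
  std::size_t const rows = std::max(expected.size(), actual.size());
  for (std::size_t i = 0; i < rows; ++i) {
    bool const same = i < expected.size() && i < actual.size() && expected[i] == actual[i];
    if (same) { continue; }
    ++res.mismatches;
    bool const report =
      mode == diff_mode::ALL_ERRORS || (mode == diff_mode::FIRST_ERROR && res.mismatches == 1);
    if (report) { res.diff_lines.push_back("row " + std::to_string(i) + " differs"); }
  }
  return res;
}

// Times `iterations` calls to `read`; only the first result is validated,
// and validation runs after the timer has stopped.
template <typename ReadFn>
std::chrono::nanoseconds run_benchmark(
  table const& input, ReadFn read, int iterations, diff_mode mode, validation_result& first)
{
  assert(iterations > 0);  // at least one iteration is needed to validate
  std::chrono::nanoseconds total{0};
  for (int iter = 0; iter < iterations; ++iter) {
    auto const start = std::chrono::steady_clock::now();
    table const output = read();
    total += std::chrono::duration_cast<std::chrono::nanoseconds>(
      std::chrono::steady_clock::now() - start);  // timing stops before validation
    if (iter != 0) { continue; }
    first = validate(input, output, mode);
    if (first.mismatches > 0) {
      std::cerr << "WARNING: " << first.mismatches << " rows did not round trip\n";
    }
    for (auto const& line : first.diff_lines) { std::cout << line << '\n'; }
  }
  return total;
}
```

A caller would pass `diff_mode_from_env()` as the `mode` argument. Validating only iteration 0 assumes every iteration reads the same input, so one check is representative while the remaining iterations stay free of comparison overhead.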

Benchmarks currently report differences for data types that can't be preserved through the given formats. There are no differences caused by data corruption.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.


vyasr

The implementation seems sensible, but why are we doing this in benchmarks rather than tests?

vuule · 2023-09-22T19:27:12Z

The implementation seems sensible, but why are we doing this in benchmarks rather than tests?

To verify that the benchmarks are doing what they're supposed to. It's also helpful when we benchmark different configurations (thread pool size, GDS policy, compression policy...), since it lets us benchmark changes without jumping back to the tests for each change.

vuule added 7 commits September 19, 2023 13:06

checker

025b26b

ORC reader

9c98019

missing block

397d997

PQ reader

501e0ec

CSV reader

b0fd609

JSON reader

6c65b3e

Merge branch 'branch-23.10' of https://github.com/rapidsai/cudf into …

13bf944

…fea-benchmark-validation

vuule added the tests ("Unit testing for project"), improvement ("Improvement / enhancement to an existing function"), and non-breaking ("Non-breaking change") labels Sep 20, 2023

vuule self-assigned this Sep 20, 2023

github-actions bot added the libcudf label ("Affects libcudf (C++/CUDA) code.") Sep 20, 2023

vuule changed the title from "Fea benchmark validation" to "Add validation to reader benchmarks" Sep 20, 2023

vuule added 2 commits September 20, 2023 09:55

docs

cb8296a

Merge branch 'branch-23.10' of https://github.com/rapidsai/cudf into …

366afb6

…fea-benchmark-validation

vuule marked this pull request as ready for review September 20, 2023 16:56

vuule requested a review from a team as a code owner September 20, 2023 16:56

vuule requested review from vyasr and divyegala September 20, 2023 16:56

vyasr requested changes Sep 22, 2023


Merge branch 'branch-23.10' into fea-benchmark-validation

855c376

vuule closed this Dec 26, 2023
