Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cast only time of day to nanos to avoid an overflow in Parquet INT96 write #13776

Merged
merged 29 commits into from
Aug 2, 2023

Conversation

gerashegalov
Copy link
Contributor

@gerashegalov gerashegalov commented Jul 27, 2023

Description

Rework extraction of nanoseconds of the last day in INT96 write call path to avoid overflow. Contributes to NVIDIA/spark-rapids#8625

Fixes #8070

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@gerashegalov gerashegalov requested a review from a team as a code owner July 27, 2023 08:57
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Jul 27, 2023
@gerashegalov gerashegalov added Spark Functionality that helps Spark RAPIDS bug Something isn't working labels Jul 27, 2023
gerashegalov added a commit to gerashegalov/spark-rapids that referenced this pull request Jul 27, 2023
@gerashegalov gerashegalov added the non-breaking Non-breaking change label Jul 27, 2023
@gerashegalov gerashegalov self-assigned this Jul 28, 2023
@gerashegalov gerashegalov added the cuIO cuIO issue label Jul 29, 2023
@gerashegalov gerashegalov changed the title INT96 - Cast to nanos only time of the day to avoid a 64-bit overflow Cast only time of day to nanos to avoid an overflow in Parquet INT96 write Aug 1, 2023
cpp/src/io/parquet/page_enc.cu Outdated Show resolved Hide resolved
cpp/src/io/parquet/page_enc.cu Outdated Show resolved Hide resolved
cpp/src/io/parquet/page_enc.cu Outdated Show resolved Hide resolved
cpp/src/io/parquet/page_enc.cu Outdated Show resolved Hide resolved
cpp/src/io/parquet/page_enc.cu Outdated Show resolved Hide resolved
cpp/src/io/parquet/page_enc.cu Outdated Show resolved Hide resolved
Copy link
Contributor

@mythrocks mythrocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. A couple of minor nitpicks regarding documentation, and a typo when calling julian_days_with_time().

@gerashegalov gerashegalov requested a review from bdice August 1, 2023 18:57
Copy link
Contributor

@mythrocks mythrocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be good to ship after addressing the "references to temporaries" problem at line 1241.

@gerashegalov
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit c412480 into rapidsai:branch-23.10 Aug 2, 2023
54 checks passed
@gerashegalov gerashegalov deleted the int96v2 branch August 2, 2023 13:50
gerashegalov added a commit to NVIDIA/spark-rapids that referenced this pull request Aug 4, 2023
- Remove INT96 check for being in the Long.Min:Max range. 
- Update docs and tests

Fixes #8625 

Depends on  rapidsai/cudf#8070, rapidsai/cudf#13776
 
Verified with the [notebook](https://github.com/gerashegalov/rapids-shell/blob/9cb4598b0feba7b71eb91f396f4b577bbb0dec00/src/jupyter/int96.ipynb) and 
 
 ```bash
JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 TEST_PARALLEL=0 SPARK_HOME=~/dist/spark-3.3.2-bin-hadoop3 ./integration_tests/run_pyspark_from_build.sh -k test_fix_for_int96_overflow
```

Signed-off-by: Gera Shegalov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Spark Functionality that helps Spark RAPIDS
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

[BUG] Parquet writer isn't writing INT96 values properly
6 participants