
Parquet: Clean up Parquet generic and internal readers #12102

Open: wants to merge 5 commits into main


rdblue (Contributor) commented Jan 25, 2025

This is a refactor that cleans up a few issues I noticed while reviewing #11904 and while working on Parquet variant readers.

  • Updates INT and UINT handling to reject unsupported unsigned types (like UINT64)
  • Adds more factory methods to ParquetValueReaders to keep implementations private
  • Removes the unused types argument from ParquetValueReaders.StructReader subclasses. Some publicly accessible uses had to be kept, but most are removed. This also allowed removing types from the reader builders.
  • Removes the unnecessary TimeUnit param from factory methods added in #11904 (Parquet: Add readers and writers for the internal object model)
  • Removes unnecessary implementations of setPageSource
  • Adds imports for LogicalTypeAnnotation classes to cut down on multi-line method signatures
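The first bullet (rejecting unsupported unsigned types) can be illustrated with a small sketch. It assumes one plausible mapping that is not taken from the PR: signed INT(8/16/32) read as int, INT(64) as long, UINT8/UINT16 as int, UINT32 as long, and UINT64 rejected. `IntAnnotation`, `ReadAs`, and `IntReaderSelector` are hypothetical stand-ins, not Parquet or Iceberg API; real code would inspect Parquet's `IntLogicalTypeAnnotation` via `getBitWidth()` and `isSigned()`.

```java
// Hypothetical stand-in for Parquet's IntLogicalTypeAnnotation: bit width plus signedness.
record IntAnnotation(int bitWidth, boolean isSigned) {}

// Hypothetical target for the value reader (not an Iceberg type).
enum ReadAs { INT, LONG }

final class IntReaderSelector {
  private IntReaderSelector() {}

  // Assumed mapping for illustration only; the PR's exact rules may differ.
  static ReadAs select(IntAnnotation annotation) {
    int width = annotation.bitWidth();
    if (annotation.isSigned()) {
      // INT(8), INT(16), INT(32) fit in a Java int; INT(64) needs a long.
      return width <= 32 ? ReadAs.INT : ReadAs.LONG;
    }
    if (width <= 16) {
      // UINT8 and UINT16 always fit in a signed 32-bit int.
      return ReadAs.INT;
    } else if (width == 32) {
      // UINT32 can exceed Integer.MAX_VALUE but always fits in a signed long.
      return ReadAs.LONG;
    }
    // UINT64 can exceed Long.MAX_VALUE, so there is no lossless signed target.
    throw new UnsupportedOperationException("Unsupported type: UINT" + width);
  }
}
```

For example, `IntReaderSelector.select(new IntAnnotation(64, false))` throws instead of silently producing negative longs for large unsigned values.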

Changes are grouped into separate commits for easier review.
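The factory-method bullet above describes a common pattern: expose static factories that return an interface and keep the concrete reader classes private. A minimal sketch under stated assumptions: `ValueReader` and `ValueReaders` are hypothetical, simplified stand-ins for `ParquetValueReader` and `ParquetValueReaders`, and the readers pull values from a plain `Iterator` rather than a Parquet column.

```java
import java.util.Iterator;

// Hypothetical stand-in for ParquetValueReader<T>: reads one value per call.
interface ValueReader<T> {
  T read(Iterator<?> column);
}

// Hypothetical stand-in for ParquetValueReaders: public factories, private implementations.
final class ValueReaders {
  private ValueReaders() {}

  static ValueReader<Integer> ints() {
    return new IntReader();
  }

  static ValueReader<Long> uint32AsLong() {
    return new UnsignedIntReader();
  }

  // Private: callers cannot name or subclass these, so they can change freely.
  private static final class IntReader implements ValueReader<Integer> {
    @Override
    public Integer read(Iterator<?> column) {
      return (Integer) column.next();
    }
  }

  private static final class UnsignedIntReader implements ValueReader<Long> {
    @Override
    public Long read(Iterator<?> column) {
      // Reinterpret the stored 32-bit pattern as an unsigned value.
      return Integer.toUnsignedLong((Integer) column.next());
    }
  }
}
```

Because the implementations are private, a later cleanup such as dropping an unused constructor argument does not change the public API surface that RevAPI checks.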

amogh-jahagirdar (Contributor) left a comment

Overall this is great. Thanks for splitting it into multiple commits; that made it easy to review. We'll need to update RevAPI for the new BaseParquetReaders.createStructReader signature, which no longer takes explicit types.

Comment on lines 90 to +91
 protected abstract ParquetValueReader<T> createStructReader(
-    List<Type> types, List<ParquetValueReader<?>> fieldReaders, Types.StructType structType);
+    List<ParquetValueReader<?>> fieldReaders, Types.StructType structType);
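To show what this signature change means for implementers, here is a hedged sketch of an abstract reader builder whose struct hook receives only the field readers and the expected struct type. `Reader`, `StructType`, `ReaderBuilder`, and `ListReaderBuilder` are hypothetical, simplified stand-ins for `ParquetValueReader`, `Types.StructType`, and `BaseParquetReaders`; they are not Iceberg's classes. A subclass written against the old three-argument signature would no longer compile, since it leaves the new abstract method unimplemented, which is the kind of break RevAPI reports as `java.method.abstractMethodAdded`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for ParquetValueReader<T>: consumes values from a flat source.
interface Reader<T> {
  T read(Iterator<?> values);
}

// Hypothetical stand-in for Types.StructType: only the field names are modeled.
record StructType(List<String> fieldNames) {}

// Sketch of a builder after the change: no List<Type> parameter on the struct hook.
abstract class ReaderBuilder<T> {
  protected abstract Reader<T> createStructReader(
      List<Reader<?>> fieldReaders, StructType structType);

  Reader<T> struct(List<Reader<?>> fieldReaders, StructType structType) {
    if (fieldReaders.size() != structType.fieldNames().size()) {
      throw new IllegalArgumentException("Reader count does not match struct fields");
    }
    return createStructReader(fieldReaders, structType);
  }
}

// Example object model: materializes each struct as a List<Object>.
final class ListReaderBuilder extends ReaderBuilder<List<Object>> {
  @Override
  protected Reader<List<Object>> createStructReader(
      List<Reader<?>> fieldReaders, StructType structType) {
    return values -> {
      List<Object> row = new ArrayList<>(fieldReaders.size());
      for (Reader<?> fieldReader : fieldReaders) {
        // Each field reader consumes its own value from the shared source, in order.
        row.add(fieldReader.read(values));
      }
      return row;
    };
  }
}
```

The design point is that anything a struct reader needs about its fields is carried by the field readers and the struct type, so a separate list of file types was redundant.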

RevAPI is failing because of the new abstract method that implementations now have to implement. I understand the rationale, since the previous types argument was not used. I think we'll need to accept the breaking change in RevAPI:

./gradlew :iceberg-parquet:revapiAcceptBreak --justification "Implementations of ParquetValueReader.createStructReader should not have to pass in explicit types" \
          --code "java.method.abstractMethodAdded" \
          --new "method org.apache.iceberg.parquet.ParquetValueReader<T> org.apache.iceberg.data.parquet.BaseParquetReaders<T>::createStructReader(java.util.List<org.apache.iceberg.parquet.ParquetValueReader<?>>, org.apache.iceberg.types.Types.StructType)"
