PARQUET-2382: Remove the deprecated OriginalType #1194

Open: wants to merge 2 commits into master


Fokko (Contributor) commented Nov 15, 2023

Make sure you have checked all steps below.

For Iceberg we're adding nanosecond timestamps. While investigating Parquet, I noticed that there are two ways of declaring logical types:

  1. Through the deprecated OriginalType enum
  2. Through the new LogicalTypeAnnotation API.

The old API does not support nanosecond precision but is still used in downstream projects such as Iceberg, where I'm working on migrating to the new API: apache/iceberg#9063

However, OriginalType was marked as deprecated five years ago in PARQUET-1452, which was released in Parquet 1.11.0. I would love to remove the old API to make sure that downstream engines migrate to the new API and handle nanosecond timestamps correctly.
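To make the nanosecond gap concrete, here is a minimal sketch under simplified assumptions. LegacyOriginalType, TimeUnit and TimestampAnnotation are hypothetical stand-ins, not the real Parquet classes in org.apache.parquet.schema. The model shows why a flat enum with one constant per unit has nowhere to put nanoseconds: projecting a NANOS timestamp onto it yields null, and the UTC flag is lost for every unit.

```java
import java.util.Objects;

// Hypothetical, simplified model for illustration only; these are NOT the
// Parquet classes (the real ones live in org.apache.parquet.schema).
public class LegacyTimestampSketch {

  // Stand-in for a flat, OriginalType-style enum: one constant per unit,
  // and no constant at all for nanoseconds.
  enum LegacyOriginalType { TIMESTAMP_MILLIS, TIMESTAMP_MICROS }

  enum TimeUnit { MILLIS, MICROS, NANOS }

  // Stand-in for a LogicalTypeAnnotation-style timestamp that carries the
  // unit and the UTC-adjustment flag as properties.
  record TimestampAnnotation(boolean isAdjustedToUTC, TimeUnit unit) {
    TimestampAnnotation {
      Objects.requireNonNull(unit, "unit");
    }

    // Projects onto the legacy enum; NANOS has no legacy constant, so the
    // projection returns null (and isAdjustedToUTC is dropped in every case).
    LegacyOriginalType toLegacy() {
      switch (unit) {
        case MILLIS:
          return LegacyOriginalType.TIMESTAMP_MILLIS;
        case MICROS:
          return LegacyOriginalType.TIMESTAMP_MICROS;
        default:
          return null;
      }
    }
  }

  public static void main(String[] args) {
    for (TimeUnit unit : TimeUnit.values()) {
      TimestampAnnotation ts = new TimestampAnnotation(true, unit);
      // Prints MILLIS -> TIMESTAMP_MILLIS, MICROS -> TIMESTAMP_MICROS, NANOS -> null
      System.out.println(unit + " -> " + ts.toLegacy());
    }
  }
}
```

The design point is that the annotation keeps its properties as data, while the enum can only grow by adding constants, one per combination.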

Jira

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain Javadoc that explains what they do

OriginalType originalType = getLogicalTypeAnnotation(schemaElement.converted_type, schemaElement).toOriginalType();
OriginalType newOriginalType = (schemaElement.isSetLogicalType() && getLogicalTypeAnnotation(schemaElement.logicalType) != null) ?
getLogicalTypeAnnotation(schemaElement.logicalType).toOriginalType() : null;
if (!originalType.equals(newOriginalType)) {
Contributor Author

I think there was a bug 🐛 hidden here: we only compared the logical type itself, not its properties (precision/scale for decimal, or isAdjustedToUTC for time/timestamp). Tests started failing, so I added getAdjustToUtc to retrieve the actual value from the Parquet structure.
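The comparison problem above can be illustrated with a minimal, hypothetical model. Annotation, Decimal, TimestampMicros and LegacyOriginalType are stand-ins, not Parquet's LogicalTypeAnnotation or OriginalType. Once a decimal or timestamp is projected onto a flat enum its properties are gone, so two different annotations compare as equal; comparing the annotations themselves keeps the properties in play.

```java
import java.util.Objects;

// Hypothetical, simplified model (not the Parquet API) showing why comparing
// two logical types through a legacy enum can hide real differences.
public class LegacyComparisonSketch {

  enum LegacyOriginalType { DECIMAL, TIMESTAMP_MICROS }

  // Minimal stand-in for a property-carrying logical type annotation.
  interface Annotation {
    LegacyOriginalType toLegacy();
  }

  record Decimal(int precision, int scale) implements Annotation {
    // The legacy enum only says DECIMAL; precision and scale are dropped.
    public LegacyOriginalType toLegacy() { return LegacyOriginalType.DECIMAL; }
  }

  record TimestampMicros(boolean isAdjustedToUTC) implements Annotation {
    // The legacy enum only says TIMESTAMP_MICROS; the UTC flag is dropped.
    public LegacyOriginalType toLegacy() { return LegacyOriginalType.TIMESTAMP_MICROS; }
  }

  // Lossy check: two annotations look equal if their legacy projections match.
  static boolean sameByLegacyType(Annotation a, Annotation b) {
    return Objects.equals(a.toLegacy(), b.toLegacy());
  }

  // Stricter check: record equality compares every component, so the
  // properties (precision/scale, UTC flag) take part in the comparison.
  static boolean sameByAnnotation(Annotation a, Annotation b) {
    return Objects.equals(a, b);
  }

  public static void main(String[] args) {
    Annotation utc = new TimestampMicros(true);
    Annotation local = new TimestampMicros(false);
    Annotation d1 = new Decimal(10, 2);
    Annotation d2 = new Decimal(18, 4);

    System.out.println(sameByLegacyType(utc, local)); // true: flag ignored
    System.out.println(sameByAnnotation(utc, local)); // false
    System.out.println(sameByLegacyType(d1, d2));     // true: precision/scale ignored
    System.out.println(sameByAnnotation(d1, d2));     // false
  }
}
```

Objects.equals is used in both helpers so that a null on either side does not throw, unlike calling equals directly on a possibly-null value.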

Fokko changed the title to PARQUET-2382: Remove the deprecated OriginalType on Nov 15, 2023
wgtmac (Member) commented Nov 16, 2023

I am fine with it, but I'd like to seek advice from @gszadovszky and @shangxinli.

gszadovszky (Contributor)

I don't think it is a good idea to remove a deprecated API in a minor release; that's why we have japicmp, to ensure backward compatibility.
I think there is no harm for Parquet if they use the old OriginalTypes. If we force significant code changes in minor releases, we would also slow down upgrades.

Fokko (Contributor, Author) commented Nov 24, 2023

Makes sense. I think it would be good to remove OriginalType at some point, so let's target this PR for 2.0 and leave it for now. I'll create another PR to mark getOriginalType() as deprecated (it was previously annotated as private through Yetus, but I think it would be best to mark these methods as deprecated for 1.14.0).
