Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: parquet 53.0.0 feature branch #6050

Closed
alamb opened this issue Jul 13, 2024 · 13 comments
Closed

Proposal: parquet 53.0.0 feature branch #6050

alamb opened this issue Jul 13, 2024 · 13 comments
Labels
arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate auto-dependencies enhancement Any new improvement worthy of a entry in the changelog next-major-release the PR has API changes and it waiting on the next major version object-store Object Store Interface parquet Changes to the parquet crate

Comments

@alamb
Copy link
Contributor

alamb commented Jul 13, 2024

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

We are now being careful about breaking changes (see https://github.com/apache/arrow-rs/blob/master/CONTRIBUTING.md#breaking-changes)

This means we can't merger PRs with breaking API changes to main until early August

However there are now three potentially large parquet changes that could conflict with each other and have API changes:

Describe the solution you'd like
Some way to avoid a massive set of merge conflicts when we start merging changes to master for parquet 53

I would also love to be able to review and merge smaller PRs rather than keep several large ones outstanding

Describe alternatives you've considered

I would like to propose we create a feature branch (e.g. parquet-53.0.0) in the arrow-rs repo that we can merge parquet API changes to and develop new features

Once main opens for 53 (in early August) we can merge the branch to main

This approach does require maintenance of the parquet 53 branch and runs the risk of accumulating merge conflicts as it diverges from master. I am willing to help do the proces

Additional context

@alamb alamb added the enhancement Any new improvement worthy of a entry in the changelog label Jul 13, 2024
@alamb alamb added the parquet Changes to the parquet crate label Jul 13, 2024
@adriangb
Copy link
Contributor

If we can't merge breaking changes to main I agree this is the path forward. It's very generous of you to take up that burden.

For what it's worth I don't think #6000 should include breaking changes, if there are any I think they're accidental. I'll double check. I do agree it might cause merge conflicts.

@etseidl
Copy link
Contributor

etseidl commented Jul 13, 2024

FWIW I just tried a merge of #5486 and #6000. The only significant conflict was missing (new) arguments for the thrift OffsetIndex and ColumnIndex structs. Shouldn't be hard to integrate those two, but it's probably better for the metadata encoder to merge first (especially since #5486 will now be broken up into smaller chunks). #6045 and #6000 should be easier to reconcile and can probably be merged in any order.

The bloom filter changes seem to be orthogonal to the other two, so I don't anticipate any issues there.

@alamb
Copy link
Contributor Author

alamb commented Jul 14, 2024

For what it's worth I don't think #6000 should include breaking changes, if there are any I think they're accidental. I'll double check. I do agree it might cause merge conflicts.

Thanks @adriangb -- I agree the first PR won't have API conflicts. However, I expect some iteration on the APIs, so once we have merged / released new API (Stats builder) then any changes we make to the new API will require "breaking changes" so we would have to remember what APIs we have released / haven't released.

Maybe we could put it behind a feature flag or something that made it clear the API would change 🤔

@alamb
Copy link
Contributor Author

alamb commented Jul 14, 2024

The bloom filter changes seem to be orthogonal to the other two, so I don't anticipate any issues there.

I guess I was imagining the parquet metadata encoder would also potentially have the ability to write bloom filters and thus may be affected by #6000

@alamb
Copy link
Contributor Author

alamb commented Jul 15, 2024

🤔 maybe I can simply make a 53.0.0-dev branch and we can merge the 53 features there 🤔

@alamb
Copy link
Contributor Author

alamb commented Jul 15, 2024

The more I think about this the more sense I think it makes to have a 53 feature branch for the next few weeks so we can not build up a massive set of conclits. I plan to create one tomorrow unless there are objections and start merging stuff there to clear the review queue

@etseidl
Copy link
Contributor

etseidl commented Jul 15, 2024

The more I think about this the more sense I think it makes to have a 53 feature branch for the next few weeks so we can not build up a massive set of conclits. I plan to create one tomorrow unless there are objections and start merging stuff there to clear the review queue

Sounds good to me. In addition to #6045, I have 3 stacked PRs ready to go, and I also have a plan for integrating with the changes from #6000.

@alamb
Copy link
Contributor Author

alamb commented Jul 16, 2024

I have created https://github.com/apache/arrow-rs/tree/53.0.0-dev as the branch and will now begin retargeting and merging PRs.

This will likely require me to do some release note finagling but we'll handle that when we get there

@alamb
Copy link
Contributor Author

alamb commented Jul 24, 2024

label_issue.py automatically added labels {'next-major-release'} from #5933

@alamb alamb added the arrow Changes to the arrow crate label Jul 24, 2024
@alamb
Copy link
Contributor Author

alamb commented Jul 24, 2024

label_issue.py automatically added labels {'arrow'} from #6018

@alamb
Copy link
Contributor Author

alamb commented Jul 24, 2024

label_issue.py automatically added labels {'auto-dependencies'} from #6018

@alamb alamb added the arrow-flight Changes to the arrow-flight crate label Jul 24, 2024
@alamb
Copy link
Contributor Author

alamb commented Jul 24, 2024

label_issue.py automatically added labels {'arrow-flight'} from #6041

@alamb alamb added the object-store Object Store Interface label Jul 24, 2024
@alamb
Copy link
Contributor Author

alamb commented Jul 24, 2024

label_issue.py automatically added labels {'object-store'} from #5930

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate auto-dependencies enhancement Any new improvement worthy of a entry in the changelog next-major-release the PR has API changes and it waiting on the next major version object-store Object Store Interface parquet Changes to the parquet crate
Projects
None yet
Development

No branches or pull requests

3 participants