Error Instead of Panic On Attempting to Write More Than 32769 Row Groups #6591
Comments
The i16 limit is actually enforced by the parquet format itself - https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L940 Row groups of this size are such a bad idea that the format actively prevents it 😅 That being said, we could make this an error, not a panic.
No disagreement here. I am exploring opportunities to change the UX of |
I agree, an error instead of a panic is much nicer behavior.
Is this fixed by #6378? I can't reproduce with the current head because |
It appears that #6378 (thanks @progval ❤️ ) was released in version |
Great stuff, thanks!
Kudos to @etseidl -- BTW this should be available on crates.io in about 2 days -- it is included in |
Describe the bug
The i16 used to count row groups overflows and becomes negative, causing a panic.
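To illustrate the arithmetic behind the overflow, here is a minimal standalone sketch using only the standard library. The function `ordinal_lossy` is a hypothetical stand-in for an unchecked counter and is not taken from the parquet crate; it only shows how a count that exceeds `i16::MAX` (32767) turns negative when narrowed without a check.

```rust
/// Hypothetical helper (not part of the parquet crate): narrows a running
/// row-group count to i16 with a plain `as` cast, which never fails but
/// silently wraps once the value no longer fits.
fn ordinal_lossy(count: usize) -> i16 {
    count as i16
}

fn main() {
    // The largest value that still fits in an i16.
    println!("{}", ordinal_lossy(32_767));
    // One past the maximum wraps around to a negative number.
    println!("{}", ordinal_lossy(32_768));
    // A checked conversion reports the problem instead of wrapping.
    println!("{}", i16::try_from(32_768usize).is_err());
}
```

A checked conversion such as `i16::try_from` makes the out-of-range case visible to the caller, which is the kind of behavior the issue asks for.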
To Reproduce
Writing 32769 row groups with the file writer
Expected behavior
Maybe an error indicating that too many batches have been written would be preferable. Alternatively it would be nice if this just worked, yet I could also get behind the thinking that this may be too many row groups for a single file anyway.
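As a rough sketch of what "an error instead of a panic" could look like, the following standalone example hands out row group ordinals through a checked conversion. `RowGroupCounter` and `TooManyRowGroups` are hypothetical names invented for this illustration; this is not the parquet crate's API and not necessarily how the actual fix was implemented.

```rust
use std::fmt;

/// Hypothetical error type: the next row group ordinal would not fit in an i16.
#[derive(Debug, PartialEq)]
struct TooManyRowGroups {
    /// Number of row groups the caller would have written, counting this one.
    attempted: usize,
}

impl fmt::Display for TooManyRowGroups {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "cannot write row group number {}: ordinal exceeds i16::MAX",
            self.attempted
        )
    }
}

impl std::error::Error for TooManyRowGroups {}

/// Hypothetical counter that hands out zero-based row group ordinals.
struct RowGroupCounter {
    written: usize,
}

impl RowGroupCounter {
    fn new() -> Self {
        RowGroupCounter { written: 0 }
    }

    /// Returns the ordinal for the next row group, or an error once the
    /// ordinal no longer fits in an i16 (instead of wrapping or panicking).
    fn next_ordinal(&mut self) -> Result<i16, TooManyRowGroups> {
        let ordinal = i16::try_from(self.written).map_err(|_| TooManyRowGroups {
            attempted: self.written + 1,
        })?;
        self.written += 1;
        Ok(ordinal)
    }
}

fn main() {
    let mut counter = RowGroupCounter::new();
    // Ordinals 0 through 32767 are representable, so 32768 row groups succeed.
    for _ in 0..32_768 {
        counter.next_ordinal().expect("ordinal fits in i16");
    }
    // The 32769th row group is rejected with an error rather than a panic.
    match counter.next_ordinal() {
        Ok(ordinal) => println!("unexpected ordinal {ordinal}"),
        Err(err) => println!("error: {err}"),
    }
}
```

With zero-based ordinals the last valid value is 32767, so in this sketch the attempt to write the 32769th row group is the first one to fail, matching the reproduction step above.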
Additional context
Occurred in the context of a user running odbc2parquet. His row groups were very small (15 rows) due to an issue with his row sizes, causing him to write lots of row groups into a single file. See: pacman82/odbc2parquet#652