Allow RLE for bools in v1 pages #885
I don't think fastparquet will follow suit and write bools as RLE...
It might be worth adding a test for this, as it doesn't seem to fix it for me (well, almost, but the values are still inverted). Using the files from the issue:
Are you certain?
Well, you can see what I printed above. That was after doing a You can check if the files you created are actually using RLE encoding with The files I was using: test_bool_pa13.zip
V2 Data pages are actually in a weird state, where most of the benefits. Last I checked, I think Arrow still had some bugs for them. There is also a somewhat stalled effort to try to standardize expectations around the format: apache/parquet-format#164
Do you remember any outstanding bugs related to data page v2 in parquet-cpp?
Sorry for the late reply. From a quick audit, it appears that they might all be fixed (I think row count might have been the last one).
Fixes #884
@jorisvandenbossche, could you please try with this?
Interestingly, there seems to have been no issue with V2 pages. I wonder why there is a push to change the defaults (and indeed the allowed encodings) for a version that is long superseded?
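For anyone following along, here is a minimal sketch of how boolean values in Parquet's RLE/bit-packed hybrid encoding (bit width 1) can be decoded. It is an illustration only, not the code in this PR or fastparquet's actual implementation: the names `read_uleb128` and `decode_rle_bools` are hypothetical, and the input is assumed to be the run data with any length prefix already stripped.

```python
def read_uleb128(buf, pos):
    """Read an unsigned LEB128 varint from buf starting at pos.

    Returns (value, new_pos).
    """
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_rle_bools(data, num_values):
    """Decode up to num_values booleans from RLE/bit-packed hybrid runs.

    Assumes bit width 1 and that `data` holds only the runs
    (no length prefix). Hypothetical helper for illustration.
    """
    out = []
    pos = 0
    while len(out) < num_values and pos < len(data):
        header, pos = read_uleb128(data, pos)
        if header & 1:
            # Bit-packed run: (header >> 1) groups of 8 values,
            # one byte per group at bit width 1, least significant bit first.
            for _ in range(header >> 1):
                byte = data[pos]
                pos += 1
                for bit in range(8):
                    out.append(bool((byte >> bit) & 1))
        else:
            # RLE run: one value repeated (header >> 1) times,
            # stored in a single byte at bit width 1.
            count = header >> 1
            value = bool(data[pos] & 1)
            pos += 1
            out.extend([value] * count)
    # Bit-packed runs are padded to a multiple of 8, so trim the tail.
    return out[:num_values]


# An RLE run of five True values followed by one bit-packed group (0b00000101).
encoded = bytes([0x0A, 0x01, 0x03, 0x05])
decoded = decode_rle_bools(encoded, 8)
```

A decoder that reads the bits of a bit-packed group in the wrong order, or misreads the repeated value of an RLE run, would be one way to end up with wrong boolean values, so a round-trip test against a known file seems worthwhile.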