support for parquet files #91
Comments
I don't think this is unreasonable at all (which isn't to say it will be easy :))! I think something like Parquet would be pretty interesting to support with frawk. I may repurpose this issue for general "supporting complex datatypes with arbitrary nesting", but I think Parquet should definitely be in the picture, along with Arrow and maybe JSON eventually. This will take some time, and I will probably start on some "easier" work around fixing the parser first, but I will definitely keep this issue open. Thanks for the suggestion.
I'm sure the "nested data" part will be a headache; maybe only non-nested files could be supported initially.
Time flies, and now I use DuckDB to convert Parquet to CSV:
$ duckdb -c "COPY (select * from 'family.parquet') TO 'query.csv' (FORMAT CSV)"
Now lots of tools support Parquet-to-CSV conversion: Polars, DuckDB, ClickHouse local, GlareDB, etc.
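As a rough sketch of how the DuckDB workaround above could feed an awk-style tool without an intermediate file, the conversion can be wrapped in a small shell function that writes CSV to standard output. The function name `parquet_to_csv` is hypothetical (it is not part of frawk or DuckDB), and the sketch assumes the `duckdb` CLI is on `PATH`, that the file path contains no single quotes, and that the platform exposes `/dev/stdout`.

```shell
# Hypothetical helper, not part of frawk or DuckDB:
# stream a Parquet file as CSV on standard output.
# Assumes the duckdb CLI is installed and $1 contains no single quotes.
parquet_to_csv() {
  duckdb -c "COPY (SELECT * FROM '$1') TO '/dev/stdout' (FORMAT CSV)"
}

# Example usage, left commented out so sourcing this file runs nothing:
# parquet_to_csv family.parquet | frawk -i csv '{ print $1 }'
```

In the commented usage line, frawk's `-i csv` option parses the piped input as CSV, so `$1` refers to the first column. If DuckDB emits a header row, it arrives as the first record. This is only a stopgap for flat tables; it does not address the nested-data question raised earlier in the thread.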
This might sound crazy, but I still wanted to propose a feature request about Parquet files.
You might ask, why? Parquet files are becoming more widespread and might even be considered "the new CSV". There are specialized tools such as DuckDB for running SQL commands on them, but I haven't seen or come across any awk-like utility that can process Parquet files.
IMHO, supporting Parquet files in frawk would be a huge win for the "data analysis at the command line" camp.