support for parquet files #91

Open
alperyilmaz opened this issue Jun 26, 2022 · 3 comments

@alperyilmaz
Contributor

This might sound crazy, but I wanted to propose a feature request: support for Parquet files.

You might ask why. Parquet files are becoming more widespread and might even be considered "the new CSV". There are specialized tools, such as DuckDB, for running SQL queries on them, but I haven't come across any awk-like utility that can process Parquet files.

IMHO, Parquet support in frawk would be a huge win for the "data analysis at the command line" camp.

@ezrosent
Owner

ezrosent commented Jun 27, 2022

I don't think this is unreasonable at all (which isn't to say it will be easy :)). Something like Parquet would be pretty interesting to support in frawk. I may repurpose this issue for general "supporting complex datatypes with arbitrary nesting", but Parquet should definitely be in the picture, along with Arrow and maybe eventually JSON. This will take some time, and I will probably start with some "easier" work on fixing the parser first, but I will definitely keep this issue open. Thanks for the suggestion.

@alperyilmaz
Contributor Author

I'm sure the "nested data" part will be a headache; maybe non-nested files could be supported initially.

@linux-china

Time flies, and now I use DuckDB to convert Parquet to CSV:

$ duckdb -c "COPY (select * from 'family.parquet') TO 'query.csv' (FORMAT CSV)"

Many tools now support converting Parquet to CSV: Polars, DuckDB, ClickHouse local, GlareDB, etc.
