support for parquet files #91

alperyilmaz · 2022-06-26T23:36:59Z

This might sound crazy, but I still wanted to propose a feature request: support for Parquet files.

You might ask why. Parquet files are becoming more widespread and might even be considered "the new CSV". There are specialized tools, such as DuckDB, for running SQL queries on them, but I haven't come across any awk-like utility that can process Parquet files.

IMHO, Parquet support in frawk would be a huge win for the "data analysis at the command line" camp.

ezrosent · 2022-06-27T02:53:39Z

I don't think this is unreasonable at all (which isn't to say it will be easy :))! I think something like Parquet would be pretty interesting to support with frawk. I may repurpose this issue for general "supporting complex datatypes with arbitrary nesting" work, but I think Parquet should definitely be in the picture, along with Arrow and maybe JSON eventually. This will take some time, and I will probably start with some "easier" work on fixing the parser first, but I will definitely keep this issue open. Thanks for the suggestion.

alperyilmaz · 2022-06-27T09:20:02Z

I'm sure the "nested data" part will be a headache; maybe non-nested files could be supported first.

linux-china · 2024-03-12T03:38:33Z

Time flies, and now I use DuckDB to convert Parquet to CSV:

$ duckdb -c "COPY (select * from 'family.parquet') TO 'query.csv' (FORMAT CSV)"

Lots of tools now support converting Parquet to CSV: Polars, DuckDB, ClickHouse local, GlareDB, etc.
