Support RecordBatch.flatten #6369

Open
kszlim opened this issue Sep 8, 2024 · 5 comments
Assignees
Labels
enhancement: Any new improvement worthy of an entry in the changelog

Comments

Contributor

kszlim commented Sep 8, 2024

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I want to write flattened Parquet files, since not every tool supports struct columns.

Describe the solution you'd like
Recursively flatten all struct columns in a RecordBatch (similar to pandas json_normalize). Alternatively, a solution via DataFusion might be acceptable.

Describe alternatives you've considered
Running pyarrow.Table.flatten in a loop until no top-level struct columns remain, though this requires going through Python.

kszlim added the enhancement label Sep 8, 2024
Contributor

alamb commented Sep 9, 2024

I think implementing the equivalent of pyarrow.Table.flatten
(https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.flatten)
for RecordBatch makes sense.

Contributor Author

kszlim commented Sep 9, 2024

If implemented similarly to pandas json_normalize, it could take a max depth option, which would make it strictly more powerful and flexible than pyarrow.Table.flatten.
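
To make the max depth idea concrete, here is a small pure-Python sketch of the recursion. It is only an illustration under stated assumptions: a nested dict stands in for a struct column, any other value for a leaf column, and the names flatten_columns, max_depth, and sep are hypothetical rather than any existing Arrow API.

```python
from typing import Any, Optional


def flatten_columns(
    columns: dict[str, Any],
    max_depth: Optional[int] = None,
    sep: str = ".",
) -> dict[str, Any]:
    """Flatten nested struct columns, descending at most max_depth levels.

    Assumption for this sketch: a dict value models a struct column and any
    other value models a leaf column. max_depth=None flattens fully, and
    max_depth=0 returns the columns unchanged.
    """
    flat: dict[str, Any] = {}
    for name, column in columns.items():
        can_descend = max_depth is None or max_depth > 0
        if isinstance(column, dict) and can_descend:
            remaining = None if max_depth is None else max_depth - 1
            # Prefix each flattened child with the parent column name.
            for child, value in flatten_columns(column, remaining, sep).items():
                flat[f"{name}{sep}{child}"] = value
        else:
            flat[name] = column
    return flat


nested = {"id": [1, 2], "pos": {"x": [1.0, 2.0], "meta": {"src": ["a", "b"]}}}
print(list(flatten_columns(nested)))               # full recursion
print(list(flatten_columns(nested, max_depth=1)))  # one level only
```

With max_depth=1 the sketch expands a single struct level, which is the behavior of one pyarrow.Table.flatten call; leaving max_depth unset gives the fully recursive result requested in this issue.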

Contributor

ngli-me commented Nov 17, 2024

Hi, do you all mind if I give this a shot?

Contributor Author

kszlim commented Nov 17, 2024

> Hi, do you all mind if I give this a shot?

Go ahead!

Contributor

ngli-me commented Nov 17, 2024

take
