Implement (de)serialization of Series/DataFrames using IPC #17250

Closed
stinodego opened this issue Jun 27, 2024 · 4 comments
Labels
A-serde (Area: serialization and deserialization), accepted (Ready for implementation), enhancement (New feature or an improvement of an existing feature), P-goal (Priority: aligns with long-term Polars goals), P-low (Priority: low)

Comments

@stinodego
Member

Our existing implementation of Serialize/Deserialize on ChunkedArray is not very optimized, and does not support nested data well.

We should leverage IPC to improve this.

@stinodego added the enhancement, accepted, A-serde, and P-goal labels on Jun 27, 2024
@github-project-automation (bot) moved this to Ready in Backlog on Jun 27, 2024
@stinodego moved this from Ready to Next in Backlog on Jun 27, 2024
@ritchie46 added the P-low label on Jun 30, 2024
@ritchie46
Member

Added p-low for now as it will be the bottleneck once we want to support larger frames, but I'd like to start with cloud datasets.

@stinodego
Member Author

stinodego commented Jun 30, 2024

> Added p-low for now as it will be the bottleneck once we want to support larger frames, but I'd like to start with cloud datasets.

Do you think it will be less effort to fix the various bugs with our current serialization (mostly for nested types) than to switch to IPC serialization?

Performance doesn't have to be optimal at first, but the serialization does need to be correct in all cases.

@ritchie46
Member

ritchie46 commented Jun 30, 2024

No, arbitrary nesting is much more complex, and fixing it would be wasted effort as we will switch to IPC anyway. The p-goal is to get the cloud queries running. We can start with non-nested literals for now, until we switch to IPC.

@lukemanley
Contributor

Is this closed by #20266?

@github-project-automation (bot) moved this from Next to Done in Backlog on Dec 22, 2024
Projects
Status: Done
4 participants