[WIP] Partition-metadata tracking in Dask-DataFrame #3

Draft
wants to merge 1 commit into
base: main

rjzamora
Member

Brief proposal for a performance-motivated metadata update in Dask-DataFrame.

NOTE: Although this proposal is distinct from a high-level graph or query-optimization system, it should make such a system much easier to implement. I say this because such a system would still need to track and manage the same kind of metadata in one place.

See also: dask/dask#9473

@rjzamora
Member Author

cc @jrbourbeau @mrocklin for viz

There is not much to review here yet, but I am starting to organize my thoughts on this, and I'm feeling somewhat confident that this would be a worthwhile effort. The rough POC wasn't very difficult to put together, and I think proper HLG/HLE-optimization would require us to do much of the same refactoring anyway (unless we choose to do high-level optimization in a completely new library, that is).
