Serialisable partitioning spec #291
Problem description
The physical layout and indexing of a dataset largely determine its read performance. Datasets are often designed to support a fairly specific use case, where many partitioning parameters must be set exactly, and even minor deviations or omissions cause severe changes in performance. We offer more and more levers to control the dataset layout, but we do not offer a concise way to store, share, verify or reproduce it. Many of the performance-critical parameters cannot easily be reconstructed from an existing dataset after the fact.
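To illustrate what "store, share, verify or reproduce" could look like, here is a minimal sketch, assuming the spec is a plain record of layout parameters serialised to JSON. The class `PartitioningSpec`, its fields and the `verify` helper are all hypothetical and chosen only for illustration; they are not the list of parameters this issue proposes.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class PartitioningSpec:
    # Hypothetical layout parameters, for illustration only.
    partition_on: List[str] = field(default_factory=list)
    secondary_indices: List[str] = field(default_factory=list)
    sort_partitions_by: Optional[str] = None
    bucket_by: Optional[str] = None
    num_buckets: int = 1

    def to_json(self) -> str:
        # sort_keys gives a stable text form, so specs can be diffed or checked in.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "PartitioningSpec":
        # Reproduce a spec from its stored JSON form.
        return cls(**json.loads(text))

    def verify(self, other: "PartitioningSpec") -> List[str]:
        """Return the names of fields whose values differ from `other`."""
        mine, theirs = asdict(self), asdict(other)
        return [name for name in mine if mine[name] != theirs[name]]


if __name__ == "__main__":
    spec = PartitioningSpec(
        partition_on=["country"], secondary_indices=["customer_id"], num_buckets=4
    )
    restored = PartitioningSpec.from_json(spec.to_json())
    print(restored == spec)  # True
    # A layout that silently dropped the secondary index is flagged:
    print(spec.verify(PartitioningSpec(partition_on=["country"], num_buckets=4)))
    # ['secondary_indices']
```

Keeping the spec a flat, JSON-serialisable record makes it easy to store next to the data, share between teams, and compare field by field, which is exactly where a small omission would otherwise go unnoticed.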
Things I have in mind which should be part of this specification are
Benefits
Open questions
I'm curious to know whether other people consider this useful.