Should MEDS datasets write _metadata or _common_metadata files? #43

Open
mmcdermott opened this issue Aug 9, 2024 · 2 comments
Labels
- enhancement: New feature or request
- Pending Community Use: Issues that should be solved after sufficient community uptake and use to dictate method
- priority:low: Low priority; does not need to be included in any upcoming release candidates.
- Usability: For the usability of the MEDS schema more generally by the community at a technical level.

Comments

@mmcdermott
Contributor

I don't use these, and am certainly not suggesting these would be high priority, but according to this:

https://arrow.apache.org/docs/python/parquet.html#writing-metadata-and-common-metadata-files

some frameworks use them to better parse sharded/partitioned datasets.

@EthanSteinberg
Collaborator

I think we should consider adding this if it becomes an issue for someone, but not bother until then. These datasets are small enough that I'm skeptical there would be much benefit (IIRC, those files mainly help with thousands of shards).

@mmcdermott
Contributor Author

Sounds good to me. I'll leave it up to you whether to close this issue or leave it open; either way, hopefully anyone who wants this functionality will be able to find it and comment to indicate interest.

@mmcdermott added the enhancement, priority:low, Pending Community Use, and Usability labels on Jan 7, 2025