Relax extension requirements, maybe rename to formats #64
The main reason is to provide an incentive for people to flock around a few data formats and make them better ... as opposed to everyone "brewing their own thing". The latter increases the burden of maintaining IO software, and also forces end users to "know" more data formats.
Did you mean “data formats” instead of “datasets”? If not, I don't think that's something I ever saw as a goal of BIDS, i.e. consolidating a small number of datasets as opposed to allowing better access to as many as possible; nor do I see what it has to do with data formats. If you meant “data formats”, it's not just that we're limiting the total number of formats we support, but also constraining them per modality for reasons that are more historical than anything. In a sense, that constrains the IO software. Lots of data can be represented as NIfTI, and thereby analyzed with the rich NIfTI tools, so why restrict that? Blanket-permitting all (or some) formats would allow formats to spread better across use cases based on the tooling support they have.
Yes, sorry.
I wouldn't say that we do that for "historical" reasons. To my understanding we do that to reflect the most common practices in the field where a particular modality is used. For example, NIfTI is used in MRI ... but not in EEG, even though you probably could somehow encode EEG data in NIfTI.
Yes, but it will also invite edge cases, where a single dataset curator is exceptionally well versed in a particular data format and uses it, while the large majority of the community won't be able to use the data because they lack the tools or skills. I am playing a bit of devil's advocate here. I personally don't have a big horse in this race. But I do think that fewer, rather than more, data formats are a good idea. I am saying this coming from a project like MNE-Python, where every few months somebody requests support for yet another data format that is entirely unnecessary, because the data could be represented in an already existing (open) format.
The whole point of any standardization is to minimize variability. BIDS minimized variability not only in how people name their files, but also in which file formats they use. Hence you, @TheChymera, can always open
Someone, in turn, could establish some "BIDS naming convention" or "BIDS naming principles" which would then allow arbitrary file formats to be used, and would instead just promote the use of the schema and the rest of the logic behind file organization. But it would be a different project.
+1 to closing this.
@yarikoptic, but that's exactly what I did not mean. I was referring to data formats specifically. I even gave the exact same example:
Yes, the metadata files, like the file naming conventions, are optimized for easy browsing and readability, and (maybe on purpose, maybe incidentally) are very convenient to manipulate with GNU coreutils or other ubiquitous CLI packages.
I already mentioned that proposal 2 was probably not as good as proposal 1, because there are some reasons to exclude, e.g., proprietary formats.
But should a dataset be “invalid” for using an uncommon practice, even if it's still open source and useful to the experimenter?
Isn't that an edge case we want? Think of the following: an MRI expert wants to integrate data with microscopy and use NIfTI for everything, including the microscopy data, so it can all be handled in the same space with the same tools. Why block that?
In a sense, that's addressed by proposal 1. The guy from the NIfTI example is me. I'd like to use NIfTI for more things. Broader acceptance of formats that are already accepted could materialize in a consolidation around fewer formats. I also think there are other people who would like to
During several BEP processes (e.g., EEG, iEEG), several file formats were vetoed.
@sappelhoff Oh, I was unaware, thanks for telling me. Do you remember which ones they were, or have a link to the discussions? I'm curious what demonstrably disqualifies a format.
Most of these discussions happened on the old BEP006 Google Doc, and in 2018 there was a community survey about the data formats used in the community. The survey results used to be reported here: https://bids.berkeley.edu/news/bids-megeegieeg-data-format-survey. Unfortunately, this archive did not preserve the images: https://web.archive.org/web/20230130152808/https://bids.berkeley.edu/news/bids-megeegieeg-data-format-survey ... but perhaps you can do some digging and find something.
We wanted the file formats to:
With some file formats supporting data across modalities (any volumetric data can be NIfTI, any raster image can be TIFF, anything at all can be ZARR), I wonder if it makes sense to restrict these “extensions”.
I'm also wondering whether the terminology shouldn't be renamed to “formats”.
More generally, I'm also not sure why the emergence of a new format would need to be “accepted” by BIDS first before a dataset using it can be BIDS-compliant.
Is there any reason why we would ever say no?
If not, why not allow any data format?
I'm mentioning data formats specifically because, for metadata files, whose contents BIDS as a standard controls, we can't just have people using participants.xlsx. But BIDS does not control the analysis of TIFF, or NWB, or MNAF (my new amazing format), so why not let people use whatever fits their use case?

I see some utility in discouraging bad practices, such as proprietary formats or .m files for everything, or compressed .jpeg for optical imaging, so maybe allowing anything would go too far. But in any case, I think open formats with no compression could be globally accepted.