Uniformize .tsv and .tsv.gz: both to have header + both have "Columns" dictionary (!not list) in sidecar .json #71
I would vote for uniformity between .tsv and .tsv.gz
We could state that any of these is acceptable (perhaps with a preference in some use cases), assume people will use one that matches the typical use case for their dataset, and make a simple tool available to convert among them.
Just want to note that EDIT: Based on maintainers discussion with Peer- maybe just assume
FWIW, although a .tsv file is not generally required to have a header (outside of BIDS), I feel there was some overfitting to a tool in making .tsv.gz the one without a header while .tsv had to have one. I second @effigies' suggestion above; besides, I think addition of re
During the BEP044 call I was made aware that the situation is more "intricate" in the case of motion files (an example is ds004460). There
This issue collates two aspects, but I think that is warranted. If desired, we could split it into two.
BIDS 1.x situation
ATM, `.tsv.gz` files are not just a compressed `.tsv`, as is the case with `.nii.gz` and `.nii`: they are special* in that they are not to carry a header the way `.tsv` files do.

The "specialty" extends into sidecar `.json` files:

- For `.tsv` files we carry a `.json` file where each entry is a structured record describing that column (BIDS 1: tabular-files).
  - For some `.tsv` files, as in the case of `_beh.json`, they also carry metadata fields such as `TaskName` alongside the column descriptors, thus possibly leading to collisions (snake_case is just a recommendation for column names) and overall making the schema for such `.json` files a concoction of two aspects.
  - `_motion.tsv` files are an exception to the rule (see more below): they are headless, `_channels.{tsv,json}` describe the columns, and `_motion.json` contains extra metadata.
- For `.tsv.gz` we carry a `.json` file with a dedicated `Columns` field holding a list of header fields and, in addition, again optional descriptors per column (same possible collision) (BIDS 1: compressed-tabular-files).

If I got it right (@effigies can correct), the header was excluded from `.tsv.gz` as "not readily readable". Maybe some folks also remember further details? IMHO the argument is weak, since it is just a matter of an adequate "file opener" abstraction, as is done e.g. in Python. But even if we put that aspect aside, I think we would benefit from a more harmonious approach, which might only require 1 extra check in the validator:

BIDS 2 proposal
- Both `.tsv` and `.tsv.gz` should carry a header. `.gz` would only signal compression.
- `.tsv.gz` and `.tsv` should be supported interchangeably across uses
  - [...] (`subjects.tsv`, `sessions.tsv`, etc.) unless prohibitive in size (e.g. `subjects.tsv` for 10000 subjects with 100 columns or something like that).
- The `.json` for either `.tsv` or `.tsv.gz` MAY describe columns within a `Columns` field, which would be a dict containing records conforming to the current set of fields we reserve for `.tsv` sidecar `.json` files, but also adding 1 OPTIONAL field (which may be RECOMMENDED for `.tsv.gz`): `Index`, which would provide ordering information.
  - `bids-validator` could easily ensure that it corresponds to the order in the `.tsv` or `.tsv.gz`.
  - [...] `Index`, i.e. if dicts are ordered like now in Python.

Cons
- [...] `Columns` field. But it might actually even be a simplification in some cases (e.g. those `_beh.json`), where now they have to "subselect" what to use for descriptors and what for metadata.
- [...] `.tsv.gz` through a file abstraction would need to "build" an index based on the `Index` field. But it is really not rocket science.

Pros
- [...] `.tsv` and `.tsv.gz`, so if someone obtains a long `.tsv` and decides to compress it, it would be just a matter of exactly that: compression, without changing content (removing the header). This would allow for simpler, generic code and handling.
- [...] `Columns` without any ambiguity (a jsonschema or linkml model would be much easier to construct) or possibility of collision.
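
To illustrate how little code the "file opener" abstraction and the one extra validator check might need, here is a minimal Python sketch. It is not how `bids-validator` works, only an illustration of the proposal under stated assumptions: the helper names (`open_tabular`, `read_header`, `check_columns`) are hypothetical, `Index` is assumed to be 0-based (the proposal does not say), and the sidecar is assumed to be an already-parsed dict in which each `Columns` entry is itself a dict.

```python
import gzip
from pathlib import Path


def open_tabular(path):
    """Open a .tsv or .tsv.gz file as text; .gz only signals compression."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8", newline="")
    return open(path, "rt", encoding="utf-8", newline="")


def read_header(path):
    """Return the column names from the first line of either file kind."""
    with open_tabular(path) as f:
        return f.readline().rstrip("\r\n").split("\t")


def check_columns(header, sidecar):
    """Return a list of problems found in the sidecar's "Columns" field.

    Assumption: "Index" is OPTIONAL and 0-based.
    """
    columns = sidecar.get("Columns", {})
    if not isinstance(columns, dict):
        return ["'Columns' must be a dictionary, not a list"]
    problems = []
    for name, record in columns.items():
        if name not in header:
            problems.append(f"{name!r} is described in the sidecar but missing from the header")
            continue
        index = record.get("Index")
        if index is not None and header.index(name) != index:
            problems.append(
                f"{name!r} has Index {index} but is column {header.index(name)} in the file"
            )
    return problems
```

With such helpers, compressing a long `.tsv` into a `.tsv.gz` would not change what `read_header` returns, and metadata fields such as `TaskName` would live next to `Columns` instead of sharing a namespace with column names.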