-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] clarify that BIDS does not specify names of directories containing datasets #1734
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #1734 +/- ##
=======================================
Coverage 87.93% 87.93%
=======================================
Files 16 16
Lines 1351 1351
=======================================
Hits 1188 1188
Misses 163 163 ☔ View full report in Codecov by Sentry. |
@yarikoptic pinging you here because the original issue was opened after a chat with you and @sappelhoff in Copenhagen last year. |
|
||
Note that rawdata here is an arbitrarily-chosen name. | ||
BIDS specifies the contents of a dataset, | ||
not the names of directories containing datasets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot the details if I participated, but this particular sentence placed here is more confusing than helpful:
BIDS does specify names of the folders which are to contain (sub)datasets, i.e. derivatives
, rawdata
.
AFAIK it is only the top level folder of the dataset itself, if not contained within another BIDS dataset (like rawdata/
) is what not specified by BIDS. If that is what was intended by this sentence, then it should be rephrased and I would not emphasize it as a note
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BIDS does specify names of the folders which are to contain (sub)datasets, i.e.
derivatives
,rawdata
.
I think that's the point @sappelhoff and I were trying to make and it would be good to clarify in this PR.
From our reading rawdata
has not been one of the recognized "top level" directory that is allowed by BIDS.
Maybe I recall wrongly but I think you were arguing that one could have in a BIDS (presumably a derivative one) a rawdata
folder that would be where the raw bids dataset would go.
In any case I also agree that the phrasing should be improved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting! Indeed, rawdata/
doesn't not make sense for raw BIDS dataset, since its presence and restriction that it should contain "BIDS dataset" implies that current dataset is a derivative dataset. But that is an entirely different aspect from the original sentence IMHO. Also it is not in the schema et all. Filed
which could be addressed as part of this PR but feels like a separate issue or may be not and that is what underlying goal here -- to clarify that it is non-mandatory but "specified" for derivative datasets? And then separate sentence on the fact that overall name of the "not nested inside rawdata/
" BIDS dataset is arbitrary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rawdata/
has never been permitted in a BIDS dataset. It is used as an example where a BIDS dataset is placed alongside its sources and derivatives, rather than one nested within another. We could equally well say that the following are valid:
rawdata/ # Collection of BIDS datasets
ds000001/
ds000002/
sourcedata/ # DICOM archive
ds000001/
ds000002/
derivatives/
ds000001/
preprocessing/
analysis/
ds000002/
preprocessing/
analysis/
The point is that in the current example, my_dataset-1/rawdata
is a BIDS dataset, and in this example rawdata/ds000001
and rawdata/ds000002
are BIDS datasets.
I don't know how better to word this, but I am describing the current state of what the validator applies to and what are outside the scope of BIDS. I'm fine with clarifying this, as it seems to be causing confusion, but I don't see what benefit there is to turning example text into a new term (and presumably doing something new with the validator...).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strong agree with @effigies on this one. If anything I'd remove the rawdata all together from the spec to avoid the confusion and the need for an explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that we are converging!
@effigies is my_dataset-1
in above example a BIDS dataset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then we simply should not talk about any "need" , "SHOULD" or any other opinion of what should or need to be in its subfolders! Moreover we never even describe "needs" - we use RFC2119, so that is "incorrect" too.
And I think that is the main confusion here!
We are in a BIDS specification document and right before here we were talking about components of a BIDS dataset (derivatives/
, README
etc) and then we just say that there is an "Alternative way". Well -- there is a universe of alternative non-BIDS ways! But if we are to talk about one here, we MUST talk only about an alternative BIDS way. So we could e.g. move rawdata/
under sourcedata/
and say that it can be that BIDS dataset (originally sourcedata/
is the top directory of source dataset, not listing of source datasets), and that my_dataset-1
remains implied to be a BIDS dataset.
Overall
- I do not think we should talk about possible non-BIDS layouts here: adds confusion etc
- we should make explicit that
my_dataset-1
is a BIDS dataset (adddataset_description.json
)
- we should make explicit that
- We can explicitly state that names of bids datasets themselves are nohow prescribed by BIDS per se (although there is a recommendation for
derivatives/{pipeline}-{index}/
iirc) rawdata/
should go away from top level if there is agreement that it doesn't belong to BIDS (and we closerawdata/
is described in text but not part of the schema #1736)- we could clarify that it (
sourcedata/rawdata
) is a convenient location for source bids datasets in bids-derivative datasets, but then it might as well besourcedata/raw
)
- we could clarify that it (
- also related: Clarify/provide example/formalize on the fact that BIDS is suitable for "study level" dataset #1739
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
submitted
- [FIX] Move
rawdata/
intosourcedata/raw
in alternative structure example, clarify on naming of datasets themselves #1741
with my take on this debate ;-)
edited: was incorrectly pasted issue link
close in favor of #1741 |
rawdata
directory is not a prescription #1519