[ENH] clarify that BIDS does not specify names of directories containing datasets #1734

Remi-Gau · 2024-03-20T22:25:27Z

closes Clarify that a rawdata directory is not a prescription #1519

codecov · 2024-03-20T22:29:00Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.93%. Comparing base (6d13c80) to head (8d00f7a).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #1734   +/-   ##
=======================================
  Coverage   87.93%   87.93%           
=======================================
  Files          16       16           
  Lines        1351     1351           
=======================================
  Hits         1188     1188           
  Misses        163      163


Remi-Gau · 2024-03-21T13:48:08Z

@yarikoptic pinging you here because the original issue was opened after a chat with you and @sappelhoff in Copenhagen last year.

yarikoptic · 2024-03-21T14:14:55Z

src/common-principles.md

+
+    Note that rawdata here is an arbitrarily-chosen name.
+    BIDS specifies the contents of a dataset,
+    not the names of directories containing datasets.


I have forgotten the details, if I participated at all, but this particular sentence, placed here, is more confusing than helpful:

BIDS does specify names of the folders which are to contain (sub)datasets, i.e. derivatives, rawdata.

AFAIK, only the name of the top-level folder of the dataset itself, when it is not contained within another BIDS dataset (like rawdata/), is not specified by BIDS. If that is what was intended by this sentence, then it should be rephrased, and I would not emphasize it as a note.

> BIDS does specify names of the folders which are to contain (sub)datasets, i.e. derivatives, rawdata.

I think that's the point @sappelhoff and I were trying to make and it would be good to clarify in this PR.
From our reading, rawdata has not been one of the recognized "top level" directories allowed by BIDS.

https://bids-specification.readthedocs.io/en/latest/common-principles.html#other-top-level-directories

Maybe I recall wrongly, but I think you were arguing that one could have, in a BIDS dataset (presumably a derivative one), a rawdata folder where the raw BIDS dataset would go.

In any case I also agree that the phrasing should be improved.

Interesting! Indeed, rawdata/ doesn't make sense for a raw BIDS dataset, since its presence, and the restriction that it should contain a "BIDS dataset", implies that the current dataset is a derivative dataset. But that is an entirely different aspect from the original sentence IMHO. Also, it is not in the schema at all. Filed

rawdata/ is described in text but not part of the schema #1736

which could be addressed as part of this PR but feels like a separate issue. Or maybe not, and that is the underlying goal here: to clarify that it is non-mandatory but "specified" for derivative datasets? And then a separate sentence on the fact that the overall name of a BIDS dataset that is "not nested inside rawdata/" is arbitrary?

rawdata/ has never been permitted in a BIDS dataset. It is used as an example where a BIDS dataset is placed alongside its sources and derivatives, rather than one nested within another. We could equally well say that the following are valid:

    rawdata/              # Collection of BIDS datasets
      ds000001/
      ds000002/
    sourcedata/           # DICOM archive
      ds000001/
      ds000002/
    derivatives/
      ds000001/
        preprocessing/
        analysis/
      ds000002/
        preprocessing/
        analysis/

The point is that in the current example, my_dataset-1/rawdata is a BIDS dataset, and in this example rawdata/ds000001 and rawdata/ds000002 are BIDS datasets.
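To make the distinction concrete, here is a minimal sketch that lists which directories in such a tree would be treated as dataset roots. It assumes, purely as a heuristic, that a directory holding a `dataset_description.json` is a BIDS dataset root; this is not how the validator decides, and `find_dataset_roots` is a hypothetical helper name.

```python
from pathlib import Path


def find_dataset_roots(base):
    """List directories under `base` that contain a dataset_description.json.

    Heuristic only: the presence of dataset_description.json is used as a
    stand-in marker for "this directory is a BIDS dataset". Container
    directories such as rawdata/ carry no marker and are never reported.
    """
    base = Path(base)
    return sorted(
        marker.parent.relative_to(base).as_posix()
        for marker in base.rglob("dataset_description.json")
    )
```

Run against the second layout above, with a description file in each `ds00000X` directory under `rawdata/`, this would report `rawdata/ds000001` and `rawdata/ds000002` but not `rawdata/` itself, whose name BIDS says nothing about.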

I don't know how better to word this, but I am describing the current state of what the validator applies to and what is outside the scope of BIDS. I'm fine with clarifying this, as it seems to be causing confusion, but I don't see what benefit there is to turning example text into a new term (and presumably doing something new with the validator...).

Strong agree with @effigies on this one. If anything, I'd remove rawdata altogether from the spec to avoid the confusion and the need for an explanation.

I feel that we are converging!

@effigies is my_dataset-1 in the above example a BIDS dataset?

then we simply should not talk about any "need", "SHOULD" or any other opinion of what should or needs to be in its subfolders! Moreover, we never even describe "needs": we use RFC2119, so that is "incorrect" too.

And I think that is the main confusion here!

We are in a BIDS specification document, and right before here we were talking about components of a BIDS dataset (derivatives/, README etc), and then we just say that there is an "Alternative way". Well -- there is a universe of alternative non-BIDS ways! But if we are to talk about one here, we MUST talk only about an alternative BIDS way. So we could e.g. move rawdata/ under sourcedata/ and say that it can be that BIDS dataset (originally sourcedata/ is the top directory of a source dataset, not a listing of source datasets), and that my_dataset-1 remains implied to be a BIDS dataset.

Overall

- I do not think we should talk about possible non-BIDS layouts here: it adds confusion, etc.
- We should make explicit that my_dataset-1 is a BIDS dataset (add dataset_description.json).
- We can explicitly state that the names of BIDS datasets themselves are in no way prescribed by BIDS per se (although there is a recommendation for derivatives/{pipeline}-{index}/ iirc).
- rawdata/ should go away from the top level if there is agreement that it doesn't belong to BIDS (and we close rawdata/ is described in text but not part of the schema #1736).
- We could clarify that it (sourcedata/rawdata) is a convenient location for source BIDS datasets in BIDS-derivative datasets, but then it might as well be sourcedata/raw.
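As a rough illustration of the "add dataset_description.json" suggestion, the sketch below writes a minimal description file into a directory. It assumes that `Name` and `BIDSVersion` are the only fields needed for the illustration; the helper name `write_minimal_description` and the default version string `"1.9.0"` are placeholders chosen for this example, not something taken from this PR.

```python
import json
from pathlib import Path


def write_minimal_description(dataset_dir, name, bids_version="1.9.0"):
    """Write a minimal dataset_description.json into `dataset_dir`.

    Assumption: only the Name and BIDSVersion fields are written; the
    default version string is illustrative. The directory name itself is
    whatever the caller passes in, since BIDS does not prescribe it.
    """
    dataset_dir = Path(dataset_dir)
    dataset_dir.mkdir(parents=True, exist_ok=True)
    description = {"Name": name, "BIDSVersion": bids_version}
    path = dataset_dir / "dataset_description.json"
    path.write_text(json.dumps(description, indent=2) + "\n")
    return path
```

Calling `write_minimal_description("my_dataset-1", "My dataset")` would mark `my_dataset-1/` as a dataset in its own right, which is the explicitness the list above asks for.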

also related: Clarify/provide example/formalize on the fact that BIDS is suitable for "study level" dataset #1739

submitted

[FIX] Move rawdata/ into sourcedata/raw in alternative structure example, clarify on naming of datasets themselves #1741
with my take on this debate ;-)

edited: the issue link had been pasted incorrectly

Remi-Gau · 2024-04-11T11:50:04Z

close in favor of #1741

Update common-principles.md

8d00f7a

Remi-Gau requested a review from DimitriPapadopoulos as a code owner March 20, 2024 22:25

Remi-Gau changed the title ~~[ENH] clarify that BIDS specifies the contents of a dataset, not the names of directories containing datasets~~ [ENH] clarify that BIDS doe snot specify names of directories containing datasets Mar 20, 2024

Remi-Gau changed the title ~~[ENH] clarify that BIDS doe snot specify names of directories containing datasets~~ [ENH] clarify that BIDS does not specify names of directories containing datasets Mar 21, 2024

yarikoptic reviewed Mar 21, 2024


This was referenced Mar 21, 2024

rawdata/ is described in text but not part of the schema #1736

Closed

[FIX] Move rawdata/ into sourcedata/raw in alternative structure example, clarify on naming of datasets themselves #1741

Merged

Remi-Gau closed this Apr 11, 2024

sappelhoff deleted the Remi-Gau-patch-3 branch June 5, 2024 17:04


Remi-Gau commented Mar 20, 2024

codecov bot commented Mar 20, 2024

Remi-Gau commented Mar 21, 2024

yarikoptic Mar 21, 2024

Remi-Gau Mar 21, 2024

yarikoptic Mar 21, 2024

effigies Mar 21, 2024

Remi-Gau Mar 21, 2024

yarikoptic Mar 21, 2024

effigies Mar 21, 2024

yarikoptic Mar 21, 2024 (edited)

yarikoptic Mar 21, 2024 (edited)

Remi-Gau commented Apr 11, 2024
