Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] clarify that BIDS does not specify names of directories containing datasets #1734

Closed
wants to merge 1 commit into from

Conversation

Remi-Gau
Copy link
Collaborator

@Remi-Gau Remi-Gau changed the title [ENH] clarify that BIDS specifies the contents of a dataset, not the names of directories containing datasets [ENH] clarify that BIDS doe snot specify names of directories containing datasets Mar 20, 2024
Copy link

codecov bot commented Mar 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.93%. Comparing base (6d13c80) to head (8d00f7a).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1734   +/-   ##
=======================================
  Coverage   87.93%   87.93%           
=======================================
  Files          16       16           
  Lines        1351     1351           
=======================================
  Hits         1188     1188           
  Misses        163      163           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Remi-Gau Remi-Gau changed the title [ENH] clarify that BIDS doe snot specify names of directories containing datasets [ENH] clarify that BIDS does not specify names of directories containing datasets Mar 21, 2024
@Remi-Gau
Copy link
Collaborator Author

@yarikoptic pinging you here because the original issue was opened after a chat with you and @sappelhoff in Copenhagen last year.


Note that rawdata here is an arbitrarily-chosen name.
BIDS specifies the contents of a dataset,
not the names of directories containing datasets.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot the details if I participated, but this particular sentence placed here is more confusing than helpful:

BIDS does specify names of the folders which are to contain (sub)datasets, i.e. derivatives, rawdata.

AFAIK it is only the top level folder of the dataset itself, if not contained within another BIDS dataset (like rawdata/) is what not specified by BIDS. If that is what was intended by this sentence, then it should be rephrased and I would not emphasize it as a note.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BIDS does specify names of the folders which are to contain (sub)datasets, i.e. derivatives, rawdata.

I think that's the point @sappelhoff and I were trying to make and it would be good to clarify in this PR.
From our reading rawdata has not been one of the recognized "top level" directory that is allowed by BIDS.

https://bids-specification.readthedocs.io/en/latest/common-principles.html#other-top-level-directories

Maybe I recall wrongly but I think you were arguing that one could have in a BIDS (presumably a derivative one) a rawdata folder that would be where the raw bids dataset would go.

In any case I also agree that the phrasing should be improved.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! Indeed, rawdata/ doesn't not make sense for raw BIDS dataset, since its presence and restriction that it should contain "BIDS dataset" implies that current dataset is a derivative dataset. But that is an entirely different aspect from the original sentence IMHO. Also it is not in the schema et all. Filed

which could be addressed as part of this PR but feels like a separate issue or may be not and that is what underlying goal here -- to clarify that it is non-mandatory but "specified" for derivative datasets? And then separate sentence on the fact that overall name of the "not nested inside rawdata/" BIDS dataset is arbitrary?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rawdata/ has never been permitted in a BIDS dataset. It is used as an example where a BIDS dataset is placed alongside its sources and derivatives, rather than one nested within another. We could equally well say that the following are valid:

rawdata/  # Collection of BIDS datasets
    ds000001/
    ds000002/
sourcedata/  # DICOM archive
    ds000001/
    ds000002/
derivatives/
    ds000001/
        preprocessing/
        analysis/
    ds000002/
        preprocessing/
        analysis/

The point is that in the current example, my_dataset-1/rawdata is a BIDS dataset, and in this example rawdata/ds000001 and rawdata/ds000002 are BIDS datasets.

I don't know how better to word this, but I am describing the current state of what the validator applies to and what are outside the scope of BIDS. I'm fine with clarifying this, as it seems to be causing confusion, but I don't see what benefit there is to turning example text into a new term (and presumably doing something new with the validator...).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strong agree with @effigies on this one. If anything I'd remove the rawdata all together from the spec to avoid the confusion and the need for an explanation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that we are converging!

@effigies is my_dataset-1 in above example a BIDS dataset?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No.

Copy link
Collaborator

@yarikoptic yarikoptic Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then we simply should not talk about any "need" , "SHOULD" or any other opinion of what should or need to be in its subfolders! Moreover we never even describe "needs" - we use RFC2119, so that is "incorrect" too.

And I think that is the main confusion here!

We are in a BIDS specification document and right before here we were talking about components of a BIDS dataset (derivatives/, README etc) and then we just say that there is an "Alternative way". Well -- there is a universe of alternative non-BIDS ways! But if we are to talk about one here, we MUST talk only about an alternative BIDS way. So we could e.g. move rawdata/ under sourcedata/ and say that it can be that BIDS dataset (originally sourcedata/ is the top directory of source dataset, not listing of source datasets), and that my_dataset-1 remains implied to be a BIDS dataset.

Overall

Copy link
Collaborator

@yarikoptic yarikoptic Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

submitted

edited: was incorrectly pasted issue link

@Remi-Gau
Copy link
Collaborator Author

close in favor of #1741

@Remi-Gau Remi-Gau closed this Apr 11, 2024
@sappelhoff sappelhoff deleted the Remi-Gau-patch-3 branch June 5, 2024 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clarify that a rawdata directory is not a prescription
3 participants