GmSegChallenge2016 dataset BIDSification #199

Closed
wants to merge 11 commits into from

@valosekj valosekj commented Jan 2, 2023

This PR discusses the BIDSification of the GmSegChallenge2016 dataset.

Input datasets (training and testing) and the BIDSified dataset are located in ~/extrassd1/janvalosek/GmSegChallenge2016.

Related to #195

Input dataset structure

  • training dataset - 10 subjects from each site (10 x 4 = 40 subjects). Each subject has:
    • image.nii.gz representing T2star.nii.gz image
    • four mask-r files with manual labels of WM (pixel value 2) and GM (pixel value 1)
    • levels.txt file
  • testing dataset - 10 subjects from each site (10 x 4 = 40 subjects). Each subject has:
    • image.nii.gz representing T2star.nii.gz image
    • levels.txt file
├── training-data-gm-sc-challenge-ismrm16-v20160302b
│   ├── license.txt
│   ├── site1-sc01-image.nii.gz
│   ├── site1-sc01-levels.txt
│   ├── site1-sc01-mask-r1.nii.gz
│   ├── site1-sc01-mask-r2.nii.gz
│   ├── site1-sc01-mask-r3.nii.gz
│   ├── site1-sc01-mask-r4.nii.gz
│   ├── ...
└── test-data-gm-sc-challenge-ismrm16-v20160401
    ├── license.txt
    ├── site1-sc11-image.nii.gz
    ├── site1-sc11-levels.txt
    ├── ...
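
For reference, the input file names in the tree above could be parsed along these lines. This is only a sketch: the regular expression is inferred from the names listed here, not from an official naming specification, and `parse_input_name` and `InputFile` are hypothetical helpers that are not part of `curate_gmsegchallenge2016.py`.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the file names shown in the tree above (assumption).
INPUT_NAME = re.compile(
    r"^site(?P<site>\d+)-sc(?P<subject>\d+)-"
    r"(?P<kind>image|levels|mask-r(?P<rater>\d+))"
    r"\.(?:nii\.gz|txt)$"
)


class InputFile(NamedTuple):
    site: int
    subject: int
    kind: str              # "image", "levels" or "mask"
    rater: Optional[int]   # only set for mask files


def parse_input_name(filename: str) -> Optional[InputFile]:
    """Return the parsed fields, or None for other files such as license.txt."""
    match = INPUT_NAME.match(filename)
    if match is None:
        return None
    rater = match.group("rater")
    kind = "mask" if rater is not None else match.group("kind")
    return InputFile(
        site=int(match.group("site")),
        subject=int(match.group("subject")),
        kind=kind,
        rater=int(rater) if rater is not None else None,
    )
```

Files that do not match the pattern (for example `license.txt`) return `None`, so they can be handled separately.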

Output dataset structure

├── code
│   └── curate_gmsegchallenge2016.py
├── dataset_description.json
├── derivatives
│   ├── dataset_description.json
│   └── manual_labels
│       ├── sub-epm001
│       │   └── anat
│       │       ├── sub-epm001_T2star_label-vertebral-levels.txt
│       │       ├── sub-epm001_T2star_label-GM_mask1.json
│       │       ├── sub-epm001_T2star_label-GM_mask1.nii.gz
│       │       ├── sub-epm001_T2star_label-GM_mask2.json
│       │       ├── sub-epm001_T2star_label-GM_mask2.nii.gz
...
│       │       ├── sub-epm001_T2star_label-SC_mask1.json
│       │       ├── sub-epm001_T2star_label-SC_mask1.nii.gz
...
│       │       ├── sub-epm001_T2star_label-WM_mask1.json
│       │       ├── sub-epm001_T2star_label-WM_mask1.nii.gz
...
├── LICENSE
├── participants.json
├── participants.tsv
├── sub-epm001
│   └── anat
│       ├── sub-epm001_T2star.json
│       └── sub-epm001_T2star.nii.gz
├── sub-epm002
│   └── anat
│       ├── sub-epm002_T2star.json
│       └── sub-epm002_T2star.nii.gz
...
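
The output paths in the tree above follow a regular pattern, which could be generated roughly as below. This is a sketch under two assumptions: that the `maskN` suffix corresponds to rater N of the input `mask-rN` files, and that only the SC, GM and WM labels exist. `raw_image_path` and `mask_path` are hypothetical helper names.

```python
from pathlib import PurePosixPath

VALID_LABELS = {"SC", "GM", "WM"}


def raw_image_path(subject: str) -> PurePosixPath:
    """Path of the T2star image, relative to the dataset root."""
    return PurePosixPath(subject, "anat", f"{subject}_T2star.nii.gz")


def mask_path(subject: str, label: str, rater: int) -> PurePosixPath:
    """Path of one manual segmentation under derivatives/manual_labels."""
    if label not in VALID_LABELS:
        raise ValueError(f"Unknown label: {label}")
    filename = f"{subject}_T2star_label-{label}_mask{rater}.nii.gz"
    return PurePosixPath("derivatives", "manual_labels", subject, "anat", filename)
```

For example, `mask_path("sub-epm001", "GM", 2)` gives the path of the second GM mask of `sub-epm001`.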

SC, WM and GM segmentations

I re-created SC, WM and GM segmentations from the input manual labels, which contain both WM (pixel value 2) and GM (pixel value 1) segmentations, using sct_maths. I also created a JSON sidecar for each segmentation containing the rater's name.

SCseg (-bin 0):

os.system('sct_maths -i ' + path_file_in + ' -bin 0 -o ' + path_file_out)
logger.info(f'Using {path_file_in} to create SCseg: {path_file_out}')
# Create a json sidecar
data_json = {
    "Author": rater,
    "Label": "SC-seg-manual"
}
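
The call above builds the shell command by string concatenation and does not check the exit status returned by `os.system`. A possible alternative, shown only as a sketch, passes an argument list to `subprocess.run` so that paths containing spaces are handled and a failing `sct_maths` call raises an error. The helper names `sct_maths_cmd` and `run_sct_maths` are hypothetical; the `-i`, `-bin`, `-uthr` and `-o` flags are the ones already used in this PR.

```python
import subprocess
from typing import List


def sct_maths_cmd(path_in: str, path_out: str, flag: str, value: int) -> List[str]:
    """Build the argument list for one sct_maths call (no shell involved)."""
    return ["sct_maths", "-i", path_in, flag, str(value), "-o", path_out]


def run_sct_maths(path_in: str, path_out: str, flag: str, value: int) -> None:
    """Run sct_maths and raise CalledProcessError if it exits with an error."""
    subprocess.run(sct_maths_cmd(path_in, path_out, flag, value), check=True)
```

With this, the SCseg step would read `run_sct_maths(path_file_in, path_file_out, "-bin", 0)`.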

GMseg (-uthr 1):

os.system('sct_maths -i ' + path_file_in + ' -uthr 1 -o ' + path_file_out)
logger.info(f'Using {path_file_in} to create GMseg: {path_file_out}')
# Create a json sidecar
data_json = {
    "Author": rater,
    "Label": "GM-seg-manual"
}

WMseg (-bin 1):

os.system('sct_maths -i ' + path_file_in + ' -bin 1 -o ' + path_file_out)
logger.info(f'Using {path_file_in} to create WMseg: {path_file_out}')
# Create a json sidecar
data_json = {
    "Author": rater,
    "Label": "WM-seg-manual"
}
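
To make the three thresholds easier to review, here is a toy illustration on a flat list of pixel values (0 = background, 1 = GM, 2 = WM). It assumes that `-bin t` sets values strictly greater than `t` to 1 and everything else to 0, and that `-uthr t` sets values above `t` to 0. These semantics are an assumption that matches the intended result described in this PR; the functions below are plain Python stand-ins, not SCT code.

```python
from typing import List


def binarize(values: List[int], threshold: int) -> List[int]:
    """Assumed behaviour of `-bin`: 1 where value > threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]


def upper_threshold(values: List[int], threshold: int) -> List[int]:
    """Assumed behaviour of `-uthr`: values above threshold are set to 0."""
    return [v if v <= threshold else 0 for v in values]


labels = [0, 1, 2, 2, 1, 0]          # toy manual label: 1 = GM, 2 = WM

sc = binarize(labels, 0)             # SC = GM + WM
gm = upper_threshold(labels, 1)      # GM only
wm = binarize(labels, 1)             # WM only
```

Under these assumptions, the SC mask is the union of the GM and WM masks, and the GM and WM masks do not overlap.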

Manual raters

Based on the GM mask delineation section of Prados et al., 2017:

Rater 1 (MY) and rater 3 (GD), ...
Rater 2 (SMD) and 4 (BL) ...

I included the following raters' names in the JSON sidecars:

rater_to_name = {
    1: 'Marios C. Yiannakas',
    2: 'Sara M. Dupont',
    3: 'Gergely David',
    4: 'Bailey Lyttle'
}
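
Putting the mapping and the sidecar fields together, the sidecar creation could look roughly like this. It is a sketch only: `build_sidecar` and `write_sidecar` are hypothetical names, and the 4-space JSON indentation is an assumption rather than what the curation script necessarily does.

```python
import json
from pathlib import Path

rater_to_name = {
    1: 'Marios C. Yiannakas',
    2: 'Sara M. Dupont',
    3: 'Gergely David',
    4: 'Bailey Lyttle'
}


def build_sidecar(rater: int, label: str) -> dict:
    """Return the sidecar content for one rater and one label (SC, GM or WM)."""
    return {
        "Author": rater_to_name[rater],
        "Label": f"{label}-seg-manual",
    }


def write_sidecar(path_json: Path, rater: int, label: str) -> None:
    """Write the sidecar next to the corresponding .nii.gz segmentation."""
    path_json.write_text(json.dumps(build_sidecar(rater, label), indent=4) + "\n")
```

For instance, `build_sidecar(3, "GM")` gives the content of the sidecar for a GM mask drawn by rater 3.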

Additional files

dataset_description.json, participants.json, participants.tsv, and README are attached to this PR to allow easy feedback. @jcohenadad could you please add the contact person and email conversation to the README?
LICENSE is a copy of the licence from the input dataset.

UPDATE 2023-01-03: The *-levels.txt files containing information about the vertebral levels were copied to derivatives/manual_labels. To make it clear that these txt files contain information about vertebral levels and not about discs, I chose the following filename: *_T2star_label-vertebral-levels.txt, for example: sub-epm001_T2star_label-vertebral-levels.txt.

@jcohenadad

I have been considering the creation of vertebral levels nii files using sct_label_vertebrae or sct_label_utils.

I'm not sure these data will be used to train a DL model for disc labeling because the discs are poorly visible on these axial GRE data. So maybe it's ok to keep the disc labels as is.

@jcohenadad

@valosekj the PR is still in 'draft' mode but you requested a review from me. Is it ready for review?

@valosekj

valosekj commented Jan 3, 2023

> I'm not sure these data will be used to train a DL model for disc labeling because the discs are poorly visible on these axial GRE data. So maybe it's ok to keep the disc labels as is.

Okay. I kept the labeling as txt files and placed them under derivatives/manual_labels/xxx/anat/:

├── derivatives
│   └── manual_labels
│       ├── sub-epm001
│       │   └── anat
│       │       ├── sub-epm001_T2star_label-vertebral-levels.txt
│       │       ├── sub-epm001_T2star_label-GM_mask1.json
│       │       ├── sub-epm001_T2star_label-GM_mask1.nii.gz
...

To make it clear that these txt files contain information about vertebral levels and not about discs, I chose the following filename: *_T2star_label-vertebral-levels.txt.

@valosekj

valosekj commented Jan 3, 2023

> @valosekj the PR is still in 'draft' mode but you requested a review from me. Is it ready for review?

Sorry about that.
Based on #195 (comment), I renamed derivatives/labels to derivatives/manual_labels and got rid of the desc-manual.
Now, the PR is ready for review.

@valosekj valosekj marked this pull request as ready for review January 3, 2023 16:45

@jcohenadad jcohenadad left a comment

I haven't tested the script to generate the separate labels but overall it looks good! thank you @valosekj 🙏

@valosekj

valosekj commented Jan 3, 2023

> I haven't tested the script to generate the separate labels but overall it looks good! thank you @valosekj 🙏

Okay! Thank you. I will upload the dataset to git-annex. Then I will close this PR without merging and delete the branch.

Regarding the README; can I add Ferran Prados as a contact person?

@jcohenadad

> Regarding the README; can I add Ferran Prados as a contact person?

👍

@valosekj

valosekj commented Jan 5, 2023

The dataset has been uploaded to git-annex, so I am closing this PR.

@valosekj valosekj closed this Jan 5, 2023
@valosekj valosekj deleted the jv/curate_gmsegchallenge2016 branch January 5, 2023 17:28