add examples for dataset_conversion for TCIA DICOM SEG/RTSTRUCT datasets #2668

Open
kirbyju opened this issue Jan 18, 2025 · 3 comments
@kirbyju

kirbyju commented Jan 18, 2025

Hi there!

At The Cancer Imaging Archive (TCIA) we host a large number of DICOM datasets that pair images (CT, MR, PT, etc.) with corresponding tumor and organ segmentations (RTSTRUCT, SEG). Would it be possible to create some examples that help people get from those DICOM datasets to the input format nnUNet expects? I'd be happy to hop on a call to discuss details if there is interest.

Best,
Justin

@seziegler
Member

Hi Justin,

thanks for the suggestion!
There are several example dataset conversions here. Do you think they are sufficient?

Best,
Sebastian

@kirbyju
Author

kirbyju commented Jan 20, 2025

Hi Sebastian,

I saw these, but they all start from some form of NIfTI data, which is not what our API returns when you download data: we provide the raw DICOM data. I think we could save the 25-30k users who visit our site every month a lot of time if we collaborated on a notebook or conversion script that does the following:

  1. Download the source DICOM images (CT/MR/etc) and segmentations (SEG or RTSTRUCT) of interest from the TCIA REST API. I have lots of examples for doing this in https://github.com/kirbyju/TCIA_Notebooks, but https://github.com/kirbyju/TCIA_Notebooks/blob/main/TCIA_REST_API_Downloads.ipynb and https://github.com/kirbyju/TCIA_Notebooks/blob/main/TCIA_Segmentations.ipynb are most applicable here.
  2. Analyze the DICOM metadata to extract the key info necessary for nnUNet inputs.
  3. Convert the DICOM to NIfTI and save the files using the file names nnUNet expects,
    e.g. {CASE_IDENTIFIER}_{XXXX}.{FILE_ENDING} for images and {CASE_IDENTIFIER}.{FILE_ENDING} for segmentations. I think it should also retain the full DICOM metadata as a JSON companion file for each DICOM series, in case other metadata such as scanner type, sequence type, or patient demographics turns out to be useful.
  4. Use the info from the previous steps to automatically generate a dataset.json.
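
A minimal sketch of the naming and dataset.json parts of steps 3 and 4, assuming the nnU-Net v2 conventions (a four-digit channel suffix on image files, and a dataset.json with `channel_names`, `labels`, `numTraining` and `file_ending`). The helper names `image_filename`, `label_filename` and `build_dataset_json` are hypothetical and not part of nnUNet or tcia_utils; the DICOM-to-NIfTI conversion itself is deliberately left out.

```python
import json
from pathlib import Path


def image_filename(case_id: str, channel: int, file_ending: str = ".nii.gz") -> str:
    """Image name: {CASE_IDENTIFIER}_{XXXX}{FILE_ENDING}, XXXX = zero-padded channel index."""
    return f"{case_id}_{channel:04d}{file_ending}"


def label_filename(case_id: str, file_ending: str = ".nii.gz") -> str:
    """Segmentation name: {CASE_IDENTIFIER}{FILE_ENDING} (no channel suffix)."""
    return f"{case_id}{file_ending}"


def build_dataset_json(channel_names, label_names, num_training, file_ending=".nii.gz"):
    """Assemble a dataset.json dictionary in the layout assumed for nnU-Net v2."""
    labels = {"background": 0}
    # Foreground labels get consecutive integer values starting at 1.
    for value, name in enumerate(label_names, start=1):
        labels[name] = value
    return {
        "channel_names": {str(i): name for i, name in enumerate(channel_names)},
        "labels": labels,
        "numTraining": num_training,
        "file_ending": file_ending,
    }


def write_dataset_json(dataset_dir, dataset):
    """Write dataset.json into the dataset folder and return its path."""
    path = Path(dataset_dir) / "dataset.json"
    path.write_text(json.dumps(dataset, indent=2))
    return path
```

For example, `image_filename("case001", 0)` gives `case001_0000.nii.gz` and `label_filename("case001")` gives `case001.nii.gz`. The channel and label names would come from the DICOM metadata gathered in step 2 (for instance Modality and the segment or ROI names).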

Then everyone who intends to use nnUNet after downloading the data wouldn't have to repeat these steps. Do you think that could help?

I was also wondering whether I should add a parameter to the downloader function of https://pypi.org/project/tcia-utils/, something like downloadSeries(data, format = 'nnUnet'), so that it automatically converts the downloaded DICOM data into this NIfTI + JSON layout, saved with the file name structure you're expecting. Do you think that would be useful?

Best,
Justin

@seziegler
Member

Hi Justin,

That would be really helpful. Would you be willing to make a pull request with respective examples in the dataset_conversion folder? We would be happy to accept it!

Regarding tcia_utils, an nnUNet format parameter would be very useful. It would make sense to distinguish between format = 'nnUnetv1' and format = 'nnUnetv2', both for clarity and to leave room for a possible v3 format in the future.
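
To illustrate why a versioned value helps, here is a rough sketch of how such a `format` argument could be dispatched. It is not the tcia_utils API: `nnunet_labels` and `SUPPORTED_NNUNET_FORMATS` are hypothetical names, and the sketch relies on the assumption that the v1 dataset.json maps integer strings to label names while v2 maps label names to integers.

```python
# Hypothetical set of accepted values; a future 'nnUnetv3' would be added here.
SUPPORTED_NNUNET_FORMATS = ("nnUnetv1", "nnUnetv2")


def nnunet_labels(label_names, format):
    """Return the 'labels' entry of dataset.json for the requested nnU-Net version.

    The parameter is called `format` only to mirror the proposal above; it
    shadows the Python builtin inside this function.
    """
    if format not in SUPPORTED_NNUNET_FORMATS:
        raise ValueError(
            f"Unknown format {format!r}; expected one of {SUPPORTED_NNUNET_FORMATS}"
        )
    names = ["background", *label_names]
    if format == "nnUnetv1":
        # Assumed v1 layout: {"0": "background", "1": "tumor", ...}
        return {str(value): name for value, name in enumerate(names)}
    # Assumed v2 layout: {"background": 0, "tumor": 1, ...}
    return {name: value for value, name in enumerate(names)}
```

In this sketch an unversioned value such as 'nnUnet' is rejected rather than silently mapped to one version, so callers have to state which layout they want.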

Best,
Sebastian
