add examples for dataset_conversion for TCIA DICOM SEG/RTSTRUCT datasets #2668

kirbyju · 2025-01-18T03:27:02Z

Hi there!

In The Cancer Imaging Archive (TCIA) we host a large number of DICOM datasets that contain images (CT, MR, PT, etc.) with corresponding tumor and organ segmentations (RTSTRUCT, SEG). Would it be possible to create some examples that help people get from those DICOM datasets to the input nnUNet expects? I'd be happy to hop on a call to discuss the details if there is interest.

Best,
Justin

seziegler · 2025-01-20T15:37:23Z

Hi Justin,

thanks for the suggestion!
There are several example dataset conversions in the dataset_conversion folder. Do you think they are sufficient?

Best,
Sebastian

kirbyju · 2025-01-20T16:51:00Z

Hi Sebastian,

I saw these, but they all start from some version of NIfTI data, which is not what our API returns when you download data: we provide the raw DICOM data. I think it could save the 25-30k users who visit our site every month a lot of time if we collaborated on a notebook or conversion script that covers the following steps:

1. Download the source DICOM images (CT/MR/etc.) and segmentations (SEG or RTSTRUCT) of interest from the TCIA REST API. I have lots of examples for doing this in https://github.com/kirbyju/TCIA_Notebooks, but https://github.com/kirbyju/TCIA_Notebooks/blob/main/TCIA_REST_API_Downloads.ipynb and https://github.com/kirbyju/TCIA_Notebooks/blob/main/TCIA_Segmentations.ipynb are the most applicable here.
2. Analyze the DICOM metadata to extract the key information needed for the nnUNet inputs.
3. Convert the DICOM to NIfTI and save the files with the file name formats nnUNet requires.
   - E.g. `{CASE_IDENTIFIER}_{XXXX}.{FILE_ENDING}` for images and `{CASE_IDENTIFIER}.{FILE_ENDING}` for segmentations. I think it should also retain the full DICOM metadata as a companion JSON file for each DICOM series, in case other metadata such as scanner type, sequence type or patient demographics turn out to be useful.
4. Use the information from the previous steps to automatically generate a dataset.json.
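
The file-naming part of steps 2 and 3 could look roughly like the sketch below. It is not part of nnU-Net or tcia-utils: `plan_nnunet_files`, the `IMAGE_CHANNELS` order, the `TCIA_###` case identifiers and the `metadata/` folder for the companion JSON files are all hypothetical choices. Each input dict is assumed to carry `PatientID`, `Modality` and `SeriesInstanceUID` keys (fields I'd expect in TCIA series metadata), and the sketch assumes one series per modality per patient. It only plans output paths; the actual conversion would still need a tool such as dcm2niix or SimpleITK for the images and something like rt-utils or pydicom-seg to rasterize RTSTRUCT/SEG into a single label map.

```python
from collections import defaultdict

# Hypothetical channel order: list index i becomes the _XXXX suffix (0000, 0001, ...).
IMAGE_CHANNELS = ["CT", "PT"]
# DICOM Modality values treated as segmentations (written as the label file).
SEG_MODALITIES = {"SEG", "RTSTRUCT"}


def plan_nnunet_files(series, case_prefix="TCIA", file_ending=".nii.gz"):
    """Map series metadata dicts to nnU-Net style relative output paths.

    Assumes each dict has 'PatientID', 'Modality' and 'SeriesInstanceUID',
    and at most one series per modality per patient (otherwise paths collide).
    """
    # Group all series belonging to the same patient into one nnU-Net case.
    by_patient = defaultdict(list)
    for s in series:
        by_patient[s["PatientID"]].append(s)

    plan = {}
    # Sorted PatientIDs get consecutive case identifiers: TCIA_001, TCIA_002, ...
    for n, patient_id in enumerate(sorted(by_patient), start=1):
        case_id = f"{case_prefix}_{n:03d}"
        for s in by_patient[patient_id]:
            modality = s["Modality"]
            if modality in SEG_MODALITIES:
                # Segmentations: {CASE_IDENTIFIER}{FILE_ENDING}, no channel suffix.
                stem, folder = case_id, "labelsTr"
            elif modality in IMAGE_CHANNELS:
                # Images: {CASE_IDENTIFIER}_{XXXX}{FILE_ENDING}, XXXX = channel index.
                stem = f"{case_id}_{IMAGE_CHANNELS.index(modality):04d}"
                folder = "imagesTr"
            else:
                continue  # Modality not used as an nnU-Net input (e.g. SR).
            plan[s["SeriesInstanceUID"]] = {
                "case_id": case_id,
                "patient_id": patient_id,
                "nifti": f"{folder}/{stem}{file_ending}",
                # Full DICOM header kept next to, not inside, imagesTr/labelsTr.
                "metadata_json": f"metadata/{stem}.json",
            }
    return plan
```

Keeping the PatientID in the plan makes it easy to map nnU-Net case identifiers back to TCIA patients later.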

Then anyone who intends to use nnUNet after downloading the data wouldn't have to waste time repeating these steps. Do you think that would help?

I was also wondering whether I should add a parameter to the downloader function of https://pypi.org/project/tcia-utils/, e.g. `downloadSeries(data, format = 'nnUnet')`, that automatically converts the downloaded DICOM data into NIfTI + JSON files saved with the file name structure you expect. Do you think that would be useful?

Best,
Justin

seziegler · 2025-01-22T09:57:10Z

Hi Justin,

That would be really helpful. Would you be willing to make a pull request with the corresponding examples in the dataset_conversion folder? We would be happy to accept it!

Regarding tcia_utils, an nnU-Net format parameter would be very useful. It would make sense to distinguish between `format = 'nnUnetv1'` and `format = 'nnUnetv2'`, both for clarity and to be able to adapt to a possible v3 format in the future.
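
To illustrate why the distinction matters, here is a minimal sketch of how a hypothetical `format` switch could emit the two dataset.json variants. The function name and arguments are made up, and the key sets are only a minimal subset based on my understanding of the two versions: v1 uses `modality`, an integer-to-name `labels` mapping and explicit `training`/`test` lists (in `TaskXXX_Name` folders), while v2 uses `channel_names`, a name-to-integer `labels` mapping and `file_ending` (in `DatasetXXX_Name` folders). Please check it against the nnU-Net documentation; for v2, the `generate_dataset_json` helper in `nnunetv2.dataset_conversion` can be used instead.

```python
def build_dataset_json(fmt, channel_names, label_names, train_cases, test_cases=()):
    """Return a dataset.json dict for a hypothetical 'nnUnetv1'/'nnUnetv2' format option.

    channel_names: modality per input channel, e.g. ["CT", "PT"]
    label_names:   label names in integer order, background first
    train_cases / test_cases: case identifiers such as "TCIA_001"
    """
    if fmt == "nnUnetv2":
        return {
            "channel_names": {str(i): m for i, m in enumerate(channel_names)},
            # v2: label name -> integer value
            "labels": {name: i for i, name in enumerate(label_names)},
            "numTraining": len(train_cases),
            "file_ending": ".nii.gz",
        }
    if fmt == "nnUnetv1":
        return {
            "modality": {str(i): m for i, m in enumerate(channel_names)},
            # v1: integer value (as a string) -> label name
            "labels": {str(i): name for i, name in enumerate(label_names)},
            "numTraining": len(train_cases),
            "numTest": len(test_cases),
            # v1 lists every case explicitly; image paths omit the _XXXX suffix.
            "training": [
                {"image": f"./imagesTr/{c}.nii.gz", "label": f"./labelsTr/{c}.nii.gz"}
                for c in train_cases
            ],
            "test": [f"./imagesTs/{c}.nii.gz" for c in test_cases],
        }
    # Unknown versions fail loudly, leaving room for a future 'nnUnetv3' branch.
    raise ValueError(f"unsupported format: {fmt!r}")
```

The returned dict can then be written to the dataset folder with `json.dump`.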

Best,
Sebastian

FabianIsensee assigned seziegler Jan 18, 2025
