A dataset of 831 3D Multiphase CT exams of renal masses from UCSF.

LarsonLab/UCSF-RMaC

UCSF RMaC: UCSF Renal Mass CT Dataset

This dataset provides 831 3D multiphase CT exams of renal masses from UCSF. Each exam includes an annotation of the renal mass, in the form of bounding boxes or polygon masks, and the pathology results for each renal mass, obtained after surgery, which serve as the ground-truth outcome. The purpose of this dataset is to support development of new algorithms that better distinguish aggressive from indolent disease based on non-invasive imaging.

The CT volumes were acquired at UCSF between 2002 and 2018, and only renal masses less than or equal to 7 cm (T1 stage) were included. Each exam has an unenhanced CT volume and up to three contrast-enhanced CT phases (arterial/corticomedullary, portal venous/nephrogenic, delayed/excretory). For each exam, the contrast-enhanced CT volumes are registered to the unenhanced volume. For a minority of the exams, registration was unsuccessful, but these exams are still included for further investigation.

Data Access (In Progress)

The dataset is hosted on AWS S3. It can be found at the following URIs:

The dataset can be downloaded directly by clicking on the following URLs:

Alternatively, the dataset can be downloaded via the AWS CLI:

  1. Install the AWS CLI.
  2. Copy the data using the S3 URI, giving a local destination path:
aws s3 cp <URI> <local-destination>

File Structure of Dataset

All CT imaging data and associated metadata are organized in HDF5 container files named by patient ID (a random 10-character alphanumeric code). A CSV file, phase_reg_key.csv, is included as a key describing which phases are available for each subject and the registration status of each CT volume.

Within phase_reg_key.csv:

  • 0 = no volume
  • 1 = volume exists but is not registered to the unenhanced (noncon) volume
  • 2 = volume exists and is registered to the unenhanced (noncon) volume
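The status codes above can be used to select exams programmatically. The following is a minimal sketch, not the authors' tooling: it assumes the first column of phase_reg_key.csv holds the patient ID and every remaining column is named after a phase. The column names and patient IDs in the example are hypothetical; check the actual header of the file before relying on them.

```python
import csv
import io

# Meaning of the integer codes, as documented for phase_reg_key.csv.
STATUS = {0: "no volume", 1: "not registered", 2: "registered"}


def load_phase_key(lines):
    """Parse the key into {patient_id: {phase: code}}.

    Assumption: the first column is the patient ID and every
    other column holds the status code for one phase.
    """
    reader = csv.reader(lines)
    header = next(reader)
    phases = header[1:]
    key = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        key[row[0]] = {p: int(v) for p, v in zip(phases, row[1:])}
    return key


def ids_with_status(key, phase, code):
    """Return the sorted patient IDs whose `phase` column equals `code`."""
    return sorted(pid for pid, codes in key.items() if codes.get(phase) == code)


# Synthetic stand-in for the real file (hypothetical header and IDs).
example = io.StringIO(
    "PID,arterial,portven,delay\n"
    "AAAAAAAAAA,2,2,0\n"
    "BBBBBBBBBB,1,2,2\n"
)
key = load_phase_key(example)
print(ids_with_status(key, "arterial", 2))  # exams with a registered arterial phase
```

With the real key, pass an open file instead of the synthetic text, for example `open("phase_reg_key.csv", newline="")`.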

The file structure:

.
├── 08FBroxzI6.hdf5
├── 0A87Rq5Hkl.hdf5
├── 0ByGP3oWJi.hdf5
├── 0cb2z7Hao2.hdf5
...
├── phase_reg_key.csv
...
├── Zu1bNdA2od.hdf5
├── ZYUz7t5hOn.hdf5
└── Zz99Ji2swU.hdf5
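Because each container is named by patient ID, the IDs can be recovered from the file names alone. A minimal sketch, assuming the dataset has been downloaded into a single local directory laid out as shown above (the helper name `list_patient_ids` is illustrative, not part of the repository):

```python
from pathlib import Path


def list_patient_ids(dataset_dir):
    """Return patient IDs taken from the stem of each *.hdf5 file.

    Non-HDF5 files such as phase_reg_key.csv are ignored. The sort is
    case-insensitive so the order resembles the listing above.
    """
    stems = (p.stem for p in Path(dataset_dir).glob("*.hdf5"))
    return sorted(stems, key=str.casefold)
```

The result can be compared against the IDs in phase_reg_key.csv to check that a download is complete.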

Within an HDF5 container file, the CT volumes are organized as follows:

└── Zz99Ji2swU.hdf5
   ├── attrs
   ├── arterial
   ├── delay
   ├── mask
   ├── noncon
   └── portven

The attributes include selected metadata and image labels.

The HDF5 files can be read in Python using the h5py package. For example, to print the datasets and attributes and extract the unenhanced (noncon) CT volume in an HDF5 file:

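import h5py

# Open the container read-only; the context manager closes it on exit.
with h5py.File("Zz99Ji2swU.hdf5", "r") as hdf:
    print(f"HDF5 file datasets: {list(hdf.keys())}")
    print(f"HDF5 file attributes: {list(hdf.attrs.keys())}")
    # Slicing with [:] reads the whole dataset into memory as a NumPy array.
    noncon = hdf["noncon"][:]
    print(f"Shape of noncon volume: {noncon.shape}")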

Output:

HDF5 file datasets: ['arterial', 'delay', 'mask', 'noncon', 'portven']
HDF5 file attributes: ['Manufacturer', 'PID', 'Patient Age', 'Patient Sex', 'arterial_pixdim', 'delay_pixdim', 'mask_pixdim', 'noncon_pixdim', 'pathology', 'pathology_grade', 'portven_pixdim', 'tumor_type']
Shape of noncon volume: (512, 512, 49)
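The attribute names in the output suggest that each volume has a matching `<name>_pixdim` attribute. The sketch below collects the shape and that attribute for every phase present in an open file. It is an illustration under assumptions: that a phase without a volume may be absent from the file, and that `<name>_pixdim` describes voxel spacing (its exact format is not documented here, so it is returned unchanged). The helper name `summarize_exam` is hypothetical.

```python
# Dataset names of the CT phases, as listed in the container layout.
PHASES = ("noncon", "arterial", "portven", "delay")


def summarize_exam(hdf):
    """Map each phase found in an open HDF5 file to its shape and pixdim.

    `hdf` is expected to behave like an open h5py.File: membership tests
    and indexing by dataset name, plus an `attrs` mapping.
    """
    summary = {}
    for name in PHASES:
        if name not in hdf:  # assumption: a missing phase may be absent
            continue
        summary[name] = {
            # .shape is read from metadata, so no voxel data is loaded
            "shape": tuple(hdf[name].shape),
            # returned as stored; None if the attribute does not exist
            "pixdim": hdf.attrs.get(f"{name}_pixdim"),
        }
    return summary
```

Call `summarize_exam(hdf)` inside the `with h5py.File(...)` block from the example above.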

Tutorials

Data Curation

The curation Jupyter notebooks are collected in /curation and are numbered 01-07 to indicate each step of the curation process.

A sample conda environment can be found in environment.yml.

curation/utils.py -- contains utility functions for the curation steps

Contributors

Project initiation and leadership - Peder Larson, PhD, and Zhen Jane Wang, MD

Dataset Extraction - Sage Kramer, MD

Curation - Sage Kramer, MD, Sule Sahin, PhD, Samantha Jones, Ernesto Diaz

Data Management - Sule Sahin, PhD, Abhejit Rajagopal, PhD, Ernesto Diaz, Qing Dai
