Add decompression preprocessing step to TotalSegmentator2D for more efficient slice loading #705

Merged: 8 commits, Nov 12, 2024

nkaenzig (Collaborator) commented Nov 12, 2024

Closes #704

The slow data loading was caused by the .gz compression of the CT and mask .nii.gz files, combined with reading individual slices. Reading the first few slices from a compressed NIfTI file with nibabel is fast, but the deeper the slice index, the slower the read, because every slice read triggers a sequential decompression of the file. Unpacking the .nii.gz file beforehand and then reading the resulting .nii file makes reading deeper slices much faster.
This mainly affects loading the data in 2D fashion: when loading in 3D, the whole CT is typically read only once, so it only has to be decompressed once.
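The effect can be reproduced without any imaging library. The following is a toy sketch using only the Python standard library, not the code from this PR: `SLICE_BYTES`, `NUM_SLICES` and the helper names are made up for illustration, and a "slice" is just a fixed-size byte range. It shows that both readers return the same bytes, but the gzip reader can only reach the offset by decompressing everything before it.

```python
import gzip

SLICE_BYTES = 1024  # hypothetical size of one slice in bytes
NUM_SLICES = 64     # hypothetical number of slices in the toy volume


def write_volume(raw_path: str, gz_path: str) -> None:
    """Write the same toy volume once uncompressed and once gzip-compressed."""
    # Slice i consists of SLICE_BYTES copies of the byte value i.
    data = b"".join(bytes([i]) * SLICE_BYTES for i in range(NUM_SLICES))
    with open(raw_path, "wb") as f:
        f.write(data)
    with gzip.open(gz_path, "wb") as f:
        f.write(data)


def read_slice_raw(path: str, index: int) -> bytes:
    """Jump straight to the slice offset; the cost does not grow with index."""
    with open(path, "rb") as f:
        f.seek(index * SLICE_BYTES)
        return f.read(SLICE_BYTES)


def read_slice_gz(path: str, index: int) -> bytes:
    """Forward seek in a gzip stream: all preceding data is decompressed first."""
    with gzip.open(path, "rb") as f:
        f.seek(index * SLICE_BYTES)
        return f.read(SLICE_BYTES)
```

Because each call to `read_slice_gz` reopens the file, the work done per read grows with the slice index, which matches the behaviour described above for deep slices.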

This PR reduces the runtime for iterating over the complete dataset from 1 h to 2-3 min, using a torch DataLoader with 16 workers.
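A decompression preprocessing step of this kind could look roughly like the sketch below. It is not the implementation merged in this PR: the function names `decompress_nifti` and `decompress_dataset`, the choice to write the `.nii` file next to the `.nii.gz` file, and the `overwrite` flag are all assumptions made for illustration.

```python
import gzip
import shutil
from pathlib import Path


def decompress_nifti(gz_path: Path, overwrite: bool = False) -> Path:
    """Unpack `<name>.nii.gz` to `<name>.nii` in the same directory."""
    if gz_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {gz_path}")
    out_path = gz_path.with_suffix("")  # strips only the trailing ".gz"
    if out_path.exists() and not overwrite:
        return out_path  # already unpacked, skip the work
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams in chunks, low memory use
    return out_path


def decompress_dataset(root: Path) -> list[Path]:
    """Unpack every .nii.gz file found below `root` (hypothetical layout)."""
    return [decompress_nifti(p) for p in sorted(root.rglob("*.nii.gz"))]
```

Running `decompress_dataset(Path("/path/to/dataset"))` once before training trades extra disk space for fast slice access afterwards; the dataset class would then point at the `.nii` files instead of the `.nii.gz` files.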

nkaenzig linked an issue Nov 12, 2024 that may be closed by this pull request
nkaenzig marked this pull request as ready for review November 12, 2024 11:09
nkaenzig enabled auto-merge (squash) November 12, 2024 14:08
nkaenzig merged commit 50c90a3 into main Nov 12, 2024
6 checks passed
nkaenzig deleted the optimize-total-segmentator-dataset branch November 12, 2024 14:16
nkaenzig self-assigned this Nov 13, 2024
Linked issue: Slow dataloader iterations for TotalSegmentator2D dataset