Adding DICOM module as another WSI-Reader
FabianHoerst committed Apr 2, 2024
1 parent 0a6a231 commit 085dbee
Showing 52 changed files with 986 additions and 242 deletions.
115 changes: 62 additions & 53 deletions PathoPatch.ipynb
@@ -1,48 +1,32 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyOSg5Tomy2ythze0d941UHb",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
"colab_type": "text",
"id": "view-in-github"
},
"source": [
"<a href=\"https://colab.research.google.com/github/TIO-IKIM/PathoPatcher/blob/main/PathoPatch.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# PathoPatch Example\n"
],
"metadata": {
"id": "GA0VTVQzkmEJ"
}
},
"source": [
"# PathoPatch Example\n"
]
},
{
"cell_type": "markdown",
"source": [
"### 1. Installation (OpenSlide, CuCIM, PathoPatch)"
],
"metadata": {
"id": "chE1_Uyxk4Lt"
}
},
"source": [
"### 1. Installation (OpenSlide, CuCIM, PathoPatch)"
]
},
{
"cell_type": "code",
@@ -56,8 +40,8 @@
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[33m\r0% [Working]\u001b[0m\r \rHit:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease\n",
"Hit:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64 InRelease\n",
@@ -87,21 +71,18 @@
},
{
"cell_type": "code",
"source": [
"!pip install openslide-python pathopatch"
],
"execution_count": 2,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "WkCsLuQAj41e",
"outputId": "796bb818-c29c-4a22-df88-9eecebc897ff"
},
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: openslide-python in /usr/local/lib/python3.10/site-packages (1.3.1)\n",
"Requirement already satisfied: pathopatch in /usr/local/lib/python3.10/site-packages (0.9.5.1b0)\n",
@@ -173,25 +154,25 @@
"\u001b[0m"
]
}
],
"source": [
"!pip install openslide-python pathopatch"
]
},
{
"cell_type": "code",
"source": [
"!pip install cucim"
],
"execution_count": 3,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "xdsM1hLnkC1D",
"outputId": "e9209dc9-7de8-44dc-b0a2-64fa1c355ec5"
},
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"output_type": "stream",
"text": [
"Collecting cucim\n",
" Downloading cucim-23.10.0-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (43 kB)\n",
@@ -207,36 +188,34 @@
"\u001b[0m"
]
}
],
"source": [
"!pip install cucim"
]
},
{
"cell_type": "markdown",
"source": [
"## 2. Download files"
],
"metadata": {
"id": "MNTTsbCblo-Q"
}
},
"source": [
"## 2. Download files"
]
},
{
"cell_type": "code",
"source": [
"!mkdir wsi_data\n",
"!wget --directory-prefix ./wsi_data https://openslide.cs.cmu.edu/download/openslide-testdata/Aperio/CMU-1-Small-Region.svs\n",
"!wget --directory-prefix ./wsi_data https://openslide.cs.cmu.edu/download/openslide-testdata/Aperio/CMU-1.svs"
],
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ldwjSWvYmbic",
"outputId": "4f3994ce-6486-4c7b-9821-f767fcc98720"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"output_type": "stream",
"text": [
"mkdir: cannot create directory ‘wsi_data’: File exists\n",
"--2024-03-26 21:12:26-- https://openslide.cs.cmu.edu/download/openslide-testdata/Aperio/CMU-1-Small-Region.svs\n",
@@ -260,16 +239,46 @@
"CMU-1.svs 36%[======> ] 61.51M 457KB/s eta 2m 52s "
]
}
],
"source": [
"!mkdir wsi_data\n",
"!wget --directory-prefix ./wsi_data https://openslide.cs.cmu.edu/download/openslide-testdata/Aperio/CMU-1-Small-Region.svs\n",
"!wget --directory-prefix ./wsi_data https://openslide.cs.cmu.edu/download/openslide-testdata/Aperio/CMU-1.svs"
]
},
{
"cell_type": "code",
"source": [],
"execution_count": null,
"metadata": {
"id": "6smBlRYcmgPQ"
},
"execution_count": null,
"outputs": []
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"DICOM-Files:\n",
"\n",
"The whole DICOM folder must be provided. Either use the wsi_extension `dcm` when loading WSIs, or provide a .csv filelist that contains the path to the DICOM folder."
]
}
]
}
],
"metadata": {
"colab": {
"authorship_tag": "ABX9TyOSg5Tomy2ythze0d941UHb",
"include_colab_link": true,
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
27 changes: 23 additions & 4 deletions README.md
@@ -54,6 +54,8 @@ We provide different use cases - Offline-Dataset (Store on Disk :floppy_disk:) a

In our pre-processing pipeline, we extract square patches from detected tissue areas, load annotation files (`.json`) and apply color normalizations. We build on the popular [OpenSlide](https://openslide.org/) library and extend it with the [RAPIDS cuCIM](https://github.com/rapidsai/cucim) framework to speed up patch extraction.

> We support all OpenSlide file formats as well as DICOM (`.dcm`) files, using [`wsidicom`](https://github.com/imi-bigpicture/wsidicom) and [`wsidicomizer`](https://github.com/imi-bigpicture/wsidicomizer).
**Explanations for use cases :floppy_disk: vs :zap:**
<details>
<summary>Offline-Dataset</summary>
@@ -369,10 +371,20 @@ In our Pre-Processing pipeline, we are able to extract quadratic patches from de
An example notebook is given [here](PathoPatch.ipynb):
<a href="https://colab.research.google.com/github/TIO-IKIM/PathoPatcher/blob/main/PathoPatch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
## Roadmap
- :construction: In-memory inference loader - This feature is currently under development. Once completed, it will allow a dataset to be loaded into memory for inference, eliminating the need to store it on disk. Useful for inference
### DICOM-conversion
To convert WSI-Files into DICOM-Format, please follow [this documentation](docs/DICOM.md)
- :soon: Dicom support - We plan to add another backend for handling DICOM files with a different structure
### Filelist with metadata
See here: [examples/filelist.csv](examples/filelist.csv)
```csv
path,slide_mpp,magnification
./test_database/input/WSI/CMU-1.svs,0.500,20
```
Only the `path` column is required; the `slide_mpp` and `magnification` columns are optional.
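
As a rough illustration of the filelist layout above, the following sketch parses such a CSV with Python's standard `csv` module. It is not PathoPatch's own loader: the function name `read_filelist` and the returned dictionary layout are assumptions made for this example. Only the column names `path`, `slide_mpp` and `magnification` come from the documentation.

```python
import csv
import io
from typing import Dict, List, Optional, TextIO, Union


def read_filelist(handle: TextIO) -> List[Dict[str, Optional[Union[str, float]]]]:
    """Parse a filelist with a required `path` column and optional metadata.

    Hypothetical helper for illustration, not part of PathoPatch.
    """
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or "path" not in reader.fieldnames:
        raise ValueError("The filelist needs a column named 'path'")

    entries = []
    for row in reader:
        # Missing or empty optional columns are mapped to None
        mpp = row.get("slide_mpp")
        magnification = row.get("magnification")
        entries.append(
            {
                "path": row["path"],
                "slide_mpp": float(mpp) if mpp else None,
                "magnification": float(magnification) if magnification else None,
            }
        )
    return entries


# Same content as examples/filelist.csv
example = io.StringIO(
    "path,slide_mpp,magnification\n"
    "./test_database/input/WSI/CMU-1.svs,0.500,20\n"
)
entries = read_filelist(example)
print(entries)
```

A filelist that contains only the `path` column parses the same way, with both metadata fields set to `None`.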
## Roadmap
- :construction: In-memory inference loader - This feature is currently under development; an unstable version is already online. Once completed, it will allow a dataset to be loaded into memory for inference, eliminating the need to store it on disk.
- :soon: More test cases
@@ -406,6 +418,13 @@ An example notebook is given [here](PathoPatch.ipynb):
pages="356--361",,
isbn="978-3-658-44037-4"
}
```
### Acknowledgement
For processing DICOM-files, this work relies on the IMI-Bigpicture [`wsidicom`](https://github.com/imi-bigpicture/wsidicom) and [`wsidicomizer`](https://github.com/imi-bigpicture/wsidicomizer) libraries, with the following acknowledgements:
```
>wsidicom: Copyright 2021 Sectra AB, licensed under Apache 2.0.
This project is part of a project that has received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No 945358. This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA. IMI website: <www.imi.europa.eu>
>wsidicomizer: Copyright 2021 Sectra AB, licensed under Apache 2.0.
This project is part of a project that has received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No 945358. This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA. IMI website: <www.imi.europa.eu>
57 changes: 57 additions & 0 deletions docs/DICOM.md
@@ -0,0 +1,57 @@
# Convert WSI-Files to DICOM

## Basic cli-usage

```bash
usage: wsidicomizer [-h] -i INPUT [-o OUTPUT] [-t TILE_SIZE] [-m METADATA]
                    [-d DEFAULT_METADATA] [-l LEVELS [LEVELS ...]] [--label LABEL]
                    [--no-label] [--no-overview] [--no-confidential] [-w WORKERS]
                    [--chunk-size CHUNK_SIZE] [--format FORMAT] [--quality QUALITY]
                    [--subsampling SUBSAMPLING] [--offset-table OFFSET_TABLE]

Convert compatible wsi file to DICOM

options:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Path to input wsi file.
  -o OUTPUT, --output OUTPUT
                        Path to output folder. Folder will be created and must not
                        exist. If not specified a folder named after the input file is
                        created in the same path.
  -t TILE_SIZE, --tile-size TILE_SIZE
                        Tile size (same for width and height). Required for ndpi and
                        openslide formats E.g. 512
  -m METADATA, --metadata METADATA
                        Path to json metadata that will override metadata from source
                        image file.
  -d DEFAULT_METADATA, --default-metadata DEFAULT_METADATA
                        Path to json metadata that will be used as default values.
  -l LEVELS [LEVELS ...], --levels LEVELS [LEVELS ...]
                        Pyramid levels to include, if not all. E.g. 0 1 for base and
                        first pyramid layer.
  --label LABEL         Optional label image to use instead of label found in file.
  --no-label            If not to include label
  --no-overview         If not to include overview
  --no-confidential     If not to include confidential metadata
  -w WORKERS, --workers WORKERS
                        Number of worker threads to use
  --chunk-size CHUNK_SIZE
                        Number of tiles to give each worker at a time
  --format FORMAT       Encoding format to use if re-encoding. 'jpeg' or 'jpeg2000'.
  --quality QUALITY     Quality to use if re-encoding. It is recommended to not use >
                        95 for jpeg. Use < 1 or > 1000 for lossless jpeg2000.
  --subsampling SUBSAMPLING
                        Subsampling option if using jpeg for re-encoding. Use '444'
                        for no subsampling, '422' for 2x1 subsampling, and '420' for
                        2x2 subsampling.
  --offset-table OFFSET_TABLE
                        Offset table to use, 'bot' basic offset table, 'eot' extended
                        offset table, 'None' - no offset table.
```
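
The options above can be assembled programmatically when many slides need converting. The following is a minimal sketch that only builds the argument list from flags shown in the help text (`-i`, `-o`, `-t`, `-l`, `-w`). The helper name `build_wsidicomizer_command` and the example paths are assumptions for illustration, and the command is printed rather than executed.

```python
import shlex
from typing import List, Optional, Sequence


def build_wsidicomizer_command(
    input_path: str,
    output_dir: Optional[str] = None,
    tile_size: Optional[int] = None,
    levels: Optional[Sequence[int]] = None,
    workers: Optional[int] = None,
) -> List[str]:
    """Build a wsidicomizer call from the flags listed in the help text.

    Hypothetical helper for illustration, not part of PathoPatch or wsidicomizer.
    """
    command = ["wsidicomizer", "-i", input_path]
    if output_dir is not None:
        # Per the help text, the output folder is created and must not exist yet
        command += ["-o", output_dir]
    if tile_size is not None:
        # Per the help text, required for ndpi and openslide formats
        command += ["-t", str(tile_size)]
    if levels:
        command += ["-l", *[str(level) for level in levels]]
    if workers is not None:
        command += ["-w", str(workers)]
    return command


# Example paths are placeholders
command = build_wsidicomizer_command(
    "./wsi_data/CMU-1.svs",
    output_dir="./wsi_data/CMU-1-dicom",
    tile_size=512,
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once `wsidicomizer` is installed and available on the `PATH`.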
## Acknowledgement for using WSIDICOMIZER
wsidicomizer: Copyright 2021 Sectra AB, licensed under Apache 2.0.
This project is part of a project that has received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No 945358. This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA. IMI website: <www.imi.europa.eu>
2 changes: 2 additions & 0 deletions environment.yaml
@@ -31,3 +31,5 @@ dependencies:
- scikit-image==0.19.3
- torchvision==0.16.2
- tqdm==4.65.0
- wsidicomizer==0.13.2
- wsidicom==0.20.4
2 changes: 2 additions & 0 deletions examples/filelist.csv
@@ -0,0 +1,2 @@
path,slide_mpp,magnification
./test_database/input/WSI/CMU-1.svs,0.500,20
7 changes: 4 additions & 3 deletions examples/patch_extraction.yaml
@@ -4,7 +4,7 @@ wsi_filelist: # Path to a csv-filelist with WSI files (separator
# used.Must include full paths to WSIs, including suffixes.Can be used as an replacement for
# the wsi_paths option.If both are provided, yields an error. [str] [Optional, defaults to None]
output_path: # Path to the folder where the resulting dataset should be stored [str]
wsi_extensions: # The extension of the WSI-files [str] [Optional, defaults to "svs"]
wsi_extension: # The extension of the WSI-files [str] [Optional, defaults to "svs"]

# basic setups
patch_size: # The size of the patches in pixel that will be retrieved from the WSI, e.g. 256 for 256px. [][Optional, defaults to 256]
@@ -76,5 +76,6 @@ filter_patches: # Post-extraction patch filtering to sort out arte
log_path: # Path where log files should be stored. Otherwise, log files are stored in the output folder. [str][Optional, defaults to None]
log_level: # Set the logging level. [str][Optional, defaults to info]
hardware_selection: # Select hardware device (just if available, otherwise always cucim). [str] [Optional, defaults to cucim]
wsi_magnification: # Manual WSI magnification, but just applies if metadata cannot be derived from OpenSlide (e.g., for .tiff files). [float][Optional, defaults to None]
wsi_mpp: # Manual WSI MPP, but just applies if metadata cannot be derived from OpenSlide (e.g., for .tiff files). [float][Optional, defaults to None]
wsi_properties: # If provided, the properties of the WSI are used for the extraction. [str][Optional, defaults to None]
magnifcation: # Manual WSI magnification, but just applies if metadata cannot be derived from OpenSlide (e.g., for .tiff files). [float][Optional, defaults to None]
slide_mpp: # Manual WSI MPP, but just applies if metadata cannot be derived from OpenSlide (e.g., for .tiff files). [float][Optional, defaults to None]
14 changes: 9 additions & 5 deletions pathopatch/cli.py
@@ -85,9 +85,11 @@ class PreProcessingConfig(BaseModel):
Args:
wsi_paths (str): Path to the folder where all WSI are stored or path to a single WSI-file.
output_path (str): Path to the folder where the resulting dataset should be stored.
wsi_extension (str, optional): The extension of the WSI-files. Defaults to "svs.
wsi_extension (str, optional): The extension of the WSI-files. Defaults to "svs".
wsi_filelist (str, optional): Path to a csv-filelist with WSI files (separator: `,`), if provided just these files are used. Must include full paths to WSIs, including suffixes.
Can be used as an replacement for the wsi_paths option. If both are provided, yields an error. Defaults to None.
Can be used as a replacement for the wsi_paths option. If both wsi_paths and wsi_filelist are provided, yields an error.
The file paths must be given in a column named 'path'. Optional metadata for slide mpp and magnification can be provided in columns named 'slide_mpp' and 'magnification'.
Defaults to None.
patch_size (int, optional): The size of the patches in pixel that will be retrieved from the WSI, e.g. 256 for 256px. Defaults to 256.
patch_overlap (float, optional): The percentage amount pixels that should overlap between two different patches.
Please Provide as integer between 0 and 100, indicating overlap in percentage.
@@ -339,9 +341,11 @@ def __init__(self) -> None:
parser.add_argument(
"--wsi_filelist",
type=str,
help="Path to a csv-filelist with WSI files (separator: `,`), if provided just these files are used."
"Must include full paths to WSIs, including suffixes."
"Can be used as an replacement for the wsi_paths option."
help="Path to a csv-filelist with WSI files (separator: `,`), if provided just these files are used. "
"Must include full paths to WSIs, including suffixes. "
"Can be used as a replacement for the wsi_paths option. "
"If both wsi_paths and wsi_filelist are provided, yields an error. "
"The path to the files should be written in a column named `path`. "
"Metadata for slide mpp and magnification can be provided in columns named `slide_mpp` and `magnification`.",
)
parser.add_argument(
2 changes: 1 addition & 1 deletion pathopatch/config/config.py
@@ -18,7 +18,7 @@
"vms",
"vmu",
"dcm",
] # mirax not tested yet
]
ANNOTATION_EXT: List[str] = ["json"]
LOGGING_EXT: List[str] = ["critical", "error", "warning", "info", "debug"]
