Updates for the workshop #755

Merged: 4 commits, Oct 21, 2024

36 changes: 18 additions & 18 deletions workshops/i2k_2024/README.md
@@ -37,9 +37,9 @@ If you want to learn more about the `micro_sam` napari plugin or python library
Please make sure to install the latest version of `micro_sam` before the workshop using `conda` (or `mamba`).
You can create a new environment and install it like this:
```bash
$ conda create -c conda-forge -n micro_sam python=3.11 natsort
$ conda activate micro_sam
$ conda install -c pytorch -c conda-forge "micro_sam>=1.1" "pytorch>=2.4" "protobuf<5" cpuonly
conda create -c conda-forge -n micro_sam python=3.11 natsort
conda activate micro_sam
conda install -c pytorch -c conda-forge "micro_sam>=1.1" "pytorch>=2.4" "protobuf<5" cpuonly
```
If you already have an installation of `micro_sam` please update it by running the last command in your respective environment. You can find more information about the installation [here](https://computational-cell-analytics.github.io/micro-sam/micro_sam.html#installation).
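
For example, updating an existing installation could look like this (a sketch that assumes your environment is also named `micro_sam`):

```bash
# Activate the existing environment and re-run the install command to pull in the latest release.
conda activate micro_sam
conda install -c pytorch -c conda-forge "micro_sam>=1.1" "pytorch>=2.4" "protobuf<5" cpuonly
```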

@@ -51,18 +51,18 @@ The image embeddings are necessary to run interactive segmentation. Computing th
To run the script you first need to use `git` to download this repository:

```bash
$ git clone https://github.com/computational-cell-analytics/micro-sam
git clone https://github.com/computational-cell-analytics/micro-sam
```
then go to this directory:

```bash
$ cd micro-sam/workshops/i2k_2024
cd micro-sam/workshops/i2k_2024
```

and download the precomputed embeddings:

```bash
$ python download_embeddings.py -e embeddings -d lucchi
python download_embeddings.py -e embeddings -d lucchi
```

### High-throughput Image Annotation
@@ -73,13 +73,13 @@ This annotation mode is well suited for generating annotations for 2D cell segme
We have prepared an example dataset for the workshop that you can use. It consists of 15 images from the [CellPose](https://www.cellpose.org/) dataset. You can download the data with the script `download_datasets.py`:

```bash
$ python download_datasets.py -i data -d cells
python download_datasets.py -i data -d cells
```

This will download the data to the folder `data/cells` with images stored in the subfolder `images` and segmentation masks in `masks`. After this you can start the image series annotation tool, either via the napari plugin (we will show this in the workshop) or via the command line:

```bash
$ micro_sam.image_series_annotator -i data/cells/images -o annotations/cells -e embeddings/cells/vit_b_lm -m vit_b_lm
micro_sam.image_series_annotator -i data/cells/images -o annotations/cells -e embeddings/cells/vit_b_lm -m vit_b_lm
```

Note: You can use `micro_sam` with different models: the original models from Segment Anything and models finetuned for different microscopy segmentation tasks by us.
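
For instance, to try one of the original Segment Anything models instead of the finetuned one, the same call might look like this (a sketch; the embedding folder name is illustrative):

```bash
micro_sam.image_series_annotator -i data/cells/images -o annotations/cells -e embeddings/cells/vit_b -m vit_b
```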
@@ -93,17 +93,17 @@ You can use the [3D annotation tool](https://computational-cell-analytics.github

You can download the data with the script `download_datasets.py`:
```bash
$ python download_datasets.py -i data -d nuclei_3d
python download_datasets.py -i data -d nuclei_3d
```

After this please download the precomputed embeddings:
```bash
$ python download_embeddings.py -e embeddings -d nuclei_3d
python download_embeddings.py -e embeddings -d nuclei_3d
```

You can then start the 3d annotation tool, either via the napari plugin (we will show this in the workshop) or the command line:
```bash
$ micro_sam.annotator_3d -i data/nuclei_3d/images/X1.tif -e embeddings/nuclei_3d/vit_b_lm/embedseg_Mouse-Skull-Nuclei-CBG_train_X1.zarr -m vit_b_lm
micro_sam.annotator_3d -i data/nuclei_3d/images/X1.tif -e embeddings/nuclei_3d/vit_b_lm/embedseg_Mouse-Skull-Nuclei-CBG_train_X1.zarr -m vit_b_lm
```

Note: You can use `micro_sam` with different models: the original models from Segment Anything and models finetuned for different microscopy segmentation tasks by us.
@@ -119,19 +119,19 @@ We have prepared an example dataset for the workshop that you can use. It consis
You can download the data with the script `download_datasets.py`:

```bash
$ python download_datasets.py -i data -d volume_em
python download_datasets.py -i data -d volume_em
```

After this please download the precomputed embeddings:

```bash
$ python download_embeddings.py -e embeddings -d volume_em
python download_embeddings.py -e embeddings -d volume_em
```

You can then start the 3d annotation tool, either via the napari plugin (we will show this in the workshop) or the command line:

```bash
$ micro_sam.annotator_3d -i data/volume_em/images/train_data_membrane_02.tif -e embeddings/volume_em/vit_b/platynereis_membrane_train_data_membrane_02.zarr -m vit_b
micro_sam.annotator_3d -i data/volume_em/images/train_data_membrane_02.tif -e embeddings/volume_em/vit_b/platynereis_membrane_train_data_membrane_02.zarr -m vit_b
```

Note: You can use `micro_sam` with different models: the original models from Segment Anything and models finetuned for different microscopy segmentation tasks by us.
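
As an illustration, the same volume could also be annotated with the EM-organelles model (a sketch; the embedding path is hypothetical and embeddings for this model would have to be computed or downloaded first):

```bash
micro_sam.annotator_3d -i data/volume_em/images/train_data_membrane_02.tif -e embeddings/volume_em/vit_b_em_organelles/platynereis_membrane_train_data_membrane_02.zarr -m vit_b_em_organelles
```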
@@ -146,7 +146,7 @@ We provide an example notebook `finetune_sam.ipynb` and script `finetune_sam.py`

You can download the sample data by running:
```bash
$ python download_datasets.py -i data -d hpa
python download_datasets.py -i data -d hpa
```

Note: You need a GPU in order to finetune the model (finetuning on the CPU is possible but takes too long for the workshop).
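
A quick way to check whether PyTorch can see a GPU before starting the finetuning (a one-line sketch using only the standard `torch` API):

```bash
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
```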
@@ -168,9 +168,9 @@ You can use the command line to precompute embeddings for volumetric segmentatio
Here is an example command for pre-computing the embeddings on the [3D nucleus segmentation data](#3d-lm-segmentation).

```bash
$ micro_sam.precompute_embeddings -i data/nuclei_3d/images/X1.tif # Filepath where inputs are stored.
-m vit_b # You can provide name for a model of your choice (supported by 'micro-sam') (eg. 'vit_b_lm').
-e embeddings/vit_b/nuclei_3d_X1 # Filepath where computed embeddings will be stored.
micro_sam.precompute_embeddings -i data/nuclei_3d/images/X1.tif # Filepath where inputs are stored.
-m vit_b # You can provide name for a model of your choice (supported by 'micro-sam') (eg. 'vit_b_lm').
-e embeddings/vit_b/nuclei_3d_X1 # Filepath where computed embeddings will be stored.
```

You need to adapt the path to the data, choose the model you want to use (`vit_b`, `vit_b_lm`, `vit_b_em_organelles`) and adapt the path where the embeddings should be saved.
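
For example, a variant that uses the light microscopy model for the same volume might look like this (a sketch; adjust the input and embedding paths to your own layout):

```bash
micro_sam.precompute_embeddings -i data/nuclei_3d/images/X1.tif -m vit_b_lm -e embeddings/vit_b_lm/nuclei_3d_X1
```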
77 changes: 56 additions & 21 deletions workshops/i2k_2024/download_datasets.py
@@ -6,40 +6,54 @@
from torch_em.util.image import load_data


def _download_sample_data(data_dir, url, checksum, download):
def _download_sample_data(data_dir, url, checksum, download, downloader):
if os.path.exists(data_dir):
return

os.makedirs(data_dir, exist_ok=True)

zip_path = os.path.join(data_dir, "data.zip")
datasets.util.download_source(path=zip_path, url=url, download=download, checksum=checksum)

if downloader == "owncloud":
datasets.util.download_source(path=zip_path, url=url, download=download, checksum=checksum)
else:
datasets.util.download_source_gdrive(path=zip_path, url=url, download=download, checksum=checksum)

datasets.util.unzip(zip_path=zip_path, dst=data_dir)


def _get_cells_sample_data_paths(path, download):
def _get_cells_sample_data_paths(path, download, downloader):
data_dir = os.path.join(path, "cells")

url = "https://owncloud.gwdg.de/index.php/s/c96cyWc1PpLAPOn/download"
if downloader == "owncloud":
url = "https://owncloud.gwdg.de/index.php/s/c96cyWc1PpLAPOn/download"
else:
url = "https://drive.google.com/uc?export=download&id=1SVC5Zgsbq9V7gPJGOFLvhClbeMC-GT-O"

checksum = "5d6cb5bc67a2b48c862c200d2df3afdfe6703f9c21bc33a3dd13d2422a396897"

_download_sample_data(data_dir, url, checksum, download)
_download_sample_data(data_dir, url, checksum, download, downloader)

raw_paths = natsorted(glob(os.path.join(data_dir, "images", "*.png")))
label_paths = natsorted(glob(os.path.join(data_dir, "masks", "*.png")))

return raw_paths, label_paths


def _get_hpa_data_paths(path, split, download):
def _get_hpa_data_paths(path, split, download, downloader):
splits = ["train", "val", "test"]
assert split in splits, f"'{split}' is not a valid split."

data_dir = os.path.join(path, "hpa")
url = "https://owncloud.gwdg.de/index.php/s/IrzUcaMxQKVRLTs/download"

if downloader == "owncloud":
url = "https://owncloud.gwdg.de/index.php/s/IrzUcaMxQKVRLTs/download"
else:
url = "https://drive.google.com/uc?export=download&id=1EuBj2UkVTy2DRfKGltaFga5zuXHqVOvI"

checksum = "f2c41be1761cdd96635ee30bee9dcbdeda4ebe3ab3467ad410c28417d46cdaad"

_download_sample_data(data_dir, url, checksum, download)
_download_sample_data(data_dir, url, checksum, download, downloader)

raw_paths = natsorted(glob(os.path.join(data_dir, split, "images", "*.tif")))

@@ -50,39 +50,55 @@ def _get_hpa_data_paths(path, split, download):
return raw_paths, label_paths


def _get_nuclei_3d_data_paths(path, download):
def _get_nuclei_3d_data_paths(path, download, downloader):
data_dir = os.path.join(path, "nuclei_3d")
url = "https://owncloud.gwdg.de/index.php/s/QdibduvClGmruIV/download"

if downloader == "owncloud":
url = "https://owncloud.gwdg.de/index.php/s/QdibduvClGmruIV/download"
else:
url = "https://drive.google.com/uc?export=download&id=1rveZC4OKfC7eXQsX21MyrRb_VOy5MQ-X"

checksum = "551d2c55e0e5614ae21c03e75e7a0afb765b312cb569dd4c32d1d634d8798c91"
_download_sample_data(data_dir, url, checksum, download=download)

_download_sample_data(data_dir, url, checksum, download, downloader)
raw_paths = [os.path.join(data_dir, "images", "X1.tif")]
label_paths = [os.path.join(data_dir, "masks", "Y1.tif")]
return raw_paths, label_paths


def _get_volume_em_data_paths(path, download):
def _get_volume_em_data_paths(path, download, downloader):
data_dir = os.path.join(path, "volume_em")
url = "https://owncloud.gwdg.de/index.php/s/5CzsV6bsqX0kvSv/download"

if downloader == "owncloud":
url = "https://owncloud.gwdg.de/index.php/s/5CzsV6bsqX0kvSv/download"
else:
url = "https://drive.google.com/uc?export=download&id=1En2TX9M6aw3UtoZUuMl8otMs0nXzPsa6"

checksum = "e820e2a89ffb5d466fb4646945b8697269501cce18376f47b946c7773ede4653"
_download_sample_data(data_dir, url, checksum, download=download)

_download_sample_data(data_dir, url, checksum, download, downloader)
raw_paths = [os.path.join(data_dir, "images", "train_data_membrane_02.tif")]
label_paths = [os.path.join(data_dir, "masks", "train_data_membrane_02_labels.tif")]
return raw_paths, label_paths


def _get_dataset_paths(path, dataset_name, view=False):
def _get_dataset_paths(path, dataset_name, view=False, downloader="owncloud"):
if downloader not in ["owncloud", "drive"]:
raise ValueError(f"'{downloader}' is not a valid way to download.")

dataset_paths = {
# 2d LM dataset for cell segmentation
"cells": lambda: _get_cells_sample_data_paths(path=path, download=True),
"hpa": lambda: _get_hpa_data_paths(path=path, download=True, split="train"),
"cells": lambda: _get_cells_sample_data_paths(path=path, download=True, downloader=downloader),
"hpa": lambda: _get_hpa_data_paths(path=path, download=True, split="train", downloader=downloader),
# 3d LM dataset for nuclei segmentation
"nuclei_3d": lambda: _get_nuclei_3d_data_paths(path=path, download=True),
"nuclei_3d": lambda: _get_nuclei_3d_data_paths(path=path, download=True, downloader=downloader),
# 3d EM dataset for membrane segmentation
"volume_em": lambda: _get_volume_em_data_paths(path=path, download=True),
"volume_em": lambda: _get_volume_em_data_paths(path=path, download=True, downloader=downloader),
}

dataset_keys = {
"cells": [None, None],
"hpa": [None, None],
"nuclei_3d": [None, None],
"volume_em": [None, None]
}
@@ -100,7 +130,7 @@ def _get_dataset_paths(path, dataset_name, view=False):
)

paths = dataset_paths[dname]()
print(f"'{dataset_name}' is download at {path}.")
print(f"'{dname}' is downloaded at {path}.")

if view:
import napari
Expand Down Expand Up @@ -139,9 +169,14 @@ def main():
parser.add_argument(
"-v", "--view", action="store_true", help="Whether to view the downloaded data."
)
parser.add_argument(
"--downloader", type=str, default="owncloud",
help="The source of urls for downloading datasets. The available choices are 'owncloud' or 'drive'. "
"For downloading from drive, you need to install 'gdown' using 'conda install gdown==4.6.3'."
)
args = parser.parse_args()

_get_dataset_paths(path=args.input_path, dataset_name=args.dataset_name, view=args.view)
_get_dataset_paths(path=args.input_path, dataset_name=args.dataset_name, view=args.view, downloader=args.downloader)


if __name__ == "__main__":
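
To illustrate the new `--downloader` option added in this file, a download from the Google Drive mirror might be invoked like this (a sketch based on the help text above; `gdown` is only needed for the drive option):

```bash
conda install gdown==4.6.3
python download_datasets.py -i data -d cells --downloader drive
```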
57 changes: 44 additions & 13 deletions workshops/i2k_2024/download_embeddings.py
@@ -1,12 +1,11 @@
import os

from torch_em.data.datasets.util import download_source, unzip
from torch_em.data.datasets.util import download_source, unzip, download_source_gdrive


URLS = {
URLS_OWNCLOUD = {
"lucchi": {
"vit_b": "https://owncloud.gwdg.de/index.php/s/kQMA1B8L9LOvYrl/download",
"vit_b_em_organelles": "https://owncloud.gwdg.de/index.php/s/U8xs6moRg0cQhkS/download",
"vit_b_em_organelles": "https://owncloud.gwdg.de/index.php/s/a2ljJVsignmItHh/download",
},
"nuclei_3d": {
"vit_b": "https://owncloud.gwdg.de/index.php/s/EF9ZdMzYjDjl8fd/download",
@@ -18,10 +17,24 @@
},
}

URLS_DRIVE = {
"lucchi": {
"vit_b_em_organelles": "https://drive.google.com/uc?export=download&id=1Ls1lq3eLgmiSMmPmqJdBJAmRSA57w_Ga",
},
"nuclei_3d": {
"vit_b": "https://drive.google.com/uc?export=download&id=1aFkANRAqbkop2M3Df9zcZIct7Bab0jpA",
"vit_b_lm": "https://drive.google.com/uc?export=download&id=129JvneG3th9fFXxH4iQFAFIY7_VGlupu",
},
"volume_em": {
"vit_b": "https://drive.google.com/uc?export=download&id=1_4zhezz5PEX1kudPaEfxI8JfTd1AOSCd",
"vit_b_em_organelles": "https://drive.google.com/uc?export=download&id=1K_Az5ti-P215sHvI2dCoUKHpTFX17KK8",
},
}


CHECKSUMS = {
"lucchi": {
"vit_b": "e0d064765f1758a1a0823b2c02d399caa5cae0d8ac5a1e2ed96548a647717433",
"vit_b_em_organelles": "e0b5ab781c42e6f68b746fc056c918d56559ccaeedb4e4f2848b1e5e8f1bec58",
"vit_b_em_organelles": "8621591469a783c50a0fddbab1a0ff1bbfeb360f196069712960f70b1c03a9d3",
},
"nuclei_3d": {
"vit_b": "82f5351486e484dda5a3a327381458515c89da5dda8a48a0b1ab96ef10d23f02",
@@ -34,19 +47,27 @@
}


def _download_embeddings(embedding_dir, dataset_name):
def _download_embeddings(embedding_dir, dataset_name, downloader="owncloud"):
if downloader == "drive":
chosen_urls = URLS_DRIVE
elif downloader == "owncloud":
chosen_urls = URLS_OWNCLOUD
else:
raise ValueError(f"'{downloader}' is not a valid way to download.")

if dataset_name is None: # Download embeddings for all datasets.
dataset_names = list(URLS.keys())
dataset_names = list(chosen_urls.keys())
else: # Download embeddings for specific dataset.
dataset_names = [dataset_name]

for dname in dataset_names:
if dname not in URLS:
if dname not in chosen_urls:
raise ValueError(
f"'{dname}' does not have precomputed embeddings to download. Please choose from {list(URLS.keys())}."
f"'{dname}' does not have precomputed embeddings to download. "
f"Please choose from {list(chosen_urls.keys())}."
)

urls = URLS[dname]
urls = chosen_urls[dname]
checksums = CHECKSUMS[dname]

data_embedding_dir = os.path.join(embedding_dir, dname)
@@ -60,7 +81,12 @@ def _download_embeddings(embedding_dir, dataset_name):

checksum = checksums[name]
zip_path = os.path.join(data_embedding_dir, "embeddings.zip")
download_source(path=zip_path, url=url, download=True, checksum=checksum)

if downloader == "owncloud":
download_source(path=zip_path, url=url, download=True, checksum=checksum)
else:
download_source_gdrive(path=zip_path, url=url, download=True, checksum=checksum)

unzip(zip_path=zip_path, dst=data_embedding_dir)

print(f"The precompted embeddings for '{dname}' are downloaded at {data_embedding_dir}")
@@ -82,9 +108,14 @@ def main():
"By default, it downloads all the precomputed embeddings. Optionally, you can choose to download either of the "
"volumetric datasets: 'lucchi', 'nuclei_3d' or 'volume_em'."
)
parser.add_argument(
"--downloader", type=str, default="owncloud",
help="The source of urls for downloading embeddings. The available choices are 'owncloud' or 'drive'. "
"For downloading from drive, you need to install 'gdown' using 'conda install gdown==4.6.3'."
)
args = parser.parse_args()

_download_embeddings(embedding_dir=args.embedding_dir, dataset_name=args.dataset_name)
_download_embeddings(embedding_dir=args.embedding_dir, dataset_name=args.dataset_name, downloader=args.downloader)


if __name__ == "__main__":
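
Analogously, the precomputed embeddings can be fetched from the Drive mirror with the new option (a sketch; dataset names are the same as in the README):

```bash
python download_embeddings.py -e embeddings -d nuclei_3d --downloader drive
```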