From a51c17ed40edaca236246eeecfd5af445b71b2ca Mon Sep 17 00:00:00 2001 From: Constantin Pape Date: Mon, 21 Oct 2024 13:21:14 +0200 Subject: [PATCH 1/4] Updates for the workshop --- workshops/i2k_2024/README.md | 36 +++++++++++------------ workshops/i2k_2024/download_embeddings.py | 6 ++-- 2 files changed, 20 insertions(+), 22 deletions(-) diff --git a/workshops/i2k_2024/README.md b/workshops/i2k_2024/README.md index aca1a9da..cf3db61d 100644 --- a/workshops/i2k_2024/README.md +++ b/workshops/i2k_2024/README.md @@ -37,9 +37,9 @@ If you want to learn more about the `micro_sam` napari plugin or python library Please make sure to install the latest version of `micro_sam` before the workshop using `conda` (or `mamba`). You can create a new environment and install it like this: ```bash -$ conda create -c conda-forge -n micro_sam python=3.11 natsort -$ conda activate micro_sam -$ conda install -c pytorch -c conda-forge "micro_sam>=1.1" "pytorch>=2.4" "protobuf<5" cpuonly +conda create -c conda-forge -n micro_sam python=3.11 natsort +conda activate micro_sam +conda install -c pytorch -c conda-forge "micro_sam>=1.1" "pytorch>=2.4" "protobuf<5" cpuonly ``` If you already have an installation of `micro_sam` please update it by running the last command in your respective environment. You can find more information about the installation [here](https://computational-cell-analytics.github.io/micro-sam/micro_sam.html#installation). @@ -51,18 +51,18 @@ The image embeddings are necessary to run interactive segmentation. Computing th To run the script you first need to use `git` to download this repository: ```bash -$ git clone https://github.com/computational-cell-analytics/micro-sam +git clone https://github.com/computational-cell-analytics/micro-sam ``` then go to this directory: ```bash -$ cd micro-sam/workshops/i2k_2024 +cd micro-sam/workshops/i2k_2024 ``` and download the precomputed embeddings: ```bash -$ python download_embeddings.py -e embeddings -d lucchi +python download_embeddings.py -e embeddings -d lucchi ``` ### High-throughput Image Annotation @@ -73,13 +73,13 @@ This annotation mode is well suited for generating annotations for 2D cell segme We have prepared an example dataset for the workshop that you can use. It consists of 15 images from the [CellPose](https://www.cellpose.org/) dataset. You can download the data with the script `download_dataset.py`: ```bash -$ python download_datasets.py -i data -d cells +python download_datasets.py -i data -d cells ``` This will download the data to the folder `data/cells` with images stored in the subfolder `images` and segmentation masks in `masks`. After this you can start the image series annotation tool, either via the napari plugin (we will show this in the workshop) or via the command line: ```bash -$ micro_sam.image_series_annotator -i data/cells/images -o annotations/cells -e embeddings/cells/vit_b_lm -m vit_b_lm +micro_sam.image_series_annotator -i data/cells/images -o annotations/cells -e embeddings/cells/vit_b_lm -m vit_b_lm ``` Note: You can use `micro_sam` with different models: the original models from Segment Anything and models finetuned for different microscopy segmentation tasks by us. 
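If you want to sanity-check the downloaded example data before annotating it, a short snippet along the following lines pairs up the images and masks the same way `download_datasets.py` does internally (the paths assume the folder layout described above):

```python
import os
from glob import glob

from natsort import natsorted

# Collect the images and the corresponding segmentation masks.
image_paths = natsorted(glob(os.path.join("data", "cells", "images", "*.png")))
mask_paths = natsorted(glob(os.path.join("data", "cells", "masks", "*.png")))

assert len(image_paths) == len(mask_paths), "Each image needs a matching mask."
for image_path, mask_path in zip(image_paths, mask_paths):
    print(os.path.basename(image_path), "<->", os.path.basename(mask_path))
```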
@@ -93,17 +93,17 @@ You can use the [3D annotation tool](https://computational-cell-analytics.github
You can download the data with the script `download_dataset.py`:

```bash
-$ python download_datasets.py -i data -d nuclei_3d
+python download_datasets.py -i data -d nuclei_3d
```

After this please download the precomputed embeddings:
```bash
-$ python download_embeddings.py -e embeddings -d nuclei_3d
+python download_embeddings.py -e embeddings -d nuclei_3d
```

You can then start the 3d annotation tool, either via the napari plugin (we will show this in the workshop) or the command line:
```bash
-$ micro_sam.annotator_3d -i data/nuclei_3d/images/X1.tif -e embeddings/nuclei_3d/vit_b_lm/embedseg_Mouse-Skull-Nuclei-CBG_train_X1.zarr -m vit_b_lm
+micro_sam.annotator_3d -i data/nuclei_3d/images/X1.tif -e embeddings/nuclei_3d/vit_b_lm/embedseg_Mouse-Skull-Nuclei-CBG_train_X1.zarr -m vit_b_lm
```

Note: You can use `micro_sam` with different models: the original models from Segment Anything and models finetuned for different microscopy segmentation tasks by us.

@@ -119,19 +119,19 @@ We have prepared an example dataset for the workshop that you can use. It consis
You can download the data with the script `download_dataset.py`:

```bash
-$ python download_datasets.py -i data -d volume_em
+python download_datasets.py -i data -d volume_em
```

After this please download the precomputed embeddings:

```bash
-$ python download_embeddings.py -e embeddings -d volume_em
+python download_embeddings.py -e embeddings -d volume_em
```

You can then start the 3d annotation tool, either via the napari plugin (we will show this in the workshop) or the command line:

```bash
-$ micro_sam.annotator_3d -i data/volume_em/images/train_data_membrane_02.tif -e embeddings/volume_em/vit_b/platynereis_membrane_train_data_membrane_02.zarr -m vit_b
+micro_sam.annotator_3d -i data/volume_em/images/train_data_membrane_02.tif -e embeddings/volume_em/vit_b/platynereis_membrane_train_data_membrane_02.zarr -m vit_b
```

Note: You can use `micro_sam` with different models: the original models from Segment Anything and models finetuned for different microscopy segmentation tasks by us.

@@ -146,7 +146,7 @@ We provide an example notebook `finetune_sam.ipynb` and script `finetune_sam.py`
You can download the sample data by running:

```bash
-$ python download_datasets.py -i data -d hpa
+python download_datasets.py -i data -d hpa
```

Note: You need a GPU in order to finetune the model (finetuning on the CPU is possible but takes too long for the workshop).

@@ -168,9 +168,9 @@ You can use the command line to precompute embeddings for volumetric segmentatio
Here is the example script for pre-computing the embeddings on the [3D nucleus segmentation data](#3d-lm-segmentation).

```bash
-$ micro_sam.precompute_embeddings -i data/nuclei_3d/images/X1.tif # Filepath where inputs are stored.
-    -m vit_b # You can provide name for a model of your choice (supported by 'micro-sam') (eg. 'vit_b_lm').
-    -e embeddings/vit_b/nuclei_3d_X1 # Filepath where computed embeddings will be stored.
+# -i: filepath where the input image is stored.
+# -m: name of the model you want to use (e.g. 'vit_b' or 'vit_b_lm'; any model supported by 'micro-sam').
+# -e: filepath where the computed embeddings will be stored.
+micro_sam.precompute_embeddings -i data/nuclei_3d/images/X1.tif -m vit_b -e embeddings/vit_b/nuclei_3d_X1
```

You need to adapt the path to the data, choose the model you want to use (`vit_b`, `vit_b_lm`, `vit_b_em_organelles`) and adapt the path where the embeddings should be saved.
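The embeddings can also be precomputed from python. Below is a minimal sketch, assuming the `micro_sam.util` functions `get_sam_model` and `precompute_image_embeddings` (please check the `micro_sam` documentation for the exact signatures):

```python
import imageio.v3 as imageio

from micro_sam.util import get_sam_model, precompute_image_embeddings

# Load the input volume and instantiate the (optionally finetuned) SAM model.
image = imageio.imread("data/nuclei_3d/images/X1.tif")
predictor = get_sam_model(model_type="vit_b")

# Compute the embeddings and cache them to a zarr file; 'ndim=3' treats the
# input as a volume, so embeddings are computed per slice.
precompute_image_embeddings(predictor, image, save_path="embeddings/vit_b/nuclei_3d_X1.zarr", ndim=3)
```

As with the command line version, adapt the input path, the model name, and the save path to your use case.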
diff --git a/workshops/i2k_2024/download_embeddings.py b/workshops/i2k_2024/download_embeddings.py
index 505100fc..8616f5c1 100644
--- a/workshops/i2k_2024/download_embeddings.py
+++ b/workshops/i2k_2024/download_embeddings.py
@@ -5,8 +5,7 @@
 URLS = {
     "lucchi": {
-        "vit_b": "https://owncloud.gwdg.de/index.php/s/kQMA1B8L9LOvYrl/download",
-        "vit_b_em_organelles": "https://owncloud.gwdg.de/index.php/s/U8xs6moRg0cQhkS/download",
+        "vit_b_em_organelles": "https://owncloud.gwdg.de/index.php/s/a2ljJVsignmItHh/download",
     },
     "nuclei_3d": {
         "vit_b": "https://owncloud.gwdg.de/index.php/s/EF9ZdMzYjDjl8fd/download",
@@ -20,8 +19,7 @@
 CHECKSUMS = {
     "lucchi": {
-        "vit_b": "e0d064765f1758a1a0823b2c02d399caa5cae0d8ac5a1e2ed96548a647717433",
-        "vit_b_em_organelles": "e0b5ab781c42e6f68b746fc056c918d56559ccaeedb4e4f2848b1e5e8f1bec58",
+        "vit_b_em_organelles": "8621591469a783c50a0fddbab1a0ff1bbfeb360f196069712960f70b1c03a9d3",
     },
     "nuclei_3d": {
         "vit_b": "82f5351486e484dda5a3a327381458515c89da5dda8a48a0b1ab96ef10d23f02",

From 13ca32cb6ad2a7fb1e43776c93c936578e4c0ab6 Mon Sep 17 00:00:00 2001 From: Anwai Archit Date: Mon, 21 Oct 2024 16:09:36 +0200 Subject: [PATCH 2/4] Make downloading flexible - extend support for downloading files from Google Drive --- workshops/i2k_2024/download_datasets.py | 77 ++++++++++++++++------- workshops/i2k_2024/download_embeddings.py | 51 ++++++++++++--- 2 files changed, 98 insertions(+), 30 deletions(-)

diff --git a/workshops/i2k_2024/download_datasets.py b/workshops/i2k_2024/download_datasets.py
index d10c8223..5cd01c6a 100644
--- a/workshops/i2k_2024/download_datasets.py
+++ b/workshops/i2k_2024/download_datasets.py
@@ -6,24 +6,33 @@
 from torch_em.util.image import load_data
 
 
-def _download_sample_data(data_dir, url, checksum, download):
+def _download_sample_data(data_dir, url, checksum, download, downloader):
     if os.path.exists(data_dir):
         return
 
     os.makedirs(data_dir, exist_ok=True)
 
     zip_path = os.path.join(data_dir, "data.zip")
-    datasets.util.download_source(path=zip_path, url=url, download=download, checksum=checksum)
+
+    if downloader == "owncloud":
+        datasets.util.download_source(path=zip_path, url=url, download=download, checksum=checksum)
+    else:
+        datasets.util.download_source_gdrive(path=zip_path, url=url, download=download, checksum=checksum)
+
     datasets.util.unzip(zip_path=zip_path, dst=data_dir)
 
 
-def _get_cells_sample_data_paths(path, download):
+def _get_cells_sample_data_paths(path, download, downloader):
     data_dir = os.path.join(path, "cells")
 
-    url = "https://owncloud.gwdg.de/index.php/s/c96cyWc1PpLAPOn/download"
+    if downloader == "owncloud":
+        url = "https://owncloud.gwdg.de/index.php/s/c96cyWc1PpLAPOn/download"
+    else:
+        url = "https://drive.google.com/uc?export=download&id=1SVC5Zgsbq9V7gPJGOFLvhClbeMC-GT-O"
+
     checksum = "5d6cb5bc67a2b48c862c200d2df3afdfe6703f9c21bc33a3dd13d2422a396897"
 
-    _download_sample_data(data_dir, url, checksum, download)
+    _download_sample_data(data_dir, url, checksum, download, downloader)
 
     raw_paths = natsorted(glob(os.path.join(data_dir, "images", "*.png")))
     label_paths = natsorted(glob(os.path.join(data_dir, "masks", "*.png")))
 
     return raw_paths, label_paths
 
 
-def _get_hpa_data_paths(path, split, download):
+def _get_hpa_data_paths(path, split, download, downloader):
     splits = ["train", "val", "test"]
     assert split in splits, f"'{split}' is not a valid split."
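     # NOTE: 'split' selects one of the pre-defined HPA subsets; validating it
     # here fails fast, before any download is triggered.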
data_dir = os.path.join(path, "hpa") - url = "https://owncloud.gwdg.de/index.php/s/IrzUcaMxQKVRLTs/download" + + if downloader == "owncloud": + url = "https://owncloud.gwdg.de/index.php/s/IrzUcaMxQKVRLTs/download" + else: + url = "https://drive.google.com/uc?export=download&id=1EuBj2UkVTy2DRfKGltaFga5zuXHqVOvI" + checksum = "f2c41be1761cdd96635ee30bee9dcbdeda4ebe3ab3467ad410c28417d46cdaad" - _download_sample_data(data_dir, url, checksum, download) + _download_sample_data(data_dir, url, checksum, download, downloader) raw_paths = natsorted(glob(os.path.join(data_dir, split, "images", "*.tif"))) @@ -50,39 +64,55 @@ def _get_hpa_data_paths(path, split, download): return raw_paths, label_paths -def _get_nuclei_3d_data_paths(path, download): +def _get_nuclei_3d_data_paths(path, download, downloader): data_dir = os.path.join(path, "nuclei_3d") - url = "https://owncloud.gwdg.de/index.php/s/QdibduvClGmruIV/download" + + if downloader == "owncloud": + url = "https://owncloud.gwdg.de/index.php/s/QdibduvClGmruIV/download" + else: + url = "https://drive.google.com/uc?export=download&id=1rveZC4OKfC7eXQsX21MyrRb_VOy5MQ-X" + checksum = "551d2c55e0e5614ae21c03e75e7a0afb765b312cb569dd4c32d1d634d8798c91" - _download_sample_data(data_dir, url, checksum, download=download) + + _download_sample_data(data_dir, url, checksum, download, downloader) raw_paths = [os.path.join(data_dir, "images", "X1.tif")] label_paths = [os.path.join(data_dir, "masks", "Y1.tif")] return raw_paths, label_paths -def _get_volume_em_data_paths(path, download): +def _get_volume_em_data_paths(path, download, downloader): data_dir = os.path.join(path, "volume_em") - url = "https://owncloud.gwdg.de/index.php/s/5CzsV6bsqX0kvSv/download" + + if downloader == "owncloud": + url = "https://owncloud.gwdg.de/index.php/s/5CzsV6bsqX0kvSv/download" + else: + url = "https://drive.google.com/uc?export=download&id=1En2TX9M6aw3UtoZUuMl8otMs0nXzPsa6" + checksum = "e820e2a89ffb5d466fb4646945b8697269501cce18376f47b946c7773ede4653" - _download_sample_data(data_dir, url, checksum, download=download) + + _download_sample_data(data_dir, url, checksum, download, downloader) raw_paths = [os.path.join(data_dir, "images", "train_data_membrane_02.tif")] label_paths = [os.path.join(data_dir, "masks", "train_data_membrane_02_labels.tif")] return raw_paths, label_paths -def _get_dataset_paths(path, dataset_name, view=False): +def _get_dataset_paths(path, dataset_name, view=False, downloader="owncloud"): + if downloader not in ["owncloud", "drive"]: + raise ValueError(f"'{downloader}' is not a valid way to download.") + dataset_paths = { # 2d LM dataset for cell segmentation - "cells": lambda: _get_cells_sample_data_paths(path=path, download=True), - "hpa": lambda: _get_hpa_data_paths(path=path, download=True, split="train"), + "cells": lambda: _get_cells_sample_data_paths(path=path, download=True, downloader=downloader), + "hpa": lambda: _get_hpa_data_paths(path=path, download=True, split="train", downloader=downloader), # 3d LM dataset for nuclei segmentation - "nuclei_3d": lambda: _get_nuclei_3d_data_paths(path=path, download=True), + "nuclei_3d": lambda: _get_nuclei_3d_data_paths(path=path, download=True, downloader=downloader), # 3d EM dataset for membrane segmentation - "volume_em": lambda: _get_volume_em_data_paths(path=path, download=True), + "volume_em": lambda: _get_volume_em_data_paths(path=path, download=True, downloader=downloader), } dataset_keys = { "cells": [None, None], + "hpa": [None, None], "nuclei_3d": [None, None], "volume_em": [None, None] } @@ 
-100,7 +130,7 @@ def _get_dataset_paths(path, dataset_name, view=False):
         )
 
     paths = dataset_paths[dname]()
-    print(f"'{dataset_name}' is download at {path}.")
+    print(f"'{dname}' has been downloaded to {path}.")
 
     if view:
         import napari
@@ -139,9 +169,14 @@ def main():
     parser.add_argument(
         "-v", "--view", action="store_true", help="Whether to view the downloaded data."
     )
+    parser.add_argument(
+        "--downloader", type=str, default="owncloud",
+        help="The source of URLs for downloading datasets. The available choices are 'owncloud' or 'drive'. "
+        "For downloading from drive, you need to install 'gdown' using 'conda install gdown==4.6.3'."
+    )
     args = parser.parse_args()
-    _get_dataset_paths(path=args.input_path, dataset_name=args.dataset_name, view=args.view)
+    _get_dataset_paths(path=args.input_path, dataset_name=args.dataset_name, view=args.view, downloader=args.downloader)
 
 
 if __name__ == "__main__":

diff --git a/workshops/i2k_2024/download_embeddings.py b/workshops/i2k_2024/download_embeddings.py
index 8616f5c1..4dcc44f8 100644
--- a/workshops/i2k_2024/download_embeddings.py
+++ b/workshops/i2k_2024/download_embeddings.py
@@ -1,9 +1,9 @@
 import os
 
-from torch_em.data.datasets.util import download_source, unzip
+from torch_em.data.datasets.util import download_source, unzip, download_source_gdrive
 
 
-URLS = {
+URLS_OWNCLOUD = {
     "lucchi": {
         "vit_b_em_organelles": "https://owncloud.gwdg.de/index.php/s/a2ljJVsignmItHh/download",
     },
@@ -17,6 +17,21 @@
     },
 }
 
+URLS_DRIVE = {
+    "lucchi": {
+        "vit_b_em_organelles": "https://drive.google.com/uc?export=download&id=1Ls1lq3eLgmiSMmPmqJdBJAmRSA57w_Ga",
+    },
+    "nuclei_3d": {
+        "vit_b": "https://drive.google.com/uc?export=download&id=1aFkANRAqbkop2M3Df9zcZIct7Bab0jpA",
+        "vit_b_lm": "https://drive.google.com/uc?export=download&id=129JvneG3th9fFXxH4iQFAFIY7_VGlupu",
+    },
+    "volume_em": {
+        "vit_b": "https://drive.google.com/uc?export=download&id=1_4zhezz5PEX1kudPaEfxI8JfTd1AOSCd",
+        "vit_b_em_organelles": "https://drive.google.com/uc?export=download&id=1K_Az5ti-P215sHvI2dCoUKHpTFX17KK8",
+    },
+}
+
+
 CHECKSUMS = {
     "lucchi": {
         "vit_b_em_organelles": "8621591469a783c50a0fddbab1a0ff1bbfeb360f196069712960f70b1c03a9d3",
@@ -32,19 +47,27 @@
 }
 
 
-def _download_embeddings(embedding_dir, dataset_name):
+def _download_embeddings(embedding_dir, dataset_name, downloader="owncloud"):
+    if downloader == "drive":
+        chosen_urls = URLS_DRIVE
+    elif downloader == "owncloud":
+        chosen_urls = URLS_OWNCLOUD
+    else:
+        raise ValueError(f"'{downloader}' is not a valid way to download.")
+
     if dataset_name is None:  # Download embeddings for all datasets.
-        dataset_names = list(URLS.keys())
+        dataset_names = list(chosen_urls.keys())
     else:  # Download embeddings for specific dataset.
         dataset_names = [dataset_name]
 
     for dname in dataset_names:
-        if dname not in URLS:
+        if dname not in chosen_urls:
             raise ValueError(
-                f"'{dname}' does not have precomputed embeddings to download. Please choose from {list(URLS.keys())}."
+                f"'{dname}' does not have precomputed embeddings to download. "
+                f"Please choose from {list(chosen_urls.keys())}."
            )
 
-        urls = URLS[dname]
+        urls = chosen_urls[dname]
         checksums = CHECKSUMS[dname]
 
         data_embedding_dir = os.path.join(embedding_dir, dname)
@@ -58,7 +81,12 @@
             checksum = checksums[name]
 
             zip_path = os.path.join(data_embedding_dir, "embeddings.zip")
-            download_source(path=zip_path, url=url, download=True, checksum=checksum)
+
+            if downloader == "owncloud":
+                download_source(path=zip_path, url=url, download=True, checksum=checksum)
+            else:
+                download_source_gdrive(path=zip_path, url=url, download=True, checksum=checksum)
+
             unzip(zip_path=zip_path, dst=data_embedding_dir)
 
             print(f"The precomputed embeddings for '{dname}' are downloaded to {data_embedding_dir}")
@@ -80,9 +108,14 @@ def main():
         "By default, it downloads all the precomputed embeddings. Optionally, you can choose to download either of the "
         "volumetric datasets: 'lucchi', 'nuclei_3d' or 'volume_em'."
     )
+    parser.add_argument(
+        "--downloader", type=str, default="owncloud",
+        help="The source of URLs for downloading embeddings. The available choices are 'owncloud' or 'drive'. "
+        "For downloading from drive, you need to install 'gdown' using 'conda install gdown==4.6.3'."
+    )
     args = parser.parse_args()
-    _download_embeddings(embedding_dir=args.embedding_dir, dataset_name=args.dataset_name)
+    _download_embeddings(embedding_dir=args.embedding_dir, dataset_name=args.dataset_name, downloader=args.downloader)
 
 
 if __name__ == "__main__":

From ed53a9581d5fa816680811752c346cf1c999702a Mon Sep 17 00:00:00 2001 From: Anwai Archit Date: Mon, 21 Oct 2024 16:17:49 +0200 Subject: [PATCH 3/4] Update links in the notebook --- workshops/i2k_2024/finetune_sam.ipynb | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/workshops/i2k_2024/finetune_sam.ipynb b/workshops/i2k_2024/finetune_sam.ipynb
index 85a8a669..5d0ccc68 100644
--- a/workshops/i2k_2024/finetune_sam.ipynb
+++ b/workshops/i2k_2024/finetune_sam.ipynb
@@ -273,13 +273,24 @@
 "# Download the data into a directory\n",
 "DATA_FOLDER = os.path.join(root_dir, \"hpa\")\n",
 "\n",
-"URL = \"https://owncloud.gwdg.de/index.php/s/IrzUcaMxQKVRLTs/download\"\n",
+"downloader = \"owncloud\" # Switch to 'drive' if the download fails at 'owncloud'\n",
+"\n",
+"if downloader == \"owncloud\":\n",
+"    URL = \"https://owncloud.gwdg.de/index.php/s/IrzUcaMxQKVRLTs/download\"\n",
+"else:\n",
+"    URL = \"https://drive.google.com/uc?export=download&id=1EuBj2UkVTy2DRfKGltaFga5zuXHqVOvI\"\n",
+"\n",
 "CHECKSUM = \"f2c41be1761cdd96635ee30bee9dcbdeda4ebe3ab3467ad410c28417d46cdaad\"\n",
 "\n",
 "os.makedirs(DATA_FOLDER, exist_ok=True)\n",
 "if not os.path.exists(os.path.join(DATA_FOLDER, \"train/images\")):\n",
 "    zip_path = os.path.join(DATA_FOLDER, \"data.zip\")\n",
-"    datasets.util.download_source(path=zip_path, url=URL, download=True, checksum=CHECKSUM)\n",
+"\n",
+"    if downloader == \"owncloud\":\n",
+"        datasets.util.download_source(path=zip_path, url=URL, download=True, checksum=CHECKSUM)\n",
+"    else:\n",
+"        datasets.util.download_source_gdrive(path=zip_path, url=URL, download=True, checksum=CHECKSUM)\n",
+"\n",
 "    datasets.util.unzip(zip_path=zip_path, dst=DATA_FOLDER)\n",
 "\n",
 "# Get filepaths to the image data.\n",

From 8dadea14edf435d1e6e881e24ba8cb0e5642641a Mon Sep 17 00:00:00 2001 From: Anwai Archit Date: Mon, 21 Oct 2024 16:19:14 +0200 Subject: [PATCH 4/4] Add check for downloader methods --- workshops/i2k_2024/finetune_sam.ipynb | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git 
a/workshops/i2k_2024/finetune_sam.ipynb b/workshops/i2k_2024/finetune_sam.ipynb index 5d0ccc68..27584fb9 100644 --- a/workshops/i2k_2024/finetune_sam.ipynb +++ b/workshops/i2k_2024/finetune_sam.ipynb @@ -277,8 +277,12 @@ "\n", "if downloader == \"owncloud\":\n", " URL = \"https://owncloud.gwdg.de/index.php/s/IrzUcaMxQKVRLTs/download\"\n", - "else:\n", + "elif downloader == \"drive\":\n", " URL = \"https://drive.google.com/uc?export=download&id=1EuBj2UkVTy2DRfKGltaFga5zuXHqVOvI\"\n", + "else:\n", + " raise ValueError(\n", + " f\"'{downloader}' is not a valid way to download. Please choose either 'owncloud' or 'drive'.\"\n", + " )\n", "\n", "CHECKSUM = \"f2c41be1761cdd96635ee30bee9dcbdeda4ebe3ab3467ad410c28417d46cdaad\"\n", "\n",
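Taken together, these patches repeat one small pattern in both scripts and the notebook: dispatch between two download backends and fail with a clear error otherwise. As a standalone sketch (the helper name and the `urls` dict layout here are illustrative, not part of the repository):

```python
import os

from torch_em.data import datasets


def download_and_unzip(data_dir, urls, checksum, downloader="owncloud"):
    """Fetch 'data.zip' from ownCloud or Google Drive and unpack it.

    'urls' maps a backend name ('owncloud' or 'drive') to its download URL.
    """
    if downloader not in urls:
        raise ValueError(f"'{downloader}' is not a valid way to download.")

    os.makedirs(data_dir, exist_ok=True)
    zip_path = os.path.join(data_dir, "data.zip")

    if downloader == "owncloud":
        datasets.util.download_source(path=zip_path, url=urls["owncloud"], download=True, checksum=checksum)
    else:  # Downloads from Google Drive additionally require the 'gdown' package.
        datasets.util.download_source_gdrive(path=zip_path, url=urls["drive"], download=True, checksum=checksum)

    datasets.util.unzip(zip_path=zip_path, dst=data_dir)
```

Each dataset then only has to supply its two URLs and a single checksum, which is exactly the shape of the `URLS_OWNCLOUD`, `URLS_DRIVE` and `CHECKSUMS` tables above.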