Dev/main (#192)

janelia-cellmap · Mar 19, 2024 · commit 0b5d88e
2 parents 72358ef + 9496ae9

Showing 9 changed files with 1,209 additions and 1,391 deletions.
diff --git a/dacapo/examples/distance_task/cosem_example.ipynb b/dacapo/examples/distance_task/cosem_example.ipynb
@@ -0,0 +1,260 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# First we need to create a config store to store our configurations\n",
+    "from dacapo.store.create_store import create_config_store\n",
+    "\n",
+    "config_store = create_config_store()\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    " ## Datasplit\n",
+    " Where can you find your data? What format is it in? Does it need to be normalized? What data do you want to use for validation?\n",
+    " We'll assume your data is in a zarr file, and that you have a raw and a ground truth dataset, all stored in your `runs_base_dir` as `example_{type}.zarr` where `{type}` is either `train` or `validate`.\n",
+    " NOTE: You may need to delete old config stores if you are re-running this cell with modifications to the configs. The config names are unique and will throw an error if you try to store a config with the same name as an existing config. For the `files` backend, you can delete the `runs_base_dir/configs` directory to remove all stored configs."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from dacapo.experiments.datasplits import DataSplitGenerator\n",
+    "from funlib.geometry import Coordinate\n",
+    "\n",
+    "input_resolution = Coordinate(8, 8, 8)\n",
+    "output_resolution = Coordinate(4, 4, 4)\n",
+    "datasplit_config = DataSplitGenerator.generate_from_csv(\n",
+    "    \"/misc/public/dacapo_learnathon/datasplit_csvs/cosem_example.csv\",\n",
+    "    input_resolution,\n",
+    "    output_resolution,\n",
+    ").compute()\n",
+    "\n",
+    "datasplit = datasplit_config.datasplit_type(datasplit_config)\n",
+    "viewer = datasplit._neuroglancer()\n",
+    "config_store.store_datasplit_config(datasplit_config)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    " ## Task\n",
+    " What do you want to learn? An instance segmentation? If so, how? Affinities,\n",
+    " Distance Transform, Foreground/Background, etc. Each of these tasks is commonly learned\n",
+    " and evaluated with specific loss functions and evaluation metrics. Some tasks may\n",
+    " also require specific non-linearities or output formats from your model."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from dacapo.experiments.tasks import DistanceTaskConfig\n",
+    "\n",
+    "task_config = DistanceTaskConfig(\n",
+    "    name=\"cosem_distance_task_4nm\",\n",
+    "    channels=[\"mito\"],\n",
+    "    clip_distance=40.0,\n",
+    "    tol_distance=40.0,\n",
+    "    scale_factor=80.0,\n",
+    ")\n",
+    "config_store.store_task_config(task_config)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    " ## Architecture\n",
+    "\n",
+    " The setup of the network you will train. Biomedical image-to-image translation often uses a UNet, but even after choosing a UNet you still need to provide some additional parameters. How much do you want to downsample? How many convolutional layers do you want?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from dacapo.experiments.architectures import CNNectomeUNetConfig\n",
+    "\n",
+    "architecture_config = CNNectomeUNetConfig(\n",
+    "    name=\"upsample_unet\",\n",
+    "    input_shape=Coordinate(216, 216, 216),\n",
+    "    eval_shape_increase=Coordinate(72, 72, 72),\n",
+    "    fmaps_in=1,\n",
+    "    num_fmaps=12,\n",
+    "    fmaps_out=72,\n",
+    "    fmap_inc_factor=6,\n",
+    "    downsample_factors=[(2, 2, 2), (3, 3, 3), (3, 3, 3)],\n",
+    "    constant_upsample=True,\n",
+    "    upsample_factors=[(2, 2, 2)],\n",
+    ")\n",
+    "config_store.store_architecture_config(architecture_config)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    " ## Trainer\n",
+    "\n",
+    " How do you want to train? This config defines the training loop and how the other three components work together. It specifies which augmentations to apply during training, which learning rate and optimizer to use, and what batch size to train with."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from dacapo.experiments.trainers import GunpowderTrainerConfig\n",
+    "from dacapo.experiments.trainers.gp_augments import (\n",
+    "    ElasticAugmentConfig,\n",
+    "    GammaAugmentConfig,\n",
+    "    IntensityAugmentConfig,\n",
+    "    IntensityScaleShiftAugmentConfig,\n",
+    ")\n",
+    "\n",
+    "trainer_config = GunpowderTrainerConfig(\n",
+    "    name=\"cosem\",\n",
+    "    batch_size=1,\n",
+    "    learning_rate=0.0001,\n",
+    "    num_data_fetchers=20,\n",
+    "    augments=[\n",
+    "        ElasticAugmentConfig(\n",
+    "            control_point_spacing=[100, 100, 100],\n",
+    "            control_point_displacement_sigma=[10.0, 10.0, 10.0],\n",
+    "            rotation_interval=(0.0, 1.5707963267948966),\n",
+    "            subsample=8,\n",
+    "            uniform_3d_rotation=True,\n",
+    "        ),\n",
+    "        IntensityAugmentConfig(scale=(0.25, 1.75), shift=(-0.5, 0.35), clip=True),\n",
+    "        GammaAugmentConfig(gamma_range=(0.5, 2.0)),\n",
+    "        IntensityScaleShiftAugmentConfig(scale=2.0, shift=-1.0),\n",
+    "    ],\n",
+    "    snapshot_interval=10000,\n",
+    "    min_masked=0.05,\n",
+    "    clip_raw=True,\n",
+    ")\n",
+    "config_store.store_trainer_config(trainer_config)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    " ## Run\n",
+    " Now that we have our components configured, we just need to combine them into a run and start training. We can have multiple repetitions of a single set of configs in order to increase our chances of finding an optimum."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from dacapo.experiments import RunConfig\n",
+    "from dacapo.experiments.run import Run\n",
+    "\n",
+    "start_config = None\n",
+    "\n",
+    "# Uncomment to start from a pretrained model\n",
+    "# start_config = StartConfig(\n",
+    "#     \"setup04\",\n",
+    "#     \"best\",\n",
+    "# )\n",
+    "\n",
+    "iterations = 2000\n",
+    "validation_interval = iterations // 2\n",
+    "repetitions = 1\n",
+    "for i in range(repetitions):\n",
+    "    run_config = RunConfig(\n",
+    "        name=\"cosem_distance_run_4nm\",\n",
+    "        # # NOTE: This is a template for the name of the run. You can customize it as you see fit.\n",
+    "        # name=(\"_\").join(\n",
+    "        #     [\n",
+    "        #         \"example\",\n",
+    "        #         \"scratch\" if start_config is None else \"finetuned\",\n",
+    "        #         datasplit_config.name,\n",
+    "        #         task_config.name,\n",
+    "        #         architecture_config.name,\n",
+    "        #         trainer_config.name,\n",
+    "        #     ]\n",
+    "        # )\n",
+    "        # + f\"__{i}\",\n",
+    "        datasplit_config=datasplit_config,\n",
+    "        task_config=task_config,\n",
+    "        architecture_config=architecture_config,\n",
+    "        trainer_config=trainer_config,\n",
+    "        num_iterations=iterations,\n",
+    "        validation_interval=validation_interval,\n",
+    "        repetition=i,\n",
+    "        start_config=start_config,\n",
+    "    )\n",
+    "\n",
+    "    print(run_config.name)\n",
+    "    config_store.store_run_config(run_config)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    " ## Train\n",
+    " To train one of the runs, first create a **Run** directly from the run config and pass it to `train_run`.\n",
+    " NOTE: The run stats are stored in the `runs_base_dir/stats` directory. You can delete this directory to remove all stored stats if you want to re-run training. Otherwise, the stats will be appended to the existing files, and the run won't start from scratch. This may cause errors."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from dacapo.train import train_run\n",
+    "from dacapo.experiments.run import Run\n",
+    "from dacapo.store.create_store import create_config_store\n",
+    "\n",
+    "config_store = create_config_store()\n",
+    "\n",
+    "run = Run(config_store.retrieve_run_config(\"cosem_distance_run_4nm\"))\n",
+    "train_run(run)\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.16"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
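Note on the notebook above: the datasplit is generated with an input resolution of (8, 8, 8) and an output resolution of (4, 4, 4), while the architecture sets `upsample_factors=[(2, 2, 2)]`. The following is a minimal sketch of a consistency check between those values. It assumes the resolutions are voxel sizes in the same unit and that the upsample factors are meant to cover the input/output resolution ratio; neither is stated in the commit. The helper names `required_upsampling` and `total_upsampling` are hypothetical and not part of dacapo.

```python
from math import prod


def required_upsampling(input_resolution, output_resolution):
    """Per-axis factor by which the output grid is finer than the input grid.

    Hypothetical helper: assumes both resolutions are voxel sizes in the same unit.
    """
    factors = []
    for in_res, out_res in zip(input_resolution, output_resolution):
        if in_res % out_res != 0:
            raise ValueError(
                f"input resolution {in_res} is not a multiple of output resolution {out_res}"
            )
        factors.append(in_res // out_res)
    return tuple(factors)


def total_upsampling(upsample_factors):
    """Multiply a list of per-layer upsample factors axis by axis (hypothetical helper)."""
    ndim = len(upsample_factors[0])
    return tuple(prod(f[axis] for f in upsample_factors) for axis in range(ndim))


# Values copied from the notebook in this commit.
input_resolution = (8, 8, 8)
output_resolution = (4, 4, 4)
upsample_factors = [(2, 2, 2)]

needed = required_upsampling(input_resolution, output_resolution)
provided = total_upsampling(upsample_factors)
print(f"needed={needed} provided={provided} consistent={needed == provided}")
```

With the values from the notebook, both tuples come out as (2, 2, 2), so under the stated assumption the configuration is self-consistent.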
diff --git a/dacapo/examples/distance_task/cosem_example.py b/dacapo/examples/distance_task/cosem_example.py
@@ -1,3 +1,4 @@
+
 # %% [markdown]
 # # Dacapo
 #
@@ -77,7 +78,9 @@
 input_resolution = Coordinate(8, 8, 8)
 output_resolution = Coordinate(4, 4, 4)
 datasplit_config = DataSplitGenerator.generate_from_csv(
-    "cosem_example.csv", input_resolution, output_resolution
+    "/misc/public/dacapo_learnathon/datasplit_csvs/cosem_example.csv",
+    input_resolution,
+    output_resolution,
 ).compute()
 
 datasplit = datasplit_config.datasplit_type(datasplit_config)
@@ -96,7 +99,7 @@
 
 task_config = DistanceTaskConfig(
     name="cosem_distance_task_4nm",
-    channels=["labels"],
+    channels=["mito"],
     clip_distance=40.0,
     tol_distance=40.0,
     scale_factor=80.0,
@@ -179,7 +182,7 @@
 # )
 
 iterations = 2000
-validation_interval = 50
+validation_interval = iterations // 2
 repetitions = 1
 for i in range(repetitions):
     run_config = RunConfig(
@@ -223,13 +226,3 @@
 
 run = Run(config_store.retrieve_run_config("cosem_distance_run_4nm"))
 train_run(run)
-
-# %% [markdown]
-# If you want to start your run on some compute cluster, you might want to use the command line interface: dacapo train -r {run_config.name}. This makes it particularly convenient to run on compute nodes where you can specify specific compute requirements.
-
-# # %%
-# from dacapo.validate import validate
-
-# # validate(run_config.name, iterations, num_workers=32)
-# validate("cosem_distance_run", 1500, num_workers=10)
-# # %%
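Note on the `validation_interval` change above: moving from a fixed 50 to `iterations // 2` sharply reduces how often validation runs. The sketch below lists the iterations at which validation would fire, assuming it is triggered whenever the iteration count reaches a positive multiple of `validation_interval`. That trigger rule is an assumption made for illustration, not something confirmed by this diff, and `validation_iterations` is a hypothetical helper rather than a dacapo function.

```python
def validation_iterations(num_iterations, validation_interval):
    """Iterations at which validation would run.

    Assumption: validation fires at every positive multiple of
    validation_interval, up to and including num_iterations.
    """
    return list(range(validation_interval, num_iterations + 1, validation_interval))


iterations = 2000

# Schedule before this commit (validation_interval = 50).
old_schedule = validation_iterations(iterations, 50)

# Schedule after this commit (validation_interval = iterations // 2).
new_schedule = validation_iterations(iterations, iterations // 2)

print(f"before: {len(old_schedule)} validation passes")
print(f"after:  {len(new_schedule)} validation passes at {new_schedule}")
```

Under that assumption, a 2000-iteration run goes from 40 validation passes to 2, at iterations 1000 and 2000.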