diff --git a/02_Noise2Void/exercise.ipynb b/02_Noise2Void/exercise.ipynb new file mode 100644 index 0000000..a52cd59 --- /dev/null +++ b/02_Noise2Void/exercise.ipynb @@ -0,0 +1,845 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "# Noise2Void\n", + "\n", + "In the first exercise, we denoised images with CARE using supervised training. As \n", + "discussed during the lecture, ground-truth data is not always available in life \n", + "sciences. But don't panic, Noise2Void is here to help!\n", + "\n", + "Indeed, Noise2Void is a self-supervised algorithm, meaning that it trains on the data\n", + "itself and does not require clean images. The idea is to predict the value of a masked\n", + "pixel based on the information from the surrounding pixels. Two underlying hypotheses\n", + "allow N2V to work: the structures are continuous and the noise is pixel-independent, \n", + "that is to say, the amount of noise in one pixel is independent of the amount of noise\n", + "in the surrounding pixels. Fortunately for us, this is very often the case in microscopy images!\n", + "\n", + "If N2V does not require pairs of noisy and clean images, then how does it train?\n", + "\n", + "First, it selects random pixels in each patch, then it masks them. The masking is \n", + "not done by setting their value to 0 (which could disturb the network since it is an\n", + "unexpected value) but by replacing the value with that of one of the neighboring pixels.\n", + "\n", + "Then, the network is trained to predict the value of the masked pixels. Since the masked\n", + "value is different from the original value, the network needs to use the information\n", + "contained in all the pixels surrounding the masked pixel. If the noise is pixel-independent,\n", + "then the network cannot predict the amount of noise in the original pixel and it ends\n", + "up predicting a value close to the \"clean\", or denoised, value.\n", + "\n", + "In this notebook, we will use an existing library called [CAREamics](https://careamics.github.io)\n", + "that includes N2V and other algorithms:\n", + "\n", + "
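Before diving in, here is a minimal NumPy sketch of the masking idea described above (an illustration only, not the CAREamics implementation; the helper name `mask_patch` and all values are made up): a few pixels are replaced by a randomly chosen neighbour, and the training loss is then evaluated only at those masked positions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_patch(patch, n_masked=8, roi=5):
    """Replace n_masked random pixels with a random neighbor taken from a roi x roi window."""
    masked = patch.copy()
    mask = np.zeros_like(patch, dtype=bool)
    half = roi // 2
    h, w = patch.shape
    ys = rng.integers(half, h - half, n_masked)
    xs = rng.integers(half, w - half, n_masked)
    for y, x in zip(ys, xs):
        dy, dx = 0, 0
        while dy == 0 and dx == 0:  # pick a neighbor, not the pixel itself
            dy, dx = rng.integers(-half, half + 1, size=2)
        masked[y, x] = patch[y + dy, x + dx]
        mask[y, x] = True
    return masked, mask

noisy_patch = rng.normal(size=(64, 64))
masked_patch, mask = mask_patch(noisy_patch)

# The network would receive `masked_patch`, and the loss would be computed
# only where `mask` is True, e.g. an MSE restricted to the masked pixels:
prediction = masked_patch  # placeholder for the network output
loss = np.mean((prediction[mask] - noisy_patch[mask]) ** 2)
```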

\n", + " \n", + "

\n", + "\n", + "\n", + "## References\n", + "\n", + "- Alexander Krull, Tim-Oliver Buchholz, and Florian Jug. \"[Noise2Void - learning denoising from single noisy images.](https://openaccess.thecvf.com/content_CVPR_2019/html/Krull_Noise2Void_-_Learning_Denoising_From_Single_Noisy_Images_CVPR_2019_paper.html)\" Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition, 2019.\n", + "- Joshua Batson, and Loic Royer. \"[Noise2self: Blind denoising by self-supervision.](http://proceedings.mlr.press/v97/batson19a.html)\" International Conference on Machine Learning. PMLR, 2019." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "\n", + "

Objectives

\n", + " \n", + "- Understand how N2V masks pixels for training\n", + "- Learn how to use CAREamics to train N2V\n", + "- Think about pixel noise and noise correlation\n", + " \n", + "
\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "### Mandatory actions\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "
\n", + "Set your python kernel to 05_image_restoration\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from pathlib import Path\n", + "\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "import tifffile\n", + "\n", + "from careamics import CAREamist\n", + "from careamics.config import (\n", + " create_n2v_configuration,\n", + ")\n", + "from careamics.transforms import N2VManipulate\n", + "\n", + "%matplotlib inline" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "
\n", + "\n", + "## Part 1 Visualize the masking algorithm\n", + "\n", + "In this first part, let's inspect how this pixel masking is done before training a N2V network!\n", + "\n", + "Before feeding patches to the network, a set of transformations, or augmentations, are \n", + "applied to them. For instance in microscopy, we usually apply random 90 degrees rotations\n", + "or flip the images. In Noise2Void, we apply one more transformation that replace random pixels\n", + "by a value from their surrounding.\n", + "\n", + "In CAREamics, the transformation is called `N2VManipulate`. It has different \n", + "parameters: `roi_size`, `masked_pixel_percentage` and `strategy`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Define a patch size for this exercise\n", + "dummy_patch_size = 10\n", + "\n", + "# Define masking parameters\n", + "roi_size = 3\n", + "masked_pixel_percentage = 10\n", + "strategy = 'uniform'" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "

Task 1: Explore the N2VManipulate parameters

\n", + "\n", + "Can you understand what `roi_size` and `masked_pixel_percentage` do? What can go wrong if they are too small or too high?\n", + "\n", + "\n", + "Run the cell below to observe the effects!\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "# Create a dummy patch\n", + "patch = np.arange(dummy_patch_size**2).reshape(dummy_patch_size, dummy_patch_size)\n", + "\n", + "# The pixel manipulator expects a channel dimension, so we need to add it to the patch\n", + "patch = patch[np.newaxis]\n", + "\n", + "# Instantiate the pixel manipulator\n", + "manipulator = N2VManipulate(\n", + " roi_size=roi_size,\n", + " masked_pixel_percentage=masked_pixel_percentage,\n", + " strategy=strategy,\n", + ")\n", + "\n", + "# And apply it\n", + "masked_patch, original_patch, mask = manipulator(patch)\n", + "\n", + "# Visualize the masked patch and the mask\n", + "fig, ax = plt.subplots(1, 2, figsize=(10, 5))\n", + "ax[0].imshow(masked_patch[0])\n", + "ax[0].title.set_text(\"Manipulated patch\")\n", + "ax[1].imshow(mask[0], cmap=\"gray\")\n", + "ax[1].title.set_text(\"Mask\")\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "

Questions: Noise2Void masking strategy

\n", + "\n", + "\n", + "So what's really happening on a technical level? \n", + "\n", + "In the basic setting N2V algorithm replaces certain pixels with the values from the vicinity\n", + "Other masking stategies also exist, e.g. median, where replacement value is the median off all the pixels inside the region of interest.\n", + "\n", + "Feel free to play around with the ROI size, patch size and masked pixel percentage parameters\n", + "\n", + "
\n", + "\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "

Checkpoint 1: What is N2V really doing?

\n", + "
" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "\n", + "## Part 2: Prepare the data\n", + "\n", + "Now that we understand how the masking works, let's train a Noise2Void network! We will\n", + "use a scanning electron microscopy image (SEM)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Define the paths\n", + "root_path = Path(\"./../data\")\n", + "root_path = root_path / \"denoising-N2V_SEM.unzip\"\n", + "assert root_path.exists(), f\"Path {root_path} does not exist\"\n", + "\n", + "train_images_path = root_path / \"train.tif\"\n", + "validation_images_path = root_path / \"validation.tif\"" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Visualize training data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "# Load images\n", + "train_image = tifffile.imread(train_images_path)\n", + "print(f\"Train image shape: {train_image.shape}\")\n", + "plt.imshow(train_image, cmap=\"gray\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Visualize validation data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "val_image = tifffile.imread(validation_images_path)\n", + "print(f\"Validation image shape: {val_image.shape}\")\n", + "plt.imshow(val_image, cmap=\"gray\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 3: Create a configuration\n", + "\n", + "CAREamics can be configured either from a yaml file, or with an explicitly created config object.\n", + "In this note book we will create the config object using helper functions. CAREamics will \n", + "validate all the parameters and will output explicit error if some parameters or a combination of parameters isn't allowed. It will also provide default values for missing parameters.\n", + "\n", + "The helper function limits the parameters to what is relevant for N2V, here is a break down of these parameters:\n", + "\n", + "- `experiment_name`: name used to identify the experiment\n", + "- `data_type`: data type, in CAREamics it can only be `tiff` or `array` \n", + "- `axes`: axes of the data, here it would be `YX`. Remember: pytorch and numpy order axes in reverse of what you might be used to. 
If the data were 3D, the axes would be `ZYX`.\n", + "- `patch_size`: size of the patches used for training\n", + "- `batch_size`: size of each batch\n", + "- `num_epochs`: number of epochs\n", + "\n", + "\n", + "There are also optional parameters, for more fine grained details:\n", + "\n", + "- `use_augmentations`: whether to use augmentations (flip and rotation)\n", + "- `use_n2v2`: whether to use N2V2, a N2V variant (see optional exercise)\n", + "- `n_channels`: the number of channels \n", + "- `roi_size`: size of the N2V manipulation region (remember that parameter?)\n", + "- `masked_pixel_percentage`: percentage of pixels to mask\n", + "- `logger`: which logger to use\n", + "\n", + "\n", + "Have a look at the [documentation](https://careamics.github.io) to see the full list of parameters and \n", + "their use!\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create a configuration using the helper function\n", + "training_config = create_n2v_configuration(\n", + " experiment_name=\"dl4mia_n2v_sem\",\n", + " data_type=\"tiff\",\n", + " axes=\"YX\",\n", + " patch_size=[64, 64],\n", + " batch_size=128,\n", + " num_epochs=10,\n", + " roi_size=3,\n", + " masked_pixel_percentage=0.05,\n", + " logger=\"tensorboard\"\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Initialize the Model\n", + "\n", + "Let's instantiate the model with the configuration we just created. CAREamist is the main class of the library, it will handle creation of the data pipeline, the model, training and inference methods." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "careamist = CAREamist(source=training_config)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%load_ext tensorboard\n", + "%tensorboard --logdir logs/lightning_logs" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 4: Train\n", + "\n", + "Here, we need to specify the paths to training and validation data. We can point to a folder containing \n", + "the data or to a single file. If it fits in memory, then CAREamics will load everything and train on it. If it doesn't, then CAREamics will load the data file by file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "careamist.train(train_source=train_images_path, val_source=validation_images_path)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "

Task 2: Tensorboard

\n", + "\n", + "Remember the configuration? Didn't we set `logger` to `tensorboard`? Then we can visualize the loss curve!\n", + "\n", + "Explore the local folder next to this notebook and find where the logs are stored. Then, \n", + "start a tensorboard server and visualize the loss curve.\n", + "
\n", + "\n", + "

Question: N2V loss curve

\n", + "\n", + "Do you remember what the loss is in Noise2Void? What is the meaning of the loss curve in that case? Can\n", + "it be easily interpreted?\n", + "
\n", + "\n", + "

Checkpoint 2: Training Noise2Void

\n", + "
" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "\n", + "We trained, but how well did it do?" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 5. Prediction\n", + "\n", + "In order to predict on an image, we also need to specify the path. We also typically need\n", + "to cut the image into patches, predict on each patch and then stitch the patches back together.\n", + "\n", + "To make the process faster, we can choose bigger tiles than the patches used during training. By default CAREamics uses tiled prediction to handle large images. The tile size can be set via the `tile_size` parameter. Tile overlap is computed automatically based on the network architecture." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "preds = careamist.predict(source=train_images_path, tile_size=(256, 256))[0]" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visualize predictions" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Show the full image\n", + "fig, ax = plt.subplots(1, 2, figsize=(10, 5))\n", + "ax[0].imshow(train_image, cmap=\"gray\")\n", + "ax[1].imshow(preds.squeeze(), cmap=\"gray\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "

Question: Inspect the image closely

\n", + "\n", + "If you got a good result, try to inspect the image closely. For instance, the default\n", + "window we used for the close-up image:\n", + "\n", + "`y_start` = 200\n", + "\n", + "`y_end` = 450\n", + "\n", + "`x_start` = 600\n", + "\n", + "`x_end` = 850\n", + "\n", + "Do you see anything peculiar in the fine grained details? What could be the reason for that?\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + } + }, + "outputs": [], + "source": [ + "# Show a close up image\n", + "y_start = 200\n", + "y_end = 450\n", + "x_start = 600\n", + "x_end = 850\n", + "\n", + "fig, ax = plt.subplots(1, 2, figsize=(10, 5))\n", + "ax[0].imshow(train_image[y_start:y_end, x_start:x_end], cmap=\"gray\")\n", + "ax[1].imshow(preds.squeeze()[y_start:y_end, x_start:x_end], cmap=\"gray\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "

Question: Check the residuals

\n", + "\n", + "Compute the absolute difference between original and denoised image. What do you see? \n", + "\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + } + }, + "outputs": [], + "source": [ + "plt.imshow(preds.squeeze() - train_image, cmap=\"gray\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "

Task 4(Optional): Improving the results

\n", + "\n", + "CAREamics configuration won't allow you to use parameters which are clearly wrong. However, there are many parameters that can be tuned to improve the results. Try to play around with the `roi_size` and `masked_pixel_percentage` and see if you can improve the results.\n", + "\n", + "Do the fine-grained structures observed in task 5 disappear?\n", + "\n", + "
" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### How to predict without training?\n", + "\n", + "Here again, CAREamics provides a way to create a CAREamist from a checkpoint only,\n", + "allowing predicting without having to retrain." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "# Instantiate a CAREamist from a checkpoint\n", + "other_careamist = CAREamist(source=\"checkpoints/last.ckpt\")\n", + "\n", + "# And predict\n", + "new_preds = other_careamist.predict(source=train_images_path, tile_size=(256, 256))[0]\n", + "\n", + "# Show the full image\n", + "fig, ax = plt.subplots(1, 2, figsize=(10, 5))\n", + "ax[0].imshow(train_image, cmap=\"gray\")\n", + "ax[1].imshow(new_preds.squeeze(), cmap=\"gray\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "

Checkpoint 3: Prediction

\n", + "
\n", + "\n", + "
\n", + "\n", + "## Part 6: Exporting the model\n", + "\n", + "Have you heard of the [BioImage Model Zoo](https://bioimage.io/#/)? It provides a format for FAIR AI models and allows\n", + "researchers to exchange and reproduce models. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Export model as BMZ\n", + "careamist.export_to_bmz(\n", + " path_to_archive=\"n2v_model.zip\",\n", + " input_array=train_image[:128, :128],\n", + " friendly_model_name=\"SEM_N2V\",\n", + " authors= [{\"name\": \"Jane\", \"affiliation\": \"Doe University\"}],\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "

Task 5: Train N2V(2) on a different dataset

\n", + "\n", + "As you remember from the lecture, N2V can only deal with the noise that is pixelwise independent. \n", + "\n", + "Use these cells to train on a different dataset: Mito Confocal, which has noise that is not pixelwise independent, but is spatially correlated. This will be loaded in the following cell.\n", + "\n", + "In the next cells we'll show you how the result of training a N2V model on this dataset looks.\n", + "\n", + "In the next exercise of the course we'll learn how to deal with this kind of noise! \n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "mito_path = \"./../data/mito-confocal-lowsnr.tif\"\n", + "mito_image = tifffile.imread(mito_path)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Configure the model\n", + "mito_training_config = create_n2v_configuration(\n", + " experiment_name=\"dl4mia_n2v2_mito\",\n", + " data_type=\"array\",\n", + " axes=\"SYX\", # <-- we are adding S because we have a stack of images\n", + " patch_size=[64, 64],\n", + " batch_size=64,\n", + " num_epochs=10,\n", + " logger=\"tensorboard\",\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "careamist = CAREamist(source=mito_training_config)\n", + "careamist.train(\n", + " train_source=mito_image,\n", + " val_percentage=0.1\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "preds = careamist.predict(\n", + " source=mito_image[:1], # <-- we predict on a small subset\n", + " data_type=\"array\",\n", + " tile_size=(64, 64),\n", + ")[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the following cell, look closely at the denoising result of applying N2V to data with spatially correlated noise. Zoom in and see if you can find the horizontal artifacts." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "vmin = np.percentile(mito_image, 1)\n", + "vmax = np.percentile(mito_image, 99)\n", + "\n", + "y_start = 0\n", + "y_end = 1024\n", + "x_start = 0\n", + "x_end = 1024\n", + "\n", + "# Feel free to play around with the visualization\n", + "_, ax = plt.subplots(1, 2, figsize=(10, 5))\n", + "ax[0].imshow(preds[0, 0, 600:700, 300:400], vmin=vmin, vmax=vmax)\n", + "ax[0].title.set_text(\"Predicted\")\n", + "ax[1].imshow(mito_image[0, 600:700, 300:400], vmin=vmin, vmax=vmax)\n", + "ax[1].title.set_text(\"Original\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "

Checkpoint 4: Dealing with artifacts

\n", + "
" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "

Take away questions

\n", + "\n", + "- Which is the best saved checkpoint for Noise2Void, the one at the end of the training or the one with lowest validation loss?\n", + "\n", + "- Is validation useful in Noise2Void?\n", + "\n", + "- We predicted on the same image we trained on, is that a good idea?\n", + "\n", + "- Can you reuse the model on another image?\n", + "\n", + "- Can you train on images with multiple channels? RGB images? Biological channels (GFP, RFP, DAPI)?\n", + "\n", + "- N2V training is unsupervised, how can you be sure that the training worked and is not hallucinating?\n", + "
\n", + "\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "

End of the exercise

\n", + "
" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "05_image_restoration", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.14" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/02_Noise2Void/solution.ipynb b/02_Noise2Void/solution.ipynb new file mode 100755 index 0000000..8ff23ad --- /dev/null +++ b/02_Noise2Void/solution.ipynb @@ -0,0 +1,852 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "# Noise2Void\n", + "\n", + "In the first exercise, we denoised images with CARE using supervised training. As \n", + "discussed during the lecture, ground-truth data is not always available in life \n", + "sciences. But no panic, Noise2Void is here to help!\n", + "\n", + "Indeed Noise2Void is a self-supervised algorithm, meaning that it trains on the data\n", + "itself and does not require clean images. The idea is to predict the value of a masked\n", + "pixels based on the information from the surrounding pixels. Two underlying hypothesis\n", + "allow N2V to work: the structures are continuous and the noise is pixel-independent, \n", + "that is to say the amount of noise in one pixel is independent from the amount of noise\n", + "in the surrounding pixels. Fortunately for us, it is very often the case in microscopy images!\n", + "\n", + "If N2V does not require pairs of noisy and clean images, then how does it train?\n", + "\n", + "First it selects random pixels in each patch, then it masks them. The masking is \n", + "not done by setting their value to 0 (which could disturb the network since it is an\n", + "unexpected value) but by replacing the value with that of one of the neighboring pixels.\n", + "\n", + "Then, the network is trained to predict the value of the masked pixels. Since the masked\n", + "value is different from the original value, the network needs to use the information\n", + "contained in all the pixels surrounding the masked pixel. If the noise is pixel-independent,\n", + "then the network cannot predict the amount of noise in the original pixel and it ends\n", + "up predicting a value close to the \"clean\", or denoised, value.\n", + "\n", + "In this notebook, we will use an existing library called [Careamics](https://careamics.github.io)\n", + "that includes N2V and other algorithms:\n", + "\n", + "

\n", + " \n", + "

\n", + "\n", + "\n", + "## References\n", + "\n", + "- Alexander Krull, Tim-Oliver Buchholz, and Florian Jug. \"[Noise2Void - learning denoising from single noisy images.](https://openaccess.thecvf.com/content_CVPR_2019/html/Krull_Noise2Void_-_Learning_Denoising_From_Single_Noisy_Images_CVPR_2019_paper.html)\" Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition, 2019.\n", + "- Joshua Batson, and Loic Royer. \"[Noise2self: Blind denoising by self-supervision.](http://proceedings.mlr.press/v97/batson19a.html)\" International Conference on Machine Learning. PMLR, 2019." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "\n", + "

Objectives

\n", + " \n", + "- Understand how N2V masks pixels for training\n", + "- Learn how to use CAREamics to train N2V\n", + "- Think about pixel noise and noise correlation\n", + " \n", + "
\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "### Mandatory actions\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "
\n", + "Set your python kernel to 05_image_restoration\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from pathlib import Path\n", + "\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "import tifffile\n", + "\n", + "from careamics import CAREamist\n", + "from careamics.config import (\n", + " create_n2v_configuration,\n", + ")\n", + "from careamics.transforms import N2VManipulate\n", + "\n", + "%matplotlib inline" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "
\n", + "\n", + "## Part 1 Visualize the masking algorithm\n", + "\n", + "In this first part, let's inspect how this pixel masking is done before training a N2V network!\n", + "\n", + "Before feeding patches to the network, a set of transformations, or augmentations, are \n", + "applied to them. For instance in microscopy, we usually apply random 90 degrees rotations\n", + "or flip the images. In Noise2Void, we apply one more transformation that replace random pixels\n", + "by a value from their surrounding.\n", + "\n", + "In CAREamics, the transformation is called `N2VManipulate`. It has different \n", + "parameters: `roi_size`, `masked_pixel_percentage` and `strategy`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Define a patch size for this exercise\n", + "dummy_patch_size = 10\n", + "\n", + "# Define masking parameters\n", + "roi_size = 3\n", + "masked_pixel_percentage = 10\n", + "strategy = 'uniform'" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "

Task 1: Explore the N2VManipulate parameters

\n", + "\n", + "Can you understand what `roi_size` and `masked_pixel_percentage` do? What can go wrong if they are too small or too high?\n", + "\n", + "\n", + "Run the cell below to observe the effects!\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "# Create a dummy patch\n", + "patch = np.arange(dummy_patch_size**2).reshape(dummy_patch_size, dummy_patch_size)\n", + "\n", + "# The pixel manipulator expects a channel dimension, so we need to add it to the patch\n", + "patch = patch[np.newaxis]\n", + "\n", + "# Instantiate the pixel manipulator\n", + "manipulator = N2VManipulate(\n", + " roi_size=roi_size,\n", + " masked_pixel_percentage=masked_pixel_percentage,\n", + " strategy=strategy,\n", + ")\n", + "\n", + "# And apply it\n", + "masked_patch, original_patch, mask = manipulator(patch)\n", + "\n", + "# Visualize the masked patch and the mask\n", + "fig, ax = plt.subplots(1, 2, figsize=(10, 5))\n", + "ax[0].imshow(masked_patch[0])\n", + "ax[0].title.set_text(\"Manipulated patch\")\n", + "ax[1].imshow(mask[0], cmap=\"gray\")\n", + "ax[1].title.set_text(\"Mask\")\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "

Questions: Noise2Void masking strategy

\n", + "\n", + "\n", + "So what's really happening on a technical level? \n", + "\n", + "In the basic setting N2V algorithm replaces certain pixels with the values from the vicinity\n", + "Other masking stategies also exist, e.g. median, where replacement value is the median off all the pixels inside the region of interest.\n", + "\n", + "Feel free to play around with the ROI size, patch size and masked pixel percentage parameters\n", + "\n", + "
\n", + "\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "

Checkpoint 1: What is N2V really doing?

\n", + "
" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "\n", + "## Part 2: Prepare the data\n", + "\n", + "Now that we understand how the masking works, let's train a Noise2Void network! We will\n", + "use a scanning electron microscopy image (SEM)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Define the paths\n", + "root_path = Path(\"./../data\")\n", + "root_path = root_path / \"denoising-N2V_SEM.unzip\"\n", + "assert root_path.exists(), f\"Path {root_path} does not exist\"\n", + "\n", + "train_images_path = root_path / \"train.tif\"\n", + "validation_images_path = root_path / \"validation.tif\"" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Visualize training data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "# Load images\n", + "train_image = tifffile.imread(train_images_path)\n", + "print(f\"Train image shape: {train_image.shape}\")\n", + "plt.imshow(train_image, cmap=\"gray\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Visualize validation data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "val_image = tifffile.imread(validation_images_path)\n", + "print(f\"Validation image shape: {val_image.shape}\")\n", + "plt.imshow(val_image, cmap=\"gray\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 3: Create a configuration\n", + "\n", + "CAREamics can be configured either from a yaml file, or with an explicitly created config object.\n", + "In this note book we will create the config object using helper functions. CAREamics will \n", + "validate all the parameters and will output explicit error if some parameters or a combination of parameters isn't allowed. It will also provide default values for missing parameters.\n", + "\n", + "The helper function limits the parameters to what is relevant for N2V, here is a break down of these parameters:\n", + "\n", + "- `experiment_name`: name used to identify the experiment\n", + "- `data_type`: data type, in CAREamics it can only be `tiff` or `array` \n", + "- `axes`: axes of the data, here it would be `YX`. Remember: pytorch and numpy order axes in reverse of what you might be used to. 
If the data were 3D, the axes would be `ZYX`.\n", + "- `patch_size`: size of the patches used for training\n", + "- `batch_size`: size of each batch\n", + "- `num_epochs`: number of epochs\n", + "\n", + "\n", + "There are also optional parameters, for more fine grained details:\n", + "\n", + "- `use_augmentations`: whether to use augmentations (flip and rotation)\n", + "- `use_n2v2`: whether to use N2V2, a N2V variant (see optional exercise)\n", + "- `n_channels`: the number of channels \n", + "- `roi_size`: size of the N2V manipulation region (remember that parameter?)\n", + "- `masked_pixel_percentage`: percentage of pixels to mask\n", + "- `logger`: which logger to use\n", + "\n", + "\n", + "Have a look at the [documentation](https://careamics.github.io) to see the full list of parameters and \n", + "their use!\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create a configuration using the helper function\n", + "training_config = create_n2v_configuration(\n", + " experiment_name=\"dl4mia_n2v_sem\",\n", + " data_type=\"tiff\",\n", + " axes=\"YX\",\n", + " patch_size=[64, 64],\n", + " batch_size=128,\n", + " num_epochs=10,\n", + " roi_size=11,\n", + " masked_pixel_percentage=0.2,\n", + " logger=\"tensorboard\"\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Initialize the Model\n", + "\n", + "Let's instantiate the model with the configuration we just created. CAREamist is the main class of the library, it will handle creation of the data pipeline, the model, training and inference methods." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "careamist = CAREamist(source=training_config)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%load_ext tensorboard\n", + "%tensorboard --logdir logs/lightning_logs" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 4: Train\n", + "\n", + "Here, we need to specify the paths to training and validation data. We can point to a folder containing \n", + "the data or to a single file. If it fits in memory, then CAREamics will load everything and train on it. If it doesn't, then CAREamics will load the data file by file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "careamist.train(train_source=train_images_path, val_source=validation_images_path)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "

Task 2: Tensorboard

\n", + "\n", + "Remember the configuration? Didn't we set `logger` to `tensorboard`? Then we can visualize the loss curve!\n", + "\n", + "
\n", + "\n", + "

Question: N2V loss curve

\n", + "\n", + "Do you remember what the loss is in Noise2Void? What is the meaning of the loss curve in that case? Can\n", + "it be easily interpreted?\n", + "
\n", + "\n", + "

Checkpoint 2: Training Noise2Void

\n", + "
" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "\n", + "We trained, but how well did it do?" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 5. Prediction\n", + "\n", + "In order to predict on an image, we also need to specify the path. We also typically need\n", + "to cut the image into patches, predict on each patch and then stitch the patches back together.\n", + "\n", + "To make the process faster, we can choose bigger tiles than the patches used during training. By default CAREamics uses tiled prediction to handle large images. The tile size can be set via the `tile_size` parameter. Tile overlap is computed automatically based on the network architecture." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "preds = careamist.predict(source=train_images_path, tile_size=(256, 256))[0]" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visualize predictions" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Show the full image\n", + "fig, ax = plt.subplots(1, 2, figsize=(10, 5))\n", + "ax[0].imshow(train_image, cmap=\"gray\")\n", + "ax[1].imshow(preds.squeeze(), cmap=\"gray\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "

Question: Inspect the image closely

\n", + "\n", + "If you got a good result, try to inspect the image closely. For instance, the default\n", + "window we used for the close-up image:\n", + "\n", + "`y_start` = 200\n", + "\n", + "`y_end` = 450\n", + "\n", + "`x_start` = 600\n", + "\n", + "`x_end` = 850\n", + "\n", + "Do you see anything peculiar in the fine grained details? What could be the reason for that?\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + } + }, + "outputs": [], + "source": [ + "# Show a close up image\n", + "y_start = 200\n", + "y_end = 450\n", + "x_start = 600\n", + "x_end = 850\n", + "\n", + "fig, ax = plt.subplots(1, 2, figsize=(10, 5))\n", + "ax[0].imshow(train_image[y_start:y_end, x_start:x_end], cmap=\"gray\")\n", + "ax[1].imshow(preds.squeeze()[y_start:y_end, x_start:x_end], cmap=\"gray\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "

Question: Check the residuals

\n", + "\n", + "Compute the absolute difference between original and denoised image. What do you see? \n", + "\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + } + }, + "outputs": [], + "source": [ + "plt.imshow(preds.squeeze() - train_image, cmap=\"gray\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "

Task 4(Optional): Improving the results

\n", + "\n", + "CAREamics configuration won't allow you to use parameters which are clearly wrong. However, there are many parameters that can be tuned to improve the results. Try to play around with the `roi_size` and `masked_pixel_percentage` and see if you can improve the results.\n", + "\n", + "Do the fine-grained structures observed in during the closer look at the image disappear?\n", + "\n", + "
" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### How to predict without training?\n", + "\n", + "Here again, CAREamics provides a way to create a CAREamist from a checkpoint only,\n", + "allowing predicting without having to retrain." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "# Instantiate a CAREamist from a checkpoint\n", + "other_careamist = CAREamist(source=\"checkpoints/last.ckpt\")\n", + "\n", + "# And predict\n", + "new_preds = other_careamist.predict(source=train_images_path, tile_size=(256, 256))[0]\n", + "\n", + "# Show the full image\n", + "fig, ax = plt.subplots(1, 2, figsize=(10, 5))\n", + "ax[0].imshow(train_image, cmap=\"gray\")\n", + "ax[1].imshow(new_preds.squeeze(), cmap=\"gray\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "train_image[:128, :128].shape" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "

Checkpoint 3: Prediction

\n", + "
\n", + "\n", + "
\n", + "\n", + "## Part 6: Exporting the model\n", + "\n", + "Have you heard of the [BioImage Model Zoo](https://bioimage.io/#/)? It provides a format for FAIR AI models and allows\n", + "researchers to exchange and reproduce models. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Export model as BMZ\n", + "careamist.export_to_bmz(\n", + " path_to_archive=\"n2v_model.zip\",\n", + " input_array=train_image[:128, :128],\n", + " friendly_model_name=\"SEM_N2V\",\n", + " authors= [{\"name\": \"Jane\", \"affiliation\": \"Doe University\"}],\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "

Task 5: Train N2V(2) on a different dataset

\n", + "\n", + "As you remember from the lecture, N2V can only deal with the noise that is pixelwise independent. \n", + "\n", + "Use these cells to train on a different dataset: Mito Confocal, which has noise that is not pixelwise independent, but is spatially correlated. This will be loaded in the following cell.\n", + "\n", + "In the next cells we'll show you how the result of training a N2V model on this dataset looks.\n", + "\n", + "In the next exercise of the course we'll learn how to deal with this kind of noise! \n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "mito_path = \"./../data/mito-confocal-lowsnr.tif\"\n", + "mito_image = tifffile.imread(mito_path)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Configure the model\n", + "mito_training_config = create_n2v_configuration(\n", + " experiment_name=\"dl4mia_n2v2_mito\",\n", + " data_type=\"array\",\n", + " axes=\"SYX\", # <-- we are adding S because we have a stack of images\n", + " patch_size=[64, 64],\n", + " batch_size=64,\n", + " num_epochs=10,\n", + " logger=\"tensorboard\",\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "careamist = CAREamist(source=mito_training_config)\n", + "careamist.train(\n", + " train_source=mito_image,\n", + " val_percentage=0.1\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "preds = careamist.predict(\n", + " source=mito_image[:1], # <-- we predict on a small subset\n", + " data_type=\"array\",\n", + " tile_size=(64, 64),\n", + ")[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the following cell, look closely at the denoising result of applying N2V to data with spatially correlated noise. Zoom in and see if you can find the horizontal artifacts." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "vmin = np.percentile(mito_image, 1)\n", + "vmax = np.percentile(mito_image, 99)\n", + "\n", + "y_start = 0\n", + "y_end = 1024\n", + "x_start = 0\n", + "x_end = 1024\n", + "\n", + "# Feel free to play around with the visualization\n", + "_, ax = plt.subplots(1, 2, figsize=(10, 5))\n", + "ax[0].imshow(preds[0, 0, 600:700, 300:400], vmin=vmin, vmax=vmax)\n", + "ax[0].title.set_text(\"Predicted\")\n", + "ax[1].imshow(mito_image[0, 600:700, 300:400], vmin=vmin, vmax=vmax)\n", + "ax[1].title.set_text(\"Original\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "

Checkpoint 4: Dealing with artifacts

\n", + "
" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "

Take away questions

\n", + "\n", + "- Which is the best saved checkpoint for Noise2Void, the one at the end of the training or the one with lowest validation loss?\n", + "\n", + "- Is validation useful in Noise2Void?\n", + "\n", + "- We predicted on the same image we trained on, is that a good idea?\n", + "\n", + "- Can you reuse the model on another image?\n", + "\n", + "- Can you train on images with multiple channels? RGB images? Biological channels (GFP, RFP, DAPI)?\n", + "\n", + "- N2V training is unsupervised, how can you be sure that the training worked and is not hallucinating?\n", + "
\n", + "\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "

End of the exercise

\n", + "
" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "05_image_restoration", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.14" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}