From 1bc42378708b89d9c6627be2045c15a7cc37e214 Mon Sep 17 00:00:00 2001
From: Igor Tatarnikov <61896994+IgorTatarnikov@users.noreply.github.com>
Date: Tue, 4 Jun 2024 14:41:23 +0100
Subject: [PATCH] TensorFlow to PyTorch migration (#189)

* Changed cellfinder/troubleshooting/index.md and speed-up.md to refer to PyTorch instead of tensorflow
* Changed setting-up gpu instruction
* Added some extra information about installing PyTorch with GPU support
* Added suggested selections when installing from the PyTorch website
* References to trained models specify .keras extension not .h5
---
 .../cellfinder/troubleshooting/index.md        |  2 +-
 .../cellfinder/troubleshooting/speed-up.md     | 26 +++++-------------
 .../napari-plugin/training-the-network.md      |  2 +-
 docs/source/documentation/setting-up/gpu.md    | 27 ++++++-------------
 .../source/tutorials/cellfinder-retraining.md  |  2 +-
 5 files changed, 17 insertions(+), 42 deletions(-)

diff --git a/docs/source/documentation/cellfinder/troubleshooting/index.md b/docs/source/documentation/cellfinder/troubleshooting/index.md
index 6b79e7a0..a96c51bd 100644
--- a/docs/source/documentation/cellfinder/troubleshooting/index.md
+++ b/docs/source/documentation/cellfinder/troubleshooting/index.md
@@ -25,7 +25,7 @@ For more details, please see the guide to [retraining the pre-trained network](/
 
 As `brainmapper` relies on a number of third party libraries, notably
 
-- [TensorFlow](https://www.tensorflow.org/),
+- [PyTorch](https://pytorch.org/),
 - [CUDA](https://developer.nvidia.com/cuda-zone),
 - [cuDNN](https://developer.nvidia.com/cudnn),
 
diff --git a/docs/source/documentation/cellfinder/troubleshooting/speed-up.md b/docs/source/documentation/cellfinder/troubleshooting/speed-up.md
index a6826520..8795d3db 100644
--- a/docs/source/documentation/cellfinder/troubleshooting/speed-up.md
+++ b/docs/source/documentation/cellfinder/troubleshooting/speed-up.md
@@ -47,40 +47,26 @@ Open a terminal (or Anaconda Prompt), start Python,
 python
 ```
 
-and check that tensorflow can use the GPU,
+and check that PyTorch can use the GPU,
 
 ```python
-import tensorflow as tf
-tf.test.is_gpu_available()
+import torch
+print(torch.cuda.is_available())
+print([(i, torch.cuda.get_device_properties(i)) for i in range(torch.cuda.device_count())])
 ```
 
 If you see something like the output below, then all is well.
 
 ```bash
-2019-06-26 10:51:34.697900: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F
-2019-06-26 10:51:34.881183: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
-name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
-pciBusID: 0000:2d:00.0
-totalMemory: 23.62GiB freeMemory: 504.25MiB
-2019-06-26 10:51:34.881217: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
-2019-06-26 10:51:35.251465: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
-2019-06-26 10:51:35.251505: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
-2019-06-26 10:51:35.251511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
-2019-06-26 10:51:35.251729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 195 MB memory) -> physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:2d:00.0, compute capability: 7.5)
+True
+[(0, _CudaDeviceProperties(name='NVIDIA GeForce RTX 4080', major=8, minor=9, total_memory=16049MB, multi_processor_count=76))]
 ```
 
 If you see something like this:
 
 ```bash
-2020-05-11 12:02:11.891275: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
-2020-05-11 12:02:11.948022: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1992000000 Hz
-2020-05-11 12:02:11.949756: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55ae9ffc5860 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
-2020-05-11 12:02:11.949823: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
-2020-05-11 12:02:11.954796: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
-2020-05-11 12:02:11.954847: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
-2020-05-11 12:02:11.954894: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (hostname): /proc/driver/nvidia/version does not exist
+False
+[]
 ```
 
 Then your GPU is not properly configured.
diff --git a/docs/source/documentation/cellfinder/user-guide/napari-plugin/training-the-network.md b/docs/source/documentation/cellfinder/user-guide/napari-plugin/training-the-network.md
index 3b6e9f0b..94d021d3 100644
--- a/docs/source/documentation/cellfinder/user-guide/napari-plugin/training-the-network.md
+++ b/docs/source/documentation/cellfinder/user-guide/napari-plugin/training-the-network.md
@@ -57,4 +57,4 @@ The values can be reset by clicking the **Reset defaults** button.
 Click the **Run** button. The plugin will then run (this may take a while if you have lots of training data, or you have set many epochs).
 
-Trained models (`.h5` files) will be saved into your output directory, to be used for cell detection.
+Trained models (`.keras` files) will be saved into your output directory, to be used for cell detection.
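
A minimal diagnostic sketch to complement the speed-up guide above (illustrative, not taken from the patched docs): if the GPU check prints `False`, these shell one-liners report which CUDA and cuDNN build of PyTorch is actually installed; a CPU-only build reports `None` for both versions.

```bash
# Illustrative diagnostics; assumes PyTorch is installed in the active environment.
python -c "import torch; print(torch.__version__, torch.version.cuda)"
python -c "import torch; print(torch.backends.cudnn.is_available(), torch.backends.cudnn.version())"
nvidia-smi   # confirms the NVIDIA driver itself is visible to the system
```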
diff --git a/docs/source/documentation/setting-up/gpu.md b/docs/source/documentation/setting-up/gpu.md
index 4534a90e..8a273de9 100644
--- a/docs/source/documentation/setting-up/gpu.md
+++ b/docs/source/documentation/setting-up/gpu.md
@@ -5,7 +5,7 @@ Some BrainGlobe software will run faster if you have an NVIDIA GPU, and the appr
 
 ### Requirements
 
-The requirements are the same as those for [tensorflow GPU support](https://www.tensorflow.org/install/pip),
+The requirements are the same as those for [PyTorch GPU support](https://pytorch.org/get-started/locally/),
 but essentially you need:
 
 * A relatively modern **Windows or Linux based machine** (unfortunately, CUDA acceleration on macOS is not supported).
@@ -25,23 +25,12 @@ The first thing you definitely need is the drivers for your GPU, which can be do
 [here](https://www.nvidia.com/download/index.aspx?lang=en-us). Hopefully, these will have been installed when
 your machine was set up, but for GPU support in BrainGlobe, you will need version **450.x or greater**.
 
-### Installing CUDA and cuDNN
+### Installing PyTorch with GPU support
 
-BrainGlobe uses [TensorFlow](https://www.tensorflow.org/) which relies upon [CUDA](https://en.wikipedia.org/wiki/CUDA)
-and [cuDNN](https://developer.nvidia.com/cudnn). BrainGlobe requires **CUDA** and **cuDNN.**
+BrainGlobe uses [PyTorch](https://pytorch.org/) which relies upon [CUDA](https://en.wikipedia.org/wiki/CUDA)
+and [cuDNN](https://developer.nvidia.com/cudnn). PyTorch will install the correct versions of CUDA and cuDNN
+for you based on the choices you make when [installing PyTorch](https://pytorch.org/get-started/locally/).
 
-CUDA and cuDNN are not too hard to install, but sometimes other software on your machine relies on different versions.
-It is possible to switch between the two, and it is easier if you are [using conda](gpu)
-(see [here](https://blog.kovalevskyi.com/multiple-version-of-cuda-libraries-on-the-same-machine-b9502d50ae77)).
-
-However, we recommend that you install CUDA 11.2 and cuDNN 8.1 via conda if possible:
-
-```
-conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
-```
-
-This method is easier and also doesn't require any admin rights (useful on a cluster or shared machine).
-
-If this does not work for any reason, or you wish to have a system-wide installation of CUDA and cuDNN, then CUDA can
-be downloaded [here](https://developer.nvidia.com/cuda-toolkit-archive) and cuDNN from
-[here](https://developer.nvidia.com/cudnn). N.B. you will need to sign up for a (free) account to download cuDNN.
+We recommend selecting the `Stable` PyTorch build, the `Conda` package, and
+`CUDA 11.8` as the compute platform. Ensure you have the `cellfinder` conda
+environment activated before running the command provided.
diff --git a/docs/source/tutorials/cellfinder-retraining.md b/docs/source/tutorials/cellfinder-retraining.md
index 1091d89d..06ec473f 100644
--- a/docs/source/tutorials/cellfinder-retraining.md
+++ b/docs/source/tutorials/cellfinder-retraining.md
@@ -46,7 +46,7 @@ is inside the `cellfinder-retraining` directory created earlier
 18. Click `run`
 19. The training will then run, watch the terminal for updates. Once complete, there will be trained models
-(files ending in `.h5`) in the `trained_network` directory that can be used for cell detection.
+(files ending in `.keras`) in the `trained_network` directory that can be used for cell detection.
 
 :::{hint}
 For more information about how to use the curation and training plugins, please see
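
For the `Stable` / `Conda` / `CUDA 11.8` selection recommended in the gpu.md changes above, the command generated by the PyTorch website looked roughly like the sketch below around the time of this patch. This is a hypothetical example only; the exact command should always be copied from the selector at https://pytorch.org/get-started/locally/, which keeps it current.

```bash
# Assumed example; the authoritative command comes from the PyTorch website.
conda activate cellfinder
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
```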