From 1bc42378708b89d9c6627be2045c15a7cc37e214 Mon Sep 17 00:00:00 2001
From: Igor Tatarnikov <61896994+IgorTatarnikov@users.noreply.github.com>
Date: Tue, 4 Jun 2024 14:41:23 +0100
Subject: [PATCH] TensorFlow to PyTorch migration (#189)

* Changed cellfinder/troubleshooting/index.md and speed-up.md to refer to PyTorch instead of tensorflow
* Changed setting-up gpu instruction
* Added some extra information about installing PyTorch with GPU support
* Added suggested selections when installing from the PyTorch website
* References to trained models specify .keras extension not .h5
---
 .../cellfinder/troubleshooting/index.md        |  2 +-
 .../cellfinder/troubleshooting/speed-up.md     | 26 +++++-------------
 .../napari-plugin/training-the-network.md      |  2 +-
 docs/source/documentation/setting-up/gpu.md    | 27 ++++++-------------
 .../source/tutorials/cellfinder-retraining.md  |  2 +-
 5 files changed, 17 insertions(+), 42 deletions(-)

diff --git a/docs/source/documentation/cellfinder/troubleshooting/index.md b/docs/source/documentation/cellfinder/troubleshooting/index.md
index 6b79e7a0..a96c51bd 100644
--- a/docs/source/documentation/cellfinder/troubleshooting/index.md
+++ b/docs/source/documentation/cellfinder/troubleshooting/index.md
@@ -25,7 +25,7 @@ For more details, please see the guide to [retraining the pre-trained network](/
 
 As `brainmapper` relies on a number of third party libraries, notably
 
-- [TensorFlow](https://www.tensorflow.org/),
+- [PyTorch](https://pytorch.org/),
 - [CUDA](https://developer.nvidia.com/cuda-zone),
 - [cuDNN](https://developer.nvidia.com/cudnn),
 
diff --git a/docs/source/documentation/cellfinder/troubleshooting/speed-up.md b/docs/source/documentation/cellfinder/troubleshooting/speed-up.md
index a6826520..8795d3db 100644
--- a/docs/source/documentation/cellfinder/troubleshooting/speed-up.md
+++ b/docs/source/documentation/cellfinder/troubleshooting/speed-up.md
@@ -47,40 +47,26 @@ Open a terminal (or Anaconda Prompt), start Python,
 python
 ```
 
-and check that tensorflow can use the GPU,
+and check that PyTorch can use the GPU,
 
 ```python
-import tensorflow as tf
-tf.test.is_gpu_available()
+import torch
+print(torch.cuda.is_available())
+print([(i, torch.cuda.get_device_properties(i)) for i in range(torch.cuda.device_count())])
 ```
 
 If you see something like the output below, then all is well.
 
 ```bash
-2019-06-26 10:51:34.697900: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F
-2019-06-26 10:51:34.881183: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
-name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
-pciBusID: 0000:2d:00.0
-totalMemory: 23.62GiB freeMemory: 504.25MiB
-2019-06-26 10:51:34.881217: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
-2019-06-26 10:51:35.251465: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
-2019-06-26 10:51:35.251505: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
-2019-06-26 10:51:35.251511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
-2019-06-26 10:51:35.251729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 195 MB memory) -> physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:2d:00.0, compute capability: 7.5)
+True
+[(0, _CudaDeviceProperties(name='NVIDIA GeForce RTX 4080', major=8, minor=9, total_memory=16049MB, multi_processor_count=76))]
 ```
 
 If you see something like this:
 
 ```bash
-2020-05-11 12:02:11.891275: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
-2020-05-11 12:02:11.948022: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1992000000 Hz
-2020-05-11 12:02:11.949756: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55ae9ffc5860 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
-2020-05-11 12:02:11.949823: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
-2020-05-11 12:02:11.954796: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
-2020-05-11 12:02:11.954847: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
-2020-05-11 12:02:11.954894: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (hostname): /proc/driver/nvidia/version does not exist
+False
+[]
 ```
 
 Then your GPU is not properly configured.
diff --git a/docs/source/documentation/cellfinder/user-guide/napari-plugin/training-the-network.md b/docs/source/documentation/cellfinder/user-guide/napari-plugin/training-the-network.md
index 3b6e9f0b..94d021d3 100644
--- a/docs/source/documentation/cellfinder/user-guide/napari-plugin/training-the-network.md
+++ b/docs/source/documentation/cellfinder/user-guide/napari-plugin/training-the-network.md
@@ -57,4 +57,4 @@ The values can be reset by clicking the **Reset defaults** button.
 Click the **Run** button. The plugin will then run (this may take a while if you have lots of training data, or you have set many epochs).
 
-Trained models (`.h5` files) will be saved into your output directory, to be used for cell detection.
+Trained models (`.keras` files) will be saved into your output directory, to be used for cell detection.
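
A minimal diagnostic sketch to complement the speed-up guide above (illustrative, not taken from the patched docs): if the GPU check prints `False`, these shell one-liners report which CUDA and cuDNN build of PyTorch is actually installed; a CPU-only build reports `None` for both versions.

```bash
# Illustrative diagnostics; assumes PyTorch is installed in the active environment.
python -c "import torch; print(torch.__version__, torch.version.cuda)"
python -c "import torch; print(torch.backends.cudnn.is_available(), torch.backends.cudnn.version())"
nvidia-smi   # confirms the NVIDIA driver itself is visible to the system
```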
diff --git a/docs/source/documentation/setting-up/gpu.md b/docs/source/documentation/setting-up/gpu.md
index 4534a90e..8a273de9 100644
--- a/docs/source/documentation/setting-up/gpu.md
+++ b/docs/source/documentation/setting-up/gpu.md
@@ -5,7 +5,7 @@ Some BrainGlobe software will run faster if you have an NVIDIA GPU, and the appr
 
 ### Requirements
 
-The requirements are the same as those for [tensorflow GPU support](https://www.tensorflow.org/install/pip),
+The requirements are the same as those for [PyTorch GPU support](https://pytorch.org/get-started/locally/),
 but essentially you need:
 
 * A relatively modern **Windows or Linux based machine** (unfortunately, CUDA acceleration on macOS is not supported).
@@ -25,23 +25,12 @@ The first thing you definitely need is the drivers for your GPU, which can be do
 [here](https://www.nvidia.com/download/index.aspx?lang=en-us). Hopefully, these will have been installed when
 your machine was set up, but for GPU support in BrainGlobe, you will need version **450.x or greater**.
 
-### Installing CUDA and cuDNN
+### Installing PyTorch with GPU support
 
-BrainGlobe uses [TensorFlow](https://www.tensorflow.org/) which relies upon [CUDA](https://en.wikipedia.org/wiki/CUDA)
-and [cuDNN](https://developer.nvidia.com/cudnn). BrainGlobe requires **CUDA** and **cuDNN.**
+BrainGlobe uses [PyTorch](https://pytorch.org/) which relies upon [CUDA](https://en.wikipedia.org/wiki/CUDA)
+and [cuDNN](https://developer.nvidia.com/cudnn). PyTorch will install the correct versions of CUDA and cuDNN
+for you based on the choices you make when [installing PyTorch](https://pytorch.org/get-started/locally/).
 
-CUDA and cuDNN are not too hard to install, but sometimes other software on your machine relies on different versions.
-It is possible to switch between the two, and it is easier if you are [using conda](gpu)
-(see [here](https://blog.kovalevskyi.com/multiple-version-of-cuda-libraries-on-the-same-machine-b9502d50ae77)).
-
-However, we recommend that you install CUDA 11.2 and cuDNN 8.1 via conda if possible:
-
-```
-conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
-```
-
-This method is easier and also doesn't require any admin rights (useful on a cluster or shared machine).
-
-If this does not work for any reason, or you wish to have a system-wide installation of CUDA and cuDNN, then CUDA can
-be downloaded [here](https://developer.nvidia.com/cuda-toolkit-archive) and cuDNN from
-[here](https://developer.nvidia.com/cudnn). N.B. you will need to sign up for a (free) account to download cuDNN.
+We recommend selecting the `Stable` PyTorch build, the `Conda` package, and
+`CUDA 11.8` as the compute platform. Ensure you have the `cellfinder` conda
+environment activated before running the command provided.
diff --git a/docs/source/tutorials/cellfinder-retraining.md b/docs/source/tutorials/cellfinder-retraining.md
index 1091d89d..06ec473f 100644
--- a/docs/source/tutorials/cellfinder-retraining.md
+++ b/docs/source/tutorials/cellfinder-retraining.md
@@ -46,7 +46,7 @@ is inside the `cellfinder-retraining` directory created earlier
 18. Click `run`
 19. The training will then run, watch the terminal for updates. Once complete, there will be trained models
-(files ending in `.h5`) in the `trained_network` directory that can be used for cell detection.
+(files ending in `.keras`) in the `trained_network` directory that can be used for cell detection.
 
 :::{hint}
 For more information about how to use the curation and training plugins, please see
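
For the `Stable` / `Conda` / `CUDA 11.8` selection recommended in the gpu.md changes above, the command generated by the PyTorch website looked roughly like the sketch below around the time of this patch. This is a hypothetical example only; the exact command should always be copied from the selector at https://pytorch.org/get-started/locally/, which keeps it current.

```bash
# Assumed example; the authoritative command comes from the PyTorch website.
conda activate cellfinder
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
```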