TensorFlow to PyTorch migration (#189)
* Changed cellfinder/troubleshooting/index.md and speed-up.md to refer to PyTorch instead of TensorFlow

* Changed setting-up GPU instructions

* Added some extra information about installing PyTorch with GPU support

* Added suggested selections when installing from the PyTorch website

* References to trained models specify .keras extension not .h5
IgorTatarnikov authored Jun 4, 2024
1 parent 96f2118 commit 1bc4237
Showing 5 changed files with 17 additions and 42 deletions.
Original file line number Diff line number Diff line change
@@ -25,7 +25,7 @@ For more details, please see the guide to [retraining the pre-trained network](/

As `brainmapper` relies on a number of third-party libraries, notably

- [TensorFlow](https://www.tensorflow.org/),
- [PyTorch](https://pytorch.org/),
- [CUDA](https://developer.nvidia.com/cuda-zone),
- [cuDNN](https://developer.nvidia.com/cudnn),

26 changes: 6 additions & 20 deletions docs/source/documentation/cellfinder/troubleshooting/speed-up.md
@@ -47,40 +47,26 @@ Open a terminal (or Anaconda Prompt), start Python,
python
```

and check that tensorflow can use the GPU,
and check that PyTorch can use the GPU,

```python
import tensorflow as tf
tf.test.is_gpu_available()
import torch
print(torch.cuda.is_available())
print([(i, torch.cuda.get_device_properties(i)) for i in range(torch.cuda.device_count())])
```

If you see something like the output below, then all is well.

```bash
2019-06-26 10:51:34.697900: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F
2019-06-26 10:51:34.881183: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:2d:00.0
totalMemory: 23.62GiB freeMemory: 504.25MiB
2019-06-26 10:51:34.881217: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-06-26 10:51:35.251465: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-26 10:51:35.251505: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-06-26 10:51:35.251511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-06-26 10:51:35.251729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 195 MB memory) -> physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:2d:00.0, compute capability: 7.5)
True
[(0, _CudaDeviceProperties(name='NVIDIA GeForce RTX 4080', major=8, minor=9, total_memory=16049MB, multi_processor_count=76))]
```

If you see something like this:

```bash
2020-05-11 12:02:11.891275: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-05-11 12:02:11.948022: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1992000000 Hz
2020-05-11 12:02:11.949756: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55ae9ffc5860 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-05-11 12:02:11.949823: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-05-11 12:02:11.954796: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-05-11 12:02:11.954847: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
2020-05-11 12:02:11.954894: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (hostname): /proc/driver/nvidia/version does not exist
False
[]
```

Then your GPU is not properly configured.
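
When the check prints `False`, it can help to tell a CPU-only PyTorch build apart from a driver problem. The following is a minimal sketch, not part of BrainGlobe: the `gpu_report` helper name is hypothetical, and it assumes only the standard PyTorch attributes `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()` and `torch.backends.cudnn.version()`.

```python
import importlib.util


def gpu_report():
    """Collect basic facts about the local PyTorch/CUDA setup (hypothetical helper)."""
    report = {
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "built_with_cuda": None,
        "cudnn_version": None,
        "cuda_available": False,
    }
    if not report["torch_installed"]:
        return report

    import torch

    report["torch_version"] = torch.__version__
    # torch.version.cuda is None for CPU-only builds of PyTorch.
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["cudnn_version"] = torch.backends.cudnn.version()
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is `None`, the installed build is CPU-only, so reinstalling PyTorch with a CUDA compute platform is the likely fix. If it is set but `cuda_available` is `False`, the NVIDIA driver is the more likely culprit.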
@@ -57,4 +57,4 @@ The values can be reset by clicking the **Reset defaults** button.
Click the **Run** button. 

The plugin will then run (this may take a while if you have lots of training data, or you have set many epochs).
Trained models (`.h5` files) will be saved into your output directory, to be used for cell detection.
Trained models (`.keras` files) will be saved into your output directory, to be used for cell detection.
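
To pass a trained model on to cell detection, you first need its path. A minimal sketch for listing the saved models, assuming they may sit either directly in the output directory or in a subdirectory of it (hence the recursive search); the `find_trained_models` name and the `cellfinder_output` path are hypothetical:

```python
from pathlib import Path


def find_trained_models(output_dir):
    """Return every `.keras` file under output_dir, sorted by path."""
    return sorted(Path(output_dir).rglob("*.keras"))


if __name__ == "__main__":
    # "cellfinder_output" is a placeholder; use your own output directory.
    for model_path in find_trained_models("cellfinder_output"):
        print(model_path)
```

Older `.h5` models are deliberately not matched by this pattern.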
27 changes: 8 additions & 19 deletions docs/source/documentation/setting-up/gpu.md
@@ -5,7 +5,7 @@ Some BrainGlobe software will run faster if you have an NVIDIA GPU, and the appr

### Requirements

The requirements are the same as those for [tensorflow GPU support](https://www.tensorflow.org/install/pip),
The requirements are the same as those for [PyTorch GPU support](https://pytorch.org/get-started/locally/),
but essentially you need:

* A relatively modern **Windows- or Linux-based machine** (unfortunately, CUDA acceleration on macOS is not supported).
@@ -25,23 +25,12 @@ The first thing you definitely need is the drivers for your GPU, which can be do
[here](https://www.nvidia.com/download/index.aspx?lang=en-us). Hopefully, these will have been installed when your
machine was set up, but for GPU support in BrainGlobe, you will need version **450.x or greater**.
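
One way to check the installed driver against the **450.x** minimum is to query `nvidia-smi`, which ships with the NVIDIA driver. This is a rough sketch rather than an official BrainGlobe tool: the helper names are hypothetical, and it assumes `nvidia-smi` is on your `PATH`.

```python
import subprocess

MINIMUM_DRIVER_MAJOR = 450  # minimum driver version stated above


def driver_major(version):
    """Return the major component of a driver version string such as '535.104.05'."""
    return int(version.strip().split(".")[0])


def driver_meets_minimum(version, minimum=MINIMUM_DRIVER_MAJOR):
    """True if the driver's major version is at least the required minimum."""
    return driver_major(version) >= minimum


def installed_driver_versions():
    """Ask nvidia-smi for each GPU's driver version; empty list if it cannot be run."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    versions = installed_driver_versions()
    if not versions:
        print("No NVIDIA driver found (nvidia-smi is missing or failed).")
    for version in versions:
        status = "OK" if driver_meets_minimum(version) else "too old"
        print(f"{version}: {status}")
```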

### Installing CUDA and cuDNN
### Installing PyTorch with GPU support

BrainGlobe uses [TensorFlow](https://www.tensorflow.org/) which relies upon [CUDA](https://en.wikipedia.org/wiki/CUDA)
and [cuDNN](https://developer.nvidia.com/cudnn). BrainGlobe requires **CUDA** and **cuDNN.**
BrainGlobe uses [PyTorch](https://pytorch.org/), which relies upon [CUDA](https://en.wikipedia.org/wiki/CUDA)
and [cuDNN](https://developer.nvidia.com/cudnn). PyTorch will install the correct versions of CUDA and cuDNN
for you based on the choices you make when [installing PyTorch](https://pytorch.org/get-started/locally/).

CUDA and cuDNN are not too hard to install, but sometimes other software on your machine relies on different versions.
It is possible to switch between the two, and it is easier if you are [using conda](gpu)
(see [here](https://blog.kovalevskyi.com/multiple-version-of-cuda-libraries-on-the-same-machine-b9502d50ae77)).

However, we recommend that you install CUDA 11.2 and cuDNN 8.1 via conda if possible:

```
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
```

This method is easier and also doesn't require any admin rights (useful on a cluster or shared machine).

If this does not work for any reason, or you wish to have a system-wide installation of CUDA and cuDNN, then CUDA can
be downloaded [here](https://developer.nvidia.com/cuda-toolkit-archive) and cuDNN from
[here](https://developer.nvidia.com/cudnn). N.B. you will need to sign up for a (free) account to download cuDNN.
We recommend selecting the `Stable` PyTorch build, the `Conda` package, and
`CUDA 11.8` as the compute platform. Ensure you have the `cellfinder` conda
environment activated before running the command provided.
2 changes: 1 addition & 1 deletion docs/source/tutorials/cellfinder-retraining.md
@@ -46,7 +46,7 @@ is inside the `cellfinder-retraining` directory created earlier

18. Click `run`
19. The training will then run, watch the terminal for updates. Once complete, there will be trained models
(files ending in `.h5`) in the `trained_network` directory that can be used for cell detection.
(files ending in `.keras`) in the `trained_network` directory that can be used for cell detection.

:::{hint}
For more information about how to use the curation and training plugins, please see