GPU Tutorial #28
Comments
I was able to download and install CUDA for GPU use; however, I am not able to get PyTorch to see it (torch.cuda.is_available() returns False). I have restarted my computer and checked the $CUDA_PATH variables, and everything looks good. I can use Numba with the GPU as far as I can tell, but PyTorch won't recognize it. Does PyTorch require the TCC driver?
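(For reference, a minimal diagnostic sketch for this situation; the printed values depend entirely on the local install:)

```python
import torch

# The CUDA version this torch build was compiled against; None means a CPU-only build
print(torch.version.cuda)

# Whether torch can actually see a usable GPU
print(torch.cuda.is_available())

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```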
I'm not sure whether you need to specifically configure CUDA, as long as you install the right version of PyTorch for your hardware. There is a little install-selector widget on the PyTorch homepage, https://pytorch.org/, that might give a hint of what to try. It looks like for Windows you might need to do something like the command sketched below. If this resolves it, we should make a note in the installation instructions. I've just used the default `pip install` myself.
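(The command snippet here was lost in extraction. A plausible reconstruction, using the CUDA-specific wheel index that the pytorch.org selector suggests; treat the exact CUDA tag, cu118, as an assumption and copy the current command from the selector instead:)

```shell
# Hypothetical example of a GPU-specific install on Windows;
# the selector on https://pytorch.org/ gives the exact command
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu118
```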
Sounds good, I will check on this and get back with any further issues. The specific issue I am having is not well documented online, so it will be helpful to include this in the tutorial.
Using that worked; the default `pip install torch torchvision` was not working on my system for whatever reason. I will include this fix in the tutorial.
Great! A brief note in the tutorial would be good, and a short note in the "Installation.rst" file about the possibility of GPU-specific installs would be great too, thanks!
Would it be possible to include a screenshot (.png) of the Install PyTorch section of their homepage in the .rst file, for greater clarity on this?
It's possible, but probably not necessary. I would just add a sentence or two noting that torch may need to be (re)installed separately, and that more information is available on the PyTorch homepage.
For the GPU tutorial itself, would it be sufficient to follow the Optimization Loop tutorial on the documentation site, show it running on the GPU, and then compare the runtime of the loop on the CPU to that on the GPU, and also to the runtime on the cluster (with multiple GPUs)? Or would you prefer something else? Also, using the GPU may be worth adding to the Issue #25 chapters.
I think this first GPU tutorial can be on a much smaller scale, and doesn't need to require a fully running optimization loop. The main challenge here is that GitHub Actions (where we run the tests and build the documentation and tutorials for the docs) doesn't yet provide a GPU environment. So, we won't be able to test any particular tutorial as part of the continuous integration environment, and will just need to rely on our local tests where we have GPUs. If we did have this functionality available, then I agree a loop comparing the runtimes would be really nice. I was thinking this could be a very short example of how to initialize a model and then transfer it to the GPU to run. Given the restrictions on GPU CI, it might be easiest to write this as a static *.rst file with Python code snippets (e.g., https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html#showing-code-examples) highlighting the main idea. For example, a line like the one sketched below,
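(The original snippet was lost in extraction; this is the standard PyTorch device-selection idiom, a plausible reconstruction rather than the author's exact line:)

```python
import torch

# Use the GPU when one is visible to torch, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```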
and then a few examples of how to initialize and transfer tensors and nn.Modules to and from the GPU, like in https://pytorch.org/tutorials/beginner/pytorch_with_examples.html (a short sketch follows below). We'll most likely return to this idea in the production-ready scripts of #63, which we ourselves will want to run on Roar with multiple GPUs, so we will cite/link to this tutorial for more explanation.
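(For concreteness, a minimal sketch of the kind of transfer snippets meant here, reusing the `device` object from above; illustrative only, not the tutorial's final code:)

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a tensor directly on the chosen device
x = torch.randn(64, 10, device=device)

# Move an existing CPU tensor over, and back again
y = torch.ones(64, 1).to(device)
y_cpu = y.cpu()

# nn.Module parameters are moved by .to() as well
model = nn.Linear(10, 1).to(device)
out = model(x)  # runs on the GPU when one is available
```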
Updated the installation.rst file to include potential issues when installing torch and torchvision for CUDA work. Added a gpu_setup_tutorial.rst file to the docs/tutorials folder.
Is your feature request related to a problem or opportunity? Please describe.
One great benefit of PyTorch is the ability to easily run on GPU-accelerated hardware for substantial speedups. Currently, none of the tutorials demonstrates this functionality, even though the capability exists.
Describe the solution you'd like
At minimum, a section of a tutorial incorporating transfer code like that in the PyTorch tutorial. For monolithic batches (common to most simple RML imaging problems), GPU-accelerated training loops are the quickest route to a significant training speedup (a minimal sketch follows below).
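(A minimal sketch of such a GPU-accelerated loop with a single monolithic batch; the model and data here are hypothetical stand-ins, not the project's actual RML imaging code:)

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-ins for an imaging model and a monolithic batch
model = nn.Linear(100, 1).to(device)
data = torch.randn(512, 100, device=device)
target = torch.randn(512, 1, device=device)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(300):
    optimizer.zero_grad()
    loss = loss_fn(model(data), target)  # forward pass runs on the GPU
    loss.backward()
    optimizer.step()
```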
Alternative solutions
For batched training workflows (e.g., where each batch is an execution block of a measurement set), a distributed training loop (across multiple CPUs or even GPUs) has the potential to be even faster; see the sketch below.
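(For orientation only, a minimal sketch of a multi-GPU distributed loop using DistributedDataParallel, assuming a launch via torchrun; the model and data are hypothetical stand-ins:)

```python
import os
import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Launch with: torchrun --nproc_per_node=<num_gpus> train_ddp.py
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Each process drives one GPU; hypothetical stand-in model
model = nn.Linear(100, 1).cuda(local_rank)
ddp_model = DDP(model, device_ids=[local_rank])

optimizer = torch.optim.SGD(ddp_model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# In practice each rank would load a different execution block
data = torch.randn(512, 100).cuda(local_rank)
target = torch.randn(512, 1).cuda(local_rank)

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(ddp_model(data), target)
    loss.backward()  # DDP averages gradients across ranks here
    optimizer.step()

dist.destroy_process_group()
```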