GPU Tutorial #28
Comments
I was able to download and install CUDA for GPU use; however, I am not able to get PyTorch to see it (torch.cuda.is_available() returns False). I have restarted my computer and checked the $CUDA_PATH variables, and everything looks good. I can use Numba with the GPU as far as I can tell, but PyTorch won't recognize it. Does PyTorch require the TCC driver?
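(For reference, a minimal diagnostic sketch for this situation; the printed values depend entirely on the local install:)

```python
import torch

# The CUDA version this torch build was compiled against; None means a CPU-only build
print(torch.version.cuda)

# Whether torch can actually see a usable GPU
print(torch.cuda.is_available())

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```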
I'm not sure whether you need to specifically configure CUDA, as long as you install the right version of PyTorch for your hardware. There is a little install-selector widget on the PyTorch homepage, https://pytorch.org/, that might give a hint of what to try. It looks like for Windows you might need to do something like the command sketched below. If this resolves it, we should make a note in the installation instructions. I've just used the default `pip install` myself.
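(The command snippet here was lost in extraction. A plausible reconstruction, using the CUDA-specific wheel index that the pytorch.org selector suggests; treat the exact CUDA tag, cu118, as an assumption and copy the current command from the selector instead:)

```shell
# Hypothetical example of a GPU-specific install on Windows;
# the selector on https://pytorch.org/ gives the exact command
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu118
```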
Sounds good, I will check on this and get back with any further issues. The specific issue I am having is not well documented online, so it will be helpful to include this in the tutorial.
Using that worked; the default `pip install torch torchvision` was not working on my system for whatever reason. I will include this fix in the tutorial.
Great! A brief note in the tutorial would be good, and a short note in the "Installation.rst" file about the possibility of GPU-specific installs would be great too, thanks!
Would it be possible to include a screenshot (.png) of the Install PyTorch section of their homepage in the .rst file, for greater clarity on this?
It's possible, but probably not necessary. I would just add a sentence or two noting that torch may need to be (re)installed separately, and that more information is available on the PyTorch homepage.
For the GPU tutorial itself, would it be sufficient to follow the Optimization Loop tutorial on the documentation site, show it running on the GPU, and then compare the runtime of the loop on the CPU to that on the GPU, and also to the runtime on the cluster (with multiple GPUs)? Or would you prefer something else? Also, using the GPU may be worth adding to the Issue #25 chapters.
I think this first GPU tutorial can be on a much smaller scale, and doesn't need to require a fully running optimization loop. The main challenge here is that GitHub Actions (where we run the tests and build the documentation and tutorials for the docs) doesn't yet provide a GPU environment. So, we won't be able to test any particular tutorial as part of the continuous integration environment, and will just need to rely on our local tests where we have GPUs. If we did have this functionality available, then I agree a loop comparing the runtimes would be really nice. I was thinking this could be a very short example of how to initialize a model and then transfer it to the GPU to run. Given the restrictions on GPU CI, it might be easiest to write this as a static *.rst file with Python code snippets (e.g., https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html#showing-code-examples) highlighting the main idea. For example, a line like the one sketched below,
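(The original snippet was lost in extraction; this is the standard PyTorch device-selection idiom, a plausible reconstruction rather than the author's exact line:)

```python
import torch

# Use the GPU when one is visible to torch, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```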
and then a few examples of how to initialize and transfer tensors and nn.Modules to and from the GPU, like in https://pytorch.org/tutorials/beginner/pytorch_with_examples.html (a short sketch follows below). We'll most likely return to this idea in the production-ready scripts of #63, which we ourselves will want to run on Roar with multiple GPUs, so we will cite/link to this tutorial for more explanation.
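(For concreteness, a minimal sketch of the kind of transfer snippets meant here, reusing the `device` object from above; illustrative only, not the tutorial's final code:)

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a tensor directly on the chosen device
x = torch.randn(64, 10, device=device)

# Move an existing CPU tensor over, and back again
y = torch.ones(64, 1).to(device)
y_cpu = y.cpu()

# nn.Module parameters are moved by .to() as well
model = nn.Linear(10, 1).to(device)
out = model(x)  # runs on the GPU when one is available
```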
Updated the installation.rst file to include potential issues when installing torch and torchvision for CUDA work. Added a gpu_setup_tutorial.rst file to the docs/tutorials folder.
Is your feature request related to a problem or opportunity? Please describe.
One great benefit of PyTorch is the ability to easily run on GPU-accelerated hardware for substantial speedups. Currently, none of the tutorials demonstrates this functionality, even though the capability exists.
Describe the solution you'd like
At minimum, a section of a tutorial incorporating transfer code like that in the PyTorch tutorial. For monolithic batches (common to most simple RML imaging problems), GPU-accelerated training loops are the quickest route to a significant training speedup (a minimal sketch follows below).
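(A minimal sketch of such a GPU-accelerated loop with a single monolithic batch; the model and data here are hypothetical stand-ins, not the project's actual RML imaging code:)

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-ins for an imaging model and a monolithic batch
model = nn.Linear(100, 1).to(device)
data = torch.randn(512, 100, device=device)
target = torch.randn(512, 1, device=device)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(300):
    optimizer.zero_grad()
    loss = loss_fn(model(data), target)  # forward pass runs on the GPU
    loss.backward()
    optimizer.step()
```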
Alternative solutions
For batched training workflows (e.g., where each batch is an execution block of a measurement set), a distributed training loop (across multiple CPUs or even GPUs) has the potential to be even faster; see the sketch below.
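(For orientation only, a minimal sketch of a multi-GPU distributed loop using DistributedDataParallel, assuming a launch via torchrun; the model and data are hypothetical stand-ins:)

```python
import os
import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Launch with: torchrun --nproc_per_node=<num_gpus> train_ddp.py
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Each process drives one GPU; hypothetical stand-in model
model = nn.Linear(100, 1).cuda(local_rank)
ddp_model = DDP(model, device_ids=[local_rank])

optimizer = torch.optim.SGD(ddp_model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# In practice each rank would load a different execution block
data = torch.randn(512, 100).cuda(local_rank)
target = torch.randn(512, 1).cuda(local_rank)

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(ddp_model(data), target)
    loss.backward()  # DDP averages gradients across ranks here
    optimizer.step()

dist.destroy_process_group()
```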