
CUDA-enabled torchvision builds are silently broken if not built on a host with a physical GPU. #2566

Open
Micket opened this issue Sep 6, 2021 · 1 comment


Micket commented Sep 6, 2021

The resulting build still links against all the CUDA libraries, so at a glance it doesn't look broken. However, a vital CUDA backend for torchvision::nms is apparently missing.

The following code will reveal whether the issue is present in an installation:

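import torch
from torchvision.ops import boxes as box_ops

N = 10  # number of random boxes

# Requires a visible GPU at run time.
device = torch.device('cuda:0')
boxes = torch.rand((N, 4), device=device)
scores = torch.rand(N, device=device)
idxs = torch.zeros(N, device=device)  # all boxes in the same category
iou_threshold = 0.5
# On an affected installation this call fails, since torchvision::nms has no CUDA backend.
box_ops.batched_nms(boxes.float(), scores, idxs, iou_threshold)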

I think we should either try to detect when it's about to be a problem, or fix it somehow (perhaps some extra flags can coerce torchvision to build the backend regardless).
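One way the detection could look, as a rough sketch rather than a tested sanity check: wrap a tiny torchvision.ops.nms call in a function that reports whether the CUDA backend is usable. The function name check_cuda_nms and the status strings are made up for illustration, and the check can only say something useful when it runs on a node with a visible GPU.

```python
def check_cuda_nms():
    """Report whether torchvision's nms works on CUDA tensors.

    Hypothetical helper: returns a string starting with 'ok',
    'broken', or 'skipped'.
    """
    try:
        import torch
        from torchvision.ops import nms
    except (ImportError, OSError) as err:
        return f"skipped: torch/torchvision not importable ({err})"

    if not torch.cuda.is_available():
        # Without a visible GPU the CUDA code path cannot be exercised.
        return "skipped: no CUDA device visible"

    # Two identical boxes, so nms has something to suppress.
    boxes = torch.tensor(
        [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]], device="cuda"
    )
    scores = torch.tensor([0.9, 0.8], device="cuda")
    try:
        nms(boxes, scores, 0.5)
    except RuntimeError as err:
        # NotImplementedError is a subclass of RuntimeError, so a missing
        # backend is caught here as well.
        return f"broken: {err}"
    return "ok"


print(check_cuda_nms())
```

Because the result is "skipped" on a GPU-less build host, this would only help as a post-install check on a GPU node, not as a build-time guard.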

branfosj commented Sep 6, 2021

Switch to using CMakePythonPackage for the easyblock?

Also, setting env.setvar('FORCE_CUDA', '1') builds the C++ bindings on a non-GPU node. We might be able to use -DWITH_CUDA=on.
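Outside of an easyblock, the same idea can be sketched with a plain environment dictionary. This is only an illustration: cuda_build_env is a hypothetical helper, and setting TORCH_CUDA_ARCH_LIST (with the example value "7.0;8.0") is an assumption about what a GPU-less build host might additionally need, not something confirmed in this thread.

```python
import os


def cuda_build_env(arch_list=None):
    """Return a copy of the current environment with FORCE_CUDA=1 set.

    Hypothetical helper; arch_list is an optional value for
    TORCH_CUDA_ARCH_LIST (assumption, not confirmed in this thread).
    """
    env = dict(os.environ)
    env["FORCE_CUDA"] = "1"
    if arch_list is not None:
        env["TORCH_CUDA_ARCH_LIST"] = arch_list
    return env


env = cuda_build_env(arch_list="7.0;8.0")  # illustrative architectures only
print(env["FORCE_CUDA"], env["TORCH_CUDA_ARCH_LIST"])
```

The returned dictionary could then be passed as the env argument of subprocess.run when invoking the torchvision build.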
