RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu) #245

Open
jayshent opened this issue Jul 25, 2024 · 0 comments

jayshent commented Jul 25, 2024

Hi all, I tried to run train.py but hit the error below. Does anyone have an idea how to fix it?

$ python3 train.py --batch-size 32 --cfg cfg/yolov3.cfg --data data/coco.data --weights ''
Apex recommended for faster mixed precision training: https://github.com/NVIDIA/apex
Namespace(adam=False, batch_size=32, bucket='', cache_images=False, cfg='cfg/yolov3.cfg', data='data/coco.data', device='', epochs=300, evolve=False, freeze_layers=False, img_size=[320, 640, 640], multi_scale=False, name='', nosave=False, notest=False, rect=False, resume=False, single_cls=False, weights='')
Using CUDA device0 _CudaDeviceProperties(name='NVIDIA A100 80GB PCIe MIG 3g.40gb', total_memory=40448MB)

Start Tensorboard with "tensorboard --logdir=runs", view at http://localhost:6006/
WARNING: smart bias initialization failure.
WARNING: smart bias initialization failure.
WARNING: smart bias initialization failure.
Model Summary: 222 layers, 6.19491e+07 parameters, 6.19491e+07 gradients
Optimizer groups: 75 .bias, 75 Conv2d.weight, 72 other
Caching labels /home/jayshen26/jay_workspace_2/PyTorch-Spiking-YOLOv3/data/coco/labels/train2014.npy (117264 found, 0 missing, 0 empty, 4514 duplicat
Caching labels /home/jayshen26/jay_workspace_2/PyTorch-Spiking-YOLOv3/data/coco/labels/val2014.npy (4954 found, 0 missing, 0 empty, 197 duplicate, fo
Image sizes 320 - 640 train, 640 test
Using 8 dataloader workers
Starting training for 300 epochs...

 Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size

0%| | 0/3665 [00:15<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 435, in <module>
    train(hyp)  # train normally
  File "train.py", line 283, in train
    loss, loss_items = compute_loss(pred, targets, model)
  File "/ibm/gpfs/home/jayshen26/jay_workspace_2/PyTorch-Spiking-YOLOv3/utils/utils.py", line 356, in compute_loss
    tcls, tbox, indices, anchors = build_targets(p, targets, model)  # targets
  File "/ibm/gpfs/home/jayshen26/jay_workspace_2/PyTorch-Spiking-YOLOv3/utils/utils.py", line 441, in build_targets
    a, t = at[j], t.repeat(na, 1, 1)[j]  # filter
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
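
A possible direction, not confirmed against this repository: the message says the tensor being indexed is on the CPU while the index `j` is on another device (presumably the GPU). One guess is that `at` is built with `torch.arange` without a `device` argument, so it stays on the CPU while the mask `j` is computed from GPU tensors. The sketch below is a simplified, hypothetical stand-in for the filtering step in `build_targets`; the function name `filter_by_anchor`, the IoU computation, and the target layout `(image, class, x, y, w, h)` are assumptions for illustration. The only point is that the index tensor is created on the same device as the targets.

```python
import torch


def filter_by_anchor(targets: torch.Tensor, anchors: torch.Tensor, iou_t: float):
    """Hypothetical, simplified version of the anchor filtering step.

    targets: (nt, 6) rows of (image, class, x, y, w, h)  [assumed layout]
    anchors: (na, 2) anchor widths and heights            [assumed layout]
    """
    device = targets.device
    anchors = anchors.to(device)  # keep every operand on one device
    na, nt = anchors.shape[0], targets.shape[0]

    # Build the anchor-index tensor on the targets' device, not the default CPU.
    at = torch.arange(na, device=device).view(na, 1).repeat(1, nt)

    # Width/height IoU between each anchor and each target (assumed criterion).
    wh = targets[:, 4:6]
    inter = torch.min(anchors[:, None], wh[None]).prod(2)            # (na, nt)
    iou = inter / (anchors[:, None].prod(2) + wh[None].prod(2) - inter)
    j = iou > iou_t  # boolean mask, lives on `device`

    # Both indexed tensors and the mask now share a device.
    a = at[j]
    t = targets.repeat(na, 1, 1)[j]
    return a, t


if __name__ == "__main__":
    dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    tg = torch.tensor([[0, 0, 0.5, 0.5, 1.0, 1.0],
                       [0, 1, 0.5, 0.5, 4.0, 4.0]], device=dev)
    an = torch.tensor([[1.0, 1.0], [4.0, 4.0]])  # deliberately left on the CPU
    a, t = filter_by_anchor(tg, an, iou_t=0.5)
    print(a.tolist(), tuple(t.shape))
```

If the real code builds `at` differently, the same idea would apply: move `at` with `.to(targets.device)` or move `j` with `.to(at.device)` before indexing.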
