cuda error #16

Open
mc171819 opened this issue Aug 13, 2021 · 3 comments

Comments

@mc171819

Hi, when I run your train.py, I get the following error:
2021-08-13 15:43:07,291 INFO Start training voxel_rcnn/voxel_rcnn_car(default)
epochs: 0%| | 0/80 [00:00<?, ?it/sError!: 0%| | 0/3741 [00:00<?, ?it/s]
Error!
epochs: 0%| | 0/80 [01:15<?, ?it/s]
Traceback (most recent call last):
File "train.py", line 198, in <module>
main()
File "train.py", line 170, in main
merge_all_iters_to_one_epoch=args.merge_all_iters_to_one_epoch
File "/data/mc_data/Voxel-R-CNN-main/tools/train_utils/train_utils.py", line 93, in train_model
dataloader_iter=dataloader_iter
File "/data/mc_data/Voxel-R-CNN-main/tools/train_utils/train_utils.py", line 38, in train_one_epoch
loss, tb_dict, disp_dict = model_func(model, batch)
File "/home/mc/Project/OpenPCDet/pcdet/models/__init__.py", line 42, in model_func
ret_dict, tb_dict, disp_dict = model(batch_dict)
File "/opt/anaconda3/envs/objfuse/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/opt/anaconda3/envs/objfuse/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 447, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/opt/anaconda3/envs/objfuse/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/mc/Project/OpenPCDet/pcdet/models/detectors/voxel_rcnn.py", line 11, in forward
batch_dict = cur_module(batch_dict)
File "/opt/anaconda3/envs/objfuse/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/mc/Project/OpenPCDet/pcdet/models/roi_heads/voxelrcnn_head.py", line 227, in forward
targets_dict = self.assign_targets(batch_dict)
File "/home/mc/Project/OpenPCDet/pcdet/models/roi_heads/roi_head_template.py", line 104, in assign_targets
targets_dict = self.proposal_target_layer.forward(batch_dict)
File "/home/mc/Project/OpenPCDet/pcdet/models/roi_heads/target_assigner/proposal_target_layer.py", line 33, in forward
batch_dict=batch_dict
File "/home/mc/Project/OpenPCDet/pcdet/models/roi_heads/target_assigner/proposal_target_layer.py", line 101, in sample_rois_for_rcnn
gt_boxes=cur_gt[:, 0:7], gt_labels=cur_gt[:, -1].long()
File "/home/mc/Project/OpenPCDet/pcdet/models/roi_heads/target_assigner/proposal_target_layer.py", line 223, in get_max_iou_with_same_class
iou3d = iou3d_nms_utils.boxes_iou3d_gpu(cur_roi, cur_gt) # (M, N)
File "/home/mc/Project/OpenPCDet/pcdet/ops/iou3d_nms/iou3d_nms_utils.py", line 71, in boxes_iou3d_gpu
overlaps_h = torch.clamp(min_of_max - max_of_min, min=0)
RuntimeError: CUDA error: invalid device function
Traceback (most recent call last):
File "/opt/anaconda3/envs/objfuse/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/opt/anaconda3/envs/objfuse/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/opt/anaconda3/envs/objfuse/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
main()
File "/opt/anaconda3/envs/objfuse/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/anaconda3/envs/objfuse/bin/python', '-u', 'train.py', '--local_rank=0', '--launcher', 'pytorch', '--cfg_file', 'cfgs/voxel_rcnn/voxel_rcnn_car.yaml', '--epochs', '80', '--workers', '8']' died with <Signals.SIGSEGV: 11>.

I use PyTorch 1.4 with cudatoolkit=10.1, and the GPU is a 2080 Ti. Can you give me some advice?
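
For anyone hitting the same message: "invalid device function" often means the compiled CUDA extension (here the `iou3d_nms` ops) was not built for the GPU architecture or CUDA version in use, though that is an assumption and not confirmed in this thread. The traceback above also shows `train.py` running from `/data/mc_data/Voxel-R-CNN-main` while `pcdet` is imported from `/home/mc/Project/OpenPCDet`, so it may be worth checking which `pcdet` install is actually picked up. A minimal diagnostic sketch follows; the helper name `collect_env_report` is hypothetical and not part of this repository.

```python
import importlib.util


def collect_env_report():
    """Collect versions and paths relevant to a compiled CUDA extension.

    Hypothetical helper for illustration only; not part of Voxel-R-CNN or OpenPCDet.
    """
    report = {
        "torch": None,
        "torch_cuda": None,
        "gpu": None,
        "compute_capability": None,
        "pcdet_path": None,
    }

    # Only query PyTorch if it is installed in this environment.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        # CUDA version PyTorch itself was built against (None for CPU-only builds).
        report["torch_cuda"] = torch.version.cuda
        if torch.cuda.is_available():
            report["gpu"] = torch.cuda.get_device_name(0)
            # (major, minor) tuple; a 2080 Ti is expected to report (7, 5).
            report["compute_capability"] = torch.cuda.get_device_capability(0)

    # Locate the pcdet package without importing it, to see which copy
    # Python would load.
    spec = importlib.util.find_spec("pcdet")
    if spec is not None:
        report["pcdet_path"] = spec.origin

    return report


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```

If `pcdet_path` points at a different checkout than the one containing `train.py`, or `torch_cuda` differs from the toolkit used to compile the ops, rebuilding the extension from the intended repository (in OpenPCDet-style projects, typically `python setup.py develop`) is a reasonable next step to try.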

@djiajunustc
Owner

Hi @mc171819 ,

I haven't run into this error myself.
I suggest running the code with the Docker image I provide.

@mc171819
Author

Hi, I tried the Docker image you provided, but it still doesn't work. I wonder if I made a mistake somewhere. Can you show me the concrete steps for using the Docker image?

@mc171819
Author

mc171819 commented Sep 17, 2021 via email
