Segmentation Fault Or Memory Access Exception During Test or Evaluation #61
conda list:
packages in environment at /home/david/anaconda3/envs/rvt:
Name Version Build Channel
_libgcc_mutex 0.1 main
(rvt) david@:RVT-master$ echo $LD_LIBRARY_PATH
echo $HDF5_PLUGIN_PATH
I am running into a segmentation fault most of the time, with no Python exception shown. When the same test/evaluation run does not segfault, I get a Python exception instead:
File "/home/david/project/rvt/RVT-master/modules/utils/detection.py", line 41, in add_backbone_features
self.features[k].append(v[selected_indices] if selected_indices is not None else v)
The exception is consistently raised at this line 41. Does anyone know why this happens and how to resolve it?
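One way to narrow this down is to validate the indices on the host before the indexing at line 41 runs. The sketch below is only a debugging aid, not a confirmed diagnosis: `check_indices` is a hypothetical helper that is not part of RVT, and it assumes `selected_indices` is a list of ints or a 1-D integer tensor and that `v` is indexed along its first dimension.

```python
def check_indices(selected_indices, dim0_size):
    """Raise IndexError on the host if any index falls outside [-dim0_size, dim0_size).

    Hypothetical debugging helper (not part of RVT).
    """
    # Tensors and arrays expose .tolist(); plain Python sequences are copied as-is.
    if hasattr(selected_indices, "tolist"):
        idx = selected_indices.tolist()
    else:
        idx = list(selected_indices)
    bad = [i for i in idx if not -dim0_size <= i < dim0_size]
    if bad:
        raise IndexError(
            f"indices {bad} out of range for a dimension of size {dim0_size}"
        )
    return idx
```

It could be called as `check_indices(selected_indices, v.shape[0])` just before the `append`, so that an out-of-range index fails with a readable Python error. If the check passes, the illegal access most likely originates in an earlier kernel, since CUDA errors are reported asynchronously.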
Command line for test:
python validation.py dataset=gen1 dataset.path=/home/david/project/rvt/gen1 checkpoint=/home/david/project/rvt/rvt-s.ckpt use_test_set=0 hardware.gpus=0 +experiment/gen1="small.yaml" batch_size.eval=8 model.postprocess.confidence_threshold=0.001
--------------------------- exception log----------------------------------
Loaded model weights from checkpoint at /home/david/project/rvt/rvt-s.ckpt
Validation DataLoader 0: : 0it [00:00, ?it/s]Error executing job with overrides: ['dataset=gen1', 'dataset.path=/home/david/project/rvt/gen1', 'checkpoint=/home/david/project/rvt/rvt-s.ckpt', 'use_test_set=0', 'hardware.gpus=0', '+experiment/gen1=small.yaml', 'batch_size.eval=8', 'model.postprocess.confidence_threshold=0.001']
Traceback (most recent call last):
File "/home/david/anaconda3/envs/rvt/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/home/david/anaconda3/envs/rvt/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 736, in _validate_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/home/david/anaconda3/envs/rvt/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1098, in _run
results = self._run_stage()
File "/home/david/anaconda3/envs/rvt/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1174, in _run_stage
return self._run_evaluate()
File "/home/david/anaconda3/envs/rvt/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1214, in _run_evaluate
eval_loop_results = self._evaluation_loop.run()
File "/home/david/anaconda3/envs/rvt/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
self.advance(*args, **kwargs)
File "/home/david/anaconda3/envs/rvt/lib/python3.9/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 152, in advance
dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
File "/home/david/anaconda3/envs/rvt/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
self.advance(*args, **kwargs)
File "/home/david/anaconda3/envs/rvt/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 137, in advance
output = self._evaluation_step(**kwargs)
File "/home/david/anaconda3/envs/rvt/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 234, in _evaluation_step
output = self.trainer._call_strategy_hook(hook_name, *kwargs.values())
File "/home/david/anaconda3/envs/rvt/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1480, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/home/david/anaconda3/envs/rvt/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 390, in validation_step
return self.model.validation_step(*args, **kwargs)
File "/home/david/project/rvt/RVT-master/modules/detection.py", line 283, in validation_step
return self._val_test_step_impl(batch=batch, mode=Mode.VAL)
File "/home/david/project/rvt/RVT-master/modules/detection.py", line 249, in _val_test_step_impl
backbone_feature_selector.add_backbone_features(backbone_features=backbone_features,
File "/home/david/project/rvt/RVT-master/modules/utils/detection.py", line 41, in add_backbone_features
self.features[k].append(v[selected_indices] if selected_indices is not None else v)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
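The log suggests rerunning with `CUDA_LAUNCH_BLOCKING=1` so that the traceback points at the kernel that actually failed. A minimal sketch of doing that from Python follows; `run_with_blocking_cuda` is a hypothetical wrapper, and the variable must be set in the environment before the CUDA context is created, which is why the command is launched as a child process.

```python
import os
import subprocess


def run_with_blocking_cuda(cmd):
    """Run `cmd` (a list of arguments) with synchronous CUDA kernel launches.

    Hypothetical wrapper for debugging; returns the CompletedProcess.
    """
    env = dict(os.environ)             # copy so the parent environment is untouched
    env["CUDA_LAUNCH_BLOCKING"] = "1"  # each kernel launch now blocks until it finishes
    return subprocess.run(cmd, env=env, capture_output=True, text=True)
```

Passing the validation command from above as a list (for example `["python", "validation.py", "dataset=gen1", ...]` with the remaining overrides filled in) should produce a traceback that is no longer subject to asynchronous reporting, at the cost of slower execution.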