... (1-13 epoch info)
------------------------------------------------------------------
Epoch: 14 Time: 61.3263 Loss: 6.0985 LearningRate 0.000200
------------------------------------------------------------------
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
Traceback (most recent call last):
File "/DATA_EDS/x123/anaconda3/envs/ShadowFormer/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1120, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "/DATA_EDS/x123/anaconda3/envs/ShadowFormer/lib/python3.7/multiprocessing/queues.py", line 104, in get
if not self._poll(timeout):
File "/DATA_EDS/x123/anaconda3/envs/ShadowFormer/lib/python3.7/multiprocessing/connection.py", line 257, in poll
return self._poll(timeout)
File "/DATA_EDS/x123/anaconda3/envs/ShadowFormer/lib/python3.7/multiprocessing/connection.py", line 414, in _poll
r = wait([self], timeout)
File "/DATA_EDS/x123/anaconda3/envs/ShadowFormer/lib/python3.7/multiprocessing/connection.py", line 921, in wait
ready = selector.select(timeout)
File "/DATA_EDS/x123/anaconda3/envs/ShadowFormer/lib/python3.7/selectors.py", line 415, in select
fd_event_list = self._selector.poll(timeout)
File "/DATA_EDS/x123/anaconda3/envs/ShadowFormer/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
_error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 4112666) is killed by signal: Segmentation fault.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "train.py", line 174, in <module>
for ii, data_val in enumerate((val_loader), 0):
File "/DATA_EDS/x123/anaconda3/envs/ShadowFormer/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
data = self._next_data()
File "/DATA_EDS/x123/anaconda3/envs/ShadowFormer/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1316, in _next_data
idx, data = self._get_data()
File "/DATA_EDS/x123/anaconda3/envs/ShadowFormer/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1282, in _get_data
success, data = self._try_get_data()
File "/DATA_EDS/x123/anaconda3/envs/ShadowFormer/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1133, in _try_get_data
raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 4112666, 4112989) exited unexpectedly
When I train on a custom dataset, a segmentation fault occurs in the DataLoader workers after 14 epochs of training have completed.
How can I fix it?
dataset: custom data (training set: 475 , validation set: 53)
image size: 960*480
Environment and configuration are as follows:
Python: 3.7.16
PyTorch: 1.13.1
CUDA: 11.6
Training command:
python train.py --warmup --checkpoint 1 --win_size 10 --train_ps 320 --env _self_dataset --gpu 6,7
error info: see the log above.
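The traceback shows the workers dying while train.py iterates val_loader, but a crash inside a worker process does not say which sample triggered it. One way to narrow this down is to load every sample in the main process with Python's standard faulthandler module enabled. The following is only a minimal sketch under assumptions: scan_dataset is a hypothetical helper, not part of the repository, and it assumes the dataset supports len() and integer indexing.

```python
import faulthandler
import sys


def scan_dataset(dataset, log=sys.stderr):
    """Load every sample in the current process; return indices that raised.

    Hypothetical helper for debugging, not part of the training code.
    """
    try:
        # Ask the interpreter to dump the Python stack on SIGSEGV.
        faulthandler.enable()
    except (RuntimeError, ValueError, OSError, AttributeError):
        # stderr may be unavailable or redirected; continue without the dump.
        pass

    failed = []
    for idx in range(len(dataset)):
        # Print and flush the index first so it survives a hard crash.
        print(f"loading sample {idx}", file=log, flush=True)
        try:
            dataset[idx]
        except Exception as exc:  # ordinary Python errors only
            print(f"sample {idx} raised {exc!r}", file=log, flush=True)
            failed.append(idx)
    return failed
```

A segmentation fault cannot be caught by try/except, so if the process still dies, the last "loading sample" line printed points at the sample that was being read. The same idea applies to the training script itself: running it with python -X faulthandler and with the DataLoader's num_workers set to 0 keeps the crash in the main process, where the stack dump is easier to read.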