
Training Error #28

Open
DanChen001 opened this issue Nov 5, 2018 · 4 comments

Comments

@DanChen001

Hi,

Thank you for sharing the code.

I met the following error while training the network. Do you know the reason? Thanks.

#####################

Namespace(batchSize=128, clip=0.4, cuda=True, gpus='0', lr=0.1, momentum=0.9, nEpochs=50, pretrained='', resume='', start_epoch=1, step=10, threads=1, weight_decay=0.0001)
=> use gpu id: '0'
Random Seed: 3131
===> Loading datasets
===> Building model
C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
warnings.warn(warning.format(ret))
===> Setting GPU
===> Setting Optimizer
===> Training
Epoch = 1, lr = 0.1
Traceback (most recent call last):
File "main_vdsr.py", line 130, in
main()
File "main_vdsr.py", line 85, in main
train(training_data_loader, optimizer, model, criterion, epoch)
File "main_vdsr.py", line 103, in train
for iteration, batch in enumerate(training_data_loader, 1):
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 501, in iter
return _DataLoaderIter(self)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 289, in init
w.start()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in init
reduction.dump(process_obj, to_child)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle _thread._local objects
Traceback (most recent call last):
File "", line 1, in
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
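
This TypeError comes from the popen_spawn_win32 path in the traceback: on Windows, DataLoader workers are started with the spawn method, so the loader and its dataset must be picklable before they can be sent to the worker processes. A minimal sketch of the usual workaround, with a toy dataset standing in for the repo's training set (num_workers corresponds to the --threads option shown in the Namespace above):

import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    # Toy dataset standing in for the repo's real training set.
    train_set = TensorDataset(torch.randn(16, 1, 41, 41),
                              torch.randn(16, 1, 41, 41))
    # num_workers=0 (i.e. --threads 0) keeps data loading in the main process,
    # so nothing has to be pickled to spawned worker processes on Windows.
    loader = DataLoader(train_set, batch_size=4, shuffle=True, num_workers=0)
    for inputs, targets in loader:
        pass  # the training step would go here

if __name__ == "__main__":  # entry-point guard required on Windows whenever num_workers > 0
    main()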

@DanChen001
Author

Hi @twtygqyy,
I solved this problem as suggested in #11 by @ZhaoJinHA. Thanks.

However, I do not know the reason. Do you know why? Thanks.

@DanChen001
Author

For testing, I have the following problem. Do you know why? Thanks, @twtygqyy.
#####################
=> use gpu id: '0'
C:\ProgramData\Anaconda3\lib\site-packages\torch\serialization.py:425: SourceChangeWarning: source code of class 'torch.nn.modules.container.Sequential' has changed. you can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
Traceback (most recent call last):
File "eval.py", line 33, in
model = torch.load(opt.model, map_location=lambda storage, loc: storage)["model"]
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\serialization.py", line 358, in load
return _load(f, map_location, pickle_module)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\serialization.py", line 542, in _load
result = unpickler.load()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 918: ordinal not in range(128)
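
The UnicodeDecodeError typically means the checkpoint was pickled under Python 2 and is being unpickled under Python 3. One hedged workaround, assuming a PyTorch version whose torch.load forwards pickle keyword arguments (the checkpoint path below is a placeholder, not necessarily the repo's):

import torch

# Placeholder path; forcing a byte-level encoding lets Python 3's unpickler
# read strings that were written by Python 2.
checkpoint = torch.load("checkpoint/model_epoch_50.pth",
                        map_location=lambda storage, loc: storage,
                        encoding="latin1")
model = checkpoint["model"]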

@twtygqyy
Owner

twtygqyy commented Nov 6, 2018

Hi @DanChen001, the problem is due to the Python version. Please refer to #21 (comment) for the solution.
And I think the first issue you mentioned is due to training with multiple GPUs and testing with a single GPU.
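
As a hedged illustration of the multi-GPU versus single-GPU mismatch mentioned above (toy model, not the repo's VDSR class): a checkpoint saved from nn.DataParallel prefixes every parameter name with "module.", which a plain single-GPU module rejects until the prefix is stripped.

import torch
import torch.nn as nn

# Toy stand-in for the network; DataParallel wraps it as the submodule "module".
net = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1))
saved = nn.DataParallel(net).state_dict()   # keys look like "module.0.weight"

# Strip the DataParallel prefix so the keys match a single-GPU module again.
cleaned = {k.replace("module.", "", 1): v for k, v in saved.items()}
net.load_state_dict(cleaned)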

@DanChen001
Author

> Hi @DanChen001, the problem is due to the Python version. Please refer to #21 (comment) for the solution.
> And I think the first issue you mentioned is due to training with multiple GPUs and testing with a single GPU.

@twtygqyy Thanks. I will try.
