
Training Error #28

Open
DanChen001 opened this issue Nov 5, 2018 · 4 comments

Comments

@DanChen001

Hi,

Thank you for sharing the code.

I met the following error while training the network. Do you know the reason? Thanks.

#####################

Namespace(batchSize=128, clip=0.4, cuda=True, gpus='0', lr=0.1, momentum=0.9, nEpochs=50, pretrained='', resume='', start_epoch=1, step=10, threads=1, weight_decay=0.0001)
=> use gpu id: '0'
Random Seed: 3131
===> Loading datasets
===> Building model
C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
warnings.warn(warning.format(ret))
===> Setting GPU
===> Setting Optimizer
===> Training
Epoch = 1, lr = 0.1
Traceback (most recent call last):
File "main_vdsr.py", line 130, in
main()
File "main_vdsr.py", line 85, in main
train(training_data_loader, optimizer, model, criterion, epoch)
File "main_vdsr.py", line 103, in train
for iteration, batch in enumerate(training_data_loader, 1):
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 501, in iter
return _DataLoaderIter(self)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 289, in init
w.start()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in init
reduction.dump(process_obj, to_child)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle _thread._local objects
Traceback (most recent call last):
File "", line 1, in
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
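
This TypeError comes from the popen_spawn_win32 path in the traceback: on Windows, DataLoader workers are started with the spawn method, so the loader and its dataset must be picklable before they can be sent to the worker processes. A minimal sketch of the usual workaround, with a toy dataset standing in for the repo's training set (num_workers corresponds to the --threads option shown in the Namespace above):

import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    # Toy dataset standing in for the repo's real training set.
    train_set = TensorDataset(torch.randn(16, 1, 41, 41),
                              torch.randn(16, 1, 41, 41))
    # num_workers=0 (i.e. --threads 0) keeps data loading in the main process,
    # so nothing has to be pickled to spawned worker processes on Windows.
    loader = DataLoader(train_set, batch_size=4, shuffle=True, num_workers=0)
    for inputs, targets in loader:
        pass  # the training step would go here

if __name__ == "__main__":  # entry-point guard required on Windows whenever num_workers > 0
    main()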

@DanChen001
Author

Hi @twtygqyy,
I solved this problem as suggested in #11 by @ZhaoJinHA. Thanks.

However, I do not know the reason. Do you know why? Thanks.

@DanChen001
Author

For testing, I have the following problem. Do you know why? Thanks, @twtygqyy.
#####################
=> use gpu id: '0'
C:\ProgramData\Anaconda3\lib\site-packages\torch\serialization.py:425: SourceChangeWarning: source code of class 'torch.nn.modules.container.Sequential' has changed. you can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
Traceback (most recent call last):
File "eval.py", line 33, in
model = torch.load(opt.model, map_location=lambda storage, loc: storage)["model"]
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\serialization.py", line 358, in load
return _load(f, map_location, pickle_module)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\serialization.py", line 542, in _load
result = unpickler.load()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 918: ordinal not in range(128)
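
The UnicodeDecodeError typically means the checkpoint was pickled under Python 2 and is being unpickled under Python 3. One hedged workaround, assuming a PyTorch version whose torch.load forwards pickle keyword arguments (the checkpoint path below is a placeholder, not necessarily the repo's):

import torch

# Placeholder path; forcing a byte-level encoding lets Python 3's unpickler
# read strings that were written by Python 2.
checkpoint = torch.load("checkpoint/model_epoch_50.pth",
                        map_location=lambda storage, loc: storage,
                        encoding="latin1")
model = checkpoint["model"]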

@twtygqyy
Owner

twtygqyy commented Nov 6, 2018

Hi @DanChen001, the problem is due to the Python version. Please refer to #21 (comment) for the solution.
And I think the first issue you mentioned is due to training with multiple GPUs and testing with a single GPU.
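
As a hedged illustration of the multi-GPU versus single-GPU mismatch mentioned above (toy model, not the repo's VDSR class): a checkpoint saved from nn.DataParallel prefixes every parameter name with "module.", which a plain single-GPU module rejects until the prefix is stripped.

import torch
import torch.nn as nn

# Toy stand-in for the network; DataParallel wraps it as the submodule "module".
net = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1))
saved = nn.DataParallel(net).state_dict()   # keys look like "module.0.weight"

# Strip the DataParallel prefix so the keys match a single-GPU module again.
cleaned = {k.replace("module.", "", 1): v for k, v in saved.items()}
net.load_state_dict(cleaned)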

@DanChen001
Author

> Hi @DanChen001, the problem is due to the Python version. Please refer to #21 (comment) for the solution.
> And I think the first issue you mentioned is due to training with multiple GPUs and testing with a single GPU.

@twtygqyy Thanks. I will try.
