I ran the code on Chinese NER training data (around 70 thousand sentences, with LM-LSTM-CRF set to the co-train model), and I got an OOM error:
When I set the batch_size to 10, it results in:
Tot it 6916 (epoch 0): 6308it [26:09, 4.02it/s]THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
File "train_wc.py", line 243, in
loss.backward()
File "/usr/local/lib/python3.5/site-packages/torch/autograd/variable.py", line 167, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/usr/local/lib/python3.5/site-packages/torch/autograd/init.py", line 99, in backward
variables, grad_variables, retain_graph)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58
When I set the batch_size to 128, it results in:
Tot it 543 (epoch 0): 455it [03:57, 1.91it/s]THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
File "train_wc.py", line 241, in
loss = loss + args.lambda0 * crit_lm(cbs, cf_y.view(-1))
File "/usr/local/lib/python3.5/site-packages/torch/nn/modules/module.py", line325, in call
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.5/site-packages/torch/nn/modules/loss.py", line 601, in forward
self.ignore_index, self.reduce)
File "/usr/local/lib/python3.5/site-packages/torch/nn/functional.py", line 1140, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, size_average, ignore_index, reduce)
File "/usr/local/lib/python3.5/site-packages/torch/nn/functional.py", line 786, in log_softmax
return torch._C._nn.log_softmax(input, dim)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58
Could anyone give me some advice on how to solve it?
Hi, what type of GPU are you using, and how large is its memory?
For Chinese, even character-level language modeling results in a large dictionary (and therefore large GPU memory consumption). One way to alleviate this problem is to filter out low-frequency words as unknown tokens.
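For reference, here is a minimal, self-contained sketch of that kind of frequency filtering. It is not code from this repository; the build_vocab/map_to_ids names, the min_count parameter, and the <unk> token are illustrative, and in this repo the mini_count option should serve the same purpose.

```python
from collections import Counter

def build_vocab(sentences, min_count=5, unk_token='<unk>'):
    """sentences: list of token lists (characters or words) from the training data."""
    # Count how often each token appears in the corpus.
    freq = Counter(tok for sent in sentences for tok in sent)
    # Keep only tokens seen at least min_count times; everything else maps to <unk>.
    vocab = {unk_token: 0}
    for tok, count in freq.items():
        if count >= min_count:
            vocab[tok] = len(vocab)
    return vocab

def map_to_ids(sentences, vocab, unk_token='<unk>'):
    unk_id = vocab[unk_token]
    return [[vocab.get(tok, unk_id) for tok in sent] for sent in sentences]

# Example usage: a smaller vocabulary shrinks the output softmax of the
# co-trained language model, which is usually the main memory consumer
# for Chinese text.
sentences = [list("北京欢迎你"), list("北京天安门")]
vocab = build_vocab(sentences, min_count=2)
ids = map_to_ids(sentences, vocab)
```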
The GPU is a Tesla K40c; we have 4 of them, and each has 10 GB of memory.
Using only one GPU and setting it to multi-GPU in the PyTorch code both give the same OOM error.
Setting mini_count to 5 or even 10 also doesn't work.
But if I do not use co_train, it works fine.