Layerwise GPU memory use #98

Open
abramhindle opened this issue Sep 8, 2015 · 1 comment

@abramhindle

Hi, I have a feeling that the layerwise optimizer, by creating numerous networks, is not freeing past networks and is using more GPU memory than it should. I'm having a heck of a time doing layerwise training.

With this network:

inputs = 4096 * 2
win_size = 2048
swin_size = win_size / 2 + 1  # 1025 (Python 2 integer division)
output_size = swin_size
hidlayersize = win_size
exp = theanets.Experiment(theanets.Regressor,
                          layers=[inputs, inputs, inputs / 2, inputs / 3,
                                  inputs / 4, output_size, output_size])

With the following pretraining:

logging.info("Pretraining")
net.train([ttrain[0:1*trains/4], toutputs[0:1*trains/4]],
          [vtrain[0:1*trains/4], voutputs[0:1*trains/4]],
          algo='layerwise',
          learning_rate=1e-3,
          save_every=25,
          batch_size=32, # this is small!
          patience = 6,
          min_improvement = 0.1,
          save_progress="current_pre_brain.pkl",
          momentum=0.9)

I get the following error: after training on layers hid1 and hid2, once it tries to train on hid3 it borks at validation.

I 2015-09-08 12:26:42 downhill.base:402 patience elapsed!
I 2015-09-08 12:26:42 theanets.layers.base:303 layer Feedforward "lwout": (hid3:out)2730 -> 1025, linear, 2799275 parameters
I 2015-09-08 12:26:42 theanets.trainer:250 layerwise: training in -> hid1 -> hid2 -> hid3 -> lwout
I 2015-09-08 12:26:43 downhill.base:378 -- patience = 6
I 2015-09-08 12:26:43 downhill.base:379 -- validate_every = 10
I 2015-09-08 12:26:43 downhill.base:380 -- min_improvement = 0.1
I 2015-09-08 12:26:43 downhill.base:381 -- max_gradient_norm = 0
I 2015-09-08 12:26:43 downhill.base:382 -- max_gradient_elem = 0
I 2015-09-08 12:26:43 downhill.base:383 -- learning_rate = 0.001
I 2015-09-08 12:26:43 downhill.base:384 -- momentum = 0.9
I 2015-09-08 12:26:43 downhill.base:385 -- nesterov = False
I 2015-09-08 12:26:43 downhill.adaptive:220 -- rms_halflife = 14
I 2015-09-08 12:26:43 downhill.adaptive:221 -- rms_regularizer = 1e-08
I 2015-09-08 12:26:43 downhill.base:112 compiling evaluation function
I 2015-09-08 12:26:43 downhill.base:118 compiling RMSProp function
Error allocating 11193000 bytes of device memory (out of memory). Driver report 966656 bytes free and 4294246400 bytes total
Traceback (most recent call last):
  File "stft-theanet.py", line 62, in <module>
    momentum=0.9)
  File "build/bdist.linux-x86_64/egg/theanets/graph.py", line 400, in train
  File "build/bdist.linux-x86_64/egg/theanets/graph.py", line 376, in itertrain
  File "build/bdist.linux-x86_64/egg/theanets/trainer.py", line 253, in itertrain
  File "build/bdist.linux-x86_64/egg/theanets/trainer.py", line 66, in itertrain
  File "/usr/local/lib/python2.7/dist-packages/downhill/base.py", line 388, in iterate
    self._compile()
  File "/usr/local/lib/python2.7/dist-packages/downhill/base.py", line 119, in _compile
    updates = list(self._updates) + list(self._get_updates())
  File "/usr/local/lib/python2.7/dist-packages/downhill/base.py", line 134, in _get_updates
    for var, expr in self._get_updates_for(param, grad):
  File "/usr/local/lib/python2.7/dist-packages/downhill/adaptive.py", line 226, in _get_updates_for
    g2_tm1 = shared_like(param, 'g2_ewma')
  File "/usr/local/lib/python2.7/dist-packages/downhill/util.py", line 45, in shared_like
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/sharedvalue.py", line 208, in shared
    allow_downcast=allow_downcast, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/var.py", line 203, in float32_shared_constructor
    deviceval = type_support_filter(value, type.broadcastable, False, None)
MemoryError: ('Error allocating 11193000 bytes of device memory (out of memory).', "you might consider using 'theano.shared(..., borrow=True)'")

Yet if I just do plain (non-layerwise) training, it works fine. It does use a lot of GPU memory; it's a big network and I have a lot of training examples.

batch_size = 4096  # way bigger!
logging.info("Finetune Training")
net.train([ttrain, toutputs],
          [vtrain, voutputs],
          algo='rmsprop',
          learning_rate=1e-4,
          save_every=25,
          batch_size=batch_size,
          patience=100,
          min_improvement=0.001,
          save_progress="current_brain.pkl",
          momentum=0.9)
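
To put a number on how much device memory is actually free around these calls, here is a rough diagnostic sketch. It assumes the old theano.sandbox.cuda backend shown in the traceback and a Theano build that exposes mem_info() (which wraps cudaMemGetInfo); if your build does not expose it, this will simply fail to import.

# Hedged diagnostic sketch: log free device memory before and after a
# training call. Assumes the theano.sandbox.cuda backend from the traceback;
# mem_info() returns (free_bytes, total_bytes) on builds that expose it.
import logging

import theano.sandbox.cuda.basic_ops as sbcuda

def log_free_gpu_memory(tag):
    free_bytes, total_bytes = sbcuda.cuda_ndarray.cuda_ndarray.mem_info()
    logging.info("%s: %d MB free of %d MB", tag,
                 free_bytes // 2 ** 20, total_bytes // 2 ** 20)

log_free_gpu_memory("before finetune")
net.train([ttrain, toutputs], [vtrain, voutputs], algo='rmsprop',
          learning_rate=1e-4, batch_size=batch_size, momentum=0.9)
log_free_gpu_memory("after finetune")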

My theory is that shared variables and the like are not being freed appropriately. I was looking at the code: new layers are being created, but I cannot tell how much sharing or copying is being done.
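
One way to probe that theory, sketched below and not verified against this theanets version: step the layerwise trainer through itertrain (the traceback shows train delegating to itertrain internally) and count the live Theano SharedVariable objects each iteration. This assumes itertrain accepts the same keyword arguments as train and yields per-iteration (training, validation) monitor pairs.

# Rough sketch to test whether shared variables pile up during layerwise
# training. Counts live SharedVariable objects via the garbage collector.
import gc
import logging

from theano.compile.sharedvalue import SharedVariable

def count_live_shared_variables():
    gc.collect()  # collect anything only reachable through reference cycles
    return sum(isinstance(obj, SharedVariable) for obj in gc.get_objects())

for train_monitors, valid_monitors in net.itertrain(
        [ttrain, toutputs], [vtrain, voutputs],
        algo='layerwise', learning_rate=1e-3, batch_size=32):
    logging.info("live shared variables: %d", count_live_shared_variables())

If the count (or the free-memory figure from the earlier sketch) keeps climbing as the trainer moves from hid1 to hid2 to hid3, that would support the idea that earlier per-layer networks and their RMSProp accumulators are still being held alive.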

@lmjohns3
Owner

lmjohns3 commented Sep 8, 2015

Yes, I wouldn't be surprised; theanets doesn't try to do any memory management at all, so it's up to Python/Theano to clean up things that have disappeared from the active set. There's probably a bunch that could be done within theanets to help with this, though.
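
For concreteness, a minimal illustration (not theanets-specific) of what "disappeared from the active set" means here: a GPU-backed shared variable keeps its device memory for as long as anything still references it, whether that is the variable itself, a compiled function built from it, or a trainer object that captured it.

import gc

import numpy as np
import theano

# A float32 shared variable on the GPU holds device memory until nothing
# references it any more (directly or via a compiled function's updates).
weights = theano.shared(np.zeros((8192, 8192), dtype='float32'), name='w')

del weights   # drop the last Python reference...
gc.collect()  # ...so the object can be collected and its device memory released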
