Doesn't seemingly work with latest TF versions. #29

joeyearsley · 2018-08-28T17:20:02Z

I used this successfully with TF 1.5 when it came out, but it no longer seems to work in TF 1.9.

@yaroslavvb Do you have a working version still? Or do you have any insights as to what might have changed?

yaroslavvb · 2018-09-04T14:23:27Z

I have not tried with versions after 1.5.

What's the error? Maybe try turning off all optimizations? Something like this (probably needs adjustment for the latest version):

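  import tensorflow as tf
  from tensorflow.core.protobuf import rewriter_config_pb2

  # Turn off the classic graph optimizer passes.
  optimizer_options = tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0)
  # Turn off the Grappler rewrites that could prune or rewrite the graph.
  # memory_optimization takes a MemOptType value, so NO_MEM_OPT is used
  # here rather than the Toggle value OFF.
  rewrite_options = rewriter_config_pb2.RewriterConfig(
      disable_model_pruning=True,
      constant_folding=rewriter_config_pb2.RewriterConfig.OFF,
      memory_optimization=rewriter_config_pb2.RewriterConfig.NO_MEM_OPT)
  graph_options = tf.GraphOptions(optimizer_options=optimizer_options,
                                  rewrite_options=rewrite_options)
  config = tf.ConfigProto(graph_options=graph_options)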

joeyearsley · 2018-09-04T21:39:32Z

No explicit error, it just doesn't seem to work the way it did in 1.5.

varun19299 · 2018-10-25T18:22:22Z

Nope, this doesn't seem to work in 1.11 either, even with graph optimisations turned off.

stefano-marchesin · 2018-11-19T16:01:40Z

Not working in 1.11. It runs smoothly for a while and then (randomly) throws an OOM exception.

joeyearsley · 2018-11-19T19:50:26Z

As a workaround I've been using tf.contrib.layers.recompute_grad in predefined places.

rahulkulhalli · 2018-12-07T11:26:30Z

Same here. A batch of 100 and above (224*224*3) throws the OOM. VGG16 can load a batch of UP TO 100 images if it's checkpointed manually.

ekyy2 · 2019-01-21T05:58:06Z

I find that the tests (./run_all_tests.sh) work fine for TensorFlow 1.8, but break for versions 1.9 to 1.12 with the following error:

Traceback (most recent call last):
  File "./keras_test.py", line 4, in <module>
    from tensorflow.python.keras._impl.keras import backend as K
ModuleNotFoundError: No module named 'tensorflow.python.keras._impl'

racinmat · 2019-02-09T22:21:17Z

Today I compiled the 1.12 version from sources and this library works well.
It disappointed me a bit, because the 'memory' option failed. I use it for CycleGAN (batch size 1), and it failed to find any bottleneck.
When using the 'speed' option, it was even slower than the original gradient calculation, by 10%.

@ekyy2 this is just because the packages were moved. If you replace the import with from tensorflow.python.keras import backend as K, it will work like a charm and you can run the tests on 1.12 without problems.
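
For test code that has to run on both sides of this package move, one option is to try the candidate module paths in order. The sketch below is only an illustration: import_first_available is a hypothetical helper, not part of TensorFlow or of this library, and the two module paths are simply the ones quoted in this thread.

```python
import importlib


def import_first_available(candidates):
    """Return the first module in `candidates` that imports successfully.

    Hypothetical helper written for this example; it is not part of
    TensorFlow or of memory_saving_gradients.
    """
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            errors.append("{}: {}".format(name, exc))
    raise ImportError(
        "none of the candidates could be imported: " + "; ".join(errors)
    )


# The two locations of the Keras backend mentioned in this thread,
# newest first.
KERAS_BACKEND_PATHS = [
    "tensorflow.python.keras.backend",
    "tensorflow.python.keras._impl.keras.backend",
]
```

With TensorFlow installed, K = import_first_available(KERAS_BACKEND_PATHS) would then pick whichever location exists in that version.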

joeyearsley · 2019-02-09T22:47:31Z

If you're using TF, you can manually wrap layers as I mentioned above. You can get larger memory gains that way, but your files per second will drop if it's done inefficiently. That probably matters less if you need the extra memory.

racinmat · 2019-02-09T23:20:01Z

@joeyearsley did tf.contrib.layers.recompute_grad help you with memory management? Do you have any examples of how well it performed when placed correctly in the code?

joeyearsley · 2019-02-10T00:18:02Z

Here is an example with Densenet:
https://github.com/joeyearsley/efficient_densenet_tensorflow

Be wary when using it with dropout, though: you'll need to implement a new dropout layer that takes an is_recomputing kwarg and stores the random vector created in the forward pass so it can be reused in the backward pass. Otherwise the gradients will not be correct.
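
The dropout caveat can be illustrated without any framework. The sketch below is not the code from the linked repository; the class name ReplayableDropout and the exact shape of the is_recomputing argument are assumptions modelled on the description above. It only shows the idea: draw the mask once in the forward pass and replay it when the layer is recomputed.

```python
import random


class ReplayableDropout:
    """Dropout that reuses its forward-pass mask when recomputing.

    Framework-free illustration; the class name and the `is_recomputing`
    argument are assumptions based on the description in this thread.
    """

    def __init__(self, rate, seed=None):
        if not 0.0 <= rate < 1.0:
            raise ValueError("rate must be in [0, 1)")
        self.rate = rate
        self._rng = random.Random(seed)
        self._mask = None  # mask drawn in the most recent forward pass

    def __call__(self, values, is_recomputing=False):
        if is_recomputing:
            # Recomputation must see exactly the mask of the forward pass.
            if self._mask is None or len(self._mask) != len(values):
                raise RuntimeError("no stored mask from a matching forward pass")
        else:
            keep = 1.0 - self.rate
            # Inverted dropout: kept units are scaled by 1 / keep.
            self._mask = [
                (1.0 / keep) if self._rng.random() < keep else 0.0
                for _ in values
            ]
        return [v * m for v, m in zip(values, self._mask)]
```

Calling the layer a second time with is_recomputing=True returns the same values as the forward pass, which is what keeps the recomputed activations consistent with the gradients.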

ekyy2 · 2019-02-11T05:08:15Z

> Today I compiled the 1.12 version from sources and this library works well.
> It disappointed me a bit, because the 'memory' option failed. I use it for CycleGAN (batch size 1), and it failed to find any bottleneck.
> When using the 'speed' option, it was even slower than the original gradient calculation, by 10%.
>
> @ekyy2 this is just because of moving packages, if you replace the import by from tensorflow.python.keras import backend as K it will work as charm and you can run the tests on 1.12 without problem.

That did the trick!

ghost · 2019-06-11T10:45:27Z

> Here is an example with Densenet:
> https://github.com/joeyearsley/efficient_densenet_tensorflow
>
> Be wary when using with dropout though, you'll need to implement a new dropout layer which takes an is_recomputing kwarg which stores the random vector created in the forward pass to be used in the backward pass also. Otherwise the gradients will not be correct.

Hi Joey, how do you use tf.contrib.layers.recompute_grad in conjunction with memory_saving_gradients.gradients(...)? From your code, it looks like you only used recompute_grad.

joeyearsley · 2019-06-11T10:54:19Z

Yes, I stopped using memory_saving_gradients, as recompute_grad works just as well but with more manual work.

ghost · 2019-06-11T11:11:25Z

> Yes I stopped using memory_saving_gradients As recompute grad works just as well but with more manual work.

Thanks for your reply Joey. I can see that you recompute the gradient. I'm just slightly lost as to where in the code you drop gradients from memory as well.

yaroslavvb · 2019-06-11T12:57:01Z

A tensor is dropped automatically as long as it no longer has downstream consumers. This is what checkpoints_disconnected is for: because of stop_gradient, it doesn't need to keep activations in memory, and reroute_ts then substitutes those disconnected tensors in place of the originals so that the activations can be dropped.
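
As a rough illustration of why dropping activations still gives the right gradients, here is a framework-free sketch of checkpointing on a chain of scalar functions. It does not reproduce the library's graph rewriting (checkpoints_disconnected, stop_gradient, reroute_ts); the function names grad_full and grad_checkpointed and the `every` parameter are assumptions made for this example. Only every `every`-th activation is kept, and the rest are recomputed segment by segment during the backward pass.

```python
import math


def grad_full(layers, x):
    """Reference backward pass that keeps every activation in memory."""
    activations = [x]
    for f, _ in layers:
        activations.append(f(activations[-1]))
    grad = 1.0
    for (_, df), a_in in zip(reversed(layers), reversed(activations[:-1])):
        grad *= df(a_in)
    return activations[-1], grad


def grad_checkpointed(layers, x, every):
    """Backward pass that keeps only every `every`-th activation.

    Activations between checkpoints are recomputed from the nearest
    earlier checkpoint during the backward pass.
    """
    if every < 1:
        raise ValueError("every must be at least 1")
    checkpoints = {0: x}
    a = x
    for i, (f, _) in enumerate(layers, start=1):
        a = f(a)
        if i % every == 0 and i < len(layers):
            checkpoints[i] = a  # all other activations are dropped
    output = a

    grad = 1.0
    end = len(layers)
    for start in sorted(checkpoints, reverse=True):
        # Recompute the inputs of layers[start:end] from the checkpoint.
        segment = [checkpoints[start]]
        for f, _ in layers[start:end - 1]:
            segment.append(f(segment[-1]))
        for (_, df), a_in in zip(reversed(layers[start:end]), reversed(segment)):
            grad *= df(a_in)
        end = start
    return output, grad


# Each layer is a (function, derivative) pair.
LAYERS = [
    (math.sin, math.cos),
    (math.tanh, lambda v: 1.0 - math.tanh(v) ** 2),
    (lambda v: v * v, lambda v: 2.0 * v),
    (math.exp, math.exp),
    (lambda v: 3.0 * v, lambda v: 3.0),
]
```

Both functions should return the same output and gradient for the same input; the checkpointed version trades extra forward computation for holding fewer activations at once.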

ghost · 2019-06-11T13:14:04Z

> dropped automatically

Thanks yaroslavvb. Unfortunately, like others above, I have not found memory_saving_gradients to work with TensorFlow 1.12. I'm hoping tf.contrib.layers.recompute_grad works.

ghost · 2019-07-07T21:00:04Z

Same issue. memory_saving_gradients does not work in tf 1.14.

golden0080 · 2020-10-06T06:21:47Z

Tested in TF 1.15, using Keras. No luck for me.
I got some errors like:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Operation 'training/Adam/gradients/header/conv_4/batch_normalization_63/cond/ReadVariableOp_2/Switch' has no attr named '_XlaCompile'.

achyudh mentioned this issue Dec 15, 2018

Add gradient checkpointing achyudh/tardis#19

Open

ghost mentioned this issue Jul 15, 2019

recompute_grad Does Not Work joeyearsley/efficient_densenet_tensorflow#5

Open

kartik4949 mentioned this issue Aug 28, 2020

Gradient checkpointing google/automl#711

Merged

