Doesn't seemingly work with latest TF versions. #29

Open
joeyearsley opened this issue Aug 28, 2018 · 19 comments

Comments

@joeyearsley

I used this successfully with TF 1.5 when it first came out, but it no longer seems to work with TF 1.9.

@yaroslavvb Do you have a working version still? Or do you have any insights as to what might have changed?

@yaroslavvb
Collaborator

I have not tried with versions after 1.5.

What's the error? Maybe try turning off all optimizations? Something like this (it probably needs adjusting for the latest version):

  import tensorflow as tf
  from tensorflow.core.protobuf import rewriter_config_pb2

  # Disable the classic graph optimizer passes.
  optimizer_options = tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0)
  # Turn off the Grappler rewrites listed here.
  rewrite_options = rewriter_config_pb2.RewriterConfig(
      disable_model_pruning=True,
      constant_folding=rewriter_config_pb2.RewriterConfig.OFF,
      # memory_optimization takes a MemOptType, so use NO_MEM_OPT rather than OFF.
      memory_optimization=rewriter_config_pb2.RewriterConfig.NO_MEM_OPT)
  graph_options = tf.GraphOptions(optimizer_options=optimizer_options,
                                  rewrite_options=rewrite_options)
  config = tf.ConfigProto(graph_options=graph_options)

@joeyearsley
Author

No explicit error; it just doesn't seem to work the way it did in 1.5.

@varun19299

No, this doesn't seem to work in 1.11 either, even with graph optimisations turned off.

@stefano-marchesin

Not working in 1.11: it runs smoothly for a while and then (randomly) throws an OOM exception.

@joeyearsley
Author

As a workaround, I've been using tf.contrib.layers.recompute_grad in predefined places.

@rahulkulhalli

rahulkulhalli commented Dec 7, 2018

Same here. A batch of 100 or more images (224*224*3) throws an OOM error. VGG16 can load a batch of up to 100 images if it's checkpointed manually.

@ekyy2

ekyy2 commented Jan 21, 2019

I find that the tests (./run_all_tests.sh) work fine for TensorFlow 1.8, but break for versions 1.9 to 1.12 with the following error:

  Traceback (most recent call last):
    File "./keras_test.py", line 4, in <module>
      from tensorflow.python.keras._impl.keras import backend as K
  ModuleNotFoundError: No module named 'tensorflow.python.keras._impl'

@racinmat

racinmat commented Feb 9, 2019

Today I compiled version 1.12 from source, and this library works well with it.
I was a bit disappointed, though, because the 'memory' option failed: I use it for CycleGAN (batch size 1), and it failed to find any bottleneck.
With the 'speed' option, it was about 10% slower than the original gradient calculation.

@ekyy2 this is just because the packages were moved. If you change the import to from tensorflow.python.keras import backend as K, it will work like a charm and you can run the tests on 1.12 without problems.
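If the tests need to run on both the older and the newer package layout, one option is to try each import location in turn. This is only a sketch: import_first and KERAS_BACKEND_PATHS are hypothetical names that are not part of this repository, and the two module paths are simply the ones quoted in this thread.

```python
import importlib


def import_first(module_paths):
    """Return the first module in module_paths that imports cleanly."""
    errors = []
    for path in module_paths:
        try:
            return importlib.import_module(path)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            errors.append("%s: %s" % (path, exc))
    raise ImportError(
        "none of the candidates could be imported:\n" + "\n".join(errors))


# Candidate locations of the Keras backend, newer layout first.
# Both paths are taken from the comments above (assumption: no other moves).
KERAS_BACKEND_PATHS = [
    "tensorflow.python.keras.backend",
    "tensorflow.python.keras._impl.keras.backend",
]

# In keras_test.py this would stand in for the hard-coded import:
# K = import_first(KERAS_BACKEND_PATHS)
```

The fallback order means the old path is only tried when the new one is missing, so nothing changes for anyone already on 1.9 or later.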

@joeyearsley
Author

If you're using TF, you can manually wrap layers as I mentioned above. You can get larger memory savings that way, but your throughput (files per second) will drop if it's done inefficiently. That probably matters less if you need the memory.

@racinmat

racinmat commented Feb 9, 2019

@joeyearsley did tf.contrib.layers.recompute_grad help you with memory management? Do you have any examples of how well it performed when placed correctly in the code?

@joeyearsley
Author

Here is an example with Densenet:
https://github.com/joeyearsley/efficient_densenet_tensorflow

Be wary when using it with dropout, though: you'll need to implement a new dropout layer that takes an is_recomputing kwarg and stores the random vector created in the forward pass so that it is reused in the backward pass. Otherwise the gradients will not be correct.
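To make the dropout point concrete, here is a minimal pure-Python sketch of the idea, not the layer from the linked repository. ReplayableDropout is a hypothetical name, and only the is_recomputing kwarg comes from the comment above. The mask is drawn once in the real forward pass and replayed when the layer is run again during recomputation.

```python
import random


class ReplayableDropout:
    """Dropout that remembers its mask so a recomputed pass matches the first."""

    def __init__(self, rate, seed=None):
        self.rate = rate
        self._rng = random.Random(seed)
        self._mask = None  # mask drawn in the most recent real forward pass

    def __call__(self, xs, is_recomputing=False):
        if is_recomputing:
            # Recomputation must not draw new random numbers.
            if self._mask is None or len(self._mask) != len(xs):
                raise RuntimeError("no stored mask to replay")
        else:
            keep = 1.0 - self.rate
            # Inverted dropout: kept units are scaled by 1 / keep.
            self._mask = [1.0 / keep if self._rng.random() < keep else 0.0
                          for _ in xs]
        return [x * m for x, m in zip(xs, self._mask)]


drop = ReplayableDropout(rate=0.5, seed=0)
forward_out = drop([1.0] * 8)
recomputed_out = drop([1.0] * 8, is_recomputing=True)  # same mask, same output
```

Without the stored mask, the recomputed activations would come from a different random draw than the ones the loss was computed from, which is why the gradients would be wrong.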

@ekyy2

ekyy2 commented Feb 11, 2019

> Today I compiled version 1.12 from source, and this library works well with it.
> I was a bit disappointed, though, because the 'memory' option failed: I use it for CycleGAN (batch size 1), and it failed to find any bottleneck.
> With the 'speed' option, it was about 10% slower than the original gradient calculation.
>
> @ekyy2 this is just because the packages were moved. If you change the import to from tensorflow.python.keras import backend as K, it will work like a charm and you can run the tests on 1.12 without problems.

That did the trick!

@ghost

ghost commented Jun 11, 2019

> Here is an example with Densenet:
> https://github.com/joeyearsley/efficient_densenet_tensorflow
>
> Be wary when using it with dropout, though: you'll need to implement a new dropout layer that takes an is_recomputing kwarg and stores the random vector created in the forward pass so that it is reused in the backward pass. Otherwise the gradients will not be correct.

Hi Joey, how do you use tf.contrib.layers.recompute_grad in conjunction with memory_saving_gradients.gradients(...)? From your code, it looks like you only used recompute_grad.

@joeyearsley
Author

Yes, I stopped using memory_saving_gradients, as recompute_grad works just as well, but with more manual work.

@ghost

ghost commented Jun 11, 2019

> Yes, I stopped using memory_saving_gradients, as recompute_grad works just as well, but with more manual work.

Thanks for your reply Joey. I can see that you recompute the gradient. I'm just slightly lost as to where in the code you drop gradients from memory as well.

@yaroslavvb
Collaborator

A tensor is dropped automatically once it no longer has downstream consumers. That is what checkpoints_disconnected is for: because of stop_gradient, the graph doesn't need to keep the activations in memory, and reroute_ts then substitutes those disconnected tensors for the originals so that the activations are dropped.
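The effect described above can be illustrated outside TensorFlow. The following is a conceptual pure-Python sketch on a chain of scalar tanh layers, not how memory_saving_gradients rewrites the graph; all function names are hypothetical. The baseline keeps every layer input alive until the backward pass, while the checkpointed version keeps only every few inputs and recomputes the rest one segment at a time.

```python
import math


def layer(w, x):
    return math.tanh(w * x)


def layer_grads(w, x, upstream):
    """Gradients of tanh(w * x) with respect to w and x, scaled by upstream."""
    d = (1.0 - math.tanh(w * x) ** 2) * upstream
    return d * x, d * w


def grads_store_all(ws, x0):
    """Baseline: keep every layer input alive until the backward pass."""
    inputs = [x0]
    for w in ws[:-1]:
        inputs.append(layer(w, inputs[-1]))
    upstream, dws = 1.0, [0.0] * len(ws)
    for i in reversed(range(len(ws))):
        dws[i], upstream = layer_grads(ws[i], inputs[i], upstream)
    return dws, len(inputs)  # gradients, number of values held


def grads_checkpointed(ws, x0, every):
    """Keep only every `every`-th layer input; recompute the rest per segment."""
    checkpoints = {}
    x = x0
    for i, w in enumerate(ws):
        if i % every == 0:
            checkpoints[i] = x
        x = layer(w, x)  # non-checkpoint activations are not kept
    upstream, dws, peak = 1.0, [0.0] * len(ws), 0
    for start in sorted(checkpoints, reverse=True):
        end = min(start + every, len(ws))
        # Recompute the inputs of layers start..end-1 from the checkpoint.
        segment = [checkpoints[start]]
        for i in range(start, end - 1):
            segment.append(layer(ws[i], segment[-1]))
        peak = max(peak, len(checkpoints) + len(segment) - 1)
        for i in reversed(range(start, end)):
            dws[i], upstream = layer_grads(ws[i], segment[i - start], upstream)
        # `segment` is rebuilt on the next iteration, freeing these values.
    return dws, peak  # gradients, peak number of values held


ws = [0.5 + 0.1 * i for i in range(16)]
full_grads, full_held = grads_store_all(ws, 0.3)
ckpt_grads, ckpt_peak = grads_checkpointed(ws, 0.3, every=4)
```

With 16 layers and every=4, the checkpointed version holds 7 values at its peak instead of 16, at the cost of running most of the forward pass a second time. The gradients are the same in both cases.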

@ghost

ghost commented Jun 11, 2019

> dropped automatically

Thanks, yaroslavvb. Unfortunately, like others above, I have not found memory_saving_gradients to work with TensorFlow 1.12. I'm hoping tf.contrib.layers.recompute_grad works.

@ghost

ghost commented Jul 7, 2019

Same issue. memory_saving_gradients does not work in TF 1.14.

@golden0080

Tested in TF 1.15, using Keras. No luck for me.
I got some errors like:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Operation 'training/Adam/gradients/header/conv_4/batch_normalization_63/cond/ReadVariableOp_2/Switch' has no attr named '_XlaCompile'.


8 participants