Some high level descriptions on the MEMORY policy will be very helpful #46
Comments
The high-level goal is to find nodes which separate the graph into two parts. From https://medium.com/tensorflow/fitting-larger-networks-into-memory-583e3c758ff9: a bottleneck is a graph separator which is a single node, and that line is a heuristic which tries to find it. The first part of the heuristic looks at the ops reached when walking forward and backward from the point, and checks whether there is an overlap. No overlap suggests that x3 is a separator in the diagram below.
The second part, |
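To make the forward/backward walk described above concrete, here is a minimal plain-Python sketch. It is an illustration under assumptions, not the code in memory_saving_gradients.py: the graph is a hypothetical dict mapping each node to the nodes it consumes, and `reachable` and `find_separators` are made-up names. A node counts as a separator here when nothing after it reads directly from anything before it, and when every other node lies on one side or the other.

```python
def reachable(start, neighbors):
    """Return all nodes reachable from `start` (exclusive) via `neighbors`."""
    seen = set()
    stack = list(neighbors.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(neighbors.get(node, ()))
    return seen


def find_separators(inputs):
    """`inputs` maps every node to the list of nodes it consumes (hypothetical format)."""
    # Invert the edges so we can also walk forward.
    consumers = {node: [] for node in inputs}
    for node, ins in inputs.items():
        for i in ins:
            consumers[i].append(node)

    all_nodes = set(inputs)
    separators = []
    for x in inputs:
        back = reachable(x, inputs)      # everything x depends on
        fwd = reachable(x, consumers)    # everything that depends on x
        if not back or not fwd:
            continue                     # graph inputs/outputs are not interesting
        read_by_fwd = {i for n in fwd for i in inputs[n]}
        if read_by_fwd & back:
            continue                     # overlap: an edge bypasses x
        if back | fwd | {x} != all_nodes:
            continue                     # a parallel branch avoids x entirely
        separators.append(x)
    return separators


# Two branches merge at x3, then split and merge again at x6.
graph = {
    "x0": [], "x1": ["x0"], "x2": ["x0"], "x3": ["x1", "x2"],
    "x4": ["x3"], "x5": ["x3"], "x6": ["x4", "x5"],
}
print(find_separators(graph))  # → ['x3']

# Adding a skip connection x1 -> x4 creates an overlap, so x3 no longer separates.
graph_with_skip = dict(graph, x4=["x3", "x1"])
print(find_separators(graph_with_skip))  # → []
```

The last two checks in the loop are this sketch's own choice of conditions; the actual `if` statement in the repository may test something different.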
Thanks for the explanation. It is very clear and helpful. May I ask what's the purpose of the following code block? Thanks in advance.
|
It saves new ops added to the graph inside this block to |
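The general pattern of saving whatever gets added to a graph while a block runs can be sketched with a plain-Python context manager. This is only an illustration under assumptions: `ToyGraph`, `add_op`, and `capture_new_ops` are hypothetical names, not the repository's code or a TensorFlow API.

```python
import contextlib


class ToyGraph:
    """Hypothetical stand-in for a graph that records ops in creation order."""

    def __init__(self):
        self.ops = []

    def add_op(self, name):
        self.ops.append(name)
        return name


@contextlib.contextmanager
def capture_new_ops(graph):
    """Yield a list that is filled with the ops created inside the `with` block."""
    captured = []
    start = len(graph.ops)  # remember how many ops existed before the block
    try:
        yield captured
    finally:
        # Everything appended after `start` was created inside the block.
        captured.extend(graph.ops[start:])


g = ToyGraph()
g.add_op("a")
with capture_new_ops(g) as new_ops:
    g.add_op("b")
    g.add_op("c")
print(new_ops)  # → ['b', 'c']
```

Note that in this sketch the list is populated only when the block exits, so it is empty while the block is still running.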
Got it. Thanks.
I am trying to understand the heuristic algorithm used in the memory policy. However, I could not fully understand the whole logic, especially the following if statement, as shown below.
gradient-checkpointing/memory_saving_gradients.py
Line 143 in 43444e0
Some explanations or guidance will be highly appreciated.
Thanks.