Support garbage collection after pt2 compilation (#143364)
Summary:
**Context:**
Recently we observed a ~10% training GPU memory regression caused by inefficient memory recycling during PyTorch 2 (PT2) compilation. This diff recovers the memory lost to PT2 compilation.
Detailed debugging notes: https://docs.google.com/document/d/1EPopAyYyXwTnkyVaUJ5Xa_Uw9iWv3zimK7FkagKsKIY/edit?tab=t.0#bookmark=id.e5b26tcdfl5g

In this diff, we support garbage collection after pt2 compilation.
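
As a rough illustration of the idea (not the code from this diff), a collection pass run after compilation could be timed in microseconds so that the duration can be reported in a field such as `gc_time_us`. The helper name `timed_gc_collect` is hypothetical.

```python
import gc
import time


def timed_gc_collect() -> int:
    """Run a full garbage collection and return its duration in microseconds.

    Hypothetical helper for illustration; not the function added by this diff.
    """
    start_ns = time.perf_counter_ns()
    gc.collect()  # full collection across all generations
    return (time.perf_counter_ns() - start_ns) // 1000


# The result could then be stored in a metrics field such as gc_time_us.
gc_time_us = timed_gc_collect()
```

Integer microseconds are used here only to match the `Optional[int]` type of the `gc_time_us` field.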

**Rollout / rollback plan:**
To ensure system reliability, the rollout of this change has 2 layers of control:
- A jk controls the global rollout / rollback of this functionality. The jk is on by default.
- An env var controls rollout for individual jobs. The env var is on by default.
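
The two layers of control can be sketched as a single check, assuming the global switch is consulted first and the per-job env var second. Everything named here is hypothetical: the env var name `EXAMPLE_GC_AFTER_COMPILE` is invented for the sketch, and the jk lookup is passed in as a plain callable because its real API is not shown in this commit.

```python
import os
from typing import Callable, Mapping

# Hypothetical env var name, chosen only for this sketch.
ENV_VAR = "EXAMPLE_GC_AFTER_COMPILE"


def gc_after_compile_enabled(
    global_switch: Callable[[], bool],
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Return True only if both the global switch and the per-job env var allow it."""
    # Layer 1: global rollout / rollback (stands in for the jk).
    if not global_switch():
        return False
    # Layer 2: per-job control. Defaulting to "1" mirrors "on by default".
    return env.get(ENV_VAR, "1") != "0"
```

With this shape, turning the global switch off disables the feature everywhere regardless of the env var, while a single job can opt out by setting the variable to `0`.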

X-link: pytorch/pytorch#143364
Approved by: https://github.com/ezyang

Reviewed By: ezyang

Differential Revision: D67328568

Pulled By: huydhn

fbshipit-source-id: d0c856846bef3bdd3b060df90cf5888d57245ff8
qiurc authored and facebook-github-bot committed Dec 19, 2024
1 parent 65789d4 commit 6b0d4a3
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions userbenchmark/dynamo/dynamobench/_dynamo/utils.py
@@ -917,6 +917,7 @@ class CompilationMetrics:
     feature_usage: Optional[dict[str, bool]] = None
     compile_time_autotune_time_us: Optional[int] = None
     is_runtime: Optional[bool] = False
+    gc_time_us: Optional[int] = None


DEFAULT_COMPILATION_METRICS_LIMIT = 64