Add memray plugin #2875
875779e to 5ed181e
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2875      +/-   ##
==========================================
- Coverage   45.53%   38.52%   -7.02%
==========================================
  Files         196      199       +3
  Lines       20418    20765     +347
  Branches     2647     2665      +18
==========================================
- Hits         9298     7999    -1299
- Misses      10658    12552    +1894
+ Partials      462      214     -248
Signed-off-by: Jan Fiedler <[email protected]>
b2c7770 to 8e67334
This is wicked cool!
Can you add flytekit-memray to https://github.com/flyteorg/flytekit/blob/master/.github/workflows/pythonbuild.yml#L319-L364 ?
✅
plugins/flytekit-memray/README.md
Outdated
image = ImageSpec(
    name="memray_demo",
    packages=["flytekitplugins_memray"],
    env={"PYTHONMALLOC": "malloc"},
Instead of hard-coding this into the environment, can we now use trace_python_allocators=True?
I tested that and it's not throwing any warnings, but the results look different.
This is the task I tested without the env variable set:
@task(container_image=image, enable_deck=True)
@memray_profiling(trace_python_allocators=True, memray_reporter_args=["--leaks"])
def memory_leakage(n: int) -> str:
    generate_data(n=n)
    return "Well"
Not sure if this is expected.
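For context, the two options discussed here act at different levels: PYTHONMALLOC=malloc makes CPython route all allocations through the system allocator, while trace_python_allocators=True asks memray itself to record calls to Python's own allocator. The following is a minimal sketch of the second option using memray's Tracker directly, outside of Flyte. The helper name run_with_tracker is hypothetical and is not part of the plugin.

```python
from pathlib import Path
from typing import Any, Callable


def run_with_tracker(func: Callable[..., Any], bin_path: Path, *args: Any, **kwargs: Any) -> Any:
    """Run func under memray and also record Python allocator calls.

    Hypothetical helper for illustration only; the plugin's decorator
    may wire this up differently.
    """
    # memray is a third-party package; it is imported lazily so that this
    # module can still be imported where memray is not installed.
    import memray

    # trace_python_allocators=True records allocations made through Python's
    # own allocator, so PYTHONMALLOC=malloc does not have to be set.
    with memray.Tracker(str(bin_path), trace_python_allocators=True):
        return func(*args, **kwargs)
```

Because the two settings record allocations at different layers, the resulting reports are not necessarily identical, which may account for some of the difference observed above.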
For completeness, what do you see when you set trace_python_allocators=False?
It's weird that it gives two different flamegraphs. The new flamegraph makes more sense to me because the tracker is wrapping the user code and you can clearly see generate_data.
I cannot really see where generate_data is on your original flamegraph.
self.trace_python_allocators = trace_python_allocators
self.follow_fork = follow_fork
self.memory_interval_ms = memory_interval_ms
self.dir_name = "memray"
To make it obvious that this is a directory for memray files:
- self.dir_name = "memray"
+ self.dir_name = "memray_bin"
if not os.path.exists(self.dir_name):
    os.makedirs(self.dir_name)

bin_filepath = f"{self.dir_name}/{self.task_function.__name__}.{time.strftime('%Y%m%d%H%M%S')}.bin"
With os.path.join:
- bin_filepath = f"{self.dir_name}/{self.task_function.__name__}.{time.strftime('%Y%m%d%H%M%S')}.bin"
+ bin_filepath = os.path.join(self.dir_name, f"{self.task_function.__name__}.{time.strftime('%Y%m%d%H%M%S')}.bin")
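The same path could also be built with pathlib, which folds the directory check into one call. This is a minimal sketch under stated assumptions: build_bin_filepath is a hypothetical helper name, and dir_name and task_name stand in for self.dir_name and self.task_function.__name__ from the diff.

```python
import time
from pathlib import Path


def build_bin_filepath(dir_name: str, task_name: str) -> Path:
    """Return <dir_name>/<task_name>.<timestamp>.bin, creating the directory if needed.

    Hypothetical helper; the arguments mirror self.dir_name and
    self.task_function.__name__ in the plugin code under review.
    """
    out_dir = Path(dir_name)
    # exist_ok=True replaces the separate os.path.exists() check.
    out_dir.mkdir(parents=True, exist_ok=True)
    timestamp = time.strftime("%Y%m%d%H%M%S")
    return out_dir / f"{task_name}.{timestamp}.bin"
```

Calling build_bin_filepath("memray_bin", "memory_leakage") yields a Path whose file name has the form memory_leakage.<14-digit timestamp>.bin.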
memray_reporter_args_str = " ".join(self.memray_reporter_args)

if os.system(f"memray {reporter} -o {html_filepath} {memray_reporter_args_str} {bin_filepath}") == 0:
To be completely sure we are using the memray that is installed in the current Python environment:
- if os.system(f"memray {reporter} -o {html_filepath} {memray_reporter_args_str} {bin_filepath}") == 0:
+ if os.system(f"{sys.executable} -m memray {reporter} -o {html_filepath} {memray_reporter_args_str} {bin_filepath}") == 0:
It's unfortunate that they do not document their Python API for writing reports and only document the CLI, so I'm okay with using the CLI here.
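As a possible follow-up, the same CLI call can be made with subprocess and an argument list, which avoids shell quoting problems when paths contain spaces. This is a sketch, not the plugin's implementation: build_report_command and render_report are hypothetical names, and the parameters mirror the variables used in the diff.

```python
import subprocess
import sys
from typing import List, Sequence


def build_report_command(
    reporter: str,
    html_filepath: str,
    bin_filepath: str,
    reporter_args: Sequence[str] = (),
) -> List[str]:
    """Build the argv for `python -m memray <reporter> -o <html> [args] <bin>`."""
    # sys.executable pins the call to the memray installed in this environment.
    return [sys.executable, "-m", "memray", reporter, "-o", html_filepath, *reporter_args, bin_filepath]


def render_report(
    reporter: str,
    html_filepath: str,
    bin_filepath: str,
    reporter_args: Sequence[str] = (),
) -> bool:
    """Run the memray reporter and return True if it exited with status 0."""
    cmd = build_report_command(reporter, html_filepath, bin_filepath, reporter_args)
    # No shell is involved, so arguments are passed through verbatim.
    return subprocess.run(cmd, check=False).returncode == 0
```

For example, render_report("flamegraph", "out.html", "capture.bin", ["--leaks"]) corresponds to the command in the suggestion above with reporter set to flamegraph.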
@task(enable_deck=True)
@memray_profiling
Not actionable for this PR: I wish there was a way to ensure that enable_deck=True when using memray_profiling. Otherwise, we just add overhead without any reports.
@eapolinario @pingsutw What do you think of making deck_fields=None and setting enable_deck=True?
https://github.com/flyteorg/flytekit/blob/master/flytekit/core/task.py#L203-L210
Why are the changes needed?
What changes were proposed in this pull request?
How was this patch tested?
Setup process
Screenshots
Flamegraph
Table
Check all the applicable boxes
Related PRs
Docs link