Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add memray plugin #2875

Open
wants to merge 23 commits into
base: master
Choose a base branch
from
Open

Add memray plugin #2875

wants to merge 23 commits into from

Conversation

fiedlerNr9
Copy link
Contributor

@fiedlerNr9 fiedlerNr9 commented Oct 29, 2024

Why are the changes needed?

  • Enables memray profiling on Flyte task level
  • renders memray report into Flytedeck

What changes were proposed in this pull request?

  • Adding memray flytekit plugin

How was this patch tested?

  • unit tests
  • tested local & remote run

Setup process

from flytekit import workflow, task, ImageSpec
from flytekitplugins.memray import memray_profiling
import time


flytekit_hash = "82d5ac739f5f02998edb9538c58cf93c8f6e501b"
flytekitplugins_memray = f"git+https://github.com/flyteorg/flytekit.git@{flytekit_hash}#subdirectory=plugins/flytekit-memray"

image = ImageSpec(
    name="memray_demo",
    python_version="3.11.10",
    apt_packages=["git"],
    packages=[flytekitplugins_memray],
    registry="ghcr.io/fiedlernr9",
)


def generate_data(n: int):
    leak_list = []
    for _ in range(n):  # Arbitrary large number for demonstration
        large_data = " " * 10**6  # 1 MB string
        leak_list.append(large_data)  # Keeps appending without releasing
        time.sleep(0.1)  # Slow down the loop to observe memory changes


@task(container_image=image, enable_deck=True)
@memray_profiling(memray_html_reporter="table")
def memory_usage(n: int) -> str:
    generate_data(n=n)

    return "Well"


@task(container_image=image, enable_deck=True)
@memray_profiling(trace_python_allocators=True, memray_reporter_args=["--leaks"])
def memory_leakage(n: int) -> str:
    generate_data(n=n)

    return "Well"


@workflow
def wf(n: int = 500):
    memory_usage(n=n)
    memory_leakage(n=n)

Screenshots

Flamegraph

image

Table

image

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

Copy link

codecov bot commented Oct 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 38.52%. Comparing base (3fc51af) to head (ed5cdaf).
Report is 23 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2875      +/-   ##
==========================================
- Coverage   45.53%   38.52%   -7.02%     
==========================================
  Files         196      199       +3     
  Lines       20418    20765     +347     
  Branches     2647     2665      +18     
==========================================
- Hits         9298     7999    -1299     
- Misses      10658    12552    +1894     
+ Partials      462      214     -248     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
Copy link
Collaborator

@eapolinario eapolinario left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fiedlerNr9
Copy link
Contributor Author

This is wicked cool!

Can you add flytekit-memray to https://github.com/flyteorg/flytekit/blob/master/.github/workflows/pythonbuild.yml#L319-L364 ?

image = ImageSpec(
name="memray_demo",
packages=["flytekitplugins_memray"],
env={"PYTHONMALLOC": "malloc"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of hard coding this into the environment, can we now trace_python_allocators=True?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested that and its not throwing any warnings but the results look different though:
Thats the task I tested without having the env variable set:

@task(container_image=image, enable_deck=True)
@memray_profiling(trace_python_allocators=True, memray_reporter_args=["--leaks"])
def memory_leakage(n: int) -> str:
    generate_data(n=n)

    return "Well"

Thats the result:
image

Not sure if this is expected

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For completeness, what do you see when you set trace_python_allocators=False?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This, which is expected i guess
image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's weird that it gives two different flamegraphs. The new flamegraph makes more sense to me because the tracker is wrapping the user code and you can clearly see the generete_data.

I can not really see where the generate_data is on your original flamegraph.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh man, I mixed up my demos. Its looking exactly the same with trace_python_allocaters=False and having the env variable set. Sorry for the confusion, I will update the Readme shortly
image

self.trace_python_allocators = trace_python_allocators
self.follow_fork = follow_fork
self.memory_interval_ms = memory_interval_ms
self.dir_name = "memray"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make it obvious that this is a directory for memray files:

Suggested change
self.dir_name = "memray"
self.dir_name = "memray_bin"

if not os.path.exists(self.dir_name):
os.makedirs(self.dir_name)

bin_filepath = f"{self.dir_name}/{self.task_function.__name__}.{time.strftime('%Y%m%d%H%M%S')}.bin"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With pathlib:

Suggested change
bin_filepath = f"{self.dir_name}/{self.task_function.__name__}.{time.strftime('%Y%m%d%H%M%S')}.bin"
bin_filepath = os.path.join(self.dir_name, f"{self.task_function.__name__}.{time.strftime('%Y%m%d%H%M%S')}.bin")


memray_reporter_args_str = " ".join(self.memray_reporter_args)

if os.system(f"memray {reporter} -o {html_filepath} {memray_reporter_args_str} {bin_filepath}") == 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be completely sure we are using the memray that is installed in the current python environment:

Suggested change
if os.system(f"memray {reporter} -o {html_filepath} {memray_reporter_args_str} {bin_filepath}") == 0:
if os.system(f"{sys.executable} -m memray {reporter} -o {html_filepath} {memray_reporter_args_str} {bin_filepath}") == 0:

It's unfortunate, that they do not document their Python API for writing reports, and only document using the CLI. So I'm okay with using the CLI from here.

Comment on lines +7 to +8
@task(enable_deck=True)
@memray_profiling
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not actionable for this PR I wish there was a way to ensure that enable_deck=True when using memray_profiling. Otherwise, we just add overhead without any reports.

@eapolinario @pingsutw What do you think of making deck_fields=None and set enable_decks=True?

https://github.com/flyteorg/flytekit/blob/master/flytekit/core/task.py#L203-L210

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants