[pull] master from ray-project:master #2374

Merged 15 commits on Oct 17, 2023

Conversation

pull[bot]

@pull pull bot commented Oct 17, 2023

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

zcin and others added 15 commits October 16, 2023 17:02
Signed-off-by: Yi Cheng <[email protected]>
Nsight internal docs: https://docs.google.com/document/d/11RlNTbGLf6fat7HYARU8yWhodBD9j5uiZCdAB0geEik
Related issue: #39094

Nsight integration with Ray using runtime_env. Currently, Nsight can't profile GPU usage from Ray tasks/actors, because the processes Nsight can trace must be the driver process and its subprocesses, whereas Ray tasks/actors are run by worker processes. Thus, we added native nsight support to runtime_env, which modifies the worker process to run under nsys profile and produces a report for each worker process once it exits.

The nsight field in the runtime_env can be given the flags that the user wants to pass to nsys profile, for example:

import ray

# Flags in the "nsight" list are passed through to `nsys profile`.
@ray.remote(runtime_env={"nsight": ["-t", "cuda,nvtx", "--cudabacktrace=True"]})
def task():
    ...
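As a rough illustration of the idea described above (running the worker process under nsys profile with user-supplied flags), here is a minimal sketch. The helper name `wrap_with_nsys` and the worker command `["python", "default_worker.py"]` are hypothetical and chosen only for illustration; this is not Ray's actual implementation, only a sketch of how a flag list could be prepended to a worker command.

```python
from typing import List


def wrap_with_nsys(worker_cmd: List[str], nsight_flags: List[str]) -> List[str]:
    """Prefix a worker command with `nsys profile` and the given flags.

    Hypothetical helper: the real runtime_env plugin may build the
    command differently.
    """
    return ["nsys", "profile", *nsight_flags, *worker_cmd]


if __name__ == "__main__":
    # The worker command below is an assumption made for this example.
    cmd = wrap_with_nsys(
        ["python", "default_worker.py"],
        ["-t", "cuda,nvtx", "--cudabacktrace=True"],
    )
    print(" ".join(cmd))
```

Because the worker itself becomes a subprocess of nsys, the profiler can trace it, which is the constraint the description above works around.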
Now that we drop Python 3.7 support for Ray 2.8, we can remove the `typing_extensions` dependency.
Migrate ML GPU tests to civ2. Merge train, air and example tests into one job. This reduces total job time by 2.5x.

---------

Signed-off-by: can <[email protected]>
The test sometimes timed out because key generation takes a long time. This fix pre-generates the related files and uses them to save time.
This PR makes the RuntimeEnvAgent process bind on 0.0.0.0 when --node-ip-address is set, rather than trying to bind on the node IP address itself.

This behaviour is consistent with other processes such as the dashboard agent:
ray/dashboard/agent.py

Line 116 in 67593a9

 grpc_ip = "127.0.0.1" if self.ip == "127.0.0.1" else "0.0.0.0"
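The bind-address rule quoted from the dashboard agent can be sketched as a small standalone function. The name `choose_bind_ip` is hypothetical and does not appear in Ray; the sketch only restates the quoted one-liner so the two cases are easy to see.

```python
def choose_bind_ip(node_ip: str) -> str:
    """Return the address a server should bind to for a given node IP.

    Mirrors the quoted dashboard-agent line: stay on loopback when the
    node IP is loopback, otherwise listen on all interfaces.
    """
    return "127.0.0.1" if node_ip == "127.0.0.1" else "0.0.0.0"


if __name__ == "__main__":
    # 10.0.0.5 is an arbitrary example address.
    for ip in ("127.0.0.1", "10.0.0.5"):
        print(ip, "->", choose_bind_ip(ip))
```

Binding on 0.0.0.0 rather than the node IP itself means the process accepts connections on every local interface, so it does not depend on the node IP being directly bindable on that host.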
Now that we remove Python 3.7 support, we don't need the pickle5 backport any more :)
@pull pull bot added the ⤵️ pull label Oct 17, 2023
@pull pull bot merged commit dc944fe into miqdigital:master Oct 17, 2023
10 participants