Multinode batch inference #2
base: main
Conversation
PR Summary
This PR introduces initial support for multinode batch inference using Ray in the vLLM batch inference process. The changes focus on setting up the infrastructure for distributed computing while maintaining compatibility with the existing system.
- Added init_ray.py to initialize a Ray cluster for multinode batch inference
- Modified Dockerfile_vllm and build_and_upload_image.sh to include init_ray.py and improve script flexibility
- Updated vllm_batch.py with TODOs for multinode setup, indicating ongoing implementation
- Potential conflict between JOB_COMPLETION_INDEX usage in vllm_batch.py and multinode setup noted in TODO comment
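The PR itself does not include the body of init_ray.py here, but a minimal sketch of the leader/worker command construction it might perform is shown below. This is an assumption, not the PR's actual code; the env var names (MASTER_ADDR, MASTER_PORT, JOB_COMPLETION_INDEX) are taken from the snippets reviewed further down, and both helper function names are hypothetical.

```python
import os


def build_ray_start_command(is_leader: bool, address: str, port: str) -> list:
    """Return the `ray start` invocation for the head node or a worker.

    Hypothetical helper: the real init_ray.py in this PR may be structured
    differently.
    """
    if is_leader:
        # The leader pod (JOB_COMPLETION_INDEX == "0") starts the Ray head.
        return ["ray", "start", "--head", f"--port={port}"]
    # Worker pods join the head at MASTER_ADDR:MASTER_PORT.
    return ["ray", "start", f"--address={address}:{port}"]


def init_ray_command() -> list:
    """Assemble the command from the environment variables used in this PR."""
    is_leader = os.getenv("JOB_COMPLETION_INDEX") == "0"
    return build_ray_start_command(
        is_leader,
        os.getenv("MASTER_ADDR", "127.0.0.1"),
        os.getenv("MASTER_PORT", "6379"),
    )
```

The command itself would then be launched with subprocess and its return code checked, as the snippet reviewed below suggests.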
4 file(s) reviewed, 4 comment(s)
if result.returncode == 0:
    print(f"Worker: Ray runtime started with head address {ray_address}:{ray_port}")
    sys.exit(0)
print(result.returncode)
style: Print the error message from the result object for better debugging
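To make the bot's suggestion concrete, here is a hedged sketch, assuming `result` is a `subprocess.CompletedProcess` created with `capture_output=True, text=True`; the helper name is hypothetical and not from the PR.

```python
import subprocess
import sys


def describe_ray_start(result: subprocess.CompletedProcess) -> str:
    """Format a launch result, surfacing stderr on failure instead of
    printing only the numeric return code (illustrative helper)."""
    if result.returncode == 0:
        return "Worker: Ray runtime started"
    return f"ray start failed (code {result.returncode}): {result.stderr.strip()}"


# Demonstrate with a deliberately failing stand-in command:
failed = subprocess.run(
    [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(1)"],
    capture_output=True,
    text=True,
)
```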
is_leader = os.getenv("JOB_COMPLETION_INDEX") == "0"
ray_address = os.getenv("MASTER_ADDR")
ray_port = os.getenv("MASTER_PORT")
ray_cluster_size = os.getenv("NUM_INSTANCES")
logic: Ensure NUM_INSTANCES is converted to an integer
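A defensive version of that conversion might look like the following; the helper name and the fallback-to-default behavior are assumptions for illustration, not part of the PR.

```python
import os


def get_cluster_size(default: int = 1) -> int:
    """Parse NUM_INSTANCES as an int, as the bot suggests, falling back to
    `default` when the variable is unset or malformed (hypothetical helper)."""
    raw = os.getenv("NUM_INSTANCES")
    try:
        return int(raw)
    except (TypeError, ValueError):
        # TypeError covers the unset (None) case; ValueError covers "abc".
        return default
```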
@@ -312,12 +312,14 @@ def tool_func(text: str, past_context: Optional[str]):


 async def batch_inference():
-    job_index = int(os.getenv("JOB_COMPLETION_INDEX", 0))
+    job_index = int(os.getenv("JOB_COMPLETION_INDEX", 0))  # TODO this conflicts with multinode
logic: This TODO suggests a conflict with multinode setup. Consider resolving this before merging.
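One possible resolution, sketched below, is to have only the leader pod drive the batch job in multinode mode, so the index-based sharding never runs on Ray worker pods. This is an assumption about how the conflict could be resolved, not the PR's actual fix, and the function name is hypothetical.

```python
import os
from typing import Optional


def resolve_job_index(multinode: bool) -> Optional[int]:
    """In multinode mode, only the pod with JOB_COMPLETION_INDEX == 0 drives
    batch inference; other pods return None and skip index-based sharding
    (illustrative sketch, not the PR's implementation)."""
    index = int(os.getenv("JOB_COMPLETION_INDEX", 0))
    if multinode and index != 0:
        return None  # worker pods join the Ray cluster but do not shard work
    return index
```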
download_model(
    request.model_cfg.checkpoint_path, MODEL_WEIGHTS_FOLDER
)  # TODO move this out
style: Moving model download out of this function could improve performance for multinode setups.
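One way to support hoisting the download, sketched as an assumption rather than the PR's approach, is a guard that skips re-downloading when the weights folder is already populated (for example, on a shared volume or after the leader pod has downloaded). The helper name is hypothetical.

```python
import os


def should_download_weights(weights_folder: str) -> bool:
    """Return True when the weights folder is missing or empty, so the
    download can run once outside batch_inference() and be skipped
    elsewhere (illustrative helper, not from the PR)."""
    return not (os.path.isdir(weights_folder) and os.listdir(weights_folder))
```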