
Multinode batch inference #2

Open
wants to merge 6 commits into base: main
Conversation

lizard-boy

Pull Request Summary

What is this PR changing? Why is this change being made? Any caveats you'd like to highlight? Link any relevant documents, links, or screenshots here if applicable.

Test Plan and Usage Guide

How did you validate that your PR works correctly? How do you run or demo the code? Provide enough detail so a reviewer can reasonably reproduce the testing procedure. Paste example command line invocations if applicable.


@greptile-apps greptile-apps bot left a comment


PR Summary

This PR introduces initial support for multinode batch inference using Ray in the vLLM batch inference process. The changes focus on setting up the infrastructure for distributed computing while maintaining compatibility with the existing system.

  • Added init_ray.py to initialize a Ray cluster for multinode batch inference
  • Modified Dockerfile_vllm and build_and_upload_image.sh to include init_ray.py and improve script flexibility
  • Updated vllm_batch.py with TODOs for multinode setup, indicating ongoing implementation
  • Potential conflict between JOB_COMPLETION_INDEX usage in vllm_batch.py and multinode setup noted in TODO comment

4 file(s) reviewed, 4 comment(s)

if result.returncode == 0:
    print(f"Worker: Ray runtime started with head address {ray_address}:{ray_port}")
    sys.exit(0)
print(result.returncode)

style: Print the error message from the result object for better debugging
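One way to act on this suggestion is a small sketch like the following, assuming `result` comes from a `subprocess.run(..., capture_output=True, text=True)` call as the surrounding snippet implies; `report_ray_start` is a hypothetical helper name, not a function in the PR:

```python
import subprocess
import sys


def report_ray_start(result: subprocess.CompletedProcess) -> bool:
    """Report the outcome of a `ray start` invocation.

    On failure, surface the captured stderr rather than only the
    numeric return code, so worker logs explain *why* Ray did not
    come up.
    """
    if result.returncode == 0:
        print("Worker: Ray runtime started")
        return True
    print(
        f"ray start failed with code {result.returncode}: {result.stderr.strip()}",
        file=sys.stderr,
    )
    return False
```

Returning a bool instead of calling `sys.exit` directly keeps the helper testable; the caller can still exit with `result.returncode` on failure.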

is_leader = os.getenv("JOB_COMPLETION_INDEX") == "0"
ray_address = os.getenv("MASTER_ADDR")
ray_port = os.getenv("MASTER_PORT")
ray_cluster_size = os.getenv("NUM_INSTANCES")

logic: Ensure NUM_INSTANCES is converted to an integer
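The point of the comment is that `os.getenv` returns a string (or `None`), so `ray_cluster_size` cannot be compared or used arithmetically without conversion. A minimal sketch of the fix, with `get_ray_cluster_size` as a hypothetical helper name and a single-node default as an assumption:

```python
import os


def get_ray_cluster_size(default: int = 1) -> int:
    """Read NUM_INSTANCES from the environment as an int.

    os.getenv returns a string or None; doing math or comparisons on
    the raw value would be a type error or a silent bug. Fall back to
    a single-node default when the variable is unset.
    """
    raw = os.getenv("NUM_INSTANCES")
    return int(raw) if raw is not None else default
```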

@@ -312,12 +312,14 @@ def tool_func(text: str, past_context: Optional[str]):


async def batch_inference():
-    job_index = int(os.getenv("JOB_COMPLETION_INDEX", 0))
+    job_index = int(os.getenv("JOB_COMPLETION_INDEX", 0))  # TODO this conflicts with multinode

logic: This TODO suggests a conflict with multinode setup. Consider resolving this before merging.
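The conflict is that `JOB_COMPLETION_INDEX` shards data across pods, but under multinode Ray the non-leader pods are cluster workers rather than independent data shards. One hypothetical resolution, purely as an illustration of the TODO and not the PR's actual design (`resolve_job_index` and the sentinel `-1` are assumptions):

```python
import os


def resolve_job_index(num_instances: int) -> int:
    """Sketch of one way to reconcile JOB_COMPLETION_INDEX with multinode.

    Single-node: each pod processes its own data shard, indexed by
    JOB_COMPLETION_INDEX. Multinode (num_instances > 1): the whole Ray
    cluster acts as shard 0, driven by the leader pod; worker pods
    return -1 to signal "join the cluster, do not run inference".
    """
    job_index = int(os.getenv("JOB_COMPLETION_INDEX", 0))
    if num_instances > 1:
        return 0 if job_index == 0 else -1
    return job_index
```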

Comment on lines +320 to +322
    download_model(
        request.model_cfg.checkpoint_path, MODEL_WEIGHTS_FOLDER
    )  # TODO move this out

style: Moving model download out of this function could improve performance for multinode setups.
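A sketch of the suggested refactor: hoist the download out of `batch_inference()` and make it idempotent, so that re-entering the function (or multiple callers on one node) does not re-fetch the weights. `ensure_model_downloaded` and the `download_fn` parameter are hypothetical stand-ins for the PR's `download_model`:

```python
# Tracks (checkpoint_path, dest) pairs already fetched on this node.
_downloaded: set = set()


def ensure_model_downloaded(checkpoint_path: str, dest: str, download_fn) -> None:
    """Download model weights at most once per (checkpoint, dest) pair.

    download_fn stands in for the real download_model; injecting it
    keeps this helper testable and independent of storage details.
    """
    key = (checkpoint_path, dest)
    if key in _downloaded:
        return
    download_fn(checkpoint_path, dest)
    _downloaded.add(key)
```

Called once at process start (before `batch_inference()`), this removes the download from the hot path the reviewer flagged.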

2 participants