Add OpenAI compatible v2 completion #1
base: main
Conversation
PR Summary
This PR adds OpenAI-compatible APIs for legacy completions and improves HTTP forwarding, focusing on enhancing performance, compatibility, and ease of testing. Key changes include:

- Added new completion endpoints in model-engine/model_engine_server/api/v2/completion.py for OpenAI compatibility
- Implemented async methods for predict forwarding in model-engine/model_engine_server/inference/forwarding/forwarding.py to address performance bottlenecks
- Updated data transfer objects in model-engine/model_engine_server/common/dtos/llms/completion.py to align with OpenAI's API structure
- Modified model-engine/model_engine_server/inference/forwarding/http_forwarder.py to fix route registration and dependency injection issues
- Made the post-inference handler optional in model-engine/model_engine_server/inference/forwarding/celery_forwarder.py for easier local testing
14 file(s) reviewed, 9 comment(s)
# if ttft is None and message.startswith("data"):
#     ttft = use_case_timer.lap()
#     ttft = timer.lap()
style: Remove or uncomment this code block
# note that this is ok because request will cache the body
body = await request.json()
style: Awaiting request.json() could impact performance. Consider caching this operation if used multiple times.
# if ttft is None and message.startswith("data"):
#     ttft = timer.lap()
style: This commented-out code seems unnecessary. Consider removing it if it's no longer needed.
@@ -288,7 +290,7 @@ def inter_token_latency(self) -> Optional[float]:  # Only for streaming requests
         return (self.total_duration - self.time_to_first_token) / (self.num_completion_tokens - 1)


-class CompletionV2Request(CreateCompletionRequest):
+class CompletionV2Request(CreateCompletionRequest, VLLMCompletionAdditionalParams):
logic: CompletionV2Request inherits from both CreateCompletionRequest and VLLMCompletionAdditionalParams. Ensure that there are no conflicting fields or unexpected behavior due to multiple inheritance.
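The resolution rule here is Python's MRO: when two bases define the same attribute, the first listed base wins, and pydantic collects duplicate field definitions along the same order. A minimal sketch with hypothetical stub classes (plain classes standing in for the real DTOs):

```python
class CreateCompletionRequestStub:
    # hypothetical stand-in for CreateCompletionRequest
    temperature = 1.0


class VLLMCompletionAdditionalParamsStub:
    # hypothetical stand-in, with a deliberately overlapping field
    temperature = 0.7
    top_k = -1


class CompletionV2RequestStub(
    CreateCompletionRequestStub, VLLMCompletionAdditionalParamsStub
):
    pass


# MRO: CompletionV2RequestStub -> CreateCompletionRequestStub
# -> VLLMCompletionAdditionalParamsStub, so the first base's
# temperature (1.0) shadows the second's (0.7)
```

Checking `CompletionV2RequestStub.__mro__` (or pydantic's `model_fields`) is a quick way to confirm which definition a combined request class actually ends up with.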
CompletionV2StreamResponse: TypeAlias = (
    EventSourceResponse  # EventSourceResponse[CompletionV2StreamChunk]
)
style: CompletionV2StreamResponse is defined as EventSourceResponse, but the comment suggests it should be EventSourceResponse[CompletionV2StreamChunk]. Clarify if this is intentional or if it should be properly typed.
# This is a version of CompletionV2Response that is used by pydantic to determine the response model
# Since EventSourceResponse isn't a pydantic model, we need to use a Union of the two response types
CompletionV2ResponseItem: TypeAlias = CompletionV2SyncResponse | CompletionV2StreamChunk
style: The comment explains the use of Union for CompletionV2ResponseItem. Consider adding this explanation as a docstring for better code documentation.
from typing import Any, Dict, List, Optional, TypeAlias

from model_engine_server.common.dtos.llms.vllm import VLLMCompletionAdditionalParams
from model_engine_server.common.pydantic_types import BaseModel, Field
from model_engine_server.common.types.gen.openai import (
    CreateCompletionRequest,
    CreateCompletionResponse,
)
from sse_starlette import EventSourceResponse
from typing_extensions import Annotated
style: Multiple imports from different modules. Consider organizing imports alphabetically for better readability.
if forwarder.post_inference_hooks_handler:
    forwarder.post_inference_hooks_handler.handle(request_params_pydantic, retval, task_id)  # type: ignore
logic: Consider adding a null check for forwarder.post_inference_hooks_handler before accessing its handle method to prevent a potential AttributeError.
def get_sync_forwarder(route=route):
    return sync_forwarders.get(route)


def get_stream_forwarder(route=route):
    return stream_forwarders.get(route)
style: Consider using more descriptive names for these functions, like get_sync_forwarder_for_route and get_stream_forwarder_for_route.
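Whatever the names, the `route=route` default argument in these getters is worth keeping: Python closures bind loop variables late, so without it every registered getter would see the loop's final route. A minimal sketch of the gotcha and the fix (hypothetical `build_getters` helper, not code from the PR):

```python
def build_getters(routes):
    """Build one getter per route, freezing each route via a default arg."""
    getters = {}
    for route in routes:
        # route=route captures the *current* loop value; a bare closure
        # over `route` would return the final value for every getter
        def get_forwarder(route=route):
            return f"forwarder-for-{route}"

        getters[route] = get_forwarder
    return getters
```

This is the standard workaround when registering per-route callbacks in a loop, e.g. with FastAPI's dependency injection.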
Pull Request Summary
What is this PR changing? Why is this change being made? Any caveats you'd like to highlight? Link any relevant documents, links, or screenshots here if applicable.
Test Plan and Usage Guide
How did you validate that your PR works correctly? How do you run or demo the code? Provide enough detail so a reviewer can reasonably reproduce the testing procedure. Paste example command line invocations if applicable.