feat: add sse streaming support to chat completions endpoint #223
Add SSE Streaming Support to Chat Completions Endpoint
Summary
This pull request introduces server-sent events (SSE) support for streaming chat completions to the /v1/chat/completions endpoint. It includes non-streaming and streaming handling paths, enabling real-time updates in client applications and enhancing responsiveness. Additionally, it adds a new Streamer module that manages streaming responses, including signing response data and updating token usage in the Atoma state manager.
Key Changes
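As a rough illustration of the two handling paths described in the summary, the sketch below routes a request on a boolean stream flag and frames each generated chunk as an SSE data event. The names ChatRequest, Response, sse_data, and handle are hypothetical and use only the standard library; they are not taken from this pull request, and the real handlers are likely asynchronous and built on a web framework.

```rust
/// Hypothetical, simplified stand-in for the request payload; only the
/// `stream` flag matters for this sketch.
pub struct ChatRequest {
    pub stream: bool,
}

/// The two response shapes a handler could produce.
#[derive(Debug, PartialEq)]
pub enum Response {
    /// A single complete body (non-streaming path).
    Full(String),
    /// A sequence of SSE frames, one per generated chunk (streaming path).
    Events(Vec<String>),
}

/// Frame one payload as an SSE `data` event. A blank line terminates the
/// event. Assumes the payload contains no newline characters.
pub fn sse_data(payload: &str) -> String {
    format!("data: {payload}\n\n")
}

/// Route on the `stream` flag: emit one SSE frame per chunk when streaming,
/// or concatenate all chunks into a single body otherwise.
pub fn handle(request: &ChatRequest, chunks: &[&str]) -> Response {
    if request.stream {
        Response::Events(chunks.iter().map(|c| sse_data(c)).collect())
    } else {
        Response::Full(chunks.concat())
    }
}
```

Calling handle with stream set to true and the chunks "Hel" and "lo" yields two frames, each starting with "data: " and ending in a blank line, while the same chunks with stream set to false yield the single body "Hello".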
Modularization:
Added the streamer module, which manages SSE streaming responses.
Non-Streaming and Streaming Response Handlers:
Updated chat_completions_handler to differentiate between streaming and non-streaming requests by checking the stream flag in the payload.
Added handle_non_streaming_response and handle_streaming_response functions to manage their respective response types; the streaming path sends keep-alive messages.
Streamer Struct for Real-Time Chat Completions:
The Streamer struct encapsulates the SSE streaming logic.
State Manager Updates:
Added the update_state_manager function to utils, which abstracts state management for token usage and hash updates. This ensures a consistent state update approach across both streaming and non-streaming responses.
Improved Error Handling:
The Streamer and state-update logic has been wrapped with error handling and logging to capture any issues during streaming, signing, or state management.
Documentation and Code Comments:
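The idea of a single state-update entry point shared by the streaming and non-streaming paths, as listed in the key changes above, can be sketched as follows. Everything here is an assumption made for illustration: StateManager, record_usage, ChunkAccumulator, and record_full_response are hypothetical names, the per-chunk token counts are supplied by the caller, and the standard library's DefaultHasher stands in for whatever cryptographic hashing and signing the pull request actually performs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical in-memory stand-in for the state manager.
#[derive(Default)]
pub struct StateManager {
    pub total_tokens: u64,
    pub last_hash: Option<u64>,
}

/// Single entry point for state updates, shared by both response paths.
pub fn record_usage(state: &mut StateManager, tokens: u64, response_hash: u64) {
    state.total_tokens += tokens;
    state.last_hash = Some(response_hash);
}

/// Placeholder, non-cryptographic hash of the response body. A real
/// implementation would use a cryptographic hash and sign the result.
pub fn response_hash(body: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    body.hash(&mut hasher);
    hasher.finish()
}

/// Accumulates streamed chunks so the state can be updated once,
/// when the stream ends.
#[derive(Default)]
pub struct ChunkAccumulator {
    body: String,
    tokens: u64,
}

impl ChunkAccumulator {
    /// Record one streamed chunk and the number of tokens it used.
    pub fn push(&mut self, text: &str, tokens: u64) {
        self.body.push_str(text);
        self.tokens += tokens;
    }

    /// Streaming path: called once after the final chunk.
    pub fn finish(self, state: &mut StateManager) {
        record_usage(state, self.tokens, response_hash(&self.body));
    }
}

/// Non-streaming path: the full body and token count are known up front.
pub fn record_full_response(state: &mut StateManager, body: &str, tokens: u64) {
    record_usage(state, tokens, response_hash(body));
}
```

Because both paths funnel into record_usage, streaming "Hel" and "lo" at one token each leaves the state with the same token total and the same hash as a non-streaming "Hello" response counted at two tokens.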
Additional Notes
STREAM_KEEP_ALIVE_INTERVAL_IN_SECONDS has been added to manage the frequency of keep-alive messages in streaming connections.
The Streamer module and related functions have been designed to ensure that token usage and response hashes are consistently updated, providing accurate tracking of inference costs and integrity for each request.
Testing