feat: add sse streaming support to chat completions endpoint #223

Merged: 14 commits into main, Nov 12, 2024

Conversation

jorgeantonio21
Contributor

jorgeantonio21 commented Nov 7, 2024

Add SSE Streaming Support to Chat Completions Endpoint

Summary

This pull request adds server-sent events (SSE) streaming to the /v1/chat/completions endpoint. The handler now has separate non-streaming and streaming paths, so client applications can receive completion chunks as they are generated instead of waiting for the full response. It also adds a new Streamer module that manages streaming responses, including signing response data and updating token usage in the Atoma state manager.
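
For readers unfamiliar with the SSE wire format, the sketch below shows how a single chunk and a keep-alive comment are framed as text. The function names `sse_data_event` and `sse_comment` are hypothetical and are not taken from this PR; in practice the web framework's SSE support would normally produce these frames.

```rust
/// Formats one payload as a server-sent event frame.
/// Each line of the payload gets its own `data:` field,
/// and an empty line terminates the event.
fn sse_data_event(payload: &str) -> String {
    let mut frame = String::new();
    for line in payload.split('\n') {
        frame.push_str("data: ");
        frame.push_str(line);
        frame.push('\n');
    }
    // The blank line tells the client that the event is complete.
    frame.push('\n');
    frame
}

/// Formats an SSE comment. Lines starting with `:` are ignored by
/// clients, which makes them a common choice for keep-alive messages.
fn sse_comment(text: &str) -> String {
    format!(": {text}\n\n")
}
```

A client reading the stream would see `sse_data_event("{\"delta\":\"Hi\"}")` as one event whose data is the JSON string, while `sse_comment("keep-alive")` produces no event at all and only keeps the connection active.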

Key Changes

  1. Modularization:

    • Added a new streamer module, which manages SSE streaming responses.
  2. Non-Streaming and Streaming Response Handlers:

    • Updated the chat_completions_handler to differentiate between streaming and non-streaming requests by checking the stream flag in the payload.
    • Implemented handle_non_streaming_response and handle_streaming_response functions to manage their respective response types:
      • Non-Streaming: Processes chat completions via a single API call, signing the response and updating token usage.
      • Streaming: Establishes an SSE connection to emit chunks in real-time, keeping the connection alive with periodic keep-alive messages.
  3. Streamer Struct for Real-Time Chat Completions:

    • The Streamer struct encapsulates the SSE streaming logic:
      • Accumulates response chunks for final processing.
      • Sends individual chunks as SSE events.
      • Signs the final chunk and updates the stack’s token usage and total hash in the state manager.
  4. State Manager Updates:

    • Added an update_state_manager function to utils, which abstracts token usage and hash updates so that streaming and non-streaming responses update state the same way.
  5. Improved Error Handling:

    • Each operation in Streamer and each state update is wrapped with error handling and logging, so failures during streaming, signing, or state management are captured.
  6. Documentation and Code Comments:

    • Added documentation comments for key functions, detailing parameters, return types, errors, and example structures for both non-streaming and streaming responses.
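
The flow described in the key changes above can be sketched roughly as follows. This is a simplified, std-only illustration and not the code in this PR: `StackState`, `Chunk`, `Event`, `hash_response`, `update_state`, and `wants_stream` are hypothetical names, the real Streamer is asynchronous, and a non-cryptographic hash stands in for the actual signing step. Treating a missing stream flag as non-streaming is also an assumption.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for one stack's entry in the state manager.
#[derive(Debug, Default)]
struct StackState {
    total_tokens: u64,
    total_hash: u64,
}

/// Placeholder for signing: a non-cryptographic hash of the response text.
fn hash_response(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    hasher.finish()
}

/// Plays the role of `update_state_manager`: one update path shared by
/// the streaming and non-streaming handlers.
fn update_state(state: &mut StackState, tokens: u64, response_hash: u64) {
    state.total_tokens += tokens;
    // Fold the new response hash into the running total hash.
    let mut hasher = DefaultHasher::new();
    state.total_hash.hash(&mut hasher);
    response_hash.hash(&mut hasher);
    state.total_hash = hasher.finish();
}

/// One piece of a streamed completion (hypothetical shape).
struct Chunk {
    text: String,
    tokens: u64,
    is_final: bool,
}

/// What gets emitted to the client for each chunk.
#[derive(Debug, PartialEq)]
struct Event {
    data: String,
    signature: Option<u64>,
}

#[derive(Default)]
struct Streamer {
    accumulated: String,
    tokens: u64,
}

impl Streamer {
    /// Accumulates the chunk, and on the final chunk "signs" the full
    /// response and records token usage before emitting the event.
    fn handle_chunk(&mut self, chunk: &Chunk, state: &mut StackState) -> Event {
        self.accumulated.push_str(&chunk.text);
        self.tokens += chunk.tokens;
        let signature = if chunk.is_final {
            let hash = hash_response(&self.accumulated);
            update_state(state, self.tokens, hash);
            Some(hash)
        } else {
            None
        };
        Event { data: chunk.text.clone(), signature }
    }
}

/// Assumption: a missing `stream` flag means a non-streaming request.
fn wants_stream(stream_flag: Option<bool>) -> bool {
    stream_flag.unwrap_or(false)
}
```

In this sketch only the final chunk touches the state, so a stream that fails partway leaves the recorded token usage unchanged. Whether the actual implementation behaves the same way on partial failures is not stated in the description.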

Additional Notes

  • A new constant STREAM_KEEP_ALIVE_INTERVAL_IN_SECONDS has been added to manage the frequency of keep-alive messages in streaming connections.
  • The new Streamer module and related functions update token usage and response hashes on every request, so inference costs and response integrity are tracked consistently.
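
As an illustration of how a keep-alive interval constant is typically used, the sketch below tracks when the last frame was written and reports when a keep-alive is due. The value of 15 seconds and the `KeepAlive` type are assumptions for the example; the description names the constant but does not state its value, and the real handler may delegate this timing to the framework.

```rust
use std::time::{Duration, Instant};

/// Assumed value for illustration only; the actual value is not given here.
const STREAM_KEEP_ALIVE_INTERVAL_IN_SECONDS: u64 = 15;

/// Hypothetical helper that decides when a keep-alive message is due.
struct KeepAlive {
    interval: Duration,
    last_sent: Instant,
}

impl KeepAlive {
    fn new(now: Instant) -> Self {
        Self {
            interval: Duration::from_secs(STREAM_KEEP_ALIVE_INTERVAL_IN_SECONDS),
            last_sent: now,
        }
    }

    /// Call whenever a chunk or a keep-alive is written to the connection.
    fn mark_sent(&mut self, now: Instant) {
        self.last_sent = now;
    }

    /// True once a full interval has passed without anything being sent.
    fn is_due(&self, now: Instant) -> bool {
        now.duration_since(self.last_sent) >= self.interval
    }
}
```

Passing the current time in as a parameter, rather than calling `Instant::now()` inside the helper, keeps the timing logic easy to test without sleeping.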

Testing

  • Basic tests have been added for non-streaming responses. Tests for SSE streaming are planned as follow-up work, with a focus on reliability in long-lived connections.

jorgeantonio21 requested a review from Cifko November 7, 2024 10:16
jorgeantonio21 self-assigned this Nov 7, 2024
jorgeantonio21 changed the title from "feat: add streaming to chat completions endpoint" to "feat: add sse streaming support to chat completions endpoint" Nov 7, 2024
jorgeantonio21 merged commit b9d9140 into main Nov 12, 2024
1 check passed