
ai-proxy buffers streamed responses #12680

Closed
1 task done
agarza22 opened this issue Mar 1, 2024 · 5 comments · Fixed by #12792

agarza22 commented Mar 1, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Kong version ($ kong version)

3.6

Current Behavior

When using the ai-proxy plugin, streamed responses are buffered by Kong before being returned to the client.

Expected Behavior

Streamed responses should not be buffered; each chunk should be forwarded to the client as it arrives.

Steps To Reproduce

With Kong 3.6 running and the ai-proxy plugin enabled, make a streaming API call (in my case, to Azure OpenAI).

The response chunks are buffered and then returned to the client all at once.

Anything else?

No response

chobits (Contributor) commented Mar 4, 2024

This plugin was introduced by #12323.

Hi @tysoekong, could you take a preliminary look at the code? From a quick read, only the non-streaming output seems to behave as expected; the streamed response content appears to be completely rebuilt (buffered and re-assembled) before being returned.

subnetmarco (Member) commented Mar 6, 2024

@agarza22 we are planning to introduce support for streaming in the next minor release of Kong Gateway (3.7).

agarza22 (Author) commented Mar 6, 2024

@subnetmarco That's great to hear! Do you have a ballpark on when we'll see that release? Does this also mean the ai-proxy plugin will get an update to support the streaming use case?

subnetmarco (Member) commented:

@agarza22 in May most likely.

ttyS0e (Contributor) commented Apr 3, 2024

@agarza22 @chobits Sorry for direct mention, but I see your interest in this feature.

We have added streaming support, which is currently in review, so the code may still change slightly.

You can package the streaming-enabled ai-proxy plugin into the existing Kong 3.6.1 image, using (for example) this builder:

# Build stage: fetch the feature branch containing the streaming-enabled plugin
FROM kong:3.6.1 AS builder

USER root
WORKDIR /builder

# Tools needed to clone the branch
RUN apt update && \
    apt install -y zip unzip git

RUN git clone -b 'feat/KAG-4126-ai-proxy-streaming' https://github.com/Kong/kong.git

#---#

# Final stage: overlay the updated plugin and LLM library onto the stock image
FROM kong:3.6.1

USER root

COPY --from=builder --chown=1001:1001 \
     /builder/kong/kong/plugins/ai-proxy \
     /usr/local/share/lua/5.1/kong/plugins/ai-proxy

COPY --from=builder --chown=1001:1001 \
     /builder/kong/kong/llm \
     /usr/local/share/lua/5.1/kong/llm

USER kong

Then you simply add "stream": true to the JSON in your request, and it should print tokens (SSE events) as they are transmitted.
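For example, a request body along these lines (a sketch only; the message content is a placeholder, assuming an OpenAI-compatible chat route through the gateway) should yield SSE events incrementally rather than one buffered response:

```json
{
  "messages": [
    { "role": "user", "content": "Hello" }
  ],
  "stream": true
}
```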

Hope this helps you start testing it out.
