ai-proxy buffers streamed responses #12680
Comments
This plugin was introduced by #12323. Hi @tysoekong, could you take a preliminary look at the code? The non-stream output seems to work as expected, but it looks like the response content is completely rebuilt in the code.
@agarza22 we are planning to introduce support for streaming in the next minor release of Kong Gateway (3.7).
@subnetmarco That's great to hear! Do you have a ballpark on when we'll see that release? Does this also mean the ai-proxy plugin will get an update to support the streaming use case?
@agarza22 in May, most likely.
@agarza22 @chobits Sorry for the direct mention, but I see your interest in this feature. We have added the streaming support, which is currently in review; the code is subject to change slightly. You can package the streaming-enabled ai-proxy plugin into the existing Kong 3.6.1 image using (for example) this builder:

```dockerfile
FROM kong:3.6.1 AS builder

USER root
WORKDIR /builder

RUN apt update && \
    apt install -y zip unzip git

RUN git clone -b 'feat/KAG-4126-ai-proxy-streaming' https://github.com/Kong/kong.git

#---#

FROM kong:3.6.1

USER root

COPY --from=builder --chown=1001:1001 \
    /builder/kong/kong/plugins/ai-proxy \
    /usr/local/share/lua/5.1/kong/plugins/ai-proxy

COPY --from=builder --chown=1001:1001 \
    /builder/kong/kong/llm \
    /usr/local/share/lua/5.1/kong/llm

USER kong
```

Then you simply add … Hope this helps you to start testing it out!
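(Not part of the original comment: a minimal sketch of how you might build and run that patched image. The image tag `kong-ai-streaming:3.6.1`, the port mapping, and the DB-less settings are assumptions; adjust them to your own deployment.)

```sh
# Build the patched image from the Dockerfile above
# (the tag "kong-ai-streaming:3.6.1" is just an illustrative name)
docker build -t kong-ai-streaming:3.6.1 .

# Run it in DB-less mode with your existing declarative config;
# mounting a local kong.yml is an assumption based on a typical setup
docker run --rm \
  -e KONG_DATABASE=off \
  -e KONG_DECLARATIVE_CONFIG=/kong/kong.yml \
  -v "$(pwd)/kong.yml:/kong/kong.yml" \
  -p 8000:8000 \
  kong-ai-streaming:3.6.1
```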
Is there an existing issue for this?

Kong version (`$ kong version`)
3.6
Current Behavior
When using the ai-proxy plugin, streamed responses are buffered by Kong before being returned to the client.
Expected Behavior
Streamed responses should not be buffered; they should be streamed back to the client as chunks arrive.
Steps To Reproduce
Run Kong 3.6 with the ai-proxy plugin enabled and make a streaming API call (in my case, to Azure OpenAI).
Response chunks are buffered and then returned to the client all at once.
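(Not from the original report: a hedged reproduction sketch. The route path `/openai/chat` is a hypothetical example; the request body follows the OpenAI chat-completions shape that ai-proxy proxies, with the model assumed to be set in the plugin config.)

```sh
# Streaming request through a Kong route with ai-proxy attached.
# curl -N disables curl's own output buffering, so any buffering
# you observe comes from the proxy, not the client.
curl -N http://localhost:8000/openai/chat \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Count to 20 slowly"}],
        "stream": true
      }'
# Expected: SSE chunks appear incrementally.
# Observed on Kong 3.6: the whole body arrives at once.
```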
Anything else?
No response