feat(ai-proxy): add streaming support and transformers #12792

Merged: flrgh merged 17 commits into master from feat/KAG-4126-ai-proxy-streaming on Apr 12, 2024

tysoekong
Contributor

@tysoekong tysoekong commented Mar 27, 2024

Summary

Adds "streaming" support to AI Proxy plugin.

Streaming is a mode where a client can specify "stream": true in its request, and the LLM server then streams each piece of the response text (usually token by token) as a server-sent event.

We need to capture each event (or batch of events) in order to translate it back into our inference format, so that all providers are compatible with the same framework that our users will build on their side.
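As a rough illustration of the capture step, here is a minimal Python sketch of buffering a server-sent event stream whose events may be split across network chunks. It is not the code from this PR (which lives in kong/llm/init.lua); the `SSEBuffer` class and its `feed` method are hypothetical names used only for this example.

```python
from typing import List


def _strip_one_space(value: str) -> str:
    # The SSE format drops a single leading space after the "data:" prefix.
    return value[1:] if value.startswith(" ") else value


class SSEBuffer:
    """Hypothetical helper: accumulates text chunks, returns completed events."""

    def __init__(self) -> None:
        self._pending = ""

    def feed(self, chunk: str) -> List[str]:
        """Add a chunk and return the data payload of every completed event."""
        self._pending = (self._pending + chunk).replace("\r\n", "\n")
        # Events end with a blank line; the final piece may still be incomplete,
        # so it is kept in the buffer until more data arrives.
        *complete, self._pending = self._pending.split("\n\n")
        payloads = []
        for raw_event in complete:
            data_lines = [
                _strip_one_space(line[5:])
                for line in raw_event.split("\n")
                if line.startswith("data:")
            ]
            if data_lines:
                payloads.append("\n".join(data_lines))
        return payloads
```

Feeding the buffer partial chunks returns nothing until the blank line that terminates an event has arrived, which is why a batch of several events can come out of a single `feed` call.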

Where "streaming=false" requests proxy directly to the LLM, and look like this:

image

the new streaming framework captures each event, and sends the chunk back to the client, like this:

image

and then it exits early. Docs will describe the limitations of this (no response transformer, etc).
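The capture, translate and forward loop described above could be sketched roughly as follows. This is an assumption-laden illustration, not the plugin's implementation: the provider names `provider_a` and `provider_b`, their payload shapes, the common `{"delta": ...}` frame and the `[DONE]` end marker are all hypothetical choices made for the example.

```python
import json
from typing import Dict, Iterable, Iterator, Optional


def extract_text(provider: str, payload: Dict) -> Optional[str]:
    """Pull the text fragment out of a provider-specific event (made-up shapes)."""
    if provider == "provider_a":
        return payload.get("delta", {}).get("text")
    if provider == "provider_b":
        return payload.get("completion")
    raise ValueError(f"unknown provider: {provider}")


def translate_stream(provider: str, payloads: Iterable[str]) -> Iterator[str]:
    """Yield one common-format SSE frame per upstream event, then stop."""
    for raw in payloads:
        if raw == "[DONE]":  # assumed end-of-stream marker
            break
        text = extract_text(provider, json.loads(raw))
        if text is None:
            continue  # events without text (metadata, keep-alives) are skipped
        yield f"data: {json.dumps({'delta': text})}\n\n"
    # Close the client stream and exit early; nothing runs after this point.
    yield "data: [DONE]\n\n"
```

Because the generator finishes as soon as the stream ends, there is no complete response body left for later processing, which matches the limitation mentioned above (no response transformer).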

It will also count/estimate tokens for LLM services that have decided to not stream back the token utilisation counts when the message has completed...
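The PR description does not say how the estimate is computed. The sketch below shows one possible fallback, assuming a crude rule of thumb of about four characters per token; the constant and both function names are hypothetical, and a real tokenizer would give different numbers.

```python
import math
from typing import Iterable, Optional

CHARS_PER_TOKEN = 4  # assumed rule of thumb, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate based purely on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN) if text else 0


def completion_tokens(chunks: Iterable[str], reported: Optional[int] = None) -> int:
    """Prefer the provider's reported count; otherwise estimate from the text."""
    if reported is not None:
        return reported
    return estimate_tokens("".join(chunks))
```

The point of the design is the precedence: a count reported by the LLM service always wins, and the estimate is used only when the stream ends without one.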

Checklist

  • The Pull Request has tests
  • A changelog file has been created under changelog/unreleased/kong or skip-changelog label added on PR if changelog is unnecessary. README.md
  • There is a user-facing docs PR against https://github.com/Kong/docs.konghq.com - PUT DOCS PR HERE

Can this get reviewed for code standard and design, whilst I'm writing tests and docs?

Issue reference

Fix #12680

https://konghq.atlassian.net/browse/KAG-4124

@tysoekong
Contributor Author

@RobSerafini @flrgh @locao It is done. This is (I really think...) the largest PR in AI plugins phase 2.

Is it okay for someone to do the quality/standards pass, whilst I'm writing the docs and the tests? Especially if big changes are suggested.

We can then meet in the middle.

@tysoekong tysoekong force-pushed the feat/KAG-4126-ai-proxy-streaming branch 3 times, most recently from c3f0936 to ee70e4b Compare March 27, 2024 12:15
@ttyS0e
Contributor

ttyS0e commented Mar 27, 2024

I have no idea why it thinks changelog isn't done.

@tysoekong tysoekong force-pushed the feat/KAG-4126-ai-proxy-streaming branch 2 times, most recently from b8110f1 to 1c4139c Compare March 27, 2024 12:38
@tysoekong tysoekong requested a review from locao March 27, 2024 17:39
@locao
Contributor

locao commented Mar 27, 2024

Hey @ttyS0e

> I have no idea why it thinks changelog isn't done.

That's because you included:

jiras:
- "KAG-4126"

You can check the required format in this doc: https://github.com/Kong/gateway-changelog/blob/v1.0.0/README.md

@locao locao requested review from flrgh and zhongweiy March 27, 2024 22:07
@tysoekong tysoekong force-pushed the feat/KAG-4126-ai-proxy-streaming branch from 1c4139c to c01b0ab Compare March 27, 2024 22:09
@tysoekong
Contributor Author

@locao Yep I double-broke it myself, trying to figure out what it was. I fixed it now.

@tysoekong tysoekong force-pushed the feat/KAG-4126-ai-proxy-streaming branch from c01b0ab to db197ab Compare April 1, 2024 03:02
@tysoekong tysoekong force-pushed the feat/KAG-4126-ai-proxy-streaming branch from db197ab to f0ca2cb Compare April 1, 2024 03:09
@RobSerafini
Contributor

@locao @kikito - can you help find reviewers for this PR?

@tysoekong
Contributor Author

I have absolutely NO IDEA where these extra 8 commits got picked up

@tysoekong tysoekong force-pushed the feat/KAG-4126-ai-proxy-streaming branch from e5c70eb to 19033b4 Compare April 5, 2024 17:08
Review thread on kong/llm/init.lua (outdated, resolved)
@tysoekong
Contributor Author

@flrgh Fixed all comments

@tysoekong tysoekong force-pushed the feat/KAG-4126-ai-proxy-streaming branch from 624cdde to 48fff98 Compare April 11, 2024 16:53
@tysoekong
Contributor Author

okay @flrgh NOW I think it's all done

Contributor

@flrgh flrgh left a comment

There are more optimizations that could be made to the SSE loop in kong/llm/init.lua, but I'd rather not get into the weeds until there's good reason to.

This is looking ready to me. 👍

@flrgh flrgh merged commit cb1b163 into master Apr 12, 2024
25 checks passed
@flrgh flrgh deleted the feat/KAG-4126-ai-proxy-streaming branch April 12, 2024 02:04
@team-gateway-bot
Collaborator

Successfully created cherry-pick PR for master:

Development

Successfully merging this pull request may close these issues.

ai-proxy buffers streamed responses
7 participants