feat(ai-proxy): add streaming support and transformers #12792
Conversation
@RobSerafini @flrgh @locao It is done. This is (I really think...) the largest PR in AI plugins phase 2. Is it okay for someone to do the quality/standards pass whilst I'm writing the docs and the tests, especially if big changes are suggested? We can then meet in the middle.
Force-pushed from c3f0936 to ee70e4b
I have no idea why it thinks the changelog isn't done.
Force-pushed from b8110f1 to 1c4139c
Hey @ttyS0e
That's because you included:
You can check the required format in this doc: https://github.com/Kong/gateway-changelog/blob/v1.0.0/README.md
Force-pushed from 1c4139c to c01b0ab
@locao Yep, I double-broke it myself, trying to figure out what it was. I fixed it now.
Force-pushed from c01b0ab to db197ab
Force-pushed from db197ab to f0ca2cb
spec/fixtures/ai-proxy/unit/real-stream-frames/openai/llm-v1-completions.txt
I have absolutely NO IDEA where these extra 8 commits got picked up.
Force-pushed from e5c70eb to 19033b4
@flrgh Fixed all comments.
Co-authored-by: Michael Martin <[email protected]>
Force-pushed from 624cdde to 48fff98
okay @flrgh NOW I think it's all done.
There are more optimizations that could be made to the SSE loop in kong/llm/init.lua, but I'd rather not get into the weeds until there's good reason to.
This is looking ready to me. 👍
Successfully created cherry-pick PR for |
Summary
Adds "streaming" support to AI Proxy plugin.
Streaming is a mode where a client can specify "stream": true in their request, and the LLM server will stream each piece of the response text (usually token-by-token) as server-sent events. We need to capture each (batch of) event(s) in order to translate them back into our inference format, so that all providers are compatible with the same framework that our users will create on their side.
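As a rough illustration (not taken from this PR), a chat-style request that opts into streaming only differs from a normal one by the "stream" flag:

```json
{
  "messages": [
    { "role": "user", "content": "Tell me a short story." }
  ],
  "stream": true
}
```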
Where "streaming=false" requests proxy directly to the LLM, and look like this:
the new streaming framework captures each event and sends each chunk back to the client as it arrives, like this:
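Again as an illustration rather than the PR's own capture, the streamed variant is a sequence of server-sent events, each carrying a small delta and terminated by a [DONE] sentinel:

```
data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"Once"}}]}

data: {"choices":[{"index":0,"delta":{"content":" upon a time..."}}]}

data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```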
and then it exits early. The docs will describe the limitations of this (no response transformer, etc.).
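A minimal sketch of the capture step described above, assuming OpenAI-style "data:" frames; this is illustrative only and is not the actual code in kong/llm/init.lua:

```lua
-- Illustrative only: split a buffered network chunk into SSE events and
-- pull out each "data:" payload so it can be translated into the common
-- inference format. Not the plugin's real implementation.
local cjson = require "cjson.safe"

local function extract_deltas(buffer)
  local deltas = {}
  for frame in buffer:gmatch("(.-)\n\n") do
    local data = frame:match("^data:%s*(.*)")
    if data and data ~= "[DONE]" then
      local event = cjson.decode(data)
      if event and event.choices and event.choices[1].delta then
        deltas[#deltas + 1] = event.choices[1].delta.content or ""
      end
    end
  end
  return deltas
end

-- Example: two frames arriving in one network chunk
local chunk = 'data: {"choices":[{"delta":{"content":"Hel"}}]}\n\n'
           .. 'data: {"choices":[{"delta":{"content":"lo"}}]}\n\n'
print(table.concat(extract_deltas(chunk)))  -- prints "Hello"
```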
It will also count/estimate tokens for LLM services that have decided not to stream back the token utilisation counts when the message has completed.
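As a sketch of the kind of estimate that makes this possible (an assumed heuristic, not the plugin's actual accounting code), one can approximate roughly four characters per token for English text:

```lua
-- Assumed heuristic, for illustration only: estimate token usage for a
-- provider that never sends a "usage" object at the end of the stream.
local function estimate_tokens(text)
  if not text or #text == 0 then
    return 0
  end
  -- ~4 characters per token is a common rule of thumb for English text
  return math.ceil(#text / 4)
end

-- Accumulate the streamed deltas, then estimate once the stream ends.
local streamed_parts = { "Hello", ", how can I help you today?" }
local full_response = table.concat(streamed_parts)
print(estimate_tokens(full_response))  -- 32 characters -> estimate of 8 tokens
```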
Checklist
changelog/unreleased/kong, or skip-changelog label added on PR if changelog is unnecessary.
README.md

Can this get reviewed for code standards and design, whilst I'm writing tests and docs?
Issue reference
Fix #12680
https://konghq.atlassian.net/browse/KAG-4124