[inference] Add non-stream versions of the chatComplete and output APIs. #198644
Labels: Team:AI Infra (AppEx AI Infrastructure Team)
Comments
Pinging @elastic/appex-ai-infra (Team:AI Infra)
pgayvallet added a commit to pgayvallet/kibana that referenced this issue on Nov 6, 2024:
## Summary

Fix elastic#198644

Add a `stream` parameter to the `chatComplete` and `output` APIs, defaulting to `false`, to switch between "full content response as promise" and "event observable" responses.

Note: at the moment, in non-stream mode, the implementation is simply constructing the response from the observable. It should be possible later to improve this by having the LLM adapters handle the stream/no-stream logic, but this is out of scope of the current PR.

### Normal mode

```ts
const response = await chatComplete({
  connectorId: 'my-connector',
  system: "You are a helpful assistant",
  messages: [{ role: MessageRole.User, content: "Some question?" }],
});

const { content, toolCalls } = response; // do something
```

### Stream mode

```ts
const events$ = chatComplete({
  stream: true,
  connectorId: 'my-connector',
  system: "You are a helpful assistant",
  messages: [{ role: MessageRole.User, content: "Some question?" }],
});

events$.subscribe((event) => {
  // do something
});
```

---------

Co-authored-by: kibanamachine <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>

(cherry picked from commit fe16822)
mgadewoll pushed a commit to mgadewoll/kibana that referenced this issue on Nov 7, 2024 (same commit message as above, without the cherry-pick note).
At the moment, the `chatComplete` and `output` APIs always return an observable, to support streaming of the LLM response. While that's definitely useful in some scenarios (especially assistant-related calls), in most "task execution" scenarios we only really need the final, full response from the LLM, and the observable-based API can be bothersome, as every call needs to be wrapped in the appropriate observable chaining just to retrieve the data of the last event.
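To illustrate the burden, extracting just the final content from the streaming API ends up looking something like the sketch below. This is a minimal illustration only: the `'message'` event type and its `content` field are assumptions, not the plugin's actual event contract.

```ts
import { filter, map, lastValueFrom } from 'rxjs';

// Today's shape: chatComplete returns an observable of events.
const events$ = chatComplete({
  connectorId: 'my-connector',
  messages: [{ role: MessageRole.User, content: "Some question?" }],
});

// Boilerplate needed just to get the final response out of the stream.
const content = await lastValueFrom(
  events$.pipe(
    // 'message' is a placeholder for whatever event carries the full response.
    filter((event) => event.type === 'message'),
    map((event) => event.content)
  )
);
```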
We should have a way to call those APIs in "non-stream" mode, so that they return a promise of the complete response instead of an observable. One possible option would be to add a `stream` parameter, which would switch the shape of the response.
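One way such a parameter could switch the response shape is with TypeScript overloads, so the return type follows the `stream` flag at compile time. A rough sketch, where all type names are hypothetical placeholders rather than the plugin's real types:

```ts
import type { Observable } from 'rxjs';

// Hypothetical placeholder types for illustration.
interface ChatCompleteResponse {
  content: string;
  toolCalls: unknown[];
}
type ChatCompletionEvent = { type: string };
interface ChatCompleteOptions {
  connectorId: string;
  system?: string;
  messages: Array<{ role: string; content: string }>;
}

// Overloads: the `stream` flag selects the return shape at the type level.
declare function chatComplete(
  options: ChatCompleteOptions & { stream: true }
): Observable<ChatCompletionEvent>;
declare function chatComplete(
  options: ChatCompleteOptions & { stream?: false }
): Promise<ChatCompleteResponse>;
```

With this shape, callers that never opt into streaming get a plain promise and can simply `await` the call, while streaming consumers keep the existing observable behavior by passing `stream: true`.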