[inference] Add non-stream versions of the chatComplete and output APIs. #198644

Closed
pgayvallet opened this issue Nov 1, 2024 · 1 comment · Fixed by #198646
Labels
Team:AI Infra AppEx AI Infrastructure Team

Comments

@pgayvallet
Contributor

At the moment, the chatComplete and output APIs always return an observable, in order to support LLM response streaming.

While that is definitely useful in some scenarios (especially assistant-related calls), in most "task execution" scenarios we only really need the final, full response from the LLM, and the observable-based API can be bothersome, as every call needs to be wrapped in the appropriate observable chaining just to retrieve the data of the last event.
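To illustrate the friction, here is a rough sketch (assuming the RxJS observable currently returned by the API; the connector id and message are placeholders) of what a task-execution caller has to do just to obtain the final result:

```ts
import { lastValueFrom } from 'rxjs';

// Current observable-based usage (sketch): the caller has to wait for the
// stream to complete and pick the last emitted event to get the full answer.
const events$ = chatComplete({
  connectorId: 'my-connector',
  messages: [{ role: MessageRole.User, content: 'Some question?' }],
});

// The last event is the one carrying the complete response.
const finalEvent = await lastValueFrom(events$);
```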

We should have a way to call those APIs in "non-stream" mode, so that they return a promise of the complete response instead of an observable. One possible option would be to add a stream parameter which switches the shape of the response.

pgayvallet added the llm-task-framework and Team:AI Infra (AppEx AI Infrastructure Team) labels on Nov 1, 2024
@elasticmachine
Contributor

Pinging @elastic/appex-ai-infra (Team:AI Infra)

pgayvallet added a commit to pgayvallet/kibana that referenced this issue Nov 6, 2024
## Summary

Fix elastic#198644

Add a `stream` parameter to the `chatComplete` and `output` APIs,
defaulting to `false`, to switch between "full content response as
promise" and "event observable" responses.

Note: at the moment, in non-stream mode, the implementation is simply
constructing the response from the observable. It should be possible
later to improve this by having the LLM adapters handle the
stream/no-stream logic, but this is out of scope of the current PR.
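Purely as an illustration of that approach (this is not the actual plugin code; the event and response shapes below are hypothetical), the non-stream path could be built along these lines:

```ts
import { Observable, lastValueFrom, scan } from 'rxjs';

// Hypothetical event and response shapes, for illustration only.
interface ChunkEvent {
  type: 'chunk';
  content: string;
}
interface CompletionMessageEvent {
  type: 'message';
  content: string;
  toolCalls: unknown[];
}
type ChatCompletionEvent = ChunkEvent | CompletionMessageEvent;

interface ChatCompleteResponse {
  content: string;
  toolCalls: unknown[];
}

// One way to turn the event observable into a promise of the full response:
// accumulate chunk content as it streams in, then take whatever the final
// event carries once the observable completes.
const toResponsePromise = (
  events$: Observable<ChatCompletionEvent>
): Promise<ChatCompleteResponse> =>
  lastValueFrom(
    events$.pipe(
      scan(
        (acc: ChatCompleteResponse, event: ChatCompletionEvent): ChatCompleteResponse =>
          event.type === 'chunk'
            ? { ...acc, content: acc.content + event.content }
            : { content: event.content, toolCalls: event.toolCalls },
        { content: '', toolCalls: [] }
      )
    )
  );
```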

### Normal mode
```ts
const response = await chatComplete({
  connectorId: 'my-connector',
  system: "You are a helpful assistant",
  messages: [
    { role: MessageRole.User, content: "Some question?" },
  ]
});

const { content, toolCalls } = response;
// do something
```

### Stream mode
```ts
const events$ = chatComplete({
  stream: true,
  connectorId: 'my-connector',
  system: "You are a helpful assistant",
  messages: [
    { role: MessageRole.User, content: "Some question?" },
  ]
});

events$.subscribe((event) => {
  // do something
});
```

---------

Co-authored-by: kibanamachine <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>
(cherry picked from commit fe16822)
mgadewoll pushed a commit to mgadewoll/kibana that referenced this issue Nov 7, 2024