Commit 5c8a45d

initial draft for AI proxy advanced

lena-larionova committed Aug 29, 2024
1 parent 240b554 commit 5c8a45d

Showing 8 changed files with 242 additions and 0 deletions.
6 changes: 6 additions & 0 deletions app/_hub/kong-inc/ai-proxy-advanced/_changelog.md
@@ -0,0 +1,6 @@
## Changelog

### {{site.base_gateway}} 3.8.x

* Introduced the AI Proxy Advanced plugin, which can mediate request and response formats, as well as authentication, between users and LLM services.
This plugin supports load balancing and semantic routing, providing advanced routing capabilities over the regular AI Proxy plugin.
20 changes: 20 additions & 0 deletions app/_hub/kong-inc/ai-proxy-advanced/_metadata/_index.yml
@@ -0,0 +1,20 @@
name: AI Proxy Advanced
search_aliases:
- ai
- llm
- artificial
- intelligence
- language
- model
dbless_compatible: yes
free: false
enterprise: true
konnect: true
network_config_opts: All
notes: --
categories:
- ai
weight: 100
publisher: Kong Inc.
desc: Route across different LLMs and models using advanced load balancing algorithms, including semantic routing
type: plugin
5 changes: 5 additions & 0 deletions app/_hub/kong-inc/ai-proxy-advanced/how-to/_load-balancing.md
@@ -0,0 +1,5 @@
---
nav_title: Load Balancing
title: Load Balance between LLM targets
---

5 changes: 5 additions & 0 deletions app/_hub/kong-inc/ai-proxy-advanced/how-to/_semantic-routing.md
@@ -0,0 +1,5 @@
---
nav_title: Semantic Routing
title: Semantic Routing
---

199 changes: 199 additions & 0 deletions app/_hub/kong-inc/ai-proxy-advanced/overview/_index.md
@@ -0,0 +1,199 @@
---
nav_title: Overview
---

The AI Proxy Advanced plugin lets you transform and proxy requests to multiple AI providers and models at the same time.
You can use it to set up load balancing between targets.

The plugin accepts requests in one of a few defined and standardized formats, translates them to the configured target format, and then transforms the response back into a standard format.

The following table describes which providers and requests the AI Proxy Advanced plugin supports:

| Provider | Chat | Completion | Streaming |
| -------- | ---- | ---------- | --------- |
| OpenAI (GPT-4, GPT-3.5) | ✅ | ✅ | ✅ |
| OpenAI (GPT-4o and Multi-Modal) | ✅ | ✅ | ✅ |
| Cohere | ✅ | ✅ | ✅ |
| Azure | ✅ | ✅ | ✅ |
| Anthropic | ✅ | ✅ | Only chat type |
| Mistral (mistral.ai, OpenAI, raw, and OLLAMA formats) | ✅ | ✅ | ✅ |
| Llama2 (raw, OLLAMA, and OpenAI formats) | ✅ | ✅ | ✅ |
| Llama3 (OLLAMA and OpenAI formats) | ✅ | ✅ | ✅ |

## How it works

The AI Proxy Advanced plugin will mediate the following for you:

* Request and response formats appropriate for the configured `provider` and `route_type`
* The following service request coordinates (unless the model is self-hosted):
* Protocol
* Host name
* Port
* Path
* HTTP method
* Authentication on behalf of the Kong API consumer
* Decorating the request with parameters from the `config.options` block, appropriate for the chosen provider
* Recording of usage statistics of the configured LLM provider and model into your selected [Kong log](/hub/?category=logging) plugin output
* Optionally, recording all post-transformation request and response messages from users, to and from the configured LLM
* Fulfillment of requests to self-hosted models, based on select supported format transformations

Flattening all of the provider formats allows you to standardize the manipulation of the data before and after transmission. It also allows you to provide Kong consumers with a choice of LLMs, using consistent request and response formats, regardless of the backend provider or model.

This plugin currently only supports REST-based full text responses.
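
For illustration, here's a minimal declarative configuration sketch that proxies chat requests to OpenAI. The target layout below is an assumption based on the AI Proxy family of plugins; see the [configuration reference](/hub/kong-inc/ai-proxy-advanced/configuration/) for the authoritative schema, and treat the service name and API key as placeholders:

```yaml
# Sketch only: attaches the plugin to a service in declarative config.
plugins:
- name: ai-proxy-advanced
  service: my-llm-service            # hypothetical service name
  config:
    targets:
    - route_type: llm/v1/chat        # standard chat format shown below
      auth:
        header_name: Authorization
        header_value: Bearer <OPENAI_API_KEY>   # placeholder
      model:
        provider: openai
        name: gpt-4o
```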

## Load balancing

This plugin supports the following load balancing algorithms:

* `lowest-usage`
* `round-robin` (weighted)
* `consistent-hashing` (sticky sessions based on a given header value)
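
As a sketch, a weighted round-robin balancer splitting chat traffic between two providers might look like the following (the `balancer` and `weight` fields are assumptions based on the plugin's target schema; API keys are placeholders):

```yaml
# Sketch only: ~80/20 weighted round-robin across two chat models.
plugins:
- name: ai-proxy-advanced
  config:
    balancer:
      algorithm: round-robin         # or lowest-usage, consistent-hashing
    targets:
    - route_type: llm/v1/chat
      weight: 80                     # receives roughly 80% of requests
      auth:
        header_name: Authorization
        header_value: Bearer <OPENAI_API_KEY>    # placeholder
      model:
        provider: openai
        name: gpt-4o
    - route_type: llm/v1/chat
      weight: 20                     # receives roughly 20% of requests
      auth:
        header_name: Authorization
        header_value: Bearer <MISTRAL_API_KEY>   # placeholder
      model:
        provider: mistral
        name: mistral-tiny
        options:
          upstream_url: https://api.mistral.ai/v1/chat/completions
```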

## Semantic routing

Semantic routing distributes requests based on the similarity between the prompt and the description of each configured model. This lets you automatically direct traffic to the model best suited to the subject of the request. When the plugin receives a request, it generates embeddings for the prompt, compares them against embeddings of each model's description stored in a vector database, and routes the request to the closest-matching model.
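
As a sketch, semantic routing pairs an embeddings model and a vector database with a description on each target. The `embeddings`, `vectordb`, and `description` fields shown here are assumptions; consult the configuration reference for the final schema:

```yaml
# Sketch only: route math questions and code questions to different models.
plugins:
- name: ai-proxy-advanced
  config:
    balancer:
      algorithm: semantic            # assumed name for the semantic algorithm
    embeddings:
      model:
        provider: openai
        name: text-embedding-3-small
    vectordb:
      strategy: redis
      dimensions: 1536               # must match the embeddings model
      distance_metric: cosine
      threshold: 0.7
      redis:
        host: redis.example.com      # placeholder
        port: 6379
    targets:
    - description: "Mathematics, algebra, and calculus questions"
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer <OPENAI_API_KEY>    # placeholder
      model:
        provider: openai
        name: gpt-4o
    - description: "Programming and code generation questions"
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer <MISTRAL_API_KEY>   # placeholder
      model:
        provider: mistral
        name: codestral-latest
        options:
          upstream_url: https://api.mistral.ai/v1/chat/completions
```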

## Request and response formats

The plugin's [`config.route_type`](/hub/kong-inc/ai-proxy/configuration/#config-route_type) should be set based on the target upstream endpoint and model, according to this capability matrix:

| Provider Name | Provider Upstream Path | Kong `route_type` | Example Model Name |
|---------------|----------------------------------------------------------|----------------------|------------------------|
| OpenAI | `/v1/chat/completions` | `llm/v1/chat` | gpt-4 |
| OpenAI | `/v1/completions` | `llm/v1/completions` | gpt-3.5-turbo-instruct |
| Cohere | `/v1/chat` | `llm/v1/chat` | command |
| Cohere | `/v1/generate` | `llm/v1/completions` | command |
| Azure | `/openai/deployments/{deployment_name}/chat/completions` | `llm/v1/chat` | gpt-4 |
| Azure         | `/openai/deployments/{deployment_name}/completions`      | `llm/v1/completions` | gpt-3.5-turbo-instruct |
{% if_version gte:3.7.x %}
| Anthropic     | `/v1/messages`                                            | `llm/v1/chat`        | claude-2.1             |
{% endif_version %}
{% if_version lte:3.6.x %}
| Anthropic     | `/v1/complete`                                            | `llm/v1/chat`        | claude-2.1             |
{% endif_version %}
| Anthropic | `/v1/complete` | `llm/v1/completions` | claude-2.1 |
| Llama2 | User-defined | `llm/v1/chat` | User-defined |
| Llama2 | User-defined | `llm/v1/completions` | User-defined |
| Mistral | User-defined | `llm/v1/chat` | User-defined |
| Mistral | User-defined | `llm/v1/completions` | User-defined |

The following upstream URL patterns are used:

| Provider | URL |
|-----------|--------------------------------------------------------------------------------------------------------|
| OpenAI | `https://api.openai.com:443/{route_type_path}` |
| Cohere | `https://api.cohere.com:443/{route_type_path}` |
| Azure | `https://{azure_instance}.openai.azure.com:443/openai/deployments/{deployment_name}/{route_type_path}` |
| Anthropic | `https://api.anthropic.com:443/{route_type_path}` |
| Llama2 | As defined in `config.model.options.upstream_url` |
| Mistral | As defined in `config.model.options.upstream_url` |


{:.important}
> While only the **Llama2** and **Mistral** models are classed as self-hosted, the target URL can be overridden for any of the supported providers.
> For example, a self-hosted or otherwise OpenAI-compatible endpoint can be called by setting the same [`config.model.options.upstream_url`](/hub/kong-inc/ai-proxy/configuration/#config-model-options-upstream_url) plugin option.
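
For instance, a sketch pointing the OpenAI provider at a self-hosted, OpenAI-compatible server (host and model name are placeholders):

```yaml
# Sketch only: override the upstream URL for an OpenAI-compatible endpoint.
plugins:
- name: ai-proxy-advanced
  config:
    targets:
    - route_type: llm/v1/chat
      model:
        provider: openai             # request/response handled in OpenAI format
        name: my-self-hosted-model   # placeholder
        options:
          upstream_url: http://llm.internal:8080/v1/chat/completions
```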

### Input formats

Kong will mediate the request and response format based on the selected [`config.provider`](/hub/kong-inc/ai-proxy/configuration/#config-provider) and [`config.route_type`](/hub/kong-inc/ai-proxy/configuration/#config-route_type), as outlined in the table above.

The Kong AI Proxy accepts the following input formats, standardized across all providers. The `config.route_type` must be configured to match the required request and response format, as shown in the examples below:

{% navtabs %}
{% navtab llm/v1/chat %}
```json
{
    "messages": [
        {
            "role": "system",
            "content": "You are a scientist."
        },
        {
            "role": "user",
            "content": "What is the theory of relativity?"
        }
    ]
}
```
{% endnavtab %}

{% navtab llm/v1/completions %}
```json
{
    "prompt": "You are a scientist. What is the theory of relativity?"
}
```
{% endnavtab %}
{% endnavtabs %}

### Response formats

Conversely, the response formats are also transformed to a standard format across all providers:

{% navtabs %}
{% navtab llm/v1/chat %}
```json
{
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "The theory of relativity is a...",
                "role": "assistant"
            }
        }
    ],
    "created": 1707769597,
    "id": "chatcmpl-ID",
    "model": "gpt-4-0613",
    "object": "chat.completion",
    "usage": {
        "completion_tokens": 5,
        "prompt_tokens": 26,
        "total_tokens": 31
    }
}
```
{% endnavtab %}

{% navtab llm/v1/completions %}

```json
{
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "text": "The theory of relativity is a..."
        }
    ],
    "created": 1707769597,
    "id": "cmpl-ID",
    "model": "gpt-3.5-turbo-instruct",
    "object": "text_completion",
    "usage": {
        "completion_tokens": 10,
        "prompt_tokens": 7,
        "total_tokens": 17
    }
}
```
{% endnavtab %}
{% endnavtabs %}

The request and response formats are loosely based on the OpenAI format.
See the [sample OpenAPI specification](https://github.com/kong/kong/blob/master/spec/fixtures/ai-proxy/oas.yaml) for more detail on the supported formats.

## Get started with the AI Proxy Advanced plugin

* [Configuration reference](/hub/kong-inc/ai-proxy-advanced/configuration/)
* [Basic configuration example](/hub/kong-inc/ai-proxy-advanced/how-to/basic-example/)
* Learn how to use the plugin with different providers:
* [LLM provider integration guides](/hub/kong-inc/ai-proxy/how-to/llm-provider-integration-guides/)

### All AI Gateway plugins

{% include_cached /md/ai-plugins-links.md release=page.release %}

3 changes: 3 additions & 0 deletions app/_hub/kong-inc/ai-proxy-advanced/versions.yml
@@ -0,0 +1,3 @@
strategy: gateway
releases:
minimum_version: '3.8.x'
4 changes: 4 additions & 0 deletions app/_src/gateway/ai-gateway/index.md
@@ -72,8 +72,12 @@ a provider-agnostic API. This normalized API layer affords developers and organi
* Request routing can be dynamic, allowing AI usage to be optimized based on various metrics: cost, usage, response accuracy, and so on.
* AI services can be used by other {{site.base_gateway}} plugins to augment non-AI API traffic

{% if_version lte:3.7.x %}
This core AI Gateway feature is enabled with the [AI Proxy](/hub/kong-inc/ai-proxy/) plugin, which is
deployed by default in the getting started script referenced above.
{% endif_version %}
{% if_version gte:3.8.x %}
This core AI Gateway feature is enabled with the [AI Proxy](/hub/kong-inc/ai-proxy/) and [AI Proxy Advanced](/hub/kong-inc/ai-proxy-advanced/) plugins. The quickstart script referenced above uses the basic AI Proxy plugin. For load balancing and semantic routing capabilities, check out the AI Proxy Advanced plugin instead.
{% endif_version %}

The AI Proxy supports two types of LLM requests:

