
Update: AI proxy advanced load balancing #7971

Merged 6 commits on Oct 17, 2024 (diff shows changes from 2 commits)
4 changes: 4 additions & 0 deletions app/_data/docs_nav_gateway_3.8.x.yml
@@ -408,6 +408,10 @@ items:
- text: Expose and graph AI Metrics
url: /ai-gateway/metrics/
generate: false
- text: AI Gateway Load Balancing
url: /hub/kong-inc/ai-proxy-advanced/#load-balancing
generate: false
absolute_url: true
- text: AI Gateway plugins
url: /hub/?category=ai
generate: false
14 changes: 14 additions & 0 deletions app/_data/docs_nav_gateway_3.9.x.yml
@@ -392,12 +392,26 @@ items:
url: /hub/kong-inc/ai-proxy/how-to/llm-provider-integration-guides/llama2/
generate: false
absolute_url: true
- text: AI Platform Integration Guides
items:
- text: Gemini
url: /hub/kong-inc/ai-proxy/how-to/machine-learning-platform-integration-guides/gemini/
generate: false
absolute_url: true
- text: Amazon Bedrock
url: /hub/kong-inc/ai-proxy/how-to/machine-learning-platform-integration-guides/bedrock/
generate: false
absolute_url: true
- text: AI Gateway Analytics
url: /ai-gateway/ai-analytics/
generate: false
- text: Expose and graph AI Metrics
url: /ai-gateway/metrics/
generate: false
- text: AI Gateway Load Balancing
url: /hub/kong-inc/ai-proxy-advanced/#load-balancing
generate: false
absolute_url: true
- text: AI Gateway plugins
url: /hub/?category=ai
generate: false
9 changes: 5 additions & 4 deletions app/_hub/kong-inc/ai-proxy-advanced/overview/_index.md
@@ -47,16 +47,17 @@ This plugin currently only supports REST-based full text responses.

This plugin supports several load-balancing algorithms, similar to those used for Kong upstreams, allowing efficient distribution of requests across different AI models. The supported algorithms include:
* **Lowest-usage**: The lowest-usage algorithm in AI Proxy Advanced is based on the volume of usage for each model. It balances the load by distributing requests to models with the lowest usage, measured by factors such as prompt token counts, response token counts, or other resource metrics.
* **Lowest-latency**: The lowest-latency algorithm is based on the response time for each model. It distributes requests to models with the lowest response time.
* **Semantic**: The semantic algorithm distributes requests to different models based on the similarity between the prompt in the request and the description provided in the model configuration. This allows Kong to automatically select the model that is best suited for the given domain or use case. This feature enhances the flexibility and efficiency of model selection, especially when dealing with a diverse range of AI providers and models.
* [Round-robin (weighted)](/gateway/latest/how-kong-works/load-balancing/#round-robin)
* [Consistent-hashing (sticky-session on given header value)](/gateway/latest/how-kong-works/load-balancing/#consistent-hashing)
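For illustration, a weighted round-robin setup might look like the following declarative snippet. This is a hypothetical sketch: field names such as `balancer.algorithm`, `targets`, and `weight` are assumptions based on the description above, so consult the AI Proxy Advanced plugin reference for the exact schema.

```yaml
# Hypothetical sketch: ai-proxy-advanced balancing requests across
# two OpenAI models with a weighted round-robin algorithm.
# Field names may differ from the actual plugin schema.
plugins:
  - name: ai-proxy-advanced
    config:
      balancer:
        algorithm: round-robin
      targets:
        - model:
            provider: openai
            name: gpt-4o
          weight: 70   # roughly 70% of requests
        - model:
            provider: openai
            name: gpt-4o-mini
          weight: 30   # roughly 30% of requests
```

With weighted round-robin, the ratio of the `weight` values determines how traffic is split between the models, just as with Kong upstream targets.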

Additionally, semantic routing works similarly to load-balancing algorithms like lowest-usage or least-connections, but instead of volume or connection metrics, it uses the similarity score between the incoming prompt and the descriptions of each model. This allows Kong to automatically choose the model best suited for handling the request, based on performance in similar contexts.
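As a sketch of how semantic routing could be configured, each target carries a description that incoming prompts are matched against. The field names here (`description` in particular) are illustrative assumptions, not a definitive schema:

```yaml
# Hypothetical sketch: the semantic algorithm routes each request
# to the target whose description is most similar to the prompt.
# Field names may differ from the actual plugin schema.
plugins:
  - name: ai-proxy-advanced
    config:
      balancer:
        algorithm: semantic
      targets:
        - model:
            provider: openai
            name: gpt-4o
          description: "Code generation, debugging, and software engineering questions"
        - model:
            provider: openai
            name: gpt-4o-mini
          description: "General conversation and short factual answers"
```

A coding prompt would then score higher against the first description and be routed to the stronger model, while casual questions go to the cheaper one.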

## Semantic routing
## Retry and fallback

The AI Proxy Advanced plugin supports semantic routing, which enables distribution of requests based on the similarity between the prompt and the description of each model. This allows Kong to automatically select the model that is best suited for the given domain or use case.
The load balancer supports configurable retries and timeouts for requests, and can redirect a request to a different model when one fails. This gives you a fallback in case one of your targets is unavailable.

By analyzing the content of the request, the plugin can match it to the most appropriate model that is known to perform better in similar contexts. This feature enhances the flexibility and efficiency of model selection, especially when dealing with a diverse range of AI providers and models.
This plugin doesn't support fallback across targets with different formats. For example, a load balancer can contain targets using different OpenAI models, but it can't mix a target using an OpenAI model with one using an Ollama model. However, you can use an OpenAI model alongside a Mistral model that is compatible with the OpenAI format.
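A retry-and-fallback setup might be sketched as follows. The `retries` and timeout field names are assumptions modeled on Kong's upstream options, and the Mistral target stands in for any OpenAI-format-compatible model; check the plugin reference for the real parameter names:

```yaml
# Hypothetical sketch: retries and timeouts on the balancer; on
# failure, the request falls back to another target that speaks
# the same (OpenAI-compatible) format.
plugins:
  - name: ai-proxy-advanced
    config:
      balancer:
        algorithm: round-robin
        retries: 3               # retry a failed request up to 3 times
        connect_timeout: 60000   # milliseconds
      targets:
        - model:
            provider: openai
            name: gpt-4o
        - model:
            provider: mistral    # valid fallback only because it is
            name: mistral-large  # compatible with the OpenAI format
```

Note that pairing the OpenAI target with, say, an Ollama target would not work here, since fallback requires all targets to share a format.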
**Contributor:** I'm having trouble understanding this section, is it possible to display this info in like a table or individual bullet points?

**Contributor Author:** I updated this part, let me know if it's clearer


## Request and response formats
