Revise and add some conditional rendering
Signed-off-by: Diana <[email protected]>
cloudjumpercat committed Aug 23, 2024
1 parent 2cda4e2 commit 583112c
Showing 3 changed files with 106 additions and 58 deletions.
33 changes: 15 additions & 18 deletions app/_hub/kong-inc/prometheus/overview/_index.md
@@ -52,13 +52,16 @@ license signature. Those metrics are only exported on {{site.base_gateway}}.
timers, in Running or Pending state.

{% if_version gte:3.0.x %}
### Metrics disabled by default
The following metrics are disabled by default because they can create high metric cardinality and
cause performance issues (a configuration sketch for enabling them follows the list below):

#### Status code metrics
When `status_code_metrics` is set to true:
- **Status codes**: HTTP status codes returned by upstream services.
These are available per service, across all services, and per route per consumer.

#### Latency metrics
When `latency_metrics` is set to true:
- **Latencies Histograms**: Latency (in ms), as measured at Kong:
- **Request**: Total time taken by Kong and upstream services to serve
@@ -67,10 +70,12 @@ When `latency_metrics` is set to true:
plugins.
- **Upstream**: Time taken by the upstream service to respond to requests.

#### Bandwidth metrics
When `bandwidth_metrics` is set to true:
- **Bandwidth**: Total Bandwidth (egress/ingress) flowing through Kong.
This metric is available per service and as a sum across all services.

#### Upstream health metrics
When `upstream_health_metrics` is set to true:
- **Target Health**: The healthiness status (`healthchecks_off`, `healthy`, `unhealthy`, or `dns_error`) of targets
belonging to a given upstream as well as their subsystem (`http` or `stream`).
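All four of these toggles (`status_code_metrics`, `latency_metrics`, `bandwidth_metrics`, and `upstream_health_metrics`) can be enabled on the Prometheus plugin. The following is a minimal sketch using the Admin API; it assumes the Admin API is reachable at `localhost:8001` and uses a placeholder plugin ID, so adjust both for your environment:

```bash
# Minimal sketch: enable the optional metric families on an existing
# Prometheus plugin instance. Replace {prometheus-plugin-id} with the
# actual plugin ID and adjust the Admin API address as needed.
curl -X PATCH http://localhost:8001/plugins/{prometheus-plugin-id} \
  --data "config.status_code_metrics=true" \
  --data "config.latency_metrics=true" \
  --data "config.bandwidth_metrics=true" \
  --data "config.upstream_health_metrics=true"
```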
@@ -79,30 +84,22 @@ When `upstream_health_metrics` is set to true:
{% endif_version %}

{% if_version gte:3.8.x %}
#### AI LLM metrics
All the following AI LLM metrics are available per provider, model, cache, database name (if cached), embeddings provider (if cached), embeddings model (if cached), and workspace.

When `ai_llm_metrics` is set to true:
- **AI Requests**: AI requests sent to LLM providers.
These are available per provider, model, cache, database name (if cached),
embeddings provider (if cached), embeddings model (if cached), and workspace.
- **AI Cost:**: AI Cost charged by LLM providers.
These are available per provider, model, cache, database name (if cached),
embeddings provider (if cached), embeddings model (if cached), and workspace.
- **AI Tokens** AI Tokens counted by LLM providers.
These are available per provider, model, cache, database name (if cached),
embeddings provider (if cached), embeddings model (if cached), token type, and workspace.
- **AI LLM Latency** Time taken to return a response by LLM providers.
These are available per provider, model, cache, database name (if cached),
embeddings provider (if cached), embeddings model (if cached), and workspace.
- **AI Cache Fetch Latency** Time taken to return a response from the cache.
These are available per provider, model, cache, database name (if cached),
embeddings provider (if cached), embeddings model (if cached), and workspace.
- **AI Cache Embeddings Latency** Time taken to generate embedding during the cache.
These are available per provider, model, cache, database name (if cached),
embeddings provider (if cached), embeddings model (if cached), and workspace.
- **AI Cost**: AI Cost charged by LLM providers.
- **AI Tokens**: AI Tokens counted by LLM providers.
These are also available per token type in addition to the options listed previously.
- **AI LLM Latency**: Time taken to return a response by LLM providers.
- **AI Cache Fetch Latency**: Time taken to return a response from the cache.
- **AI Cache Embeddings Latency**: Time taken to generate embedding during the cache.
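As with the other optional metric families, `ai_llm_metrics` can be toggled on an existing Prometheus plugin instance. A minimal sketch using the Admin API, with the default Admin API address and a placeholder plugin ID assumed:

```bash
# Minimal sketch: enable the AI LLM metrics family described above.
# Replace {prometheus-plugin-id} with the actual plugin ID.
curl -X PATCH http://localhost:8001/plugins/{prometheus-plugin-id} \
  --data "config.ai_llm_metrics=true"
```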

For more details, see [AI Metrics](/gateway/{{ page.release }}/production/monitoring/ai-metrics/).
{% endif_version %}


### Metrics output example
Here is an example of output you could expect from the `/metrics` endpoint:

```bash
94 changes: 71 additions & 23 deletions app/_src/gateway/production/logging/ai-analytics.md
@@ -11,6 +11,44 @@ Each AI plugin returns a set of tokens.

All log entries include the following attributes:

{% if_version lte:3.7.x %}
```json
"ai": {
"payload": { "request": "[$optional_payload_request_]" },
"[$plugin_name_1]": {
"payload": { "response": "[$optional_payload_response]" },
"usage": {
"prompt_token": 28,
"total_tokens": 48,
"completion_token": 20,
"cost": 0.0038
},
"meta": {
"request_model": "command",
"provider_name": "cohere",
"response_model": "command",
"plugin_id": "546c3856-24b3-469a-bd6c-f6083babd2cd"
}
},
"[$plugin_name_2]": {
"payload": { "response": "[$optional_payload_response]" },
"usage": {
"prompt_token": 89,
"total_tokens": 145,
"completion_token": 56,
"cost": 0.0012
},
"meta": {
"request_model": "gpt-35-turbo",
"provider_name": "azure",
"response_model": "gpt-35-turbo",
"plugin_id": "5df193be-47a3-4f1b-8c37-37e31af0568b"
}
}
}
```
{% endif_version %}
{% if_version gte:3.8.x %}
```json
"ai": {
"payload": { "request": "[$optional_payload_request_]" },
@@ -50,36 +88,45 @@ All log entries include the following attributes:
}
}
```
{% endif_version %}

### Log Details

Each log entry includes the following details:

<!--vale off-->
Property | Description
---------|-------------
`ai.payload.request` | The request payload.
`ai.[$plugin_name].payload.response` |The response payload.
`ai.[$plugin_name].usage.prompt_token` | Number of tokens used for prompting.
`ai.[$plugin_name].usage.completion_token` | Number of tokens used for completion.
`ai.[$plugin_name].usage.total_tokens` | Total number of tokens used.
`ai.[$plugin_name].usage.cost` | The total cost of the request (input and output cost).
`ai.[$plugin_name].usage.time_per_token` | The average time to generate an output token in ms.
`ai.[$plugin_name].meta.request_model` | Model used for the AI request.
`ai.[$plugin_name].meta.provider_name` | Name of the AI service provider.
`ai.[$plugin_name].meta.response_model` | Model used for the AI response.
`ai.[$plugin_name].meta.plugin_id` | Unique identifier of the plugin.
`ai.[$plugin_name].meta.llm_latency` | The time in ms it took the llm provider to generate the full response.
`ai.[$plugin_name].cache.cache_status` | The cache status could be Hit, Miss. Bypass or Refresh.
`ai.[$plugin_name].cache.fetch_latency` | The time in ms it took the return a cache response.
`ai.[$plugin_name].cache.embeddings_provider` | For semantic caching, the provider used to generate the embeddings.
`ai.[$plugin_name].cache.embeddings_model` | For semantic caching, the model used to generate the embeddings.
`ai.[$plugin_name].cache.embeddings_latency` | For semantic caching, the time taken to generate the embeddings.
| Property | Description |
| --------- | ------------- |
| `ai.payload.request` | The request payload. |
| `ai.[$plugin_name].payload.response` | The response payload. |
| `ai.[$plugin_name].usage.prompt_token` | Number of tokens used for prompting. |
| `ai.[$plugin_name].usage.completion_token` | Number of tokens used for completion. |
| `ai.[$plugin_name].usage.total_tokens` | Total number of tokens used. |
| `ai.[$plugin_name].usage.cost` | The total cost of the request (input and output cost). |

{% if_version gte:3.8.x %}
| `ai.[$plugin_name].usage.time_per_token` | The average time to generate an output token, in milliseconds. |
{% endif_version %}

| `ai.[$plugin_name].meta.request_model` | Model used for the AI request. |
| `ai.[$plugin_name].meta.provider_name` | Name of the AI service provider. |
| `ai.[$plugin_name].meta.response_model` | Model used for the AI response. |
| `ai.[$plugin_name].meta.plugin_id` | Unique identifier of the plugin. |

{% if_version gte:3.8.x %}
| `ai.[$plugin_name].meta.llm_latency` | The time, in milliseconds, it took the LLM provider to generate the full response. |
| `ai.[$plugin_name].cache.cache_status` | The cache status. This can be Hit, Miss, Bypass, or Refresh. |
| `ai.[$plugin_name].cache.fetch_latency` | The time, in milliseconds, it took to return a cache response. |
| `ai.[$plugin_name].cache.embeddings_provider` | For semantic caching, the provider used to generate the embeddings. |
| `ai.[$plugin_name].cache.embeddings_model` | For semantic caching, the model used to generate the embeddings. |
| `ai.[$plugin_name].cache.embeddings_latency` | For semantic caching, the time taken to generate the embeddings. |
{% endif_version %}
<!--vale on-->
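These attributes can be post-processed from any logging plugin that emits JSON. As a rough sketch, the following sums the reported cost per LLM provider from a JSON-lines log file; it assumes one log entry per line (for example, written by the File Log plugin) and an illustrative file name:

```bash
# Sketch: total reported LLM cost per provider from JSON-lines access logs.
# Assumes one JSON log entry per line and that jq is installed; the log
# path is illustrative.
jq -s '
  [ .[]
    | .ai // {}
    | to_entries[]
    | select(.key != "payload")
    | { provider: .value.meta.provider_name, cost: (.value.usage.cost // 0) } ]
  | group_by(.provider)
  | map({ provider: .[0].provider, total_cost: (map(.cost) | add) })
' kong-ai.log
```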

### Caches Logging
{% if_version gte:3.8.x %}
### Caches logging

If using the cache semantic plugins, logging will be provided with some additional details about caching:
If you're using the [AI Semantic Cache plugin](/hub/kong-inc/), logging will include some additional details about caching:

```json
"ai": {
@@ -132,6 +179,7 @@

{:.note}
> **Note:**
> When returning a cache response `time_per_token` and `llm_latency` will be omitted.
> The cache response can be returned either as a semantic cache or an exact cache. If returned as a semantic cache, it will include additional details such as the embeddings provider, embeddings model, and embeddings latency.
> When returning a cache response, `time_per_token` and `llm_latency` are omitted.
> The cache response can be returned either as a semantic cache or an exact cache. If it's returned as a semantic cache, it will include additional details such as the embeddings provider, embeddings model, and embeddings latency.
{% endif_version %}
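The cache attributes lend themselves to the same kind of post-processing. A small sketch that counts how often each `cache_status` value appears, under the same JSON-lines assumptions as above:

```bash
# Sketch: distribution of cache_status values across logged AI requests.
# Assumes JSON-lines logs and jq; the log path is illustrative.
jq -s '
  [ .[]
    | .ai // {}
    | to_entries[]
    | select(.key != "payload")
    | .value.cache.cache_status // empty ]
  | group_by(.)
  | map({ status: .[0], count: length })
' kong-ai.log
```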

37 changes: 20 additions & 17 deletions app/_src/gateway/production/monitoring/ai-metrics.md
@@ -37,24 +37,27 @@ dashboard](https://grafana.com/grafana/dashboards/21162-kong-cx-ai/):

## Available metrics

{% if_version lte:3.7.x %}
- **AI Requests**: AI requests sent to LLM providers.
These are available per provider, model, cache, database name (if cached), and workspace.
- **AI Cost**: AI costs charged by LLM providers.
These are available per provider, model, cache, database name (if cached), and workspace.
- **AI Tokens**: AI tokens counted by LLM providers.
These are available per provider, model, cache, database name (if cached), token type, and workspace.
{% endif_version %}

{% if_version gte:3.8.x %}
All the following AI LLM metrics are available per provider, model, cache, database name (if cached), embeddings provider (if cached), embeddings model (if cached), and workspace.

When `ai_llm_metrics` is set to true:
- **AI Requests**: AI requests sent to LLM providers.
These are available per provider, model, cache, database name (if cached),
embeddings provider (if cached), embeddings model (if cached), and workspace.
- **AI Cost:**: AI Cost charged by LLM providers.
These are available per provider, model, cache, database name (if cached),
embeddings provider (if cached), embeddings model (if cached), and workspace.
- **AI Tokens** AI Tokens counted by LLM providers.
These are available per provider, model, cache, database name (if cached),
embeddings provider (if cached), embeddings model (if cached), token type, and workspace.
- **AI LLM Latency** Time taken to return a response by LLM providers.
These are available per provider, model, cache, database name (if cached),
embeddings provider (if cached), embeddings model (if cached), and workspace.
- **AI Cache Fetch Latency** Time taken to return a response from the cache.
These are available per provider, model, cache, database name (if cached),
embeddings provider (if cached), embeddings model (if cached), and workspace.
- **AI Cache Embeddings Latency** Time taken to generate embedding during the cache.
These are available per provider, model, cache, database name (if cached),
embeddings provider (if cached), embeddings model (if cached), and workspace.
- **AI Cost**: AI Cost charged by LLM providers.
- **AI Tokens**: AI Tokens counted by LLM providers.
These are also available per token type in addition to the options listed previously.
- **AI LLM Latency**: Time taken to return a response by LLM providers.
- **AI Cache Fetch Latency**: Time taken to return a response from the cache.
- **AI Cache Embeddings Latency**: Time taken to generate embedding during the cache.
{% endif_version %}

AI metrics are disabled by default because they can create high metric cardinality and
cause performance issues. To enable them, set `ai_metrics` to true in the Prometheus plugin configuration.
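A minimal sketch of doing this through the Admin API, assuming the default Admin API address and a placeholder plugin ID:

```bash
# Sketch: enable AI metrics on an existing Prometheus plugin instance.
# Replace {prometheus-plugin-id} with the actual plugin ID.
curl -X PATCH http://localhost:8001/plugins/{prometheus-plugin-id} \
  --data "config.ai_metrics=true"
```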
