Revise and add some conditional rendering
Signed-off-by: Diana <[email protected]>
cloudjumpercat committed Aug 23, 2024
1 parent 2cda4e2 commit 583112c
Showing 3 changed files with 106 additions and 58 deletions.
33 changes: 15 additions & 18 deletions app/_hub/kong-inc/prometheus/overview/_index.md
@@ -52,13 +52,16 @@ license signature. Those metrics are only exported on {{site.base_gateway}}.
timers, in Running or Pending state.

{% if_version gte:3.0.x %}
### Metrics disabled by default
The following metrics are disabled by default because they can create high metric cardinality and
cause performance issues (a configuration sketch for enabling them follows the list below):

#### Status code metrics
When `status_code_metrics` is set to true:
- **Status codes**: HTTP status codes returned by upstream services.
These are available per service, across all services, and per route per consumer.

#### Latency metrics
When `latency_metrics` is set to true:
- **Latencies Histograms**: Latency (in ms), as measured at Kong:
- **Request**: Total time taken by Kong and upstream services to serve
@@ -67,10 +70,12 @@ When `latency_metrics` is set to true:
plugins.
- **Upstream**: Time taken by the upstream service to respond to requests.

#### Bandwidth metrics
When `bandwidth_metrics` is set to true:
- **Bandwidth**: Total Bandwidth (egress/ingress) flowing through Kong.
This metric is available per service and as a sum across all services.

#### Upstream health metrics
When `upstream_health_metrics` is set to true:
- **Target Health**: The healthiness status (`healthchecks_off`, `healthy`, `unhealthy`, or `dns_error`) of targets
belonging to a given upstream as well as their subsystem (`http` or `stream`).
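All four of these toggles (`status_code_metrics`, `latency_metrics`, `bandwidth_metrics`, and `upstream_health_metrics`) can be enabled on the Prometheus plugin. The following is a minimal sketch using the Admin API; it assumes the Admin API is reachable at `localhost:8001` and uses a placeholder plugin ID, so adjust both for your environment:

```bash
# Minimal sketch: enable the optional metric families on an existing
# Prometheus plugin instance. Replace {prometheus-plugin-id} with the
# actual plugin ID and adjust the Admin API address as needed.
curl -X PATCH http://localhost:8001/plugins/{prometheus-plugin-id} \
  --data "config.status_code_metrics=true" \
  --data "config.latency_metrics=true" \
  --data "config.bandwidth_metrics=true" \
  --data "config.upstream_health_metrics=true"
```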
@@ -79,30 +84,22 @@ When `upstream_health_metrics` is set to true:
{% endif_version %}

{% if_version gte:3.8.x %}
#### AI LLM metrics
All the following AI LLM metrics are available per provider, model, cache, database name (if cached), embeddings provider (if cached), embeddings model (if cached), and workspace.

When `ai_llm_metrics` is set to true:
- **AI Requests**: AI requests sent to LLM providers.
These are available per provider, model, cache, database name (if cached),
embeddings provider (if cached), embeddings model (if cached), and workspace.
- **AI Cost:**: AI Cost charged by LLM providers.
These are available per provider, model, cache, database name (if cached),
embeddings provider (if cached), embeddings model (if cached), and workspace.
- **AI Tokens** AI Tokens counted by LLM providers.
These are available per provider, model, cache, database name (if cached),
embeddings provider (if cached), embeddings model (if cached), token type, and workspace.
- **AI LLM Latency** Time taken to return a response by LLM providers.
These are available per provider, model, cache, database name (if cached),
embeddings provider (if cached), embeddings model (if cached), and workspace.
- **AI Cache Fetch Latency** Time taken to return a response from the cache.
These are available per provider, model, cache, database name (if cached),
embeddings provider (if cached), embeddings model (if cached), and workspace.
- **AI Cache Embeddings Latency** Time taken to generate embedding during the cache.
These are available per provider, model, cache, database name (if cached),
embeddings provider (if cached), embeddings model (if cached), and workspace.
- **AI Cost**: AI Cost charged by LLM providers.
- **AI Tokens**: AI Tokens counted by LLM providers.
These are also available per token type in addition to the options listed previously.
- **AI LLM Latency**: Time taken to return a response by LLM providers.
- **AI Cache Fetch Latency**: Time taken to return a response from the cache.
- **AI Cache Embeddings Latency**: Time taken to generate embedding during the cache.
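As with the other optional metric families, `ai_llm_metrics` can be toggled on an existing Prometheus plugin instance. A minimal sketch using the Admin API, with the default Admin API address and a placeholder plugin ID assumed:

```bash
# Minimal sketch: enable the AI LLM metrics family described above.
# Replace {prometheus-plugin-id} with the actual plugin ID.
curl -X PATCH http://localhost:8001/plugins/{prometheus-plugin-id} \
  --data "config.ai_llm_metrics=true"
```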

For more details, see [AI Metrics](/gateway/{{ page.release }}/production/monitoring/ai-metrics/).
{% endif_version %}


### Metrics output example
Here is an example of output you could expect from the `/metrics` endpoint:

```bash
94 changes: 71 additions & 23 deletions app/_src/gateway/production/logging/ai-analytics.md
@@ -11,6 +11,44 @@ Each AI plugin returns a set of tokens.

All log entries include the following attributes:

{% if_version lte:3.7.x %}
```json
"ai": {
"payload": { "request": "[$optional_payload_request_]" },
"[$plugin_name_1]": {
"payload": { "response": "[$optional_payload_response]" },
"usage": {
"prompt_token": 28,
"total_tokens": 48,
"completion_token": 20,
"cost": 0.0038
},
"meta": {
"request_model": "command",
"provider_name": "cohere",
"response_model": "command",
"plugin_id": "546c3856-24b3-469a-bd6c-f6083babd2cd"
}
},
"[$plugin_name_2]": {
"payload": { "response": "[$optional_payload_response]" },
"usage": {
"prompt_token": 89,
"total_tokens": 145,
"completion_token": 56,
"cost": 0.0012
},
"meta": {
"request_model": "gpt-35-turbo",
"provider_name": "azure",
"response_model": "gpt-35-turbo",
"plugin_id": "5df193be-47a3-4f1b-8c37-37e31af0568b"
}
}
}
```
{% endif_version %}
{% if_version gte:3.8.x %}
```json
"ai": {
"payload": { "request": "[$optional_payload_request_]" },
@@ -50,36 +88,45 @@ All log entries include the following attributes:
}
}
```
{% endif_version %}

### Log Details

Each log entry includes the following details:

<!--vale off-->
Property | Description
---------|-------------
`ai.payload.request` | The request payload.
`ai.[$plugin_name].payload.response` |The response payload.
`ai.[$plugin_name].usage.prompt_token` | Number of tokens used for prompting.
`ai.[$plugin_name].usage.completion_token` | Number of tokens used for completion.
`ai.[$plugin_name].usage.total_tokens` | Total number of tokens used.
`ai.[$plugin_name].usage.cost` | The total cost of the request (input and output cost).
`ai.[$plugin_name].usage.time_per_token` | The average time to generate an output token in ms.
`ai.[$plugin_name].meta.request_model` | Model used for the AI request.
`ai.[$plugin_name].meta.provider_name` | Name of the AI service provider.
`ai.[$plugin_name].meta.response_model` | Model used for the AI response.
`ai.[$plugin_name].meta.plugin_id` | Unique identifier of the plugin.
`ai.[$plugin_name].meta.llm_latency` | The time in ms it took the llm provider to generate the full response.
`ai.[$plugin_name].cache.cache_status` | The cache status could be Hit, Miss. Bypass or Refresh.
`ai.[$plugin_name].cache.fetch_latency` | The time in ms it took the return a cache response.
`ai.[$plugin_name].cache.embeddings_provider` | For semantic caching, the provider used to generate the embeddings.
`ai.[$plugin_name].cache.embeddings_model` | For semantic caching, the model used to generate the embeddings.
`ai.[$plugin_name].cache.embeddings_latency` | For semantic caching, the time taken to generate the embeddings.
| Property | Description |
| --------- | ------------- |
| `ai.payload.request` | The request payload. |
| `ai.[$plugin_name].payload.response` | The response payload. |
| `ai.[$plugin_name].usage.prompt_token` | Number of tokens used for prompting. |
| `ai.[$plugin_name].usage.completion_token` | Number of tokens used for completion. |
| `ai.[$plugin_name].usage.total_tokens` | Total number of tokens used. |
| `ai.[$plugin_name].usage.cost` | The total cost of the request (input and output cost). |

{% if_version gte:3.8.x %}
| `ai.[$plugin_name].usage.time_per_token` | The average time to generate an output token, in milliseconds. |
{% endif_version %}

| `ai.[$plugin_name].meta.request_model` | Model used for the AI request. |
| `ai.[$plugin_name].meta.provider_name` | Name of the AI service provider. |
| `ai.[$plugin_name].meta.response_model` | Model used for the AI response. |
| `ai.[$plugin_name].meta.plugin_id` | Unique identifier of the plugin. |

{% if_version gte:3.8.x %}
| `ai.[$plugin_name].meta.llm_latency` | The time, in milliseconds, it took the LLM provider to generate the full response. |
| `ai.[$plugin_name].cache.cache_status` | The cache status. This can be Hit, Miss, Bypass, or Refresh. |
| `ai.[$plugin_name].cache.fetch_latency` | The time, in milliseconds, it took to return a cache response. |
| `ai.[$plugin_name].cache.embeddings_provider` | For semantic caching, the provider used to generate the embeddings. |
| `ai.[$plugin_name].cache.embeddings_model` | For semantic caching, the model used to generate the embeddings. |
| `ai.[$plugin_name].cache.embeddings_latency` | For semantic caching, the time taken to generate the embeddings. |
{% endif_version %}
<!--vale on-->
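These attributes can be post-processed from any logging plugin that emits JSON. As a rough sketch, the following sums the reported cost per LLM provider from a JSON-lines log file; it assumes one log entry per line (for example, written by the File Log plugin) and an illustrative file name:

```bash
# Sketch: total reported LLM cost per provider from JSON-lines access logs.
# Assumes one JSON log entry per line and that jq is installed; the log
# path is illustrative.
jq -s '
  [ .[]
    | .ai // {}
    | to_entries[]
    | select(.key != "payload")
    | { provider: .value.meta.provider_name, cost: (.value.usage.cost // 0) } ]
  | group_by(.provider)
  | map({ provider: .[0].provider, total_cost: (map(.cost) | add) })
' kong-ai.log
```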

### Caches Logging
{% if_version gte:3.8.x %}
### Caches logging

If using the cache semantic plugins, logging will be provided with some additional details about caching:
If you're using the [AI Semantic Cache plugin](/hub/kong-inc/), logging will include some additional details about caching:

```json
"ai": {
@@ -132,6 +179,7 @@

{:.note}
> **Note:**
> When returning a cache response `time_per_token` and `llm_latency` will be omitted.
> The cache response can be returned either as a semantic cache or an exact cache. If returned as a semantic cache, it will include additional details such as the embeddings provider, embeddings model, and embeddings latency.
> When returning a cache response, `time_per_token` and `llm_latency` are omitted.
> The cache response can be returned either as a semantic cache or an exact cache. If it's returned as a semantic cache, it will include additional details such as the embeddings provider, embeddings model, and embeddings latency.
{% endif_version %}
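The cache attributes lend themselves to the same kind of post-processing. A small sketch that counts how often each `cache_status` value appears, under the same JSON-lines assumptions as above:

```bash
# Sketch: distribution of cache_status values across logged AI requests.
# Assumes JSON-lines logs and jq; the log path is illustrative.
jq -s '
  [ .[]
    | .ai // {}
    | to_entries[]
    | select(.key != "payload")
    | .value.cache.cache_status // empty ]
  | group_by(.)
  | map({ status: .[0], count: length })
' kong-ai.log
```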

37 changes: 20 additions & 17 deletions app/_src/gateway/production/monitoring/ai-metrics.md
@@ -37,24 +37,27 @@ dashboard](https://grafana.com/grafana/dashboards/21162-kong-cx-ai/):

## Available metrics

{% if_version lte:3.7.x %}
- **AI Requests**: AI requests sent to LLM providers.
These are available per provider, model, cache, database name (if cached), and workspace.
- **AI Cost**: AI costs charged by LLM providers.
These are available per provider, model, cache, database name (if cached), and workspace.
- **AI Tokens**: AI tokens counted by LLM providers.
These are available per provider, model, cache, database name (if cached), token type, and workspace.
{% endif_version %}

{% if_version gte:3.8.x %}
All the following AI LLM metrics are available per provider, model, cache, database name (if cached), embeddings provider (if cached), embeddings model (if cached), and workspace.

When `ai_llm_metrics` is set to true:
- **AI Requests**: AI requests sent to LLM providers.
These are available per provider, model, cache, database name (if cached),
embeddings provider (if cached), embeddings model (if cached), and workspace.
- **AI Cost:**: AI Cost charged by LLM providers.
These are available per provider, model, cache, database name (if cached),
embeddings provider (if cached), embeddings model (if cached), and workspace.
- **AI Tokens** AI Tokens counted by LLM providers.
These are available per provider, model, cache, database name (if cached),
embeddings provider (if cached), embeddings model (if cached), token type, and workspace.
- **AI LLM Latency** Time taken to return a response by LLM providers.
These are available per provider, model, cache, database name (if cached),
embeddings provider (if cached), embeddings model (if cached), and workspace.
- **AI Cache Fetch Latency** Time taken to return a response from the cache.
These are available per provider, model, cache, database name (if cached),
embeddings provider (if cached), embeddings model (if cached), and workspace.
- **AI Cache Embeddings Latency** Time taken to generate embedding during the cache.
These are available per provider, model, cache, database name (if cached),
embeddings provider (if cached), embeddings model (if cached), and workspace.
- **AI Cost**: AI Cost charged by LLM providers.
- **AI Tokens**: AI Tokens counted by LLM providers.
These are also available per token type in addition to the options listed previously.
- **AI LLM Latency**: Time taken to return a response by LLM providers.
- **AI Cache Fetch Latency**: Time taken to return a response from the cache.
- **AI Cache Embeddings Latency**: Time taken to generate embedding during the cache.
{% endif_version %}

AI metrics are disabled by default because they can create high metric cardinality and
cause performance issues. To enable them, set `ai_metrics` to true in the Prometheus plugin configuration.
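A minimal sketch of doing this through the Admin API, assuming the default Admin API address and a placeholder plugin ID:

```bash
# Sketch: enable AI metrics on an existing Prometheus plugin instance.
# Replace {prometheus-plugin-id} with the actual plugin ID.
curl -X PATCH http://localhost:8001/plugins/{prometheus-plugin-id} \
  --data "config.ai_metrics=true"
```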
