From 5e777af673baa3b5f4b0afac1bcd0f840e78e794 Mon Sep 17 00:00:00 2001 From: truptiparkar7 <159386855+truptiparkar7@users.noreply.github.com> Date: Tue, 6 Aug 2024 11:40:29 -0700 Subject: [PATCH 1/7] Update gen-ai.yaml --- model/gen-ai/spans.yaml | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/model/gen-ai/spans.yaml b/model/gen-ai/spans.yaml index d634d94473..723d442e79 100644 --- a/model/gen-ai/spans.yaml +++ b/model/gen-ai/spans.yaml @@ -113,3 +113,13 @@ groups: brief: The number of tokens used in the prompt sent to OpenAI. - ref: gen_ai.usage.output_tokens brief: The number of tokens used in the completions from OpenAI. + + - id: gen_ai.evaluation.user_feedback + name: gen_ai.evaluation.user_feedback + type: event + brief: > + This event describes the evaluation of GenAI response based on the user feedback. + extends: gen_ai.common.event.attributes + attributes: + - ref: gen_ai.response.id + requirement_level: required From b937f024c75bc487ab4578c35182fddaa776aac9 Mon Sep 17 00:00:00 2001 From: truptiparkar7 <159386855+truptiparkar7@users.noreply.github.com> Date: Tue, 6 Aug 2024 11:42:10 -0700 Subject: [PATCH 2/7] Create genai-evaluation-events --- docs/gen-ai/genai-evaluation-events | 78 +++++++++++++++++++++++++++++ 1 file changed, 78 insertions(+) create mode 100644 docs/gen-ai/genai-evaluation-events diff --git a/docs/gen-ai/genai-evaluation-events b/docs/gen-ai/genai-evaluation-events new file mode 100644 index 0000000000..d3b3ce0771 --- /dev/null +++ b/docs/gen-ai/genai-evaluation-events @@ -0,0 +1,78 @@ +@@ -0,0 +1,79 @@ + + +# Semantic Conventions for GenAI evaluation events + +**Status**: [Experimental][DocumentStatus] + + + + + + + + +Each evaluation event defines a common way to report an evaluation score and the context for this specific evaluation method. + +## Naming pattern + +The evaluation events follow `gen_ai.evaluation.{evaluation method}` naming pattern. 
+For example, evaluations that are common across different GenAI models and framework tooling, such as user feedback should be reported as `gen_ai.evaluation.user_feedback`. + +GenAI vendor-specific evaluation events SHOULD follow `gen_ai.{gen_ai.system}.evaluation.{evaluation method}` pattern. + +## User feedback evaluation + +The user feedback evaluation event SHOULD be captured if and only if user provided a reaction to GenAI model response. +It SHOULD, when possible, be parented to the GenAI span describing such response. + + + + + + + + +The event name MUST be `gen_ai.evaluation.user_feedback`. + +| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | +|---|---|---|---|---|---| +| [`gen_ai.response.id`](/docs/attributes-registry/gen-ai.md) | string | The unique identifier for the completion. | `chatcmpl-123` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation. [1] | `openai` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + +**[1]:** The `gen_ai.system` describes a family of GenAI models with specific model identified +by `gen_ai.request.model` and `gen_ai.response.model` attributes. + +The actual GenAI product may differ from the one identified by the client. +For example, when using OpenAI client libraries to communicate with Mistral, the `gen_ai.system` +is set to `openai` based on the instrumentation's best knowledge. + +For custom model, a custom friendly name SHOULD be used. +If none of these options apply, the `gen_ai.system` SHOULD be set to `_OTHER`. + + + +`gen_ai.system` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used. 
+ +| Value | Description | Stability | +|---|---|---| +| `anthropic` | Anthropic | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `cohere` | Cohere | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `openai` | OpenAI | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `vertex_ai` | Vertex AI | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + + + + + + + + +The user feedback event body has the following structure: + +| Body Field | Type | Description | Examples | Requirement Level | +|---|---|---|---|---| +| `score` | double | Quantified score calculated based on the user reaction in [-1.0, 1.0] range with 0 representing a neutral reaction. | `0.42` | `Required` | + From 8030345ea33a45b8d223bd308c74da386b1212f5 Mon Sep 17 00:00:00 2001 From: truptiparkar7 <159386855+truptiparkar7@users.noreply.github.com> Date: Tue, 6 Aug 2024 11:42:35 -0700 Subject: [PATCH 3/7] Rename genai-evaluation-events to genai-evaluation-events.md --- .../{genai-evaluation-events => genai-evaluation-events.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename docs/gen-ai/{genai-evaluation-events => genai-evaluation-events.md} (100%) diff --git a/docs/gen-ai/genai-evaluation-events b/docs/gen-ai/genai-evaluation-events.md similarity index 100% rename from docs/gen-ai/genai-evaluation-events rename to docs/gen-ai/genai-evaluation-events.md From fdc5e6a22cf9be93b290ce27cb9acaf20e4b6049 Mon Sep 17 00:00:00 2001 From: truptiparkar7 <159386855+truptiparkar7@users.noreply.github.com> Date: Tue, 6 Aug 2024 11:42:59 -0700 Subject: [PATCH 4/7] Update genai-evaluation-events.md --- docs/gen-ai/genai-evaluation-events.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/gen-ai/genai-evaluation-events.md b/docs/gen-ai/genai-evaluation-events.md index d3b3ce0771..9896f2fc65 100644 --- a/docs/gen-ai/genai-evaluation-events.md +++ b/docs/gen-ai/genai-evaluation-events.md @@ -1,4 +1,4 @@ -@@ -0,0 
+1,79 @@ + From a17db184d1938779e70cd481e06a9ba62d7c5043 Mon Sep 17 00:00:00 2001 From: truptiparkar7 <159386855+truptiparkar7@users.noreply.github.com> Date: Thu, 19 Sep 2024 14:10:52 -0700 Subject: [PATCH 5/7] Update genai-evaluation-events.md --- docs/gen-ai/genai-evaluation-events.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/gen-ai/genai-evaluation-events.md b/docs/gen-ai/genai-evaluation-events.md index 9896f2fc65..5c5af530f0 100644 --- a/docs/gen-ai/genai-evaluation-events.md +++ b/docs/gen-ai/genai-evaluation-events.md @@ -75,4 +75,5 @@ The user feedback event body has the following structure: | Body Field | Type | Description | Examples | Requirement Level | |---|---|---|---|---| | `score` | double | Quantified score calculated based on the user reaction in [-1.0, 1.0] range with 0 representing a neutral reaction. | `0.42` | `Required` | +| `comment` | string | Additional details about the user feedback | `I did not like it` | `Optional` | From a538723f86efd47ca57fdd5b4da89467aa631087 Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Wed, 2 Oct 2024 17:52:08 -0700 Subject: [PATCH 6/7] Move score to the attribute --- docs/attributes-registry/gen-ai.md | 15 +++++---- docs/gen-ai/genai-evaluation-events.md | 27 ++-------------- model/gen-ai/events.yaml | 43 ++++++++++++++++++++++++++ model/gen-ai/registry.yaml | 11 +++++++ model/gen-ai/spans.yaml | 36 --------------------- 5 files changed, 65 insertions(+), 67 deletions(-) create mode 100644 model/gen-ai/events.yaml diff --git a/docs/attributes-registry/gen-ai.md b/docs/attributes-registry/gen-ai.md index 0dc935e462..67f5d8cc55 100644 --- a/docs/attributes-registry/gen-ai.md +++ b/docs/attributes-registry/gen-ai.md @@ -17,8 +17,9 @@ This document defines the attributes used to describe telemetry in the context o | Attribute | Type | Description | Examples | Stability | | ---------------------------------- | -------- | 
------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------- | ---------------------------------------------------------------- | | `gen_ai.completion` | string | The full response received from the GenAI model. [1] | `[{'role': 'assistant', 'content': 'The capital of France is Paris.'}]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `gen_ai.operation.name` | string | The name of the operation being performed. [2] | `chat`; `text_completion` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `gen_ai.prompt` | string | The full prompt sent to the GenAI model. [3] | `[{'role': 'user', 'content': 'What is the capital of France?'}]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.evaluation.score` | double | The score calculated by the evaluator for the GenAI response. [2] | `0.42` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.operation.name` | string | The name of the operation being performed. [3] | `chat`; `text_completion` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.prompt` | string | The full prompt sent to the GenAI model. [4] | `[{'role': 'user', 'content': 'What is the capital of France?'}]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `gen_ai.request.frequency_penalty` | double | The frequency penalty setting for the GenAI request. | `0.1` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `gen_ai.request.max_tokens` | int | The maximum number of tokens the model generates for a request. | `100` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `gen_ai.request.model` | string | The name of the GenAI model a request is being made to. 
| `gpt-4` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | @@ -30,18 +31,20 @@ This document defines the attributes used to describe telemetry in the context o | `gen_ai.response.finish_reasons` | string[] | Array of reasons the model stopped generating tokens, corresponding to each generation received. | `["stop"]`; `["stop", "length"]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `gen_ai.response.id` | string | The unique identifier for the completion. | `chatcmpl-123` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `gen_ai.response.model` | string | The name of the model that generated the response. | `gpt-4-0613` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `gen_ai.system` | string | The Generative AI product as identified by the client or server instrumentation. [4] | `openai` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.system` | string | The Generative AI product as identified by the client or server instrumentation. [5] | `openai` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `gen_ai.token.type` | string | The type of token being counted. | `input`; `output` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `gen_ai.usage.input_tokens` | int | The number of tokens used in the GenAI input (prompt). | `100` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `gen_ai.usage.output_tokens` | int | The number of tokens used in the GenAI response (completion). 
| `180` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | **[1]:** It's RECOMMENDED to format completions as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation) -**[2]:** If one of the predefined values applies, but specific system uses a different name it's RECOMMENDED to document it in the semantic conventions for specific GenAI system and use system-specific name in the instrumentation. If a different name is not documented, instrumentation libraries SHOULD use applicable predefined value. +**[2]:** Semantic conventions describing GenAI evaluation telemetry SHOULD document the scoring system and method used to calculate the score. -**[3]:** It's RECOMMENDED to format prompts as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation) +**[3]:** If one of the predefined values applies, but specific system uses a different name it's RECOMMENDED to document it in the semantic conventions for specific GenAI system and use system-specific name in the instrumentation. If a different name is not documented, instrumentation libraries SHOULD use applicable predefined value. -**[4]:** The `gen_ai.system` describes a family of GenAI models with specific model identified +**[4]:** It's RECOMMENDED to format prompts as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation) + +**[5]:** The `gen_ai.system` describes a family of GenAI models with specific model identified by `gen_ai.request.model` and `gen_ai.response.model` attributes. The actual GenAI product may differ from the one identified by the client. diff --git a/docs/gen-ai/genai-evaluation-events.md b/docs/gen-ai/genai-evaluation-events.md index 5c5af530f0..327b7b3114 100644 --- a/docs/gen-ai/genai-evaluation-events.md +++ b/docs/gen-ai/genai-evaluation-events.md @@ -40,29 +40,7 @@ The event name MUST be `gen_ai.evaluation.user_feedback`. 
| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | |---|---|---|---|---|---| | [`gen_ai.response.id`](/docs/attributes-registry/gen-ai.md) | string | The unique identifier for the completion. | `chatcmpl-123` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation. [1] | `openai` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | - -**[1]:** The `gen_ai.system` describes a family of GenAI models with specific model identified -by `gen_ai.request.model` and `gen_ai.response.model` attributes. - -The actual GenAI product may differ from the one identified by the client. -For example, when using OpenAI client libraries to communicate with Mistral, the `gen_ai.system` -is set to `openai` based on the instrumentation's best knowledge. - -For custom model, a custom friendly name SHOULD be used. -If none of these options apply, the `gen_ai.system` SHOULD be set to `_OTHER`. - - - -`gen_ai.system` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used. - -| Value | Description | Stability | -|---|---|---| -| `anthropic` | Anthropic | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `cohere` | Cohere | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `openai` | OpenAI | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `vertex_ai` | Vertex AI | ![Experimental](https://img.shields.io/badge/-experimental-blue) | - +| [`gen_ai.evaluation.score`](/docs/attributes-registry/gen-ai.md) | double | Quantified score calculated based on the user reaction in [-1.0, 1.0] range with 0 representing a neutral reaction. 
| `0.42` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | @@ -74,6 +52,5 @@ The user feedback event body has the following structure: | Body Field | Type | Description | Examples | Requirement Level | |---|---|---|---|---| -| `score` | double | Quantified score calculated based on the user reaction in [-1.0, 1.0] range with 0 representing a neutral reaction. | `0.42` | `Required` | -| `comment` | string | Additional details about the user feedback | `I did not like it` | `Optional` | +| `comment` | string | Additional details about the user feedback | `"I did not like it"` | `Opt-in` | diff --git a/model/gen-ai/events.yaml b/model/gen-ai/events.yaml new file mode 100644 index 0000000000..94281587ff --- /dev/null +++ b/model/gen-ai/events.yaml @@ -0,0 +1,43 @@ +groups: + - id: gen_ai.content.prompt + name: gen_ai.content.prompt + stability: experimental + type: event + brief: > + In the lifetime of an GenAI span, events for prompts sent and completions received + may be created, depending on the configuration of the instrumentation. + attributes: + - ref: gen_ai.prompt + requirement_level: + conditionally_required: if and only if corresponding event is enabled + note: > + It's RECOMMENDED to format prompts as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation) + + - id: gen_ai.content.completion + name: gen_ai.content.completion + type: event + stability: experimental + brief: > + In the lifetime of an GenAI span, events for prompts sent and completions received + may be created, depending on the configuration of the instrumentation. 
+ attributes: + - ref: gen_ai.completion + requirement_level: + conditionally_required: if and only if corresponding event is enabled + note: > + It's RECOMMENDED to format completions as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation) + + - id: gen_ai.evaluation.user_feedback + name: gen_ai.evaluation.user_feedback + type: event + stability: experimental + brief: > + This event describes the evaluation of GenAI response based on the user feedback. + attributes: + - ref: gen_ai.response.id + requirement_level: required + - ref: gen_ai.evaluation.score + brief: > + Quantified score calculated based on the user reaction in [-1.0, 1.0] range with 0 representing a neutral reaction. + note: "" + requirement_level: recommended diff --git a/model/gen-ai/registry.yaml b/model/gen-ai/registry.yaml index 5b3d1cff79..fe35a66216 100644 --- a/model/gen-ai/registry.yaml +++ b/model/gen-ai/registry.yaml @@ -1,6 +1,7 @@ groups: - id: registry.gen_ai type: attribute_group + stability: experimental display_name: GenAI Attributes brief: > This document defines the attributes used to describe telemetry in the context of Generative Artificial Intelligence (GenAI) Models requests and responses. @@ -148,8 +149,18 @@ groups: If one of the predefined values applies, but specific system uses a different name it's RECOMMENDED to document it in the semantic conventions for specific GenAI system and use system-specific name in the instrumentation. If a different name is not documented, instrumentation libraries SHOULD use applicable predefined value. + - id: gen_ai.evaluation.score + stability: experimental + type: double + brief: The score calculated by the evaluator for the GenAI response. + note: > + Semantic conventions describing GenAI evaluation telemetry SHOULD document + the scoring system and method used to calculate the score. 
+ examples: [0.42] + - id: registry.gen_ai.openai type: attribute_group + stability: experimental display_name: OpenAI Attributes brief: > Thie group defines attributes for OpenAI. diff --git a/model/gen-ai/spans.yaml b/model/gen-ai/spans.yaml index 723d442e79..7a4382bf33 100644 --- a/model/gen-ai/spans.yaml +++ b/model/gen-ai/spans.yaml @@ -58,32 +58,6 @@ groups: - gen_ai.content.prompt - gen_ai.content.completion - - id: gen_ai.content.prompt - name: gen_ai.content.prompt - type: event - brief: > - In the lifetime of an GenAI span, events for prompts sent and completions received - may be created, depending on the configuration of the instrumentation. - attributes: - - ref: gen_ai.prompt - requirement_level: - conditionally_required: if and only if corresponding event is enabled - note: > - It's RECOMMENDED to format prompts as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation) - - - id: gen_ai.content.completion - name: gen_ai.content.completion - type: event - brief: > - In the lifetime of an GenAI span, events for prompts sent and completions received - may be created, depending on the configuration of the instrumentation. - attributes: - - ref: gen_ai.completion - requirement_level: - conditionally_required: if and only if corresponding event is enabled - note: > - It's RECOMMENDED to format completions as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation) - - id: trace.gen_ai.client extends: trace.gen_ai.client.common brief: > @@ -113,13 +87,3 @@ groups: brief: The number of tokens used in the prompt sent to OpenAI. - ref: gen_ai.usage.output_tokens brief: The number of tokens used in the completions from OpenAI. - - - id: gen_ai.evaluation.user_feedback - name: gen_ai.evaluation.user_feedback - type: event - brief: > - This event describes the evaluation of GenAI response based on the user feedback. 
- extends: gen_ai.common.event.attributes - attributes: - - ref: gen_ai.response.id - requirement_level: required From 2da159c727e75e8b76b3daae52071e64df65fa1f Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Wed, 2 Oct 2024 17:54:21 -0700 Subject: [PATCH 7/7] up --- ...i-evaluation-events.md => gen-ai-evaluation-events.md} | 8 +------- docs/gen-ai/gen-ai-spans.md | 2 +- 2 files changed, 2 insertions(+), 8 deletions(-) rename docs/gen-ai/{genai-evaluation-events.md => gen-ai-evaluation-events.md} (96%) diff --git a/docs/gen-ai/genai-evaluation-events.md b/docs/gen-ai/gen-ai-evaluation-events.md similarity index 96% rename from docs/gen-ai/genai-evaluation-events.md rename to docs/gen-ai/gen-ai-evaluation-events.md index 327b7b3114..e2655d7fc1 100644 --- a/docs/gen-ai/genai-evaluation-events.md +++ b/docs/gen-ai/gen-ai-evaluation-events.md @@ -7,13 +7,6 @@ linkTitle: Generative AI evaluation events **Status**: [Experimental][DocumentStatus] - - - - - - - Each evaluation event defines a common way to report an evaluation score and the context for this specific evaluation method. ## Naming pattern @@ -54,3 +47,4 @@ The user feedback event body has the following structure: |---|---|---|---|---| | `comment` | string | Additional details about the user feedback | `"I did not like it"` | `Opt-in` | +[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status \ No newline at end of file diff --git a/docs/gen-ai/gen-ai-spans.md b/docs/gen-ai/gen-ai-spans.md index 0a3eec44b4..59d7fbc524 100644 --- a/docs/gen-ai/gen-ai-spans.md +++ b/docs/gen-ai/gen-ai-spans.md @@ -175,4 +175,4 @@ The event name MUST be `gen_ai.content.completion`. -[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md +[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status \ No newline at end of file
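
Reviewer sketch (not part of the patch): after this series, a `gen_ai.evaluation.user_feedback` event carries `gen_ai.response.id` (required) and `gen_ai.evaluation.score` (recommended, in the [-1.0, 1.0] range with 0 as neutral) as attributes, plus an opt-in `comment` body field. The Python below is a minimal illustration of that shape using plain data structures, assuming no particular SDK. The `FeedbackEvent` class, the `build_user_feedback_event` helper, the `capture_comment` flag, and the reaction-to-score mapping are all hypothetical names chosen for illustration; the convention only fixes the event name, the attribute keys, and the score range.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Event name mandated by the convention in this patch series.
EVENT_NAME = "gen_ai.evaluation.user_feedback"

# Hypothetical mapping from UI reactions to scores. The convention only
# requires the score to fall in [-1.0, 1.0] with 0 meaning neutral.
REACTION_SCORES = {"thumbs_up": 1.0, "neutral": 0.0, "thumbs_down": -1.0}


@dataclass
class FeedbackEvent:
    """Illustrative container; a real SDK would provide its own event type."""
    name: str
    attributes: Dict[str, Any]
    body: Dict[str, Any] = field(default_factory=dict)


def build_user_feedback_event(
    response_id: str,
    reaction: str,
    comment: Optional[str] = None,
    capture_comment: bool = False,  # hypothetical opt-in switch for the body field
) -> FeedbackEvent:
    """Build an event record shaped like the user feedback evaluation event."""
    if not response_id:
        # gen_ai.response.id has requirement level "required".
        raise ValueError("gen_ai.response.id is required")

    score = REACTION_SCORES[reaction]
    attributes = {
        "gen_ai.response.id": response_id,
        "gen_ai.evaluation.score": score,
    }

    # `comment` is opt-in, so it is only recorded when explicitly enabled.
    body: Dict[str, Any] = {}
    if capture_comment and comment is not None:
        body["comment"] = comment

    return FeedbackEvent(name=EVENT_NAME, attributes=attributes, body=body)
```

Keeping the score as an attribute rather than a body field, as patch 6 does, lets backends filter and aggregate on it without parsing the event body. The comment stays in the body behind an opt-in because it may contain free-form user text.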