Added search_traces method in the SDK (#484)
* Added search_traces method in the SDK

* Update following code review
jverre authored Oct 30, 2024
1 parent 652a00f commit 103ad95
Showing 9 changed files with 482 additions and 0 deletions.
111 changes: 111 additions & 0 deletions apps/opik-documentation/documentation/docs/tracing/exporting_traces.md
@@ -0,0 +1,111 @@
---
sidebar_label: Exporting Traces
---

# Exporting Traces

You can export the traces you have logged to the Opik platform in three ways:

1. Using the Opik SDK: You can use the [`Opik.search_traces`](https://www.comet.com/docs/opik/python-sdk-reference/Opik.html#opik.Opik.search_traces) method to export traces.
2. Using the Opik REST API: You can use the [`/traces`](/reference/rest_api/get-traces-by-project.api.mdx) endpoint to export traces.
3. Using the UI: Once you have selected the traces you want to export, you can click on the `Export CSV` button in the `Actions` dropdown.

:::tip
The recommended way to export traces is to use the [`Opik.search_traces`](https://www.comet.com/docs/opik/python-sdk-reference/Opik.html#opik.Opik.search_traces) method in the Opik SDK.
:::

## Using the Opik SDK

The [`Opik.search_traces`](https://www.comet.com/docs/opik/python-sdk-reference/Opik.html#opik.Opik.search_traces) method allows you to either export all the traces in a project or search for specific traces and export only those.

### Exporting all traces

By default, `search_traces` returns at most 1000 traces. To export all traces, set `max_results` to a value higher than the total number of traces in your project:

```python
import opik

client = opik.Opik()

traces = client.search_traces(project_name="Default project", max_results=1000000)
```
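
Once the traces are in memory, you will usually want to save them somewhere. The following is a minimal sketch, assuming each trace has already been converted to a plain dictionary (for example with `trace.dict()`). The `write_traces_jsonl` helper is hypothetical and is not part of the Opik SDK:

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable


def write_traces_jsonl(traces: Iterable[Dict[str, Any]], path: str) -> int:
    """Write one JSON object per line and return the number of rows written.

    Hypothetical helper, not part of the Opik SDK.
    """
    count = 0
    with Path(path).open("w", encoding="utf-8") as f:
        for trace in traces:
            # default=str converts values json cannot encode (e.g. datetimes) to strings
            f.write(json.dumps(trace, default=str) + "\n")
            count += 1
    return count
```

A JSON Lines file keeps nested fields such as `input`, `output` and `metadata` intact and can be read back one line at a time.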

### Search for specific traces

You can use the `filter_string` parameter to search for specific traces:

```python
import opik

client = opik.Opik()

traces = client.search_traces(project_name="Default project", filter_string='input contains "Opik"')

# Convert to Dict if required
traces = [trace.dict() for trace in traces]
```

The `filter_string` parameter should follow the format `<column> <operator> <value>` with:

1. `<column>`: The column to filter on. This can be one of:
   - `name`
   - `input`
   - `output`
   - `start_time`
   - `end_time`
   - `metadata`
   - `feedback_score`
   - `tags`
   - `usage.total_tokens`
   - `usage.prompt_tokens`
   - `usage.completion_tokens`
2. `<operator>`: The operator to use for the filter. This can be `=`, `!=`, `>`, `>=`, `<`, `<=`, `contains` or `not_contains`. Note that not all operators are supported for all columns.
3. `<value>`: The value to filter on. If you are filtering on a string, you will need to wrap it in double quotes.

Here are some additional examples of valid `filter_string` values:

```python
import opik

client = opik.Opik(
    project_name="Default project"
)

traces = client.search_traces(filter_string='input contains "Opik"')
traces = client.search_traces(filter_string='start_time >= "2024-01-01T00:00:00Z"')
traces = client.search_traces(filter_string='tags contains "production"')
traces = client.search_traces(filter_string='usage.total_tokens > 1000')
traces = client.search_traces(filter_string='metadata.model = "gpt-4o"')
```
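
If you build filters from user input or configuration, the `<column> <operator> <value>` format described above can be assembled with a small helper. This is only a sketch: `build_filter` is a hypothetical function, not part of the Opik SDK, and it does not escape double quotes that appear inside string values.

```python
from typing import Union

# Operators listed in the format description above
OPERATORS = {"=", "!=", ">", ">=", "<", "<=", "contains", "not_contains"}


def build_filter(column: str, operator: str, value: Union[str, int, float]) -> str:
    """Return '<column> <operator> <value>' (hypothetical helper, not an SDK function)."""
    if operator not in OPERATORS:
        raise ValueError(f"Unsupported operator: {operator!r}")
    if isinstance(value, str):
        # String values must be wrapped in double quotes
        value = f'"{value}"'
    return f"{column} {operator} {value}"


print(build_filter("input", "contains", "Opik"))      # input contains "Opik"
print(build_filter("usage.total_tokens", ">", 1000))  # usage.total_tokens > 1000
```

The resulting string can be passed as the `filter_string` argument of `search_traces`.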

## Using the Opik REST API

To export traces using the Opik REST API, you can use the [`/traces`](/reference/rest_api/get-traces-by-project.api.mdx) endpoint. This endpoint is paginated so you will need to make multiple requests to retrieve all the traces you want.
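
The paging logic is the same whichever HTTP client you use. The sketch below mirrors the loop that `search_traces` uses in the SDK: request pages until one comes back empty or the limit is reached. `fetch_all` and `fake_fetch` are hypothetical names; in real code `fetch_page` would wrap a request to the endpoint and return the list of traces for that page.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def fetch_all(
    fetch_page: Callable[[int, int], List[T]],
    page_size: int = 200,
    max_results: int = 1000,
) -> List[T]:
    """Collect items page by page until a page is empty or max_results is reached."""
    items: List[T] = []
    page = 1  # pages are assumed to be 1-indexed
    while len(items) < max_results:
        batch = fetch_page(page, page_size)
        if not batch:
            break
        items.extend(batch)
        page += 1
    return items[:max_results]


# Stand-in for a real request: serves 450 fake records page by page
records = list(range(450))


def fake_fetch(page: int, size: int) -> List[int]:
    start = (page - 1) * size
    return records[start:start + size]


print(len(fetch_all(fake_fetch)))                   # 450
print(len(fetch_all(fake_fetch, max_results=300)))  # 300
```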

To search for specific traces, you can use the `filter` parameter. Although this is a string parameter, it does not follow the same format as the `filter_string` parameter in the Opik SDK. Instead, it is a JSON-encoded list of objects with the following format:

```json
[
  {
    "field": "name",
    "type": "string",
    "operator": "=",
    "value": "Opik"
  }
]
```
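
Because the list is sent as a single string, it has to be serialized to JSON and URL-encoded before it is added to the request. The sketch below shows one way to do this with the standard library. The query parameter names (`project_name`, `filter`, `page`, `size`) are assumptions based on the text above and on the arguments the SDK passes to `get_traces_by_project`; note that the SDK names its argument `filters`, so check the REST API reference for the exact spelling.

```python
import json
from urllib.parse import parse_qs, urlencode

filters = [
    {"field": "name", "type": "string", "operator": "=", "value": "Opik"},
]

# Parameter names are assumptions; confirm them against the REST API reference.
params = {
    "project_name": "Default project",
    "filter": json.dumps(filters),  # the whole list travels as one JSON string
    "page": 1,
    "size": 100,
}
query_string = urlencode(params)

# Decoding the query string gives back the original list
decoded = json.loads(parse_qs(query_string)["filter"][0])
print(decoded == filters)  # True
```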

:::warning
The `filter` parameter was designed to be used with the Opik UI and is therefore not very flexible. If you need more flexibility,
please raise an issue on [GitHub](https://github.com/comet-ml/opik/issues) so we can help.
:::

## Using the UI

To export traces as a CSV file from the UI, you can simply select the traces you wish to export and click on `Export CSV` in the `Actions` dropdown:

![Export CSV](/img/tracing/download_traces.png)

:::tip
The UI only allows you to export up to 100 traces at a time as it is linked to the page size of the traces table. If you need to export more traces, we recommend using the Opik SDK.
:::
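
If you need a CSV file with more than 100 traces, you can write one yourself from the dictionaries returned by the SDK. This is a minimal sketch: `write_traces_csv` is a hypothetical helper, not part of the Opik SDK, and the column names you pass are assumed to be keys of the trace dictionaries.

```python
import csv
import json
from typing import Any, Dict, Iterable, Sequence


def write_traces_csv(
    traces: Iterable[Dict[str, Any]], path: str, columns: Sequence[str]
) -> None:
    """Write the selected columns of each trace to a CSV file (hypothetical helper)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(columns))
        writer.writeheader()
        for trace in traces:
            row = {}
            for col in columns:
                value = trace.get(col)
                # Nested values (dicts, lists) are stored as JSON text in a single cell
                if isinstance(value, (dict, list)):
                    value = json.dumps(value, default=str)
                row[col] = value
            writer.writerow(row)
```

Missing keys are written as empty cells, so traces with different fields can share one file.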
1 change: 1 addition & 0 deletions apps/opik-documentation/documentation/sidebars.ts
@@ -31,6 +31,7 @@ const sidebars: SidebarsConfig = {
"tracing/log_distributed_traces",
"tracing/annotate_traces",
"tracing/sdk_configuration",
"tracing/exporting_traces",
{
type: "category",
label: "Integrations",
@@ -0,0 +1,5 @@
TracePublic
===========

.. autoclass:: opik.rest_api.types.trace_public.TracePublic
   :members:
3 changes: 3 additions & 0 deletions apps/opik-documentation/python-sdk-docs/source/index.rst
@@ -95,6 +95,8 @@ Evaluations are run using the `evaluate` function, this function takes a dataset
from opik.integrations.openai import track_openai
from typing import Dict

from typing import Dict

# Define the task to evaluate
openai_client = track_openai(openai.OpenAI())

@@ -175,6 +177,7 @@ You can learn more about the `opik` python SDK in the following sections:

Objects/Trace.rst
Objects/TraceData.rst
Objects/TracePublic.rst
Objects/Span.rst
Objects/SpanData.rst
Objects/FeedbackScoreDict.rst
38 changes: 38 additions & 0 deletions sdks/python/src/opik/api_objects/opik_client.py
@@ -7,6 +7,7 @@

from ..types import SpanType, UsageDict, FeedbackScoreDict
from . import (
    opik_query_language,
    span,
    trace,
    dataset,
@@ -494,6 +495,43 @@ def flush(self, timeout: Optional[int] = None) -> None:
        timeout = timeout if timeout is not None else self._flush_timeout
        self._streamer.flush(timeout)

    def search_traces(
        self,
        project_name: Optional[str] = None,
        filter_string: Optional[str] = None,
        max_results: int = 1000,
    ) -> List[trace_public.TracePublic]:
        """
        Search for traces in the given project.
        Args:
            project_name: The name of the project to search traces in. If not provided the project name configured when the Client was created will be used.
            filter_string: A filter string to narrow down the search. If not provided, all traces in the project will be returned up to the limit.
            max_results: The maximum number of traces to return.
        """

        page_size = 200
        traces: List[trace_public.TracePublic] = []

        filters = opik_query_language.OpikQueryLanguage(filter_string).parsed_filters

        page = 1
        while len(traces) < max_results:
            page_traces = self._rest_client.traces.get_traces_by_project(
                project_name=project_name or self._project_name,
                filters=filters,
                page=page,
                size=page_size,
            )

            if len(page_traces.content) == 0:
                break

            traces.extend(page_traces.content)
            page += 1

        return traces[:max_results]

    def get_trace_content(self, id: str) -> trace_public.TracePublic:
        """
        Args: