Introduce evaluate_experiment method #630
Conversation
…-name-handle' into jacques/experiment-improvements
evaluate_experiment method
💯 🥇
Much needed feature!
evaluate_experiment(
    experiment_name="surprised_herb_1466",
[nits] Hi @jacques-comet, I have been using the Comet dashboard and it seems like experiment names can be the same even for different runs. Is the intention of this function to re-evaluate scores for all matching names?
@JehandadK Yes, this function would re-evaluate all experiments with the same name. I'll do a follow-up PR to support experiment IDs (and a small FE update to show these IDs) so that you can re-score only one of these experiments.
I left a few comments regarding the code organization, but from the functional side it looks good 👍
Two points to sum up:
- In general, we should try not to go with client._rest_client when it's possible. Whenever we need it somewhere, it likely means that we can add a new convenience method to our API objects: Opik, Experiment, Dataset, Prompt, ... (if it is not there already). This will help us enrich our public API and keep the code base cleaner. A lot of REST-related logic spread across the modules (especially in places like utils) will become very messy and hard to maintain very quickly. See the sketch after this list.
- An e2e test for this use case would be nice, either in this PR or in another one (if not in this one, let's open a ticket/issue so we don't forget).
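To illustrate the first point, here is a minimal sketch of the suggested pattern, assuming a simplified Experiment API object; the class, method, and endpoint names are illustrative stand-ins, not the real SDK surface:

```python
from typing import Any, Dict, List


class Experiment:
    """Illustrative API object; the real class carries more responsibilities."""

    def __init__(self, id: str, rest_client: Any) -> None:
        self._id = id
        self._rest_client = rest_client  # stays private to the API object

    def get_items(self, max_results: int = 100) -> List[Dict[str, Any]]:
        # Hypothetical endpoint call: the point is that REST details live here,
        # so call sites use experiment.get_items() instead of spreading
        # client._rest_client... calls across utils modules.
        return self._rest_client.experiments.get_experiment_items(
            experiment_id=self._id, limit=max_results
        )
```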
test_cases = []
page = 1

while True:
We should move the logic for getting experiment items to opik.api_objects.experiment.Experiment.get_items(), similar to the logic we have for dataset items.
Regarding the endpoint used, I think stream_experiment_items would be better than the paginated one designed for the frontend.
@alexkuzmik I tried to do this but I couldn't get it to work: the stream endpoint doesn't return the input or output, just the trace_id. I then tried to move the logic I have that uses the FE endpoint, but that requires a dataset_id rather than a dataset_name, which made the whole logic a lot more complicated.
So for now I recommend we keep it as is and come back later to clean up the code.
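For reference, a minimal sketch of the page-based retrieval being kept for now; the rest-client method and response fields below are assumptions based on the diff in this PR, not the actual SDK signatures:

```python
from typing import Any, List


def get_experiment_test_cases(rest_client: Any, experiment_name: str) -> List[Any]:
    """Page through experiment items until an empty page is returned."""
    test_cases: List[Any] = []
    page = 1

    while True:
        # Hypothetical paginated call mirroring the FE endpoint discussed above.
        current_page = rest_client.experiments.find_experiment_items(
            experiment_name=experiment_name, page=page, size=100
        )
        if not current_page.content:
            break  # reached the last page
        test_cases.extend(current_page.content)
        page += 1

    return test_cases
```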
    return project_metadata.name


def get_experiment_test_cases(
After we extract the get_items logic, the function should become:
def experiment_items_to_test_cases(experiment_items: List[ExperimentItem]) -> List[test_case.TestCase]
The items will be retrieved in evaluator.py before this function is called.
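A rough sketch of what that pure conversion could look like, with stand-in dataclasses so the example is self-contained; the real ExperimentItem and test_case.TestCase types and their fields differ:

```python
from dataclasses import dataclass
from typing import Any, Dict, List


# Stand-in models for illustration only; the SDK's ExperimentItem and
# test_case.TestCase have their own (different) fields.
@dataclass
class ExperimentItem:
    trace_id: str
    dataset_item_data: Dict[str, Any]
    output: Dict[str, Any]


@dataclass
class TestCase:
    trace_id: str
    dataset_item_data: Dict[str, Any]
    task_output: Dict[str, Any]


def experiment_items_to_test_cases(
    experiment_items: List[ExperimentItem],
) -> List[TestCase]:
    # Pure conversion, no REST calls: the items are retrieved by the caller
    # (evaluator.py) and passed in, as suggested above.
    return [
        TestCase(
            trace_id=item.trace_id,
            dataset_item_data=item.dataset_item_data,
            task_output=item.output,
        )
        for item in experiment_items
    ]
```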
        verbose: an integer value that controls evaluation output logs such as summary and tqdm progress bar.
    """
    start_time = time.time()
    # Get the experiment object
Please remove such comments from the code.
We should leave comments only when there is non-trivial logic that is hard to understand without context (comments are usually for "Why?" questions, not "What?").
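For example, a "what" comment that merely restates the code versus a "why" comment that records context; the function and client call below are hypothetical:

```python
from typing import Any


def _example(client: Any, experiment_name: str) -> Any:
    # "What" comment (remove): it just repeats the next line.
    # Get the experiment object
    experiment = client.get_experiment(experiment_name)  # hypothetical call

    # "Why" comment (keep): explains a decision the code alone can't.
    # The stream endpoint currently returns only trace_id, so we page through
    # the FE endpoint instead until streaming exposes input/output.
    return experiment
```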
Done
Details
Introduces a new evaluate_experiment function that can be used to update the scores of an existing experiment. This can be useful when iterating on scoring metrics so you don't have to keep rerunning an evaluation.
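A rough usage sketch based on the snippet discussed above; the import path and parameters are assumptions rather than the final public API:

```python
# Re-score an existing experiment with an updated set of metrics,
# without re-running the original evaluation task.
from opik.evaluation import evaluate_experiment
from opik.evaluation.metrics import Hallucination  # any scoring metric

evaluate_experiment(
    experiment_name="surprised_herb_1466",  # name shown in the diff above
    scoring_metrics=[Hallucination()],
)
```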
Testing
This was tested by running the evaluate_existing_experiment.py script; extensive testing for edge cases was not performed.
Documentation
Documentation was updated (both reference and guides)