Commit
feat: Add how to for elo evaluations
refactor: Move elo evaluation logic from "elo_evaluator.py" into "incremental_evaluator.py"
TASK: IL-502
1 parent c34d1ed, commit b8376fc
Showing 9 changed files with 257 additions and 133 deletions.
96 changes: 96 additions & 0 deletions
src/documentation/how_tos/how_to_implement_elo_evaluations.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,96 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from documentation.how_tos.example_data import DummyEloEvaluationLogic, example_data\n", | ||
"from intelligence_layer.evaluation import (\n", | ||
" IncrementalEvaluator,\n", | ||
" InMemoryEvaluationRepository,\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# How to implement Elo evaluations\n", | ||
"0. Run your tasks on the datasets you want to evaluate (see [here](./how_to_run_a_task_on_a_dataset.ipynb))\n", | ||
" - When evaluating multiple runs, all of them must use the same data types\n", | ||
"1. Initialize all necessary repositories for the `IncrementalEvaluator`, and an `EloEvaluationLogic` that is specific to your use case.\n", | ||
"2. Run the evaluator to evaluate all examples and create a single `EvaluationOverview`\n", | ||
"3. (Optional) Save the evaluation id for later use" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Example" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Step 0\n", | ||
"\n", | ||
"\n", | ||
"my_example_data = example_data()\n", | ||
"print()\n", | ||
"run_ids = [my_example_data.run_overview_1.id, my_example_data.run_overview_2.id]\n", | ||
"\n", | ||
"# Step 1\n", | ||
"dataset_repository = my_example_data.dataset_repository\n", | ||
"run_repository = my_example_data.run_repository\n", | ||
"evaluation_repository = InMemoryEvaluationRepository()\n", | ||
"evaluation_logic = DummyEloEvaluationLogic()\n", | ||
"\n", | ||
"# Step 2\n", | ||
"evaluator = IncrementalEvaluator(\n", | ||
" dataset_repository,\n", | ||
" run_repository,\n", | ||
" evaluation_repository,\n", | ||
" \"My dummy evaluation\",\n", | ||
" evaluation_logic,\n", | ||
")\n", | ||
"\n", | ||
"evaluation_overview = evaluator.evaluate_runs(*run_ids)\n", | ||
"\n", | ||
"# Step 3\n", | ||
"print(evaluation_overview.id)" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "intelligence-layer-aL2cXmJM-py3.11", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.11.8" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
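
The notebook above wires an `EloEvaluationLogic` into the `IncrementalEvaluator` but does not show how pairwise comparisons become ratings. As a rough illustration only, here is a minimal sketch of the standard Elo update in plain Python. It is not the code from `incremental_evaluator.py`; the function names, the starting rating of 1500, and the K-factor of 20 are all assumptions made for this example.

```python
from typing import Dict, Iterable, Tuple

START_RATING = 1500.0  # assumed starting rating for unseen players
K_FACTOR = 20.0  # assumed K-factor; controls how far one match moves a rating


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation: probability-like score of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def apply_match(
    ratings: Dict[str, float],
    player_a: str,
    player_b: str,
    score_a: float,
    k: float = K_FACTOR,
) -> None:
    """Update ratings in place. score_a is 1.0 (A wins), 0.5 (draw) or 0.0 (A loses)."""
    rating_a = ratings.get(player_a, START_RATING)
    rating_b = ratings.get(player_b, START_RATING)
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    ratings[player_a] = rating_a + delta
    ratings[player_b] = rating_b - delta


def rate(matches: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Replay (player_a, player_b, score_a) matches in order and return final ratings."""
    ratings: Dict[str, float] = {}
    for player_a, player_b, score_a in matches:
        apply_match(ratings, player_a, player_b, score_a)
    return ratings
```

Under these assumptions, a single win between two previously unrated runs, `rate([("run_1", "run_2", 1.0)])`, moves the winner to 1510 and the loser to 1490. In the notebook's setting the "players" would be the runs being compared and each example would contribute one match, but how the library schedules and grades those matches is not shown in this diff.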
120 changes: 0 additions & 120 deletions
src/intelligence_layer/evaluation/evaluation/evaluator/elo_evaluator.py
This file was deleted.