-
Notifications
You must be signed in to change notification settings - Fork 6
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
feat: Rename how-to implement incremental evaluation and make it more…
… concise TASK: IL-313
- Loading branch information
1 parent
85ec00a
commit 8274c36
Showing
4 changed files
with
155 additions
and
207 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
205 changes: 0 additions & 205 deletions
205
src/documentation/how_tos/how_to_implement_complete_incremental_evaluation_flow.ipynb
This file was deleted.
Oops, something went wrong.
153 changes: 153 additions & 0 deletions
153
src/documentation/how_tos/how_to_implement_incremental_evaluation.ipynb
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,153 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from documentation.how_tos.example_data import (\n", | ||
" DummyAggregationLogic,\n", | ||
" DummyEvaluation,\n", | ||
" DummyExample,\n", | ||
" example_data,\n", | ||
")\n", | ||
"from intelligence_layer.evaluation import (\n", | ||
" Aggregator,\n", | ||
" Example,\n", | ||
" IncrementalEvaluationLogic,\n", | ||
" IncrementalEvaluator,\n", | ||
" InMemoryAggregationRepository,\n", | ||
" InMemoryEvaluationRepository,\n", | ||
" SuccessfulExampleOutput,\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# How to implement incremental evaluation\n", | ||
"This notebook outlines how to perform evaluations in an incremental fashion, i.e., adding additional runs to your existing evaluations without the need for recalculation.\n", | ||
" \n", | ||
"## Step-by-Step Guide\n", | ||
"0. Run your tasks on the datasets on which you want to evaluate them (see [here](./how_to_run_a_task_on_a_dataset.ipynb))\n", | ||
" - When evaluating multiple runs, all of them need the same data types \n", | ||
"1. Initialize all necessary repositories and define your `IncrementalEvaluationLogic`; It is similar to a normal `EvaluationLogic` (see [here](./how_to_implement_a_simple_evaluation_and_aggregation_logic.ipynb)) but you additionally have to implement your own `do_incremental_evaluate` method\n", | ||
"2. Initialize an `IncrementalEvaluator` with the repositories and your custom `IncrementalEvaluationLogic`\n", | ||
"3. Call the `evaluate_runs` method of the `IncrementalEvaluator`\n", | ||
"4. Aggregate your evaluations using the [standard aggregation](./how_to_aggregate_evaluations.ipynb) or using a [custom aggregation logic](./how_to_implement_a_simple_evaluation_and_aggregation_logic.ipynb)\n", | ||
"\n", | ||
"#### Steps for addition of new runs \n", | ||
"5. Call the `evaluate_additional_runs` method of the `IncrementalEvaluator`:\n", | ||
" - `run_ids`: Runs to be included in the evaluation results, including those that have been evaluated before\n", | ||
" - `previous_evaluation_ids`: Runs **not** to be re-evaluated, depending on the specific implementation of the `do_incremental_evaluate` method\n", | ||
"6. Aggregate all your `EvaluationOverview`s" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Step 0\n", | ||
"examples = [\n", | ||
" DummyExample(input=\"input1\", expected_output=\"expected_output1\", data=\"data1\")\n", | ||
"]\n", | ||
"my_example_data = example_data()\n", | ||
"\n", | ||
"dataset_repository = my_example_data.dataset_repository\n", | ||
"run_repository = my_example_data.run_repository\n", | ||
"\n", | ||
"# Step 1\n", | ||
"evaluation_repository = InMemoryEvaluationRepository()\n", | ||
"aggregation_repository = InMemoryAggregationRepository()\n", | ||
"\n", | ||
"\n", | ||
"class DummyIncrementalEvaluationLogic(\n", | ||
" IncrementalEvaluationLogic[str, str, str, DummyEvaluation]\n", | ||
"):\n", | ||
" def do_incremental_evaluate(\n", | ||
" self,\n", | ||
" example: Example[str, str],\n", | ||
" outputs: list[SuccessfulExampleOutput[str]],\n", | ||
" already_evaluated_outputs: list[list[SuccessfulExampleOutput[str]]],\n", | ||
" ) -> DummyEvaluation:\n", | ||
" return DummyEvaluation(eval=\"DummyEvalResult\")\n", | ||
"\n", | ||
"\n", | ||
"# Step 2\n", | ||
"incremental_evaluator = IncrementalEvaluator(\n", | ||
" dataset_repository,\n", | ||
" run_repository,\n", | ||
" evaluation_repository,\n", | ||
" \"My incremental evaluation\",\n", | ||
" DummyIncrementalEvaluationLogic(),\n", | ||
")\n", | ||
"\n", | ||
"# Step 3\n", | ||
"incremental_evaluator.evaluate_runs(my_example_data.run_overview_1.id)\n", | ||
"\n", | ||
"# Step 4\n", | ||
"aggregation_logic = DummyAggregationLogic()\n", | ||
"aggregator = Aggregator(\n", | ||
" evaluation_repository, aggregation_repository, \"MyAggregator\", aggregation_logic\n", | ||
")\n", | ||
"aggregation_overview = aggregator.aggregate_evaluation(\n", | ||
" *evaluation_repository.evaluation_overview_ids()\n", | ||
")\n", | ||
"print(aggregation_overview)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"## Addition of new task/run\n", | ||
"# Step 5\n", | ||
"run_ids = [my_example_data.run_overview_1.id, my_example_data.run_overview_1.id]\n", | ||
"incremental_evaluator.evaluate_additional_runs(\n", | ||
" *run_ids,\n", | ||
" previous_evaluation_ids=evaluation_repository.evaluation_overview_ids(),\n", | ||
")\n", | ||
"\n", | ||
"# Step 6\n", | ||
"second_aggregation_overview = aggregator.aggregate_evaluation(\n", | ||
" *evaluation_repository.evaluation_overview_ids()\n", | ||
")\n", | ||
"print(second_aggregation_overview)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.11.8" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |