Commit

feat: Rename how-to implement incremental evaluation and make it more concise (#864)

TASK: IL-313
SebastianNiehusAA authored May 23, 2024
1 parent 48b09b0 commit 86c5e2e
Showing 4 changed files with 155 additions and 207 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -6,7 +6,7 @@
...

### New Features
-- Add `how_to_implement_complete_incremental_evaluation_flow`
+- Add `how_to_implement_incremental_evaluation`.

### Fixes
- The document index client now correctly URL-encodes document names in its queries.
2 changes: 1 addition & 1 deletion README.md
@@ -180,7 +180,7 @@ The how-tos are quick lookups about how to do things. Compared to the tutorials,
| [...retrieve data for analysis](./src/documentation/how_tos/how_to_retrieve_data_for_analysis.ipynb) | Retrieve experiment data in multiple different ways |
| [...implement a custom human evaluation](./src/documentation/how_tos/how_to_human_evaluation_via_argilla.ipynb) | Necessary steps to create an evaluation with humans as a judge via Argilla |
| [...implement elo evaluations](./src/documentation/how_tos/how_to_implement_elo_evaluations.ipynb) | Evaluate runs and create ELO ranking for them |
-| [...implement complete incremental evaluation flow](./src/documentation/how_tos/how_to_implement_complete_incremental_evaluation_flow.ipynb) | Run complete incremental evaluation flow from runner to aggretation
+| [...implement incremental evaluation](./src/documentation/how_tos/how_to_implement_incremental_evaluation.ipynb) | Implement and run an incremental evaluation
# Models

Currently, we support a bunch of models accessible via the Aleph Alpha API. Depending on your local setup, you may even have additional models available.

This file was deleted.

@@ -0,0 +1,153 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from documentation.how_tos.example_data import (\n",
"    DummyAggregationLogic,\n",
"    DummyEvaluation,\n",
"    DummyExample,\n",
"    example_data,\n",
")\n",
"from intelligence_layer.evaluation import (\n",
"    Aggregator,\n",
"    Example,\n",
"    IncrementalEvaluationLogic,\n",
"    IncrementalEvaluator,\n",
"    InMemoryAggregationRepository,\n",
"    InMemoryEvaluationRepository,\n",
"    SuccessfulExampleOutput,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# How to implement incremental evaluation\n",
"This notebook outlines how to evaluate incrementally, i.e., how to add new runs to an existing evaluation without recalculating the results of runs that were already evaluated.\n",
"\n",
"## Step-by-Step Guide\n",
"0. Run your tasks on the datasets on which you want to evaluate them (see [here](./how_to_run_a_task_on_a_dataset.ipynb))\n",
"    - When evaluating multiple runs, all of them need the same data types\n",
"1. Initialize all necessary repositories and define your `IncrementalEvaluationLogic`. It is similar to a normal `EvaluationLogic` (see [here](./how_to_implement_a_simple_evaluation_and_aggregation_logic.ipynb)), but you additionally have to implement your own `do_incremental_evaluate` method\n",
"2. Initialize an `IncrementalEvaluator` with the repositories and your custom `IncrementalEvaluationLogic`\n",
"3. Call the `evaluate_runs` method of the `IncrementalEvaluator`\n",
"4. Aggregate your evaluations using the [standard aggregation](./how_to_aggregate_evaluations.ipynb) or using a [custom aggregation logic](./how_to_implement_a_simple_evaluation_and_aggregation_logic.ipynb)\n",
"\n",
"#### Steps for adding new runs\n",
"5. Call the `evaluate_additional_runs` method of the `IncrementalEvaluator`:\n",
"    - `run_ids`: IDs of all runs to include in the evaluation results, including those that have been evaluated before\n",
"    - `previous_evaluation_ids`: IDs of earlier evaluations whose runs should **not** be re-evaluated; whether they are actually skipped depends on your implementation of the `do_incremental_evaluate` method\n",
"6. Aggregate all your `EvaluationOverview`s"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Step 0\n",
"examples = [\n",
"    DummyExample(input=\"input1\", expected_output=\"expected_output1\", data=\"data1\")\n",
"]\n",
"my_example_data = example_data()\n",
"\n",
"dataset_repository = my_example_data.dataset_repository\n",
"run_repository = my_example_data.run_repository\n",
"\n",
"# Step 1\n",
"evaluation_repository = InMemoryEvaluationRepository()\n",
"aggregation_repository = InMemoryAggregationRepository()\n",
"\n",
"\n",
"class DummyIncrementalEvaluationLogic(\n",
"    IncrementalEvaluationLogic[str, str, str, DummyEvaluation]\n",
"):\n",
"    def do_incremental_evaluate(\n",
"        self,\n",
"        example: Example[str, str],\n",
"        outputs: list[SuccessfulExampleOutput[str]],\n",
"        already_evaluated_outputs: list[list[SuccessfulExampleOutput[str]]],\n",
"    ) -> DummyEvaluation:\n",
"        return DummyEvaluation(eval=\"DummyEvalResult\")\n",
"\n",
"\n",
"# Step 2\n",
"incremental_evaluator = IncrementalEvaluator(\n",
"    dataset_repository,\n",
"    run_repository,\n",
"    evaluation_repository,\n",
"    \"My incremental evaluation\",\n",
"    DummyIncrementalEvaluationLogic(),\n",
")\n",
"\n",
"# Step 3\n",
"incremental_evaluator.evaluate_runs(my_example_data.run_overview_1.id)\n",
"\n",
"# Step 4\n",
"aggregation_logic = DummyAggregationLogic()\n",
"aggregator = Aggregator(\n",
"    evaluation_repository, aggregation_repository, \"MyAggregator\", aggregation_logic\n",
")\n",
"aggregation_overview = aggregator.aggregate_evaluation(\n",
"    *evaluation_repository.evaluation_overview_ids()\n",
")\n",
"print(aggregation_overview)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"## Addition of new task/run\n",
"# Step 5\n",
"# Pass the already evaluated run together with the new one.\n",
"# Assumption: `example_data()` also provides a second run, `run_overview_2`.\n",
"run_ids = [my_example_data.run_overview_1.id, my_example_data.run_overview_2.id]\n",
"incremental_evaluator.evaluate_additional_runs(\n",
"    *run_ids,\n",
"    previous_evaluation_ids=evaluation_repository.evaluation_overview_ids(),\n",
")\n",
"\n",
"# Step 6\n",
"second_aggregation_overview = aggregator.aggregate_evaluation(\n",
"    *evaluation_repository.evaluation_overview_ids()\n",
")\n",
"print(second_aggregation_overview)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
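
The notebook's `DummyIncrementalEvaluationLogic` ignores `already_evaluated_outputs`, so it does not show what "incremental" means in practice. The following is a minimal, library-free sketch of the idea, not the `intelligence_layer` implementation. `Output` and `incremental_evaluate` are hypothetical names, and the exact-match scoring is an assumption made only for illustration. It assumes that `outputs` holds the outputs of all requested runs and that `already_evaluated_outputs` groups the outputs covered by each earlier evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    """Hypothetical stand-in for one run's output on a single example."""

    run_id: str
    text: str


def incremental_evaluate(
    expected: str,
    outputs: list[Output],
    already_evaluated_outputs: list[list[Output]],
) -> dict[str, bool]:
    """Score only outputs whose run is not part of an earlier evaluation."""
    # Collect the run IDs that previous evaluations already covered.
    seen_run_ids = {
        output.run_id for group in already_evaluated_outputs for output in group
    }
    # Exact-match scoring is an illustrative assumption, not the library's metric.
    return {
        output.run_id: output.text == expected
        for output in outputs
        if output.run_id not in seen_run_ids
    }


first_run = [Output("run-1", "4")]
both_runs = first_run + [Output("run-2", "5")]

# Initial evaluation: nothing has been evaluated yet.
initial_result = incremental_evaluate("4", first_run, already_evaluated_outputs=[])
# Additional run: "run-1" is skipped because an earlier evaluation covers it.
additional_result = incremental_evaluate(
    "4", both_runs, already_evaluated_outputs=[first_run]
)
print(initial_result)
print(additional_result)
```

In this sketch the second call returns a result for `run-2` only, which mirrors passing all run IDs plus `previous_evaluation_ids` to `evaluate_additional_runs` in Step 5.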
