doc(changelog): Update CHANGELOG.md with release of version 0.7.0
TASK: IL-412
MerlinKallenbornTNG committed Mar 28, 2024
1 parent a3599a0 commit ec47108
Showing 1 changed file with 136 additions and 8 deletions.
144 changes: 136 additions & 8 deletions CHANGELOG.md
@@ -1,17 +1,145 @@
# Changelog

## Unreleased
## 0.7.0

- The elo-calculation logic has been heavily simplified
- `Payoff` from the elo package has been renamed to `Match`
- `PayoffMatrix` from the elo package has been renamed to `MatchOutcome` and is now pydantic (de)-serializable
- `SingleChunkQa` now uses a logit_bias to promote not answering for German
- `__init__`-parameters of all tasks are streamlined:
### Breaking Changes
- breaking change: FScores are now correctly exposed as FScores and no longer as RougeScores
- breaking change: HuggingFaceAggregationRepository and HuggingFaceDatasetRepository now consistently follow the same folder structure as FileDatasetRepository when creating datasets. This means that datasets will be stored in a folder named datasets, with additional sub-folders named according to the respective dataset ID.
- breaking change: Split run_repository into file_run_repository and in_memory_run_repository.
- breaking change: Split evaluation_repository into argilla_evaluation_repository, file_evaluation_repository and in_memory_evaluation_repository
- breaking change: Split dataset_repository into file_dataset_repository and in_memory_dataset_repository
- breaking change: Split aggregation_repository into file_aggregation_repository and in_memory_aggregation_repository
- breaking change: Renamed evaluation/run.py to evaluation/run_evaluator.py
- breaking change: Split evaluation/domain and distribute it across aggregation, evaluation, dataset and run packages.
- breaking change: Split evaluation/argilla and distribute it across aggregation and evaluation packages.
- breaking change: Split evaluation into separate dataset, run, evaluation and aggregation packages.
- breaking change: Split evaluation/hugging_face.py into dataset and aggregation repository files in data_storage package.
- breaking change: create_dataset now returns the new Dataset type instead of a dataset ID.
- breaking change: Consistent naming for repository root directories when creating evaluations or aggregations:
  - .../eval → .../evaluations and .../aggregation → .../aggregations.
- breaking change: Core tasks no longer provide defaults for the applied models.
- breaking change: Methods returning entities from repositories now return the results ordered by their IDs.
- breaking change: Renamed crashed_during_eval_count to crashed_during_evaluation_count in AggregationOverview.
- breaking change: Renamed create_evaluation_dataset to initialize_evaluation in EvaluationRepository.
- breaking change: Renamed to_explanation_response to to_explanation_request in ExplainInput.
- breaking change: Removed TextHighlight::text in favor of TextHighlight::start and TextHighlight::end
- breaking change: Removed `IntelligenceApp` and `IntelligenceStarterApp`
- breaking change: RetrieverBasedQa now uses MultiChunkQa instead of a generic task or SingleChunkQa
- breaking change: EvaluationRepository failed_example_evaluations is no longer abstract
- breaking change: Elo calculation simplified:
  - Payoff from elo package has been removed
  - PayoffMatrix from elo package renamed to MatchOutcome
- SingleChunkQa uses logit_bias to promote not answering for German
- breaking change: Remove ChunkOverlap task.
- breaking change: Rename Chunk to TextChunk.
- breaking change: Rename ChunkTask to Chunk.
- breaking change: Rename EchoTask to Echo.
- breaking change: Rename TextHighlightTask to TextHighlight
- breaking change: Rename ChunkOverlapTask to ChunkOverlap
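
For the removal of `TextHighlight::text` listed above, callers that still need the highlighted string can slice it out of the original text. The following is only a minimal sketch: it assumes `start` and `end` are character offsets into the source text with an exclusive `end`, and the `Highlight` dataclass and `highlight_text` helper are hypothetical stand-ins, not the library's own classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Highlight:
    """Hypothetical stand-in for a highlight that only carries offsets."""

    start: int  # assumed: index of the first highlighted character
    end: int  # assumed: index one past the last highlighted character


def highlight_text(source: str, highlight: Highlight) -> str:
    """Recover the highlighted substring from the original text."""
    return source[highlight.start:highlight.end]


source = "The quick brown fox"
print(highlight_text(source, Highlight(start=4, end=9)))  # prints "quick"
```

If the real offsets turn out to be inclusive, the slice would need `highlight.end + 1` instead.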

### New Features

- Aggregation:
- feature: InstructComparisonArgillaAggregationLogic uses full evaluation set instead of sample for aggregation

- Documentation

- feature: Added How-To’s (linked in the README):
  - how to define a task
  - how to implement a task
  - how to create a dataset
  - how to run a task on a dataset
  - how to perform aggregation
  - how to evaluate runs
- feature: Restructured and cleaned up README for more conciseness.
- feature: Add illustrations to Concepts.md.
- feature: Added tutorial for adding task to a FastAPI app (linked in README).
- feature: Improved and added various DocStrings.
- feature: Added a README section about the client URL.
- feature: Add python naming convention to README

- Classify
- feature: PromptBasedClassify now supports changing of the prompt instruction via the instruction parameter.
- feature: Add default model for PromptBasedClassify
- feature: Add default task for PromptBasedClassify

- Evaluation
- feature: All repositories will raise a ValueError when trying to access an entry of a dataset that does not exist. If only the dataset itself is retrieved, they will return None.
- `ArgillaEvaluationRepository` now handles failed evaluations.
- feature: Added SingleHuggingfaceDatasetRepository.
- feature: Added HighlightCoverageGrader.
- feature: Added LanguageMatchesGrader.

- feature: Added prettier default printing behavior of repository entities by providing overloads to __str__ and __repr__ methods.

- feature: Added abstract HuggingFace repository base-class.

- feature: Refactoring of HuggingFace repository

- feature: Added HuggingFaceAggregationRepository.
- feature: Added template method to individual repository
- feature: Added Dataset model to dataset repository. This allows storing a short descriptive name for the dataset for easier identification
- feature: SingleChunkQa internally now uses the same model in TextHighlight by default.
- feature: MeanAccumulator tracks standard deviation and standard error
- feature: EloCalculator now updates ranking after each match
- feature: Add data selection methods to repositories:
  - AggregationRepository::aggregation_overviews
  - EvaluationRepository::run_overviews
  - EvaluationRepository::run_overview_ids
  - EvaluationRepository::example_output
  - EvaluationRepository::example_outputs
  - EvaluationRepository::example_output_ids
  - EvaluationRepository::example_trace
  - EvaluationRepository::example_tracer
  - RunRepository::run_overviews
  - RunRepository::run_overview_ids
  - RunRepository::example_output
  - RunRepository::example_outputs
  - RunRepository::example_output_ids
  - RunRepository::example_trace
  - RunRepository::example_tracer

- feature: Evaluator continues in case of no successful outputs
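
To illustrate what tracking standard deviation and standard error alongside a mean can look like (see the MeanAccumulator entry above), here is a minimal sketch. `RunningMean` is a hypothetical class written for this example, not the library's MeanAccumulator. It assumes the population standard deviation and a standard error of `std / sqrt(n)`; the actual implementation may use different conventions.

```python
import math


class RunningMean:
    """Hypothetical accumulator for mean, standard deviation and standard error."""

    def __init__(self) -> None:
        self._n = 0
        self._sum = 0.0
        self._sum_of_squares = 0.0

    def add(self, value: float) -> None:
        # Only three running totals are kept; individual values are not stored.
        self._n += 1
        self._sum += value
        self._sum_of_squares += value * value

    def mean(self) -> float:
        return self._sum / self._n if self._n else 0.0

    def standard_deviation(self) -> float:
        # Population variance: E[x^2] - (E[x])^2, clamped against rounding noise.
        if self._n == 0:
            return 0.0
        variance = self._sum_of_squares / self._n - self.mean() ** 2
        return math.sqrt(max(variance, 0.0))

    def standard_error(self) -> float:
        # Assumed definition: standard deviation divided by sqrt(n).
        if self._n == 0:
            return 0.0
        return self.standard_deviation() / math.sqrt(self._n)


acc = RunningMean()
for value in (1.0, 2.0, 3.0, 4.0):
    acc.add(value)
print(acc.mean(), acc.standard_deviation(), acc.standard_error())
```

Keeping only running totals means the statistics can be updated one evaluation at a time without holding all values in memory.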

- Q & A

- feature: Define default parameters for LongContextQa, SingleChunkQa
- feature: Define default task for RetrieverBasedQa
- feature: Define default model for KeyWordExtract, MultiChunkQa
- feature: Improved focus of highlights in TextHighlight tasks.
- feature: Added filtering for TextHighlight tasks.
- feature: Introduce logit_bias to SingleChunkQa

- Summarize
- feature: Added RecursiveSummarizeInput.
- feature: Define defaults for SteerableSingleChunkSummarize, SteerableLongContextSummarize, RecursiveSummarize

- Tracer
- feature: Added better trace viewer integration:
  - Add trace storage to trace viewer server
  - Add submit_to_tracer_viewer method to InMemoryTracer
  - UI and navigation improvements for trace viewer
  - Add exception handling for tracers during log entry writing

- Others

- feature: The following classes are now exposed:
  - DocumentChunk
  - MultipleChunkQaOutput
  - Subanswer
- feature: Simplified internal imports.
- feature: Streamlining of `__init__`-parameters of all tasks
  - Sub-tasks are typically exposed as `__init__`-parameters with sensible defaults.
  - Defaults for non-trivial parameters like models or tasks are defined in `__init__` while the default parameter is `None`.
  - Instead of exposing parameters that are passed on to sub-tasks the sub-tasks themselves are exposed.
- `IntelligenceApp` and `IntelligenceStarterApp` have been removed.
- feature: Update supported models
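
The `__init__`-parameter convention described above (sub-tasks exposed as parameters, `None` as the declared default, the real default created inside `__init__`) might look roughly like the sketch below. All class names, the model name and the `run` methods are hypothetical and chosen only for illustration; they are not taken from the library.

```python
from typing import Optional


class Model:
    """Hypothetical stand-in for a model wrapper."""

    def __init__(self, name: str) -> None:
        self.name = name


class SummarizeChunk:
    """Hypothetical sub-task with a non-trivial default (a model)."""

    def __init__(self, model: Optional[Model] = None) -> None:
        # The declared default is None; the actual default is built here.
        self._model = model if model is not None else Model("default-model")

    def run(self, text: str) -> str:
        return f"[{self._model.name}] {text[:20]}"


class SummarizeDocument:
    """Hypothetical task that exposes its sub-task instead of the sub-task's parameters."""

    def __init__(self, summarize_chunk: Optional[SummarizeChunk] = None) -> None:
        self._summarize_chunk = (
            summarize_chunk if summarize_chunk is not None else SummarizeChunk()
        )

    def run(self, chunks: list[str]) -> list[str]:
        return [self._summarize_chunk.run(chunk) for chunk in chunks]


# Default wiring, and a caller swapping in a differently configured sub-task.
default_task = SummarizeDocument()
custom_task = SummarizeDocument(SummarizeChunk(Model("custom-model")))
print(default_task.run(["hello world"]))
print(custom_task.run(["hello world"]))
```

Using `None` as the declared default avoids constructing a model or task at import time and keeps a single shared default instance from leaking between task objects.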

### Fixes

- fix: Fixed exception handling in language detection of LanguageMatchesGrader.
- fix: Fixed a bug that could lead to cut-off highlight ranges in TextHighlight tasks.
- fix: Fixed list_ids methods to use path_to_str
- fix: Disallow traces without end in the trace viewer
- fix: ArgillaClient now correctly uses provided API-URL instead of hard-coded localhost

## 0.6.0
