Releases: ecmwf/anemoi-training
0.3.0 - Loss & Callback Refactors
Config Updates
Due to large refactors of the loss functions and callbacks, you are strongly advised to reset your configs:
anemoi-training config generate --override
In particular, the training and diagnostics configurations have changed substantially.
Added
- training_loss
- validation_metrics
- variable_loss_scaling
Removed
- loss_scaling
Changed
- callbacks
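The new variable_loss_scaling option weights the training loss per variable. The sketch below shows the general idea as a per-variable weighted mean squared error. The function name `variable_scaled_mse`, the dict-of-lists layout and the example variable names are hypothetical. They do not reflect the actual variable_loss_scaling config schema or the anemoi-training loss classes.

```python
from typing import Mapping, Sequence


def variable_scaled_mse(
    pred: Mapping[str, Sequence[float]],
    target: Mapping[str, Sequence[float]],
    scaling: Mapping[str, float],
    default: float = 1.0,
) -> float:
    """Hypothetical per-variable weighted MSE (not the anemoi-training API).

    Each variable's mean squared error is multiplied by its scaling factor
    (falling back to `default` when the variable has no entry), and the
    weighted errors are then averaged over the number of variables.
    """
    total = 0.0
    for name, p in pred.items():
        t = target[name]
        if len(p) != len(t):
            raise ValueError(f"shape mismatch for variable {name!r}")
        mse = sum((a - b) ** 2 for a, b in zip(p, t)) / len(p)
        total += scaling.get(name, default) * mse
    return total / len(pred)
```

For example, down-weighting a hypothetical "tp" variable by 0.5 halves its contribution relative to "2t", while variables missing from `scaling` keep a weight of 1.0.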
What's Changed
- Feature/improve loss functions by @HCookie in #70
- Refactor Callbacks by @HCookie in #60
- Bind max steps and lr iterations by @Rilwan-Adewoyin in #67
- feature: sub hour timestepping by @JesperDramsch in #63
- added config for bounding by @gabrieloks in #10
- fix: metric ranges in the validation space not the normalized space by @sahahner in #116
- fix: enable learningrate monitor automatically by @sahahner in #119
- Mlflow benchmark profiler update by @anaprietonem in #38
- Rename frequency to batch_frequency in RolloutEval by @HCookie in #118
- Add expansion of params to logger by @HCookie in #91
- Fix missing checkpoint callbacks by @HCookie in #125
- Change how mlflow measures CPU memory usage by @cathalobrien in #94
- Graph config for stretched grid graph by @havardhhaugen in #133
- Graph config for limited area models (LAMs) by @JPXKQX in #134
- Feature/update graph callbacks by @JPXKQX in #135
- fix: Rename loss_scaling to variable_loss_scaling by @HCookie in #138
- fix: Allow updates to scalars & other improvements by @HCookie in #137
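Sub-hour timestepping (#63) means the model timestep can be given in minutes rather than whole hours. The sketch below turns frequency strings such as "30m" or "6h" into datetime.timedelta objects and derives the time offsets of the multistep input states. The helper names `parse_frequency` and `input_offsets`, and the accepted string format, are assumptions for illustration, not the anemoi-training implementation.

```python
import re
from datetime import timedelta

# Hypothetical unit suffixes; the real config syntax may differ.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_frequency(freq: str) -> timedelta:
    """Parse strings like '30m', '6h' or '1d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", freq.strip())
    if match is None:
        raise ValueError(f"unrecognised frequency: {freq!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


def input_offsets(timestep: str, multistep: int) -> list[timedelta]:
    """Time offsets of the `multistep` input states, relative to the most recent one.

    With timestep='30m' and multistep=2 this gives [-30 minutes, 0].
    """
    step = parse_frequency(timestep)
    return [step * (i - multistep + 1) for i in range(multistep)]
```

Working with timedelta objects rather than integer hours is what makes a 30-minute step representable at all.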
New Contributors
- @Rilwan-Adewoyin made their first contribution in #67
- @gabrieloks made their first contribution in #10
- @cathalobrien made their first contribution in #94
- @havardhhaugen made their first contribution in #133
Full Changelog: 0.2.2...0.3.0
0.2.2 - Maintenance: pin python <3.13
What's Changed
This release pins Python to <3.13 because a required dependency does not yet provide a distribution for Python 3.13. Support for Python 3.13 is expected in the near future.
Changed
- Lock Python version to <3.13 #107
Commit Log
- Fix version import by @HCookie in #104
- Add contributors file. by @mchantry in #106
- Lock python version to <3.13 by @HCookie in #107
New Contributors
Full Changelog: https://github.com/ecmwf/anemoi-training/blob/develop/CHANGELOG.md
0.2.1 - Bugfix: resuming mlflow runs
What's Changed
Added
- Mlflow-sync now includes a new tag for server-to-server syncing #83
- Mlflow-sync can now resume and fork server-to-server runs #83
- Rollout training for Limited Area Models. #79
- Feature: New Boolean1DMask class. Enables rollout training for limited area models. #79
Fixed
- Mlflow-sync now handles the creation of new experiments on the remote server #83
- Fix multi-GPU training when using mlflow, via a refactor of the _get_mlflow_run_params function #99
- ci: fix pyshtools install error #100
Changed
- Update copyright notice
Commit Log
- Add output mask by @JPXKQX in #79
- Fix/mlflow sync tag by @anaprietonem in #83
- Ci/fix pyshtools installation error by @theissenhelen in #100
- Update copyright notice by @b8raoult in #101
- Fix/mlflow multi gpu by @anaprietonem in #99
Full Changelog: https://github.com/ecmwf/anemoi-training/blob/develop/CHANGELOG.md#021---bugfix-resuming-mlflow-runs---2024-10-24
0.2.0 - Feature release
What's Changed
This release brings some changes to the default config. Use the anemoi-training config CLI to regenerate your default config.
Added
Miscellaneous
- The introduction of the remapper in anemoi-models changes the data indices. Some preprocessors can no longer be applied in place.
Functionality
- Enable the callback for plotting a histogram for variables containing NaNs
- Enforce same binning for histograms comparing true data to predicted data
- Fix: Inference checkpoints are now saved according to the frequency settings defined in the config #37
- Feature: Add configurable models #50
- Feature: Authentication support for mlflow sync - #51
- Feature: Support training for datasets with missing time steps #48
- Feature: AnemoiMlflowClient, an mlflow client with authentication support #86
- Long Rollout Plots
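Enforcing the same binning for true and predicted data means both histograms share one set of bin edges, and plotting variables that contain NaNs requires the edges and counts to ignore those NaNs. A minimal stdlib sketch, assuming equal-width bins over the combined range of finite values (NaNs and infinities are skipped). The functions `shared_bin_edges` and `histogram` are hypothetical and not taken from the anemoi-training plotting callbacks.

```python
import math
from typing import Sequence


def shared_bin_edges(truth: Sequence[float], pred: Sequence[float], n_bins: int) -> list[float]:
    """Equal-width bin edges spanning the finite values of both inputs."""
    values = [v for v in (*truth, *pred) if math.isfinite(v)]
    if not values:
        raise ValueError("no finite values to bin")
    lo, hi = min(values), max(values)
    if lo == hi:  # avoid zero-width bins for constant fields
        hi = lo + 1.0
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def histogram(data: Sequence[float], edges: Sequence[float]) -> list[int]:
    """Count finite values per bin; the last bin includes its right edge."""
    counts = [0] * (len(edges) - 1)
    width = edges[1] - edges[0]
    for v in data:
        if not math.isfinite(v) or v < edges[0] or v > edges[-1]:
            continue
        idx = min(int((v - edges[0]) / width), len(counts) - 1)
        counts[idx] += 1
    return counts
```

Because both histograms are computed from the same `edges`, their bars line up and can be compared directly.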
Fixed
- Fix TypeError raised when trying to JSON-serialise a datetime.timedelta object - #43
- Bugfixes for CI (#56)
- Fix mlflow subcommand on Python 3.9 #62
- Show correct subcommand in MLflow - Addresses #39 in #61
- Fix interactive multi-GPU training #82
- Allow 500 characters in mlflow logging #88
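The TypeError fixed in #43 is what Python's json module raises for objects it cannot encode, such as datetime.timedelta. One standard remedy, shown here as a sketch and not necessarily the approach taken in #43, is to pass a `default` hook to json.dumps that converts such objects to strings. The `_to_jsonable` helper and the example config keys are hypothetical.

```python
import json
from datetime import timedelta


def _to_jsonable(obj):
    """Fallback encoder for json.dumps: stringify timedeltas, reject everything else."""
    if isinstance(obj, timedelta):
        return str(obj)  # e.g. timedelta(hours=6) -> '6:00:00'
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Hypothetical config fragment containing a timedelta value.
config = {"timestep": timedelta(hours=6), "multistep": 2}

# json.dumps(config) on its own raises TypeError; the default hook avoids it.
encoded = json.dumps(config, default=_to_jsonable)
```

Here `encoded` is the string `{"timestep": "6:00:00", "multistep": 2}`. Raising TypeError for unknown types keeps json's usual error for anything that is not a timedelta.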
Changed
- Updated configuration examples in documentation and corrected links - #46
- Remove credential prompt from mlflow login; replace it with seeding a refresh token via the web - #78
- Update CODEOWNERS
Commit Log
- fix: histogram only on non-nan values by @sahahner in #15
- fix: saving frequency bug for inference checkpoints by @anaprietonem in #37
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #36
- feature: make models instantiable by @JesperDramsch in #50
- Feat: adding in ability to configure precip like plots by @da-ewanp in #49
- Changes to data indices in anemoi models by @sahahner in #17
- fix: remapped variables in tests need to be dictionaries by @sahahner in #52
- Fix - datetime.timedelta to string conversion for JSON by @gareth-j in #43
- 40 support dataset with missing timesteps by @JPXKQX in #48
- Expanded intro docs and examples by @gareth-j in #46
- Chore/multiple fixes ci precommit by @theissenhelen in #56
- 23 plot a validation sample after long rollout by @sahahner in #26
- [fix] Capture Anemoi Training subcommands in MLFlow by @JesperDramsch in #61
- fix version pinning by @MeraX in #66
- Fix mlflow command on python 3.9 by @gmertes in #62
- Authentication support for mlflow sync by @gmertes in #51
- New mlflow authentication API by @gmertes in #78
- add link to transform by @b8raoult in #81
- Allow for longer truncation when mlflow > 1.28 by @HCookie in #88
- Update CODEOWNERS by @b8raoult in #90
- Fix interactive multi-GPU training by @gmertes in #82
- Add AnemoiMlflowClient with auth support by @gmertes in #86
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #74
- Make pin_memory of the Dataloader configurable by @MeraX in #64
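Supporting datasets with missing time steps (#48) implies that a training sample can only start at an index whose whole window of input and rollout steps is present. A minimal sketch, assuming missing steps are known as a set of integer indices. The function `valid_start_indices` and its parameters are hypothetical and are not the anemoi-training dataset API.

```python
from typing import AbstractSet


def valid_start_indices(
    n_steps: int, missing: AbstractSet[int], multistep: int, rollout: int
) -> list[int]:
    """Start indices whose window of `multistep` inputs plus `rollout` targets is complete."""
    window = multistep + rollout
    return [
        start
        for start in range(n_steps - window + 1)
        # Skip any window that touches a missing time step.
        if not any(i in missing for i in range(start, start + window))
    ]
```

With 8 time steps, step 3 missing, two input steps and one rollout step, only starts 0, 4 and 5 yield complete samples.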
New Contributors
- @anaprietonem made their first contribution in #37
- @da-ewanp made their first contribution in #49
- @MeraX made their first contribution in #66
- @b8raoult made their first contribution in #81
- @HCookie made their first contribution in #88
Full Changelog: https://github.com/ecmwf/anemoi-training/blob/develop/CHANGELOG.md#020---feature-release---2024-10-16
0.1.0 - Anemoi training - First release
What's Changed
Added
Subcommands
- Subcommand for training
anemoi-training train
- Subcommand for config generation
- Subcommand for mlflow: login and sync
- Subcommand for checkpoint handling
Functionality
- Search paths for Hydra configs, to enable configs in CWD, the ANEMOI_CONFIG_PATH env variable, and .config/anemoi/training in addition to package defaults
- MLflow token authentication
- Configurable pressure level scaling
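The Hydra search paths let configs be picked up from several locations. The sketch below collects candidate directories in one plausible order: current working directory, then ANEMOI_CONFIG_PATH, then the .config/anemoi/training directory. That directory is passed in by the caller because its base location is not spelled out here. The precedence order is an assumption, package defaults are assumed to be searched last (so they are not listed), and `config_search_paths` is not part of anemoi-training.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def config_search_paths(
    cwd: Path,
    user_config_dir: Path,
    env: Optional[Mapping[str, str]] = None,
) -> list[Path]:
    """Hypothetical sketch: existing config directories, highest priority first.

    Assumed order: CWD, then ANEMOI_CONFIG_PATH (if set), then the user
    config directory. Package defaults are assumed to come last and are
    not included in the returned list.
    """
    env = os.environ if env is None else env
    candidates = [cwd]
    if env.get("ANEMOI_CONFIG_PATH"):
        candidates.append(Path(env["ANEMOI_CONFIG_PATH"]))
    candidates.append(user_config_dir)

    # Keep only directories that exist, preserving order and dropping duplicates.
    seen: set[Path] = set()
    result: list[Path] = []
    for path in candidates:
        if path.is_dir() and path not in seen:
            seen.add(path)
            result.append(path)
    return result
```

Passing `env` explicitly makes the lookup testable without touching the real environment.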
Continuous Integration / Deployment
- Downstream CI to test all dependencies with changes
- Changelog Status check
- Readthedocs PR builder
- Changelog Release Updater Workflow
Miscellaneous
- Extended ruff Ruleset
- Added Docsig pre-commit hook
- __future__ annotations for typehints
- Added Typehints where missing
- Added Changelog
- Correct errors in callback plots
- fix error in the default config
- example slurm config
Changed
Move to Anemoi Ecosystem
- Fixed PyPI packaging
- Use of Anemoi models
- Use of Anemoi graphs
- Adjusted tests to work with new Anemoi ecosystem
- Adjusted configs to reasonable common defaults
Functionality
- Changed hardware-specific keys from configs to ??? to trigger "missing"
- __len__ of NativeGridDataset
- Configurable dropout in attention layer
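In Hydra/OmegaConf configs, the literal value ??? marks a mandatory value that has not been set, so hardware-specific keys now fail loudly until the user fills them in. The stdlib sketch below walks a plain nested dict and lists the dotted paths that still hold ???. It only illustrates the idea (OmegaConf performs this check itself), and the function `find_missing` and the example keys are hypothetical.

```python
from typing import Any

MISSING = "???"  # OmegaConf's marker for a mandatory value that is not set


def find_missing(cfg: Any, prefix: str = "") -> list[str]:
    """Return the dotted paths of all entries whose value is the '???' marker."""
    missing: list[str] = []
    if isinstance(cfg, dict):
        for key, value in cfg.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            missing.extend(find_missing(value, path))
    elif isinstance(cfg, list):
        for i, value in enumerate(cfg):
            missing.extend(find_missing(value, f"{prefix}[{i}]"))
    elif cfg == MISSING:
        missing.append(prefix)
    return missing


# Hypothetical hardware section with two keys left for the user to fill in.
cfg = {
    "hardware": {
        "num_gpus_per_node": "???",
        "paths": {"data": "???", "output": "/tmp/out"},
    }
}
```

Here `find_missing(cfg)` returns `["hardware.num_gpus_per_node", "hardware.paths.data"]`, which is the kind of list a user needs before launching a run on new hardware.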
Docs
- First draft on Read the Docs
- Fixed docstrings
Miscellaneous
- Moved callbacks into folder to facilitate future refactor
- Adjusted PyPI release infrastructure to common ECMWF workflow
- Bumped versions in Pre-commit hooks
- Fix crash when logging hyperparameters with missing values in the config
- Fixed "null" tracker metadata when tracking is disabled; it now returns an empty dict
- Pinned numpy<2 until we can fully test the migration
- ci: path ignore of docs for downstream ci
- ci: make python QA reusable
- ci: permissions on changelog updater
Removed
- Dependency on mlflow-export-import
- Specific user configs
- len function of NativeGridDataset, as it led to bugs
Release Work
- @gmertes made their first contribution in #2
- @theissenhelen made their first contribution in #11
- @sahahner made their first contribution in #24
- @JesperDramsch made their first contribution in #28
- @mc4117 and @gareth-j made their first contributions in #25
Original Contributions in AIFS by
- @ssmmnn11
- @mishooax
- @mchantry
- @JPXKQX
- @anaprietonem
- @gabrieloks
- @Rilwan-Adewoyin
- @stietsche
- @jakob-schloer
- The ECMWF AIFS team.
Full Changelog: https://github.com/ecmwf/anemoi-training/blob/develop/CHANGELOG.md#0.1.0