Releases · ecmwf/anemoi-training

14 Nov 17:56

HCookie

0.3.0

64915e6

0.3.0 - Loss & Callback Refactors

Config Updates

Because the loss functions and callbacks were heavily refactored, we strongly advise regenerating your configs:
anemoi-training config generate --override

In particular, the training and diagnostics configurations have changed substantially.

Added

training_loss
validation_metrics
variable_loss_scaling

Removed

loss_scaling

Changed

callbacks
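
To see whether an existing config is affected by the key changes listed above, a check along the following lines may help. This is only a sketch: the key names come from the Added and Removed lists, but the assumption that they sit at the top level of a single config section, the function name check_training_section, and the example section (including its lr entry) are hypothetical and not taken from the anemoi-training code.

```python
# Key names are taken from the Added/Removed lists of this release.
# Their position in the config tree is an ASSUMPTION made for this sketch.
ADDED_KEYS = {"training_loss", "validation_metrics", "variable_loss_scaling"}
REMOVED_KEYS = {"loss_scaling"}


def check_training_section(section: dict) -> dict:
    """Report removed keys still present and newly added keys that are absent."""
    present = set(section)
    return {
        "stale": sorted(present & REMOVED_KEYS),
        "missing": sorted(ADDED_KEYS - present),
    }


# Hypothetical pre-0.3.0 section, for illustration only.
old_section = {"loss_scaling": {"default": 1}, "lr": {"rate": 1e-4}}
report = check_training_section(old_section)
print(report)
```

A non-empty "stale" or "missing" list suggests the config predates this release and should be regenerated rather than patched by hand.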

What's Changed

Feature/improve loss functions by @HCookie in #70
Refactor Callbacks by @HCookie in #60
Bind max steps and lr iterations by @Rilwan-Adewoyin in #67
feature: sub hour timesteping by @JesperDramsch in #63
added config for bounding by @gabrieloks in #10
fix: metric ranges in the validation space not the normalized space by @sahahner in #116
fix: enable learningrate monitor automatically by @sahahner in #119
Mlflow benchmark profiler update by @anaprietonem in #38
Rename frequency to batch_frequency in RolloutEval by @HCookie in #118
Add expansion of params to logger by @HCookie in #91
Fix missing checkpoint callbacks by @HCookie in #125
Change how mlflow measures CPU memory usage by @cathalobrien in #94
Graph config for stretched grid graph by @havardhhaugen in #133
Graph config for limited area models (LAMs) by @JPXKQX in #134
Feature/update graph callbacks by @JPXKQX in #135
fix: Rename loss_scaling to variable_loss_scaling by @HCookie in #138
fix: Allow updates to scalars & other improvements by @HCookie in #137

New Contributors

@Rilwan-Adewoyin made their first contribution in #67
@gabrieloks made their first contribution in #10
@cathalobrien made their first contribution in #94
@havardhhaugen made their first contribution in #133

Full Changelog: 0.2.2...0.3.0

Contributors

JesperDramsch, Rilwan-Adewoyin, and 7 other contributors

28 Oct 14:48

gmertes

0.2.2

de98029

0.2.2 - Maintenance: pin python <3.13

What's Changed

This release pins Python to <3.13 because a dependency does not yet provide a distribution for Python 3.13. Support for Python 3.13 is expected in the near future.

Changed

Lock python version <3.13 #107
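
A quick way to check whether an interpreter falls inside the pinned range is sketched below. The helper name python_supported is hypothetical; the only fact taken from this release is the upper bound <3.13. The lower bound of the supported range is not stated here, so the sketch does not check it.

```python
import sys


def python_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version is below the 3.13 pin."""
    return tuple(version_info[:2]) < (3, 13)


if not python_supported():
    print("This interpreter is 3.13 or newer; release 0.2.2 pins python <3.13.")
```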

Commit Log

Fix version import by @HCookie in #104
Add contributors file. by @mchantry in #106
Lock python version to <3.13 by @HCookie in #107

New Contributors

@mchantry made their first contribution in #106

Full Changelog: https://github.com/ecmwf/anemoi-training/blob/develop/CHANGELOG.md

Contributors

mchantry and HCookie

24 Oct 14:09

gmertes

0.2.1

84fd53b

0.2.1 - Bugfix: resuming mlflow runs

What's Changed

Added

Mlflow-sync to include new tag for server to server syncing #83
Mlflow-sync to include functionality to resume and fork server2server runs #83
Rollout training for Limited Area Models. #79
Feature: New Boolean1DMask class. Enables rollout training for limited area models. #79

Fixed

Mlflow-sync to handle creation of new experiments in the remote server #83
Fix for multi-gpu when using mlflow due to refactoring of _get_mlflow_run_params function #99
ci: fix pyshtools install error #100

Changed

Update copyright notice

Commit Log

Add output mask by @JPXKQX in #79
Fix/mlflow sync tag by @anaprietonem in #83
Ci/fix pyshtools installation error by @theissenhelen in #100
Update copyright notice by @b8raoult in #101
Fix/mlflow multi gpu by @anaprietonem in #99

Full Changelog: https://github.com/ecmwf/anemoi-training/blob/develop/CHANGELOG.md#021---bugfix-resuming-mlflow-runs---2024-10-24

Contributors

theissenhelen, JPXKQX, and 2 other contributors

16 Oct 11:20

gmertes

0.2.0

59c3199

0.2.0 - Feature release

What's Changed

This release brings some changes to the default config. Use the anemoi-training config CLI to re-generate your default config.

Added

Add anemoi-transform link to documentation
Codeowners file (#56)
Changelog merge strategy (#56)

Miscellaneous

Introduction of remapper to anemoi-models leads to changes in the data indices. Some preprocessors cannot be applied in-place anymore.

Functionality

Enable the callback for plotting a histogram for variables containing NaNs
Enforce same binning for histograms comparing true data to predicted data
Fix: Inference checkpoints are now saved according to the frequency settings defined in the config #37
Feature: Add configurable models #50
Feature: Authentication support for mlflow sync - #51
Feature: Support training for datasets with missing time steps #48
Feature: AnemoiMlflowClient, an mlflow client with authentication support #86
Long Rollout Plots
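
The two histogram items above (handling variables that contain NaNs, and using the same binning for true and predicted data) can be illustrated with a small standard-library sketch. It is not the plotting callback's implementation: the function names, the equal-width binning, and the default of 10 bins are assumptions chosen for illustration.

```python
import math


def shared_bin_edges(truth, pred, n_bins=10):
    """Equal-width bin edges spanning the non-NaN values of BOTH samples."""
    finite = [v for v in list(truth) + list(pred) if not math.isnan(v)]
    lo, hi = min(finite), max(finite)
    if hi == lo:  # degenerate range: widen so the bins have non-zero width
        hi = lo + 1.0
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def histogram(values, edges):
    """Count non-NaN values per bin; the last bin includes its right edge."""
    n = len(edges) - 1
    lo, hi = edges[0], edges[-1]
    counts = [0] * n
    for v in values:
        if math.isnan(v) or v < lo or v > hi:
            continue  # NaNs and out-of-range values are skipped
        idx = min(int((v - lo) / (hi - lo) * n), n - 1)
        counts[idx] += 1
    return counts


truth = [0.0, 1.0, float("nan"), 2.0]
pred = [0.5, 4.0]
edges = shared_bin_edges(truth, pred, n_bins=4)
truth_counts = histogram(truth, edges)
pred_counts = histogram(pred, edges)
```

Because both histograms are computed against the same edges, their bars can be compared bin by bin, and the NaN in the truth sample is simply left out of the counts.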

Fixed

Fix TypeError raised when trying to JSON serialise datetime.timedelta object - #43
Bugfixes for CI (#56)
Fix mlflow subcommand on python 3.9 #62
Show correct subcommand in MLFlow - Addresses #39 in #61
Fix interactive multi-GPU training #82
Allow 500 characters in mlflow logging #88
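
For context on the datetime.timedelta fix above: Python's json module cannot serialise timedelta objects and raises a TypeError, and the commit log describes the fix as a conversion to string. A minimal sketch of that idea follows; the helper name to_jsonable and the "timestep" key are hypothetical, and the actual change in #43 may differ in detail.

```python
import datetime
import json


def to_jsonable(obj):
    """Fallback for json.dumps: render timedelta objects as strings."""
    if isinstance(obj, datetime.timedelta):
        return str(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


config = {"timestep": datetime.timedelta(hours=6)}  # hypothetical key

# Without a fallback, json.dumps raises TypeError on the timedelta value.
try:
    json.dumps(config)
    raised = False
except TypeError:
    raised = True

encoded = json.dumps(config, default=to_jsonable)
print(encoded)  # {"timestep": "6:00:00"}
```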

Changed

Updated configuration examples in documentation and corrected links - #46
Remove credential prompt from mlflow login, replace with seed refresh token via web - #78
Update CODEOWNERS

Commit Log

fix: histogram only on non-nan values by @sahahner in #15
fix: saving frequency bug for inference checkpoints by @anaprietonem in #37
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #36
feature: make models instantiateable by @JesperDramsch in #50
Feat: adding in ability to configure precip like plots by @da-ewanp in #49
Changes to data indices in anemoi models by @sahahner in #17
fix: remapped variables in tests need to be dictionaries by @sahahner in #52
Fix - datetime.timedelta to string conversion for JSON by @gareth-j in #43
40 support dataset with missing timesteps by @JPXKQX in #48
Expanded intro docs and examples by @gareth-j in #46
Chore/multiple fixes ci precommit by @theissenhelen in #56
23 plot a validation sample after long rollout by @sahahner in #26
[fix] Capture Anemoi Training subcommands in MLFlow by @JesperDramsch in #61
fix version pinning by @MeraX in #66
Fix mlflow command on python 3.9 by @gmertes in #62
Authentication support for mlflow sync by @gmertes in #51
New mlflow authentication API by @gmertes in #78
add link to transform by @b8raoult in #81
Allow for longer truncation when mlflow > 1.28 by @HCookie in #88
Update CODEOWNERS by @b8raoult in #90
Fix interactive multi-GPU training by @gmertes in #82
Add AnemoiMlflowClient with auth support by @gmertes in #86
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #74
Make pin_memory of the Dataloader configurable by @MeraX in #64

New Contributors

@anaprietonem made their first contribution in #37
@da-ewanp made their first contribution in #49
@MeraX made their first contribution in #66
@b8raoult made their first contribution in #81
@HCookie made their first contribution in #88

Full Changelog: https://github.com/ecmwf/anemoi-training/blob/develop/CHANGELOG.md#020---feature-release---2024-10-16

Contributors

JesperDramsch, MeraX, and 10 other contributors

16 Aug 15:28

JesperDramsch

0.1.0

af48387

0.1.0 - Anemoi training - First release

What's Changed

Added

Subcommands

Subcommand for training: anemoi-training train
Subcommand for config generation
Subcommand for mlflow: login and sync
Subcommand for checkpoint handling

Functionality

Searchpaths for Hydra configs, to enable configs in CWD, ANEMOI_CONFIG_PATH env, and .config/anemoi/training in addition to package defaults
MlFlow token authentication
Configurable pressure level scaling
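
The search-path item above names three locations besides the package defaults. The sketch below only enumerates them; it does not reproduce how anemoi-training registers them with Hydra. The function name, the ordering of the returned list, and the reading of .config/anemoi/training as relative to the user's home directory are all assumptions.

```python
import os
from pathlib import Path


def candidate_config_dirs(env=None, cwd=None, home=None):
    """List the user-level config locations named in the release notes.

    The order (CWD, ANEMOI_CONFIG_PATH, ~/.config/anemoi/training) is an
    ASSUMPTION for illustration, not a documented precedence.
    """
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    home = Path.home() if home is None else Path(home)

    dirs = [cwd]
    if env.get("ANEMOI_CONFIG_PATH"):
        dirs.append(Path(env["ANEMOI_CONFIG_PATH"]))
    dirs.append(home / ".config" / "anemoi" / "training")
    return dirs


for directory in candidate_config_dirs():
    print(directory, "(exists)" if directory.is_dir() else "(not found)")
```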

Continuous Integration / Deployment

Downstream CI to test all dependencies with changes
Changelog Status check
Readthedocs PR builder
Changelog Release Updater Workflow

Miscellaneous

Extended ruff Ruleset
Added Docsig pre-commit hook
__future__ annotations for typehints
Added Typehints where missing
Added Changelog
Correct errors in callback plots
Fix for an error in the default config
Example Slurm config

Changed

Move to Anemoi Ecosystem

Fixed PyPI packaging
Use of Anemoi models
Use of Anemoi graphs
Adjusted tests to work with new Anemoi ecosystem
Adjusted configs to reasonable common defaults

Functionality

Changed hardware-specific keys in the configs to ??? to trigger "missing" errors
__len__ of NativeGridDataset
Configurable dropout in attention layer
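
In OmegaConf, which Hydra configs are built on, the value ??? marks a mandatory field that must be filled in before it is read. To list the fields a user still has to set, a plain-dictionary walk such as the one below may be enough. It is a standard-library sketch rather than the library's own mechanism, and the example keys (hardware.num_gpus_per_node, hardware.paths.data, and so on) are hypothetical.

```python
MISSING = "???"  # marker for a mandatory value that has not been set


def find_missing(config: dict, prefix: str = "") -> list:
    """Return dotted paths of all entries whose value is the ??? marker."""
    paths = []
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            paths.extend(find_missing(value, path))
        elif value == MISSING:
            paths.append(path)
    return paths


# Hypothetical config fragment, for illustration only.
example = {
    "hardware": {
        "num_gpus_per_node": "???",
        "paths": {"data": "???", "output": "/scratch/run"},
    },
    "training": {"max_epochs": 10},
}
unset = find_missing(example)
print(unset)  # ['hardware.num_gpus_per_node', 'hardware.paths.data']
```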

Docs

First draft on Read the Docs
Fixed docstrings

Miscellaneous

Moved callbacks into folder to facilitate future refactor
Adjusted PyPI release infrastructure to common ECMWF workflow
Bumped versions in Pre-commit hooks
Fix crash when logging hyperparameters with missing values in the config
Fixed "null" tracker metadata when tracking is disabled, now returns an empty dict
Pinned numpy<2 until we can test the full migration
ci: path ignore of docs for downstream ci
ci: make python QA reusable
ci: permissions on changelog updater

Removed

Dependency on mlflow-export-import
Specific user configs
len function of NativeGridDataset, as it led to bugs

Release Work

@gmertes made their first contribution in #2
@theissenhelen made their first contribution in #11
@sahahner made their first contribution in #24
@JesperDramsch made their first contribution in #28
@mc4117 and @gareth-j made their first contribution in #25

Original Contributions in AIFS by

Full Changelog: https://github.com/ecmwf/anemoi-training/blob/develop/CHANGELOG.md#0.1.0

Contributors

JesperDramsch, gareth-j, and 13 other contributors
