
Releases: ClashLuke/HeavyBall

v1.3.0

18 Dec 17:54
9a20be2
  • fix: in 1.2.x (but not 1.1.x), all optimizers behaved as SGD; AdamW now runs as AdamW again
  • heavyball.utils.disable_caution_scaling implements the behavior documented here
  • SOAP converges well again
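For context on the caution item: "cautious" optimizers (Liang et al., 2024, "Cautious Optimizers") zero out every update entry whose sign disagrees with the current gradient, then rescale the surviving entries. The plain-Python sketch below illustrates that masking on flat lists. It is not HeavyBall's code: `cautious_update` and its `scale` flag are hypothetical, and we assume, without having confirmed it, that the scaling toggled by heavyball.utils.disable_caution_scaling is this 1 / mean(mask) renormalization.

```python
def cautious_update(update, grad, scale=True, eps=1e-8):
    """Mask out update entries whose sign disagrees with the gradient.

    Hypothetical sketch of cautious masking; `scale` stands in for
    caution scaling and is an assumption, not a HeavyBall flag.
    """
    # mask[i] is 1.0 where update and gradient point the same way, else 0.0
    mask = [1.0 if u * g > 0 else 0.0 for u, g in zip(update, grad)]
    masked = [u * m for u, m in zip(update, mask)]
    if not scale:
        # Caution without rescaling: disagreeing entries are simply dropped.
        return masked
    # Rescale by 1 / mean(mask) so the surviving entries compensate for the
    # fraction of the update that was zeroed out.
    mean = sum(mask) / len(mask)
    return [x / (mean + eps) for x in masked]
```

With `update=[0.5, -0.2, 0.1, 0.3]` and `grad=[1.0, 0.4, -0.3, 0.2]`, only the first and last entries agree in sign, so half the entries survive and, with scaling on, they are roughly doubled.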

faster, less memory, minor fixes

15 Dec 19:01
afd848f
  • LaProp/Adam/... are now compilable
  • fused_hook and hook_optimizer_into_model reduce memory usage by fusing the backward pass with the optimizer step
  • fewer in-place ops, giving better compilation and cleaner code
  • scaling options ("graft", "scale", "none") for Muon, enabling Adam#Muon (Adam's update magnitude with Muon's direction) at minimal cost
  • storage_dtype argument is implemented again
  • LaProp is correctly implemented, ADOPT is more stable
  • via @ethansmith2000: cleaner, more maintainable defaults, reducing the surface for potential errors
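Grafting, in the M#D notation of Agarwal et al. (2020), "Disentangling Adaptive Gradient Methods from Learning Rates", takes the per-layer update magnitude from optimizer M and the direction from optimizer D. Adam#Muon therefore rescales Muon's orthogonalized update to the norm of Adam's update. A minimal sketch of that rescaling on flat lists follows; the `graft` function is hypothetical, and we assume, rather than having verified, that HeavyBall's "graft" mode does something equivalent per tensor.

```python
import math


def graft(magnitude_update, direction_update, eps=1e-12):
    """Return direction_update rescaled to the L2 norm of magnitude_update.

    Hypothetical helper: for Adam#Muon, magnitude_update would be Adam's
    update for a layer and direction_update would be Muon's update.
    """
    mag_norm = math.sqrt(sum(x * x for x in magnitude_update))
    dir_norm = math.sqrt(sum(x * x for x in direction_update))
    # eps guards against division by zero when the direction is all zeros.
    scale = mag_norm / (dir_norm + eps)
    return [x * scale for x in direction_update]
```

For example, grafting an Adam update `[3.0, 4.0]` (norm 5) onto a Muon update `[1.0, 0.0]` yields approximately `[5.0, 0.0]`: Muon's direction with Adam's step size. The extra cost is two norms per tensor, which is consistent with the "minimal cost" claim.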

Stability, Muon and Fixes

08 Dec 22:54
  • utils
    • bug fixes affecting SFAdamW and RMSProp
    • breaking: zeroth_power_method no longer supports eigh, and the number of Newton-Schulz iterations can no longer be specified
    • faster newtonschulz5 (via @tysam-code)
    • PSGD preconditioner dampening (via @evanatyourservice)
  • chainable
    • implementation of nesterov_momentum, heavyball_momentum and orthogonalize_update
  • core
    • heavyball.Muon (by chaining nesterov_momentum and orthogonalize_update); Muon supports gradient and update clipping out of the box
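Muon, as described above, chains Nesterov momentum with an orthogonalization of the update. The plain-Python sketch below shows one such step on a small matrix, using the quintic Newton-Schulz iteration with the coefficients (3.4445, -4.7750, 2.0315) from Keller Jordan's public Muon reference. All function names are hypothetical, the SGD-style momentum form is an assumption, and HeavyBall's own newtonschulz5, momentum convention, and defaults may differ.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def quintic_newton_schulz(g, steps=5, eps=1e-7):
    """Approximately orthogonalize g: X <- aX + (bA + cA^2)X with A = X X^T."""
    a, b, c = 3.4445, -4.7750, 2.0315
    # Work on the wide orientation so X X^T is the smaller Gram matrix.
    tall = len(g) > len(g[0])
    x = transpose(g) if tall else [row[:] for row in g]
    # Dividing by the Frobenius norm puts every singular value in (0, 1].
    norm = sum(v * v for row in x for v in row) ** 0.5
    x = [[v / (norm + eps) for v in row] for row in x]
    for _ in range(steps):
        s = matmul(x, transpose(x))  # A = X X^T
        s2 = matmul(s, s)  # A^2
        poly = [[b * p + c * q for p, q in zip(r1, r2)] for r1, r2 in zip(s, s2)]
        px = matmul(poly, x)  # (bA + cA^2) X
        x = [[a * u + v for u, v in zip(r1, r2)] for r1, r2 in zip(x, px)]
    return transpose(x) if tall else x


def nesterov_update(grad, buf, beta=0.95):
    """Set buf = beta * buf + grad in place; return grad + beta * buf."""
    for i, row in enumerate(grad):
        for j, g in enumerate(row):
            buf[i][j] = beta * buf[i][j] + g
    return [[g + beta * m for g, m in zip(gr, br)] for gr, br in zip(grad, buf)]


def muon_step(weight, grad, buf, lr=0.02, beta=0.95):
    """One Muon-style step: Nesterov momentum, orthogonalize, then descend."""
    update = quintic_newton_schulz(nesterov_update(grad, buf, beta))
    return [[w - lr * u for w, u in zip(wr, ur)] for wr, ur in zip(weight, update)]
```

These coefficients trade exactness for speed: after five iterations the singular values land in a band around 1 rather than exactly on 1, which is usually adequate for an optimizer update and much cheaper than an exact decomposition such as eigh.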

v1.0.0

07 Dec 19:36

functional (optax-style) API and backend