
Releases: ml-explore/mlx

v0.17.0

23 Aug 18:48
684e11c

Highlights

  • mx.einsum: PR
  • Big speedups in reductions: benchmarks
  • 2x faster model loading: PR
  • mx.fast.metal_kernel for custom GPU kernels: docs (sketch below)
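
A quick sketch of two of the headline additions; the kernel body and launch configuration here are illustrative, so see the linked docs for the authoritative mx.fast.metal_kernel API:

```python
import mlx.core as mx

# mx.einsum follows the NumPy einsum convention.
a = mx.random.normal(shape=(4, 8))
b = mx.random.normal(shape=(8, 3))
c = mx.einsum("ij,jk->ik", a, b)  # same result as a @ b

# A custom elementwise Metal kernel: `source` is the body of the
# kernel function, with inputs and outputs bound by name.
source = """
    uint elem = thread_position_in_grid.x;
    out[elem] = 2.0f * inp[elem];
"""
kernel = mx.fast.metal_kernel(
    name="twice",
    input_names=["inp"],
    output_names=["out"],
    source=source,
)
x = mx.arange(16).astype(mx.float32)
(y,) = kernel(
    inputs=[x],
    grid=(x.size, 1, 1),
    threadgroup=(16, 1, 1),
    output_shapes=[x.shape],
    output_dtypes=[x.dtype],
)
print(y)  # 2 * x
```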

Core

  • Faster program exits
  • Laplace sampling
  • mx.nan_to_num (example after this list)
  • Tanh approximation for GELU in nn
  • Fused GPU quantization ops
  • Faster group norm
  • bf16 winograd conv
  • vmap support for mx.scatter
  • mx.pad "edge" padding
  • More numerically stable mx.var
  • mx.linalg.cholesky_inv/mx.linalg.tri_inv
  • mx.isfinite
  • Complex mx.sign now mirrors NumPy 2.0 behaviour
  • More flexible mx.fast.rope
  • Update to nanobind 2.1
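
A few of the new ops in action, assuming they follow the NumPy-style signatures:

```python
import mlx.core as mx

x = mx.array([1.0, float("nan"), float("inf")])
print(mx.isfinite(x))             # [True, False, False]
print(mx.nan_to_num(x, nan=0.0))  # NaN -> 0.0, inf clamped to a large finite value

# "edge" padding repeats the border values, as in np.pad(..., mode="edge"):
a = mx.arange(4).reshape(2, 2)
print(mx.pad(a, 1, mode="edge"))
```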

Bug Fixes

  • gguf zero initialization
  • expm1f overflow handling
  • bfloat16 hadamard
  • large array support in various ops
  • rope fix
  • bf16 array creation
  • preserve dtype in nn.Dropout
  • nn.TransformerEncoder with norm_first=False
  • excess copies caused by a contiguity bug

v0.16.3

12 Aug 23:14
1086dc4

🚀

v0.16.2

09 Aug 00:30
9231617

🚀🚀

v0.16.1

25 Jul 18:45
e9e5385

🚀

v0.16.0

11 Jul 18:44
d0da742

Highlights

  • @mx.custom_function for custom vjp/jvp/vmap transforms (sketch below)
  • Up to 2x faster Metal GEMV and fast masked GEMV
  • Fast hadamard_transform
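
A minimal sketch of overriding a gradient with @mx.custom_function; the vjp signature here follows the pattern in the MLX docs:

```python
import mlx.core as mx

@mx.custom_function
def f(x, y):
    return mx.sin(x) * y

# Register a custom VJP; mx.grad uses it instead of differentiating f.
@f.vjp
def f_vjp(primals, cotangent, output):
    x, y = primals
    return cotangent * mx.cos(x) * y, cotangent * mx.sin(x)

dx = mx.grad(f)(mx.array(0.5), mx.array(2.0))  # goes through f_vjp
```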

Core

  • Metal 3.2 support
  • Reduced CPU binary size
  • Added quantized GPU ops to JIT
  • Faster GPU compilation
  • Added grads for bitwise ops + indexing

Bug Fixes

  • 1D scatter bug
  • Strided sort bug
  • Reshape copy bug
  • Seg fault in mx.compile
  • Donation condition in compilation
  • Compilation of accelerate on iOS

v0.15.2

27 Jun 18:21
d6383a1

🚀

v0.15.1

14 Jun 21:13
af9079c

🚀

v0.15.0

07 Jun 03:16
cf236fc

Highlights

  • Fast Metal GPU FFTs
  • mx.distributed with all_sum and all_gather (example below)
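
A minimal sketch of the distributed API, launched under MPI (e.g. mpirun -np 2 python script.py):

```python
import mlx.core as mx

group = mx.distributed.init()             # single-process no-op without MPI
x = mx.ones(4)
total = mx.distributed.all_sum(x)         # elementwise sum across processes
gathered = mx.distributed.all_gather(x)   # concatenated along the first axis
print(group.rank(), total, gathered.shape)
```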

Core

  • Added __dlpack_device__ for DLPack device support
  • Fast GPU FFTs: benchmarks
  • Added docs for mx.distributed
  • Added the mx.view op (example below)
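
mx.view reinterprets the raw bytes of an array as another dtype; a small sketch:

```python
import mlx.core as mx

a = mx.array([1.0], dtype=mx.float32)
bits = mx.view(a, mx.uint32)  # reinterpret the bytes, no value conversion
print(bits)                   # 1065353216 == 0x3F800000, the bit pattern of 1.0f
```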

NN

  • softmin, hardshrink, and hardtanh activations (example below)
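
Assuming the functional forms match the other activations in mlx.nn (class versions would be analogous):

```python
import mlx.core as mx
import mlx.nn as nn

x = mx.array([-2.0, -0.5, 0.5, 2.0])
print(nn.softmin(x))     # softmax of -x
print(nn.hardshrink(x))  # zeroes small-magnitude values
print(nn.hardtanh(x))    # clamps to [-1, 1] by default
```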

Bug Fixes

  • Fix broadcast bug in bitwise ops
  • Allow more buffers for JIT compilation
  • Fix matvec vector stride bug
  • Fix multi-block sort stride management
  • Stable cumprod grad at 0
  • Fix race condition in scan

v0.14.1

31 May 19:34
0798824

🚀

v0.14.0

24 May 01:33
9f9cb7a

Highlights

  • Small-size build that JIT compiles kernels and omits the CPU backend, resulting in a binary under 4 MB
    • Series of PRs 1, 2, 3, 4, 5
  • mx.gather_qmm, a quantized equivalent of mx.gather_mm, which speeds up MoE inference by ~2x
  • Grouped 2D convolutions (example below)
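
Grouped convolutions run independent filters over disjoint channel groups; a sketch (MLX convolutions are channels-last):

```python
import mlx.core as mx

# 8 input channels split into 2 groups of 4; weights have shape
# (C_out, kH, kW, C_in // groups).
x = mx.random.normal(shape=(1, 16, 16, 8))   # (N, H, W, C_in)
w = mx.random.normal(shape=(12, 3, 3, 4))    # 12 output channels
y = mx.conv2d(x, w, stride=1, padding=1, groups=2)
print(y.shape)  # (1, 16, 16, 12)
```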

Core

  • mx.conjugate
  • mx.conv3d and nn.Conv3d
  • List-based indexing
  • Started mx.distributed, which uses MPI (if installed) for communication across machines
    • mx.distributed.init
    • mx.distributed.all_gather
    • mx.distributed.all_reduce_sum
  • Support for conversion to and from DLPack (example after this list)
  • mx.linalg.cholesky on CPU
  • mx.quantized_matmul sped up for vector-matrix products
  • mx.trace
  • mx.block_masked_mm now supports floating point masks!
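
Two of the smaller additions in action; DLPack interchange is shown via NumPy, which supports it since 1.22:

```python
import mlx.core as mx
import numpy as np

a = mx.arange(9).reshape(3, 3)
print(mx.trace(a))    # sum of the diagonal: 0 + 4 + 8 = 12

# mx.array implements the DLPack protocol, so other frameworks
# can consume it directly:
b = np.from_dlpack(a)
```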

Fixes

  • Error messaging in eval
  • Add some missing docs
  • Scatter index bug
  • The extensions example now compiles and runs
  • CPU copy bug with many dimensions