Releases: ml-explore/mlx
v0.21.0
Highlights
- Support 3- and 6-bit quantization: benchmarks (example below)
- Much faster memory-efficient attention for head dims 64 and 80: benchmarks
- Much faster sdpa inference kernel for longer sequences: benchmarks
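As a rough illustration of the new low-bit modes, the sketch below round-trips a weight through 6-bit quantization with mx.quantize / mx.dequantize; the matrix shape and group size here are illustrative only.

```python
import mlx.core as mx

# Quantize a weight matrix to 6 bits (3 bits works the same way); the last
# dimension must be divisible by the group size.
w = mx.random.normal(shape=(512, 512))
w_q, scales, biases = mx.quantize(w, group_size=64, bits=6)

# Dequantize to inspect the round-trip error.
w_hat = mx.dequantize(w_q, scales, biases, group_size=64, bits=6)
print(mx.abs(w - w_hat).max())
```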
Core
- contiguous op (C++ only) + primitive
- BFS width limit to reduce memory consumption during eval
- Fast CPU quantization
- Faster indexing math in several kernels:
- unary, binary, ternary, copy, compiled, reduce
- Improve dispatch threads for a few kernels:
- conv, gemm splitk, custom kernels
- More buffer donation with no-ops to reduce memory use
- Use CMAKE_OSX_DEPLOYMENT_TARGET to pick the Metal version
- Dispatch Metal bf16 type at runtime when using the JIT
NN
- nn.AvgPool3d and nn.MaxPool3d
- Support groups in nn.Conv2d (both features shown in the sketch below)
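A minimal sketch of the new pooling layers and grouped convolution; the shapes are illustrative and use MLX's channels-last layout.

```python
import mlx.core as mx
import mlx.nn as nn

# 3D pooling over a (N, D, H, W, C) input.
x3d = mx.random.normal(shape=(1, 8, 16, 16, 4))
print(nn.MaxPool3d(kernel_size=2, stride=2)(x3d).shape)  # (1, 4, 8, 8, 4)
print(nn.AvgPool3d(kernel_size=2, stride=2)(x3d).shape)  # (1, 4, 8, 8, 4)

# Grouped 2D convolution over a (N, H, W, C) input; in and out channels
# must both be divisible by groups.
conv = nn.Conv2d(8, 16, kernel_size=3, groups=4)
x2d = mx.random.normal(shape=(1, 32, 32, 8))
print(conv(x2d).shape)                                   # (1, 30, 30, 16)
```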
Bug fixes
- Fix per-example mask + docs in sdpa
- Fix FFT synchronization bug (use dispatch method everywhere)
- Throw for invalid *fft{2,n} cases
- Fix OOB access in qmv
- Fix donation in sdpa to reduce memory use
- Allocate safetensors header on the heap to avoid stack overflow
- Fix sibling memory leak
- Fix view segfault for scalar inputs
- Fix concatenate vmap
v0.20.0
Highlights
- Even faster GEMMs
- Peaking at 23.89 TFlops on M2 Ultra benchmarks
- BFS graph optimizations
- Over 120 tok/s with Mistral 7B!
- Fast batched QMV/QVM for KV quantized attention: benchmarks (sketch below)
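For context, a sketch of the quantized matrix-vector shape these kernels target, built from mx.quantize and mx.quantized_matmul; the sizes are illustrative and unrelated to the linked benchmarks.

```python
import mlx.core as mx

# Quantize a (out_features, in_features) projection weight.
w = mx.random.normal(shape=(256, 512))
w_q, scales, biases = mx.quantize(w, group_size=64, bits=4)

# A batch of single-token activations: the batched QMV case.
x = mx.random.normal(shape=(8, 1, 512))
y = mx.quantized_matmul(x, w_q, scales, biases, transpose=True,
                        group_size=64, bits=4)
print(y.shape)  # (8, 1, 256)
```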
Core
- New Features
- mx.linalg.eigh and mx.linalg.eigvalsh (see the sketch at the end of this Core section)
- mx.nn.init.sparse
- 64-bit type support for mx.cumprod and mx.cumsum
- Performance
- Faster long column reductions
- Wired buffer support for large models
- Better Winograd dispatch condition for convs
- Faster scatter/gather
- Faster mx.random.uniform and mx.random.bernoulli
- Better threadgroup sizes for large arrays
- Misc
- Added Python 3.13 to CI
- C++20 compatibility
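A minimal sketch of the new eigendecomposition routines from the New Features list above, assuming the NumPy-style return order of (eigenvalues, eigenvectors); stream=mx.cpu is passed because, as with several other mx.linalg routines, the decomposition may only have a CPU implementation.

```python
import mlx.core as mx

# Build a symmetric matrix and decompose it.
a = mx.random.normal(shape=(4, 4))
a = (a + a.T) / 2

vals_only = mx.linalg.eigvalsh(a, stream=mx.cpu)
vals, vecs = mx.linalg.eigh(a, stream=mx.cpu)

# Check A @ v == lambda * v for the first eigenpair.
print(mx.allclose(a @ vecs[:, 0], vals[0] * vecs[:, 0], atol=1e-5))
```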
Bugfixes
- Fix command encoder synchronization
- Fix mx.vmap with gather and constant outputs
- Fix fused sdpa with differing key and value strides
- Support mx.array.__format__ with spec
- Fix multi output array leak
- Fix RMSNorm weight mismatch error
v0.19.3
v0.19.2
v0.19.1
v0.19.0
Highlights
- Speed improvements
- Up to 6x faster CPU indexing benchmarks
- Faster Metal compiled kernels for strided inputs benchmarks
- Faster generation with fused-attention kernel benchmarks (sketch below)
- Gradient for grouped convolutions
- Due to Python 3.8's end-of-life we no longer test with it on CI
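A minimal sketch of the fused attention path through mx.fast.scaled_dot_product_attention for the single-query decoding shape it targets; the dimensions are illustrative.

```python
import math
import mlx.core as mx

# One query token attending over a longer key/value cache.
B, n_heads, head_dim, context = 1, 8, 64, 1024
q = mx.random.normal(shape=(B, n_heads, 1, head_dim))
k = mx.random.normal(shape=(B, n_heads, context, head_dim))
v = mx.random.normal(shape=(B, n_heads, context, head_dim))

out = mx.fast.scaled_dot_product_attention(
    q, k, v, scale=1.0 / math.sqrt(head_dim)
)
print(out.shape)  # (1, 8, 1, 64)
```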
Core
- New features
- Gradient for grouped convolutions
- mx.roll
- mx.random.permutation
- mx.real and mx.imag (these new ops are shown in the sketch at the end of this Core section)
- Performance
- Up to 6x faster CPU indexing benchmarks
- Faster CPU sort benchmarks
- Faster Metal compiled kernels for strided inputs benchmarks
- Faster generation with fused-attention kernel benchmarks
- Bulk eval in safetensors to avoid unnecessary serialization of work
- Misc
- Bump to nanobind 2.2
- Move testing to python 3.9 due to 3.8's end-of-life
- Make the GPU device more thread safe
- Fix the submodule stubs for better IDE support
- CI generated docs that will never be stale
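A short sketch of the new ops listed under New features above; the integer form of mx.random.permutation is assumed to mirror NumPy and return a shuffled arange.

```python
import mlx.core as mx

# mx.roll shifts elements along an axis with wrap-around.
a = mx.arange(6)
print(mx.roll(a, 2))            # [4, 5, 0, 1, 2, 3]

# mx.random.permutation: with an int, a shuffled arange (assumed, as in
# NumPy); with an array, a shuffle along the first axis.
print(mx.random.permutation(6))
print(mx.random.permutation(a))

# mx.real / mx.imag extract the parts of a complex array.
z = mx.array([1 + 2j, 3 - 4j])
print(mx.real(z), mx.imag(z))
```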
NN
- Add support for grouped 1D convolutions to the nn API
- Add some missing type annotations
Bugfixes
- Fix and speedup row-reduce with few rows
- Fix normalization primitive segfault with unexpected inputs
- Fix complex power on the GPU
- Fix freeing deep unevaluated graphs details
- Fix race with array::is_available
- Consistently handle softmax with all -inf inputs
- Fix streams in affine quantize
- Fix CPU compile preamble for some linux machines
- Stream safety in CPU compilation
- Fix CPU compile segfault at program shutdown
v0.18.1
v0.18.0
Highlights
- Speed improvements:
- Up to 2x faster I/O: benchmarks.
- Faster transposed copies, unary, and binary ops
- Transposed convolutions
- Improvements to mx.distributed (send/recv/average_gradients); a send/recv sketch follows below
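A rough sketch of point-to-point communication with the new primitives; the argument order for send/recv is an assumption and should be checked against the mx.distributed docs, and the script is assumed to be launched under MPI.

```python
import mlx.core as mx

# Assumed launch: `mpirun -np 2 python send_recv.py`.
group = mx.distributed.init()

if group.rank() == 0:
    # Send a tensor to rank 1; eval forces the communication to run.
    x = mx.ones((4, 4))
    mx.eval(mx.distributed.send(x, 1))
else:
    # Receive it; shape and dtype must match the sender
    # (assumed signature: recv(shape, dtype, src)).
    y = mx.distributed.recv((4, 4), mx.float32, 0)
    mx.eval(y)
```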
Core
- New features:
- mx.conv_transpose{1,2,3}d
- Allow mx.take to work with an integer index (see the sketch at the end of this Core section)
- Add std as a method on mx.array
- mx.put_along_axis
- mx.cross_product
- int() and float() work on scalar mx.array
- Add optional headers to mx.fast.metal_kernel
- mx.distributed.send and mx.distributed.recv
- mx.linalg.pinv
- Performance
- Up to 2x faster I/O
- Much faster CPU convolutions
- Faster general n-dimensional copies, unary, and binary ops for both CPU and GPU
- Put reduction ops in default stream with async for faster comms
- Overhead reductions in mx.fast.metal_kernel
- Improve donation heuristics to reduce memory use
- Misc
- Support Xcode 16.0
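A short sketch of a few of the new array operations from the list above (integer mx.take, the std method, mx.put_along_axis, and Python scalar conversion); broadcasting of the values argument in put_along_axis is assumed to follow NumPy.

```python
import mlx.core as mx

a = mx.arange(12, dtype=mx.float32).reshape(3, 4)

# mx.take now accepts a plain integer index (the indexed axis is dropped).
row = mx.take(a, 1, axis=0)   # shape (4,)

# std is available directly as an array method.
print(a.std(axis=0))

# put_along_axis writes values at per-row indices and returns a new array
# (scalar values are assumed to broadcast against the indices).
idx = mx.array([[0], [2], [3]])
b = mx.put_along_axis(a, idx, mx.array(-1.0), axis=1)

# Scalar arrays convert directly to Python numbers.
print(float(a.max()), int(row[0]))
```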
NN
- Faster RNN layers
- nn.ConvTranspose{1,2,3}d
- mlx.nn.average_gradients: a data-parallel helper for distributed training (see the sketch below)
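A rough sketch of a data-parallel training step built around the new helper; the model, loss, and optimizer are placeholders, and the script is assumed to run under MPI (e.g. mpirun) so that a distributed group exists.

```python
import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim

model = nn.Linear(16, 16)                 # placeholder model
optimizer = optim.SGD(learning_rate=1e-2)

def loss_fn(model, x, y):
    return nn.losses.mse_loss(model(x), y, reduction="mean")

loss_and_grad = nn.value_and_grad(model, loss_fn)

def step(x, y):
    loss, grads = loss_and_grad(model, x, y)
    # Average the per-rank gradients before the optimizer update.
    grads = nn.average_gradients(grads)
    optimizer.update(model, grads)
    return loss
```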
Bug Fixes
- Fix boolean all reduce bug
- Fix extension metal library finding
- Fix ternary for large arrays
- Make eval just wait if all arrays are scheduled
- Fix CPU softmax by removing redundant coefficient in neon_fast_exp
- Fix JIT reductions
- Fix overflow in quantize/dequantize
- Fix compile with byte sized constants
- Fix copy in the sort primitive
- Fix reduce edge case
- Fix slice data size
- Throw for certain cases of non captured inputs in compile
- Fix copying scalars by adding fill_gpu
- Fix bug in module attribute set, reset, set
- Ensure io/comm streams are active before eval
- Fix mx.clip
- Override class function in Repr so mx.array is not confused with array.array
- Avoid using find_library to make install truly portable
- Remove fmt dependencies from MLX install
- Fix for partition VJP
- Avoid command buffer timeout for IO on large arrays
v0.17.3
v0.17.1