Release v0.1.2 · google/gemma.cpp

MQA implementation
Ops refactorings and optimizations
Bugfixes
Model exporting script (util/convert_weights.py)

Important Note: With the MQA implementation, older 2B model artifacts need to be updated. Please re-download weights from Kaggle and ensure you have the latest version (-mqa or version 3).

What's Changed

Clean up docs for developers by @austinvhuang in #102
MQA Implementation for 2B models by @ufownl in #114
Enhancing Utility Functions in ops.h by @enum-class in #105
Added a missing space in app.h by @villesundell in #115
Fix compilation error when HWY_COMPILER_GCC_ACTUAL < 1300 by @ufownl in #120
.bazelversion: Bazel 7.1.1 by @LINKIWI in #122
Add standalone tool to compress weights. by @szabadka in #125
1.07x speedup: merge MQA parallel sections as suggested by @veluca93 by @copybara-service in #126
Fix off-by-one errors in generation code and token streaming callback. by @szabadka in #127

New Contributors

@villesundell made their first contribution in #115
@LINKIWI made their first contribution in #122
@szabadka made their first contribution in #125

Full Changelog: v0.1.1...v0.1.2
