Releases: google/gemma.cpp
v0.1.2
- MQA implementation
- Ops refactorings and optimizations
- Bugfixes
- Model exporting script (util/convert_weights.py)
Important Note: With the MQA implementation, older 2B model artifacts need to be updated. Please re-download weights from Kaggle and ensure you have the latest version (-mqa or version 3).
What's Changed
- Clean up docs for developers by @austinvhuang in #102
- MQA Implementation for 2B models by @ufownl in #114
- Enhancing Utility Functions in ops.h by @enum-class in #105
- Added a missing space in app.h by @villesundell in #115
- Fix compilation error when HWY_COMPILER_GCC_ACTUAL < 1300 by @ufownl in #120
- .bazelversion: Bazel 7.1.1 by @LINKIWI in #122
- Add standalone tool to compress weights. by @szabadka in #125
- 1.07x speedup: merge MQA parallel sections as suggested by @veluca93 by @copybara-service in #126
- Fix off-by-one errors in generation code and token streaming callback. by @szabadka in #127
New Contributors
- @villesundell made their first contribution in #115
- @LINKIWI made their first contribution in #122
- @szabadka made their first contribution in #125
Full Changelog: v0.1.1...v0.1.2
v0.1.1
- Refactor library interfaces
- Fixes to enable Android and Windows builds, plus general build improvements
- Bazel builds
- CI automation
- Allow either HF or Kaggle (vs Kaggle only) for artifact downloads
- Many small fixes and quality-of-life improvements since the initial 0.1.0 release
What's Changed
- Dev -> Main sync by @austinvhuang in #24
- Update build.yml by @eltociear in #22
- Fix typos by @shirayu in #32
- Allow building on Windows using clang-cl toolchain by @dcoles in #6
- Do not pass explicitly -O2 flag to compiler in Release build by @traversaro in #3
- Fix build. by @dan-zheng in #35
- reset conversation by @kishida in #34
- Rename BUILD to BUILD.bazel. by @dan-zheng in #36
- Add --eot_line option by @shirayu in #33
- clean up formatting after 129e66a by @austinvhuang in #58
- Warning fixes: unused member, cast, unused function by @copybara-service in #61
- CLI args + README improvements + cleanup by @austinvhuang in #66
- Fix for Android's 32-bit off_t. Fixes #62 by @copybara-service in #63
- Add DEVELOPERS notes on using gemma as a library by @austinvhuang in #71
- Add clang-tidy, fix narrowing issues, fix constness by @enum-class in #65
- Support Bazel builds. Fixes #16 by @copybara-service in #75
- Add instructions to download from Hugging Face Hub by @osanseviero in #74
- Separate KV cache from GemmaImpl by @ufownl in #81
- Avoid fadvise on older Android. Fixes #84 by @copybara-service in #85
- use hwy/simd for RMSNorm(f, bf, f) calculation by @enum-class in #78
- Use highway simd for SquaredL2 calculation by @enum-class in #77
- Detect and print build type. Refs #88 by @copybara-service in #92
- libgemma API refactor - decouple from interactive repl demo specifics, add hello world example using libgemma by @austinvhuang in #82
- Additional cleanup after libgemma refactor #82 by @austinvhuang in #87
- Use bf16-rounded sqrt for scaling embeddings to match Gemma by @copybara-service in #93
- Remove unused ascii banner string by @copybara-service in #96
- Allow changing k parameter of SampleTopK as a compiler flag by @ufownl in #97
- Add missing log that point to a failed Generation by @zeerd in #98
New Contributors
- @austinvhuang made their first contribution in #24
- @eltociear made their first contribution in #22
- @shirayu made their first contribution in #32
- @dcoles made their first contribution in #6
- @traversaro made their first contribution in #3
- @dan-zheng made their first contribution in #35
- @kishida made their first contribution in #34
- @copybara-service made their first contribution in #61
- @enum-class made their first contribution in #65
- @osanseviero made their first contribution in #74
- @ufownl made their first contribution in #81
- @zeerd made their first contribution in #98
Full Changelog: v0.1.0...v0.1.1