3.4.0 (2025-01-08)
Features
- token prediction (speculative decoding) (#405) (632a7bf) (documentation: Token Prediction; see the first sketch after this list)
- `controlledEvaluate` (#405) (632a7bf) (documentation: Low Level API)
- `evaluateWithMetadata` (#405) (632a7bf) (documentation: Low Level API; see the second sketch after this list)
- reranking (#405) (632a7bf) (documentation: Reranking Documents; see the third sketch after this list)
- token confidence (#405) (632a7bf) (documentation: Low Level API)
- `experimentalChunkDocument` (#405) (632a7bf)
- build on arm64 using LLVM (#405) (632a7bf)
- try compiling with LLVM on Windows x64 when available (#405) (632a7bf)
- minor: dynamically load `llama.cpp` backends (#405) (632a7bf)
- minor: more token values support in `SpecialToken` (#405) (632a7bf)
- minor: improve memory usage estimation (#405) (632a7bf)
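Token prediction pairs a small draft model with the main model to speed up generation. A minimal sketch, assuming the `DraftSequenceTokenPredictor` export and the `tokenPredictor` option of `context.getSequence()` introduced in this release; the model paths are placeholders:

```ts
import {getLlama, LlamaChatSession, DraftSequenceTokenPredictor} from "node-llama-cpp";

const llama = await getLlama();

// a small, fast model that drafts tokens for the main model to verify
const draftModel = await llama.loadModel({modelPath: "small-model.gguf"}); // placeholder path
const model = await llama.loadModel({modelPath: "large-model.gguf"}); // placeholder path

const draftContext = await draftModel.createContext();
const context = await model.createContext();

// attach the draft sequence as a token predictor to enable speculative decoding
const sequence = context.getSequence({
    tokenPredictor: new DraftSequenceTokenPredictor(draftContext.getSequence())
});

const session = new LlamaChatSession({contextSequence: sequence});
console.log(await session.prompt("Where do llamas come from?"));
```

For the drafted tokens to be usable, the draft model and the main model should share the same tokenizer vocabulary.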
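`evaluateWithMetadata` exposes per-token details, such as confidence, during low-level evaluation. A sketch, assuming the `{confidence: true}` metadata option and the `{token, confidence}` shape of the yielded items:

```ts
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "model.gguf"}); // placeholder path
const context = await model.createContext();
const sequence = context.getSequence();

const tokens = model.tokenize("The quick brown fox");
const maxTokens = 10;
let generated = 0;

// iterate over generated tokens together with the requested metadata
for await (const {token, confidence} of sequence.evaluateWithMetadata(tokens, {confidence: true})) {
    console.log(model.detokenize([token]), confidence);
    if (++generated >= maxTokens)
        break;
}
```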
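Reranking scores documents by their relevance to a query. A sketch against the ranking context added in this release, assuming the `createRankingContext` and `rankAndSort` names from the Reranking Documents documentation and a reranking-capable GGUF model:

```ts
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "reranker-model.gguf"}); // placeholder path
const context = await model.createRankingContext();

const documents = [
    "The sky is clear and blue today",
    "Buenos Aires is the capital of Argentina",
    "Llamas are native to the Andes mountains of South America"
];

// sort the documents by how well they answer the query, highest score first
const rankedDocuments = await context.rankAndSort("Where do llamas come from?", documents);
console.log(rankedDocuments);
```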
Bug Fixes
- check for Rosetta usage on macOS x64 when using the `inspect gpu` command (#405) (632a7bf)
- detect running under Rosetta on Apple Silicon and show an error message instead of crashing (#405) (632a7bf)
- switch from `"nextTick"` to `"nextCycle"` for the default batch dispatcher (#405) (632a7bf) (see the sketch after this list)
- remove deprecated CLS token (#405) (632a7bf)
- pipe error logs in the `inspect gpu` command (#405) (632a7bf)
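The batch dispatch schedule can still be set per context. A sketch for opting back into the previous behavior, assuming the `batching.dispatchSchedule` context option from earlier 3.x releases still accepts `"nextTick"`:

```ts
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "model.gguf"}); // placeholder path

// override the default batch dispatch schedule ("nextCycle" as of this release)
const context = await model.createContext({
    batching: {
        dispatchSchedule: "nextTick"
    }
});
```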
Shipped with `llama.cpp` release `b4435`
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)