3.4.0 (2025-01-08)
Features
- token prediction (speculative decoding) (#405) (632a7bf) (documentation: Token Prediction; see the first sketch after this list)
- `controlledEvaluate` (#405) (632a7bf) (documentation: Low Level API)
- `evaluateWithMetadata` (#405) (632a7bf) (documentation: Low Level API; see the second sketch after this list)
- reranking (#405) (632a7bf) (documentation: Reranking Documents; see the third sketch after this list)
- token confidence (#405) (632a7bf) (documentation: Low Level API)
- `experimentalChunkDocument` (#405) (632a7bf)
- build on arm64 using LLVM (#405) (632a7bf)
- try compiling with LLVM on Windows x64 when available (#405) (632a7bf)
- minor: dynamically load `llama.cpp` backends (#405) (632a7bf)
- minor: more token values support in `SpecialToken` (#405) (632a7bf)
- minor: improve memory usage estimation (#405) (632a7bf)
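Token prediction pairs a small draft model with the main model to speed up generation. A minimal sketch, assuming the `DraftSequenceTokenPredictor` export and the `tokenPredictor` option of `context.getSequence()` introduced in this release; the model paths are placeholders:

```ts
import {getLlama, LlamaChatSession, DraftSequenceTokenPredictor} from "node-llama-cpp";

const llama = await getLlama();

// a small, fast model that drafts tokens for the main model to verify
const draftModel = await llama.loadModel({modelPath: "small-model.gguf"}); // placeholder path
const model = await llama.loadModel({modelPath: "large-model.gguf"}); // placeholder path

const draftContext = await draftModel.createContext();
const context = await model.createContext();

// attach the draft sequence as a token predictor to enable speculative decoding
const sequence = context.getSequence({
    tokenPredictor: new DraftSequenceTokenPredictor(draftContext.getSequence())
});

const session = new LlamaChatSession({contextSequence: sequence});
console.log(await session.prompt("Where do llamas come from?"));
```

For the drafted tokens to be usable, the draft model and the main model should share the same tokenizer vocabulary.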
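`evaluateWithMetadata` exposes per-token details, such as confidence, during low-level evaluation. A sketch, assuming the `{confidence: true}` metadata option and the `{token, confidence}` shape of the yielded items:

```ts
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "model.gguf"}); // placeholder path
const context = await model.createContext();
const sequence = context.getSequence();

const tokens = model.tokenize("The quick brown fox");
const maxTokens = 10;
let generated = 0;

// iterate over generated tokens together with the requested metadata
for await (const {token, confidence} of sequence.evaluateWithMetadata(tokens, {confidence: true})) {
    console.log(model.detokenize([token]), confidence);
    if (++generated >= maxTokens)
        break;
}
```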
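Reranking scores documents by their relevance to a query. A sketch against the ranking context added in this release, assuming the `createRankingContext` and `rankAndSort` names from the Reranking Documents documentation and a reranking-capable GGUF model:

```ts
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "reranker-model.gguf"}); // placeholder path
const context = await model.createRankingContext();

const documents = [
    "The sky is clear and blue today",
    "Buenos Aires is the capital of Argentina",
    "Llamas are native to the Andes mountains of South America"
];

// sort the documents by how well they answer the query, highest score first
const rankedDocuments = await context.rankAndSort("Where do llamas come from?", documents);
console.log(rankedDocuments);
```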
Bug Fixes
- check for Rosetta usage on macOS x64 when using the `inspect gpu` command (#405) (632a7bf)
- detect running under Rosetta on Apple Silicon and show an error message instead of crashing (#405) (632a7bf)
- switch from `"nextTick"` to `"nextCycle"` for the default batch dispatcher (#405) (632a7bf) (see the sketch after this list)
- remove deprecated CLS token (#405) (632a7bf)
- pipe error logs in the `inspect gpu` command (#405) (632a7bf)
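The batch dispatch schedule can still be set per context. A sketch for opting back into the previous behavior, assuming the `batching.dispatchSchedule` context option from earlier 3.x releases still accepts `"nextTick"`:

```ts
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "model.gguf"}); // placeholder path

// override the default batch dispatch schedule ("nextCycle" as of this release)
const context = await model.createContext({
    batching: {
        dispatchSchedule: "nextTick"
    }
});
```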
Shipped with `llama.cpp` release `b4435`
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)