I am very happy to present the second version of Constantine.
I thank the Ethereum Foundation for sponsoring the implementation of Torus-based cryptography to make the performance of Secret Leader Election viable.
The highlight of this release, and the inspiration for its name, is the introduction of specialized ARM64 assembly for most key field operations and SHA256. Thanks to it, the latest M4 Max is within 5% of an overclocked AMD Ryzen 9950X on single-threaded performance (though multithreaded performance is lackluster due to Apple's very aggressive power saving). Currently this is only available on MacOS, but it will be coming to Linux, Android and iOS.
The second highlight of this release is significant backend work for JIT compiling elliptic curves to Nvidia and AMD GPUs.
Backends for x86 and ARM have also been explored and could offer an alternative: providing libconstantine as a fully optimized assembly file, at least at the Ethereum and elliptic curve level. This would streamline build systems by removing the need for the Nim compiler, and would also make it easy to vectorize the library.
Constantine is currently being scoped for a security audit, after which a 1.0 version should follow.
You can review the scope here: #483, and I'm looking for sponsors.
An independent benchmark showed that, as of January 2025, Constantine is the fastest backend for EIP-4844 / KZG polynomial commitments: https://github.com/grandinetech/rust-kzg.
The Nim minimum version has been updated to Nim v2.2.0. 99% of Constantine should still work with v1.6.16 and v2.0.8, except the Torus-based cryptography part.
Now let's review the main changes per category.
Ethereum
The focus for this release has been the Ethereum Execution layer, with the introduction of:
- Keccak hash function
- ECDSA signatures over secp256k1
- RIPEMD160 hash function and EVM precompile
- KZG Point Evaluation EVM precompile
- ECRECOVER precompile (under review)
- repricing of EIP-2537 (BLS12-381 precompiles)
Performance on x86 and ARM is detailed in: #520
The precompiles are exposed in C, Nim and Rust, except ECRECOVER, which is still under review for corner cases that Ethereum tests may not cover and for "low performance": a 1.7x perf advantage at the low level shrinks to no advantage (1x) at the elliptic curve level (#446).
The inner product argument (IPA) multi-proof primitives for Ethereum Verkle Tries have been thoroughly reviewed and improved.
On the Consensus side, sponsored work has been done on accelerating multi-exponentiation in the 𝔾ₜ pairing group via Torus-based cryptography for the purposes of secret leader election: https://ethresear.ch/t/the-return-of-torus-based-cryptography-whisk-and-curdleproof-in-the-target-group/16678/4
Proof-system
Multilinear extensions of polynomials have been added. This is a prerequisite for sumchecks, the current state-of-the-art proving technique in research.
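As a rough illustration of what a multilinear extension computes, here is a minimal Python sketch. It is not Constantine's API: the function name `mle_eval`, the index-bit ordering (first variable = most significant index bit) and the use of Mersenne31 as the modulus are all assumptions made for the example. Given the 2^n values of a function on the boolean hypercube, it evaluates the unique multilinear polynomial that matches them at an arbitrary field point by folding one variable at a time.

```python
# Illustrative modulus only (Mersenne31); any prime field works the same way.
P = 2**31 - 1

def mle_eval(evals, point, p=P):
    """Evaluate the multilinear extension of `evals` at `point`.

    `evals` holds the 2^n values of a function on {0,1}^n, indexed so that
    the first variable is the most significant bit of the index (assumed
    convention). `point` is a list of n field elements.
    """
    n = len(point)
    if len(evals) != 1 << n:
        raise ValueError("expected 2^n evaluations for n variables")
    layer = [e % p for e in evals]
    for r in point:
        half = len(layer) // 2
        # Linear interpolation in the current variable:
        # (1 - r) * f(0, rest) + r * f(1, rest)
        layer = [(layer[i] + r * (layer[half + i] - layer[i])) % p
                 for i in range(half)]
    return layer[0]

# f(x0, x1) = 1 + 2*x0 + x1 on the hypercube gives the table [1, 2, 3, 4].
table = [1, 2, 3, 4]
print(mle_eval(table, [0, 1]))  # boolean point: returns the table entry at index 0b01
print(mle_eval(table, [2, 3]))  # non-boolean point: 1 + 2*2 + 3
```

Each fold halves the table, so one evaluation costs about 2^n field multiplications. The same folding step is what a sumcheck prover applies round after round, which is why this primitive comes first.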
A Groth16 prover has been submitted in a PR by @Vindaar and is under final review.
Backend
We added an ARM64 compile-time assembler and 90% of the main computing bottlenecks now have ARM64 acceleration.
Performance: #513
Exploration in LLVM JIT compilation for GPU has been progressing with:
- the Nvidia backend now having a prototype serial MSM thanks to @Vindaar
- AMD GPUs being supported
The threadpool received a fix for task garbage collection synchronization on ARM64 (and other ISAs with weak memory models).
Misc
Constantine can now generate benchmarks in https://zka.lc format with
git clone https://github.com/mratsim/constantine
cd constantine
nimble make_zkalc
bin/constantine-bench-zkalc --curve=BLS12_381 --o=myoutputfile.json
In CI, the Nim installation script has been completely rewritten to install from any of:
- source
- nightlies
- website
and can handle versioning for all 3 as well as the new Linux and MacOS ARM64 builds, and testing on Linux in 32-bit mode.
Future work
Please refer to https://github.com/mratsim/constantine/blob/v0.2.0/PLANNING.md and the issue tracker https://github.com/mratsim/constantine/issues?q=is%3Aopen+is%3Aissue+label%3A%22enhancement+%3Ashipit%3A%22+
Here are some of the work streams I want to prioritize:
- Work is currently being done to improve the LLVM backend codegen. It may provide multiple advantages:
  - pure assembly: remove GCC vs Clang compiler differences (may be as high as 20%).
  - we can ensure constant-time properties without the compiler rugpulling us.
  - vectorization can be just changing `i256` to `<4 x i256>` and reusing the exact same LLVM IR.
- GPU acceleration
- Ethereum PeerDAS / Data Availability Sampling (Erasure coding + 2D KZG proofs)
- Sumchecks Polynomial commitment scheme (PCS)
- Small fields support like Baby Bear, Koala Bear, Goldilocks and Mersenne31
- FRI, Deep FRI and STIR PCS.
- Blake2 to finish EVM precompiles.
- Poseidon2 hash function
-- Mamy
Detailed changes (auto-generated)
- Multilinear extensions of polynomials by @mratsim in #423
- add: multiproof consistency test by @agnxsh in #424
- fix `scalarMul_vartime` for tiny multiple 5 by @Vindaar in #426
- feat(bench): PoC of integration with zkalc by @mratsim in #425
- 𝔾ₜ exponentiation, with endomorphism acceleration by @mratsim in #429
- Formal verification: resurrect fiat-crypto with formally verified assembly by @mratsim in #430
- Constant-time 𝔾ₜ exponentiation with endomorphism acceleration by @mratsim in #431
- fix(cryptofuzz): expose all cryptofuzz tested primitives in lowlevel_* by @mratsim in #432
- fix(arith): bug in vartime inversion when using fused inverse+multiply by factor - found by Guido Vranken by @mratsim in #433
- Compatibility with Nim v1.6.x, Nim v2.0.x, Nim v2.2.x by @mratsim in #434
- fix(gcc): compatibility with GCC14 by @mratsim in #435
- 𝔾ₜ multi-exponentiations by @mratsim in #436
- feat(public API): expose hashing to curve for BN254 and BLS12-381 by @mratsim in #437
- fix(nvidia): reorg + rename following #402 by @mratsim in #439
- Nvidia backend: update for LLVM 17 by @mratsim in #440
- fix: 32-bit on 64-bit compilation by @mratsim in #441
- fix test suite for banderwagon by @advaita-saha in #442
- feat(secp256k1): add endomorphism acceleration by @mratsim in #444
- workaround #448: deactivated secp256k1 tests due to bug on Windows with assembly by @mratsim in #449
- research: update LLVM x86 compiler and JIT by @mratsim in #452
- AMDGPU JIT compiler by @mratsim in #453
- LLVM: field addition with saturated fields by @mratsim in #456
- Verkle ipa multiproof is now internally consistent by @mratsim in #458
- fix MSM bench using 64-bit scalars after #444 [skip ci] by @mratsim in #460
- fixes when nimvm mixing different types by @ringabout in #459
- fixes another mixing types by @ringabout in #461
- Nvidia remastered by @mratsim in #464
- Halo2 0.4 and Halo2curves 0.7 compat + Rust warnings fixes by @mratsim in #468
- fix: template typechecking is more stringent by @mratsim in #470
- CI: replace apt-fast by apt-get + Nim v2.2.x in CI as Nim v2.0.10 is broken. by @mratsim in #473
- Fixes for Nim v2.2 by @mratsim in #476
- Implement finite field `ccopy`, `neg`, `cneg`, `nsqr`, ... for CUDA target by @Vindaar in #466
- CI: drop old nim compiler versions by @mratsim in #486
- Torus-acceleration for multiexponentiation on GT by @mratsim in #485
- fix(MSM): properly handle edge condition in parallel MSM when bits is exactly divided by c by @mratsim in #484
- Crandall primes by @mratsim in #445
- Add KZG point precompile by @Vindaar in #489
- Keccak256 and SHA3-256 by @mratsim in #494
- Keccak optimizations by @mratsim in #498
- Improve Rust build script by @DaniPopes in #500
- fix(threadpool): fix task garbage collection synchronization on weak memory models by @mratsim in #503
- refactor(add-carry): with Clang on non-x86 (for example MacOS) use builtin add-carry instead of u128 by @mratsim in #411
- Add ECDSA over secp256k1 signatures and verification by @Vindaar in #490
- keccak: OpenSSL skip MacOS test by @mratsim in #508
- fix(threadpool regression): deadlock on Windows on fibonacci by @mratsim in #509
- C bindings for Banderwagon by @Richa-iitr in #477
- Nvidia MSM proof of concept (serial) by @Vindaar in #480
- opt(ecc): jacobian doubling improvement by @mratsim in #510
- adds nodecl to imported types by @ringabout in #512
- Arm64 assembly by @mratsim in #513
- Rust bindings update - includes Banderwagon by @mratsim in #514
- upstream CI: support Apple Clang before macOS 15, fix #516 by @mratsim in #517
- Add RIPEMD160 hash function and EVM precompile by @Vindaar in #505
- Add `ECRecover` EVM precompile by @Vindaar in #504
- SHA256 ARM64 hardware accel: 6.4x acceleration (Apple Silicon only) by @mratsim in #518
- eip2537: repricing by @mratsim in #493
- update Ethereum benches by @mratsim in #520
- CI: Test on MacOS and Linux ARM64 / Aarch64 by @mratsim in #524
New Contributors
- @ringabout made their first contribution in #459
- @DaniPopes made their first contribution in #500
- @Richa-iitr made their first contribution in #477
Full Changelog: v0.1.0...v0.2.0