-
Notifications
You must be signed in to change notification settings - Fork 144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Insert MSM and FFT code and their benchmarks. #86
Insert MSM and FFT code and their benchmarks. #86
Conversation
ae74f12
to
8797915
Compare
Hi team, Rationale for this PR is mentioned in privacy-scaling-explorations/halo2#84. Following this I'd like to:
Depending on what is merged first either this PR or privacy-scaling-explorations/halo2#202 will need to be updated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good! I'd like to run benchmars with and without this commit in our server cc: @AronisAt79 @ed255 So that we can also see if we really benefit from multithreading without exploding RAM comsumption.
Once we wee no RAM explosions, I'mm happy to merge this!
If you have any benchmark results in other machines or memory profiling @mratsim it would be nice if you could share it.
I'll try to run this locally and get it. But only have 16 threads avaliable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Checked locally the fft.rs
and msm.rs
are ported from halo2_proofs
with only 2 lines of non-logic change.
Also got a question about MSM benchmark.
…ltiexp`. Split into `singlecore` and `multicore` benchmarks so Criterion's result caching and comparison over multiple runs makes sense. Rewrite point and scalar generation.
The EC points for benchmark should be moved so it's not repeated and it's parallelized, running the 2^18 to 2^22 part of the benches take almost 30min. Looking at the code for Point::random, 2 things can be slow when creating a point randomly, halo2curves/src/derive/curve.rs Lines 365 to 377 in 6e2ff38
But BN254 has a cofactor of 1, so only sqrt is slow. On my optimized library (using addition chains not Tonelli Shanks) it takes 6877ns. |
We could file an issue for that! |
The bench came from my library, with addition chains ;), so likely in Halo2curves it might even take a minute to generate the points. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
Hold on on merging, @einar-taiko is looking into parallelizing and caching the generation of test (coeffs, points) pairs |
Didn't do it as per your last comment :) |
Laptop measurements: k=22: 109 sec k=16: 1 sec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new bench with parallel point generation now completes under a minute
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just a small suggestion, could be ignored.
Here are my benchmarks on my laptop (i9-11980HK 8 cores). Unfortunately I have to go to zkSummit just after so benchmark on my 18-core workstation will have to wait end of week. For 2^16 there is a factor 3-3.5x between halo2curves (90ms) and Gnark (30ms) / Constantine (22.5ms) Halo2curvesGnarkConstantine |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we still need to add required-features = ["multicore"]
for msm.rs
to pass the CI build without default features.
ee7cb86
* Add field conversion to/from `[u64;4]` (privacy-scaling-explorations#80) * feat: add field conversion to/from `[u64;4]` * Added conversion tests * Added `montgomery_reduce_short` for no-asm * For bn256, uses assembly conversion when asm feature is on * fix: remove conflict for asm * chore: bump rust-toolchain to 1.67.0 * Compute Legendre symbol for `hash_to_curve` (privacy-scaling-explorations#77) * Add `Legendre` trait and macro - Add Legendre macro with norm and legendre symbol computation - Add macro for automatic implementation in prime fields * Add legendre macro call for prime fields * Remove unused imports * Remove leftover * Add `is_quadratic_non_residue` for hash_to_curve * Add `legendre` function * Compute modulus separately * Substitute division for shift * Update modulus computation * Add quadratic residue check func * Add quadratic residue tests * Add hash_to_curve bench * Implement Legendre trait for all curves * Move misplaced comment * Add all curves to hash bench * fix: add suggestion for legendre_exp * fix: imports after rebase * Add simplified SWU method (privacy-scaling-explorations#81) * Fix broken link * Add simple SWU algorithm * Add simplified SWU hash_to_curve for secp256r1 * add: sswu z reference * update MAP_ID identifier Co-authored-by: Han <[email protected]> --------- Co-authored-by: Han <[email protected]> * Bring back curve algorithms for `a = 0` (privacy-scaling-explorations#82) * refactor: bring back curve algorithms for `a = 0` * fix: clippy warning * fix: Improve serialization for prime fields (privacy-scaling-explorations#85) * fix: Improve serialization for prime fields Summary: 256-bit field serialization is currently 4x u64, ie. the native format. This implements the standard of byte-serialization (corresponding to the PrimeField::{to,from}_repr), and an hex-encoded variant of that for (de)serializers that are human-readable (concretely, json). - Added a new macro `serialize_deserialize_32_byte_primefield!` for custom serialization and deserialization of 32-byte prime field in different struct (Fq, Fp, Fr) across the secp256r, bn256, and derive libraries. - Implemented the new macro for serialization and deserialization in various structs, replacing the previous `serde::{Deserialize, Serialize}` direct use. - Enhanced error checking in the custom serialization methods to ensure valid field elements. - Updated the test function in the tests/field.rs file to include JSON serialization and deserialization tests for object integrity checking. * fixup! fix: Improve serialization for prime fields --------- Co-authored-by: Carlos Pérez <[email protected]> * refactor: (De)Serialization of points using `GroupEncoding` (privacy-scaling-explorations#88) * refactor: implement (De)Serialization of points using the `GroupEncoding` trait - Updated curve point (de)serialization logic from the internal representation to the representation offered by the implementation of the `GroupEncoding` trait. * fix: add explicit json serde tests * Insert MSM and FFT code and their benchmarks. (privacy-scaling-explorations#86) * Insert MSM and FFT code and their benchmarks. Resolves taikoxyz/zkevm-circuits#150. * feedback * Add instructions * feeback * Implement feedback: Actually supply the correct arguments to `best_multiexp`. Split into `singlecore` and `multicore` benchmarks so Criterion's result caching and comparison over multiple runs makes sense. Rewrite point and scalar generation. * Use slicing and parallelism to to decrease running time. Laptop measurements: k=22: 109 sec k=16: 1 sec * Refactor msm * Refactor fft * Update module comments * Fix formatting * Implement suggestion for fixing CI --------- Co-authored-by: David Nevado <[email protected]> Co-authored-by: Han <[email protected]> Co-authored-by: François Garillot <[email protected]> Co-authored-by: Carlos Pérez <[email protected]> Co-authored-by: einar-taiko <[email protected]>
* Add field conversion to/from `[u64;4]` (privacy-scaling-explorations#80) * feat: add field conversion to/from `[u64;4]` * Added conversion tests * Added `montgomery_reduce_short` for no-asm * For bn256, uses assembly conversion when asm feature is on * fix: remove conflict for asm * chore: bump rust-toolchain to 1.67.0 * Compute Legendre symbol for `hash_to_curve` (privacy-scaling-explorations#77) * Add `Legendre` trait and macro - Add Legendre macro with norm and legendre symbol computation - Add macro for automatic implementation in prime fields * Add legendre macro call for prime fields * Remove unused imports * Remove leftover * Add `is_quadratic_non_residue` for hash_to_curve * Add `legendre` function * Compute modulus separately * Substitute division for shift * Update modulus computation * Add quadratic residue check func * Add quadratic residue tests * Add hash_to_curve bench * Implement Legendre trait for all curves * Move misplaced comment * Add all curves to hash bench * fix: add suggestion for legendre_exp * fix: imports after rebase * Add simplified SWU method (privacy-scaling-explorations#81) * Fix broken link * Add simple SWU algorithm * Add simplified SWU hash_to_curve for secp256r1 * add: sswu z reference * update MAP_ID identifier Co-authored-by: Han <[email protected]> --------- Co-authored-by: Han <[email protected]> * Bring back curve algorithms for `a = 0` (privacy-scaling-explorations#82) * refactor: bring back curve algorithms for `a = 0` * fix: clippy warning * fix: Improve serialization for prime fields (privacy-scaling-explorations#85) * fix: Improve serialization for prime fields Summary: 256-bit field serialization is currently 4x u64, ie. the native format. This implements the standard of byte-serialization (corresponding to the PrimeField::{to,from}_repr), and an hex-encoded variant of that for (de)serializers that are human-readable (concretely, json). - Added a new macro `serialize_deserialize_32_byte_primefield!` for custom serialization and deserialization of 32-byte prime field in different struct (Fq, Fp, Fr) across the secp256r, bn256, and derive libraries. - Implemented the new macro for serialization and deserialization in various structs, replacing the previous `serde::{Deserialize, Serialize}` direct use. - Enhanced error checking in the custom serialization methods to ensure valid field elements. - Updated the test function in the tests/field.rs file to include JSON serialization and deserialization tests for object integrity checking. * fixup! fix: Improve serialization for prime fields --------- Co-authored-by: Carlos Pérez <[email protected]> * refactor: (De)Serialization of points using `GroupEncoding` (privacy-scaling-explorations#88) * refactor: implement (De)Serialization of points using the `GroupEncoding` trait - Updated curve point (de)serialization logic from the internal representation to the representation offered by the implementation of the `GroupEncoding` trait. * fix: add explicit json serde tests * Insert MSM and FFT code and their benchmarks. (privacy-scaling-explorations#86) * Insert MSM and FFT code and their benchmarks. Resolves taikoxyz/zkevm-circuits#150. * feedback * Add instructions * feeback * Implement feedback: Actually supply the correct arguments to `best_multiexp`. Split into `singlecore` and `multicore` benchmarks so Criterion's result caching and comparison over multiple runs makes sense. Rewrite point and scalar generation. * Use slicing and parallelism to to decrease running time. Laptop measurements: k=22: 109 sec k=16: 1 sec * Refactor msm * Refactor fft * Update module comments * Fix formatting * Implement suggestion for fixing CI * Re-export also mod `pairing` and remove flag `reexport` to alwasy re-export (privacy-scaling-explorations#93) fix: re-export also mod `pairing` and remove flag `reexport` to alwasy re-export * fix regression in privacy-scaling-explorations#93 reexport field benches aren't run (privacy-scaling-explorations#94) fix regression in privacy-scaling-explorations#93, field benches aren't run * Fast modular inverse - 9.4x acceleration (privacy-scaling-explorations#83) * Bernstein yang modular multiplicative inverter (#2) * rename similar to privacy-scaling-explorations#95 --------- Co-authored-by: Aleksei Vambol <[email protected]> * Fast isSquare / Legendre symbol / Jacobi symbol - 16.8x acceleration (privacy-scaling-explorations#95) * Derivatives of the Pornin's method (taikoxyz#3) * renaming file * make cargo fmt happy * clarifications from privacy-scaling-explorations#95 (comment) [skip ci] * Formatting and slightly changing a comment --------- Co-authored-by: Aleksei Vambol <[email protected]> * chore: delete bernsteinyang module (replaced by ff_inverse) * Bump version to 0.4.1 --------- Co-authored-by: David Nevado <[email protected]> Co-authored-by: Han <[email protected]> Co-authored-by: François Garillot <[email protected]> Co-authored-by: Carlos Pérez <[email protected]> Co-authored-by: einar-taiko <[email protected]> Co-authored-by: Mamy Ratsimbazafy <[email protected]> Co-authored-by: Aleksei Vambol <[email protected]>
This PR moves MSM and FFT to
halo2curves
as suggested in #84.Adds benchmarks for FFT and MSM to serve as comparison standard in the future: current situation vs privacy-scaling-explorations/halo2#40 vs #29 vs post #163.
Note: This might require moving privacy-scaling-explorations/halo2#202 to
halo2curves
cc @jonathanpwangResolves taikoxyz/zkevm-circuits#150.