add stable benchmark #92

Merged
merged 10 commits into from
Oct 2, 2023


chenyan-dfinity
Contributor

No description provided.


github-actions bot commented Sep 30, 2023

Note
Diffing the performance results against the published results from the main branch.
Unchanged benchmarks are omitted.
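
A minimal sketch of how one table cell could be diffed against the main branch. It assumes the annotation is the relative change in percent, colored red for increases and green for decreases, as the tables below suggest. percent_change and render_cell are hypothetical helpers, not the workflow's actual code.

```rust
/// Relative change of `new` against `base` (the main-branch value), in percent.
/// Assumes base > 0.
fn percent_change(base: u64, new: u64) -> f64 {
    (new as f64 - base as f64) / base as f64 * 100.0
}

/// Render one table cell. Unchanged values are left unannotated; changed
/// values get the signed delta, red for an increase and green for a decrease.
fn render_cell(base: u64, new: u64) -> String {
    if base == new {
        return new.to_string();
    }
    let delta = percent_change(base, new);
    let color = if delta > 0.0 { "red" } else { "green" };
    format!("{} ($\\textcolor{{{}}}{{{:.2}\\%}}$)", new, color, delta)
}

fn main() {
    println!("{}", render_cell(1000, 1004));
    println!("{}", render_cell(200, 100));
    println!("{}", render_cell(500, 500));
}
```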

Warning
Skipping _out/collections/README.md because its number of tables does not match the main branch.

SHA-2

Note
Same as the main branch; skipping.

Certified map

| | binary_size | generate 10k | max mem | inc | witness | upgrade |
|---|---|---|---|---|---|---|
| Motoko | 206_295 | 4_390_018_572 | 3_429_984 | 519_711 | 327_767 | 225_144_790 |
| Rust | 521_976 ($\textcolor{red}{0.04\%}$) | 6_202_432_827 | 2_228_224 | 983_997 | 288_528 | 5_811_201_292 |

Statistics

  • binary_size: 0.04%
  • max_mem: no change
  • cycles: no change

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal | upgrade |
|---|---|---|---|---|---|---|
| Motoko | 236_673 | 491_766 | 16_294 | 12_672 | 14_136 ($\textcolor{red}{0.16\%}$) | 122_439 |
| Rust | 806_537 ($\textcolor{red}{0.02\%}$) | 541_266 ($\textcolor{red}{0.00\%}$) | 86_052 | 107_287 | 117_056 | 1_686_510 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token | upgrade |
|---|---|---|---|---|---|
| Motoko | 194_938 | 466_439 | 22_357 | 4_729 | 65_612 |
| Rust | 820_893 ($\textcolor{red}{0.03\%}$) | 210_081 ($\textcolor{red}{0.01\%}$) | 324_368 | 81_020 | 1_860_352 |

Statistics

  • binary_size: 0.02% [0.02%, 0.03%]
  • max_mem: no change
  • cycles: 0.06% [-0.09%, 0.20%]

Heartbeat

| | binary_size | heartbeat |
|---|---|---|
| Motoko | 123_509 | 3_758 ($\textcolor{green}{-49.21\%}$) |
| Rust | 23_826 | 785 |

Timer

Note
Same as the main branch; skipping.

Statistics

  • binary_size: no change
  • max_mem: no change
  • cycles: no change

Overall Statistics

  • binary_size: 0.03% [0.02%, 0.04%]
  • max_mem: no change
  • cycles: 0.06% [-0.09%, 0.20%]


github-actions bot commented Sep 30, 2023

Note
The flamegraph link only works after you merge.
Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust.
Libraries with the _rs suffix are written in Rust; the rest are written in Motoko.
The _stable and _stable_rs suffixes indicate that the library writes its state directly to stable memory, using Region in Motoko and ic-stable-structures in Rust.

We use the same random number generator with a fixed seed to ensure that all collections contain
the same elements and that the queries are exactly the same. Below we explain the measurement in each column of the table:

  • generate 1m. Insert 1m Nat64 integers into the collection. For Motoko collections, this usually triggers the GC; the remaining columns are unlikely to trigger GC.
  • max mem. For Motoko, it reports rts_max_heap_size after the generate call; for Rust, it reports the Wasm's memory page * 32Kb.
  • batch_get 50. Find 50 elements in the collection.
  • batch_put 50. Insert 50 elements into the collection.
  • batch_remove 50. Remove 50 elements from the collection.
  • upgrade. Upgrade the canister with the same Wasm module. For non-stable benchmarks, the map state is persisted by serializing it into stable memory and deserializing it after the upgrade. For stable benchmarks, the upgrade takes no cycles, as the state is already in stable memory.
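
The fixed-seed setup and the non-stable upgrade path described above can be sketched in plain Rust. The sketch rests on several assumptions. SplitMix64 stands in for the benchmark's random number generator, which is not named here. std's BTreeMap stands in for a collection, and a Vec<u8> stands in for stable memory. pre_upgrade and post_upgrade are hypothetical helpers named after the canister hooks; they do not call the IC API.

```rust
use std::collections::BTreeMap;
use std::convert::TryInto;

/// SplitMix64, a small deterministic PRNG. An assumed stand-in: the
/// benchmark's actual generator is not specified in this report.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// "generate n": insert n pseudo-random keys, each mapped to itself.
fn generate(seed: u64, n: usize) -> BTreeMap<u64, u64> {
    let mut rng = SplitMix64(seed);
    let mut map = BTreeMap::new();
    for _ in 0..n {
        let k = rng.next_u64();
        map.insert(k, k);
    }
    map
}

/// Hypothetical pre-upgrade step: encode every entry (16 bytes each) into a
/// byte buffer that stands in for stable memory.
fn pre_upgrade(map: &BTreeMap<u64, u64>) -> Vec<u8> {
    let mut buf = Vec::with_capacity(map.len() * 16);
    for (k, v) in map {
        buf.extend_from_slice(&k.to_le_bytes());
        buf.extend_from_slice(&v.to_le_bytes());
    }
    buf
}

/// Hypothetical post-upgrade step: rebuild the map from that buffer.
fn post_upgrade(buf: &[u8]) -> BTreeMap<u64, u64> {
    buf.chunks_exact(16)
        .map(|c| {
            let k = u64::from_le_bytes(c[..8].try_into().unwrap());
            let v = u64::from_le_bytes(c[8..].try_into().unwrap());
            (k, v)
        })
        .collect()
}

fn main() {
    // Same seed, same elements: every collection is filled identically.
    let a = generate(42, 1_000);
    assert_eq!(a, generate(42, 1_000));

    // Round trip through the stand-in for stable memory.
    let bytes = pre_upgrade(&a);
    assert_eq!(post_upgrade(&bytes), a);
    println!("{} entries -> {} bytes", a.len(), bytes.len());
}
```

Every entry is re-encoded on each upgrade, so this path scales with the state size. The stable variants avoid that cost because their data already lives in stable memory.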

💎 Takeaways

  • The platform only charges for instruction count. Data structures that exploit caching and locality therefore gain no cost advantage from it.
  • There is a limit on the maximum cycles per round. This means asymptotic behavior doesn't matter much; we care more about performance up to a fixed N. In extreme cases, you may see an $O(10000 n\log n)$ algorithm hitting the limit, while an $O(n^2)$ algorithm runs just fine.
  • Amortized algorithms/GC may need to be more eager to avoid hitting the cycle limit in a particular round.
  • Rust costs more cycles to process complicated Candid data, but it is more efficient in performing core computations.
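
To make the second takeaway concrete, the sketch below finds where the two cost models cross. It takes the constants literally (10000 n log2 n versus n²) and ignores everything else. With those constants, the quadratic model performs fewer operations for every n from 2 up to roughly 174,000.

```rust
/// The two cost models from the takeaway, with the constants taken literally.
fn cost_n_log_n(n: f64) -> f64 {
    10_000.0 * n * n.log2()
}

fn cost_quadratic(n: f64) -> f64 {
    n * n
}

/// Smallest n >= 2 at which the quadratic model becomes the more expensive one.
fn crossover() -> u64 {
    let mut n: u64 = 2;
    while cost_quadratic(n as f64) <= cost_n_log_n(n as f64) {
        n += 1;
    }
    n
}

fn main() {
    let n = crossover();
    println!("n^2 stays at or below 10000 n log2 n up to n = {}", n - 1);
    // At n = 100_000 the asymptotically worse algorithm still does less work.
    let m = 100_000.0;
    println!("ratio at n = 100k: {:.2}", cost_n_log_n(m) / cost_quadratic(m));
}
```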

Note

  • The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.
  • Due to the instrumentation overhead and cycle limit, we cannot profile computations with very large collections.
  • The upgrade column uses Candid for serializing stable data. In Rust, you may get better cycle cost by using a different serialization format. Another slowdown in Rust is that ic-stable-structures tends to be slower than the region memory in Motoko.
  • Different libraries persist data across upgrades in different ways. There are three main categories:
    • Use stable variables directly in Motoko: zhenya_hashmap, btree, vector
    • Expose and serialize external state (share/unshare in Motoko, candid::Encode in Rust): rbtree, heap, btreemap_rs, hashmap_rs, heap_rs, vec_rs
    • Use pre/post-upgrade hooks to convert data into an array: hashmap, splay, triemap, buffer, imrc_hashmap_rs
  • The stable benchmarks are much more expensive than their non-stable counterparts, because the stable memory API is much more expensive. The benefit is that they get zero-cost upgrades.
  • hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so the cost of batch_put 50 is much higher than for the other data structures.
  • btree comes from mops.one/stableheapbtreemap.
  • zhenya_hashmap comes from mops.one/map.
  • vector comes from mops.one/vector. Compared with buffer, put has better worst-case time and space complexity ($O(\sqrt{n})$ vs $O(n)$); get has a slightly larger constant overhead.
  • hashmap_rs uses std::collections::HashMap with the deterministic hasher from the fxhash crate. This ensures reproducible results.
  • imrc_hashmap_rs uses the im-rc crate, which provides an immutable hashmap for Rust.
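
The batch_put 50 spike for hashmap follows from doubling growth. The minimal model below assumes a table that doubles its capacity when full and copies every existing element. The Motoko hashmap's exact growth policy and initial capacity are not stated here, so the capacity of 1000 is made up. The model counts the copies made by each batch of 50 puts.

```rust
/// Illustrative model of amortized growth: a table that doubles its capacity
/// when full and copies every existing element into the new allocation.
/// This is not the Motoko hashmap's actual code.
struct DoublingTable {
    len: usize,
    cap: usize,
    copies: u64,
}

impl DoublingTable {
    fn with_capacity(cap: usize) -> Self {
        DoublingTable { len: 0, cap, copies: 0 }
    }

    fn put(&mut self) {
        if self.len == self.cap {
            // Grow: every existing element is moved to the new array.
            self.copies += self.len as u64;
            self.cap = (self.cap * 2).max(1);
        }
        self.len += 1;
    }
}

/// Number of element copies caused by each consecutive batch of `batch` puts.
fn copies_per_batch(initial_cap: usize, batches: usize, batch: usize) -> Vec<u64> {
    let mut t = DoublingTable::with_capacity(initial_cap);
    let mut out = Vec::with_capacity(batches);
    for _ in 0..batches {
        let before = t.copies;
        for _ in 0..batch {
            t.put();
        }
        out.push(t.copies - before);
    }
    out
}

fn main() {
    // Hypothetical initial capacity of 1000, then 25 batches of 50 puts.
    let copies = copies_per_batch(1000, 25, 50);
    println!("{:?}", copies);
}
```

All of the copying lands in the one batch that crosses the capacity boundary. A more eager scheme, in the spirit of the third takeaway, would instead spread those copies over the following inserts.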

Map

| | binary_size | generate 1m | max mem | batch_get 50 | batch_put 50 | batch_remove 50 | upgrade |
|---|---|---|---|---|---|---|---|
| hashmap | 160_033 | 6_984_044_834 | 61_987_792 | 288_670 | 5_536_856_465 | 310_195 | 9_128_777_557 |
| triemap | 163_286 | 11_463_656_817 | 74_216_112 | 222_926 | 549_435 | 540_205 | 13_075_150_332 |
| rbtree | 157_961 | 5_979_230_865 | 57_996_000 | 88_905 | 268_573 | 278_339 | 5_771_873_746 |
| splay | 159_768 | 11_568_250_977 | 53_995_936 | 552_014 | 581_765 | 810_321 | 3_722_468_031 |
| btree | 187_709 | 8_224_242_624 | 31_103_952 | 277_542 | 384_171 | 429_041 | 2_517_935_226 |
| zhenya_hashmap | 160_321 | 2_201_622_488 | 22_773_040 | 48_627 | 61_839 | 70_872 | 2_695_441_915 |
| btreemap_rs | 494_261 | 1_654_113_949 | 27_590_656 | 66_889 | 112_603 | 81_249 | 2_401_229_430 |
| imrc_hashmap_rs | 500_199 | 2_407_082_660 | 244_973_568 | 32_962 | 163_913 | 98_591 | 5_209_975_418 |
| hashmap_rs | 487_986 | 403_296_624 | 73_138_176 | 17_350 | 21_647 | 20_615 | 957_579_445 |

Priority queue

| | binary_size | heapify 1m | max mem | pop_min 50 | put 50 | pop_min 50 | upgrade |
|---|---|---|---|---|---|---|---|
| heap | 147_450 | 4_684_518_110 | 29_995_896 | 511_505 | 186_471 | 487_212 | 2_655_603_064 |
| heap_rs | 481_753 | 123_102_208 | 18_284_544 | 53_480 | 18_264 | 53_621 | 349_011_816 |

Growable array

| | binary_size | generate 5k | max mem | batch_get 500 | batch_put 500 | batch_remove 500 | upgrade |
|---|---|---|---|---|---|---|---|
| buffer | 150_816 | 2_082_623 | 65_584 | 73_092 | 671_517 | 127_592 | 2_468_118 |
| vector | 152_363 | 1_588_260 | 24_520 | 105_191 | 149_932 | 148_094 | 3_837_918 |
| vec_rs | 480_829 | 265_643 | 1_376_256 | 12_986 | 25_331 | 21_215 | 2_854_587 |

Stable structures

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 | upgrade |
|---|---|---|---|---|---|---|---|
| btreemap_rs | 494_261 | 70_231_886 | 2_555_904 | 57_208 | 86_708 | 79_740 | 100_477_350 |
| btreemap_stable_rs | 497_838 | 3_676_196_177 | 2_621_440 | 2_190_807 | 4_013_463 | 6_777_299 | 0 |
| heap_rs | 481_753 | 6_214_821 | 2_293_760 | 45_761 | 18_496 | 45_732 | 18_367_724 |
| heap_stable_rs | 469_093 | 240_377_401 | 458_752 | 2_038_566 | 209_047 | 2_023_426 | 0 |
| vec_rs | 480_829 | 2_866_842 | 2_293_760 | 12_986 | 14_081 | 13_678 | 16_575_110 |
| vec_stable_rs | 464_782 | 55_585_887 | 458_752 | 52_650 | 67_745 | 69_641 | 0 |

Cryptographic libraries

Measure different cryptographic libraries written in both Motoko and Rust.

  • SHA-2 benchmarks
    • SHA-256/SHA-512. Compute the hash of a 1M Wasm binary.
    • account_id. Compute the ledger account id from principal, based on SHA-224.
    • neuron_id. Compute the NNS neuron id from principal, based on SHA-256.
  • Certified map. A Merkle tree that stores key-value pairs and generates witnesses according to the IC Interface Specification.
    • generate 10k. Insert 10k 7-character words as both key and value into the certified map.
    • max mem. For Motoko, it reports rts_max_heap_size after the generate call; for Rust, it reports the Wasm's memory page * 32Kb.
    • inc. Increment a counter and insert the counter value into the map.
    • witness. Generate the root hash and a witness for the counter.
    • upgrade. Upgrade the canister with the same Wasm. In Motoko, we use stable variables. In Rust, we convert the tree to a vector before serialization.
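
For reference, the account_id computation can be sketched without crates. The sketch assumes the ICP ledger's account identifier layout: h = SHA-224("\x0Aaccount-id" || principal || subaccount), with a 32-byte all-zero default subaccount. The identifier is the big-endian CRC-32 of h followed by h. SHA-224 and CRC-32 are written inline to keep the block self-contained. This is not the code of the benchmarked Motoko or Rust libraries, and it is not hardened for production use.

```rust
use std::convert::TryInto;

// SHA-256 round constants (FIPS 180-4), shared by SHA-224.
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

/// SHA-224: the SHA-256 compression function with its own IV, truncated to 28 bytes.
fn sha224(msg: &[u8]) -> [u8; 28] {
    let mut h: [u32; 8] = [
        0xc1059ed8, 0x367cd507, 0x3070dd17, 0xf70e5939,
        0xffc00b31, 0x68581511, 0x64f98fa7, 0xbefa4fa4,
    ];
    // Padding: 0x80, zeros up to 56 mod 64, then the bit length (big-endian).
    let mut data = msg.to_vec();
    data.push(0x80);
    while data.len() % 64 != 56 {
        data.push(0);
    }
    data.extend_from_slice(&((msg.len() as u64) * 8).to_be_bytes());
    for block in data.chunks_exact(64) {
        let mut w = [0u32; 64];
        for i in 0..16 {
            w[i] = u32::from_be_bytes(block[4 * i..4 * i + 4].try_into().unwrap());
        }
        for i in 16..64 {
            let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
            let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
        }
        let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut hh] = h;
        for i in 0..64 {
            let t1 = hh
                .wrapping_add(e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25))
                .wrapping_add((e & f) ^ (!e & g))
                .wrapping_add(K[i])
                .wrapping_add(w[i]);
            let t2 = (a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22))
                .wrapping_add((a & b) ^ (a & c) ^ (b & c));
            hh = g; g = f; f = e; e = d.wrapping_add(t1);
            d = c; c = b; b = a; a = t1.wrapping_add(t2);
        }
        let v = [a, b, c, d, e, f, g, hh];
        for i in 0..8 {
            h[i] = h[i].wrapping_add(v[i]);
        }
    }
    let mut out = [0u8; 28];
    for i in 0..7 {
        out[4 * i..4 * i + 4].copy_from_slice(&h[i].to_be_bytes());
    }
    out
}

/// CRC-32 (IEEE, reflected polynomial 0xEDB88320), computed bit by bit.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Account identifier: CRC-32(h) big-endian || h, where
/// h = SHA-224("\x0Aaccount-id" || principal || subaccount).
fn account_id(principal: &[u8], subaccount: &[u8; 32]) -> [u8; 32] {
    let mut input = b"\x0Aaccount-id".to_vec();
    input.extend_from_slice(principal);
    input.extend_from_slice(subaccount);
    let h = sha224(&input);
    let mut out = [0u8; 32];
    out[..4].copy_from_slice(&crc32(&h).to_be_bytes());
    out[4..].copy_from_slice(&h);
    out
}

fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

fn main() {
    // 0x04 is the anonymous principal's byte encoding; default (all-zero) subaccount.
    println!("{}", hex(&account_id(&[0x04], &[0u8; 32])));
}
```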

SHA-2

| | binary_size | SHA-256 | SHA-512 | account_id | neuron_id |
|---|---|---|---|---|---|
| Motoko | 173_034 | 247_480_401 | 228_033_044 | 30_017 | 20_760 |
| Rust | 498_225 | 82_511_960 | 56_526_000 | 42_479 | 44_437 |

Certified map

| | binary_size | generate 10k | max mem | inc | witness | upgrade |
|---|---|---|---|---|---|---|
| Motoko | 206_295 | 4_390_018_572 | 3_429_984 | 519_711 | 327_767 | 225_144_790 |
| Rust | 521_976 | 6_202_432_827 | 2_228_224 | 983_997 | 288_528 | 5_811_201_292 |

Sample Dapps

Measure the performance of some typical dapps:

  • Basic DAO,
    with heartbeat disabled to make profiling easier. We have a separate benchmark to measure heartbeat performance.
  • DIP721 NFT

Note

  • The cost difference is mainly due to the Candid serialization cost.
  • Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to dynamically deserialize data based on the data on the wire.
  • We could improve the performance on the Rust side by using parser combinators. But it is a challenge to maintain the ergonomics provided by serde.
  • For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.
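
The contrast between static specialization and dynamic, wire-driven decoding can be illustrated with a toy format. The format below is hypothetical and much simpler than Candid. Neither function reproduces Motoko's or serde's actual code; they only show the difference in shape. decode_dynamic interprets whatever type tags appear on the wire into an intermediate Value list, then matches that list against the expected record. decode_specialized is written for exactly one record type and reads its fields at fixed offsets.

```rust
use std::convert::TryInto;

/// A toy record. Wire format (hypothetical, NOT Candid): each field is a
/// one-byte type tag followed by its payload.
/// Tag 1 = u64 (8 bytes LE); tag 2 = text (u32 LE length, then UTF-8 bytes).
#[derive(Debug, PartialEq)]
struct Token {
    id: u64,
    name: String,
}

/// Generic, self-describing value built while reading the wire.
enum Value {
    Nat64(u64),
    Text(String),
}

fn encode(t: &Token) -> Vec<u8> {
    let mut out = vec![1u8];
    out.extend_from_slice(&t.id.to_le_bytes());
    out.push(2);
    out.extend_from_slice(&(t.name.len() as u32).to_le_bytes());
    out.extend_from_slice(t.name.as_bytes());
    out
}

/// Dynamic path: interpret whatever tags appear on the wire into Values,
/// then check that the result has the expected shape.
fn decode_dynamic(mut buf: &[u8]) -> Option<Token> {
    let mut values = Vec::new();
    while let Some((&tag, rest)) = buf.split_first() {
        let (value, rest) = match tag {
            1 => {
                let n = u64::from_le_bytes(rest.get(..8)?.try_into().ok()?);
                (Value::Nat64(n), &rest[8..])
            }
            2 => {
                let len = u32::from_le_bytes(rest.get(..4)?.try_into().ok()?) as usize;
                let s = std::str::from_utf8(rest.get(4..4 + len)?).ok()?;
                (Value::Text(s.to_string()), &rest[4 + len..])
            }
            _ => return None,
        };
        values.push(value);
        buf = rest;
    }
    match values.as_slice() {
        [Value::Nat64(id), Value::Text(name)] => Some(Token { id: *id, name: name.clone() }),
        _ => None,
    }
}

/// Specialized path: written for exactly this record, so it reads the fields
/// at fixed offsets without building intermediate values.
fn decode_specialized(buf: &[u8]) -> Option<Token> {
    if buf.first() != Some(&1) || buf.get(9) != Some(&2) {
        return None;
    }
    let id = u64::from_le_bytes(buf.get(1..9)?.try_into().ok()?);
    let len = u32::from_le_bytes(buf.get(10..14)?.try_into().ok()?) as usize;
    if buf.len() != 14 + len {
        return None;
    }
    let name = std::str::from_utf8(&buf[14..]).ok()?.to_string();
    Some(Token { id, name })
}

fn main() {
    let wire = encode(&Token { id: 7, name: "nft".to_string() });
    assert_eq!(decode_specialized(&wire), decode_dynamic(&wire));
    println!("{:?}", decode_specialized(&wire));
}
```

Both decoders return the same record. The specialized one simply does less work per field.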

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal | upgrade |
|---|---|---|---|---|---|---|
| Motoko | 236_673 | 491_766 | 16_294 | 12_672 | 14_136 | 122_439 |
| Rust | 806_537 | 541_266 | 86_052 | 107_287 | 117_056 | 1_686_510 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token | upgrade |
|---|---|---|---|---|---|
| Motoko | 194_938 | 466_439 | 22_357 | 4_729 | 65_612 |
| Rust | 820_893 | 210_081 | 324_368 | 81_020 | 1_860_352 |

Heartbeat / Timer

Measure the cost of empty heartbeat and timer jobs.

  • setTimer measures both the setTimer(0) method and the execution of empty job.
  • It is not easy to reliably capture the above events in one flamegraph, as implementation details
    of the replica can affect how we measure this. Typically, a correct flamegraph contains both the setTimer and canister_global_timer functions. If they are not there, we may need to adjust the script.
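
The operations that setTimer(0) and cancelTimer exercise can be modeled with a small priority queue. Scheduling pushes a deadline, and the global timer callback later runs every job that is due. Timers, set_timer, cancel_timer and on_global_timer are hypothetical names for this sketch. It is neither the ic-cdk-timers API nor the Motoko runtime's implementation.

```rust
use std::cell::Cell;
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};
use std::rc::Rc;

/// Toy timer queue: (deadline, id) pairs in a min-heap plus a job table.
struct Timers {
    next_id: u64,
    queue: BinaryHeap<Reverse<(u64, u64)>>,
    jobs: HashMap<u64, Box<dyn FnMut()>>,
}

impl Timers {
    fn new() -> Self {
        Timers { next_id: 0, queue: BinaryHeap::new(), jobs: HashMap::new() }
    }

    /// Schedule `job` to run `delay` time units after `now`; returns a timer id.
    fn set_timer(&mut self, now: u64, delay: u64, job: Box<dyn FnMut()>) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.queue.push(Reverse((now + delay, id)));
        self.jobs.insert(id, job);
        id
    }

    /// Cancelling drops the job; its stale queue entry is skipped when it comes due.
    fn cancel_timer(&mut self, id: u64) {
        self.jobs.remove(&id);
    }

    /// Stand-in for the canister_global_timer callback: run every due job.
    /// Returns how many jobs actually ran.
    fn on_global_timer(&mut self, now: u64) -> usize {
        let mut ran = 0;
        while let Some(&Reverse((deadline, id))) = self.queue.peek() {
            if deadline > now {
                break;
            }
            self.queue.pop();
            if let Some(mut job) = self.jobs.remove(&id) {
                job();
                ran += 1;
            }
        }
        ran
    }
}

fn main() {
    let hits = Rc::new(Cell::new(0u32));
    let mut timers = Timers::new();

    // Like setTimer(0): a zero-delay job that only records that it ran.
    let h = hits.clone();
    timers.set_timer(0, 0, Box::new(move || h.set(h.get() + 1)));

    // A second timer that is cancelled before it fires.
    let h2 = hits.clone();
    let id = timers.set_timer(0, 0, Box::new(move || h2.set(h2.get() + 100)));
    timers.cancel_timer(id);

    let ran = timers.on_global_timer(0);
    println!("jobs run: {}, hits: {}", ran, hits.get());
}
```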

Heartbeat

| | binary_size | heartbeat |
|---|---|---|
| Motoko | 123_509 | 3_758 |
| Rust | 23_826 | 785 |

Timer

| | binary_size | setTimer | cancelTimer |
|---|---|---|---|
| Motoko | 129_780 | 15_227 | 1_684 |
| Rust | 441_467 | 43_465 | 7_594 |

chenyan-dfinity merged commit b300fec into main Oct 2, 2023
chenyan-dfinity deleted the stable branch October 2, 2023 15:52