Moc inline with new meter alloc o3 #86

crusso · 2023-09-17T02:45:19Z

base: new meter, no wasm-opt; compared against selective inlining including allocation + wasm-opt O3


github-actions · 2023-09-17T14:19:22Z

Note
Diffing the performance result against the published result from the main branch.
Unchanged benchmarks are omitted.

Map

	binary_size	generate 1m	max mem	batch_get 50	batch_put 50	batch_remove 50
hashmap	148_740 ($\textcolor{green}{-6.46\%}$)	8_341_274_314 ($\textcolor{green}{-12.17\%}$)	61_987_732	343_814 ($\textcolor{green}{-12.48\%}$)	6_578_732_491 ($\textcolor{green}{-9.90\%}$)	370_088 ($\textcolor{green}{-12.56\%}$)
triemap	152_665 ($\textcolor{green}{-5.58\%}$)	13_751_819_242 ($\textcolor{green}{-20.51\%}$)	74_216_052	253_688 ($\textcolor{green}{-26.69\%}$)	655_809 ($\textcolor{green}{-22.04\%}$)	646_250 ($\textcolor{green}{-21.67\%}$)
rbtree	156_021 ($\textcolor{green}{-3.70\%}$)	7_083_625_905 ($\textcolor{green}{-16.30\%}$)	57_995_940	113_092 ($\textcolor{green}{-28.71\%}$)	316_065 ($\textcolor{green}{-17.92\%}$)	325_269 ($\textcolor{green}{-23.89\%}$)
splay	148_292 ($\textcolor{green}{-5.78\%}$)	13_216_992_673 ($\textcolor{green}{-24.14\%}$)	53_995_876	626_740 ($\textcolor{green}{-25.49\%}$)	659_612 ($\textcolor{green}{-25.44\%}$)	919_398 ($\textcolor{green}{-25.54\%}$)
btree	212_512 ($\textcolor{green}{-0.63\%}$)	10_257_934_163 ($\textcolor{green}{-22.70\%}$)	31_103_892	352_463 ($\textcolor{green}{-23.58\%}$)	481_276 ($\textcolor{green}{-23.52\%}$)	532_764 ($\textcolor{green}{-24.60\%}$)
zhenya_hashmap	158_758 ($\textcolor{green}{-5.75\%}$)	3_095_567_422 ($\textcolor{green}{-20.14\%}$)	65_987_480	78_897 ($\textcolor{green}{-26.01\%}$)	90_670 ($\textcolor{green}{-30.94\%}$)	91_012 ($\textcolor{green}{-41.69\%}$)
btreemap_rs	446_267	1_797_752_179	13_762_560	74_544	126_136	92_839
imrc_hashmap_rs	446_166	2_571_892_333	122_454_016	38_956	179_095	115_561
hashmap_rs	439_346	447_664_894	36_536_320	22_228	27_664	25_290

Priority queue

	binary_size	heapify 1m	max mem	pop_min 50	put 50
heap	139_700 ($\textcolor{green}{-8.29\%}$)	5_684_588_471 ($\textcolor{green}{-22.09\%}$)	29_995_836	619_848 ($\textcolor{green}{-23.77\%}$)	228_031 ($\textcolor{green}{-23.37\%}$)
heap_rs	437_278	142_914_793	9_109_504	59_850	23_726

Growable array

	binary_size	generate 5k	max mem	batch_get 500	batch_put 500	batch_remove 500
buffer	150_916 ($\textcolor{green}{-6.51\%}$)	2_560_913 ($\textcolor{green}{-21.51\%}$)	65_508	94_485 ($\textcolor{green}{-24.61\%}$)	799_435 ($\textcolor{green}{-23.32\%}$)	169_485 ($\textcolor{green}{-19.42\%}$)
vector	152_122 ($\textcolor{green}{-5.38\%}$)	2_152_008 ($\textcolor{green}{-22.18\%}$)	24_764	149_645 ($\textcolor{green}{-24.24\%}$)	206_427 ($\textcolor{green}{-21.90\%}$)	196_488 ($\textcolor{green}{-26.47\%}$)
vec_rs	435_834	290_143	655_360	17_605	31_014	25_400

Statistics

binary_size: -5.34% [-6.67%, -4.02%]
max_mem: no change
cycles: -22.61% [-24.25%, -20.98%]
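The Statistics lines summarize the per-benchmark changes. As a sanity check, the mean of the 35 cycle changes listed in the Map, Priority queue and Growable array tables above reproduces the reported -22.61%. The sketch below computes that mean. The report does not state how the bracketed interval is computed, so the normal-approximation 95% interval here is an assumption and need not match the reported bounds.

```python
import math
import statistics

# Per-benchmark cycle changes (%) copied from the Map, Priority queue and
# Growable array tables above (binary_size and max mem columns excluded).
CYCLE_DELTAS = [
    -12.17, -12.48, -9.90, -12.56,   # hashmap
    -20.51, -26.69, -22.04, -21.67,  # triemap
    -16.30, -28.71, -17.92, -23.89,  # rbtree
    -24.14, -25.49, -25.44, -25.54,  # splay
    -22.70, -23.58, -23.52, -24.60,  # btree
    -20.14, -26.01, -30.94, -41.69,  # zhenya_hashmap
    -22.09, -23.77, -23.37,          # heap
    -21.51, -24.61, -23.32, -19.42,  # buffer
    -22.18, -24.24, -21.90, -26.47,  # vector
]


def summarize(deltas, confidence=0.95):
    """Return the mean change and an (assumed) normal-approximation interval."""
    mean = statistics.fmean(deltas)
    std_err = statistics.stdev(deltas) / math.sqrt(len(deltas))
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, (mean - z * std_err, mean + z * std_err)


mean, (low, high) = summarize(CYCLE_DELTAS)
print(f"cycles: {mean:.2f}% [{low:.2f}%, {high:.2f}%]")
```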

SHA-2

	binary_size	SHA-256	SHA-512	account_id	neuron_id
Motoko	186_405 ($\textcolor{green}{-4.98\%}$)	271_004_746 ($\textcolor{green}{-23.19\%}$)	257_809_157 ($\textcolor{green}{-23.97\%}$)	34_239 ($\textcolor{green}{-23.60\%}$)	24_795 ($\textcolor{green}{-22.31\%}$)
Rust	528_234	82_789_387	56_794_263	50_651	53_532

Certified map

	binary_size	generate 10k	max mem	inc	witness
Motoko	202_262 ($\textcolor{green}{-1.37\%}$)	4_739_032_190 ($\textcolor{green}{-24.21\%}$)	3_429_924	562_235 ($\textcolor{green}{-24.27\%}$)	400_298 ($\textcolor{green}{-20.96\%}$)
Rust	469_955	6_359_442_714	1_081_344	1_012_174	305_119

Statistics

binary_size: -3.18% [-14.57%, 8.22%]
max_mem: no change
cycles: -23.22% [-24.10%, -22.33%]

Basic DAO

	binary_size	init	transfer_token	submit_proposal	vote_proposal
Motoko	257_901 ($\textcolor{green}{-7.03\%}$)	47_151 ($\textcolor{green}{-8.07\%}$)	22_255 ($\textcolor{green}{-11.74\%}$)	18_557 ($\textcolor{green}{-11.46\%}$)	19_609 ($\textcolor{green}{-13.00\%}$)
Rust	763_017	552_075	105_203	128_753	139_539

DIP721 NFT

	binary_size	init	mint_token	transfer_token
Motoko	204_979 ($\textcolor{green}{-10.89\%}$)	17_690 ($\textcolor{green}{-8.18\%}$)	29_708 ($\textcolor{green}{-8.02\%}$)	8_778 ($\textcolor{green}{-9.76\%}$)
Rust	828_238	146_257	380_260	93_763

Statistics

binary_size: -8.96% [-21.15%, 3.23%]
max_mem: no change
cycles: -10.03% [-11.54%, -8.53%]

Heartbeat

	binary_size	heartbeat
Motoko	127_807 ($\textcolor{green}{-10.06\%}$)	19_236 ($\textcolor{green}{-4.40\%}$)
Rust	25_650	549 ($\textcolor{green}{-53.44\%}$)

Timer

	binary_size	setTimer	cancelTimer
Motoko	136_166 ($\textcolor{green}{-8.76\%}$)	51_395 ($\textcolor{green}{-5.71\%}$)	4_605 ($\textcolor{green}{-7.49\%}$)
Rust	470_693	69_727	11_405

Statistics

binary_size: -8.76%
max_mem: no change
cycles: -6.60% [-12.23%, -0.97%]

Garbage Collection

Note
Same as main branch, skipping.

Actor class

	binary size	put new bucket	put existing bucket	get
Map	282_051 ($\textcolor{green}{-5.26\%}$)	770_378 ($\textcolor{green}{-1.69\%}$)	16_464 ($\textcolor{green}{-3.41\%}$)	16_608 ($\textcolor{green}{-5.26\%}$)

Statistics

binary_size: no change
max_mem: no change
cycles: -3.91% [-5.93%, -1.89%]

Publisher & Subscriber

	pub_binary_size	sub_binary_size	subscribe_caller	subscribe_callee	publish_caller	publish_callee
Motoko	151_854 ($\textcolor{green}{-8.95\%}$)	136_083 ($\textcolor{green}{-10.41\%}$)	28_563 ($\textcolor{green}{-4.65\%}$)	11_918 ($\textcolor{green}{-5.08\%}$)	22_822 ($\textcolor{green}{-5.14\%}$)	6_412 ($\textcolor{green}{-6.69\%}$)
Rust	511_870	565_407	71_728	44_318	95_767	53_941

Statistics

binary_size: -9.68% [-14.28%, -5.08%]
max_mem: no change
cycles: -5.39% [-6.45%, -4.34%]

Overall Statistics

binary_size: -6.28% [-7.53%, -5.03%]
max_mem: no change
cycles: -18.21% [-20.07%, -16.36%]

github-actions · 2023-09-17T14:19:24Z

Note
The flamegraph link only works after you merge.
Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust.
The library names with _rs suffix are written in Rust; the rest are written in Motoko.

We use the same random number generator with a fixed seed to ensure that all collections contain
the same elements and that the queries are exactly the same. Below we explain what each column in the table measures:

generate 1m. Insert 1m Nat64 integers into the collection. For Motoko collections, it usually triggers the GC; the remaining columns are unlikely to trigger GC.
max mem. For Motoko, it reports rts_max_heap_size after the generate call; for Rust, it reports the Wasm's memory page * 32Kb.
batch_get 50. Find 50 elements in the collection.
batch_put 50. Insert 50 elements into the collection.
batch_remove 50. Remove 50 elements from the collection.

💎 Takeaways

The platform only charges for instruction count, so data structures that exploit caching and locality gain no cost advantage.
There is a limit on the maximum cycles per round. This means asymptotic behavior doesn't matter much; we care more about performance up to a fixed N. In extreme cases, an $O(10000 n\log n)$ algorithm may hit the limit while an $O(n^2)$ algorithm runs just fine.
Amortized algorithms/GC may need to be more eager to avoid hitting the cycle limit in a particular round.
Rust costs more cycles to process complicated Candid data, but it is more efficient at core computations.
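To make the fixed-N point concrete, the following sketch finds where an $n^2$ cost overtakes a $10000\, n \log_2 n$ cost. Taking the logarithm base 2 and ignoring lower-order terms are assumptions of this sketch, and the actual per-round limit is not modeled.

```python
import math


def quad_cost(n: int) -> float:
    """Instruction count of a hypothetical O(n^2) algorithm (constant factor 1)."""
    return n * n


def nlogn_cost(n: int, c: float = 10_000) -> float:
    """Instruction count of a hypothetical O(c * n log2 n) algorithm."""
    return c * n * math.log2(n)


def crossover(c: float = 10_000, hi: int = 10**9) -> int:
    """Smallest n >= 2 at which the quadratic algorithm is no cheaper.

    Binary search is valid because n^2 < c*n*log2(n) holds for every n in
    [2, crossover) and fails from the crossover onward.
    """
    lo = 2
    while lo < hi:
        mid = (lo + hi) // 2
        if quad_cost(mid) >= nlogn_cost(mid, c):
            hi = mid
        else:
            lo = mid + 1
    return lo


print(crossover())
```

Under these assumptions the crossover is at roughly n ≈ 174,000, so for any collection smaller than that the quadratic algorithm executes fewer instructions.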

Note

The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.

Due to the instrumentation overhead and cycle limit, we cannot profile computations with large collections. Hopefully, when deterministic time slicing is ready, we can measure performance with a larger memory footprint.

hashmap uses an amortized data structure. When its capacity is reached, it has to copy the whole array, so batch_put 50 costs much more than it does for the other data structures.
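A minimal sketch of why a single put can be expensive in an amortized structure: the toy table below doubles its capacity when full and counts the element copies each put causes. `GrowableTable` is hypothetical and much simpler than Motoko's HashMap; it only models the resize copy described above.

```python
class GrowableTable:
    """Toy array-backed table that doubles its capacity when full.

    Hypothetical model, not Motoko's HashMap: it only counts how many
    existing elements must be copied into the new array on each resize.
    """

    def __init__(self, capacity: int = 8) -> None:
        self.capacity = capacity
        self.items: list = []
        self.copies = 0  # total element copies caused by resizing

    def put(self, item) -> int:
        """Insert `item` and return how many elements this call copied."""
        copied = 0
        if len(self.items) == self.capacity:
            # Full: allocate an array twice as large and copy everything over.
            self.capacity *= 2
            copied = len(self.items)
            self.copies += copied
        self.items.append(item)
        return copied


table = GrowableTable(capacity=8)
print([table.put(i) for i in range(9)])
```

Most puts copy nothing, but the put that finds the table full copies every existing element. If a batch of 50 puts happens to cross the capacity threshold after 1m inserts, that one call dominates the batch, even though the total copies over n puts stay below 2n.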

btree comes from mops.one/stableheapbtreemap.

zhenya_hashmap comes from mops.one/map.

vector comes from mops.one/vector. Compared with buffer, put has better worst-case time and space complexity ($O(\sqrt{n})$ vs $O(n)$); get has a slightly larger constant overhead.

hashmap_rs uses the fxhash crate, which is the same as std::collections::HashMap, but with a deterministic hasher. This ensures reproducible results.

imrc_hashmap_rs uses the im-rc crate, which provides an immutable hashmap in Rust.

Map

	binary_size	generate 1m	max mem	batch_get 50	batch_put 50	batch_remove 50
hashmap	148_740	8_341_274_314	61_987_732	343_814	6_578_732_491	370_088
triemap	152_665	13_751_819_242	74_216_052	253_688	655_809	646_250
rbtree	156_021	7_083_625_905	57_995_940	113_092	316_065	325_269
splay	148_292	13_216_992_673	53_995_876	626_740	659_612	919_398
btree	212_512	10_257_934_163	31_103_892	352_463	481_276	532_764
zhenya_hashmap	158_758	3_095_567_422	65_987_480	78_897	90_670	91_012
btreemap_rs	446_267	1_797_752_179	13_762_560	74_544	126_136	92_839
imrc_hashmap_rs	446_166	2_571_892_333	122_454_016	38_956	179_095	115_561
hashmap_rs	439_346	447_664_894	36_536_320	22_228	27_664	25_290

Priority queue

	binary_size	heapify 1m	max mem	pop_min 50	put 50
heap	139_700	5_684_588_471	29_995_836	619_848	228_031
heap_rs	437_278	142_914_793	9_109_504	59_850	23_726

Growable array

	binary_size	generate 5k	max mem	batch_get 500	batch_put 500	batch_remove 500
buffer	150_916	2_560_913	65_508	94_485	799_435	169_485
vector	152_122	2_152_008	24_764	149_645	206_427	196_488
vec_rs	435_834	290_143	655_360	17_605	31_014	25_400

Cryptographic libraries

Measure different cryptographic libraries written in both Motoko and Rust.

SHA-2 benchmarks
- SHA-256/SHA-512. Compute the hash of a 1M Wasm binary.
- account_id. Compute the ledger account id from a principal, based on SHA-224.
- neuron_id. Compute the NNS neuron id from a principal, based on SHA-256.
Certified map. A Merkle tree for storing key-value pairs and generating witnesses according to the IC Interface Specification.
- generate 10k. Insert 10k 7-character words as both keys and values into the certified map.
- max mem. For Motoko, it reports rts_max_heap_size after the generate call; for Rust, it reports the Wasm's memory page * 32Kb.
- inc. Increment a counter and insert the counter value into the map.
- witness. Generate the root hash and a witness for the counter.

SHA-2

	binary_size	SHA-256	SHA-512	account_id	neuron_id
Motoko	186_405	271_004_746	257_809_157	34_239	24_795
Rust	528_234	82_789_387	56_794_263	50_651	53_532
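The account_id benchmark hashes a principal with SHA-224. As a hedged sketch of the derivation commonly documented for the ICP ledger (not the benchmark's own code), the identifier is a big-endian CRC32 checksum followed by SHA-224 over the domain separator `\x0Aaccount-id`, the principal bytes, and a 32-byte subaccount (all zeros by default). The function name `account_identifier` is hypothetical.

```python
import hashlib
import zlib


def account_identifier(principal: bytes, subaccount: bytes = bytes(32)) -> bytes:
    """Return a 32-byte account id: CRC32(hash) || SHA-224(domain || principal || subaccount)."""
    if len(subaccount) != 32:
        raise ValueError("subaccount must be exactly 32 bytes")
    # b"\x0a" is the length (10) of the ASCII tag "account-id".
    digest = hashlib.sha224(b"\x0aaccount-id" + principal + subaccount).digest()
    checksum = zlib.crc32(digest).to_bytes(4, "big")
    return checksum + digest


# b"\x04" is the byte encoding of the anonymous principal.
print(account_identifier(b"\x04").hex())
```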

Certified map

	binary_size	generate 10k	max mem	inc	witness
Motoko	202_262	4_739_032_190	3_429_924	562_235	400_298
Rust	469_955	6_359_442_714	1_081_344	1_012_174	305_119
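Certified map witnesses rely on the hash-tree reconstruction function defined in the IC Interface Specification. The sketch below implements that function and shows that pruning a sibling subtree down to its hash leaves the root hash unchanged. This property is what lets a small witness be checked against the certified root. It is a sketch under that assumption, not the benchmark's certified map: key ordering, tree balancing and the certificate itself are omitted.

```python
import hashlib
from dataclasses import dataclass
from typing import Union


def domain_sep(tag: str) -> bytes:
    """Length-prefixed domain separator: one length byte followed by the tag."""
    raw = tag.encode()
    return bytes([len(raw)]) + raw


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


@dataclass
class Empty:
    pass


@dataclass
class Fork:
    left: "HashTree"
    right: "HashTree"


@dataclass
class Labeled:
    label: bytes
    subtree: "HashTree"


@dataclass
class Leaf:
    value: bytes


@dataclass
class Pruned:
    digest: bytes  # hash of a subtree the witness does not reveal


HashTree = Union[Empty, Fork, Labeled, Leaf, Pruned]


def reconstruct(t: HashTree) -> bytes:
    """Root hash of a (possibly pruned) hash tree."""
    if isinstance(t, Empty):
        return h(domain_sep("ic-hashtree-empty"))
    if isinstance(t, Fork):
        return h(domain_sep("ic-hashtree-fork") + reconstruct(t.left) + reconstruct(t.right))
    if isinstance(t, Labeled):
        return h(domain_sep("ic-hashtree-labeled") + t.label + reconstruct(t.subtree))
    if isinstance(t, Leaf):
        return h(domain_sep("ic-hashtree-leaf") + t.value)
    if isinstance(t, Pruned):
        return t.digest
    raise TypeError(f"not a hash tree: {t!r}")
```

For example, a witness for a `counter` entry can replace every other branch with `Pruned(reconstruct(branch))` and still reconstruct to the same root hash.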

Sample Dapps

Measure the performance of some typical dapps:

- Basic DAO, with heartbeat disabled to make profiling easier. We have a separate benchmark to measure heartbeat performance.
- DIP721 NFT

Note

The cost difference is mainly due to the Candid serialization cost.

Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to dynamically deserialize data based on the data on the wire.

We could improve the performance on the Rust side by using parser combinators. But it is a challenge to maintain the ergonomics provided by serde.

For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.
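To illustrate the difference between a decoder specialized ahead of time and one driven by a type description at run time, here is a toy sketch. It only decodes unsigned LEB128 `nat` values (the encoding Candid uses for `nat`) and ignores the Candid header, type table and field ids, so it is not a Candid implementation. `decode_generic` and `decode_nat_pair` are hypothetical names.

```python
def read_uleb128(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one unsigned LEB128 integer; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:  # high bit clear: last byte of this integer
            return result, pos
        shift += 7


def decode_generic(ty, buf: bytes, pos: int = 0):
    """Type-directed decoder: inspects a type description for every value."""
    if ty == "nat":
        return read_uleb128(buf, pos)
    if isinstance(ty, tuple):  # toy record: field types in wire order
        values = []
        for field_ty in ty:
            value, pos = decode_generic(field_ty, buf, pos)
            values.append(value)
        return tuple(values), pos
    raise ValueError(f"unsupported type {ty!r}")


def decode_nat_pair(buf: bytes, pos: int = 0):
    """Decoder specialized for one fixed signature (nat, nat): no type dispatch."""
    a, pos = read_uleb128(buf, pos)
    b, pos = read_uleb128(buf, pos)
    return (a, b), pos
```

Both decoders return the same value; the specialized one simply skips the per-value type dispatch, which is the kind of work a per-method specialized decoder avoids.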

Basic DAO

	binary_size	init	transfer_token	submit_proposal	vote_proposal
Motoko	257_901	47_151	22_255	18_557	19_609
Rust	763_017	552_075	105_203	128_753	139_539

DIP721 NFT

	binary_size	init	mint_token	transfer_token
Motoko	204_979	17_690	29_708	8_778
Rust	828_238	146_257	380_260	93_763

Heartbeat / Timer

Measure the cost of an empty heartbeat and an empty timer job.

setTimer measures both the setTimer(0) method and the execution of the empty job.
It is not easy to reliably capture both events in one flamegraph, as implementation details of the replica can affect the measurement. Typically, a correct flamegraph contains both the setTimer and canister_global_timer functions. If they are missing, we may need to adjust the script.

Heartbeat

	binary_size	heartbeat
Motoko	127_807	19_236
Rust	25_650	549

Timer

	binary_size	setTimer	cancelTimer
Motoko	136_166	51_395	4_605
Rust	470_693	69_727	11_405

Motoko Specific Benchmarks

Measure various features only available in Motoko.

Garbage Collection. Measure Motoko garbage collection cost using the Triemap benchmark. The max mem column reports rts_max_heap_size after the generate call. The cycle cost numbers reported here are garbage collection costs only. Some flamegraphs are truncated due to the 2M log size limit. The dfx/ic-wasm optimizer is disabled for the garbage collection test cases because the optimizer affects function names, making profiling trickier.
- default. Compile with the default GC option. With the current GC scheduler, generate will trigger the copying GC. The rest of the methods will not trigger GC.
- copying. Compile with --force-gc --copying-gc.
- compacting. Compile with --force-gc --compacting-gc.
- generational. Compile with --force-gc --generational-gc.
- incremental. Compile with --force-gc --incremental-gc.
Actor class. Measure the cost of spawning an actor class, using the Actor classes example.

Garbage Collection

	generate 800k	max mem	batch_get 50	batch_put 50	batch_remove 50
default	1_338_231_405	59_396_776	118	118	118
copying	1_338_231_287	59_396_776	1_337_913_569	1_338_002_371	1_337_919_144
compacting	1_911_420_608	59_396_776	1_473_824_186	1_756_485_066	1_787_369_954
generational	2_891_818_643	59_405_240	1_141_865_993	1_217_376	1_117_840
incremental	33_436_719	1_136_155_048	333_734_166	336_829_512	336_860_690

Actor class

	binary size	put new bucket	put existing bucket	get
Map	282_051	770_378	16_464	16_608

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

	pub_binary_size	sub_binary_size	subscribe_caller	subscribe_callee	publish_caller	publish_callee
Motoko	151_854	136_083	28_563	11_918	22_822	6_412
Rust	511_870	565_407	71_728	44_318	95_767	53_941


@chenyan-dfinity

To mitigate the cycle performance regression of the new cost model, selectively inline `share_code` helpers in the backend using an additional argument `Never | Always` (i.e. never inline vs. always inline). Also, add compiler flags to explicitly opt in to or disable the inlining optimization.

NB: some recursive share_code helpers cannot be unshared/inlined (e.g. recursive serialization code, and code that explicitly returns rather than returning via normal control flow).

Similar to #4207, but also inlines all heap object allocation and adds compiler flags to enable (default) or disable the optimization. Note that users may want to disable the optimization if they can't accept the increase in code size.

Profiling data

new metering, sans wasm-opt: dfinity/canister-profiling#85

Overall Statistics

binary_size: 10.68% [8.58%, 12.79%]
max_mem: no change
cycles: -8.32% [-9.67%, -6.97%]

new metering with wasm-opt O3: dfinity/canister-profiling#86

Overall Statistics

binary_size: -6.28% [-7.53%, -5.03%]
max_mem: no change
cycles: -18.21% [-20.07%, -16.36%]

new metering, master (no inlining) and wasm-opt O3: dfinity/canister-profiling#83

Overall Statistics

binary_size: -13.96% [-14.64%, -13.28%]
max_mem: no change
cycles: -12.46% [-13.51%, -11.41%]

(UPDATE: revised stats after @chenyan-dfinity updates to PRs)
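Under a per-instruction metering model, the trade-off behind the `Never | Always` switch can be seen with a toy cost model. This is a minimal sketch: `Inline`, `cost` and the framing overhead of 2 units are hypothetical and reflect neither moc's actual backend (written in OCaml) nor the replica's real cost accounting.

```python
from enum import Enum


class Inline(Enum):
    NEVER = "never"    # emit the helper once and call it from every use site
    ALWAYS = "always"  # copy the helper body into every use site


FRAMING = 2  # hypothetical per-function overhead (header/end) in code-size units


def cost(num_uses: int, body_len: int, mode: Inline) -> tuple[int, int]:
    """Return (static code size, executed instructions) under a toy model.

    Assumptions: each call site is one `call` instruction, returning is free,
    and every executed instruction is metered equally.
    """
    if mode is Inline.NEVER:
        size = body_len + FRAMING + num_uses    # one shared copy + one call per use
        executed = num_uses * (1 + body_len)    # each use runs a call plus the body
    else:
        size = num_uses * body_len              # body duplicated at every use
        executed = num_uses * body_len          # no call overhead
    return size, executed


for mode in Inline:
    print(mode.value, cost(num_uses=10, body_len=3, mode=mode))
```

With 10 uses of a 3-instruction helper, `Inline.ALWAYS` grows the toy code size from 15 to 30 units while cutting executed instructions from 40 to 30. This is the same direction as the sans-wasm-opt numbers: larger binaries, fewer cycles.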

chenyan-dfinity and others added 11 commits September 14, 2023 12:42

test moc inlining

35799ac

fix

f5888c0

Merge remote-tracking branch 'origin/no-wasm-opt' into moc-inline

e4d654b

fix

0bd8592

fix

82a51c5

Merge remote-tracking branch 'origin/no-wasm-opt' into moc-inline

2432a63

Merge remote-tracking branch 'origin/no-wasm-opt' into moc-inline

587b572

moc inlining with new metering baseline

f11b357

Merge remote-tracking branch 'origin/no-wasm-opt' into moc-inline-with-new-meter

8feb431

use artificat with allocation inlining

5fd2146

use ic-wasm 0.5.0; optimize to O3

19cc36b

crusso added the build_base label Sep 17, 2023 (label description: "Build base instead of fetching from gh-pages. Note that the build tool runs in the same version")

crusso changed the title ~~Moc inline with new meter o3~~ Moc inline with new meter alloc o3 Sep 17, 2023

install ic-wasm

cb3f154

crusso mentioned this pull request Sep 17, 2023

feat: selectively inline share_code helpers including allocation dfinity/motoko#4212

Merged

chenyan-dfinity added 2 commits September 17, 2023 15:44

Merge remote-tracking branch 'origin/no-wasm-opt' into moc-inline-with-new-meter-O3

b25fbd5

fix

59c1d5a

chenyan-dfinity closed this Sep 28, 2023

chenyan-dfinity deleted the moc-inline-with-new-meter-O3 branch November 27, 2023 20:08
