Update database benchmarks to take trie cache into account #6131

Open
athei opened this issue Oct 18, 2024 · 9 comments
@athei
Member

athei commented Oct 18, 2024

As of right now, the weight for database access is hard-coded to 25µs for a read and 100µs for a write. Those numbers are way too high given that we have a trie cache now.
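
For reference, this is roughly what those hard-coded defaults look like (the values mirror frame_support's RocksDbWeight; the constant name here is only illustrative):

```rust
// Illustrative only: the current hard-coded defaults, as in frame_support's
// RocksDbWeight. The constant name is made up for this sketch.
use frame_support::{
    parameter_types,
    weights::{constants::WEIGHT_REF_TIME_PER_NANOS, RuntimeDbWeight},
};

parameter_types! {
    pub const HardCodedDbWeight: RuntimeDbWeight = RuntimeDbWeight {
        read: 25_000 * WEIGHT_REF_TIME_PER_NANOS,   // ~25 µs per storage read
        write: 100_000 * WEIGHT_REF_TIME_PER_NANOS, // ~100 µs per storage write
    };
}
```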

We should write a benchmark that measures storage access with a warm cache, meaning that every access in the benchmark hits the cache. This is okay to assume because we will require the cache to be larger than the state size.
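
A minimal sketch of the kind of warm-cache measurement meant here (illustrative only, not the actual `benchmark storage` implementation):

```rust
// Illustrative sketch of a warm-cache read benchmark: touch every key once so
// it is cached, then time a second pass over the same keys.
use std::time::Instant;

fn bench_warm_reads(keys: &[Vec<u8>], read: impl Fn(&[u8]) -> Option<Vec<u8>>) -> Vec<u128> {
    // Warm-up pass: after this, every access should hit the cache.
    for key in keys {
        let _ = read(key.as_slice());
    }
    // Measured pass: record per-key read latency in nanoseconds.
    keys.iter()
        .map(|key| {
            let start = Instant::now();
            let _ = read(key.as_slice());
            start.elapsed().as_nanos()
        })
        .collect()
}
```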

The results of that benchmark can be used for chains where we require collators to set a large --trie-cache-size. For example, AssetHub.

@ggwpez Do we have a storage access benchmark already in the codebase somewhere?

We also need to add code that reads the whole database on node startup. This makes sure that all the state is in RAM.

@ggwpez
Member

ggwpez commented Oct 21, 2024

Sorry, I did not see the ping. The code is here:

info!("Reading {} keys", keys.len());
for key in keys.as_slice() {
    // ... loop body elided in this excerpt; see the linked source for the full read loop ...
}

You can use it from the polkadot or polkadot-omni-node binary:
polkadot benchmark storage --state-version 1 -d ~/.local/share/polkadot
The same works with polkadot-omni-node, but then it also needs a --chain spec.

@athei athei moved this to Minimal Feature Launch in Smart Contracts Dec 4, 2024
@ggwpez
Member

ggwpez commented Dec 9, 2024

Do you mostly need this for Polkadot+Kusama AssetHub or something else as well?
We would need to do it directly in the Runtimes repo.

@athei
Member Author

athei commented Dec 9, 2024

Yes, it is needed only for AssetHub. But it would be good to also have those weights in polkadot-sdk so we can test on Westend AssetHub. I would assume we add some InMemoryDBWeights that you are only supposed to use when you launch all your nodes with --in-ram-db=xxGB.

This flag would be the same as --trie-cache-size but does this in addition (a rough sketch follows the list):

  1. Read the whole state on node startup so that it is in cache.
  2. Error out if the state doesn't fit in RAM plus some safety margin.
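
A hypothetical sketch of that startup behaviour; none of these names are existing polkadot-sdk APIs, and `read_key` stands in for whatever state backend access the node would use:

```rust
// Hypothetical sketch of the proposed --in-ram-db startup behaviour.
fn warm_up_or_abort(
    keys: impl IntoIterator<Item = Vec<u8>>,
    read_key: impl Fn(&[u8]),
    state_size_bytes: u64,
    in_ram_db_bytes: u64,
) -> Result<(), String> {
    // 2. Error out if the state does not fit in RAM plus some safety margin.
    const SAFETY_MARGIN_BYTES: u64 = 1024 * 1024 * 1024; // assumed 1 GiB headroom
    if state_size_bytes.saturating_add(SAFETY_MARGIN_BYTES) > in_ram_db_bytes {
        return Err("state does not fit into the configured --in-ram-db size".into());
    }
    // 1. Read the whole state on startup so that every key ends up in the cache.
    for key in keys {
        read_key(&key);
    }
    Ok(())
}
```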

It would be awesome if the runtime could somehow advise the client which of those flags to set. But I guess this is not in the cards.

@athei athei assigned ggwpez and bkontur and unassigned ggwpez Dec 11, 2024
@bkontur
Contributor

bkontur commented Dec 17, 2024

I would break this issue into several parts:

  1. Set up a process for regularly regenerating/maintaining (RocksDB) hard-coded storage read/write weights

    As you pointed out, we are using this hard-coded, copied weight everywhere. Some runtimes even directly include these values from frame_support, for example, here.

    Essentially, we are consistently using read: 25_000 * constants::WEIGHT_REF_TIME_PER_NANOS and write: 100_000 * constants::WEIGHT_REF_TIME_PER_NANOS without measuring them, as stated here:

    The cost of storage operations in a Substrate chain depends on the current chain state. It is therefore important to regularly update these weights as the chain grows.

  • As a first step, I would set up and fix runtimes (e.g., polkadot-sdk, polkadot-fellows) and regenerate the rocksdb_weights.rs file as a starting point.
  • Additionally, I would establish a process: whenever we regenerate weights for benchmarks, we should also benchmark storage using a current snapshot of the relevant chain state.
    • for live chains, this probably needs to download a snapshot or just the state, similar to what try-runtime does: download the storage keys and then run benchmark storage
  • I will also create a tracking issue for polkadot-fellows to ensure runtimes are fixed.
  • I would advise parachain teams to do the same for their runtimes

POC:
I am playing here with AssetHubWestend and benchmark storage, but it reads just the genesis state with 12 keys (investigating). I have a database with a partially synced chain and finalized headers - still investigating...

./target/production/polkadot-parachain benchmark storage --state-version 1 --include-child-trees --chain cumulus/polkadot-parachain/chain-specs/asset-hub-westend.json --base-path /home/ubuntu/.local/share/polkadot-parachain/chains/asset-hub-westend/db/full
2024-12-17 15:47:19  creating SingleState txpool Limit { count: 8192, total_bytes: 20971520 }/Limit { count: 512, total_bytes: 1048576 }.    
2024-12-17 15:47:19 Warmup round 1/1    
2024-12-17 15:47:19 Preparing keys from block 0x67f9…98c9    
2024-12-17 15:47:19 Reading 12 keys    
2024-12-17 15:47:19 Reading 0 child keys

  2. Add storage benchmarking support to frame-omni-bencher
  • This also means updating benchmark bots to support storage benchmarking.
    This support may be necessary mainly for bot operations to avoid needing to build the entire polkadot or polkadot-parachain binary.

  3. Node (polkadot/polkadot-parachain/polkadot-omni-node) Integration

I would assume we add some InMemoryDBWeights that you are only supposed to use when you launch all your nodes with --in-ram-db=xxGB.

I’m not entirely sure I understand your point. If you mean setting InMemoryDBWeights instead of RocksDBWeights in the runtime (e.g., here), that approach won’t work because the node cannot control the runtime. In other words, a node’s --in-ram-db flag cannot toggle the runtime’s switch between InMemoryDBWeights and RocksDBWeights.

I believe there’s an alternative approach where the runtime determines the node’s behavior, rather than the other way around. For example, the runtime specifies type DbWeight = RocksDbWeight;, and we can read this configuration in the node. The node’s startup parameters, such as --trie-cache-size and/or --in-ram-db, could then be adjusted dynamically based on the runtime’s DbWeight values.
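
A minimal sketch, assuming a standard FRAME runtime, of why the weights are fixed at runtime-compile time: pallets read them through frame_system::Config::DbWeight, which no node-side CLI flag can change:

```rust
// Minimal sketch: pallets charge for storage access through the runtime's
// configured DbWeight, so the constants are baked into the runtime itself.
use frame_support::{traits::Get, weights::Weight};

fn db_access_weight<T: frame_system::Config>(reads: u64, writes: u64) -> Weight {
    T::DbWeight::get().reads_writes(reads, writes)
}
```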

Am I missing something here, or does this approach make sense?


@athei
Member Author

athei commented Dec 17, 2024

I’m not entirely sure I understand your point. If you mean setting InMemoryDBWeights instead of RocksDBWeights in the runtime (e.g., here), that approach won’t work because the node cannot control the runtime. In other words, a node’s --in-ram-db flag cannot toggle the runtime’s switch between InMemoryDBWeights and RocksDBWeights.

This is why I wrote you are only supposed to set this value when you made sure your nodes all start with said CLI switch. It is a convention, just like all the rest of the weights (you need to have a certain hardware spec).

I believe there’s an alternative approach where the runtime determines the node’s behavior, rather than the other way around. For example, the runtime specifies type DbWeight = RocksDbWeight;, and we can read this configuration in the node. The node’s startup parameters, such as --trie-cache-size and/or --in-ram-db, could then be adjusted dynamically based on the runtime’s DbWeight values.

Yes, this would be a good safeguard. But I wouldn't set it automatically. Rather, I would error out if the flag is not set, or emit a warning that needs to be overridden. It is still a heuristic.

Regarding your plan: I would expect to see "changing the benchmarks to measure in-memory access" somewhere. We also need to find out whether the trie cache is write-back or write-through. If it is write-through, the cache is probably too slow. Benchmarking will reveal that.

@bkontur
Contributor

bkontur commented Dec 18, 2024

4. InMemoryDBWeights - changing the benchmarks to measure in-memory access

  • run benchmark storage for AssetHubs (at first for Westend) to generate InMemoryDBWeights with --trie-cache-size and make sure all keys are warmed up in the cache (TrieCache/SharedTrieCache + print some stats to see TrieHitStats).
    • IIUC, InMemoryDBWeights should be more like InMemoryTrieWithRocksDBWeights, right?
    • force benchmark storage to use a snapshot or a synced DB and run it on a high/recent block, because by default the genesis state has just 12 keys
  • set type DbWeight = InMemoryTrieWithRocksDBWeights for AssetHubWestend runtime
  • now, regenerate all the weights for AssetHubWestend

5. Investigate whether the trie cache is write-back or write-through
We also need to find out whether the trie cache is write-back or write-through. If it is write-through, the cache is probably too slow. Benchmarking will reveal that.


Ok, let's say we release new polkadot-fellows runtimes for AssetHubPolkadot/AssetHubKusama/Westend with benchmarked InMemoryTrieWithRocksDBWeights (benchmark storage + benchmark pallets with a very high --trie-cache-size).

This is why I wrote you are only supposed to set this value when you made sure your nodes all start with said cli switch.

Could you please elaborate on this, specifically on "when you made sure your nodes all start with the said CLI switch" and "your nodes"? How would you achieve or ensure this for AssetHubPolkadot/AssetHubKusama? I mean, anyone can run a node/collator with a different node version and/or without the trie-cache-size option. Ok, we can add a good safeguard with a heuristic on node startup, but what about nodes that don't have this version? Or, I don't know, does this affect e.g. light clients, or just collators which build blocks?

@athei
Member Author

athei commented Dec 18, 2024

iiuc InMemoryDBWeights should be more like InMemoryTrieWithRocksDBWeights, right?

The on-disk format shouldn't be relevant, as it will not hit RocksDB synchronously (assuming a write-back cache). Maybe TrieCachedWeights is a better name.

force benchmark storage to use some snapshot or synced db and trigger this on some high/recent block, because by default genesis has just 12 keys

Yeah it is a good idea to inspect how the trie cache scales with size.

Could you please elaborate on this, specifically on "when you made sure your nodes all start with the said CLI switch" and "your nodes"? How would you achieve or ensure this for AssetHubPolkadot/AssetHubKusama? I mean, anyone can run a node/collator with a different node version and/or without the trie-cache-size option. Ok, we can add a good safeguard with a heuristic on node startup, but what about nodes that don't have this version?

How would you safeguard against somebody running their node on a slow computer? You can't. It is exactly the same thing here. We are essentially raising the hardware requirements. So we communicate this new requirement, and any safeguard we can add is nice, but not required. We then update the runtime after enough people have upgraded their nodes (but at least all collators).

Or I don't know, does this affect e.g. light-clients or just collators which build blocks?

Light clients and validator nodes are not affected because they are stateless. Collators and full nodes are affected.

@bkontur
Contributor

bkontur commented Jan 9, 2025

@athei I played a bit with benchmark storage for AssetHubWestend on a high block 10,384,607, which has 975,060 keys and 1,583 child keys, using different setups:

| Setup of trie cache | READ | WRITE |
| --- | --- | --- |
| no trie cache | 10_816 * constants::WEIGHT_REF_TIME_PER_NANOS | 40_633 * constants::WEIGHT_REF_TIME_PER_NANOS |
| --enable-trie-cache (default trie cache: 67108864 bytes) | 13_343 * constants::WEIGHT_REF_TIME_PER_NANOS | 40_843 * constants::WEIGHT_REF_TIME_PER_NANOS |
| --enable-trie-cache --trie-cache-size 1073741824 (1 GiB trie cache) | 8_887 * constants::WEIGHT_REF_TIME_PER_NANOS | 40_787 * constants::WEIGHT_REF_TIME_PER_NANOS |

Notes/Hints:

  • From the numbers, it seems that only read operations are affected by the trie-cache (likely as expected).
  • Basti changed the default trie-cache size to 1 GiB for nodes here: paritytech/polkadot-sdk#6546.
  • For benchmarking, the trie-cache is disabled by default. You can enable it with:
    --enable-trie-cache (default size is 64 MiB)
    --trie-cache-size 1073741824 (1 GiB)
    
  • I tried to add some cache stats to the logs. I can see state_cache: MemorySize(621345584) but no other metrics, e.g., state_reads_cache: 0 and state_writes_cache: 0:
    2025-01-07 20:34:45.140  INFO main frame_benchmarking_cli::storage::cmd: Cache usage:
    Some(UsageInfo { memory: MemoryInfo { state_cache: MemorySize(622322987), database_cache: MemorySize(0) }, io: IoInfo { transactions: 0, bytes_read: 0, bytes_written: 0, writes: 0, reads: 0, average_transaction_size: 0, state_reads: 976237, state_reads_cache: 0, state_writes: 0, state_writes_cache: 0, state_writes_nodes: 0 } })
    
    • Is there a bug here with state_reads_cache or state_writes_cache?
    • Should we also consider database_cache in this context?
  • I discovered that for my point 1., there is an existing issue: Run storage benchmarks before release polkadot-fellows/runtimes#491.
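
For illustration only: turning the 1 GiB trie-cache measurements from the table above into a weight constant (the name and exact values are placeholders; the real file would be produced by the regenerated weight template) would look roughly like this:

```rust
// Placeholder built from the 1 GiB trie-cache averages measured above.
use frame_support::{
    parameter_types,
    weights::{constants::WEIGHT_REF_TIME_PER_NANOS, RuntimeDbWeight},
};

parameter_types! {
    pub const InMemoryTrieWithRocksDBWeights: RuntimeDbWeight = RuntimeDbWeight {
        read: 8_887 * WEIGHT_REF_TIME_PER_NANOS,   // measured average read
        write: 40_787 * WEIGHT_REF_TIME_PER_NANOS, // measured average write
    };
}
```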

no trie cache

bkontur@toaster1:~/polkadot-sdk$ ./target/production/polkadot-parachain benchmark storage --state-version 1     --base-path /home/bkontur/.local/share/polkadot-parachain     --chain cumulus/polkadot-parachain/chain-specs/asset-hub-westend.json     --include-child-trees     --detailed-log-output

2025-01-06 23:29:28.475  INFO main txpool:  creating SingleState txpool Limit { count: 8192, total_bytes: 20971520 }/Limit { count: 512, total_bytes: 1048576 }.
2025-01-06 23:29:30.319  INFO main frame_benchmarking_cli::storage::cmd: Warmup round 1/1
2025-01-06 23:29:41.026  INFO main frame_benchmarking_cli::storage::read: Preparing keys from block 0x6b8a1b649c7876fc0fb4c1ef1a51f07ea97bf3322ee826977b5c23404e61a574
2025-01-06 23:29:42.565  INFO main frame_benchmarking_cli::storage::read: Reading 975060 keys
2025-01-06 23:29:53.298  INFO main frame_benchmarking_cli::storage::read: Reading 1583 child keys
2025-01-06 23:29:53.404  INFO main frame_benchmarking_cli::storage::cmd: Time summary [ns]:
Total: 10559479290
Min: 972, Max: 636946
Average: 10816, Median: 11007, Stddev: 1755.06
Percentiles 99th, 95th, 75th: 14101, 13110, 11968
Value size summary:
Total: 58047152
Min: 0, Max: 1693102
Average: 59, Median: 64, Stddev: 1899.09
Percentiles 99th, 95th, 75th: 80, 80, 80
2025-01-06 23:29:54.950  INFO main frame_benchmarking_cli::storage::cmd: Warmup round 1/1
2025-01-06 23:30:05.582  INFO main frame_benchmarking_cli::storage::write: Preparing keys from block 0x6b8a1b649c7876fc0fb4c1ef1a51f07ea97bf3322ee826977b5c23404e61a574
2025-01-06 23:30:07.629  INFO main frame_benchmarking_cli::storage::write: Writing 975060 keys
2025-01-06 23:31:49.859  INFO main frame_benchmarking_cli::storage::write: Writing 1583 child keys
2025-01-06 23:31:50.005  INFO main frame_benchmarking_cli::storage::cmd: Time summary [ns]:
Total: 39667485804
Min: 4597, Max: 10498130
Average: 40633, Median: 40101, Stddev: 32806.69
Percentiles 99th, 95th, 75th: 51758, 48543, 44127
Value size summary:
Total: 58047152
Min: 0, Max: 1693102
Average: 59, Median: 64, Stddev: 1899.09
Percentiles 99th, 95th, 75th: 80, 80, 80
2025-01-06 23:31:50.005  INFO main frame_benchmarking_cli::storage::cmd: Cache usage:
Some(UsageInfo { memory: MemoryInfo { state_cache: MemorySize(0), database_cache: MemorySize(0) }, io: IoInfo { transactions: 0, bytes_read: 0, bytes_written: 0, writes: 0, reads: 0, average_transaction_size: 0, state_reads: 1583, state_reads_cache: 0, state_writes: 0, state_writes_cache: 0, state_writes_nodes: 0 } })
2025-01-06 23:31:50.005  INFO main frame_benchmarking_cli::storage::template: Writing weights to "/home/bkontur/polkadot-sdk/rocksdb_weights.rs"

default trie cache: (67108864 Bytes)

bkontur@toaster1:~/polkadot-sdk$ ./target/production/polkadot-parachain benchmark storage --state-version 1 \
    --base-path /home/bkontur/.local/share/polkadot-parachain \
    --chain cumulus/polkadot-parachain/chain-specs/asset-hub-westend.json \
    --include-child-trees \
    --detailed-log-output \
    --enable-trie-cache
2025-01-06 23:35:00.615  INFO main txpool:  creating SingleState txpool Limit { count: 8192, total_bytes: 20971520 }/Limit { count: 512, total_bytes: 1048576 }.
2025-01-06 23:35:02.521  INFO main frame_benchmarking_cli::storage::cmd: Warmup round 1/1
2025-01-06 23:35:15.809  INFO main frame_benchmarking_cli::storage::read: Preparing keys from block 0x6b8a1b649c7876fc0fb4c1ef1a51f07ea97bf3322ee826977b5c23404e61a574
2025-01-06 23:35:17.403  INFO main frame_benchmarking_cli::storage::read: Reading 975060 keys
2025-01-06 23:35:30.643  INFO main frame_benchmarking_cli::storage::read: Reading 1583 child keys
2025-01-06 23:35:30.745  INFO main frame_benchmarking_cli::storage::cmd: Time summary [ns]:
Total: 13026510157
Min: 601, Max: 1425945
Average: 13343, Median: 12960, Stddev: 7077.46
Percentiles 99th, 95th, 75th: 20401, 17377, 14472
Value size summary:
Total: 58047152
Min: 0, Max: 1693102
Average: 59, Median: 64, Stddev: 1899.09
Percentiles 99th, 95th, 75th: 80, 80, 80
2025-01-06 23:35:32.334  INFO main frame_benchmarking_cli::storage::cmd: Warmup round 1/1
2025-01-06 23:35:45.486  INFO main frame_benchmarking_cli::storage::write: Preparing keys from block 0x6b8a1b649c7876fc0fb4c1ef1a51f07ea97bf3322ee826977b5c23404e61a574
2025-01-06 23:35:47.592  INFO main frame_benchmarking_cli::storage::write: Writing 975060 keys
2025-01-06 23:37:30.209  INFO main frame_benchmarking_cli::storage::write: Writing 1583 child keys
2025-01-06 23:37:30.343  INFO main frame_benchmarking_cli::storage::cmd: Time summary [ns]:
Total: 39873256281
Min: 4457, Max: 10471867
Average: 40843, Median: 40341, Stddev: 35949.63
Percentiles 99th, 95th, 75th: 51789, 48544, 44167
Value size summary:
Total: 58047152
Min: 0, Max: 1693102
Average: 59, Median: 64, Stddev: 1899.09
Percentiles 99th, 95th, 75th: 80, 80, 80
2025-01-06 23:37:30.343  INFO main frame_benchmarking_cli::storage::cmd: Cache usage:
Some(UsageInfo { memory: MemoryInfo { state_cache: MemorySize(39137407), database_cache: MemorySize(0) }, io: IoInfo { transactions: 0, bytes_read: 0, bytes_written: 0, writes: 0, reads: 0, average_transaction_size: 0, state_reads: 1583, state_reads_cache: 0, state_writes: 0, state_writes_cache: 0, state_writes_nodes: 0 } })
2025-01-06 23:37:30.343  INFO main frame_benchmarking_cli::storage::template: Writing weights to "/home/bkontur/polkadot-sdk/rocksdb_weights.rs"

1GB trie cache

bkontur@toaster1:~/polkadot-sdk$ ./target/production/polkadot-parachain benchmark storage --state-version 1 \
    --base-path /home/bkontur/.local/share/polkadot-parachain \
    --chain cumulus/polkadot-parachain/chain-specs/asset-hub-westend.json \
    --include-child-trees \
    --detailed-log-output \
    --trie-cache-size 1073741824 \
    --enable-trie-cache
2025-01-06 23:41:40.112  INFO main txpool:  creating SingleState txpool Limit { count: 8192, total_bytes: 20971520 }/Limit { count: 512, total_bytes: 1048576 }.
2025-01-06 23:41:41.976  INFO main frame_benchmarking_cli::storage::cmd: Warmup round 1/1
2025-01-06 23:41:53.921  INFO main frame_benchmarking_cli::storage::read: Preparing keys from block 0x6b8a1b649c7876fc0fb4c1ef1a51f07ea97bf3322ee826977b5c23404e61a574
2025-01-06 23:41:55.453  INFO main frame_benchmarking_cli::storage::read: Reading 975060 keys
2025-01-06 23:42:04.319  INFO main frame_benchmarking_cli::storage::read: Reading 1583 child keys
2025-01-06 23:42:04.418  INFO main frame_benchmarking_cli::storage::cmd: Time summary [ns]:
Total: 8676248317
Min: 591, Max: 16262784
Average: 8887, Median: 10696, Stddev: 23789.4
Percentiles 99th, 95th, 75th: 18438, 15183, 12158
Value size summary:
Total: 58047152
Min: 0, Max: 1693102
Average: 59, Median: 64, Stddev: 1899.09
Percentiles 99th, 95th, 75th: 80, 80, 80
2025-01-06 23:42:05.959  INFO main frame_benchmarking_cli::storage::cmd: Warmup round 1/1
2025-01-06 23:42:15.071  INFO main frame_benchmarking_cli::storage::write: Preparing keys from block 0x6b8a1b649c7876fc0fb4c1ef1a51f07ea97bf3322ee826977b5c23404e61a574
2025-01-06 23:42:17.133  INFO main frame_benchmarking_cli::storage::write: Writing 975060 keys
2025-01-06 23:43:59.560  INFO main frame_benchmarking_cli::storage::write: Writing 1583 child keys
2025-01-06 23:43:59.694  INFO main frame_benchmarking_cli::storage::cmd: Time summary [ns]:
Total: 39818611219
Min: 4437, Max: 11232225
Average: 40787, Median: 40181, Stddev: 34484.34
Percentiles 99th, 95th, 75th: 52259, 48935, 44337
Value size summary:
Total: 58047152
Min: 0, Max: 1693102
Average: 59, Median: 64, Stddev: 1899.09
Percentiles 99th, 95th, 75th: 80, 80, 80
2025-01-06 23:43:59.694  INFO main frame_benchmarking_cli::storage::cmd: Cache usage:
Some(UsageInfo { memory: MemoryInfo { state_cache: MemorySize(621345584), database_cache: MemorySize(0) }, io: IoInfo { transactions: 0, bytes_read: 0, bytes_written: 0, writes: 0, reads: 0, average_transaction_size: 0, state_reads: 1583, state_reads_cache: 0, state_writes: 0, state_writes_cache: 0, state_writes_nodes: 0 } })
2025-01-06 23:43:59.694  INFO main frame_benchmarking_cli::storage::template: Writing weights to "/home/bkontur/polkadot-sdk/rocksdb_weights.rs"

@athei
Member Author

athei commented Jan 10, 2025

This does not look very promising. Cached vs. disk access should show a much bigger difference. Probably those benchmarks are not measuring the right thing. Have you looked into the benchmarks that Basti wrote for the trie cache?
