[5.0] Replaced `cpu-effort-percent` with `produce-block-offset-ms` #1800

heifner · 2023-10-19T18:45:59Z

Replaced cpu-effort-percent with produce-block-offset-ms.

`produce-block-offset-ms` specifies how much time to leave at the end of the 12-block production round for the blocks (mainly the last block) to reach the next block producer. The value can be larger than 500ms if needed. The default is 450ms.
See Optimize block start time #867 for the implementation of the new behavior.

Updated documentation: https://docs.eosnetwork.com/manuals/leap/latest/nodeos/plugins/producer_plugin/block-producing-explained/
from: https://github.com/AntelopeIO/leap/blob/39327d80665dd275680b58587e89fac3a3b90891/docs/01_nodeos/03_plugins/producer_plugin/10_block-producing-explained.md

To keep the explanation simple, consider the following notation:

* `r` = `producer_repetitions = 12` (hard-coded value)
* `m` = `max_block_cpu_usage` (on-chain consensus value)
* `u` = `max_block_net_usage` (on-chain consensus value)
* `t` = `block-time`
* `e` = `produce-block-offset-ms` (nodeos configuration)
* `w` = `block-time-interval = 500ms` (hard-coded value)
* `a` = `produce-block-early-amount = w - (w - (e / r)) = e / r ms` (how much to release each block of round early by)
* `l` = `produce-block-time = t - a`
* `p` = `produce block time window = w - a` (amount of wall clock time to produce a block)
* `c` = `billed_cpu_in_block = minimum(m, w - a)`
* `n` = `network tcp/ip latency`
* `h` = `block header validation time ms`
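
The derived values `a`, `p`, and `c` above can be sketched in a few lines. This is only an illustration: `block_timing` and `derive_block_timing` are hypothetical names, `m` is taken in milliseconds for simplicity, and the integer division is an assumption carried over from the `config::block_interval_ms - produce_block_offset_ms / config::producer_repetitions` expression quoted in the review further down.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hard-coded values named in the notation list.
constexpr int32_t producer_repetitions   = 12;  // r
constexpr int32_t block_time_interval_ms = 500; // w

// Hypothetical container for the derived per-block values.
struct block_timing {
   int32_t early_amount_ms;   // a = e / r
   int32_t produce_window_ms; // p = w - a
   int32_t billed_cpu_ms;     // c = minimum(m, w - a)
};

// e: produce-block-offset-ms; m: max_block_cpu_usage, assumed here to be in ms.
inline block_timing derive_block_timing(int32_t e, int32_t m) {
   assert(e >= 0 && m >= 0);
   const int32_t a = e / producer_repetitions;   // integer division (assumption)
   const int32_t p = block_time_interval_ms - a; // wall clock time per block
   return {a, p, std::min(m, p)};
}
```

Under these assumptions, the default `e = 450` gives `a = 37` and `p = 463`, while `e = 120` gives `a = 10` and `p = 490`.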

Peer validation for similar hardware/version/config will be <= `m`

**Let's consider the example of the following two BPs and their network topology as depicted in the below diagram**

     +------+     +------+       +------+     +------+
  -->| BP-A |---->| BP-A |------>| BP-B |---->| BP-B |
     +------+     | Peer |       | Peer |     +------+
                  +------+       +------+


`BP-A` will send the block at `l`, and `BP-B` needs the block by time `t`; otherwise it will drop it.

If `BP-A` is producing 12 blocks as follows: `b(lock) at t(ime) 1`, `bt 1.5`, `bt 2`, `bt 2.5`, `bt 3`, `bt 3.5`, `bt 4`, `bt 4.5`, `bt 5`, `bt 5.5`, `bt 6`, `bt 6.5`, then `BP-B` needs `bt 6.5` by time `6.5` so it has `.5` to produce `bt 7`.

Please notice that the time of `bt 7` minus `.5` equals the time of `bt 6.5` therefore time `t` is the last block time of `BP-A` and when `BP-B` needs to start its first block.

A block is produced and sent as soon as it reaches `m`, `u`, or `p`, whichever comes first.

Starting in Leap 4.0, blocks are propagated after block header validation. This means instead of `BP-A Peer` & `BP-B Peer` taking `m` time to validate and forward a block it only takes a small number of milliseconds to verify the block header and then forward the block.

Starting in Leap 5.0, blocks in a round are started immediately after the completion of the previous block. Before 5.0, blocks were always started on `w` intervals and a node would "sleep" between blocks if needed. In 5.0, the "sleeps" are all moved to the end of the block production round. 
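
The difference between the two scheduling schemes can be illustrated with a small sketch. It is not taken from `producer_plugin`; the function names are hypothetical, and it assumes no block is exhausted early, so each block takes exactly its spacing to produce.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Start offset of each block, in ms from the start of the round, when every
// block begins `spacing_ms` after the previous one.
inline std::vector<int32_t> block_starts_ms(int32_t blocks, int32_t spacing_ms) {
   assert(blocks > 0 && spacing_ms > 0);
   std::vector<int32_t> starts;
   starts.reserve(static_cast<std::size_t>(blocks));
   for (int32_t k = 0; k < blocks; ++k)
      starts.push_back(k * spacing_ms);
   return starts;
}

// Time left over at the end of the round when each block uses p instead of w.
inline int32_t end_of_round_idle_ms(int32_t blocks, int32_t w, int32_t p) {
   return blocks * (w - p);
}
```

With `w = 500`, `block_starts_ms(12, 500)` models the pre-5.0 behavior (a start every `w`), and `block_starts_ms(12, 490)` models 5.0 with `a = 10`, where the 12th block starts 110ms earlier and `end_of_round_idle_ms(12, 500, 490)` leaves 120ms at the end of the round.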

## Example 1: block arrives 110ms early
* Assuming zero network latency between all nodes.
* Assuming blocks do not reach `m` and therefore take `w - a` time to produce.
* Assume block completion including signing takes zero time.
* `BP-A` has e = 120, n = 0ms, h = 5ms, a = 10ms
* `BP-A` sends b1 at `t1-10ms` => `BP-A-Peer` processes `h=5ms`, sends at `t1-5ms` => `BP-B-Peer` processes `h=5ms`, sends at `t1-0ms` => arrives at `BP-B` at `t1`.
* `BP-A` starts b2 at `t1-10ms`, sends b2 at `t2-20ms` => `BP-A-Peer` processes `h=5ms`, sends at `t2-15ms` => `BP-B-Peer` processes `h=5ms`, sends at `t2-10ms` => arrives at `BP-B` at `t2-10ms`.
* `BP-A` starts b3 at `t2-20ms`, ...
* `BP-A` starts b12 at `t11-110ms`, sends b12 at `t12-120ms` => `BP-A-Peer` processes `h=5ms`, sends at `t12-115ms` => `BP-B-Peer` processes `h=5ms`, sends at `t12-110ms` => arrives at `BP-B` at `t12-110ms`
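
The arithmetic in this example follows one pattern: block `k` is released `k * a` early, then picks up the latency of each link plus `h` at each relay. A minimal sketch of that pattern, with the hypothetical helper `arrival_offset_ms` and the simplifying assumptions listed above (blocks not full, zero signing time):

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Offset in ms, relative to block k's own block time t_k, at which the block
// reaches the next producer. Negative means early, positive means late.
// link_latency_ms lists each network link in order; every link except the
// last ends at a relay peer that spends h ms validating the block header.
inline int32_t arrival_offset_ms(int32_t k, int32_t a, int32_t h,
                                 const std::vector<int32_t>& link_latency_ms) {
   assert(k >= 1 && !link_latency_ms.empty());
   const int32_t transit = std::accumulate(link_latency_ms.begin(),
                                           link_latency_ms.end(), int32_t{0});
   const int32_t relays = static_cast<int32_t>(link_latency_ms.size()) - 1;
   return -k * a + transit + relays * h;
}
```

For the topology in the diagram there are three links and two relays, so with `a = 10` and `h = 5` the call `arrival_offset_ms(12, 10, 5, {0, 0, 0})` yields `-110`, matching the 110ms in the heading.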

## Example 2: block arrives 80ms early
* Assuming zero network latency between `BP-A` & `BP-A Peer` and between `BP-B Peer` & `BP-B`.
* Assuming 150ms network latency between `BP-A Peer` & `BP-B Peer`.
* Assuming blocks do not reach `m` and therefore take `w - a` time to produce.
* Assume block completion including signing takes zero time.
* `BP-A` has e = 240, n = 0ms/150ms, h = 5ms, a = 20ms
* `BP-A` sends b1 at `t1-20ms` => `BP-A-Peer` processes `h=5ms`, sends at `t1-15ms` =(150ms)> `BP-B-Peer` processes `h=5ms`, sends at `t1+140ms` => arrives at `BP-B` at `t1+140ms`.
* `BP-A` starts b2 at `t1-20ms`, sends b2 at `t2-40ms` => `BP-A-Peer` processes `h=5ms`, sends at `t2-35ms` =(150ms)> `BP-B-Peer` processes `h=5ms`, sends at `t2+120ms` => arrives at `BP-B` at `t2+120ms`.
* `BP-A` starts b3 at `t2-40ms`, ...
* `BP-A` starts b12 at `t11-220ms`, sends b12 at `t12-240ms` => `BP-A-Peer` processes `h=5ms`, sends at `t12-235ms` =(150ms)> `BP-B-Peer` processes `h=5ms`, sends at `t12-80ms` => arrives at `BP-B` at `t12-80ms`

## Example 3: block arrives 16ms late and is dropped
* Assuming zero network latency between `BP-A` & `BP-A Peer` and between `BP-B Peer` & `BP-B`.
* Assuming 200ms network latency between `BP-A Peer` & `BP-B Peer`.
* Assuming blocks do not reach `m` and therefore take `w - a` time to produce.
* Assume block completion including signing takes zero time.
* `BP-A` has e = 204, n = 0ms/200ms, h = 10ms, a = 17ms
* `BP-A` sends b1 at `t1-17ms` => `BP-A-Peer` processes `h=10ms`, sends at `t1-7ms` =(200ms)> `BP-B-Peer` processes `h=10ms`, sends at `t1+203ms` => arrives at `BP-B` at `t1+203ms`.
* `BP-A` starts b2 at `t1-17ms`, sends b2 at `t2-34ms` => `BP-A-Peer` processes `h=10ms`, sends at `t2-24ms` =(200ms)> `BP-B-Peer` processes `h=10ms`, sends at `t2+186ms` => arrives at `BP-B` at `t2+186ms`.
* `BP-A` starts b3 at `t2-34ms`, ...
* `BP-A` starts b12 at `t11-187ms`, sends b12 at `t12-204ms` => `BP-A-Peer` processes `h=10ms`, sends at `t12-194ms` =(200ms)> `BP-B-Peer` processes `h=10ms`, sends at `t12+16ms` => arrives at `BP-B` at `t12+16ms`
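
One way to read this example is as a sizing question: how large must `e` be for the last block to arrive on time? The sketch below is an illustration under the same simplified model, not part of nodeos. Both function names are hypothetical, `transit_ms` is the total latency plus header validation on the path (200 + 10 + 10 = 220 here), and the integer division of `e` by `r` is an assumption.

```cpp
#include <cassert>
#include <cstdint>

constexpr int32_t producer_repetitions = 12; // r

// How late (ms) the last block of the round reaches the next producer.
// A value <= 0 means it arrives on time; a value > 0 means it is dropped.
inline int32_t last_block_lateness_ms(int32_t e, int32_t transit_ms) {
   assert(e >= 0 && transit_ms >= 0);
   const int32_t a = e / producer_repetitions; // per-block early release
   return transit_ms - producer_repetitions * a;
}

// Smallest e for which the last block is on time in this model:
// the smallest multiple of r that is >= transit_ms.
inline int32_t min_offset_ms(int32_t transit_ms) {
   assert(transit_ms >= 0);
   const int32_t a = (transit_ms + producer_repetitions - 1) / producer_repetitions;
   return a * producer_repetitions;
}
```

With `e = 204` and 220ms of transit, `last_block_lateness_ms` gives 16, the 16ms in the heading. In this model `min_offset_ms(220)` is 228, which would leave the last block 8ms early; a real deployment would likely want more margin than that.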

## Example 4: full blocks are produced early
* Assuming zero network latency between `BP-A` & `BP-A Peer` and between `BP-B Peer` & `BP-B`.
* Assuming 200ms network latency between `BP-A Peer` & `BP-B Peer`.
* Assume all blocks are full as there are enough queued up unapplied transactions ready to fill all blocks.
* Assume a block can be produced with 200ms worth of transactions in 225ms worth of time. There is overhead for producing the block.
* `BP-A` has e = 120, m = 200ms, n = 0ms/200ms, h = 10ms, a = 10ms
* `BP-A` sends b1 at `t1-275ms` => `BP-A-Peer` processes `h=10ms`, sends at `t1-265ms` =(200ms)> `BP-B-Peer` processes `h=10ms`, sends at `t1-55ms` => arrives at `BP-B` at `t1-55ms`.
* `BP-A` starts b2 at `t1-275ms`, sends b2 at `t2-550ms (t1-50ms)` => `BP-A-Peer` processes `h=10ms`, sends at `t2-540ms` =(200ms)> `BP-B-Peer` processes `h=10ms`, sends at `t2-330ms` => arrives at `BP-B` at `t2-330ms`.
* `BP-A` starts b3 at `t2-550ms`, ...
* `BP-A` starts b12 at `t11-3025ms`, sends b12 at `t12-3300ms` => `BP-A-Peer` processes `h=10ms`, sends at `t12-3290ms` =(200ms)> `BP-B-Peer` processes `h=10ms`, sends at `t12-3080ms` => arrives at `BP-B` at `t12-3080ms`
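
When every block is full, the early release no longer depends on `a`: each block finishes after its production time (225ms here) and the next one starts immediately, so the lead over the block time grows by `w - 225 = 275ms` per block. A small sketch of that accumulation, using the hypothetical helper `full_block_send_offset_ms`:

```cpp
#include <cassert>
#include <cstdint>

constexpr int32_t block_time_interval_ms = 500; // w

// Offset in ms, relative to block k's block time t_k, at which a full block is
// sent when every block in the round takes production_ms of wall clock time
// and each block starts as soon as the previous one completes.
inline int32_t full_block_send_offset_ms(int32_t k, int32_t production_ms) {
   assert(k >= 1);
   assert(production_ms > 0 && production_ms <= block_time_interval_ms);
   return -k * (block_time_interval_ms - production_ms);
}
```

Adding the 220ms of transit from this example (200ms latency plus two 10ms header validations) to `full_block_send_offset_ms(12, 225)` gives `-3080`, the arrival offset shown for b12.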


Running relay nodes with `wasm-runtime=eos-vm-jit` and `eos-vm-oc-enable` will reduce the validation time.

Resolves #1784

heifner · 2023-10-20T14:40:39Z

docs/01_nodeos/03_plugins/producer_plugin/10_block-producing-explained.md

+
+Peer validation for similar hardware/version/config will be <= `m`
+
+**Let's consider for exemplification the following two BPs and their network topology as depicted in the below diagram**


Why exemplification is used here instead of just for example is beyond me. Seems like we are trying to sound smart.

I think I'd write:
Let's consider the example of the following two BPs and their network topology as depicted in the below diagram

docs/01_nodeos/03_plugins/producer_plugin/10_block-producing-explained.md

greg7mdp · 2023-10-20T15:27:26Z

This document 10_block-producing-explained.md is excellent... for sure it should be kept and maintained.

linh2931 · 2023-10-20T17:16:59Z

docs/01_nodeos/03_plugins/producer_plugin/10_block-producing-explained.md

+* `BP-A` sends b1 at `t1-17ms` => `BP-A-Peer` processes `h=10ms`, sends at `t-7ms` =(200ms)> `BP-B-Peer` processes `h=10ms`, sends at `t+203ms` => arrives at `BP-B` at `t+203ms`.
+* `BP-A` starts b2 at `t1-17ms`, sends b2 at `t2-34ms` => `BP-A-Peer` processes `h=10ms`, sends at `t2-24ms` =(200ms)> `BP-B-Peer` processes `h=10ms`, sends at `t2+186ms` => arrives at `BP-B` at `t2+186ms`.
+* `BP-A` starts b3 at `t2-34ms`, ...
+* `BP-A` starts b12 at `t11-187ms`, sends b12 at `t12-204ms` => `BP-A-Peer` processes `h=10ms`, sends at `t12-194ms` =(200ms)> `BP-B-Peer` processes `h=10ms`, sends at `t12+16ms` => arrives at `BP-B` at `t12-16ms`


Shouldn't it be arriving at B at t12+16ms? (the +).

Yes. Thanks.

linh2931

I don't see in the code where you send out a block right away after it is produced.
Should some new tests be added for this change?

linh2931 · 2023-10-20T18:16:45Z

plugins/producer_plugin/producer_plugin.cpp

+      _produce_block_cpu_effort_us = fc::milliseconds( config::block_interval_ms - produce_block_offset_ms / config::producer_repetitions );
+   }
+
+   fc::microseconds get_produce_block_offset() const {


At first glance, I think this returns the user-supplied value.

It does. We do not have a milliseconds type, only a microseconds type.

greg7mdp · 2023-10-20T18:22:14Z

plugins/producer_plugin/include/eosio/producer_plugin/producer_plugin.hpp

@@ -17,7 +17,7 @@ class producer_plugin : public appbase::plugin<producer_plugin> {
   struct runtime_options {
      std::optional<int32_t>   max_transaction_time;
      std::optional<int32_t>   max_irreversible_block_age;
-      std::optional<int32_t>   cpu_effort_us;
+      std::optional<int32_t>   produce_block_offset_ms;


I'd add the comment here as well

Suggested change

-      std::optional<int32_t>   produce_block_offset_ms;
+      std::optional<int32_t>   produce_block_offset_ms; // minimum time to reserve at the end of a production round for blocks to propagate to the next block producer.

heifner · 2023-10-20T18:52:26Z

> I don't see in the code where you send out a block right away after it is produced.

See producer_plugin use of block_is_exhausted().

* `leap/plugins/producer_plugin/producer_plugin.cpp`, line 2527 in 54e42bc: `_timer.expires_from_now(boost::posix_time::microseconds(0));`
* `leap/plugins/producer_plugin/producer_plugin.cpp`, line 849 in 54e42bc: `self->schedule_maybe_produce_block(true);`
* `leap/plugins/producer_plugin/producer_plugin.cpp`, line 2494 in 54e42bc: `schedule_maybe_produce_block(result == start_block_result::exhausted);`

> Should some new tests be added for this change?

This is not new behavior; it has shipped a block off early since v2.0.4. See EOSIO/eos#8651
We should consider refactoring producer_plugin so it is easier to test. Or it might be possible to use mock_time like the test_trx_retry_db does. However, I believe that is out of scope for this PR. Please create a GitHub issue to work on that later.

linh2931 · 2023-10-20T19:26:15Z

> See producer_plugin use of block_is_exhausted().

Thanks! I was looking for where the change was that keeps production going without waiting, but asked the wrong question.

linh2931 · 2023-10-20T19:30:23Z

> This is not new behavior, it has shipped a block off early since v2.0.4. See EOSIO/eos#8651
> We should consider refactoring producer_plugin so it is easier to test. Or it might be possible to use mock_time like the test_trx_retry_db does. However, I believe that is out of scope for this PR. Please create a GitHub issue to work that later.

#1804

heifner · 2023-10-20T19:32:38Z

> Thanks! I was thinking about where change was for keeping production without waiting, but asking a wrong question.

See #867 which is the change to calculations on when to start a block.

GH-1784 Rename cpu-effort-percent to produce-block-offset-ms and change meaning to be over complete round

0c5ff7e

heifner requested review from greg7mdp and linh2931 October 19, 2023 18:46

greg7mdp reviewed Oct 19, 2023

heifner added 2 commits October 19, 2023 16:33

Merge remote-tracking branch 'origin/release/5.0' into GH-1784-cpu-effort-5.0

ffbcbd9

GH-1784 Update block producing doc

5ca96db

heifner added the `documentation` (Improvements or additions to documentation) and `OCI` (Work exclusive to OCI team) labels Oct 20, 2023

heifner linked an issue Oct 20, 2023 that may be closed by this pull request

Remove cpu-effort-percent and replace with new option #1784

Closed

5 tasks