Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support throttling block syncing to peers. #1540

Closed
wants to merge 53 commits into from
Closed

Conversation

jgiszczak
Copy link
Contributor

@jgiszczak jgiszczak commented Aug 22, 2023

Peer to peer listen ports may now apply a block sync throttle. Each occurrence of p2p-listen-endpoint has an independent throttle specification. Here's the updated help text:

--p2p-listen-endpoint arg (=0.0.0.0:9876:0)
                                        The actual host:port[:<rate-cap>] used
                                        to listen for incoming p2p connections.
                                        May be used multiple times.   The
                                        optional rate cap will limit block sync
                                        bandwidth to the specified rate.  A
                                        number alone will be interpreted as
                                        bytes per second.  The number may be
                                        suffixed with units.  Supported units
                                        are:   'B/s', 'KB/s', 'MB/s, 'GB/s',
                                        'TB/s', 'KiB/s', 'MiB/s', 'GiB/s',
                                        'TiB/s'.  Transactions and blocks
                                        outside of sync mode are not throttled.
                                          Examples:
                                            192.168.0.100:9876:1MiB/s
                                            node.eos.io:9876:1512KB/s
                                            node.eos.io:9876:0.5GB/s
                                            [2001:db8:85a3:8d3:1319:8a2e:370:7348]:9876:250KB/s

Parsing of the rate cap accepts fractional numbers expressed as decimals. The throttle rate is in bytes per second. Per the help text, shorthand suffixes are supported, and transactions and blocks being propagated across the peer to peer network between synchronized nodes are not throttled. Only blocks transmitted to a syncing node are throttled. A throttle of 0 or 0B/s means unthrottled, and is the default.

The limit for specifying a rate cap is 30 characters, including the colon field separator. This is sufficient characters to specify any value in the 64 bit range of size_t with room left for a suffix. If the rate cap does not parse or the value multiplied by the suffix exceeds 18,446,744,073,709,551,615 an exception will be thrown and the node will not start.

IPv6 addresses are supported for listen endpoints. They must be in square bracket format: [<ipv6 address>]:port optionally followed by :<rate-cap>.

Note

Inbound connections from IPs which are configured p2p-peer-addresses are exempt from throttle rate caps. Care should be taken in NAT environments to avoid inadvertently exempting connections due to overlapping subnets.

Throttling is stable even at exceptionally low byterates, but of course on a busy network if the throttle is less than the average block size on the network, clients using that listen port will never catch up to the head block. Such configurations are allowed but not recommended.

Suggested network topology

Together with #1411 which allows multiple listen endpoints, an edge node for public peering might be configured as follows:
Leap Throttle Sample Network Diagram (truncated)

Resolves #1295.

@jgiszczak jgiszczak added the OCI Work exclusive to OCI team label Aug 22, 2023
@jgiszczak jgiszczak marked this pull request as draft August 22, 2023 08:56
Add necessary custom topology for p2p_sync_throttle_test.
@jgiszczak jgiszczak marked this pull request as ready for review August 23, 2023 06:15
@jgiszczak jgiszczak changed the title Support throttling block syncing to peers. WIP Support throttling block syncing to peers. Aug 23, 2023
plugins/net_plugin/net_plugin.cpp Outdated Show resolved Hide resolved
plugins/net_plugin/net_plugin.cpp Outdated Show resolved Hide resolved
plugins/net_plugin/net_plugin.cpp Outdated Show resolved Hide resolved
plugins/net_plugin/net_plugin.cpp Outdated Show resolved Hide resolved
plugins/net_plugin/net_plugin.cpp Outdated Show resolved Hide resolved
tests/p2p_sync_throttle_test.py Outdated Show resolved Hide resolved
tests/p2p_sync_throttle_test.py Outdated Show resolved Hide resolved
tests/p2p_sync_throttle_test.py Outdated Show resolved Hide resolved
tests/p2p_sync_throttle_test_shape.json Show resolved Hide resolved
plugins/net_plugin/net_plugin.cpp Outdated Show resolved Hide resolved
Clarify variable names in p2p throttled sync test and tweak numbers.
Fix p2p throttled test to actually function (waitForBlock has a hidden
default timeout).
Bump up timeout in block_log_util_test.
plugins/net_plugin/net_plugin.cpp Show resolved Hide resolved
plugins/net_plugin/net_plugin.cpp Outdated Show resolved Hide resolved
plugins/net_plugin/net_plugin.cpp Outdated Show resolved Hide resolved
plugins/net_plugin/net_plugin.cpp Outdated Show resolved Hide resolved
plugins/net_plugin/net_plugin.cpp Outdated Show resolved Hide resolved
plugins/net_plugin/net_plugin.cpp Outdated Show resolved Hide resolved
plugins/net_plugin/net_plugin.cpp Outdated Show resolved Hide resolved
@BenjaminGormanPMP BenjaminGormanPMP requested review from dimas1185 and removed request for linh2931 and greg7mdp August 28, 2023 21:02
block_sync_rate_limit = boost::numeric_cast<size_t>(limit * prefix_multipliers.at(units_match[1].str()));
fc_dlog( logger, "setting block_sync_rate_limit to ${limit}", ("limit", block_sync_rate_limit));
} catch (boost::numeric::bad_numeric_cast&) {
EOS_ASSERT(false, plugin_config_exception, "block sync limit specification overflowed: ${limit}", ("limit", limit_str));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use EOS_THROW instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converted to EOS_THROW.


throttlingNode = cluster.unstartedNodes[0]
i = throttlingNode.cmd.index('--p2p-listen-endpoint')
throttlingNode.cmd[i+1] = throttlingNode.cmd[i+1] + ':40000B/s'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please comment why are you using 40000 bytes here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so if avg block size is 10Kb and throttling size is 40Kb/s how do slow down transmission if you just need 20Kb/s for full speed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the question. This is an integration test, not a user scenario. If the test rate needs to be changed from 40k to 20k for some reason, we simply change it. It's a test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question is based on your comment. numbers are from your comment. you chose the numbers so I asked you to explain those.

endThrottledSync = time.time()
Print(f'Unthrottled sync time: {endThrottlingSync - clusterStart} seconds')
Print(f'Throttled sync time: {endThrottledSync - clusterStart} seconds')
assert endThrottledSync - clusterStart > endThrottlingSync - clusterStart + 15, 'Throttled sync time must be at least 15 seconds greater than unthrottled'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add comment with explanation of how did you calculate those 15 seconds

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting something like (avg_block_size * block_amount) / throttle_size_per_sec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to give the illusion of precision where there is none. The time to synchronize varies significantly depending on the machine running the test. I tried to choose values which are reasonable for a wide range of machines in order to make the test as reliable as possible, but that necessarily involves some guesswork. Unfortunately the actual block sizes are not available to the test framework so I can't implement my preferred solution and just calculate values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, I understand you point but that makes this not a real test but guessed numbers that make this script pass. I understand that you can't do this precisely but if you can think of any approximate formula it would be nice. e.g. you can roughly check blocks.log size and divide this by number of blocks or you can take it from network stats that node supposed to print, to include all messages and serialized data sizes.
Otherwise, imagine this test fails. How do I know if that is error or just block size has changed or network issues? To solve this I can play with numbers to make it pass but I still have no idea if I'm not hiding some bug that way.

Fix parsing and overflow problems and address peer review comments.
Extend throttle test to add another throttle prefix.

size_t net_plugin_impl::parse_connection_rate_limit( const std::string& limit_str) const {
std::istringstream in(limit_str);
fc_dlog( logger, "parsing connection endpoint limit ${limit} with locale ${l}", ("limit", limit_str)("l", std::locale("").name()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't parsing the config in a locale dependent way mean config files aren't transportable across different systems (when they have a different locale)? I can imagine some confusion from users when copying a sample config file and it errors out on their system.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed locale-awareness.

@bhazzard bhazzard added this to the Leap v6.0.0 Cusp milestone Oct 4, 2023
Remove dependency on python requests package.
Remove locale-aware parsing of sync throttle rate.
Prevent transmitting peer from throttling while not in sync mode.
Add timeouts to throttle sync test.
heifner
heifner previously approved these changes Oct 9, 2023
@heifner heifner changed the base branch from main to release/5.0 October 10, 2023 17:28
@heifner heifner changed the base branch from release/5.0 to main October 10, 2023 17:28
@heifner heifner dismissed their stale review October 10, 2023 21:19

Replaced by #1741

@heifner
Copy link
Member

heifner commented Oct 10, 2023

Replaced by #1741 & #1742

@heifner heifner closed this Oct 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OCI Work exclusive to OCI team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

P2P: Ability to limit peer bandwidth consumption
7 participants