Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

network/litep2p: Update litep2p network backend to version 0.8.1 #6484

Merged
merged 3 commits into from
Nov 15, 2024

Conversation

lexnv
Copy link
Contributor

@lexnv lexnv commented Nov 14, 2024

This PR updates the litep2p backend to version 0.8.1 from 0.8.0.

The v0.8.1 release includes key fixes that enhance the stability and performance of the litep2p library. The focus is on long-running stability and improvements to polling mechanisms.

Long Running Stability Improvements

This issue caused long-running nodes to reject all incoming connections, impacting overall stability.

Addressed a bug in the connection limits functionality that incorrectly tracked connections due for rejection.

This issue caused an artificial increase in inbound peers, which were not being properly removed from the connection limit count.

This fix ensures more accurate tracking and management of peer connections #286.

Polling implementation fixes

This release provides multiple fixes to the polling mechanism, improving how connections and events are processed:

  • Resolved an overflow issue in TransportContext’s polling index for streams, preventing potential crashes (#283).
  • Fixed a delay in the manager’s poll_next function that prevented immediate polling of newly added futures (#287).
  • Corrected an issue where the listener did not return Poll::Ready(None) when it was closed, ensuring proper signal handling (#285).

Fixed

  • manager: Fix connection limits tracking of rejected connections (#286)
  • transport: Fix waking up on filtered events from poll_next (#287)
  • transports: Fix missing Poll::Ready(None) event from listener (#285)
  • manager: Avoid overflow on stream implementation for TransportContext (#283)
  • manager: Log when polling returns Ready(None) (#284)

Testing Done

Started kusama nodes running side by side with a higher number of inbound and outbound connections (500).
We previously tested with peers bounded at 50. This testing filtered out the fixes included in the latest release.

With this high connection testing setup, litep2p outperforms libp2p in almost every domain, from performance to the warnings / errors encountered while operating the nodes.

TLDR: this is the version we need to test on kusama validators next

  • Litep2p
Repo Count Level Triage report
polkadot-sdk 409 warn Report .: . to .. Reason: .. Banned, disconnecting. ( Peer disconnected with inflight after backoffs. Banned, disconnecting. )
litep2p 128 warn Refusing to add known address that corresponds to a different peer ID
litep2p 54 warn inbound identify substream opened for peer who doesn't exist
polkadot-sdk 7 error 💔 Called on_validated_block_announce with a bad peer ID .*
polkadot-sdk 1 warn ❌ Error while dialing .: .
polkadot-sdk 1 warn Report .: . to .. Reason: .. Banned, disconnecting. ( Invalid justification. Banned, disconnecting. )
  • Libp2p
Repo Count Level Triage report
polkadot-sdk 1023 warn 💔 Ignored block (#.* -- .) announcement from . because all validation slots are occupied.
polkadot-sdk 472 warn Report .: . to .. Reason: .. Banned, disconnecting. ( Unsupported protocol. Banned, disconnecting. )
polkadot-sdk 379 error 💔 Called on_validated_block_announce with a bad peer ID .*
polkadot-sdk 163 warn Report .: . to .. Reason: .. Banned, disconnecting. ( Invalid justification. Banned, disconnecting. )
polkadot-sdk 116 warn Report .: . to .. Reason: .. Banned, disconnecting. ( Peer disconnected with inflight after backoffs. Banned, disconnecting. )
polkadot-sdk 83 warn Report .: . to .. Reason: .. Banned, disconnecting. ( Same block request multiple times. Banned, disconnecting. )
polkadot-sdk 4 warn Re-finalized block #.* (.) in the canonical chain, current best finalized is #.
polkadot-sdk 2 warn Report .: . to .. Reason: .. Banned, disconnecting. ( Genesis mismatch. Banned, disconnecting. )
polkadot-sdk 2 warn Report .: . to .. Reason: .. Banned, disconnecting. ( Not requested block data. Banned, disconnecting. )
polkadot-sdk 2 warn Can't listen on .* because: .*
polkadot-sdk 1 warn ❌ Error while dialing .: .

@lexnv lexnv added T0-node This PR/Issue is related to the topic “node”. I5-enhancement An additional feature request. labels Nov 14, 2024
@lexnv lexnv self-assigned this Nov 14, 2024
@lexnv lexnv requested review from dmitry-markin and a team November 14, 2024 15:45
Signed-off-by: Alexandru Vasile <[email protected]>
@lexnv lexnv added the A4-needs-backport Pull request must be backported to all maintained releases. label Nov 14, 2024
@@ -1074,7 +1074,13 @@ impl<B: BlockT + 'static, H: ExHashT> NetworkBackend<B, H> for Litep2pNetworkBac
metrics.pending_connections_errors_total.with_label_values(&["transport-errors"]).inc();
}
}
_ => {}
None => {
log::error!(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comes only when it is terminated and not shutdown gracefully?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double checked with a stopped polkadot node and litep2p backend, we don't seem to emit errors here

Copy link
Contributor

@michalkucharczyk michalkucharczyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went quickly over fixes in litep2p - nice work!

@lexnv lexnv added this pull request to the merge queue Nov 15, 2024
Merged via the queue into master with commit 8bea091 Nov 15, 2024
221 of 234 checks passed
@lexnv lexnv deleted the lexnv/release-0.8.1-litep2p branch November 15, 2024 10:55
@paritytech-cmd-bot-polkadot-sdk

Created backport PR for stable2407:

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-6484-to-stable2407
git worktree add --checkout .worktree/backport-6484-to-stable2407 backport-6484-to-stable2407
cd .worktree/backport-6484-to-stable2407
git reset --hard HEAD^
git cherry-pick -x 8bea091ef131c8d962f45d89c28e17ece17bc5b2
git push --force-with-lease

@paritytech-cmd-bot-polkadot-sdk

Created backport PR for stable2409:

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-6484-to-stable2409
git worktree add --checkout .worktree/backport-6484-to-stable2409 backport-6484-to-stable2409
cd .worktree/backport-6484-to-stable2409
git reset --hard HEAD^
git cherry-pick -x 8bea091ef131c8d962f45d89c28e17ece17bc5b2
git push --force-with-lease

lexnv added a commit that referenced this pull request Nov 18, 2024
This PR updates the litep2p backend to version 0.8.1 from 0.8.0.
- Check the [litep2p updates forum
post](https://forum.polkadot.network/t/litep2p-network-backend-updates/9973/3)
for performance dashboards.
- Check [litep2p release
notes](paritytech/litep2p#288)

The v0.8.1 release includes key fixes that enhance the stability and
performance of the litep2p library. The focus is on long-running
stability and improvements to polling mechanisms.

This issue caused long-running nodes to reject all incoming connections,
impacting overall stability.

Addressed a bug in the connection limits functionality that incorrectly
tracked connections due for rejection.

This issue caused an artificial increase in inbound peers, which were
not being properly removed from the connection limit count.

This fix ensures more accurate tracking and management of peer
connections [#286](paritytech/litep2p#286).

This release provides multiple fixes to the polling mechanism, improving
how connections and events are processed:
- Resolved an overflow issue in TransportContext’s polling index for
streams, preventing potential crashes
([#283](paritytech/litep2p#283)).
- Fixed a delay in the manager’s poll_next function that prevented
immediate polling of newly added futures
([#287](paritytech/litep2p#287)).
- Corrected an issue where the listener did not return Poll::Ready(None)
when it was closed, ensuring proper signal handling
([#285](paritytech/litep2p#285)).

- manager: Fix connection limits tracking of rejected connections
([#286](paritytech/litep2p#286))
- transport: Fix waking up on filtered events from `poll_next`
([#287](paritytech/litep2p#287))
- transports: Fix missing Poll::Ready(None) event from listener
([#285](paritytech/litep2p#285))
- manager: Avoid overflow on stream implementation for
`TransportContext`
([#283](paritytech/litep2p#283))
- manager: Log when polling returns Ready(None)
([#284](paritytech/litep2p#284))

Started kusama nodes running side by side with a higher number of
inbound and outbound connections (500).
We previously tested with peers bounded at 50. This testing filtered out
the fixes included in the latest release.

With this high connection testing setup, litep2p outperforms libp2p in
almost every domain, from performance to the warnings / errors
encountered while operating the nodes.

TLDR: this is the version we need to test on kusama validators next

- Litep2p

Repo            | Count      | Level      | Triage report
-|-|-|-
polkadot-sdk | 409 | warn | Report .*: .* to .*. Reason: .*. Banned,
disconnecting. ( Peer disconnected with inflight after backoffs. Banned,
disconnecting. )
litep2p | 128 | warn | Refusing to add known address that corresponds to
a different peer ID
litep2p | 54 | warn | inbound identify substream opened for peer who
doesn't exist
polkadot-sdk | 7 | error | 💔 Called `on_validated_block_announce` with a
bad peer ID .*
polkadot-sdk    | 1          | warn       | ❌ Error while dialing .*: .*
polkadot-sdk | 1 | warn | Report .*: .* to .*. Reason: .*. Banned,
disconnecting. ( Invalid justification. Banned, disconnecting. )

- Libp2p

Repo            | Count      | Level      | Triage report
-|-|-|-
polkadot-sdk | 1023 | warn | 💔 Ignored block \(#.* -- .*\) announcement
from .* because all validation slots are occupied.
polkadot-sdk | 472 | warn | Report .*: .* to .*. Reason: .*. Banned,
disconnecting. ( Unsupported protocol. Banned, disconnecting. )
polkadot-sdk | 379 | error | 💔 Called `on_validated_block_announce` with
a bad peer ID .*
polkadot-sdk | 163 | warn | Report .*: .* to .*. Reason: .*. Banned,
disconnecting. ( Invalid justification. Banned, disconnecting. )
polkadot-sdk | 116 | warn | Report .*: .* to .*. Reason: .*. Banned,
disconnecting. ( Peer disconnected with inflight after backoffs. Banned,
disconnecting. )
polkadot-sdk | 83 | warn | Report .*: .* to .*. Reason: .*. Banned,
disconnecting. ( Same block request multiple times. Banned,
disconnecting. )
polkadot-sdk | 4 | warn | Re-finalized block #.* \(.*\) in the canonical
chain, current best finalized is #.*
polkadot-sdk | 2 | warn | Report .*: .* to .*. Reason: .*. Banned,
disconnecting. ( Genesis mismatch. Banned, disconnecting. )
polkadot-sdk | 2 | warn | Report .*: .* to .*. Reason: .*. Banned,
disconnecting. ( Not requested block data. Banned, disconnecting. )
polkadot-sdk | 2 | warn | Can't listen on .* because: .*
polkadot-sdk    | 1          | warn       | ❌ Error while dialing .*: .*

---------

Signed-off-by: Alexandru Vasile <[email protected]>
lexnv added a commit that referenced this pull request Nov 18, 2024
This PR updates the litep2p backend to version 0.8.1 from 0.8.0.
- Check the [litep2p updates forum
post](https://forum.polkadot.network/t/litep2p-network-backend-updates/9973/3)
for performance dashboards.
- Check [litep2p release
notes](paritytech/litep2p#288)

The v0.8.1 release includes key fixes that enhance the stability and
performance of the litep2p library. The focus is on long-running
stability and improvements to polling mechanisms.

### Long Running Stability Improvements

This issue caused long-running nodes to reject all incoming connections,
impacting overall stability.

Addressed a bug in the connection limits functionality that incorrectly
tracked connections due for rejection.

This issue caused an artificial increase in inbound peers, which were
not being properly removed from the connection limit count.

This fix ensures more accurate tracking and management of peer
connections [#286](paritytech/litep2p#286).

### Polling implementation fixes

This release provides multiple fixes to the polling mechanism, improving
how connections and events are processed:
- Resolved an overflow issue in TransportContext’s polling index for
streams, preventing potential crashes
([#283](paritytech/litep2p#283)).
- Fixed a delay in the manager’s poll_next function that prevented
immediate polling of newly added futures
([#287](paritytech/litep2p#287)).
- Corrected an issue where the listener did not return Poll::Ready(None)
when it was closed, ensuring proper signal handling
([#285](paritytech/litep2p#285)).


### Fixed

- manager: Fix connection limits tracking of rejected connections
([#286](paritytech/litep2p#286))
- transport: Fix waking up on filtered events from `poll_next`
([#287](paritytech/litep2p#287))
- transports: Fix missing Poll::Ready(None) event from listener
([#285](paritytech/litep2p#285))
- manager: Avoid overflow on stream implementation for
`TransportContext`
([#283](paritytech/litep2p#283))
- manager: Log when polling returns Ready(None)
([#284](paritytech/litep2p#284))


### Testing Done

Started kusama nodes running side by side with a higher number of
inbound and outbound connections (500).
We previously tested with peers bounded at 50. This testing filtered out
the fixes included in the latest release.

With this high connection testing setup, litep2p outperforms libp2p in
almost every domain, from performance to the warnings / errors
encountered while operating the nodes.

TLDR: this is the version we need to test on kusama validators next

- Litep2p

Repo            | Count      | Level      | Triage report
-|-|-|-
polkadot-sdk | 409 | warn | Report .*: .* to .*. Reason: .*. Banned,
disconnecting. ( Peer disconnected with inflight after backoffs. Banned,
disconnecting. )
litep2p | 128 | warn | Refusing to add known address that corresponds to
a different peer ID
litep2p | 54 | warn | inbound identify substream opened for peer who
doesn't exist
polkadot-sdk | 7 | error | 💔 Called `on_validated_block_announce` with a
bad peer ID .*
polkadot-sdk    | 1          | warn       | ❌ Error while dialing .*: .*
polkadot-sdk | 1 | warn | Report .*: .* to .*. Reason: .*. Banned,
disconnecting. ( Invalid justification. Banned, disconnecting. )

- Libp2p

Repo            | Count      | Level      | Triage report
-|-|-|-
polkadot-sdk | 1023 | warn | 💔 Ignored block \(#.* -- .*\) announcement
from .* because all validation slots are occupied.
polkadot-sdk | 472 | warn | Report .*: .* to .*. Reason: .*. Banned,
disconnecting. ( Unsupported protocol. Banned, disconnecting. )
polkadot-sdk | 379 | error | 💔 Called `on_validated_block_announce` with
a bad peer ID .*
polkadot-sdk | 163 | warn | Report .*: .* to .*. Reason: .*. Banned,
disconnecting. ( Invalid justification. Banned, disconnecting. )
polkadot-sdk | 116 | warn | Report .*: .* to .*. Reason: .*. Banned,
disconnecting. ( Peer disconnected with inflight after backoffs. Banned,
disconnecting. )
polkadot-sdk | 83 | warn | Report .*: .* to .*. Reason: .*. Banned,
disconnecting. ( Same block request multiple times. Banned,
disconnecting. )
polkadot-sdk | 4 | warn | Re-finalized block #.* \(.*\) in the canonical
chain, current best finalized is #.*
polkadot-sdk | 2 | warn | Report .*: .* to .*. Reason: .*. Banned,
disconnecting. ( Genesis mismatch. Banned, disconnecting. )
polkadot-sdk | 2 | warn | Report .*: .* to .*. Reason: .*. Banned,
disconnecting. ( Not requested block data. Banned, disconnecting. )
polkadot-sdk | 2 | warn | Can't listen on .* because: .*
polkadot-sdk    | 1          | warn       | ❌ Error while dialing .*: .*

---------

Signed-off-by: Alexandru Vasile <[email protected]>
EgorPopelyaev pushed a commit that referenced this pull request Nov 19, 2024
…412 (#6527)

This PR updates the litep2p version on stable2412.
The litep2p requires a user opt-in / non-default backend, and changes
should not affect testing / stable libp2p versions.

For more details see PR:
- #6484

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A4-needs-backport Pull request must be backported to all maintained releases. I5-enhancement An additional feature request. T0-node This PR/Issue is related to the topic “node”.
Projects
Status: Blocked ⛔️
Development

Successfully merging this pull request may close these issues.

4 participants