[DO NOT MERGE] fix(dcutr): fix roles in tcp simultaneous connection #3044

Draft
wants to merge 2 commits into
base: master

MarcoPolo
Collaborator

@MarcoPolo MarcoPolo commented Nov 15, 2024

We did the opposite of what the spec says to do. As a result, we would fail to hole-punch with Rust nodes because both sides would attempt to be the dialer.

The spec is a bit confusing since roles get flipped in the middle. But essentially:

  1. A connects to B.
  2. B initiates the /libp2p/dcutr protocol.
  3. After synchronizing, A and B dial each other. A is the client and B is the server.

This means the side handling the stream is the client, and the side initiating the stream should be the server.
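As a rough illustration of the rule above (not the actual go-libp2p code), the role can be derived from which side opened the /libp2p/dcutr stream. The `Role` type and the `roleForDCUtR` function are hypothetical names used only for this sketch:

```go
package main

import "fmt"

// Role is the role a peer takes on the simultaneously dialed TCP connection.
// Hypothetical type for illustration; not a go-libp2p API.
type Role int

const (
	RoleClient Role = iota
	RoleServer
)

func (r Role) String() string {
	if r == RoleClient {
		return "client"
	}
	return "server"
}

// roleForDCUtR maps the DCUtR stream direction to a connection role.
// openedStream is true on the side that initiated the /libp2p/dcutr
// stream (B in the steps above) and false on the side that handled it (A).
func roleForDCUtR(openedStream bool) Role {
	if openedStream {
		return RoleServer // stream initiator acts as the server
	}
	return RoleClient // stream handler acts as the client
}

func main() {
	fmt.Println("A (handles the stream):", roleForDCUtR(false))
	fmt.Println("B (opens the stream):", roleForDCUtR(true))
}
```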


Rollout:

We can't merge this until we figure out how we are going to roll this out.

This change, as is, is backwards incompatible. It would cause TCP hole punching to fail between go-libp2p nodes running code from before this change and nodes running code from after it.

Let's discuss the rollout strategy here.

We did the opposite thing of what the spec says to do. This meant that
we would fail to hole-punch with rust nodes because both sides would
attempt to be the dialer.
@sukunrt
Member

sukunrt commented Nov 18, 2024

On the side that will be the Server in the new code, can we add a random wait time, after which that side switches its role to Client and initiates the security negotiation?

@MarcoPolo MarcoPolo changed the title [DO NOT MERGE] fix: dcutr: fix roles in tcp simultaneous connection [DO NOT MERGE] fix(dcutr): fix roles in tcp simultaneous connection Nov 18, 2024
@MarcoPolo
Collaborator Author

Does it need to be random?

Could the server wait 3 RTT (it has the estimate) for the client and switch after?

@sukunrt
Member

sukunrt commented Nov 19, 2024

Unfortunately, simple timeouts won't work. This is trickier.

We need to always start as a Server, then switch roles after a random wait. This is because when old go nodes and new go nodes interact, both sides may assume the Client role, in which case connection establishment will fail.

In the new code, if we always assume the role of Server, there will be 3 cases:

old go node vs new go node

  1. Client <-> Server:
    For Client <-> Server to work, the new code needs to assume the role of Server. The peer, an old go node, will initiate the handshake.
  2. Server <-> Server:
    For Server <-> Server to work, the new code needs to start in the role of Server, then switch roles and initiate the handshake.

new go node vs new go node

  1. Server <-> Server:
    After a random backoff one side becomes the client and initiates the handshake.
