
[DRAFT] feat: conntracker #3032

Draft · wants to merge 6 commits into master
Conversation

MarcoPolo (Collaborator)
This PR refactors how NewStream works. Instead of having the swarm pick the best connection and then wait for identify to run on that connection, a separate service subscribes to connection events and identify events, and NewStream asks it for the best connection that has been identified.

This has a couple of advantages:

  1. We can wait for identify to complete on any connection, not just the one we happen to pick.
  2. It moves connection tracking code out of the swarm into a smaller service that can be tested independently.
    a. I'd even like to run a fuzz test against the conntracker service itself.
  3. It attaches a peer's supported protocols to the connection itself rather than to the peer.
    a. This allows a peer to say it supports protocols X and Y over one connection, and only Z over another. This solves a problem similar to labelling connections as "limited", but lets the remote side have a say in how we use the connection rather than the local side assuming something. Note that we aren't actually using this just yet. See also Per-connection protocol list #2693

This is a draft for now: there are a couple of TODOs in the code, some more tests I want to add, and some thought needed on how to roll this out (maybe default on, with an ENV flag to disable it for debugging purposes).

This is also how I think we should build on and improve go-libp2p moving forward: simple services that share information over an eventbus or channels (depending on the use case).

@MarcoPolo MarcoPolo self-assigned this Nov 18, 2024
@sukunrt sukunrt self-assigned this Nov 18, 2024
@sukunrt (Member) left a comment

I think this makes sense.

I need to think more about possible race conditions involving the ordering of swarm and identify events. I'm not sure we've fully dealt with #2983.

Should we add a NewStream method to ConnTracker? That would let us check whether the connection we're about to return has already closed, and fall back to some other connection.

Overall, I like reducing the size of the swarm and moving protocols to the connection. Maybe one day we can even deprecate the Peerstore peer-protocols API.

	}
case evt := <-ct.sub.Out():
	switch evt := evt.(type) {
	case event.EvtPeerConnectednessChanged:

Should we add an event for connection closed, as opposed to peer connectedness changed? It should be simple enough to have one, since we're always waiting on AcceptStream anyway.

Comment on lines +282 to +292
type ConnWithMeta struct {
	network.Conn
	Identified         bool
	supportedProtocols map[protocol.ID]struct{}
	MatchingProtocols  []protocol.ID
}

func (c *ConnWithMeta) SupportsProtocol(p protocol.ID) bool {
	_, ok := c.supportedProtocols[p]
	return ok
}

Should we just export supportedProtocols? We're not sharing the map, right?

Comment on lines +734 to +755
if nodial, _ := network.GetNoDial(ctx); !nodial {
	errCh = make(chan error, 1)
	go func() {
		err := h.Connect(ctx, peer.AddrInfo{ID: p})
		if err != nil {
			select {
			case errCh <- err:
			default:
			}
		}
	}()
}

// Wait for a connection that works for us
connChan, err := h.connTracker.GetBestConnChan(ctx, p, conntracker.GetBestConnOpts{
	OneOf:           requiredProtos,
	FilterFn:        connFilter,
	WaitForIdentify: true, // Old behavior
})
if err != nil {
	return conntracker.ConnWithMeta{}, err
}

I don't like this pattern: dial the peer through the swarm, then wait on a conntracker channel.

Letting the conntracker dial also seems bad. Maybe just call swarm.DialPeer? It doesn't wait for identify and returns the connection directly.

Comment on lines +230 to +232
if m, ok := ct.trackedConns[notif.conn.RemotePeer()]; ok {
	delete(m, notif.conn)
}

NIT: you probably don't need this check; delete works on a nil map.

Comment on lines +245 to +249
case event.EvtPeerIdentificationFailed:
	ct.updateProtos(evt.Peer, evt.Conn, nil, false, true)
	ct.totalPendingReqs -= ct.tryFulfillPendingReqs(evt.Peer)
case event.EvtPeerIdentificationCompleted:
	ct.updateProtos(evt.Peer, evt.Conn, evt.Protocols, true, true)

I haven't looked at the implementation too closely, so just calling it out here once:
Identify events are not synchronized with any swarm events. Even the Connected callback may fire after the peer-identification-completed event.

Comment on lines +176 to 178

	ConnTracker *conntracker.ConnTracker
}

How can users of basic host get access to the ConnTracker?
