
Network self-organisation #11

Open
SionoiS opened this issue May 31, 2022 · 10 comments

Comments

@SionoiS

SionoiS commented May 31, 2022

Hi,

I had a thought that IPFS nodes may self-organize into higher level clusters because of the way connections are formed and maintained.

More specifically, given the high level of node churn, do longer-running nodes tend to connect to each other?
I would think not, because of the way latency is prioritised, which results in nodes organising based on distance. Is this a good thing?

More generally, how do we measure self-organization, and could we not use this to our benefit too?

@yiannisbot
Member

That's a great point! Using this to our benefit is probably a long shot, as we'd need to implement some sort of metadata exchange between the nodes. But it's worth finding out nevertheless.

Nodes are organised based on XOR distance, but every node keeps several k-buckets, so it's not unreasonable to think that a node might stay connected to a particular set of peers across the hash space.

@dennis-tra is it possible to derive that from the crawler logs? 🤔 The script would have to go through multiple crawls and cross-check connectivity between nodes over time, i.e. across snapshots of the network.
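
Something along those lines, purely as a sketch: assuming each crawl snapshot can be exported as a two-column `peer_id,neighbour_id` edge list (a made-up format, the real crawler output will differ), one could count how often the same link reappears across crawls:

```python
import csv
from collections import defaultdict

def load_snapshot(path):
    """One crawl snapshot as a set of directed (peer, neighbour) edges.
    Assumes a two-column CSV; the format is hypothetical."""
    with open(path, newline="") as f:
        return {(peer, neighbour) for peer, neighbour in csv.reader(f)}

def link_persistence(snapshot_paths):
    """For every (peer, neighbour) link, count in how many crawls it appears."""
    counts = defaultdict(int)
    for path in snapshot_paths:
        for edge in load_snapshot(path):
            counts[edge] += 1
    return counts

# Links seen in every crawl would hint at stable peer groupings, e.g.:
# snapshots = ["crawl-01.csv", "crawl-02.csv", "crawl-03.csv"]
# stable = {e for e, n in link_persistence(snapshots).items() if n == len(snapshots)}
```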

@SionoiS
Author

SionoiS commented Jun 1, 2022

Nodes are organised based on XOR distance, but every node keeps several k-buckets, so it's not unreasonable to think that a node might stay connected to a particular set of peers across the hash space.

Ah! I was wrong, I thought that nodes connected to geographically close nodes. I need to read the specs again.

edit: from the paper

The longer a node has been up, the more likely it is to remain up another hour. By keeping the oldest live contacts around, k-buckets maximize the probability that the nodes they contain will remain online. A second benefit of k-buckets is that they provide resistance to certain DoS attacks. One cannot flush nodes’ routing state by flooding the system with new nodes. Kademlia nodes will only insert the new nodes in the k-buckets when old nodes leave the system.
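
In code, the replacement rule described there boils down to roughly the following (a simplified sketch; `is_alive()` stands in for actually pinging the oldest contact):

```python
from collections import deque

K = 20  # bucket size

def update_bucket(bucket: deque, peer: str, is_alive) -> None:
    """Kademlia-style bucket update: prefer old, live contacts over new ones.
    `bucket` is ordered from least- to most-recently seen."""
    if peer in bucket:
        bucket.remove(peer)
        bucket.append(peer)      # already known: move to most-recently-seen end
    elif len(bucket) < K:
        bucket.append(peer)      # room left: simply insert the new contact
    else:
        oldest = bucket[0]
        if is_alive(oldest):     # oldest contact still responds: keep it,
            return               # the new peer is discarded
        bucket.popleft()         # oldest is gone: evict and insert the new peer
        bucket.append(peer)
```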

@dennis-tra
Contributor

Hi @SionoiS,

I think it's a great idea to look deeper into the dynamics of how the DHT overlay organises itself!

When it comes to adding remote peers to one's routing table, the relevant logic is documented here. I could imagine that stable peers form more highly connected sub-graphs because, IIRC, Kademlia prioritises stable, long-running nodes. However, there is also a maxLatency check in the code, so latency is not completely irrelevant here.

The crawl data can certainly be used to correlate the uptime of peers with their presence in other peers' routing tables. However, when it comes to latency, the crawler doesn't have visibility into the latency between two arbitrary peers, so the crawl data won't give insights on that :/
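
As a rough sketch of that correlation, assuming the crawl data can be reduced to per-peer uptime and a count of how many other routing tables the peer shows up in (both field names are invented here, the real crawl schema will differ):

```python
from statistics import mean

def pearson(xs, ys):
    """Plain Pearson correlation coefficient, to avoid pulling in numpy/scipy."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# peers: list of dicts like {"uptime_hours": 12.5, "in_degree": 87}
def uptime_vs_in_degree(peers):
    return pearson([p["uptime_hours"] for p in peers],
                   [p["in_degree"] for p in peers])
```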

@SionoiS
Author

SionoiS commented Jun 1, 2022

Is it not possible to "map" the nodes using latency from known sources?

With 3 nodes pinging each other, one could start "triangulating" all the nodes and build a map of the network.

Not super reliable, as latency can change, but maybe good enough for long-running nodes?
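
Something like this toy multilateration, just to illustrate the idea (the landmark coordinates and RTT values are made up, and real RTTs are of course much noisier):

```python
# Toy multilateration: place a peer in a 2-D "latency space" given RTTs
# to landmarks at known coordinates. All numbers here are illustrative.
landmarks = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]   # three known vantage points
rtts = [70.0, 60.0, 80.0]                               # measured RTTs (ms) to each landmark

def estimate_position(landmarks, rtts, steps=5000, lr=0.01):
    x, y = 50.0, 50.0                                    # arbitrary starting guess
    for _ in range(steps):
        gx = gy = 0.0
        for (lx, ly), r in zip(landmarks, rtts):
            d = ((x - lx) ** 2 + (y - ly) ** 2) ** 0.5 or 1e-9
            err = d - r                                  # predicted vs. measured distance
            gx += err * (x - lx) / d                     # gradient of squared error
            gy += err * (y - ly) / d
        x -= lr * gx
        y -= lr * gy
    return x, y

print(estimate_position(landmarks, rtts))
```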

@guillaumemichel
Contributor

Concerning the DHT:

By design, we expect nodes to form small locality clusters. The routing table of a peer P is organized into k-buckets: bucket[i] contains peers that share a common prefix of length i with P. The buckets with the longest common prefix length (CPL), i.e. the ones expressing closeness, are not expected to be full, so the peers inside these buckets will never get flushed. These buckets will most likely be identical on all hosts sharing this long CPL, since the XOR operation is commutative (i.e. if at peer p0 bucket[CPL] = [p1, p2, p3], then it is probable that at peer p1 bucket[CPL] = [p0, p2, p3]). This implies that nodes will organize into small (<= 20 peers, the bucket size), disjoint clusters with other close nodes.
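
To make the symmetry point concrete, a tiny illustration with toy 8-bit IDs in place of real 256-bit peer IDs (the cpl() helper below exists only for this example):

```python
def cpl(a: int, b: int, bits: int = 8) -> int:
    """Common prefix length of two toy IDs: leading zero bits of a XOR b."""
    x = a ^ b
    return bits if x == 0 else bits - x.bit_length()

p0, p1 = 0b10110100, 0b10110111   # two peers sharing a 6-bit prefix
print(cpl(p0, p1), cpl(p1, p0))   # -> 6 6: XOR distance is symmetric

# A third peer sharing that long prefix with p0 also shares a long prefix with p1,
# so the deepest buckets of p0 and p1 tend to contain the same small set of peers.
p2 = 0b10110101
print(cpl(p0, p2), cpl(p1, p2))   # -> 7 6
```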

@yiannisbot
Member

Thanks everyone for the input!

This implies that nodes will organize into small (<= 20 peers, the bucket size), disjoint clusters with other close nodes.

I believe this holds for every k-bucket, so each node is part of several disjoint clusters. Do you agree @guillaumemichel ?

Trying to think what useful information we could get out of that, the following two come up:

  1. Approximate the dynamicity of the network by monitoring how frequently clusters change by 1, 2 or more peers. This is closely related to (and influenced by) the DHT Routing Table refreshing. (A rough sketch of such a churn measure follows below.)
  2. Develop a cluster coefficient, or some similar parameter, that correlates cluster participants with their geographic position in terms of network latency, or AS number. The longer the average latency within a cluster, the slower lookups will be.
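
For point 1, a first stab at quantifying cluster churn could be as simple as the Jaccard similarity of a cluster's membership between consecutive crawls. A minimal sketch (how clusters are extracted from the crawl data in the first place is left open here):

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two peer sets: 1.0 means unchanged, 0.0 means fully replaced."""
    return len(a & b) / len(a | b) if (a or b) else 1.0

def cluster_churn(snapshots):
    """`snapshots` is a list of peer-ID sets for the *same* cluster over time."""
    return [1.0 - jaccard(prev, cur) for prev, cur in zip(snapshots, snapshots[1:])]

# Example: a 20-peer cluster that swaps out one member between two crawls.
before = {f"peer{i}" for i in range(20)}
after = (before - {"peer3"}) | {"peer20"}
print(cluster_churn([before, after]))   # -> [~0.095]
```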

Any other ideas?

I think this is worth an RFM, but not sure what priority it will take on our list of items. @SionoiS if you're keen to work towards a solution on this using the crawler data, I'd definitely be interested to discuss.

@guillaumemichel
Contributor

guillaumemichel commented Jun 2, 2022

This implies that nodes will organize into small (<= 20 peers, the bucket size), disjoint clusters with other close nodes.

I believe this holds for every k-bucket, so each node is part of several disjoint clusters. Do you agree @guillaumemichel ?

No, I don't believe this would be the case. For buckets with a low CPL, each node has plenty of choices to fill its buckets, since there are many nodes that would fall into such a bucket. Thus, according to the bucket replacement policy, it is very unlikely that multiple nodes share the same low-CPL bucket (i.e. that the set [peers in this bucket + self] is identical for multiple nodes). If you have a node far away in your bucket[X], you won't necessarily be in its bucket[X].
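
To back that up with a toy experiment: with random 16-bit IDs and an arbitrary choice of which candidates fill each bucket (all parameters below are made up), mutual membership is rare in bucket[0] but near-certain in a deep bucket:

```python
import random
from collections import defaultdict

random.seed(0)
BITS, K = 16, 20
nodes = random.sample(range(2 ** BITS), 500)   # toy network of 500 random IDs

def cpl(a, b):
    x = a ^ b
    return BITS if x == 0 else BITS - x.bit_length()

def buckets(p):
    """Group all other nodes by their CPL with p and keep at most K per bucket.
    Which K are kept is arbitrary here, which is exactly where asymmetry creeps in."""
    by_cpl = defaultdict(list)
    for q in nodes:
        if q != p:
            by_cpl[cpl(p, q)].append(q)
    return {i: set(peers[:K]) for i, peers in by_cpl.items()}

tables = {p: buckets(p) for p in nodes}

def mutual_rate(i):
    """Fraction of bucket[i] entries that are reciprocated by the other peer."""
    hits = total = 0
    for p in nodes:
        for q in tables[p].get(i, ()):
            total += 1
            hits += p in tables[q].get(i, set())
    return hits / total if total else float("nan")

print("bucket 0 (low CPL): ", round(mutual_rate(0), 2))   # far below 1.0: rarely mutual
print("bucket 6 (high CPL):", round(mutual_rate(6), 2))   # close to 1.0: mostly mutual
```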

When it comes to adding remote peers to one's routing table, the relevant logic is documented here

However, it is probable that a peer shares many of its 20 closest peers with its own 20 closest peers, as they all have to pick from the few nodes that match the high CPL, thus forming a Common Prefix Cluster.

@dennis-tra
Contributor

Is it not possible to "map" the nodes using latency from known sources?

With 3 nodes pinging each other, one could start "triangulating" all the nodes and build a map of the network.

Not super reliable, as latency can change, but maybe good enough for long-running nodes?

@SionoiS Absolutely! :) We are just sitting on a bunch of data waiting to be analysed and I wanted to point out that more work is needed to include the latency somehow 👍


@guillaumemichel I also think that peers with high CPLs will likely be in each other's routing table and hence form a highly connected sub-graph - and as you said, this is by design.

This implies that nodes will organize into small (<= 20 peers, the bucket size), disjoint clusters with other close nodes.

I'm probably missing something here: why strictly "<= 20"? Let's say I had three consecutive high-CPL buckets filled with 12, 8, and 4 peers. Because the buckets are not full, it's likely that each peer in these buckets is also present in the routing tables of the other peers, hence forming a mini-cluster of more than 20 peers, right?

I think larger cluster topologies would be more interesting - something that's unrelated to the deliberate design choices and just emerges from other dynamics.


  1. Approximate the dynamicity of the network by monitoring how frequently clusters change by 1, 2 or more peers. This is closely related to (and influenced by) the DHT Routing Table refreshing.
  2. Develop a cluster coefficient, or some similar parameter, that correlates cluster participants with their geographic position in terms of network latency, or AS number. The longer the average latency within a cluster, the slower lookups will be.

@yiannisbot Sounds good, and I'll try to come up with more. For now, I think as a first step we would need a condition that defines when we consider a set of peers to be a cluster.
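
One candidate condition, just as a starting point rather than a settled definition: call a set of peers a cluster if every member appears in every other member's routing table within the same crawl, i.e. a maximal clique in the "mutual routing-table entry" graph. A sketch using networkx (an external dependency, and the routing_tables input shape is assumed):

```python
import networkx as nx

def mutual_graph(routing_tables):
    """`routing_tables` maps peer -> set of peers in its routing table (one crawl).
    An undirected edge means the two peers reference each other."""
    g = nx.Graph()
    g.add_nodes_from(routing_tables)
    for p, neighbours in routing_tables.items():
        for q in neighbours:
            if p in routing_tables.get(q, set()):
                g.add_edge(p, q)
    return g

def candidate_clusters(routing_tables, min_size=3):
    """Maximal cliques of mutually-referencing peers; min_size drops trivial pairs."""
    g = mutual_graph(routing_tables)
    return [set(c) for c in nx.find_cliques(g) if len(c) >= min_size]
```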

@SionoiS
Author

SionoiS commented Jun 2, 2022

I think this is worth an RFM, but not sure what priority it will take on our list of items. @SionoiS if you're keen to work towards a solution on this using the crawler data, I'd definitely be interested to discuss.

Sorry, I have already too much to do. I just wanted to share my random thought. I guess it sparked some more thoughts from you guys :)

@guillaumemichel
Contributor

@guillaumemichel I also think that peers with high CPLs will likely be in each other's routing table and hence form a highly connected sub-graph - and as you said, this is by design.

This implies that nodes will organize into small (<= 20 peers, the bucket size), disjoint clusters with other close nodes.

I'm probably missing something here: why strictly "<= 20"? Let's say I had three consecutive high-CPL buckets filled with 12, 8, and 4 peers. Because the buckets are not full, it's likely that each peer in these buckets is also present in the routing tables of the other peers, hence forming a mini-cluster of more than 20 peers, right?

@dennis-tra yes, you are right! I think the largest high-CPL bucket cluster / full-mesh graph we can expect is 40 peers (2x the bucket size). The existence of these small buckets could be a metric for DHT routing table health.

I think larger cluster topologies would be more interesting - something that's unrelated to the deliberate design choices and just emerges from other dynamics.

I agree! Maybe we could even identify better bucket replacement policies using this information.

Trying to think what useful information we could get out of that, the following two come up:

1. Approximate the dynamicity of the network by monitoring how frequently clusters change by 1, 2 or more peers. This is closely related to (and influenced by) the DHT Routing Table refreshing.

2. Develop a cluster coefficient, or some similar parameter, that correlates cluster participants with their geographic position in terms of network latency, or AS number. The longer the average latency within a cluster, the slower lookups will be.

Any other ideas?

@yiannisbot we could evaluate the effects of having clusters in low CPL buckets, in order to propose bucket replacement policy improvements.
