Network self-organisation #11
Comments
That's a great point! Using this to our benefit is probably a long shot, as we'd need to implement some sort of metadata exchange between the nodes. But it's worth finding out nevertheless. Nodes are organised based on XOR distance, but every node keeps several k-buckets, so it's not unreasonable to think that a node might stay connected to a particular set of peers across the hash space. @dennis-tra is it possible to derive this from the crawler logs? 🤔 The script would have to go through multiple crawls and cross-check connectivity between nodes over time/snapshots of the network.
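A rough sketch of what such a cross-check might look like, assuming the crawl snapshots have been parsed into simple neighbour sets (the `Snapshot` type and the sample data are hypothetical, not the actual crawler output format):

```go
package main

import "fmt"

// Snapshot maps a peer ID to the set of peers found in its routing table
// during one crawl. The real crawler output would need to be parsed into this.
type Snapshot map[string]map[string]bool

// edgePersistence returns, for every (a, b) pair seen at least once,
// the number of snapshots in which a listed b as a neighbour.
func edgePersistence(crawls []Snapshot) map[[2]string]int {
	counts := make(map[[2]string]int)
	for _, snap := range crawls {
		for a, neighbours := range snap {
			for b := range neighbours {
				counts[[2]string{a, b}]++
			}
		}
	}
	return counts
}

func main() {
	// Two toy snapshots standing in for consecutive crawls.
	crawls := []Snapshot{
		{"A": {"B": true, "C": true}, "B": {"A": true}},
		{"A": {"B": true}, "B": {"A": true, "C": true}},
	}
	for edge, n := range edgePersistence(crawls) {
		fmt.Printf("%s -> %s seen in %d of %d crawls\n", edge[0], edge[1], n, len(crawls))
	}
}
```

Pairs with a high count across many consecutive crawls would then be the candidates for "stable" connections worth correlating with node uptime.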
Ah! I was wrong, I thought that nodes connected to geographically close nodes. I need to read the specs again. Edit: from the paper
Hi @SionoiS, I think it's a great idea to look deeper into the dynamics of how the DHT overlay organises itself! When it comes to adding remote peers to one's routing table, the relevant logic is documented here. I could imagine that stable peers form more highly connected sub-graphs because, IIRC, Kademlia prioritises stable, long-running nodes. However, there is also a
The crawl data can certainly be used to correlate the uptime of peers with their presence in other peers' routing tables. However, when it comes to latency, the crawler has no vantage point on the latency between two peers, so the crawl data won't give insights on that :/
Is it not possible to "map" the nodes using latency from known sources? With 3 nodes pinging each other, one could start "triangulating" all the nodes and build a map of the network. Not super reliable as latency can change, but maybe good enough for long-running nodes?
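Here is a minimal sketch of that idea, assuming we can measure pairwise RTTs between a handful of long-running nodes: embed the nodes in a 2-D plane so that Euclidean distances roughly match the RTTs, similar in spirit to Vivaldi-style network coordinates. All node indices and RTT values below are made up for illustration.

```go
package main

import (
	"fmt"
	"math"
)

type point struct{ x, y float64 }

// embed runs a naive spring relaxation: for each measured RTT, nudge both
// endpoints so that their Euclidean distance moves towards the RTT.
func embed(rtt map[[2]int]float64, n, iters int, step float64) []point {
	pos := make([]point, n)
	for i := range pos { // spread the initial guesses so nodes don't overlap
		pos[i] = point{float64(i), float64(i % 2)}
	}
	for it := 0; it < iters; it++ {
		for pair, d := range rtt {
			a, b := pair[0], pair[1]
			dx, dy := pos[b].x-pos[a].x, pos[b].y-pos[a].y
			dist := math.Hypot(dx, dy)
			if dist == 0 {
				continue
			}
			// positive err => points are too far apart, pull them together
			err := (dist - d) / dist
			pos[a].x += step * err * dx
			pos[a].y += step * err * dy
			pos[b].x -= step * err * dx
			pos[b].y -= step * err * dy
		}
	}
	return pos
}

func main() {
	// Hypothetical RTTs (ms) between 4 nodes; keys are (i, j) with i < j.
	rtt := map[[2]int]float64{
		{0, 1}: 10, {0, 2}: 50, {0, 3}: 60,
		{1, 2}: 45, {1, 3}: 55, {2, 3}: 15,
	}
	for i, p := range embed(rtt, 4, 2000, 0.05) {
		fmt.Printf("node %d ~ (%.1f, %.1f)\n", i, p.x, p.y)
	}
}
```

As noted above, the resulting coordinates would only be as stable as the underlying latencies, so this would probably only be meaningful for long-running nodes with RTTs averaged over time.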
Concerning the DHT: By design, we expect nodes to form small locality clusters. The routing table of a peer contains its closest peers in the key space, and those close peers are likely to hold each other in their routing tables as well.
Thanks everyone for the input!
I believe this holds for every k-bucket, so each node is part of several disjoint clusters. Do you agree, @guillaumemichel? Trying to think about what useful information we could get out of that, the following two come up:
Any other ideas? I think this is worth an RFM, but I'm not sure what priority it will take on our list of items. @SionoiS if you're keen to work towards a solution on this using the crawler data, I'd definitely be interested in discussing it.
No, I don't believe this would be the case. For buckets with a low CPL, each node has plenty of choices to fill its buckets, since there are many nodes that would fall into such a bucket. Thus, according to the bucket replacement policy, it is very unlikely that multiple nodes share the same low-CPL bucket (i.e. that the set [peers in this bucket + self] is identical for multiple nodes). If you have a node far away in your routing table, it is therefore unlikely that this node also has you in its own routing table.
However, it is probable that a peer shares many of its 20 closest peers with each of those 20 closest peers, as they all have to pick from the few nodes that match the high CPL, thus forming a Common Prefix Cluster.
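For reference, a small sketch of the CPL notion this discussion relies on: the Kademlia distance between two IDs is their XOR, and the CPL (common prefix length) is the number of leading zero bits of that XOR. The sample IDs below are arbitrary.

```go
package main

import (
	"fmt"
	"math/bits"
)

// cpl returns the number of leading bits that two equal-length IDs share.
func cpl(a, b []byte) int {
	n := 0
	for i := range a {
		x := a[i] ^ b[i] // XOR distance, byte by byte
		if x == 0 {
			n += 8
			continue
		}
		return n + bits.LeadingZeros8(x)
	}
	return n
}

func main() {
	a := []byte{0b1011_0001, 0xFF}
	b := []byte{0b1011_1101, 0x00}
	fmt.Println(cpl(a, b)) // 4: the IDs agree on the first four bits
}
```

Peers with a high CPL to each other compete for the same few high-CPL bucket slots, which is why they tend to end up in each other's routing tables.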
@SionoiS Absolutely! :) We are just sitting on a bunch of data waiting to be analysed, and I wanted to point out that more work is needed to include the latency somehow 👍 @guillaumemichel I also think that peers with a high CPL to each other will likely be in each other's routing tables and hence form a highly connected sub-graph - and as you said, this is by design.
I'm probably missing something here, but why strictly "<= 20"? Let's say I had three consecutive high-CPL buckets filled with 12, 8, and 4 peers. Because the buckets are not full, it's likely that each peer in these buckets is also present in the routing tables of the other peers, hence forming a mini-cluster of more than 20 peers, right? I think more interesting would be larger cluster topologies - something that's unrelated to the deliberate design choices and just happens due to other dynamics.
@yiannisbot Sounds good, and I will try to come up with more. For now, I think as a first step we would need a condition that defines when we consider a set of peers to be a cluster.
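One possible starting point for such a condition, purely as an illustration: call a set of peers a cluster if every member has at least a fraction theta of the other members in its routing table. The `RoutingTables` type and the threshold are assumptions, not anything the crawler currently produces.

```go
package main

import "fmt"

// RoutingTables maps a peer ID to the set of peers in its routing table,
// e.g. as reconstructed from one crawl snapshot.
type RoutingTables map[string]map[string]bool

// isCluster reports whether every member of the candidate set holds at least
// a fraction theta of the other members in its routing table.
func isCluster(rt RoutingTables, members []string, theta float64) bool {
	for _, p := range members {
		linked := 0
		for _, q := range members {
			if q != p && rt[p][q] {
				linked++
			}
		}
		if float64(linked) < theta*float64(len(members)-1) {
			return false
		}
	}
	return true
}

func main() {
	rt := RoutingTables{
		"A": {"B": true, "C": true},
		"B": {"A": true, "C": true},
		"C": {"A": true, "B": true},
	}
	fmt.Println(isCluster(rt, []string{"A", "B", "C"}, 0.8)) // true
}
```

With something like this, one could sweep theta over the crawl data and check at which threshold the high-CPL cliques discussed above actually show up.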
Sorry, I already have too much to do. I just wanted to share my random thought. I guess it sparked some more thoughts from you guys :)
@dennis-tra yes, you are right! I think the largest high-CPL bucket cluster/full-mesh graph that we can expect is 40 peers (2x the bucket size). The existence of these small buckets could be a metric for DHT routing table health.
I agree! Maybe we could even identify better bucket replacement policies using this information.
@yiannisbot we could evaluate the effects of having clusters in low-CPL buckets, in order to propose bucket replacement policy improvements.
Hi,
I had a thought that IPFS nodes may self-organize into higher level clusters because of the way connections are formed and maintained.
More specifically, given the high level of node churn, do longer-running nodes tend towards connecting to each other?
I would think not, because of the way latency is prioritised, which results in nodes organising based on distance. Is that a good thing?
More generally, how do we measure self-organization, and could we not use this to our benefit too?