RFM Proposal: Number of Client nodes across various networks and implementations #45
I think this is a great initiative and would be super insightful!
I think the more common scenario would be that we might double-count peers across different data sources, as opposed to a single peer participating in multiple networks.
Thanks for creating this @yiannisbot. I'm pasting in some of the relevant info from FIL Slack, in case someone can't easily access it:
Concerning Number of Client vs Server Nodes in the DHT
I think addressing the cons is pretty important given themes of the last year that:
Our current network size KPIs aren't helping drive home the message of the diversity of the IPFS project.
Here is a mockup of what I'm thinking: https://docs.google.com/spreadsheets/d/1SHHPBZEsZvZ95skg8MgRNHoSaog6tJ-DpvuZePZZlJ4/edit#gid=0 Specifically, I think we need to rethink our metric collection from "network probes": if implementations don't identify themselves, they get bucketed as unknown/other. For example:
The graph above is for a single month. I could imagine showing that collection of bars grouped together for each month and then displaying multiple months along the x-axis.
I don't think this needs to be a priority currently. We can make it a caveat that nodes (peer IDs) will participate in multiple "networks" and that, as a result, it is not accurate to say "the total number of unique IPFS peers is the sum of all the bars". For example, I think it's fine for a Kubo peer ID to count towards "Banana DHT server", "Banana DHT client", and "cid.contact IPNI". I do think we should deduplicate peer IDs within a given "network". For example, a Kubo node that participates as a "Banana DHT client" every day for a month should only increase the count for that month by 1 (not 30).
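To make the within-network deduplication rule concrete, here is a minimal sketch, assuming each probe yields one observation per (peer ID, network, day). The `Observation` type and `monthly_unique_counts` function are hypothetical names for illustration, not part of any existing ProbeLab tool. Counts are distinct peer IDs per (network, month), so a peer seen on 30 days adds 1, and the same peer may still appear under several networks.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Observation:
    """Hypothetical record: one sighting of a peer in a network on a day."""
    peer_id: str
    network: str  # e.g. "Banana DHT client", "cid.contact IPNI"
    day: date


def monthly_unique_counts(observations):
    """Count distinct peer IDs per (network, (year, month)).

    Deduplication happens only within a network: the same peer ID can
    be counted once in each network it was observed in.
    """
    seen = defaultdict(set)
    for obs in observations:
        month = (obs.day.year, obs.day.month)
        seen[(obs.network, month)].add(obs.peer_id)
    return {key: len(peers) for key, peers in seen.items()}


if __name__ == "__main__":
    # A hypothetical Kubo peer seen as a DHT client every day in June 2023.
    june = [
        Observation("QmExamplePeer", "Banana DHT client", date(2023, 6, 1) + timedelta(days=i))
        for i in range(30)
    ]
    print(monthly_unique_counts(june))
```

Because each bar is a per-network set size, summing the bars can exceed the number of unique peers, which is exactly the caveat above.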
We are currently capturing the number of clients observed in the IPFS public DHT network and we report this as part of our weekly reports (currently in this repo; see the example for Week 17) as well as at probelab.io: https://probelab.io/ipfsdht/#client-vs-server-node-estimate.
As per this discussion thread in Slack, this is great, but it only captures part of the story: it focuses on the public IPFS DHT only, which in turn means that it mostly focuses on Kubo. However, IPFS is more than the Kubo implementation and more than the public IPFS DHT. A request from @BigLep is to be able to "show the number of peer ids observed across various 'networks' and break out by implementation".
To do this, we'd need to identify data sources (i.e., ways to collect the data) for different i) IPFS implementations (e.g., Kubo, Helia, Iroh), and ii) networks that run IPFS nodes (e.g., the IPFS DHT, the Lotus DHT, cid.contact/IPNI). Ideally, we should also deduplicate PeerIDs to avoid double-counting a peer that participates in more than one network (?).
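One way the "break out by implementation" view could be assembled from several data sources is sketched below. It assumes each source emits (source, network, peer ID, agent version or `None`) records; the record shape, the `KNOWN_PREFIXES` mapping, and all function names are hypothetical, and real agent-version strings would need to be checked per source. Each peer's implementation is resolved once across all sources, so a peer that identifies itself in one source and is anonymous in another is not split into an extra unknown/other bucket.

```python
from collections import defaultdict

UNKNOWN = "unknown/other"

# Hypothetical agent-version prefixes; verify against real data before use.
KNOWN_PREFIXES = {"kubo": "Kubo", "go-ipfs": "Kubo", "helia": "Helia", "iroh": "Iroh"}


def implementation_of(agent_version):
    """Map an agent-version string to an implementation name, else UNKNOWN."""
    if not agent_version:
        return UNKNOWN
    prefix = agent_version.split("/", 1)[0].lower()
    return KNOWN_PREFIXES.get(prefix, UNKNOWN)


def resolve_implementations(records):
    """Pick one implementation per peer ID, preferring any known identification."""
    impl = {}
    for _source, _network, peer_id, agent in records:
        if impl.get(peer_id, UNKNOWN) == UNKNOWN:
            impl[peer_id] = implementation_of(agent)
    return impl


def peers_by_network_and_impl(records):
    """Distinct peer IDs per (network, implementation), merged across sources."""
    records = list(records)
    impl = resolve_implementations(records)
    buckets = defaultdict(set)
    for _source, network, peer_id, _agent in records:
        buckets[(network, impl[peer_id])].add(peer_id)
    return {key: len(peers) for key, peers in buckets.items()}


def unique_peers_overall(records):
    """Number of distinct peer IDs across all sources and networks."""
    return len({peer_id for _source, _network, peer_id, _agent in records})
```

Reporting `unique_peers_overall` next to the per-network bars would show how much overlap there is between networks, rather than implying the bars add up to the total.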
I'm starting this issue to capture first what we want to target and then come up with data collection ideas (e.g., through measurement tools, logs etc.).
cc: @BigLep @dennis-tra