Example: Measure subnet diversity #42

Draft: wants to merge 1 commit into main

SeverinAlexB (Contributor) commented Nov 26, 2024

This script looks up random IDs and counts how often the nodes found share the same IPv4 /8 subnet (the first byte of the IP address).
Here are the per-lookup averages after 160 lookups:

  • 1 IP per subnet: 14.55 nodes
  • 2 IPs per subnet: 4.11 nodes
  • 3 IPs per subnet: 1.06 nodes
  • 4 IPs per subnet: 0.24 nodes

On average, when looking up k=20 nodes (November 2024), you can expect:

  • 14.55 subnets containing a single node
  • 2.05 subnets containing 2 nodes (= 4.11 nodes)
  • 0.35 subnets containing 3 nodes (= 1.06 nodes)
  • 0.06 subnets containing 4 nodes (= 0.24 nodes)
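
A minimal sketch of the metric described above, assuming each lookup yields a plain list of IPv4 address strings. `subnet_histogram` and `average_histogram` are hypothetical helper names, not functions from the script in this PR. For each lookup, the nodes are grouped by their /8 prefix (first octet), each node is counted under the size of its group, and the counts are averaged over all lookups.

```python
from collections import Counter, defaultdict


def subnet_histogram(ips):
    """Map group size -> number of nodes whose /8 subnet holds that many nodes."""
    # The /8 subnet of an IPv4 address is its first octet.
    per_subnet = Counter(ip.split(".")[0] for ip in ips)
    histogram = Counter()
    for size in per_subnet.values():
        # Every node in a subnet of this size adds one to that bucket.
        histogram[size] += size
    return dict(histogram)


def average_histogram(lookups):
    """Average subnet_histogram over several lookups (one list of IPs each)."""
    if not lookups:
        return {}
    totals = defaultdict(float)
    for ips in lookups:
        for size, nodes in subnet_histogram(ips).items():
            totals[size] += nodes
    return {size: totals[size] / len(lookups) for size in sorted(totals)}


if __name__ == "__main__":
    lookup = ["95.1.2.3", "95.4.5.6", "185.0.0.1", "46.7.8.9"]
    print(subnet_histogram(lookup))  # {2: 2, 1: 2}
```

Dividing each averaged node count by its group size gives the subnet counts in the second list (e.g. 4.11 / 2 ≈ 2.05).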

Related to #41

Nuhvi (Collaborator) commented Nov 26, 2024

This is nice, but it is not as useful as an example. I had some code to calculate this on the fly for each query; the results can be averaged and stored to help decide where to store the data... We should start calling it sk, for "secure k", or something.

I didn't open a PR with that code because it is not a priority. But these numbers match what I observed as well.

Nuhvi (Collaborator) commented Nov 26, 2024

Here is the commit 73efb7b

SeverinAlexB (Contributor, Author) commented

I'm not saying we need to merge this, but isn't calling it useless quite a stretch? How would you build countermeasures if you don't know the base subnet distribution?

Nuhvi (Collaborator) commented Nov 26, 2024

"Not as useful", not useless. It is useful to compare the results of a thorough analysis to a quick, iterative one. Just like the size estimation, the base subnet distribution is going to be calculated from the average of previous queries.

Remember, the "base subnet distribution" is dynamic, like the size estimation, so we shouldn't just hardcode it after an expensive offline check. Instead, it should be something the client keeps track of from previous queries. If it doesn't have any previous queries, it just defaults to storing at all responding nodes, since that is the conservative choice.
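
One possible reading of this approach, sketched under assumptions: the client keeps a running average of the fraction of responders that share a /8 subnet with another responder, and falls back to storing at all responding nodes when it has no history yet or when the current query deviates too far from the average. `shared_fraction`, `SubnetBaseline` and the `tolerance` parameter are hypothetical illustrations, not the code from commit 73efb7b.

```python
from collections import Counter


def shared_fraction(ips):
    """Fraction of nodes whose /8 subnet also holds at least one other node."""
    if not ips:
        return 0.0
    per_subnet = Counter(ip.split(".")[0] for ip in ips)
    shared = sum(count for count in per_subnet.values() if count > 1)
    return shared / len(ips)


class SubnetBaseline:
    """Running average of shared_fraction over previous queries (hypothetical)."""

    def __init__(self, tolerance=0.2):
        # tolerance: assumed slack above the average before a query looks suspicious.
        self.tolerance = tolerance
        self.total = 0.0
        self.queries = 0

    def record(self, ips):
        """Add one query's responders to the running average."""
        self.total += shared_fraction(ips)
        self.queries += 1

    def average(self):
        """Average shared fraction so far, or None without any history."""
        return self.total / self.queries if self.queries else None

    def should_store_at_all(self, ips):
        """True when the conservative fallback (store at every responder) applies."""
        avg = self.average()
        if avg is None:
            return True  # no previous queries: default to the conservative choice
        return shared_fraction(ips) > avg + self.tolerance
```

The single scalar keeps the sketch short; tracking the whole per-group-size histogram would follow the same pattern.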

SeverinAlexB (Contributor, Author) commented

Given that the BEP_0042 hash function provides a uniform distribution of subnets, outliers can be detected without prior knowledge. It is very similar to the birthday problem. At the same time, you can statistically measure how close the observed subnets are to a uniform distribution.
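
The birthday-problem framing can be made concrete. This is a minimal sketch under an idealized assumption: each of the k returned nodes lands in one of n /8 subnets independently and with equal probability p = 1/n. The expected number of subnets holding exactly s nodes is then n · C(k, s) · p^s · (1 - p)^(k - s). `expected_nodes_by_group_size` is a hypothetical helper. With k=20 and n=256 it predicts about 18.6 nodes in unique subnets, more than the 14.55 measured in this PR, which hints that the real /8 distribution is not uniform.

```python
from math import comb


def expected_nodes_by_group_size(k, n_subnets):
    """Expected number of nodes in subnets holding exactly s of the k nodes,
    assuming every node's /8 subnet is drawn independently and uniformly."""
    p = 1.0 / n_subnets
    result = {}
    for s in range(1, k + 1):
        # Expected number of subnets with exactly s nodes: n * C(k, s) * p^s * (1 - p)^(k - s).
        subnets = n_subnets * comb(k, s) * p**s * (1 - p) ** (k - s)
        result[s] = s * subnets  # convert the subnet count into a node count
    return result


if __name__ == "__main__":
    expected = expected_nodes_by_group_size(k=20, n_subnets=256)
    for s in range(1, 5):
        print(f"{s} IP(s) per subnet: {expected[s]:.2f} nodes")
```

The node counts always sum to k, which makes a handy sanity check against the measured averages.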

Did I learn all this math at uni 10 years ago? Yes. Can I still do it? I need to figure that out.

Assuming a priori that you can only do this with previous data is wrong.

Nuhvi (Collaborator) commented Nov 26, 2024

I don't think you can predict this; you can only observe it. If all DHT nodes started to churn and only one subnet was still standing, would you just say "this is not what I expect, so I won't use these nodes at all"? I know this is an extreme example, but my point is that the only assumption you can make in real time is that most previous queries saw honest nodes, so you compare the current query to the previous distribution of nodes over subnets. It is the same as comparing the current distance distribution to the average of previous distributions (summarized by the DHT size estimate).

SeverinAlexB (Contributor, Author) commented

If you want it to work on a testnet too, for example, then you are correct.

SeverinAlexB (Contributor, Author) commented

Not a bad distribution, apart from some unused blocks and a few hot spots. Sample size: 13,400 nodes.
Currently, the biggest subnet (95.0.0.0/8) contains 3.36% of all nodes.

If you want an easy, mainnet-only, dirty rule: if any subnet holds more than 20% of the nodes in a bucket (4 nodes for k=20), kick some of them. 20% is very generous; you could easily use 10% (2 nodes for k=20).
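
A sketch of the mainnet-only rule above, assuming a bucket is given as a list of IPv4 address strings. `overrepresented_subnets` is a hypothetical helper; it only flags the subnets over the limit and leaves the decision of which nodes (if any) to evict to the caller.

```python
from collections import Counter


def overrepresented_subnets(bucket_ips, max_share=0.2):
    """Return {first_octet: node_count} for /8 subnets above max_share of the bucket.

    With k = 20, max_share=0.2 allows up to 4 nodes per subnet and 0.1 allows up to 2.
    """
    limit = max_share * len(bucket_ips)
    per_subnet = Counter(ip.split(".")[0] for ip in bucket_ips)
    return {subnet: count for subnet, count in per_subnet.items() if count > limit}
```

For example, a 20-node bucket with 5 nodes in 95.0.0.0/8 is flagged at 20%, while 4 nodes only trip the stricter 10% setting.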

If you want to go with your fully flexible approach, feel free.

  1.0.0.0/8 probability: 0.56%
  2.0.0.0/8 probability: 0.75%
  3.0.0.0/8 probability: 0.00%
  4.0.0.0/8 probability: 0.00%
  5.0.0.0/8 probability: 2.35%
  8.0.0.0/8 probability: 0.00%
  9.0.0.0/8 probability: 0.00%
 12.0.0.0/8 probability: 0.00%
 13.0.0.0/8 probability: 0.00%
 14.0.0.0/8 probability: 1.16%
 15.0.0.0/8 probability: 0.04%
 16.0.0.0/8 probability: 0.00%
 17.0.0.0/8 probability: 0.00%
 18.0.0.0/8 probability: 0.00%
 19.0.0.0/8 probability: 0.00%
 20.0.0.0/8 probability: 0.04%
 23.0.0.0/8 probability: 0.11%
 24.0.0.0/8 probability: 0.90%
 25.0.0.0/8 probability: 0.00%
 27.0.0.0/8 probability: 0.34%
 31.0.0.0/8 probability: 1.19%
 32.0.0.0/8 probability: 0.04%
 34.0.0.0/8 probability: 0.00%
 35.0.0.0/8 probability: 0.11%
 36.0.0.0/8 probability: 0.19%
 37.0.0.0/8 probability: 1.68%
 38.0.0.0/8 probability: 0.15%
 39.0.0.0/8 probability: 0.41%
 40.0.0.0/8 probability: 0.00%
 41.0.0.0/8 probability: 0.37%
 42.0.0.0/8 probability: 0.22%
 43.0.0.0/8 probability: 0.11%
 44.0.0.0/8 probability: 0.00%
 45.0.0.0/8 probability: 1.98%
 46.0.0.0/8 probability: 2.91%
 47.0.0.0/8 probability: 0.52%
 48.0.0.0/8 probability: 0.00%
 49.0.0.0/8 probability: 1.23%
 50.0.0.0/8 probability: 0.15%
 51.0.0.0/8 probability: 0.41%
 52.0.0.0/8 probability: 0.04%
 53.0.0.0/8 probability: 0.00%
 54.0.0.0/8 probability: 0.07%
 57.0.0.0/8 probability: 0.00%
 58.0.0.0/8 probability: 0.56%
 59.0.0.0/8 probability: 0.63%
 60.0.0.0/8 probability: 0.26%
 61.0.0.0/8 probability: 0.56%
 62.0.0.0/8 probability: 0.93%
 63.0.0.0/8 probability: 0.00%
 64.0.0.0/8 probability: 0.19%
 65.0.0.0/8 probability: 0.19%
 66.0.0.0/8 probability: 0.49%
 67.0.0.0/8 probability: 0.37%
 68.0.0.0/8 probability: 0.37%
 69.0.0.0/8 probability: 0.11%
 70.0.0.0/8 probability: 0.41%
 71.0.0.0/8 probability: 0.60%
 72.0.0.0/8 probability: 0.82%
 73.0.0.0/8 probability: 0.45%
 74.0.0.0/8 probability: 0.41%
 75.0.0.0/8 probability: 0.11%
 76.0.0.0/8 probability: 0.49%
 77.0.0.0/8 probability: 1.16%
 78.0.0.0/8 probability: 1.16%
 79.0.0.0/8 probability: 1.01%
 80.0.0.0/8 probability: 0.86%
 81.0.0.0/8 probability: 0.78%
 82.0.0.0/8 probability: 1.01%
 83.0.0.0/8 probability: 0.75%
 84.0.0.0/8 probability: 0.71%
 85.0.0.0/8 probability: 1.49%
 86.0.0.0/8 probability: 0.97%
 87.0.0.0/8 probability: 0.71%
 88.0.0.0/8 probability: 0.60%
 89.0.0.0/8 probability: 1.38%
 90.0.0.0/8 probability: 0.75%
 91.0.0.0/8 probability: 1.23%
 92.0.0.0/8 probability: 0.97%
 93.0.0.0/8 probability: 0.71%
 94.0.0.0/8 probability: 1.57%
 95.0.0.0/8 probability: 3.36%
 96.0.0.0/8 probability: 0.19%
 97.0.0.0/8 probability: 0.04%
 98.0.0.0/8 probability: 0.30%
 99.0.0.0/8 probability: 0.49%
100.0.0.0/8 probability: 0.07%
101.0.0.0/8 probability: 0.34%
102.0.0.0/8 probability: 0.49%
103.0.0.0/8 probability: 0.93%
104.0.0.0/8 probability: 0.11%
105.0.0.0/8 probability: 0.49%
106.0.0.0/8 probability: 0.90%
107.0.0.0/8 probability: 0.26%
108.0.0.0/8 probability: 0.45%
109.0.0.0/8 probability: 1.31%
110.0.0.0/8 probability: 0.45%
111.0.0.0/8 probability: 0.19%
112.0.0.0/8 probability: 0.97%
113.0.0.0/8 probability: 0.22%
114.0.0.0/8 probability: 0.49%
115.0.0.0/8 probability: 0.37%
116.0.0.0/8 probability: 0.26%
117.0.0.0/8 probability: 0.26%
118.0.0.0/8 probability: 0.56%
119.0.0.0/8 probability: 0.52%
120.0.0.0/8 probability: 0.52%
121.0.0.0/8 probability: 0.71%
122.0.0.0/8 probability: 0.45%
123.0.0.0/8 probability: 0.26%
124.0.0.0/8 probability: 0.41%
125.0.0.0/8 probability: 0.52%
126.0.0.0/8 probability: 0.11%
127.0.0.0/8 probability: 0.00%
128.0.0.0/8 probability: 0.00%
129.0.0.0/8 probability: 0.11%
130.0.0.0/8 probability: 0.34%
131.0.0.0/8 probability: 0.22%
132.0.0.0/8 probability: 0.00%
133.0.0.0/8 probability: 0.04%
134.0.0.0/8 probability: 0.07%
135.0.0.0/8 probability: 0.00%
136.0.0.0/8 probability: 0.60%
137.0.0.0/8 probability: 0.00%
138.0.0.0/8 probability: 0.26%
139.0.0.0/8 probability: 0.07%
140.0.0.0/8 probability: 0.07%
141.0.0.0/8 probability: 0.19%
142.0.0.0/8 probability: 0.37%
143.0.0.0/8 probability: 0.19%
144.0.0.0/8 probability: 0.00%
145.0.0.0/8 probability: 0.19%
146.0.0.0/8 probability: 0.30%
147.0.0.0/8 probability: 0.04%
148.0.0.0/8 probability: 0.07%
149.0.0.0/8 probability: 0.56%
150.0.0.0/8 probability: 0.04%
151.0.0.0/8 probability: 0.34%
152.0.0.0/8 probability: 1.01%
153.0.0.0/8 probability: 0.00%
154.0.0.0/8 probability: 0.41%
155.0.0.0/8 probability: 0.07%
156.0.0.0/8 probability: 0.11%
157.0.0.0/8 probability: 0.34%
158.0.0.0/8 probability: 0.19%
159.0.0.0/8 probability: 0.15%
160.0.0.0/8 probability: 0.11%
161.0.0.0/8 probability: 0.15%
162.0.0.0/8 probability: 0.19%
163.0.0.0/8 probability: 0.15%
164.0.0.0/8 probability: 0.04%
165.0.0.0/8 probability: 0.26%
166.0.0.0/8 probability: 0.04%
167.0.0.0/8 probability: 0.15%
168.0.0.0/8 probability: 0.19%
169.0.0.0/8 probability: 0.45%
170.0.0.0/8 probability: 0.19%
171.0.0.0/8 probability: 0.41%
172.0.0.0/8 probability: 0.30%
173.0.0.0/8 probability: 0.52%
174.0.0.0/8 probability: 0.22%
175.0.0.0/8 probability: 0.82%
176.0.0.0/8 probability: 2.20%
177.0.0.0/8 probability: 0.93%
178.0.0.0/8 probability: 2.61%
179.0.0.0/8 probability: 0.37%
180.0.0.0/8 probability: 0.67%
181.0.0.0/8 probability: 0.67%
182.0.0.0/8 probability: 0.37%
183.0.0.0/8 probability: 0.45%
184.0.0.0/8 probability: 0.22%
185.0.0.0/8 probability: 2.95%
186.0.0.0/8 probability: 1.01%
187.0.0.0/8 probability: 0.78%
188.0.0.0/8 probability: 2.28%
189.0.0.0/8 probability: 0.78%
190.0.0.0/8 probability: 0.37%
191.0.0.0/8 probability: 0.37%
192.0.0.0/8 probability: 0.22%
193.0.0.0/8 probability: 1.08%
194.0.0.0/8 probability: 0.30%
195.0.0.0/8 probability: 0.49%
196.0.0.0/8 probability: 0.34%
197.0.0.0/8 probability: 0.49%
198.0.0.0/8 probability: 0.45%
199.0.0.0/8 probability: 0.00%
200.0.0.0/8 probability: 0.26%
201.0.0.0/8 probability: 0.26%
202.0.0.0/8 probability: 0.11%
203.0.0.0/8 probability: 0.22%
204.0.0.0/8 probability: 0.07%
205.0.0.0/8 probability: 0.07%
206.0.0.0/8 probability: 0.15%
207.0.0.0/8 probability: 0.19%
208.0.0.0/8 probability: 0.07%
209.0.0.0/8 probability: 0.07%
210.0.0.0/8 probability: 0.49%
211.0.0.0/8 probability: 0.86%
212.0.0.0/8 probability: 1.23%
213.0.0.0/8 probability: 0.63%
216.0.0.0/8 probability: 0.30%
217.0.0.0/8 probability: 0.52%
218.0.0.0/8 probability: 0.22%
219.0.0.0/8 probability: 0.15%
220.0.0.0/8 probability: 0.67%
221.0.0.0/8 probability: 0.34%
222.0.0.0/8 probability: 0.63%
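
To connect the measured table with the per-lookup averages at the top of this PR, the probabilities can be plugged into a birthday-problem style calculation with a separate probability per subnet. A sketch, assuming the table is pasted as plain text in exactly the format shown; `parse_distribution` and `expected_nodes_from_distribution` are hypothetical names, and since the sampled probabilities are rounded to two decimals the prediction is only approximate.

```python
import re
from math import comb

# Matches lines such as " 95.0.0.0/8 probability: 3.36%".
LINE = re.compile(r"^\s*(\d+)\.0\.0\.0/8 probability: ([\d.]+)%\s*$")


def parse_distribution(text):
    """Parse the table into {first_octet: probability as a fraction}."""
    probs = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            probs[int(match.group(1))] = float(match.group(2)) / 100.0
    return probs


def expected_nodes_from_distribution(probs, k=20):
    """Expected nodes per group size when k nodes are drawn independently from probs."""
    result = {}
    for s in range(1, k + 1):
        # Sum over subnets of P(exactly s of the k nodes land there), times s nodes.
        result[s] = s * sum(comb(k, s) * p**s * (1 - p) ** (k - s) for p in probs.values())
    return result
```

With `probs = parse_distribution(table_text)`, `max(probs, key=probs.get)` returns the biggest subnet (95 in the table above), and `expected_nodes_from_distribution(probs)[1]` can be compared with the 14.55 measured per lookup.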

Nuhvi (Collaborator) commented Nov 26, 2024

Kicking nodes is never advisable. The measurement should always be "how many of the responding nodes should we store data at, so that they have a distribution similar to the average seen so far". That is what we already do with distances, and the worst case is that you store at all nodes, which is not bad at all.
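
A sketch of one possible reading of this idea (the "sk", or "secure k", mentioned earlier), under explicit assumptions: responders are sorted by XOR distance to the target, closest first, and the baseline is the average number of distinct /8 subnets among k honest nodes. From the averages at the top of this PR that would be roughly 14.55 + 2.05 + 0.35 + 0.06 ≈ 17 distinct subnets for k=20. `secure_k` and `target_distinct` are hypothetical names.

```python
def secure_k(responder_ips, target_distinct):
    """Number of closest responders to store at (hypothetical 'secure k').

    responder_ips must be sorted by XOR distance to the target, closest first.
    Returns the smallest prefix length whose nodes cover at least target_distinct
    /8 subnets, falling back to all responders if that is never reached.
    """
    seen = set()
    for count, ip in enumerate(responder_ips, start=1):
        seen.add(ip.split(".")[0])  # first octet = /8 subnet
        if len(seen) >= target_distinct:
            return count
    return len(responder_ips)  # worst case: store at all responders
```

The fallback mirrors the point above: when the responders never look as diverse as the baseline, storing at all of them is the safe default.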

Labels: None yet
Projects: None yet
2 participants