Example: Measure subnet diversity #42
base: main
Conversation
This is nice, but it is not as useful as an example. I had some code to calculate this on the fly for each query; the result can be averaged and stored to help decide where to store the data. I didn't open a PR with that code because it is not a priority, but these numbers match what I observed as well.
Here is the commit 73efb7b
Not saying we need to merge this, but calling it useless is quite a stretch? How would you build countermeasures if you don't know the base subnet distribution?
"not as useful" not useless, it is useful to compare the results from thorough analysis to a quick and iterative one. Just like the size estimation, the base subnet distribution is going to be calculated from the average of previous queries. Remember, the "base subnet distribution" is a dynamic thing like the size estimation, so we shouldn't just hardcode it after an expensive offline check, instead it should be something the client keeps track of from previous queries. If it doesn't have any previous queries, then it just defaults to storing to all responding nodes since that is the conservative choice. |
Given that the BEP_0042 hash function provides a uniform distribution of subnets, outliers can be detected without prior knowledge; it is very similar to the birthday problem. At the same time, you can statistically measure how close the given subnets are to a uniform distribution. Did I have all this math in uni 10 years ago? Yes. Can I still do it? Need to figure this out. A priori assuming that you can only do this with previous data is wrong.
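One way to quantify "how close to uniform", shown only as an illustration of the idea above (a sketch using Pearson's chi-squared statistic, not code from this repo):

```rust
/// Chi-squared statistic of observed /8-subnet counts against a uniform
/// distribution over the 256 possible first octets. Larger values mean
/// the observed subnets are further from uniform.
fn chi_squared_uniform(counts: &[u64; 256]) -> f64 {
    let total: u64 = counts.iter().sum();
    if total == 0 {
        return 0.0;
    }
    let expected = total as f64 / 256.0; // uniform expectation per subnet
    counts
        .iter()
        .map(|&observed| {
            let diff = observed as f64 - expected;
            diff * diff / expected
        })
        .sum()
}
```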
I don't think you can predict this, you can only observe it. If all DHT nodes started to churn and only one subnet was still standing, are you just going to say "no, this is not what I expect, so I won't use these nodes at all"? I know this is an extreme example, but my point is that the only assumption you can make in real time is that most previous queries saw honest nodes, so you are comparing the current query to the previous distribution of nodes over subnets. Same as comparing the current distance distribution to the average of previous distributions (summarized by the DHT size estimate).
If you want it to work for a testnet too, for example, then you are correct.
Not a bad distribution, except for some unused blocks and some hot spots. 13,400 sample nodes. If you want an easy, mainnet-only dirty rule: if any subnet is more than 20% of all nodes in a bucket (4 in the case of k=20), kick some of them. 20% is very generous; you could easily do 10% (2 in the case of k=20). If you wanna do your full flexible approach, feel free.
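For concreteness, that dirty rule could be sketched roughly as below; this is an illustration only, and `over_represented` is a hypothetical name, not code from this PR:

```rust
use std::collections::HashMap;
use std::net::Ipv4Addr;

/// Return the /8 subnets whose share of the bucket exceeds `threshold`
/// (e.g. 0.20 allows at most 4 nodes per subnet when k = 20).
fn over_represented(bucket: &[Ipv4Addr], threshold: f64) -> Vec<u8> {
    let mut counts: HashMap<u8, usize> = HashMap::new();
    for addr in bucket {
        *counts.entry(addr.octets()[0]).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .filter(|&(_, n)| (n as f64) > threshold * bucket.len() as f64)
        .map(|(subnet, _)| subnet)
        .collect()
}
```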
Kicking nodes is never advisable; the measurement should always be "how many of the responding nodes should we store data at, so that they have a similar distribution to the average seen so far"... that's what we already do with distances, and the worst case is that you store at all nodes, which is not bad at all.
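A sketch of that selection idea, assuming an averaged per-subnet share is already being tracked from previous queries; `pick_storers` is a hypothetical helper, not part of the crate:

```rust
use std::collections::HashMap;

/// Pick which responders (by index) to store at, capping each /8 subnet at
/// roughly its historical share. With no history, store at all responders.
fn pick_storers(first_octets: &[u8], average_share: &HashMap<u8, f64>) -> Vec<usize> {
    if average_share.is_empty() {
        // Conservative default: no previous queries, store at everyone.
        return (0..first_octets.len()).collect();
    }
    let mut used: HashMap<u8, usize> = HashMap::new();
    let mut picked = Vec::new();
    for (i, &octet) in first_octets.iter().enumerate() {
        let share = average_share.get(&octet).copied().unwrap_or(0.0);
        // Allow each subnet roughly its historical share of this response
        // set, with a floor of one node so small subnets are never rejected.
        let cap = ((share * first_octets.len() as f64).ceil() as usize).max(1);
        let seen = used.entry(octet).or_insert(0);
        if *seen < cap {
            *seen += 1;
            picked.push(i);
        }
    }
    picked
}
```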
This script looks up random IDs and counts the number of times the found nodes share the same IP /8 subnet (the first byte of the IP).
Here is an example after 160 lookups:
On average, looking up k=20 nodes (November 2024), you can expect
Related to #41
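The counting step described at the top of this description could look roughly like the sketch below; this is one possible interpretation for readers of the thread, not the actual script in this PR:

```rust
use std::collections::HashMap;
use std::net::Ipv4Addr;

/// For one lookup, count how many of the found nodes fall into a /8 subnet
/// that another found node already occupies.
fn shared_subnet_count(found: &[Ipv4Addr]) -> usize {
    let mut per_subnet: HashMap<u8, usize> = HashMap::new();
    for addr in found {
        *per_subnet.entry(addr.octets()[0]).or_insert(0) += 1;
    }
    // Every node beyond the first in a /8 subnet counts as a shared hit.
    per_subnet.values().map(|&n| n - 1).sum()
}
```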