Anomaly detection and alerting for interesting Bitcoin P2P metrics #13
Comments
Automatically detecting spy nodes is a possibility too. Here, the anomaly is that they only listen to data from us but never send us new transactions (or blocks). This is a bit more involved, as it requires us to keep track of the state of what we send to a peer and what they send us.
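A minimal Python sketch of the per-peer bookkeeping this would need. The class and method names (SpyNodeTracker, on_sent, on_received, suspects) are hypothetical, and the thresholds (minimum connection age, minimum number of items sent) are assumptions, not values from this discussion. A peer is flagged when it has been connected long enough, has received a fair number of transactions or blocks from us, and has never been the first to announce a new one to us.

```python
from dataclasses import dataclass


@dataclass
class PeerStats:
    connected_at: float     # UNIX time the peer connected
    sent_to_peer: int = 0   # txs/blocks we announced to this peer
    new_from_peer: int = 0  # txs/blocks this peer announced to us before anyone else


class SpyNodeTracker:
    """Hypothetical per-peer bookkeeping for spotting listen-only peers.

    The thresholds are assumptions: a peer must be connected for at least
    `min_age_secs` and have received at least `min_sent` items from us
    before it can be flagged.
    """

    def __init__(self, min_age_secs=3600.0, min_sent=100):
        self.min_age_secs = min_age_secs
        self.min_sent = min_sent
        self.peers = {}     # peer id -> PeerStats
        self.known = set()  # tx/block hashes we have already seen from anyone

    def connect(self, peer, now):
        self.peers[peer] = PeerStats(connected_at=now)

    def on_sent(self, peer, item_hash):
        # We relayed an item to this peer, so we already know about it.
        self.known.add(item_hash)
        self.peers[peer].sent_to_peer += 1

    def on_received(self, peer, item_hash):
        # Only count items that are new to us, i.e. this peer told us first.
        if item_hash not in self.known:
            self.known.add(item_hash)
            self.peers[peer].new_from_peer += 1

    def suspects(self, now):
        # Old enough, fed enough by us, and never contributed anything new.
        return sorted(
            peer
            for peer, s in self.peers.items()
            if now - s.connected_at >= self.min_age_secs
            and s.sent_to_peer >= self.min_sent
            and s.new_from_peer == 0
        )


tracker = SpyNodeTracker(min_age_secs=60, min_sent=3)
tracker.connect("honest", now=0)
tracker.connect("spy", now=0)
for i in range(3):
    tracker.on_sent("honest", f"tx{i}")
    tracker.on_sent("spy", f"tx{i}")
tracker.on_received("honest", "tx-new")  # the honest peer relays something we had not seen
print(tracker.suspects(now=120))
# ['spy']
```

Counting only items that are new to us (rather than every message) matters: a spy node could echo back transactions we already sent it without ever contributing anything.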
Detecting the anomalies described in https://arxiv.org/pdf/2108.00815v1 should be possible too.
Indeed, let's see if those metrics can be used to identify the anomaly.
We should also monitor outbound connections. We expect to always have at least 11 outbound connections. If a node has fewer for an extended period, or if several nodes see a large drop in outbound connections at the same time, it's probably an anomaly.
Indeed, we can alert on that too!
Having alerts on individual nodes as well as across all nodes could be a better idea, because then we'll know which nodes are experiencing anomalies. Any thoughts on that?
Yes, sounds good!
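A minimal Python sketch of the outbound-connection checks, raising alerts both per node and across the fleet. The input shape (one list of outbound counts per node, sampled at a regular interval, oldest first), the function name outbound_alerts, and the defaults (5 consecutive samples, a 50% drop, at least 2 nodes) are assumptions; only the expected minimum of 11 outbound connections comes from the discussion. In a real deployment the same logic might instead live in Prometheus alerting rules.

```python
def outbound_alerts(series, min_outbound=11, sustain=5, drop_fraction=0.5, min_nodes=2):
    """Return per-node and fleet-wide outbound-connection alerts.

    series: dict mapping node name -> outbound connection counts,
            one sample per scrape interval, oldest first (hypothetical input).
    Defaults other than min_outbound=11 are assumptions.
    """
    per_node = []
    dropped = []
    for node, counts in series.items():
        # Per-node alert: below the expected minimum for `sustain` consecutive samples.
        recent = counts[-sustain:]
        if len(recent) == sustain and all(c < min_outbound for c in recent):
            per_node.append(node)
        # Fleet-wide input: did this node lose a large share since the previous sample?
        if len(counts) >= 2 and counts[-2] > 0:
            if (counts[-2] - counts[-1]) / counts[-2] >= drop_fraction:
                dropped.append(node)
    # Fleet-wide alert: enough nodes dropped in the same interval.
    fleet = sorted(dropped) if len(dropped) >= min_nodes else []
    return {"per_node": sorted(per_node), "fleet": fleet}


series = {
    "node-a": [11, 11, 11, 11, 11, 4],
    "node-b": [11, 11, 11, 11, 11, 5],
    "node-c": [9, 9, 8, 9, 9, 9],
    "node-d": [11, 11, 11, 11, 11, 11],
}
print(outbound_alerts(series))
# {'per_node': ['node-c'], 'fleet': ['node-a', 'node-b']}
```

Keeping the two results separate means a single misbehaving node (node-c) is reported by name, while a simultaneous drop on several nodes (node-a and node-b) surfaces as one fleet-wide event.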
I came across the blog post "How to use Prometheus to efficiently detect anomalies at scale" (based on this talk: https://www.youtube.com/watch?v=BTAba-Vq3xE). It looks interesting, and I want to try it out. They published the Prometheus recording rules here: https://github.com/grafana/promql-anomaly-detection
The current Grafana dashboards show the raw numbers from Prometheus (via the metrics tool). Anomaly detection and alerting are not yet implemented. For example:
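For inbound connections, an anomaly could be a sudden drop in the number of inbound peers connected to one or more of our nodes, as in https://b10c.me/observations/05-inbound-connection-flooder-down/. To detect this, a Z-score of the current inbound-peer count relative to its recent history could be used. Since a drop yields a negative Z-score, send an alert if its magnitude exceeds a certain threshold.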
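A minimal Python sketch of the Z-score check for one node's inbound-peer count. It compares the newest sample against the mean and population standard deviation of a trailing window. The function names, the input shape, the window of 30 samples, and the threshold of 3 are assumptions chosen for illustration.

```python
import math
import statistics


def zscore(history, value):
    """Z-score of `value` relative to the samples in `history`."""
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)
    if std == 0:
        # Flat history: no change scores 0, any change is unbounded.
        return 0.0 if value == mean else math.copysign(math.inf, value - mean)
    return (value - mean) / std


def inbound_drop_alert(samples, window=30, threshold=3.0):
    """Check the newest inbound-peer count of one node against its trailing window.

    samples: inbound-peer counts for one node, oldest first (hypothetical input).
    window and threshold defaults are assumptions.
    Returns (alert, z).
    """
    if len(samples) < window + 1:
        return False, 0.0  # not enough history yet
    history = samples[-window - 1:-1]
    z = zscore(history, samples[-1])
    # A sudden drop yields a negative z, so compare the magnitude to the threshold.
    return abs(z) > threshold, z


samples = [120, 122, 119, 121, 120, 118, 122, 121, 120, 119, 60]
alert, z = inbound_drop_alert(samples, window=10)
print(alert)  # True: the drop to 60 is far outside the recent spread
```

Excluding the newest sample from the window keeps the anomaly itself from inflating the standard deviation it is measured against.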
For address messages, a spike in outbound (and inbound) address messages across all nodes could indicate an anomaly. A Z-score could be used here as well. There may be other approaches worth exploring for detecting anomalies.
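A sketch of one way to apply the Z-score idea to address messages "across all nodes": score each node's newest interval against its own trailing window and alert only when at least half of the nodes spike at once. The function names, the input shape, and the defaults (window 30, threshold 3, half of the nodes) are assumptions; the same function could be called once with inbound and once with outbound address-message counts.

```python
import math
import statistics


def spike_score(history, value):
    """One-sided Z-score: how many standard deviations `value` sits above the mean."""
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)
    if std == 0:
        # Flat history: any increase counts as an unbounded spike, anything else as none.
        return math.inf if value > mean else 0.0
    return (value - mean) / std


def fleet_addr_spike(series, window=30, threshold=3.0, min_fraction=0.5):
    """Alert when at least `min_fraction` of nodes see an addr spike in the newest interval.

    series: dict node -> address-message counts per interval, oldest first
            (hypothetical input; call once for inbound and once for outbound).
    Nodes without enough history are skipped but still count toward the total.
    Returns (alert, spiking_nodes).
    """
    spiking = []
    for node, counts in series.items():
        if len(counts) < window + 1:
            continue  # not enough history for this node yet
        if spike_score(counts[-window - 1:-1], counts[-1]) > threshold:
            spiking.append(node)
    alert = bool(series) and len(spiking) / len(series) >= min_fraction
    return alert, sorted(spiking)


base = [10, 12, 11, 9, 10, 11, 12, 10, 9, 11]
series = {
    "node-a": base + [80],
    "node-b": base + [75],
    "node-c": base + [11],
}
print(fleet_addr_spike(series, window=10))
# (True, ['node-a', 'node-b'])
```

Requiring several nodes to spike together is a design choice meant to filter out single-node noise; a per-node alert could still be raised from the same spike_score values.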
This issue can be used for discussion and brainstorming.