Anomaly detection and alerting for interesting Bitcoin P2P metrics #13

Open
0xB10C opened this issue Mar 15, 2024 · 8 comments
Labels
good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

@0xB10C
Owner

0xB10C commented Mar 15, 2024

The current Grafana dashboards show the raw numbers from Prometheus (via the metrics tool). Anomaly detection and alerting are not yet implemented.

For example:
image

Here, an anomaly could be a sudden drop in inbound peers connected to one or more nodes, as in https://b10c.me/observations/05-inbound-connection-flooder-down/. To detect this, a Z-score could be used: if the absolute Z-score exceeds a certain threshold, send an alert.
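To make the Z-score idea concrete, here is a minimal Python sketch. It assumes the inbound-peer counts have already been pulled out of Prometheus as a plain list of recent samples; the function names, the sample values, and the threshold of 3 are illustrative assumptions, not part of the existing tooling.

```python
import math
from statistics import mean, stdev


def zscore(history, current):
    """Z-score of `current` relative to the samples in `history`."""
    if len(history) < 2:
        return 0.0  # not enough data for a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Flat baseline: any deviation at all is infinitely unusual.
        if current == mu:
            return 0.0
        return math.copysign(math.inf, current - mu)
    return (current - mu) / sigma


def is_anomalous(history, current, threshold=3.0):
    """True if `current` deviates from the baseline by more than `threshold` sigmas."""
    return abs(zscore(history, current)) > threshold


# Made-up inbound peer counts for one node (assumption, not real data).
inbound_history = [92, 95, 90, 93, 94, 91, 96, 92]

print(is_anomalous(inbound_history, 93))  # a value inside the usual range
print(is_anomalous(inbound_history, 40))  # a sudden drop
```

Using the absolute value means the same check catches both drops and spikes; alerting on the sign separately would be an easy extension.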

image

Here, a spike in outbound (and inbound) address messages across all nodes could indicate an anomaly. A Z-score could be used here too. There may be other ways to detect anomalies that are worth exploring.
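One possible alternative to the plain Z-score is the modified Z-score, which uses the median and the median absolute deviation (MAD) and is therefore less distorted by earlier spikes in the baseline. The sketch below sums per-node address message counts into a fleet-wide series first. The node names, the sample counts, and the commonly used cutoff of 3.5 are assumptions for illustration.

```python
import math
from statistics import median


def fleet_totals(per_node):
    """Sum the per-node series sample by sample into one fleet-wide series."""
    return [sum(samples) for samples in zip(*per_node.values())]


def modified_zscore(history, current):
    """Modified Z-score: 0.6745 * (x - median) / MAD."""
    med = median(history)
    mad = median([abs(x - med) for x in history])
    if mad == 0:
        if current == med:
            return 0.0
        return math.copysign(math.inf, current - med)
    return 0.6745 * (current - med) / mad


# Made-up address message counts per interval (assumption, not real data).
addr_per_node = {
    "node-a": [120, 130, 125, 118, 122, 900],
    "node-b": [110, 115, 112, 119, 111, 870],
}

totals = fleet_totals(addr_per_node)
history, latest = totals[:-1], totals[-1]
score = modified_zscore(history, latest)
print(score > 3.5)  # True means the latest interval looks like a spike
```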

This issue can be used for discussion and brainstorming.

0xB10C added the help wanted and good first issue labels on Mar 15, 2024
@0xB10C
Owner Author

0xB10C commented Mar 15, 2024

Automatically detecting spy nodes is a possibility too. Here, the anomaly is that they only listen to data from us, but never send us new transactions (or blocks). This is a bit more involved, as it requires us to keep state: we need to track what we send to a peer and what they send us.
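The per-peer state this needs could look roughly like the following Python sketch. Everything here is hypothetical: the class and method names, the idea of counting "novel" items (transactions or blocks we had not seen before), and the thresholds of one hour connected and 100 announcements are assumptions chosen only to illustrate the bookkeeping.

```python
from dataclasses import dataclass


@dataclass
class PeerState:
    connected_seconds: float = 0.0
    announced_to_peer: int = 0  # tx/block announcements we sent to the peer
    novel_from_peer: int = 0    # tx/blocks from the peer that were new to us


class SpyNodeTracker:
    """Flags peers that keep listening but never tell us anything new."""

    def __init__(self, min_connected_seconds=3600.0, min_announced=100):
        self.min_connected_seconds = min_connected_seconds
        self.min_announced = min_announced
        self.peers = {}

    def _state(self, peer_id):
        return self.peers.setdefault(peer_id, PeerState())

    def on_connected_for(self, peer_id, seconds):
        self._state(peer_id).connected_seconds = seconds

    def on_announced_to(self, peer_id, count=1):
        self._state(peer_id).announced_to_peer += count

    def on_novel_from(self, peer_id, count=1):
        self._state(peer_id).novel_from_peer += count

    def suspects(self):
        """Peers connected long enough, fed enough data, that sent nothing new."""
        return sorted(
            peer_id
            for peer_id, s in self.peers.items()
            if s.connected_seconds >= self.min_connected_seconds
            and s.announced_to_peer >= self.min_announced
            and s.novel_from_peer == 0
        )


tracker = SpyNodeTracker()
for peer_id in ("peer-1", "peer-2"):
    tracker.on_connected_for(peer_id, 7200)
    tracker.on_announced_to(peer_id, 500)
tracker.on_novel_from("peer-2", 42)
print(tracker.suspects())  # ['peer-1']
```

The minimum connection time and announcement count are there to avoid flagging peers that simply have not had a chance to send anything yet.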

@0xB10C
Owner Author

0xB10C commented May 22, 2024

Detecting the anomalies as described in https://arxiv.org/pdf/2108.00815v1 should be possible too.

@i-am-yuvi
Collaborator

Indeed, let's see if those metrics can be used to identify the anomaly.

@0xB10C
Owner Author

0xB10C commented May 28, 2024

We should also monitor outbound connections. We expect to always have a minimum of 11 connections. If we have fewer for a longer timeframe or a large drop of outbound connections across multiple nodes at the same time, it's probably an anomaly.
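Both conditions can be sketched in a few lines of Python. The minimum of 11 comes from the comment above; the node names, sample values, the "five consecutive samples" window, and the drop size of 3 are assumptions for illustration.

```python
MIN_OUTBOUND = 11  # expected minimum, as stated above


def sustained_low(samples, min_outbound=MIN_OUTBOUND, min_samples=5):
    """True if the last `min_samples` samples are all below `min_outbound`."""
    recent = samples[-min_samples:]
    return len(recent) == min_samples and all(s < min_outbound for s in recent)


def simultaneous_drop(per_node, drop=3, min_nodes=2):
    """Nodes whose latest sample fell by at least `drop` versus the previous one.

    Returns the sorted node names if at least `min_nodes` dropped, else [].
    """
    dropped = sorted(
        node
        for node, samples in per_node.items()
        if len(samples) >= 2 and samples[-2] - samples[-1] >= drop
    )
    return dropped if len(dropped) >= min_nodes else []


# Made-up outbound connection counts per node (assumption, not real data).
outbound = {
    "node-a": [11, 11, 11, 11, 11, 6],
    "node-b": [11, 11, 11, 11, 11, 7],
    "node-c": [11, 10, 9, 9, 10, 10],
}

for node, samples in outbound.items():
    print(node, sustained_low(samples))
print(simultaneous_drop(outbound))  # ['node-a', 'node-b']
```

Here node-c trips the per-node "fewer for a longer timeframe" check, while node-a and node-b trip the fleet-wide check because they dropped in the same interval.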

@i-am-yuvi
Collaborator

Indeed, we can have alerts on that too!

@i-am-yuvi
Collaborator

> We should also monitor outbound connections. We expect to always have a minimum of 11 connections. If we have fewer for a longer timeframe or a large drop of outbound connections across multiple nodes at the same time, it's probably an anomaly.

Having alerts for individual nodes as well as for the network overall could be a better idea, because then we'll know which nodes are experiencing anomalies. Any thoughts on that?

@0xB10C
Owner Author

0xB10C commented Jun 17, 2024

> We should also monitor outbound connections. We expect to always have a minimum of 11 connections. If we have fewer for a longer timeframe or a large drop of outbound connections across multiple nodes at the same time, it's probably an anomaly.
>
> Having alerts on individual nodes as well as overall could be a better idea because then we'll know which nodes are experiencing anomalies, any thoughts on that?

Yes, sounds good!

@0xB10C
Owner Author

0xB10C commented Oct 10, 2024

I came across the blog post "How to use Prometheus to efficiently detect anomalies at scale" (based on this talk: https://www.youtube.com/watch?v=BTAba-Vq3xE). It looks interesting and is something I want to try out.

They published Prometheus recording rules here: https://github.com/grafana/promql-anomaly-detection
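As a rough illustration of doing this kind of detection inside Prometheus itself, the Python sketch below builds PromQL expressions for a Z-score and for upper and lower bands using the built-in `avg_over_time` and `stddev_over_time` functions. This is a simplified sketch of the general idea, not a copy of the linked rules, which may work differently. The metric name `inbound_peers`, the one-hour window, and the band width of 3 standard deviations are assumptions.

```python
def zscore_query(metric, window="1h"):
    """PromQL expression for the Z-score of `metric` over a trailing window."""
    return (
        f"({metric} - avg_over_time({metric}[{window}])) "
        f"/ stddev_over_time({metric}[{window}])"
    )


def band_queries(metric, window="1h", k=3):
    """PromQL expressions for a band of `k` standard deviations around the mean."""
    avg = f"avg_over_time({metric}[{window}])"
    std = f"stddev_over_time({metric}[{window}])"
    return {
        "lower": f"{avg} - {k} * {std}",
        "upper": f"{avg} + {k} * {std}",
    }


# `inbound_peers` is a hypothetical metric name used only for illustration.
print(zscore_query("inbound_peers"))
for name, query in band_queries("inbound_peers").items():
    print(name, query)
```

Expressions like these could be stored as recording rules and then compared against the raw metric in an alerting rule or plotted as a band in Grafana.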

2 participants