This repository has been archived by the owner on May 22, 2023. It is now read-only.

Monitor third party node downtime #802

Open
pdyraga opened this issue Jun 1, 2021 · 0 comments


pdyraga commented Jun 1, 2021

The Keep ECDSA client offers plenty of metrics and diagnostics for monitoring the health of the operator's own node. However, there is no obvious way to monitor the health of third-party nodes. This matters especially when the operator's node is a member of an n-of-n threshold keep together with a node that is offline, because such a keep needs every member to produce a signature. An easy way to determine which nodes are offline and what the impact is would help operators alert each other before a signature is requested from a keep.

One option is to log a warning when a node sees a peer missing from its peer list for more than N minutes while that peer still has an active stake or active keeps. We could also limit the warnings to peers with which the operated node shares active keeps.
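To make the first option more concrete, here is a rough sketch of the bookkeeping it would need. It is written in Python only for brevity and is not based on the client's code: `PeerDowntimeTracker`, its methods, and the peer identifiers are all hypothetical, and timestamps are plain seconds supplied by the caller.

```python
import logging

logger = logging.getLogger("peer-downtime")


class PeerDowntimeTracker:
    """Remembers when each peer was last seen in the connected-peers list.

    All names here are hypothetical; this is not the client's API.
    """

    def __init__(self, threshold_seconds, started_at):
        self.threshold_seconds = threshold_seconds  # the "N minutes", in seconds
        self.started_at = started_at  # baseline for peers never seen so far
        self.last_seen = {}

    def observe(self, connected_peers, now):
        """Record `now` as the last-seen time of every connected peer."""
        for peer in connected_peers:
            self.last_seen[peer] = now

    def overdue(self, watched_peers, now):
        """Return {peer: seconds_missing} for watched peers past the threshold.

        `watched_peers` is assumed to be the set of peers that share
        active keeps with the operated node.
        """
        result = {}
        for peer in watched_peers:
            missing_for = now - self.last_seen.get(peer, self.started_at)
            if missing_for > self.threshold_seconds:
                result[peer] = missing_for
        return result

    def warn(self, watched_peers, now):
        """Log one warning per overdue peer and return the overdue mapping."""
        overdue = self.overdue(watched_peers, now)
        for peer, missing_for in sorted(overdue.items()):
            logger.warning(
                "peer %s shares active keeps but has been missing for %d s",
                peer, missing_for,
            )
        return overdue


# Example: "b" disappears after the first observation.
tracker = PeerDowntimeTracker(threshold_seconds=600, started_at=0)
tracker.observe({"a", "b"}, now=0)
tracker.observe({"a"}, now=300)
overdue_peers = tracker.warn({"a", "b"}, now=700)
```

Passing only the peers that share keeps with the operated node as `watched_peers` gives the narrower variant of the warning; passing every staked peer gives the broader one.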

Another option, which requires no change in the client, is a remote telemetry service. The node exposes diagnostics that include the list of connected peers; combined with the graph, that list can be used to identify offline operators that still have active keeps. This option could be enhanced further by modeling the network topology for operators who opt in to the mechanism and submit their diagnostics periodically.
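As a sketch of what the telemetry service would compute, assume it has already collected two inputs: the connected-peer lists reported by opted-in nodes, and a mapping from each active keep to its members. Both data shapes and the function name `find_offline_members` are assumptions made for illustration; how the data is fetched from node diagnostics and from the graph is left out.

```python
def find_offline_members(peer_reports, keep_members):
    """Map each apparently offline operator to the keeps it affects.

    peer_reports: {reporting_operator: set of operators it is connected to}
    keep_members: {keep_id: list of member operators}
    Both shapes are hypothetical.
    """
    # An operator counts as online if it submitted a report itself
    # or shows up in the peer list of at least one reporting node.
    online = set(peer_reports)
    for peers in peer_reports.values():
        online.update(peers)

    impact = {}
    for keep_id, members in keep_members.items():
        for member in members:
            if member not in online:
                impact.setdefault(member, []).append(keep_id)
    return {operator: sorted(keeps) for operator, keeps in impact.items()}


# Example: op3 is a member of two keeps but nobody reports seeing it.
reports = {"op1": {"op2"}, "op2": {"op1"}}
keeps = {
    "keepA": ["op1", "op2", "op3"],
    "keepB": ["op1", "op2"],
    "keepC": ["op3", "op1"],
}
offline = find_offline_members(reports, keeps)
```

The result is only as good as the set of reporting nodes: an operator that none of the opted-in nodes happens to be connected to would be flagged even if it is online, so the output is better treated as a prompt to check than as proof of downtime.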
