4 - Collect feedback on dashboard prototype #3
Since no feedback was received in this round, we're happy to do another round of updates after the community has had a chance to use the tool.
@joesixpack, apologies for the delayed response. Can you please provide additional context? I assume this is a screenshot of an implementation you attempted. If so, what steps did you follow?
@joesixpack, some metrics are displayed from Prometheus and some are based on the network URL you have configured.
For the network URL I'm using "http://157.230.100.229:3000", which is your server. Oasis config.toml has: metrics: Port 3000 is not available to use, as that is what Grafana uses for its web dashboard. Is the network URL actually supposed to point to my own node's metrics address? That is not stated in the docs. That makes some sense, and I tried that and port 3001 as well, but the dashboard errors (Bad Gateway) didn't resolve. Regardless, I've run into that edge-case bug twice already, so since I can't upgrade to 20.12.3 yet (I did accidentally, and it worked fine before I reverted), I'll have to shut down Mission Control to prevent another crash.
There also appear to be missing and/or wrongly named datasources in some of the dashboards.
Sorry about that issue, @joesixpack. If you have another network's URL, you can use that; otherwise, you can keep the one we provided. I will update the Grafana dashboards to resolve the Bad Gateway errors.
I'm seeing this in the log: 2021/01/05 01:26:58 Error while unmarshelling the validator set data proto: wrong wireType = 0 for field Ed25519
@PrathyushaLakkireddy please take a look 👆 @joesixpack, note that we currently recommend ONLY running Oasis Mission Control with v20.12.3. This is due to a bug in the Oasis code that was fixed in v20.12.3. See details here. The chances of the bug crashing the validator when running Mission Control are very low. We ran Chainflow's instance without a problem for a couple of months, then the bug nailed us. For this reason, we suggest staying on the safe side and waiting until you're running v20.12.3 on mainnet.
Fixed.
Could you upload the dashboards to Grafana and provide the dashboard numbers to import?
Oasis Mission Control Call for Feedback
Chainflow and our development partner Vitwit have been awarded an Oasis grant to build the Oasis Mission Control Validator Monitoring and Alerting Dashboard. You can find more details about that here.
We're excited to share this prototype with the community. Validators, we're building this for you.
Please review the work done so far and provide feedback. We'll use this feedback to update the prototype and deliver a final, open-source version for your use.
For example -
1 - Is the dashboard missing any key metrics?
2 - Are there any additional alerts you'd like to see made available?
3 - Is there anything we can do to organize the information in a more user-friendly way, e.g. reorganize existing dashboards and/or create new ones?
Please provide your feedback in the comments of this issue.
Here's a brief overview of the dashboards and current alerts.
Summary Dashboard
This view provides a quick look at overall validator and system health.
Validator Monitoring Dashboard
This view provides a comprehensive look at validator details and performance, expanding on the summary dashboard. It will also include proposal information once Oasis implements a governance module.
Note: The system displays the total number of peers. For operators who choose to implement a sentry node configuration, we will implement a metric that shows the peer names as well.
This is useful to confirm a validator is connected to the peers an operator would expect their validator to be connected to. In this scenario, there will also be an alert configured that alerts a user if the number of peers drops below a specified number.
For example, if your validator is connected to two sentries, the system will alert you if the number of peers drops below two.
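The peer checks described above can be sketched in Go. This is a minimal, hypothetical illustration, not Mission Control's actual implementation: the function names `shouldAlertOnPeers` and `missingPeers` and the sentry names are assumptions made for the example, and it assumes the peer count and peer names have already been collected from the node.

```go
package main

import "fmt"

// shouldAlertOnPeers reports whether a low-peer alert should fire, i.e. the
// current peer count has dropped below the operator-specified minimum.
// (Hypothetical helper, not part of Oasis Mission Control.)
func shouldAlertOnPeers(peerCount, minPeers int) bool {
	return peerCount < minPeers
}

// missingPeers returns the expected peers (e.g. hypothetical sentry names)
// that do not appear in the list of currently connected peers.
func missingPeers(expected, connected []string) []string {
	seen := make(map[string]bool, len(connected))
	for _, p := range connected {
		seen[p] = true
	}
	var missing []string
	for _, p := range expected {
		if !seen[p] {
			missing = append(missing, p)
		}
	}
	return missing
}

func main() {
	// Assumed setup: a validator behind two sentries, so alert below 2 peers.
	expected := []string{"sentry-a", "sentry-b"}
	minPeers := len(expected)

	connected := []string{"sentry-a"} // sentry-b has dropped off
	if shouldAlertOnPeers(len(connected), minPeers) {
		fmt.Printf("ALERT: %d peer(s) connected, expected at least %d; missing: %v\n",
			len(connected), minPeers, missingPeers(expected, connected))
	}
}
```

The count check alone tells you that a peer dropped; comparing names, as the planned peer-name metric would allow, tells you which sentry to investigate.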
System Monitoring Dashboard
This view provides a comprehensive look at system performance metrics, expanding on the summary dashboard. Here you'll find all the system metrics you'd expect to see in a comprehensive system monitoring tool.
Alerting
So far, these alerts are configured -
This image shows some of those alerts in action.