-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add monquorum alert #52
Conversation
count(ceph_mon_quorum_status == 1) <= ((count(ceph_mon_metadata) %s 2) + 1) | ||
||| % '%', | ||
count(ceph_mon_quorum_status{%s} == 1) <= ((count(ceph_mon_metadata{%s}) %s 2) + 1) | ||
||| % [$._config.cephExporterSelector, $._config.cephExporterSelector, '%'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change made assuming that there will be multiple exporter jobs working at the same time ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it makes sense, as if you have 2 clusters monitored with same Prometheus the current version wouldn't work
@anmolsachan fixed. PTAL 🙂 |
@@ -7,8 +7,8 @@ | |||
{ | |||
alert: 'CephMonQuorumAtRisk', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is another similar already present at https://github.com/ceph/ceph-mixins/blob/master/alerts/monquorum.libsonnet
How its different from that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shtripat it's the same alert I didn't change it, except for adding a exporter selector {%s}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
looks good to me. |
@devopsjonas PR looks good to me now. Please attach the generated manifest. You can refer to merged PR's for refenrence. |
Generated manifest:
|
@umangachapagain I've run:
I believe |
@devopsjonas You can run |
2ebc784
to
30c213c
Compare
Done, PTAL :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@devopsjonas we have got unit test mechanism recently added to the project. I would expect you to add unit tests for the alerting rules as part of these PRs. Otherwise both the PRs looks good to be merged. |
@shtripat sure will do 👍 |
Merging this now. Add another PR for unit tests. |
Adds
CephMonHighNumberOfLeaderChanges
Fixes
CephMonQuorumAtRisk
to add label selector. In the case where you have multiple clusters monitored by same Prometheus instance.Example output: