Refactor severity to use monitor thresholds #17

lngarrett · 2017-05-19T22:02:05Z

On our team, we wanted a way to centralize the management of both our monitors and the conditional notifications in the monitors' messages. This PR ties a team's severity notifications to the alert threshold. Instead of assigning a monitor a fixed severity, the message contains conditional blocks listing the appropriate notification channels for each alert threshold. I also put all notifications into an is_recovery block so that alerts auto-resolve as expected.

Nothing new is required in the config, and all fields are optional.

Example message:

{#is_warning}
 @slack-cloud-operations
 @slack-product-support
{/is_warning}
{#is_alert}
 @slack-cloud-operations
 @pagerduty-CloudOperations
 @slack-product-support
 @pagerduty-ProductSupport
{/is_alert}
{#is_recovery}
 @slack-cloud-operations
 @pagerduty-CloudOperations
 @slack-product-support
 @pagerduty-ProductSupport
{/is_recovery}
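As a rough illustration of the idea (not the code in this PR), a message like the one above could be assembled from per-threshold channel lists roughly as follows. The function names `render_notifications` and `_block` and the shape of the `channels` dict are assumptions made for this sketch, and the block syntax simply mirrors the example message.

```python
def _block(tag, handles):
    """Wrap a list of notification handles in one conditional block (hypothetical helper)."""
    lines = ["{#%s}" % tag]
    lines += [" " + handle for handle in handles]
    lines.append("{/%s}" % tag)
    return "\n".join(lines)


def render_notifications(channels):
    """Build the conditional part of a monitor message.

    `channels` is assumed to map a threshold name ("warning" or "alert")
    to a list of notification handles; both keys are optional.
    """
    blocks = []
    for threshold in ("warning", "alert"):
        handles = channels.get(threshold, [])
        if handles:
            blocks.append(_block("is_" + threshold, handles))

    # Recovery notifies every channel used at any threshold, without
    # duplicates, so that whatever was opened can also be resolved.
    everyone = list(dict.fromkeys(channels.get("alert", []) + channels.get("warning", [])))
    if everyone:
        blocks.append(_block("is_recovery", everyone))
    return "\n".join(blocks)


if __name__ == "__main__":
    print(render_notifications({
        "warning": ["@slack-cloud-operations"],
        "alert": ["@slack-cloud-operations", "@pagerduty-CloudOperations"],
    }))
```

An empty or missing threshold produces no block at all, which matches the statement that all fields are optional.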

List of strings

lngarrett · 2017-05-19T22:21:18Z

After some experimentation, I think monitors would still benefit from being marked as critical or info. While my changes add functionality to centralize threshold logic, some monitors are simply not important enough to warrant paging an on-call engineer at any threshold. So, I think the full solution should involve both the original severity tagging and my functionality. However, I'm going to gauge interest in these changes before adding that.

I'm envisioning the teams config would look like this:

teams:
  eng:
    notifications:
      critical:
        alert:
        - '@hipchat-Engineering'
        - '@victorops-eng'
        warning:
        - '@hipchat-Engineering'
      info:
        alert:
        - '@hipchat-Engineering'

The idea here is that for a monitor tagged critical, a warning would first notify the team chat channel so that engineers would see the issue during business hours. Then, if the monitor reaches the alert threshold, the engineer on call would be paged. However, for monitors tagged info, we have decided to do nothing on warnings and to send only a nonintrusive chat message when the monitor alerts.
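To make the proposed lookup concrete, here is a minimal sketch of how the channels for a given severity tag and threshold might be resolved. The config is written as a plain dict mirroring the YAML above, and `channels_for` is a hypothetical helper, not something that exists in this PR.

```python
# Plain-dict equivalent of the proposed teams config.
TEAMS = {
    "eng": {
        "notifications": {
            "critical": {
                "alert": ["@hipchat-Engineering", "@victorops-eng"],
                "warning": ["@hipchat-Engineering"],
            },
            "info": {
                "alert": ["@hipchat-Engineering"],
            },
        }
    }
}


def channels_for(teams, team, severity, threshold):
    """Return the handles to notify, or an empty list if nothing is configured.

    Every level is optional, so a missing team, severity, or threshold
    simply means "notify nobody".
    """
    notifications = teams.get(team, {}).get("notifications", {})
    return list(notifications.get(severity, {}).get(threshold, []))


if __name__ == "__main__":
    # A critical monitor pages on alert but only chats on warning.
    print(channels_for(TEAMS, "eng", "critical", "alert"))
    print(channels_for(TEAMS, "eng", "critical", "warning"))
    # An info monitor stays silent on warning.
    print(channels_for(TEAMS, "eng", "info", "warning"))
```

Returning an empty list for unconfigured combinations keeps the "info monitors ignore warnings" behavior implicit in the config rather than hard-coded.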

astropuffin · 2018-04-24T02:53:59Z

I'm also looking to get this for my team. Is there anything missing in order to merge this? It doesn't seem entirely backward compatible, but it works MUCH better for our workflow.

Logan Garrett and others added 2 commits May 19, 2017 17:52

Refactor severity to use monitor thresholds

f01f2dc

Update config-sample.yaml

d7bb47b

