forked from joewilliams/deckard
deckard : http monitoring system
deckard is an HTTP check monitoring system built on top of CouchDB.
license: apache 2
Features:
* Email and SMS alerts (SMS is sent through an SMS-to-email gateway)
* Designated on-call sms email address
* Basic CouchDB replication latency alerts
* Basic content check alerts
* Content check alerts with EC2 elastic IP failover
* All checks are defined in CouchDB (CRUD checks with REST)
* Alert priorities (log, email, SMS and notifo)
* Simple setup via cron
* Basic scheduling to silence alerts
* Adjustable delay before firing check content requests
* Basic Chef "tag" lookup support
* Alert stats database for trending and analysis
Usage:
$ deckard --all ./deckard.yml
Pass --failover, --content or --replication instead of --all if you only want to run that subset of checks.
Setup:
* Set up and configure all appropriate databases and alert documents.
* Create a crontab entry
$ crontab -e
*/5 * * * * deckard --all /path/deckard.yml > /dev/null 2>&1
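The first setup step, creating check documents, can be sketched with CouchDB's HTTP API. This is only an illustration, not part of deckard: the server URL and the database name `deckard_content` are assumptions (use whatever database names your deckard.yml points at), and `build_put_request` is a hypothetical helper.

```python
import json
import urllib.parse
import urllib.request

COUCH_URL = "http://localhost:5984"  # assumption: a local CouchDB
CONTENT_DB = "deckard_content"       # hypothetical database name


def build_put_request(base_url, db_name, doc):
    """Build (but do not send) a PUT request that stores a check document."""
    # Document ids such as "deckard.com:5984/" contain ':' and '/',
    # so they must be percent-encoded in the request path.
    doc_id = urllib.parse.quote(doc["_id"], safe="")
    url = f"{base_url}/{db_name}/{doc_id}"
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


check = {
    "_id": "deckard.com:5984/",
    "url": "http://deckard.com:5984/",
    "content": "couchdb",
    "priority": 2,
}
request = build_put_request(COUCH_URL, CONTENT_DB, check)
print(request.get_method(), request.full_url)
# To actually store the document: urllib.request.urlopen(request)
# Updating an existing document also requires its current "_rev".
```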
Example documents:
On Call document format:
{
    "_id": "on_call_person",
    "sms_email": "[email protected]",
    "notifo_usernames": ["jenny"]
}
For sms_email, use your phone number at the SMS-to-email host of your phone provider. If you provide both an SMS email and notifo username(s), the SMS is used only as a backup in case something goes wrong with notifo, which saves you money on your text message bill. You need the notifo application on your phone to use the notifo support. Note that notifo_usernames is an array of usernames, so multiple people can get notifications.
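The notifo-first, SMS-as-backup behaviour described above could look roughly like the sketch below. It is not deckard's own code: `notify_on_call`, the two sender callables and the example address are hypothetical, and treating any single notifo failure as a reason to fall back is an assumption.

```python
def notify_on_call(on_call, message, send_notifo, send_sms_email):
    """Notify via notifo first; use the SMS email only as a backup.

    send_notifo(username, message) and send_sms_email(address, message)
    are hypothetical callables that raise an exception on failure.
    Returns the (channel, target) pairs that were delivered.
    """
    delivered = []
    usernames = on_call.get("notifo_usernames", [])
    notifo_ok = bool(usernames)  # no usernames means notifo cannot be used
    for username in usernames:
        try:
            send_notifo(username, message)
            delivered.append(("notifo", username))
        except Exception:
            notifo_ok = False  # assumption: any failure triggers the backup

    sms_email = on_call.get("sms_email")
    if not notifo_ok and sms_email:
        send_sms_email(sms_email, message)
        delivered.append(("sms_email", sms_email))
    return delivered


def failing_notifo(username, message):
    # Stand-in for a notifo outage.
    raise RuntimeError("notifo unavailable")


sms_outbox = []
on_call = {
    "_id": "on_call_person",
    "sms_email": "5551234@sms.example.com",  # placeholder address
    "notifo_usernames": ["jenny"],
}
print(notify_on_call(on_call, "lb01 content check failed",
                     failing_notifo,
                     lambda address, message: sms_outbox.append(address)))
```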
Failover check document format:
{
    "_id": "lb01",
    "url": "http://somecheck.com/check.html",
    "secondary_instance_id": "i-1234",
    "priority": 2,
    "region": "us-east-1",
    "elastic_ip": "127.0.0.1",
    "content": "sometext",
    "failover": true,
    "primary_instance_id": "i-4321"
}
This document must contain all the details needed to switch the elastic IP when the content is not found at the url.
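As a rough illustration of how the fields of a failover document fit together, the sketch below decides whether a switch is needed from a response body. `plan_failover` is a hypothetical helper, and moving the address from the primary to the secondary instance is an assumption based on the field names. The EC2 call that would actually re-associate the address is deliberately left out.

```python
def plan_failover(check, response_body):
    """Return a description of the elastic IP switch, or None if none is needed."""
    if check["content"] in response_body:
        return None  # content found: the check passes
    if not check.get("failover"):
        return None  # content missing, but failover is not enabled
    return {
        "region": check["region"],
        "elastic_ip": check["elastic_ip"],
        "from_instance": check["primary_instance_id"],   # assumed direction
        "to_instance": check["secondary_instance_id"],
    }


check = {
    "_id": "lb01",
    "url": "http://somecheck.com/check.html",
    "secondary_instance_id": "i-1234",
    "priority": 2,
    "region": "us-east-1",
    "elastic_ip": "127.0.0.1",
    "content": "sometext",
    "failover": True,
    "primary_instance_id": "i-4321",
}
print(plan_failover(check, "<html>sometext</html>"))
print(plan_failover(check, "<html>503 Service Unavailable</html>"))
```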
Replication check format:
{
    "_id": "node01_node02",
    "name": "test",
    "master_url": "http://node01/db",
    "slave_url": "http://node02/db",
    "offset": 0,
    "priority": 1,
    "schedule": [
        2,
        3
    ]
}
This compares the doc counts of the two databases and triggers an alert if they drift out of sync beyond the thresholds specified in the config.
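The doc count comparison can be sketched as follows. CouchDB returns a `doc_count` field from `GET /db`, but everything else here is an assumption: `replication_out_of_sync` is a hypothetical helper, `max_lag` stands in for the thresholds in the config, and treating `offset` as an expected constant difference between the two databases is a guess at its meaning.

```python
def replication_out_of_sync(master_info, slave_info, offset=0, max_lag=10):
    """Compare doc counts from two database info documents.

    master_info and slave_info are the parsed JSON bodies of GET /db
    on each node. max_lag is a hypothetical threshold.
    """
    lag = master_info["doc_count"] - slave_info["doc_count"] - offset
    # Alert when the databases differ in either direction by more than max_lag.
    return abs(lag) > max_lag


master = {"db_name": "db", "doc_count": 1000}
slave = {"db_name": "db", "doc_count": 985}
if replication_out_of_sync(master, slave, offset=0, max_lag=10):
    print("replication alert: node01_node02")
```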
HTTP content check format:
{
    "_id": "deckard.com:5984/",
    "url": "http://deckard.com:5984/",
    "content": "couchdb",
    "priority": 2
}
priority and schedule are optional fields in all of these documents. priority is 0, 1 or 2: 0 is log only, 1 is log and email, and 2 is log, email and SMS. schedule is an array of integers giving the hours during which the alert should be silent; see the replication check document above for an example.
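The priority and schedule rules can be expressed as a small lookup. This is a sketch, not deckard's implementation: `alert_channels` is a hypothetical helper, the default priority for documents that omit the field is an assumption, and so is the choice to suppress every channel (including the log) during silent hours.

```python
from datetime import datetime

CHANNELS_BY_PRIORITY = {
    0: ["log"],
    1: ["log", "email"],
    2: ["log", "email", "sms"],
}
DEFAULT_PRIORITY = 0  # assumption: the README does not state the default


def is_silenced(check, now):
    """True if the current hour is listed in the check's schedule."""
    return now.hour in check.get("schedule", [])


def alert_channels(check, now):
    """Return the channels an alert for this check would use at time `now`."""
    if is_silenced(check, now):
        return []  # assumption: silent hours suppress every channel
    return CHANNELS_BY_PRIORITY[check.get("priority", DEFAULT_PRIORITY)]


check = {"_id": "node01_node02", "priority": 1, "schedule": [2, 3]}
print(alert_channels(check, datetime(2024, 1, 1, 2, 30)))   # during a silent hour
print(alert_channels(check, datetime(2024, 1, 1, 10, 0)))   # outside the schedule
```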
For Chef "tag" support, install the following view in your Chef database and configure its URL in your deckard config file.
function(doc) {
    if (doc.chef_type != 'node') return;
    emit(doc.automatic.ec2 ? doc.automatic.ec2.public_hostname : doc.automatic.fqdn, doc.normal.tags[0]);
}
For alert stats, create a stats database and put its name in the config file. When alerts happen you should begin to see documents being added. The raw documents are not all that helpful by themselves; analysis is the key. Here is a basic "counts" design document to get you started. It gives you stats about the different kinds of alerts you are having.
{
    "_id": "_design/counts",
    "language": "javascript",
    "views": {
        "by_type": {
            "map": "function(doc) {\n emit(doc.type, 1);\n}",
            "reduce": "_count"
        },
        "by_url": {
            "map": "function(doc) {\n emit(doc.url, 1);\n}",
            "reduce": "_count"
        },
        "by_error": {
            "map": "function(doc) {\n emit(doc.error, 1);\n}",
            "reduce": "_count"
        }
    }
}
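To see what these views compute, the sketch below builds the query URL for a grouped view and reproduces the same per-key counts locally. The database name `deckard_stats` and the sample stats documents are made up for illustration; only the `type`, `url` and `error` field names come from the design document above.

```python
from collections import Counter
from urllib.parse import urlencode


def view_url(base_url, db_name, view_name):
    """URL for a counts view; group=true returns one row per distinct key."""
    query = urlencode({"group": "true"})
    return f"{base_url}/{db_name}/_design/counts/_view/{view_name}?{query}"


def count_by(docs, field):
    """Local equivalent of emit(doc[field], 1) followed by a _count reduce."""
    return Counter(doc.get(field) for doc in docs)


# Hypothetical stats documents, for illustration only.
stats_docs = [
    {"type": "content", "url": "http://deckard.com:5984/", "error": "timeout"},
    {"type": "content", "url": "http://deckard.com:5984/", "error": "content not found"},
    {"type": "replication", "url": "http://node02/db", "error": "out of sync"},
]

print(view_url("http://localhost:5984", "deckard_stats", "by_type"))
print(dict(count_by(stats_docs, "type")))
```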