SRE Trends calculation backend

The goal of this program is to get the current status of different clusters in a human-readable manner.

How it works

Queries Prometheus to get all the hosts from the clusters defined at trends.go.
Gets ALL the data points for the hosts' CPU usage from Prometheus. (One data point per minute)
Goes through each data point to see if they cross over 70%, 90% and 99.5% CPU usage.
Assigns a score to each cluster based on how much did its hosts cross 70%, 90% and 99.5% CPU usage.
Calculates the current state of the cluster based on the score. The status can be good, average or bad.

Why use scoring to calculate the status of a cluster?

Basically, the more points a cluster gets, the worse is its state.

This system has some advantages, we can have a very flexible criteria to define when a cluster is good and when it's bad.

For example, if we believe that having a host constantly over 70% CPU is something very bad, we can just raise the number of points that a cluster gets when a host is over 70%. Or if we believe that a host reaching 70% CPU is not a big deal, then we can just lower the amount of points that gets added when a host goes over 70%.

Current scoring definition:

Green - Good: If the cluster has less than 10 points
Yellow - Average: If the cluster has less than 40 points
Red - Bad: If the cluster has 40 points or more

Usage

go build -o trendsbin && ./trendsbin Then, query localhost:8080/summary or localhost:8080/trends

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
common		common
prometheus		prometheus
server		server
trends		trends
README.md		README.md
go.mod		go.mod
go.sum		go.sum
main.go		main.go

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SRE Trends calculation backend

How it works

Why use scoring to calculate the status of a cluster?

Usage

About

Releases

Packages

Languages

devopsext/trends-back

Folders and files

Latest commit

History

Repository files navigation

SRE Trends calculation backend

How it works

Why use scoring to calculate the status of a cluster?

Usage

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages