Merge pull request #120 from postfinance/feat/neighbour_limit
Feat/neighbour limit
clementnuss authored Mar 15, 2024
2 parents 4fa4b9f + cdcc063 commit d1fd0a8
Showing 16 changed files with 246 additions and 48 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci-helm-deploy-nginx.yml
@@ -12,7 +12,7 @@ jobs:
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: '1.21'
go-version: '1.22'
- name: GoReleaser
uses: goreleaser/goreleaser-action@v4
with:
2 changes: 1 addition & 1 deletion .github/workflows/ci-helm-deploy-traefik.yml
@@ -17,7 +17,7 @@ jobs:
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: '1.21'
go-version: '1.22'
- name: GoReleaser
uses: goreleaser/goreleaser-action@v4
with:
2 changes: 1 addition & 1 deletion .github/workflows/ci-kustomize-deploy.yml
@@ -12,7 +12,7 @@ jobs:
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: '1.21'
go-version: '1.22'
- name: GoReleaser
uses: goreleaser/goreleaser-action@v4
with:
4 changes: 2 additions & 2 deletions .github/workflows/lint.yml
@@ -9,7 +9,7 @@ jobs:
steps:
- uses: actions/setup-go@v5
with:
go-version: '1.21'
go-version: '1.22'
- uses: actions/checkout@v4
- uses: golangci/golangci-lint-action@v4
with:
@@ -29,7 +29,7 @@ jobs:
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
with:
go-version: '1.21'
go-version: '1.22'
- name: Run unit tests
run: go test -race -covermode atomic -coverprofile=profile.cov ./...
- name: Send coverage report
2 changes: 1 addition & 1 deletion .github/workflows/release.yml
@@ -13,7 +13,7 @@ jobs:
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: '1.21'
go-version: '1.22'
- name: Login to DockerHub
uses: docker/login-action@v3
with:
52 changes: 49 additions & 3 deletions README.md
@@ -3,10 +3,12 @@
![GitHub release (latest SemVer)](https://img.shields.io/github/v/release/postfinance/kubenurse)

# Kubenurse

kubenurse is a little service that monitors all network connections in a Kubernetes
cluster. Kubenurse measures request durations, records errors and exports those metrics in Prometheus format.

## Deployment

You can get the Docker image from [Docker Hub](https://hub.docker.com/r/postfinance/kubenurse/).
The [examples](https://github.com/postfinance/kubenurse/tree/master/examples) directory
contains manifests which can be used to deploy kubenurse to the kube-system namespace of your cluster.
@@ -45,6 +47,7 @@ The following command can be used to install kubenurse with Helm: `helm upgrade
| insecure | Set `KUBENURSE_INSECURE` environment variable | `true` |
| allow_unschedulable | Sets `KUBENURSE_ALLOW_UNSCHEDULABLE` environment variable | `false` |
| neighbour_filter | Sets `KUBENURSE_NEIGHBOUR_FILTER` environment variable | `app.kubernetes.io/name=kubenurse` |
| neighbour_limit | Sets `KUBENURSE_NEIGHBOUR_LIMIT` environment variable | `10` |
| extra_ca | Sets `KUBENURSE_EXTRA_CA` environment variable | |
| check_api_server_direct | Sets `KUBENURSE_CHECK_API_SERVER_DIRECT` environment variable | `true` |
| check_api_server_dns | Sets `KUBENURSE_CHECK_API_SERVER_DNS` environment variable | `true` |
@@ -74,7 +77,6 @@ dashboards [as this example](./doc/grafana-kubenurse.json) that show network lat
![Grafana ingress view](doc/grafana_ingress.png "Grafana ingress view")
![Grafana path view](doc/grafana_path.png "Grafana path view")
## Configuration
kubenurse is configured with environment variables:
@@ -85,12 +87,13 @@ kubenurse is configured with environment variables:
- `KUBENURSE_EXTRA_CA`: Additional CA cert path for TLS connections
- `KUBENURSE_NAMESPACE`: Namespace in which to look for the neighbour kubenurses
- `KUBENURSE_NEIGHBOUR_FILTER`: A Kubernetes label selector (eg. `app=kubenurse`) to filter neighbour kubenurses
- `KUBENURSE_NEIGHBOUR_LIMIT`: The maximum number of neighbours each kubenurse will query
- `KUBENURSE_ALLOW_UNSCHEDULABLE`: If this is `"true"`, path checks to neighbouring kubenurses are made even if they are running on unschedulable nodes.
- `KUBENURSE_CHECK_API_SERVER_DIRECT`: If this is `"true"` kubenurse will perform the check [API Server Direct](#API Server Direct). default is "true"
- `KUBENURSE_CHECK_API_SERVER_DNS`: If this is `"true"`, kubenurse will perform the check [API Server DNS](#API Server DNS). default is "true"
- `KUBENURSE_CHECK_ME_INGRESS`: If this is `"true"`, kubenurse will perform the check [Me Ingress](#Me Ingress). default is "true"
- `KUBENURSE_CHECK_ME_SERVICE`: If this is `"true"`, kubenurse will perform the check [Me Service](#Me Service). default is "true"
- `KUBENURSE_CHECK_NEIGHBOURHOOD`: If this is `"true"`, kubenurse will perform the check [Neighbourhood](#Neighbourhood). default is "true"
- `KUBENURSE_CHECK_NEIGHBOURHOOD`: If this is `"true"`, kubenurse will perform the check [Neighbourhood](#neighbourhood). default is "true"
- `KUBENURSE_CHECK_INTERVAL`: the frequency to perform kubenurse checks. the string should be formatted for [time.ParseDuration](https://pkg.go.dev/time#ParseDuration). defaults to `5s`
- `KUBENURSE_REUSE_CONNECTIONS`: whether to reuse connections or not for all checks. default is "false"
- `KUBENURSE_HISTOGRAM_BUCKETS`: optional comma-separated list of float64, used in place of the [default prometheus histogram buckets](https://pkg.go.dev/github.com/prometheus/[email protected]/prometheus#DefBuckets)
@@ -152,8 +155,8 @@ The `/alive` endpoint returns a JSON like this with status code 200 if everythin
}
```


## Health Checks

Every five seconds and on every access of `/alive`, the checks described below are run.
Check results are cached for 3 seconds in order to prevent excessive network traffic.

@@ -162,19 +165,22 @@ A little illustration of what communication occurs, is here:
![Communication](doc/Communication.png "Communication")

### API Server Direct

Checks the `/version` endpoint of the Kubernetes API Server through
the direct link (`KUBERNETES_SERVICE_HOST`, `KUBERNETES_SERVICE_PORT`).

Metric type: `api_server_direct`

### API Server DNS

Checks the `/version` endpoint of the Kubernetes API Server through
the Cluster DNS URL `https://kubernetes.default.svc:$KUBERNETES_SERVICE_PORT`.
This also verifies a working `kube-dns` deployment.

Metric type: `api_server_dns`

### Me Ingress

Checks if the kubenurse is reachable at the `/alwayshappy` endpoint behind the ingress.
This address is provided by the environment variable `KUBENURSE_INGRESS_URL` that
could look like `https://kubenurse.example.com`.
@@ -183,6 +189,7 @@ This also verifies a correct upstream DNS resolution.
Metric type: `me_ingress`

### Me Service

Checks if the kubenurse is reachable at the `/alwayshappy` endpoint through the Kubernetes service.
The address is provided by the environment variable `KUBENURSE_SERVICE_URL` that
could look like `http://kubenurse.mynamespace.default.svc:8080`.
@@ -191,6 +198,7 @@ This also verifies a working `kube-proxy` setup.
Metric type: `me_service`

### Neighbourhood

Checks if every neighbour kubenurse is reachable at the `/alwayshappy` endpoint.
Neighbours are discovered by querying the kube-apiserver for every Pod in the
`KUBENURSE_NAMESPACE` with label `KUBENURSE_NEIGHBOUR_FILTER`.
@@ -201,7 +209,44 @@ this can be changed by setting `KUBENURSE_ALLOW_UNSCHEDULABLE="true"`.

Metric type: `path_$KUBELET_HOSTNAME`

#### Neighbourhood filtering

The number of checks for the neighbourhood used to grow as $O(N^2)$, which
rendered `kubenurse` impractical on large clusters, as documented in issue
[#55](https://github.com/postfinance/kubenurse/issues/55).
To combat this, a node filtering feature was implemented, which works as follows:

- kubenurse computes the `sha256` checksums for all neighbours' node names
- it sorts those checksums (this is actually implemented with a max-heap)
- it computes its own node name checksum, and queries the next 10 (per default)
nodes in the sorted checksums list

Thanks to this, every node is making queries to the same 10 nodes, unless one
of those nodes disappears, in which case kubenurse will pick the next node in
the sorted checksums list. This comes with several advantages:

- because the node names are hashed first, the checks are spread
pseudo-randomly across the cluster, independent of the node names. If we only
picked the next 10 nodes in a sorted list of node names, the results might be
biased in environments where node names are sequential
- metrics-wise, a `kubenurse` pod should typically only have entries for ca. 10
other neighbouring nodes worth of checks, which greatly reduces the load on
your monitoring infrastructure
- because we use a deterministic algorithm to choose which nodes to query, the
metrics churn rate stays minimal. (that is, if we randomly picked 10 nodes
for every check, then in the end there would be one prometheus bucket for
every node on the cluster, which would put useless load on the monitoring
infrastructure)

Per default, the neighbourhood filtering is set to 10 nodes, which means that
on clusters with more than 10 nodes, each kubenurse will query 10 nodes, as
described above.

To bypass the node filtering feature, you simply need to set the
`KUBENURSE_NEIGHBOUR_LIMIT` environment variable to 0.
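
The selection steps described in this section can be sketched roughly as follows. This is an illustration under stated assumptions, not the kubenurse implementation: `selectNeighbours` is a hypothetical helper, it uses a plain sort where the text above mentions a max-heap, and the wrap-around at the end of the sorted list is an assumption.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"sort"
)

// selectNeighbours is a hypothetical helper. It returns up to limit node
// names whose SHA-256 checksums come right after the checksum of self in
// sorted order. A limit of 0 disables the filtering.
func selectNeighbours(self string, nodes []string, limit int) []string {
	type entry struct {
		name string
		sum  [sha256.Size]byte
	}

	// Hash every candidate node name except our own.
	entries := make([]entry, 0, len(nodes))
	for _, n := range nodes {
		if n == self {
			continue
		}
		entries = append(entries, entry{name: n, sum: sha256.Sum256([]byte(n))})
	}

	// No filtering requested, or fewer candidates than the limit: keep all.
	if limit <= 0 || len(entries) <= limit {
		out := make([]string, len(entries))
		for i, e := range entries {
			out[i] = e.name
		}
		return out
	}

	// Sort by checksum (stand-in for the max-heap mentioned above).
	sort.Slice(entries, func(i, j int) bool {
		return bytes.Compare(entries[i].sum[:], entries[j].sum[:]) < 0
	})

	// Find the first checksum greater than our own.
	selfSum := sha256.Sum256([]byte(self))
	start := sort.Search(len(entries), func(i int) bool {
		return bytes.Compare(entries[i].sum[:], selfSum[:]) > 0
	})

	// Take the next limit entries, wrapping around (assumption).
	out := make([]string, 0, limit)
	for i := 0; i < limit; i++ {
		out = append(out, entries[(start+i)%len(entries)].name)
	}
	return out
}

func main() {
	nodes := []string{"node-1", "node-2", "node-3", "node-4", "node-5"}
	fmt.Println(selectNeighbours("node-1", nodes, 2))
}
```

Because the choice depends only on the checksums, the same set of node names yields the same neighbours regardless of the order in which the nodes are listed, which is the property that keeps the metrics churn low.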

## Metrics

All performed checks expose metrics which can be used to monitor/alert:

- SDN network latencies and errors
@@ -214,5 +259,6 @@ All performed checks expose metrics which can be used to monitor/alert:
- External DNS resolution errors (ingress URL resolution)

At `/metrics` you will find these:

- `kubenurse_errors_total`: Kubenurse error counter partitioned by error type
- `kubenurse_request_duration`: a histogram for Kubenurse request duration partitioned by error type
4 changes: 1 addition & 3 deletions go.mod
@@ -1,8 +1,6 @@
module github.com/postfinance/kubenurse

go 1.21

toolchain go1.21.5
go 1.22

require (
github.com/prometheus/client_golang v1.19.0
2 changes: 2 additions & 0 deletions helm/kubenurse/templates/daemonset.yaml
@@ -56,6 +56,8 @@ spec:
value: {{ .Release.Namespace }}
- name: KUBENURSE_NEIGHBOUR_FILTER
value: {{ .Values.neighbour_filter }}
- name: KUBENURSE_NEIGHBOUR_LIMIT
value: {{ .Values.neighbour_limit | quote }}
{{- if .Values.extra_ca }}
- name: KUBENURSE_EXTRA_CA
value: {{ .Values.extra_ca }}
2 changes: 2 additions & 0 deletions helm/kubenurse/values.yaml
@@ -35,6 +35,8 @@ service_url: ""
allow_unschedulable: false
# KUBENURSE_NEIGHBOUR_FILTER
neighbour_filter: app.kubernetes.io/name=kubenurse
# KUBENURSE_NEIGHBOUR_LIMIT
neighbour_limit: 10
# KUBENURSE_EXTRA_CA
extra_ca: ""
# KUBENURSE_CHECK_API_SERVER_DIRECT
6 changes: 3 additions & 3 deletions internal/kubenurse/handler.go
@@ -9,7 +9,7 @@ import (
)

func (s *Server) readyHandler() func(w http.ResponseWriter, r *http.Request) {
return func(w http.ResponseWriter, r *http.Request) {
return func(w http.ResponseWriter, _ *http.Request) {
s.mu.Lock()
defer s.mu.Unlock()

@@ -34,8 +34,8 @@ func (s *Server) aliveHandler() func(w http.ResponseWriter, r *http.Request) {
servicecheck.Result

// kubediscovery
NeighbourhoodState string `json:"neighbourhood_state"`
Neighbourhood []servicecheck.Neighbour `json:"neighbourhood"`
NeighbourhoodState string `json:"neighbourhood_state"`
Neighbourhood []*servicecheck.Neighbour `json:"neighbourhood"`
}

res := s.checker.LastCheckResult
14 changes: 12 additions & 2 deletions internal/kubenurse/server.go
@@ -48,6 +48,7 @@ type Server struct {
// * KUBERNETES_SERVICE_PORT
// * KUBENURSE_NAMESPACE
// * KUBENURSE_NEIGHBOUR_FILTER
// * KUBENURSE_NEIGHBOUR_LIMIT
// * KUBENURSE_SHUTDOWN_DURATION
// * KUBENURSE_CHECK_API_SERVER_DIRECT
// * KUBENURSE_CHECK_API_SERVER_DNS
@@ -126,21 +127,30 @@ func New(ctx context.Context, c client.Client) (*Server, error) { //nolint:funle
shutdownDuration := 5 * time.Second

if v, ok := os.LookupEnv("KUBENURSE_SHUTDOWN_DURATION"); ok {
var err error
shutdownDuration, err = time.ParseDuration(v)

if err != nil {
return nil, err
}
}

chk.ShutdownDuration = shutdownDuration
chk.KubenurseIngressURL = os.Getenv("KUBENURSE_INGRESS_URL")
chk.KubenurseServiceURL = os.Getenv("KUBENURSE_SERVICE_URL")
chk.KubernetesServiceHost = os.Getenv("KUBERNETES_SERVICE_HOST")
chk.KubernetesServicePort = os.Getenv("KUBERNETES_SERVICE_PORT")
chk.KubenurseNamespace = os.Getenv("KUBENURSE_NAMESPACE")
chk.NeighbourFilter = os.Getenv("KUBENURSE_NEIGHBOUR_FILTER")
chk.ShutdownDuration = shutdownDuration
neighLimit := os.Getenv("KUBENURSE_NEIGHBOUR_LIMIT")

if neighLimit != "" {
chk.NeighbourLimit, err = strconv.Atoi(neighLimit)
if err != nil {
return nil, err
}
} else {
chk.NeighbourLimit = 10
}

//nolint:goconst // No need to make "false" a constant in my opinion, readability is better like this.
chk.SkipCheckAPIServerDirect = os.Getenv("KUBENURSE_CHECK_API_SERVER_DIRECT") == "false"
30 changes: 14 additions & 16 deletions internal/servicecheck/httptrace.go
@@ -1,6 +1,7 @@
package servicecheck

import (
"context"
"crypto/tls"
"log"
"net/http"
@@ -30,8 +31,7 @@ func withHttptrace(registry *prometheus.Registry, next http.RoundTripper, durati
Name: "httpclient_requests_total",
Help: "A counter for requests from the kubenurse http client.",
},
// []string{"code", "method", "type"}, // TODO
[]string{"code", "method"},
[]string{"code", "method", "type"},
)

httpclientReqDuration := prometheus.NewHistogramVec(
@@ -41,8 +41,7 @@ func withHttptrace(registry *prometheus.Registry, next http.RoundTripper, durati
Help: "A latency histogram of request latencies from the kubenurse http client.",
Buckets: durationHistogram,
},
// []string{"type"}, // TODO
[]string{},
[]string{"type"},
)

httpclientTraceReqDuration := prometheus.NewHistogramVec(
@@ -52,8 +51,7 @@ func withHttptrace(registry *prometheus.Registry, next http.RoundTripper, durati
Help: "Latency histogram for requests from the kubenurse http client. Time in seconds since the start of the http request.",
Buckets: durationHistogram,
},
[]string{"event"},
// []string{"event", "type"}, // TODO
[]string{"event", "type"},
)

registry.MustRegister(httpclientReqTotal, httpclientReqDuration, httpclientTraceReqDuration)
@@ -68,7 +66,7 @@ func withHttptrace(registry *prometheus.Registry, next http.RoundTripper, durati
return
}

httpclientTraceReqDuration.WithLabelValues(traceEventType).Observe(td) // TODO: add back kubenurseTypeKey
httpclientTraceReqDuration.WithLabelValues(traceEventType, kubenurseTypeLabel).Observe(td)
}

// Return a http.RoundTripper for tracing requests
@@ -78,10 +76,10 @@ func withHttptrace(registry *prometheus.Registry, next http.RoundTripper, durati

// Add tracing hooks
trace := &httptrace.ClientTrace{
GotConn: func(info httptrace.GotConnInfo) {
GotConn: func(_ httptrace.GotConnInfo) {
collectMetric("got_conn", start, r, nil)
},
DNSStart: func(info httptrace.DNSStartInfo) {
DNSStart: func(_ httptrace.DNSStartInfo) {
collectMetric("dns_start", start, r, nil)
},
DNSDone: func(info httptrace.DNSDoneInfo) {
@@ -96,7 +94,7 @@ func withHttptrace(registry *prometheus.Registry, next http.RoundTripper, durati
TLSHandshakeStart: func() {
collectMetric("tls_handshake_start", start, r, nil)
},
TLSHandshakeDone: func(_ tls.ConnectionState, err error) {
TLSHandshakeDone: func(_ tls.ConnectionState, _ error) {
collectMetric("tls_handshake_done", start, r, nil)
},
WroteRequest: func(info httptrace.WroteRequestInfo) {
@@ -110,14 +108,14 @@ func withHttptrace(registry *prometheus.Registry, next http.RoundTripper, durati
// Do request with tracing enabled
r = r.WithContext(httptrace.WithClientTrace(r.Context(), trace))

// // TODO: uncomment when issue #55 is solved (N^2 request will increase cardinality of path_ metrics too much otherwise)
// typeFromCtxFn := promhttp.WithLabelFromCtx("type", func(ctx context.Context) string {
// return ctx.Value(kubenurseTypeKey{}).(string)
// })
typeFromCtxFn := promhttp.WithLabelFromCtx("type", func(ctx context.Context) string {
return ctx.Value(kubenurseTypeKey{}).(string)
})

rt := next // variable pinning :) essential, to prevent always re-instrumenting the original variable
rt = promhttp.InstrumentRoundTripperCounter(httpclientReqTotal, rt)
rt = promhttp.InstrumentRoundTripperDuration(httpclientReqDuration, rt)
rt = promhttp.InstrumentRoundTripperCounter(httpclientReqTotal, rt, typeFromCtxFn)
rt = promhttp.InstrumentRoundTripperDuration(httpclientReqDuration, rt, typeFromCtxFn)

return rt.RoundTrip(r)
})
}