Merge pull request #282 from oldgiova/MC-7302-podmonitor
PodMonitor for monitoring traefik and probesOverrides
oldgiova authored May 20, 2024
2 parents 17c512d + 3fcdc20 commit 2b3d085
Showing 18 changed files with 251 additions and 8 deletions.
44 changes: 44 additions & 0 deletions README.md
@@ -245,6 +245,9 @@ The following table lists the global, default, and other parameters supported by
| `default.pdb.minAvailable` | PodDisruptionBudget minAvailable | `1` |
| `default.imagePullSecrets` | Optional list of existing Image Pull Secrets in the format of `- name: my-custom-secret` | `[]` |
| `default.updateStrategy` | The strategy to use to update existing pods | `rollingUpdate = { maxSurge = 1, maxUnavailable = 1 }` |
| `default.probesOverrides.timeoutSeconds` | Override the `timeoutSeconds` for all Readiness and Liveness probes. | `nil` |
| `default.probesOverrides.successThreshold` | Override the `successThreshold` for all Readiness and Liveness probes. | `nil` |
| `default.probesOverrides.failureThreshold` | Override the `failureThreshold` for all Readiness and Liveness probes. | `nil` |
| `ingress.enabled` | Optional Mender Ingress | `false` |
| `dbmigration.enable` | Helm Chart hook that triggers a DB Migration utility just before a Helm Chart install or upgrade | `true` |
| `device_license_count.enabled` | Device license count feature - enterprise only | `false` |
@@ -340,6 +343,11 @@ The following table lists the parameters for the `api-gateway` component and the
| `api_gateway.certs.existingSecret` | Preexisting secret containing the Cert (key `cert.crt`) and the Key (key `private.key`) | `nil` |
| `api_gateway.updateStrategy` | The strategy to use to update existing pods | `nil` |
| `api_gateway.accesslogs` | Traefik Access Logs, enabled by default | `true` |
| `api_gateway.podMonitor.enabled` | If enabled, creates a PodMonitor resource for scraping Traefik metrics | `false` |
| `api_gateway.podMonitor.customLabels` | PodMonitor custom labels | `nil` |
| `api_gateway.probesOverrides.timeoutSeconds` | Override the `timeoutSeconds` for all Readiness and Liveness probes. | `nil` |
| `api_gateway.probesOverrides.successThreshold` | Override the `successThreshold` for all Readiness and Liveness probes. | `nil` |
| `api_gateway.probesOverrides.failureThreshold` | Override the `failureThreshold` for all Readiness and Liveness probes. | `3` |
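
As a rough illustration of how the `api_gateway` parameters in this table might be combined, the values fragment below enables the PodMonitor and tunes the probes. The file name, the `release: prometheus` label, and the numeric values are assumptions made for the example, not chart defaults; which labels your Prometheus Operator selects on depends on your installation.

```yaml
# values-override.yaml (hypothetical file name)
api_gateway:
  podMonitor:
    enabled: true            # creates the PodMonitor and exposes the metrics entry point
    customLabels:
      release: prometheus    # hypothetical label; match your Prometheus Operator's selector
  probesOverrides:
    timeoutSeconds: 3        # illustrative value
    failureThreshold: 5      # illustrative value
```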

### Parameters: deployments

@@ -394,6 +402,9 @@ The following table lists the parameters for the `deployments` component and the
| `deployments.migrationRestartPolicy` | Migration job: restartPolicy option | `Never` |
| `deployments.migrationResources` | Migration job: optional K8s resources. If not specified, uses the deployment resources | `nil` |
| `deployments.updateStrategy` | The strategy to use to update existing pods | `nil` |
| `deployments.probesOverrides.timeoutSeconds` | Override the `timeoutSeconds` for all Readiness and Liveness probes. | `nil` |
| `deployments.probesOverrides.successThreshold` | Override the `successThreshold` for all Readiness and Liveness probes. | `nil` |
| `deployments.probesOverrides.failureThreshold` | Override the `failureThreshold` for all Readiness and Liveness probes. | `nil` |

### Parameters: device-auth

@@ -454,6 +465,9 @@ The following table lists the parameters for the `device-auth` component and the
| `device_auth.migrationRestartPolicy` | Migration job: restartPolicy option | `Never` |
| `device_auth.migrationResources` | Migration job: optional K8s resources. If not specified, uses the deployment resources | `nil` |
| `device_auth.updateStrategy` | The strategy to use to update existing pods | `nil` |
| `device_auth.probesOverrides.successThreshold` | Override the `successThreshold` for all Readiness and Liveness probes. | `nil` |
| `device_auth.probesOverrides.timeoutSeconds` | Override the `timeoutSeconds` for all Readiness and Liveness probes. | `nil` |
| `device_auth.probesOverrides.failureThreshold` | Override the `failureThreshold` for all Readiness and Liveness probes. | `nil` |

### Parameters: gui

@@ -492,6 +506,9 @@ The following table lists the parameters for the `gui` component and their defau
| `gui.containerSecurityContext.runAsUser` | User ID for the container | `65534` |
| `gui.priorityClassName` | Optional pre-existing priorityClassName to be assigned to the resource | `nil` |
| `gui.updateStrategy` | The strategy to use to update existing pods | `nil` |
| `gui.probesOverrides.successThreshold` | Override the `successThreshold` for all Readiness and Liveness probes. | `nil` |
| `gui.probesOverrides.timeoutSeconds` | Override the `timeoutSeconds` for all Readiness and Liveness probes. | `nil` |
| `gui.probesOverrides.failureThreshold` | Override the `failureThreshold` for all Readiness and Liveness probes. | `nil` |

### Parameters: inventory

@@ -543,6 +560,9 @@ The following table lists the parameters for the `inventory` component and their
| `inventory.migrationResources` | Migration job: optional K8s resources. If not specified, uses the deployment resources | `nil` |
| `inventory.updateStrategy` | The strategy to use to update existing pods | `nil` |
| `inventory.mongodbExistingSecret` | Use a different MongoDB secret for this service | `nil` |
| `inventory.probesOverrides.successThreshold` | Override the `successThreshold` for all Readiness and Liveness probes. | `nil` |
| `inventory.probesOverrides.timeoutSeconds` | Override the `timeoutSeconds` for all Readiness and Liveness probes. | `nil` |
| `inventory.probesOverrides.failureThreshold` | Override the `failureThreshold` for all Readiness and Liveness probes. | `nil` |

### Parameters: reporting

@@ -626,6 +646,9 @@ The following table lists the parameters for the `tenantadm` component and their
| `tenantadm.migrationResources` | Migration job: optional K8s resources. If not specified, uses the deployment resources | `nil` |
| `tenantadm.migrationArgs` | Migration job: optional migration args (list). | `["migrate"]` |
| `tenantadm.updateStrategy` | The strategy to use to update existing pods | `nil` |
| `tenantadm.probesOverrides.successThreshold` | Override the `successThreshold` for all Readiness and Liveness probes. | `nil` |
| `tenantadm.probesOverrides.timeoutSeconds` | Override the `timeoutSeconds` for all Readiness and Liveness probes. | `nil` |
| `tenantadm.probesOverrides.failureThreshold` | Override the `failureThreshold` for all Readiness and Liveness probes. | `nil` |

The default values for the rate limits are:

@@ -696,6 +719,9 @@ The following table lists the parameters for the `useradm` component and their d
| `useradm.migrationRestartPolicy` | Migration job: restartPolicy option | `Never` |
| `useradm.migrationResources` | Migration job: optional K8s resources. If not specified, uses the deployment resources | `nil` |
| `useradm.updateStrategy` | The strategy to use to update existing pods | `nil` |
| `useradm.probesOverrides.successThreshold` | Override the `successThreshold` for all Readiness and Liveness probes. | `nil` |
| `useradm.probesOverrides.timeoutSeconds` | Override the `timeoutSeconds` for all Readiness and Liveness probes. | `nil` |
| `useradm.probesOverrides.failureThreshold` | Override the `failureThreshold` for all Readiness and Liveness probes. | `nil` |

### Parameters: workflows

@@ -736,6 +762,9 @@ The following table lists the parameters for the `workflows-server` component an
| `workflows.migrationResources` | Migration job: optional K8s resources. If not specified, uses the deployment resources | `nil` |
| `workflows.updateStrategy` | The strategy to use to update existing pods | `nil` |
| `workflows.mountSecrets` | Optional `volumeMounts` and `volumes` to inject credential files into the workflows service | `nil` |
| `workflows.probesOverrides.successThreshold` | Override the `successThreshold` for all Readiness and Liveness probes. | `nil` |
| `workflows.probesOverrides.timeoutSeconds` | Override the `timeoutSeconds` for all Readiness and Liveness probes. | `nil` |
| `workflows.probesOverrides.failureThreshold` | Override the `failureThreshold` for all Readiness and Liveness probes. | `nil` |

### Parameters: create_artifact_worker

@@ -812,6 +841,9 @@ The following table lists the parameters for the `auditlogs` component and their
| `auditlogs.migrationRestartPolicy` | Migration job: restartPolicy option | `Never` |
| `auditlogs.migrationResources` | Migration job: optional K8s resources. If not specified, uses the deployment resources | `nil` |
| `auditlogs.updateStrategy` | The strategy to use to update existing pods | `nil` |
| `auditlogs.probesOverrides.successThreshold` | Override the `successThreshold` for all Readiness and Liveness probes. | `nil` |
| `auditlogs.probesOverrides.timeoutSeconds` | Override the `timeoutSeconds` for all Readiness and Liveness probes. | `nil` |
| `auditlogs.probesOverrides.failureThreshold` | Override the `failureThreshold` for all Readiness and Liveness probes. | `nil` |

### Parameters: iot-manager

@@ -858,6 +890,9 @@ The following table lists the parameters for the `iot-manager` component and the
| `iot_manager.migrationResources` | Migration job: optional K8s resources. If not specified, uses the deployment resources | `nil` |
| `iot_manager.updateStrategy` | The strategy to use to update existing pods | `nil` |
| `iot_manager.aesEncryptionKey.existingSecret` | Optional secret containing the AES encryption key. The secret key must be `AES_ENCRYPTION_KEY` | `nil` |
| `iot_manager.probesOverrides.successThreshold` | Override the `successThreshold` for all Readiness and Liveness probes. | `nil` |
| `iot_manager.probesOverrides.timeoutSeconds` | Override the `timeoutSeconds` for all Readiness and Liveness probes. | `nil` |
| `iot_manager.probesOverrides.failureThreshold` | Override the `failureThreshold` for all Readiness and Liveness probes. | `nil` |

### Parameters: deviceconnect

@@ -905,6 +940,9 @@ The following table lists the parameters for the `deviceconnect` component and t
| `deviceconnect.migrationRestartPolicy` | Migration job: restartPolicy option | `Never` |
| `deviceconnect.migrationResources` | Migration job: optional K8s resources. If not specified, uses the deployment resources | `nil` |
| `deviceconnect.updateStrategy` | The strategy to use to update existing pods | `nil` |
| `deviceconnect.probesOverrides.successThreshold` | Override the `successThreshold` for all Readiness and Liveness probes. | `nil` |
| `deviceconnect.probesOverrides.timeoutSeconds` | Override the `timeoutSeconds` for all Readiness and Liveness probes. | `nil` |
| `deviceconnect.probesOverrides.failureThreshold` | Override the `failureThreshold` for all Readiness and Liveness probes. | `nil` |

### Parameters: deviceconfig

@@ -950,6 +988,9 @@ The following table lists the parameters for the `deviceconfig` component and th
| `deviceconfig.migrationRestartPolicy` | Migration job: restartPolicy option | `Never` |
| `deviceconfig.migrationResources` | Migration job: optional K8s resources. If not specified, uses the deployment resources | `nil` |
| `deviceconfig.updateStrategy` | The strategy to use to update existing pods | `nil` |
| `deviceconfig.probesOverrides.successThreshold` | Override the `successThreshold` for all Readiness and Liveness probes. | `nil` |
| `deviceconfig.probesOverrides.timeoutSeconds` | Override the `timeoutSeconds` for all Readiness and Liveness probes. | `nil` |
| `deviceconfig.probesOverrides.failureThreshold` | Override the `failureThreshold` for all Readiness and Liveness probes. | `nil` |

### Parameters: devicemonitor

@@ -997,6 +1038,9 @@ The following table lists the parameters for the `devicemonitor` component and t
| `devicemonitor.migrationRestartPolicy` | Migration job: restartPolicy option | `Never` |
| `devicemonitor.migrationResources` | Migration job: optional K8s resources. If not specified, uses the deployment resources | `nil` |
| `devicemonitor.updateStrategy` | The strategy to use to update existing pods | `nil` |
| `devicemonitor.probesOverrides.successThreshold` | Override the `successThreshold` for all Readiness and Liveness probes. | `nil` |
| `devicemonitor.probesOverrides.timeoutSeconds` | Override the `timeoutSeconds` for all Readiness and Liveness probes. | `nil` |
| `devicemonitor.probesOverrides.failureThreshold` | Override the `failureThreshold` for all Readiness and Liveness probes. | `nil` |

### Parameters: generate_delta_worker
Please notice that this feature is still under active development and it is
5 changes: 5 additions & 0 deletions mender/CHANGELOG.md
@@ -7,6 +7,11 @@
* Move from megabytes to mebibytes for consistency.
* Added `inventory.mongodbExistingSecret` to override the default MongoDB secret.
* Not using `HAVE_ENTERPRISE` when in hosted mode.
* Added `podMonitor` resource for monitoring the `api-gateway` service (Traefik metrics).
* Allow overriding fullname (thanks @ignatiusreza)
* Removed unused `mender.name` function.
* Added `probesOverrides` to override the default `timeoutSeconds`, `successThreshold`, and `failureThreshold` of readiness and liveness probes.
* Fix naming problem in templates using api_gateway and NodePort (thanks @j-rivero)

## Version 5.6.2
* Upgrade to Mender version `3.7.4`.
17 changes: 11 additions & 6 deletions mender/templates/_helpers.tpl
@@ -1,10 +1,4 @@
{{/* vim: set filetype=mustache: */}}
{{/*
Expand the name of the chart.
*/}}
{{- define "mender.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
Create a default fully qualified app name.
@@ -330,3 +324,14 @@ Storage Proxy Rule
{{- define "mender.storageProxyRule" -}}
{{- default "HostRegexp(`{domain:^artifacts.*$}`)" .Values.api_gateway.storage_proxy.customRule | quote }}
{{- end -}}

{{/*
Use custom probes overrides
*/}}
{{- define "mender.probesOverrides" -}}
{{- $_ := dict }}
{{- $_ := (mergeOverwrite $_ .default .override) }}
{{- if $_ }}
{{- toYaml $_ }}
{{- end }}
{{- end }}
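
The `mender.probesOverrides` helper above merges `.default` first and `.override` second with `mergeOverwrite`, so a per-service setting should replace the global one key by key while unset keys fall back to the global value. A minimal sketch of that behaviour, using illustrative numbers that are not chart defaults:

```yaml
# Input values (illustrative numbers, not chart defaults)
default:
  probesOverrides:
    timeoutSeconds: 2
    failureThreshold: 5
deployments:
  probesOverrides:
    failureThreshold: 10

# Expected merge result for the `deployments` service, as emitted by toYaml:
#   failureThreshold: 10   # per-service override wins
#   timeoutSeconds: 2      # inherited from default.probesOverrides
```

Because the helper emits nothing when the merged dictionary is empty, the `with` guard in the pod templates leaves the probe definitions unchanged when no overrides are set.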
16 changes: 14 additions & 2 deletions mender/templates/api-gateway/deployment.yaml
@@ -88,6 +88,9 @@ spec:
- --providers.file.filename=/etc/traefik/config/traefik.yaml
- --ping=true
- --ping.manualrouting=true
{{- if .Values.api_gateway.podMonitor.enabled }}
- --entryPoints.metrics.address=:9090
{{- end }}
{{- if .Values.api_gateway.extraArgs }}
{{- .Values.api_gateway.extraArgs | toYaml | nindent 12 }}
{{- end }}
@@ -99,24 +102,33 @@ spec:
- containerPort: {{ .Values.api_gateway.httpsPort }}
{{- end }}
- containerPort: {{ .Values.api_gateway.httpPort }}
{{- if .Values.api_gateway.podMonitor.enabled }}
- containerPort: 9090
name: prom-metrics
protocol: TCP
{{- end }}

# Readiness/liveness/startup probes
livenessProbe:
tcpSocket:
failureThreshold: 3
httpGet:
path: /healthz
port: {{ .Values.api_gateway.httpPort }}
initialDelaySeconds: 5
periodSeconds: 5
{{- with include "mender.probesOverrides" (dict "default" .Values.default.probesOverrides "override" .Values.api_gateway.probesOverrides ) }}
{{- nindent 10 . }}
{{- end }}
readinessProbe:
tcpSocket:
failureThreshold: 1
httpGet:
path: /healthz
port: {{ .Values.api_gateway.httpPort }}
periodSeconds: 15
initialDelaySeconds: 5
{{- with include "mender.probesOverrides" (dict "default" .Values.default.probesOverrides "override" .Values.api_gateway.probesOverrides ) }}
{{- nindent 10 . }}
{{- end }}
startupProbe:
failureThreshold: 30
httpGet:
15 changes: 15 additions & 0 deletions mender/templates/api-gateway/podMonitor.yaml
@@ -0,0 +1,15 @@
{{- if .Values.api_gateway.podMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: {{ include "mender.fullname" . }}-api-gateway
  labels:
    {{- include "mender.labels" . | nindent 4 }}
    {{- toYaml .Values.api_gateway.podMonitor.customLabels | nindent 4 }}
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: {{ include "mender.fullname" . }}-api-gateway
  podMetricsEndpoints:
    - port: prom-metrics
{{- end }}
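
For orientation, the template above might render to something like the manifest below. This is only a sketch: it assumes `mender.fullname` resolves to `mender` and that `customLabels` is set to the hypothetical `release: prometheus`; the labels produced by `mender.labels` are not shown in this diff and are left as a placeholder comment.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: mender-api-gateway          # assumes "mender.fullname" renders to "mender"
  labels:
    # ...labels emitted by "mender.labels" (not visible in this diff)...
    release: prometheus             # hypothetical custom label
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: mender-api-gateway
  podMetricsEndpoints:
    - port: prom-metrics            # named container port added in api-gateway/deployment.yaml
```

The `port` value refers to the container port name, which is why the deployment template names the `9090` port `prom-metrics`.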
6 changes: 6 additions & 0 deletions mender/templates/auditlogs/_podtemplate.yaml
@@ -51,11 +51,17 @@ spec:
path: /api/internal/v1/auditlogs/health
port: 8080
periodSeconds: 15
{{- with include "mender.probesOverrides" (dict "default" .dot.Values.default.probesOverrides "override" .dot.Values.auditlogs.probesOverrides ) }}
{{- nindent 6 . }}
{{- end }}
livenessProbe:
httpGet:
path: /api/internal/v1/auditlogs/alive
port: 8080
periodSeconds: 5
{{- with include "mender.probesOverrides" (dict "default" .dot.Values.default.probesOverrides "override" .dot.Values.auditlogs.probesOverrides ) }}
{{- nindent 6 . }}
{{- end }}
startupProbe:
httpGet:
path: /api/internal/v1/auditlogs/alive
6 changes: 6 additions & 0 deletions mender/templates/deployments/_podtemplate.yaml
@@ -59,11 +59,17 @@ spec:
path: /api/internal/v1/deployments/health
port: 8080
periodSeconds: 15
{{- with include "mender.probesOverrides" (dict "default" .dot.Values.default.probesOverrides "override" .dot.Values.deployments.probesOverrides ) }}
{{- nindent 6 . }}
{{- end }}
livenessProbe:
httpGet:
path: /api/internal/v1/deployments/alive
port: 8080
periodSeconds: 5
{{- with include "mender.probesOverrides" (dict "default" .dot.Values.default.probesOverrides "override" .dot.Values.deployments.probesOverrides ) }}
{{- nindent 6 . }}
{{- end }}
startupProbe:
httpGet:
path: /api/internal/v1/deployments/alive
6 changes: 6 additions & 0 deletions mender/templates/device-auth/_podtemplate.yaml
@@ -58,11 +58,17 @@ spec:
path: /api/internal/v1/devauth/health
port: 8080
periodSeconds: 15
{{- with include "mender.probesOverrides" (dict "default" .dot.Values.default.probesOverrides "override" .dot.Values.device_auth.probesOverrides ) }}
{{- nindent 6 . }}
{{- end }}
livenessProbe:
httpGet:
path: /api/internal/v1/devauth/alive
port: 8080
periodSeconds: 5
{{- with include "mender.probesOverrides" (dict "default" .dot.Values.default.probesOverrides "override" .dot.Values.device_auth.probesOverrides ) }}
{{- nindent 6 . }}
{{- end }}
startupProbe:
httpGet:
path: /api/internal/v1/devauth/alive
6 changes: 6 additions & 0 deletions mender/templates/deviceconfig/_podtemplate.yaml
@@ -52,11 +52,17 @@ spec:
path: /api/internal/v1/deviceconfig/health
port: 8080
periodSeconds: 15
{{- with include "mender.probesOverrides" (dict "default" .dot.Values.default.probesOverrides "override" .dot.Values.deviceconfig.probesOverrides ) }}
{{- nindent 6 . }}
{{- end }}
livenessProbe:
httpGet:
path: /api/internal/v1/deviceconfig/alive
port: 8080
periodSeconds: 5
{{- with include "mender.probesOverrides" (dict "default" .dot.Values.default.probesOverrides "override" .dot.Values.deviceconfig.probesOverrides ) }}
{{- nindent 6 . }}
{{- end }}
startupProbe:
httpGet:
path: /api/internal/v1/deviceconfig/alive
6 changes: 6 additions & 0 deletions mender/templates/deviceconnect/_podtemplate.yaml
@@ -51,11 +51,17 @@ spec:
path: /api/internal/v1/deviceconnect/health
port: 8080
periodSeconds: 15
{{- with include "mender.probesOverrides" (dict "default" .dot.Values.default.probesOverrides "override" .dot.Values.deviceconnect.probesOverrides ) }}
{{- nindent 6 . }}
{{- end }}
livenessProbe:
httpGet:
path: /api/internal/v1/deviceconnect/alive
port: 8080
periodSeconds: 5
{{- with include "mender.probesOverrides" (dict "default" .dot.Values.default.probesOverrides "override" .dot.Values.deviceconnect.probesOverrides ) }}
{{- nindent 6 . }}
{{- end }}
startupProbe:
httpGet:
path: /api/internal/v1/deviceconnect/alive
6 changes: 6 additions & 0 deletions mender/templates/devicemonitor/_podtemplate.yaml
@@ -51,11 +51,17 @@ spec:
path: /api/internal/v1/devicemonitor/health
port: 8080
periodSeconds: 15
{{- with include "mender.probesOverrides" (dict "default" .dot.Values.default.probesOverrides "override" .dot.Values.devicemonitor.probesOverrides ) }}
{{- nindent 6 . }}
{{- end }}
livenessProbe:
httpGet:
path: /api/internal/v1/devicemonitor/alive
port: 8080
periodSeconds: 5
{{- with include "mender.probesOverrides" (dict "default" .dot.Values.default.probesOverrides "override" .dot.Values.devicemonitor.probesOverrides ) }}
{{- nindent 6 . }}
{{- end }}
startupProbe:
httpGet:
path: /api/internal/v1/devicemonitor/alive