Convert db init/upgrade job: pre-install/pre-upgrade #142

Merged 1 commit on Dec 31, 2024
8 changes: 3 additions & 5 deletions charts/zabbix/README.md.gotmpl
@@ -297,19 +297,17 @@ would like to use TimescaleDB instead, check the comments in the ``values.yaml``

# Support of native Zabbix Server High Availability

Since version 6.0, Zabbix has his own implementation of [High Availability](https://www.zabbix.com/documentation/current/en/manual/concepts/server/ha), which is a simple approach to realize a Hot-Standby high availability setup with Zabbix Server. This feature applies only to Zabbix Server component, not Zabbix Proxy, Webdriver, Web Frontend or such. In a Zabbix monitoring environment, by design, there can only be one central active Zabbix Server taking over the responsibility of storing data into database, calculating triggers, sending alerts, evt. The native High Availability concept does not change that, it just implements a way to have additional Zabbix Server processes being "standby" and "jumping in" as soon as the active one does not report it's availability (updating a table in the database), anymore. As such, the Zabbix Server High Availability works well together (and somewhat requires, to be an entirely high available setup), an also high available database setup. High availability of Postgres Database is not covered by this Helm Chart, but can rather easily be achieved by using one of the well-known Postgresql database operators [PGO](https://github.com/CrunchyData/postgres-operator) and [CNPG](https://cloudnative-pg.io), which are supported to be used with this Helm Chart.
Since version 6.0, Zabbix has its own implementation of [High Availability](https://www.zabbix.com/documentation/current/en/manual/concepts/server/ha), which is a simple approach to realize a Hot-Standby high availability setup with Zabbix Server. This feature applies only to the Zabbix Server component, not Zabbix Proxy, Webdriver, Web Frontend or such. In a Zabbix monitoring environment, by design, there can only be one central active Zabbix Server taking over the responsibility of storing data into the database, calculating triggers, sending alerts, etc. The native High Availability concept does not change that; it just implements a way to have additional Zabbix Server processes being "standby" and "jumping in" as soon as the active one no longer reports its availability (by updating a table in the database). As such, the Zabbix Server High Availability works well together with (and, for a fully highly available setup, effectively requires) a highly available database setup. High availability of the Postgres database is not covered by this Helm Chart, but can rather easily be achieved by using one of the well-known PostgreSQL database operators [PGO](https://github.com/CrunchyData/postgres-operator) and [CNPG](https://cloudnative-pg.io), both of which are supported for use with this Helm Chart.

For the HA feature, which was not designed for use in Kubernetes, to work in K8S, some challenges had to be overcome, primarily the fact that Zabbix Server does not allow upgrading or initializing the database schema when running with HA mode enabled. The procedure intended by Zabbix is to turn HA mode off, perform the major release upgrade, and turn HA mode back on. This does not fit Kubernetes concepts. Besides that, some additional circumstances led us to the following implementation:

* added a portion in values.yaml generally switching "Zabbix Server HA" on or off. If turned off, the Zabbix Server deployment will always be started with 1 replica and without the ZBX_HANODENAME env variable. This is an easy-to-use setup with no additional job pods, but it's not possible to just scale up zabbix server pods from here
* when .Values.zabbixServer.zabbixServerHA.enabled is set to true, a Kubernetes Job, marked as Helm post-install,post-upgrade hook, is being deployed together with a Role, Rolebinding and ServiceAccount, allowing this job pod to execute some changes via Kubernetes API. The job runs after each installation and upgrade process, scales down zabbix server pods if needed, manages db entries for active HA and non-HA server nodes being connected to the database, etc. Additionally, this job figures out whether a migration from a non-HA enabled setup to a HA-enabled one has been done, and handles necessary actions (scale down pods, delete entries in database) accordingly
* the sidecar containers running together with the Zabbix Server pods have been updated not only to prevent starting Zabbix Server pods when database is not available, but also when the schema version of the database is not yet the correct one, adding an additional layer of preventing pods from crashing
* when .Values.zabbixServer.zabbixServerHA.enabled is set to true, a Kubernetes Job, marked as a Helm pre-install,pre-upgrade hook, is deployed to prepare the database and its schema version (by "schema", we refer to the tables, their structure, etc.) before any Zabbix Server pods try to access the database. This job also handles major release upgrades. When the job is started in a `helm upgrade` situation, it scales down the zabbix server deployment before upgrading the database schema, manages entries in the database's `ha_node` table, etc. Additionally, this job figures out whether a migration from a non-HA setup to an HA-enabled one has taken place, and handles the necessary actions (scale down pods, delete entries in database) accordingly. The image is based on the zabbix_server image and can be found [here](https://github.com/zabbix-community/helm-zabbix-image-db-init-upgrade-job).
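
As a rough illustration of the settings described above, a `values.yaml` override might look like the following. The key names are taken from the job template changed in this PR; the concrete values (`latest`, `7.0-latest`) are illustrative assumptions, not documented defaults.

```yaml
# Sketch of a values.yaml override; values are illustrative assumptions.
zabbixServer:
  zabbixServerHA:
    # Turns on native HA and, with it, the pre-install/pre-upgrade database job.
    enabled: true
    dbCreateUpgradeJob:
      image:
        # Optional explicit tag for the job image. When left unset, the job
        # template derives "<major.minor>-<tagSuffix>" from the Zabbix image tag.
        # tag: "7.0-latest"
        tagSuffix: latest
```

Leaving `tag` unset keeps the job image aligned with the major.minor version of the Zabbix Server image, which is what the template falls back to.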

Additionally, in order to make it possible to use **Active checks** and **Active Zabbix Proxies** with a Zabbix Server setup having High Availability enabled, a **HA Labels sidecar** has been introduced, continuously monitoring the Zabbix server pod for amount of running Zabbix server processes to figure out whether the Pod is being "active" or "standby" Zabbix Server node, and updating HA-related labels on the pod, accordingly.
Additionally, to make it possible to use **Active checks** and **Active Zabbix Proxies** with a Zabbix Server setup that has High Availability enabled, an **HA Labels sidecar** has been introduced. It continuously monitors the number of running Zabbix server processes in the Zabbix server pod to figure out whether the pod is the "active" or a "standby" Zabbix Server node, and updates HA-related labels on the pod accordingly. The image for these sidecar containers can be found [here within this Github organization](https://github.com/zabbix-community/helm-zabbix-image-ha-labels-sidecar).

The reason to implement it this way and not by probing the port number, which was my initial approach, is that probing the port of Zabbix Server will make it generate a message in the log, stating that a connection without a proper payload has been initiated towards the Zabbix Server. More info: #115
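
To illustrate why pod labels help here, the following is a minimal sketch of a Service that routes traffic only to the pod currently labelled as active. The Service name, the `app` label, and the `example.com/ha-role` label key are hypothetical placeholders; the labels actually written by the sidecar may differ, so check the chart's templates before relying on them.

```yaml
# Hypothetical Service selecting only the active Zabbix Server pod.
apiVersion: v1
kind: Service
metadata:
  name: zabbix-server-active          # hypothetical name
spec:
  selector:
    app: zabbix-server                # hypothetical pod label
    example.com/ha-role: active       # hypothetical key; the sidecar's real label may differ
  ports:
    - name: trapper
      port: 10051                     # default Zabbix Server trapper port
      targetPort: 10051
```

Because the selector follows the label rather than a fixed pod, active agents and active proxies would keep reaching whichever node currently holds the active role after a failover.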


# Thanks

> **About the new home of helm chart**
211 changes: 0 additions & 211 deletions charts/zabbix/templates/configmap-zabbix-server-init-waitschema.yaml

This file was deleted.

25 changes: 1 addition & 24 deletions charts/zabbix/templates/deployment-zabbix-server.yaml
@@ -88,31 +88,8 @@ spec:
tolerations:
{{- toYaml . | nindent 8 }}
{{- end }}
initContainers:
{{- if .Values.zabbixServer.zabbixServerHA.enabled }}
- name: init-wait-for-database-schema
{{- if .Values.zabbixServer.image.tag }}
image: "{{ .Values.zabbixServer.image.repository }}:{{ .Values.zabbixServer.image.tag }}"
{{- else }}
image: "{{ .Values.zabbixServer.image.repository }}:{{ .Values.zabbixImageTag }}"
{{- end }}
env:
{{- include "zabbix.postgresAccess.variables" (list $ . "zabbix") | nindent 12 }}
{{- with .Values.zabbixServer.extraEnv }}
{{- toYaml . | nindent 12 }}
{{- end }}
securityContext:
{{- toYaml .Values.zabbixServer.securityContext | nindent 12 }}
resources:
{{- toYaml .Values.zabbixServer.resources | nindent 12 }}
command:
- "/bin/bash"
- "/script/wait_db_schema.sh"
volumeMounts:
- name: init-waitschema-script
mountPath: /script
{{- end }}
{{- with .Values.zabbixServer.extraInitContainers }}
initContainers:
{{- toYaml . | nindent 8 }}
{{- end }}
containers:
20 changes: 16 additions & 4 deletions charts/zabbix/templates/job-create-upgrade-db.yaml
@@ -17,7 +17,7 @@ metadata:
{{- toYaml .Values.zabbixServer.zabbixServerHA.dbCreateUpgradeJob.jobLabels | nindent 6 }}
{{- end }}
annotations:
"helm.sh/hook": post-install,post-upgrade
"helm.sh/hook": pre-install,pre-upgrade
"helm.sh/hook-weight": "-5"
"helm.sh/hook-delete-policy": hook-succeeded
{{- range $key,$value := .Values.zabbixServer.zabbixServerHA.dbCreateUpgradeJob.jobAnnotations }}
@@ -42,25 +42,29 @@ spec:
tolerations:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- if .Release.IsUpgrade }}
serviceAccountName: {{ template "zabbix.fullname" . }}-ha-helper
{{- end }}
containers:
{{- with .Values.zabbixServer.zabbixServerHA.dbCreateUpgradeJob.extraContainers }}
{{- toYaml . | nindent 8 }}
{{- end }}
- name: create-upgrade-db
{{- $pattern := "[0-9]+\\.[0-9]+" -}}
{{- $tag := "" -}}
{{- if .Values.zabbixServer.image.tag }}
{{- if .Values.zabbixServer.zabbixServerHA.dbCreateUpgradeJob.image.tag }}
{{- $tag = .Values.zabbixServer.zabbixServerHA.dbCreateUpgradeJob.image.tag }}
{{- else if .Values.zabbixServer.image.tag }}
{{- $zabbixTag := .Values.zabbixServer.image.tag -}}
{{- $match := regexFind $pattern $zabbixTag -}}
{{- if $match }}
{{- $tag = printf "%s-latest" $match -}}
{{- $tag = printf "%s-%s" $match .Values.zabbixServer.zabbixServerHA.dbCreateUpgradeJob.image.tagSuffix -}}
{{- end }}
{{- else }}
{{- $globalTag := .Values.zabbixImageTag -}}
{{- $match := regexFind $pattern $globalTag -}}
{{- if $match }}
{{- $tag = printf "%s-latest" $match -}}
{{- $tag = printf "%s-%s" $match .Values.zabbixServer.zabbixServerHA.dbCreateUpgradeJob.image.tagSuffix -}}
{{- end }}
{{- end }}
{{- if eq $tag "" }}
@@ -78,6 +82,14 @@ spec:
env:
- name: ZBX_SERVER_DEPLOYMENT_NAME
value: {{ template "zabbix.fullname" . }}-zabbix-server
- name: HELM_HOOK_TYPE
{{- if .Release.IsUpgrade }}
value: upgrade
{{- else if .Release.IsInstall }}
value: install
{{- else }}
value: unknown
{{- end }}
{{- include "zabbix.postgresAccess.variables" (list $ . "zabbix") | nindent 10 }}
{{- with .Values.zabbixServer.extraEnv }}
{{- toYaml . | nindent 10 }}