Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-30306 Allow arbitrary script based plane validation #17785

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions helm/hpcc/docs/expert.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,8 @@ global:
NB: Some components (e.g. DfuServer and Thor) also have an 'expert' settings area (see values schema) that can be used for relavent settings
on a per component instance basis, rather than setting them globally.

Planes can also have an expert section (see Plane Expert Settings section)


The following options are currently available:

Expand Down Expand Up @@ -70,3 +72,12 @@ Foreign file reads (~foreign::) are forbidden by default since the official sant
service via remote file reads with the ~remote:: syntax.
Setting expert.allowForeign to true, enables foreign access for compatibility with legacy bare-metal environments
that have their Dali and Dafilesrv's open.


# Plane Expert Settings

## validatePlaneScript (list of { string })

Optional list of bash commands to execute within an init container in pods that use this plane.
This can be used to validate that the plane is healthy, e.g. that it is mounted as expected.
If the script returns a non-zero result, the init container and therefore the pod will fail.
29 changes: 27 additions & 2 deletions helm/hpcc/templates/_helpers.tpl
Original file line number Diff line number Diff line change
Expand Up @@ -814,9 +814,9 @@ Specifically for now (but could be extended), this container generates sysctl co
A kludge to ensure until the mount of a PVC appears (this can happen with some types of host storage)
*/}}
{{- define "hpcc.waitForMount" -}}
- name: wait-mount-container
- name: {{ printf "wait-mount-container-%s" .volumeName }}
{{- include "hpcc.addImageAttrs" . | nindent 2 }}
command: ["/bin/sh"]
command: ["/bin/bash"]
args:
- "-c"
- {{ printf "until mountpoint -q %s; do sleep 5; done" .volumePath }}
Expand All @@ -825,6 +825,25 @@ A kludge to ensure until the mount of a PVC appears (this can happen with some t
mountPath: {{ .volumePath | quote }}
{{- end }}

{{/*
Inject container to perform any post plane initialization validation
Pass in dict with volumeName, volumePath and cmds
*/}}
{{- define "hpcc.validatePlaneScript" -}}
- name: {{ printf "validate-plane-script-container-%s" .volumeName }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be worth combining this into the wait for mount container if possible? What is the extra overhead starting a new container?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well both are a kludge, but I wouldn't see both being used in tandem.
I was talking to the person behind the TF module that provisions the raided nvme on the nodes, and he thinks that it may require a new CSI behind the persistent volume to correctly wait for the mount to be fully ready.
But in the short term, in the next release, he is adding a marker file in the mount this mechanism could look for.

All less than ideal, once k8s pods see the pvc, that should be it - we shouldn't have to wait or check like this, but I think the provisioning of these raided nvme's is unusual and not fully supported by AKS out of the box.

{{- include "hpcc.addImageAttrs" . | nindent 2 }}
command: ["/bin/bash"]
args:
- -c
- |
{{- range $cmd := .cmds }}
{{ $cmd }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a ; after each command?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it's because the block scalar passes it as a script (complete with line breaks) which bash interprets as individual commands (as it would in a script.
e.g. it's the equivalent of:

/bin/bash -c "
echo 'command 1'
echo 'command 2'
echo 'command 3'
"

The example in the JIRA is generated as:

          initContainers:
          - args:
            - |
              echo "Hello, World!"
              ls -l
              df -h
              end_time=$((SECONDS + 1200)); while [ $SECONDS -lt $end_time ]; do if mountpoint -q '/var/lib/HPCCSystems/hpcc-data'; then device=$(df /var/lib/HPCCSystems/hpcc-data | awk 'NR==2 {print $1}'); if [[ "$device" == "/dev/md0" ]]; then echo "correct mount device"; exit 0; else echo "wrong mount device \"$device\", will look again!"; fi; fi; echo waiting for mount point; sleep 5; done; echo failed; exit 1
            command:
            - /bin/bash
            - -c

.. and executes as expected.

{{- end }}
volumeMounts:
- name: {{ .volumeName | quote}}
mountPath: {{ .volumePath | quote }}
{{- end }}


{{/*
A kludge to ensure mounted storage (e.g. for nfs, minikube or docker for desktop) has correct permissions for PV
Expand Down Expand Up @@ -888,6 +907,12 @@ NB: uid=10000 and gid=10001 are the uid/gid of the hpcc user, built into platfor
{{- $volumeName := (printf "%s-pv" $plane.name) -}}
{{- include "hpcc.waitForMount" (dict "root" $root "me" $component "uid" $uid "gid" $gid "volumeName" $volumeName "volumePath" $plane.prefix) | nindent 0 }}
{{- end -}}
{{- if hasKey $plane "expert" -}}
{{- if $plane.expert.validatePlaneScript -}}
{{- $volumeName := (printf "%s-pv" $plane.name) -}}
{{- include "hpcc.validatePlaneScript" (dict "root" $root "me" $component "uid" $uid "gid" $gid "volumeName" $volumeName "volumePath" $plane.prefix "cmds" $plane.expert.validatePlaneScript) | nindent 0 }}
{{- end -}}
{{- end -}}
{{- end -}}
{{- end -}}
{{- end -}}
Expand Down
11 changes: 11 additions & 0 deletions helm/hpcc/values.schema.json
Original file line number Diff line number Diff line change
Expand Up @@ -544,6 +544,17 @@
"waitForMount": {
"type": "boolean"
},
"expert": {
"type": "object",
"description": "Custom internal options usually reserved for internal testing",
"properties": {
"validatePlaneScript": {
"description": "a list of bash commands to run to validate the plane is healthy",
"type": "array",
"items": { "type": "string" }
}
}
},
"blockedFileIOKB": {
"description": "Optimal block size for efficient reading from this plane. Implementations will use if they can",
"type": "integer",
Expand Down
Loading