[scalar-manager] Create the Helm chart for Scalar Manager #254
Conversation
charts/scalar-manager/Chart.yaml
version: 2.0.0-SNAPSHOT
appVersion: 2.0.0-SNAPSHOT
Now we only have 1.0.0-SNAPSHOT images.
I use 2.0.0-SNAPSHOT here because we will bump the SNAPSHOT image version to 2.0.0 after we officially release Scalar Manager.
- apiGroups: ["", "apps", "batch", "rbac.authorization.k8s.io"]
  resources: ["pods", "deployments", "services", "namespaces", "configmaps", "secrets", "cronjobs", "serviceaccounts", "roles", "rolebindings", "jobs"]
  verbs: ["get", "list", "create", "patch", "delete", "update"]
I put all resources and verbs in one rule for convenience.
However, this rule grants too many permissions.
I guess it would be better practice to separate them accordingly?
In general, you should define separate rules for each API group. For example, the current configuration grants permissions to a pods resource in the apps group, which does not exist. This can become a problem if such a resource is created in the future and should not be granted.
Thank you for your advice.
I made a change according to your suggestion in 3182519
The rules are now:
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "namespaces", "configmaps", "secrets", "serviceaccounts"]
    verbs: ["get", "list", "create", "patch", "delete", "update"]
  - apiGroups: ["batch"]
    resources: ["cronjobs", "jobs"]
    verbs: ["get", "list", "create", "delete"]
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list"]
  - apiGroups: ["rbac.authorization.k8s.io"]
    resources: ["roles", "rolebindings"]
    verbs: ["get", "list", "create", "delete"]
In the core group, for example, this application doesn't delete or update pods. In that case, should we also separate the resources by their required verbs, as a general practice?
For example:
rules:
  - apiGroups: [""]
    resources: ["configmaps", "secrets", "serviceaccounts"]
    verbs: ["get", "list", "create", "patch", "delete", "update"]
  - apiGroups: [""]
    resources: ["pods", "services", "namespaces"]
    verbs: ["get", "list"]
this application doesn't delete or update pods. In that case, should we also separate the resources by their required verbs, as a general practice?
Yes, that is correct. Based on the principle of least privilege, unnecessary privileges should not be granted.
@superbrothers
Thank you very much.
Sorry, I recalled that we probably need these permissions because the application in this Helm chart (Scalar Manager) uses Helm to install another Helm chart. Allow me to keep them.
Thank you for updating the chart!
I left some suggestions and questions.
Please take a look when you have time!
charts/scalar-manager/values.yaml
serviceAccount:
  # Specifies whether a service account should be created
  create: true
If my understanding is correct, Scalar Manager always needs a ServiceAccount to call some Kubernetes APIs. In other words, there is no use case where a ServiceAccount is unnecessary (i.e., configuring serviceAccount.create=false) in the Scalar Manager Helm chart.
Is my understanding correct?
(If yes, I think the create (true or false) parameter might cause some confusion...)
Also, in recent charts where we always need to create a ServiceAccount (e.g., the ScalarDB Cluster chart), we don't provide a *.create (true or false) parameter.
(Previously, we used *.rbac.create in some charts, but that was for Pod Security Policy (PSP) purposes. Since PSP has already been deprecated, we no longer use *.rbac.create for these resources.)
In addition, to keep configuration consistency between each Scalar Helm Chart, it would be better to follow the ScalarDB Cluster chart as follows:
https://github.com/scalar-labs/helm-charts/blob/scalardb-cluster-1.3.2/charts/scalardb-cluster/values.yaml#L270-L274
scalardbCluster:
  serviceAccount:
    serviceAccountName: "foo"
    automountServiceAccountToken: true
(Sorry, it is a bit long and has a duplicated key name... That was my mistake... But I want to prioritize consistency at this point.)
FYI:
In the ScalarDB Cluster chart, we support the following two features:
- If users do NOT specify scalardbCluster.serviceAccount.serviceAccountName, the chart creates a ServiceAccount automatically and adds the necessary permissions by using Role and RoleBinding.
- If users specify scalardbCluster.serviceAccount.serviceAccountName, the chart uses the existing (user-created) ServiceAccount and adds the necessary permissions to it by using Role and RoleBinding.
One of the use cases for 2. is AWS Marketplace. If you use AWS Marketplace, you have to apply an AWS IAM Role to the pod to call the AWS Marketplace metering API.
In this case, users must create a ServiceAccount by following the AWS documentation and mount it to the ScalarDB Cluster pod.
To support such operations, the ScalarDB Cluster chart supports either creating a ServiceAccount automatically (the default behavior) or using an existing ServiceAccount.
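For reference, here is a minimal sketch of how that behavior could look in this chart's templates, assuming a serviceAccount.serviceAccountName value like the one in the ScalarDB Cluster chart (the helper body and file layout below are illustrative, not the final implementation):

{{/* Use the user-specified ServiceAccount if given; otherwise fall back to a chart-generated name */}}
{{- define "scalar-manager.serviceAccountName" -}}
{{- default (printf "%s-sa" (include "scalar-manager.fullname" .)) .Values.serviceAccount.serviceAccountName -}}
{{- end -}}

# templates/serviceaccount.yaml: only rendered when no existing ServiceAccount is specified
{{- if not .Values.serviceAccount.serviceAccountName }}
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ include "scalar-manager.serviceAccountName" . }}
  namespace: {{ .Release.Namespace }}
{{- end }}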
As we discussed, let's use the same implementation as ScalarDB Cluster chart.
{{- if .Values.serviceAccount.create }}
{{- default (include "scalar-manager.fullname" .) .Values.serviceAccount.name }}
{{- else }}
{{- print (include "scalar-manager.fullname" .) "-sa" | trunc 63 | trimSuffix "-" }}
{{- default "default" .Values.serviceAccount.name }}
As I mentioned in another comment, to keep configuration consistency between each Scalar Helm Chart, I think it would be better to use the same implementation as the ScalarDB Cluster chart.
https://github.com/scalar-labs/helm-charts/blob/scalardb-cluster-1.3.2/charts/scalardb-cluster/values.yaml#L270-L274
What do you think?
@@ -0,0 +1,19 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
Just a question. What is the reason why we use ClusterRole instead of Role?
In other words, does Scalar Manager access the following kinds of resources that require a ClusterRole?
https://kubernetes.io/docs/reference/access-authn-authz/rbac/#clusterrole-example
- cluster-scoped resources (like nodes)
- non-resource endpoints (like /healthz)
- namespaced resources (like Pods), across all namespaces
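For comparison, a minimal sketch of what a namespaced Role could look like if Scalar Manager only needed resources in its own namespace (the rules below are illustrative, not taken from this chart; note that namespaces themselves are cluster-scoped, so listing them would still require a ClusterRole):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: {{ include "scalar-manager.fullname" . }}
  namespace: {{ .Release.Namespace }}
rules:
  # Only namespaced resources can be granted through a Role
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps", "secrets", "serviceaccounts"]
    verbs: ["get", "list", "create", "patch", "delete", "update"]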
charts/scalar-manager/values.yaml
# refreshInterval -- The interval that Scalar Manager refresh the status of the monitoring targets
refreshInterval: 30
type: LoadBalancer
type: LoadBalancer
type: ClusterIP
I think it would be better to use ClusterIP as the default value of service.type.
This is because if we set LoadBalancer by default, an actual Load Balancer will be created unexpectedly even if users don't set any Load Balancer configurations (annotations).
It might accidentally expose some ports to the LAN or WAN even if a user does not want to expose them. In other words, there is a risk of exposing ports unexpectedly.
What do you think?
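A minimal values.yaml sketch of that idea; the AWS annotation is only an example of an explicit opt-in and is not something this chart requires:

service:
  # ClusterIP by default, so no Load Balancer is provisioned unless users explicitly ask for one
  type: ClusterIP
  # Users who really want a Load Balancer can override the type and add provider-specific
  # annotations themselves, for example:
  # type: LoadBalancer
  # annotations:
  #   service.beta.kubernetes.io/aws-load-balancer-internal: "true"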
As we discussed, we will set ClusterIP as the default value in the values.yaml.
charts/scalar-manager/values.yaml
grafanaKubernetesServiceLabelName: "app.kubernetes.io/name"
grafanaKubernetesServiceLabelValue: "grafana"
grafanaKubernetesServicePortName: "http-web"
prometheusKubernetesServiceLabelName: "app"
prometheusKubernetesServiceLabelValue: "kube-prometheus-stack-prometheus"
prometheusKubernetesServicePortName: "http-web"
lokiKubernetesServiceLabelName: "app"
lokiKubernetesServiceLabelValue: "loki"
lokiKubernetesServicePortName: "http-metrics"
helmScalarRepositoryName: "scalar-labs"
helmScalarRepositoryUrl: "https://scalar-labs.github.io/helm-charts"
helmScalarAdminForKubernetesChartName: "scalar-admin-for-kubernetes"
helmScalarAdminForKubernetesChartVersion: "1.0.0"
pausedStateRetentionStorage: "configmap"
pausedStateRetentionMaxNumber: "100"
Just a question. Does Scalar Manager only support configuration via environment variables?
If Scalar Manager can accept configuration files like .properties, I think it would be better to support accepting arbitrary configuration files in the Helm chart.
This is because if we add a new parameter on the Scalar Manager side, we must also update the Scalar Helm Chart side to add the new parameter as a custom value.
For example, in the ScalarDB Cluster chart, to avoid such updates, we accept all configurations via the database.properties file.
So, I think it would be better to accept arbitrary configurations via configuration files if we can, to reduce maintenance costs.
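As an illustration, a minimal sketch of that pass-through pattern, assuming a value such as api.applicationProperties (the value key, ConfigMap name, and property key below are placeholders, not the chart's actual names):

# values.yaml: the content is passed through verbatim, so new Scalar Manager
# parameters do not require changes on the chart side
api:
  applicationProperties: |
    example.property.key=example-value

# templates/scalar-manager/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "scalar-manager.fullname" . }}-properties
data:
  application.properties: |
{{ .Values.api.applicationProperties | indent 4 }}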
As we discussed, we will support accepting an arbitrary application.properties instead of the current environment-variable format.
template:
  metadata:
    annotations:
      checksum/config: {{ include (print $.Template.BasePath "/scalar-manager/configmap.yaml") . | sha256sum }}
checksum/config: {{ include (print $.Template.BasePath "/scalar-manager/configmap.yaml") . | sha256sum }}
I think this ConfigMap (/scalar-manager/configmap.yaml) is used for storing cluster or pause information, and we don't update this ConfigMap directly by using the helm upgrade command (it will be updated by the Scalar Manager API).
If the above understanding is correct, this checksum/config annotation doesn't work as intended. Also, I think we don't need this annotation because we don't need to restart (re-create) pods when the ConfigMap is updated.
So, I think we can remove this annotation.
As we discussed, we will remove this annotation.
As we discussed in another comment, we decided to use application.properties. So, we will keep this annotation to apply the updated application.properties properly when we run the helm upgrade command.
  value: {{ .Values.api.pausedStateRetentionStorage | quote }}
- name: PAUSED_STATE_RETENTION_MAX_NUMBER
  value: {{ .Values.api.pausedStateRetentionMaxNumber | quote }}
- name: {{ .Chart.Name }}-web
If I remember correctly, Scalar Manager works as follows.
+-[Kubernetes Cluster A]---+ +-[Kubernetes Cluster B]---+ +-[Kubernetes Cluster C]---+
| | | | | |
| +--------------------+ | | +--------------------+ | | +--------------------+ |
| | Scalar products | | | | Scalar products | | | | Scalar products | |
| +--------------------+ | | +--------------------+ | | +--------------------+ |
| | | | | |
| +--------------------+ | | +--------------------+ | | +--------------------+ |
| | Scalar Manager API | | | | Scalar Manager API | | | | Scalar Manager API | |
| +--------------------+ | | +--------------------+ | | +--------------------+ |
| | | | | | | | |
+------------+-------------+ +------------+-------------+ +------------+-------------+
| | |
| | |
| | |
+-----------------------------+-----------------------------+
|
|
|
+---------+----------+
| Scalar Manager Web |
+--------------------+
So, we don't need to deploy Scalar Manager API and Scalar Manager Web in the same pod. On the contrary, it would be better to deploy Scalar Manager API and Scalar Manager Web separately.
Is my understanding correct? (Sorry, I might be missing some Scalar Manager specifications...)
As we discussed, at this time we must deploy the api and web containers in one pod. It's a specification of the current Scalar Manager.
resources:
{{- toYaml .Values.resources | nindent 12 }}
ports:
- containerPort: 3000
imagePullPolicy: {{ .Values.web.image.pullPolicy }}
securityContext:
{{- toYaml .Values.securityContext | nindent 12 }}
Just a confirmation. Is it fine to set the same resources and securityContext for both containers?
For example, if one of the workloads is heavier, users might want to set resources separately.
So, I want to confirm whether the requirements for resources and securityContext are completely the same between Scalar Manager API and Scalar Manager Web.
As we discussed, we will separate the resources configuration into api.resources and web.resources, and keep the securityContext as is.
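A minimal values.yaml sketch of the separated configuration; the request and limit figures are placeholder examples, not recommendations:

api:
  resources:
    requests:
      cpu: "500m"
      memory: "512Mi"
    limits:
      memory: "1Gi"
web:
  resources:
    requests:
      cpu: "250m"
      memory: "256Mi"
# securityContext remains a single value applied to both containers
securityContext:
  runAsNonRoot: true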
restartPolicy: Always
serviceAccountName: {{ include "scalar-manager.serviceAccountName" . }}
automountServiceAccountToken: {{ .Values.serviceAccount.automount }}
terminationGracePeriodSeconds: 90
Just a question. How long does Scalar Manager take to stop gracefully?
To consider the value of terminationGracePeriodSeconds, I want to confirm how long it takes.
As we discussed, we can use the default value. So, we can remove this configuration from here.
@@ -0,0 +1,12 @@
apiVersion: v1 |
As we discussed, we will update Scalar Manager API to create the ConfigMap by itself and remove this ConfigMap from the Helm chart side in a future release.
restartPolicy: Always
serviceAccountName: {{ include "scalar-manager.serviceAccountName" . }}
automountServiceAccountToken: {{ .Values.serviceAccount.automount }}
terminationGracePeriodSeconds: 90
As we discussed, we can use the default value. So, we can remove this configuration from here.
template:
  metadata:
    annotations:
      checksum/config: {{ include (print $.Template.BasePath "/scalar-manager/configmap.yaml") . | sha256sum }}
As we discussed, we will remove this annotation.
name: {{ include "scalar-manager.fullname" . }}
namespace: {{ .Release.Namespace }}
labels:
  app.kubernetes.io/app: {{ include "scalar-manager.fullname" . }}
As we discussed, we will move this label to _helpers.tpl.
  value: {{ .Values.api.pausedStateRetentionStorage | quote }}
- name: PAUSED_STATE_RETENTION_MAX_NUMBER
  value: {{ .Values.api.pausedStateRetentionMaxNumber | quote }}
- name: {{ .Chart.Name }}-web
As we discussed, at this time we must deploy the api and web containers in one pod. It's a specification of the current Scalar Manager.
charts/scalar-manager/values.yaml
serviceAccount:
  # Specifies whether a service account should be created
  create: true
As we discussed, let's use the same implementation as ScalarDB Cluster chart.
charts/scalar-manager/values.yaml
grafanaKubernetesServiceLabelName: "app.kubernetes.io/name"
grafanaKubernetesServiceLabelValue: "grafana"
grafanaKubernetesServicePortName: "http-web"
prometheusKubernetesServiceLabelName: "app"
prometheusKubernetesServiceLabelValue: "kube-prometheus-stack-prometheus"
prometheusKubernetesServicePortName: "http-web"
lokiKubernetesServiceLabelName: "app"
lokiKubernetesServiceLabelValue: "loki"
lokiKubernetesServicePortName: "http-metrics"
helmScalarRepositoryName: "scalar-labs"
helmScalarRepositoryUrl: "https://scalar-labs.github.io/helm-charts"
helmScalarAdminForKubernetesChartName: "scalar-admin-for-kubernetes"
helmScalarAdminForKubernetesChartVersion: "1.0.0"
pausedStateRetentionStorage: "configmap"
pausedStateRetentionMaxNumber: "100"
As we discussed, we will support accepting an arbitrary application.properties instead of the current environment-variable format.
template:
  metadata:
    annotations:
      checksum/config: {{ include (print $.Template.BasePath "/scalar-manager/configmap.yaml") . | sha256sum }}
As we discussed in another comment, we decided to use application.properties. So, we will keep this annotation to apply the updated application.properties properly when we run the helm upgrade command.
@@ -0,0 +1,12 @@
apiVersion: v1 |
As we discussed, we noticed that this ConfigMap overwrites some stored metadata (e.g., managed cluster or pause duration info) when we run the helm upgrade command. So, it would be better to remove this ConfigMap from the Helm chart side and create it on the Scalar Manager API side.
LGTM!
- containerPort: 3000
imagePullPolicy: {{ .Values.web.image.pullPolicy }}
securityContext:
  runAsUser: 1000
Question 1
runAsUser: 1000
runAsUser: 1001
It seems that we use 1001 on the Dockerfile side of Scalar Manager API.
https://github.com/scalar-labs/scalar-manager-web/blob/main/Dockerfile#L41
Is this UID 1000 correct? Or should we use 1001 instead of 1000?
Question 2
It seems that we specify the USER directive on the Dockerfile side, and we set runAsNonRoot: true on the Helm chart side as a default value.
In this case, I think the scalar-manager-api container image will run with UID 1001 automatically.
Do we really need to specify runAsUser as a hard-coded value here? If we don't need it, I think it would be better to remove this hard-coded value. (For example, I guess this hard-coded value might cause some issues in an OpenShift environment.)
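For illustration, a minimal sketch of the securityContext without the hard-coded UID, relying on the image's USER directive instead (the exact fields this chart exposes are an assumption here):

securityContext:
  # No runAsUser: the UID comes from the image's USER directive.
  # runAsNonRoot only verifies that the effective user is non-root, which is why
  # the image must declare a numeric USER rather than a user name.
  runAsNonRoot: true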
@kota2and3kan
Thank you for your advice.
1000 was a typo. It should be 1001.
Regarding question 2:
I did this because I encountered an issue when creating the container with the runAsNonRoot security context. This is the error message from kubectl describe pod <pod-name>:
Warning Failed 12s (x7 over 65s) kubelet Error: container has runAsNonRoot and image has non-numeric user (nextjs), cannot verify user is non-root (pod: "scalar-manager-57cdcb57d4-pbvvc_scalar-manager(da3940c6-0d80-4287-aa2a-1e9b1f8adc6d)", container: scalar-manager-web)
I am not sure if it's because of the USER command in the Dockerfile here https://github.com/scalar-labs/scalar-manager-web/blob/main/Dockerfile#L54
It specifies the user name but not the user ID.
I tried to change it locally to USER 1001, like we did for the API:
https://github.com/scalar-labs/scalar-manager-api/blob/main/Dockerfile#L22
However, my internet is too slow here to finish building the image, so I couldn't really test it.
(The node packages are just a mess...)
Anyway, since it's not good practice to use the runAsUser context, I removed it in 5502ca0.
I will create a PR in the web repository to specify the user by user ID instead of user name, and try to test it later with a better Internet connection.
The PR that uses the numeric user ID instead of the user name has been submitted:
https://github.com/scalar-labs/scalar-manager-web/pull/70
I guess the CI failed because of the removal of runAsUser.
We can test it after we merge https://github.com/scalar-labs/scalar-manager-web/pull/70 (the workflow will create the snapshot image, and we can re-run the action here).
@supl
Thank you for the updates!
As I mentioned on the Scalar Manager Web side, according to a code comment in Kubernetes, it seems that we MUST use a number (UID) instead of a name (string) in the container image if we want to use a non-root user.
Thank you for the updates! Overall it looks good to me, but I am waiting for the release of Scalar Manager Web and Scalar Manager API, just in case. After they are released (and the CI passes), I will merge this PR and release the Helm chart.
@kota2and3kan
LGTM! Thank you!
Description
This PR deletes the previous Helm chart of Scalar Manager and creates a new one from scratch.
This is my first attempt. Please forgive me if I missed some practices or didn't do things properly.
Related issues and/or PRs
N/A
Changes made
Checklist