
feat(helm chart): Add metrics-server hardening options #1288

Conversation

mkilchhofer
Contributor

@mkilchhofer mkilchhofer commented Jul 14, 2023

What this PR does / why we need it:

It adds options for securing the connection between metrics-server and the kube-apiserver.
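As a rough illustration of the three hardening modes exercised in the test cases later in this thread (only the existingSecret shape appears verbatim in this PR; the other two mode names are inferred from the CI values file names, `tls-helm-values.yaml` and `tls-certManager-values.yaml`, and may not match the final chart exactly):

```yaml
# Option 1: let Helm generate a self-signed certificate
tls:
  type: helm
---
# Option 2: delegate certificate issuance to cert-manager
tls:
  type: cert-manager
---
# Option 3: bring your own certificate in a pre-created Secret
tls:
  type: existingSecret
  existingSecret:
    name: metrics-server-existing
apiService:
  insecureSkipTLSVerify: false
  caBundle: |
    -----BEGIN CERTIFICATE-----
    (..)
    -----END CERTIFICATE-----
```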

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

related to:

Additional info:
n/a

/cc: @serathius

@linux-foundation-easycla

linux-foundation-easycla bot commented Jul 14, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: mkilchhofer / name: Marco Maurer (-Kilchhofer) (86b05bc)

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 14, 2023
@k8s-ci-robot
Contributor

Welcome @mkilchhofer!

It looks like this is your first PR to kubernetes-sigs/metrics-server 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/metrics-server has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 14, 2023
@k8s-ci-robot
Contributor

Hi @mkilchhofer. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jul 14, 2023
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 14, 2023
@serathius
Contributor

/assign @stevehipwell

Contributor

@stevehipwell stevehipwell left a comment


Thanks for the PR @mkilchhofer, I've got a couple of high-level review points which it would be good if you could address before I review this in more detail

  • There is no need to run functions for the APIService if it's not enabled
  • The helm mode would be a better default (see below)
  • You don't need to recreate the cert on each Helm install, you can look up the existing secret
  • I don't think the chart should support creating a secret; the secret option should only reference an existing one
  • Please remove the extraObjects value, this isn't a "good" pattern

@mkilchhofer
Contributor Author

  • There is no need to run functions for the APIService if it's not enabled

What do you mean by that? The genSelfSignedCert function?

  • The helm mode would be a better default (see below)

Sure, will do

  • You don't need to recreate the cert on each Helm install, you can look up the existing secret

I think we should not rely on the lookup function. There are GitOps solutions in the wild which only use Helm for rendering (the helm template subcommand). Helm's lookup function returns nothing during template since, by design, it cannot talk to the kube-apiserver; only upgrade and install talk to the API server.
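For context, a hedged sketch of the lookup pattern being discussed (names are illustrative, not the chart's actual template; `genSelfSignedCert` is the Sprig helper already mentioned above):

```yaml
{{- /* illustrative only: reuse an existing cert if one is already in the
       cluster, otherwise generate a fresh self-signed one */}}
{{- $existing := lookup "v1" "Secret" .Release.Namespace "metrics-server-tls" }}
{{- if $existing }}
  {{- /* helm install/upgrade: keep the previously generated cert */}}
  tls.crt: {{ index $existing.data "tls.crt" }}
{{- else }}
  {{- /* helm template: lookup returns an empty map, so this branch always
         runs and a new cert is generated on every render */}}
  {{- $cert := genSelfSignedCert "metrics-server" nil (list "metrics-server") 365 }}
  tls.crt: {{ $cert.Cert | b64enc }}
{{- end }}
```

This is exactly the GitOps concern above: under `helm template` the else-branch always executes, so the certificate churns on every render.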

  • I don't think the chart should support creating a secret, the secret option should only be existing

Hmm, why not? It is pretty common for Helm charts to support this.

  • Please remove the extraObjects value, this isn't a "good" pattern

How can I test the existing secret method then? I think it is an approved feature request, so contributors can file PRs for this, ref:

@stevehipwell
Contributor

How do you mean that? The genSelfSignedCert function?

You moved the test for whether the APIService was enabled below the new code.
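In template terms, the ordering being asked for is roughly the following (the value path and resource names are assumptions for illustration, not the chart's actual code):

```yaml
{{- /* gate first, so no cert-generation helpers run when the
       APIService is disabled */}}
{{- if .Values.apiService.create }}
{{- $cert := genSelfSignedCert "metrics-server" nil (list "metrics-server") 365 }}
apiVersion: v1
kind: Secret
metadata:
  name: metrics-server-tls
type: kubernetes.io/tls
data:
  tls.crt: {{ $cert.Cert | b64enc }}
  tls.key: {{ $cert.Key | b64enc }}
{{- end }}
```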

I think we should not rely on the lookup function. There are GitOps solutions in the wild which only use helm for rendering (helm template subcommand). The lookup function of helm returns nothing during template as it cannot talk to the kubeapi by design. Only upgrade or install talks to the apiserver.

In that case they can use one of the other patterns; GitOps shouldn't define architecture, it should support best practice.

Hmm why not? It is pretty common that helm charts support this.

Because it's unnecessary and leaks data into the chart values.

How can I test the existing secret method then? I think it is an approved feature request, so contributors can file PRs for this, ref:

We could add a test secret as part of the CI initialisation, or it could be generated through the chart as a testing resource.

@mkilchhofer
Contributor Author

mkilchhofer commented Jul 17, 2023

Because it's unnecessary and leaks data into the chart values.

Well, why is that an issue? This is what Helm is designed for. These values are stored as a K8s secret.

@stevehipwell
Contributor

Well, why is that an issue? This is what Helm is designed for. These values are stored as a K8s secret.

It's the unnecessary part which is the primary concern here; but the Helm implementation and increased surface area for secret information shouldn't be discounted as unimportant.

But as I've said in my other replies you can easily extend this chart by creating a wrapper chart which depends on it and at which point you can make your own architectural decisions.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 18, 2023
@mkilchhofer mkilchhofer force-pushed the feature/helm-chart_hardening branch 3 times, most recently from 81f71cd to ee37851 Compare July 18, 2023 06:36
@mkilchhofer
Contributor Author

So could you confirm that you have tested all these configurations locally?

Yep, I tested it within KinD (kindest/node:v1.29.2):

Bootstrap a kind cluster

$ pwd
/Users/mkilchhofer/git/github/metrics-server/charts/metrics-server

$ kind create cluster --config ../../test/kind-config.yaml
Creating cluster "kind" ...
(..)

$ kubie ctx kind-kind

$ helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --wait \
  --set installCRDs=true \
  --set extraArgs='{--enable-certificate-owner-ref}'
NAME: cert-manager
LAST DEPLOYED: Thu Aug  1 14:22:38 2024
(..)

Test case Option 1 (helm)

$ helm install metrics-server . --namespace kube-system --values ci/tls-helm-values.yaml
NAME: metrics-server
LAST DEPLOYED: Thu Aug  1 14:26:24 2024
(..)

$ kubectl top pods
NAME                                         CPU(cores)   MEMORY(bytes)
coredns-76f75df574-j7bsd                     3m           14Mi
coredns-76f75df574-lfxqp                     3m           14Mi
etcd-kind-control-plane                      44m          44Mi
kindnet-94f7t                                1m           10Mi
kindnet-s7zgz                                1m           9Mi
kindnet-t5tv7                                1m           9Mi
kube-apiserver-kind-control-plane            137m         277Mi
kube-controller-manager-kind-control-plane   47m          52Mi
kube-proxy-2xpwm                             2m           14Mi
kube-proxy-72gdt                             1m           15Mi
kube-proxy-gdhf7                             1m           16Mi
kube-scheduler-kind-control-plane            6m           22Mi
metrics-server-5bc86f74cc-x5q7j              9m           19Mi

$ helm uninstall metrics-server --namespace kube-system
release "metrics-server" uninstalled

Test case Option 2 (cert-manager)

$ helm install metrics-server . --namespace kube-system --values ci/tls-certManager-values.yaml --wait
NAME: metrics-server
LAST DEPLOYED: Thu Aug  1 14:28:33 2024
(..)

$ kubectl top pods
NAME                                         CPU(cores)   MEMORY(bytes)
coredns-76f75df574-j7bsd                     4m           14Mi
coredns-76f75df574-lfxqp                     4m           14Mi
etcd-kind-control-plane                      53m          44Mi
kindnet-94f7t                                1m           10Mi
kindnet-s7zgz                                1m           9Mi
kindnet-t5tv7                                1m           9Mi
kube-apiserver-kind-control-plane            146m         294Mi
kube-controller-manager-kind-control-plane   60m          55Mi
kube-proxy-2xpwm                             1m           14Mi
kube-proxy-72gdt                             1m           15Mi
kube-proxy-gdhf7                             1m           16Mi
kube-scheduler-kind-control-plane            8m           22Mi
metrics-server-5bc86f74cc-k7ggm              9m           19Mi

$ helm uninstall metrics-server --namespace kube-system
release "metrics-server" uninstalled

Test case Option 3 (existing secret)

$ openssl req -x509 -newkey rsa:2048 -sha256 -days 365 \
  -nodes -keyout tls.key -out tls.crt \
  -subj "/CN=metrics-server" \
  -addext "subjectAltName=DNS:metrics-server,DNS:metrics-server.kube-system.svc"
.....+ (..)

$ kubectl -n kube-system create secret generic metrics-server-existing \
  --from-file=tls.key \
  --from-file=tls.crt
secret/metrics-server-existing created

$ cat <<EOF >> ci/tls-existingSecret-values.yaml
apiService:
  insecureSkipTLSVerify: false
  caBundle: |
$(cat tls.crt  | sed -e "s/^/    /g")
EOF

$ cat ci/tls-existingSecret-values.yaml
args:
  - --kubelet-insecure-tls

## Set via GH action (step "Prepare existing secret test scenario")
# apiService:
#   insecureSkipTLSVerify: false
#   caBundle: |

tls:
  type: existingSecret
  existingSecret:
    name: metrics-server-existing
apiService:
  insecureSkipTLSVerify: false
  caBundle: |
    -----BEGIN CERTIFICATE-----
    (..)
    -----END CERTIFICATE-----

$ helm install metrics-server . --namespace kube-system --values ci/tls-existingSecret-values.yaml --wait
NAME: metrics-server
LAST DEPLOYED: Thu Aug  1 14:44:15 2024
(..)

$ kubectl top pods
NAME                                         CPU(cores)   MEMORY(bytes)
coredns-76f75df574-j7bsd                     4m           22Mi
coredns-76f75df574-lfxqp                     4m           21Mi
etcd-kind-control-plane                      50m          53Mi
kindnet-94f7t                                2m           16Mi
kindnet-s7zgz                                1m           15Mi
kindnet-t5tv7                                1m           17Mi
kube-apiserver-kind-control-plane            146m         290Mi
kube-controller-manager-kind-control-plane   51m          53Mi
kube-proxy-2xpwm                             1m           17Mi
kube-proxy-72gdt                             2m           16Mi
kube-proxy-gdhf7                             3m           16Mi
kube-scheduler-kind-control-plane            7m           24Mi
metrics-server-868545df69-r96pf              11m          19Mi

$ helm uninstall metrics-server --namespace kube-system
release "metrics-server" uninstalled

When I have time later, I can investigate whether we can test these scenarios with a helm test.

@stevehipwell
Contributor

@mkilchhofer RE testing, I was thinking of using Helm testing hooks to install a pod and a job to check that metrics for the pod are being served. This test would work for all CI tests (the existing pattern and the new patterns).
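A minimal sketch of such a hook, assuming a generic kubectl image and a grep-based check (both are placeholders, not part of this PR):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: metrics-server-test
  annotations:
    # Helm runs pods with this annotation on `helm test`
    "helm.sh/hook": test
spec:
  restartPolicy: Never
  containers:
    - name: check-metrics
      image: bitnami/kubectl  # placeholder image
      command:
        - sh
        - -c
        # fail the test pod if the metrics API serves nothing for this namespace
        - kubectl top pods --namespace {{ .Release.Namespace }} | grep -q .
```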

Contributor

@stevehipwell stevehipwell left a comment


/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 1, 2024
@stevehipwell
Contributor

@logicalhan could you please approve this?

@mkilchhofer
Contributor Author

@logicalhan friendly ping

@mkilchhofer
Contributor Author

@stevehipwell @logicalhan friendly ping

@stevehipwell
Contributor

@mkilchhofer I don't have approval permission outside of the Helm chart directory.

CC @dgrisonnet @serathius

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closed this PR.

In response to this:


/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 3, 2024
@judahrand

I hope that this does end up getting merged!

@mkilchhofer
Contributor Author

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Oct 3, 2024
@k8s-ci-robot
Contributor

@mkilchhofer: Reopened this PR.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Oct 3, 2024
@mkilchhofer
Contributor Author

Rebased. @dgrisonnet @serathius can you please review?

@dgrisonnet
Member

/retest

Member

@dgrisonnet dgrisonnet left a comment


Sorry for the delay, I missed the notifications.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 3, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgrisonnet, mkilchhofer, stevehipwell

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 3, 2024
@dgrisonnet
Member

/retest

@mkilchhofer
Contributor Author

/retest-required

@k8s-ci-robot k8s-ci-robot merged commit 5a7a444 into kubernetes-sigs:master Oct 3, 2024
12 checks passed