Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add ux for workload identity for cloud provider azure #4138

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

sonasingh46
Copy link
Contributor

@sonasingh46 sonasingh46 commented Oct 16, 2023

What this PR does / why we need it:
This PR adds capability to specify to use workload identity for cloud provider azure on workload clusters on azure.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #3589

Special notes for your reviewer:

  • Contains an API change where there is one more acceptable value for AzureMachineTemplate .spec.identity and the new acceptable value is WorkloadIdentity

  • To create a workload cluster with workload identity enabled, certain flags needs to be passed on kube-apiserver and kube-controller-manager which is describe in a comment in the full workload cluster YAML pasted below.

  • cherry-pick candidate

How this will work:

Assume you have a management cluster with the workload identity enabled. Now, you can create a workload cluster which will have workload identity enabled on it.

Here are the pre-reqiuistes steps that should be followed which is already documented here

  • Generate a public/private key pair
  • Generate OIDC and JWKS document.
  • Upload the OIDC and JWKS documents on a public accessible URL. This is called SERVICE_ISSUER_URL
  • Create a federated credential for cloud provider azure.

Distribute the key pairs using the following steps:

  • Encode the public and private key into base64 and put that into a secret with the name <your-workload-cluster-name>-sa in the same namespace as that you the workload cluster Cluster object.
apiVersion: v1
kind: Secret
metadata:
  name: azwi-cluster-sa
type: kubernetes.io/tls
data:
  tls.crt: "<base64-encoded-public-key>"
  tls.key: "<base-64-encoded-private-key>"

Now following is a sample workload cluster YAML that can be used to create a workload cluster which will ensure that the cloud provider azure uses workload identity.

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: azwi-cluster
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 192.168.0.0/16
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: azwi-cluster-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AzureCluster
    name: azwi-cluster
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureCluster
metadata:
  name: azwi-cluster
  namespace: default
spec:
  identityRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AzureClusterIdentity
    name: cluster-identity
  location: eastus
  networkSpec:
    subnets:
    - name: control-plane-subnet
      role: control-plane
    - name: node-subnet
      role: node
    vnet:
      name: azwi-cluster-vnet
  resourceGroup: azwi-cluster
  subscriptionID: <your-subscription-id>
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: azwi-cluster-control-plane
  namespace: default
spec:
  kubeadmConfigSpec:
    clusterConfiguration:
      apiServer:
        extraArgs:
          cloud-provider: external
          # This flag needs to be passed for the workload cluster to be able to use workload identity
          service-account-issuer: <your-service-issuer-url>
          # This flag needs to be passed for the workload cluster to be able to use workload identity
          service-account-key-file: /etc/kubernetes/pki/sa.pub
          # This flag needs to be passed for the workload cluster to be able to use workload identity
          service-account-signing-key-file: /etc/kubernetes/pki/sa.key
        timeoutForControlPlane: 20m
      controllerManager:
        extraArgs:
          allocate-node-cidrs: "false"
          cloud-provider: external
          cluster-name: azwi-cluster
          # This flag needs to be passed for the workload cluster to be able to use workload identity
          service-account-private-key-file: /etc/kubernetes/pki/sa.key
      etcd:
        local:
          dataDir: /var/lib/etcddisk/etcd
          extraArgs:
            quota-backend-bytes: "8589934592"
    diskSetup:
      filesystems:
      - device: /dev/disk/azure/scsi1/lun0
        extraOpts:
        - -E
        - lazy_itable_init=1,lazy_journal_init=1
        filesystem: ext4
        label: etcd_disk
      - device: ephemeral0.1
        filesystem: ext4
        label: ephemeral0
        replaceFS: ntfs
      partitions:
      - device: /dev/disk/azure/scsi1/lun0
        layout: true
        overwrite: false
        tableType: gpt
    files:
    - contentFrom:
        secret:
          key: control-plane-azure.json
          name: azwi-cluster-control-plane-azure-json
      owner: root:root
      path: /etc/kubernetes/azure.json
      permissions: "0644"
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          azure-container-registry-config: /etc/kubernetes/azure.json
          cloud-provider: external
        name: '{{ ds.meta_data["local_hostname"] }}'
    joinConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          azure-container-registry-config: /etc/kubernetes/azure.json
          cloud-provider: external
        name: '{{ ds.meta_data["local_hostname"] }}'
    mounts:
    - - LABEL=etcd_disk
      - /var/lib/etcddisk
    postKubeadmCommands: []
    preKubeadmCommands: []
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: AzureMachineTemplate
      name: azwi-cluster-control-plane
  replicas: 1
  version: v1.27.3
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureMachineTemplate
metadata:
  name: azwi-cluster-control-plane
  namespace: default
spec:
  template:
    spec:
      dataDisks:
      - diskSizeGB: 256
        lun: 0
        nameSuffix: etcddisk
      osDisk:
        diskSizeGB: 128
        osType: Linux
      # This is the API change which introduce a new value for the identity key
      identity: WorkloadIdentity
      vmSize: Standard_B2s
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: azwi-cluster-md-0
  namespace: default
spec:
  clusterName: azwi-cluster
  replicas: 1
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: azwi-cluster-md-0
      clusterName: azwi-cluster
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureMachineTemplate
        name: azwi-cluster-md-0
      version: v1.27.3
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureMachineTemplate
metadata:
  name: azwi-cluster-md-0
  namespace: default
spec:
  template:
    spec:
      osDisk:
        diskSizeGB: 128
        osType: Linux
      # This is the API change which introduce a new value for the identity key  
      identity: WorkloadIdentity
      vmSize: Standard_B2s
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: azwi-cluster-md-0
  namespace: default
spec:
  template:
    spec:
      files:
      - contentFrom:
          secret:
            key: worker-node-azure.json
            name: azwi-cluster-md-0-azure-json
        owner: root:root
        path: /etc/kubernetes/azure.json
        permissions: "0644"
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            azure-container-registry-config: /etc/kubernetes/azure.json
            cloud-provider: external
          name: '{{ ds.meta_data["local_hostname"] }}'
      preKubeadmCommands: []
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureClusterIdentity
metadata:
  labels:
    clusterctl.cluster.x-k8s.io/move-hierarchy: "true"
  name: cluster-identity
  namespace: default
spec:
  allowedNamespaces: {}
  clientID: <your-client-id>
  tenantID: <your-tenant-id>
  type: WorkloadIdentity

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Add UX to use wrokload identity for cloud provider azure

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 16, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from sonasingh46. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

// ToDo: Find a way to make this configurable for a user.
// This is the path where the projected service account token should be present for
// cloud provider azure.
aadFederatedTokenFilePath = "/var/run/secrets/azure/tokens/azure-identity-token" //nolint:gosec // Path of projected service account token
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think instead of making an API change to configure this. I think we should first go with using an annotation in AzureClusterIdentity object.
@CecileRobertMichon @mboersma -- thoughts??

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this file get on the VM?

@codecov
Copy link

codecov bot commented Oct 16, 2023

Codecov Report

Attention: Patch coverage is 0% with 13 lines in your changes missing coverage. Please review.

Project coverage is 57.79%. Comparing base (b7de934) to head (4bf0aa8).
Report is 1290 commits behind head on main.

Files with missing lines Patch % Lines
controllers/helpers.go 0.00% 13 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4138      +/-   ##
==========================================
- Coverage   57.83%   57.79%   -0.04%     
==========================================
  Files         187      187              
  Lines       19195    19208      +13     
==========================================
  Hits        11101    11101              
- Misses       7466     7479      +13     
  Partials      628      628              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 16, 2023
@sonasingh46
Copy link
Contributor Author

Raised a PR in cloud provider azure to project service account token
kubernetes-sigs/cloud-provider-azure#4809

Copy link
Contributor

@willie-yao willie-yao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Only suggestion from my end would be to add a test case for workload identity here

@CecileRobertMichon
Copy link
Contributor

/assign

@CecileRobertMichon
Copy link
Contributor

@sonasingh46 could you take what you have in the PR description and document it in https://capz.sigs.k8s.io/topics/vm-identity?

@@ -544,6 +544,8 @@ const (
VMIdentitySystemAssigned VMIdentity = "SystemAssigned"
// VMIdentityUserAssigned ...
VMIdentityUserAssigned VMIdentity = "UserAssigned"
// VMIdentityWorkloadIdentity ...
VMIdentityWorkloadIdentity VMIdentity = "WorkloadIdentity"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when WorkloadIdentity type is configured on the VM, do we need to make any changes to the Azure VM configuration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Azure VM configuration -- if you mean the configuration from Azure API point of view, then no.

We just need a change in azure.json -- and azure.json object is built reading this.
https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/4138/files#diff-7d084ba6e9e15005850c1b44d72d457cef09f12badee416beda37f957a242364R211

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you mean the configuration from Azure API point of view, then no.

yes, that's what I meant

Copy link
Contributor

@CecileRobertMichon CecileRobertMichon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sonasingh46 can we also update the e2e test that tests workload identity to use this flow so it can be e2e tested?

@sonasingh46
Copy link
Contributor Author

/test optional

@k8s-ci-robot
Copy link
Contributor

@sonasingh46: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test pull-cluster-api-provider-azure-build
  • /test pull-cluster-api-provider-azure-ci-entrypoint
  • /test pull-cluster-api-provider-azure-e2e
  • /test pull-cluster-api-provider-azure-e2e-aks
  • /test pull-cluster-api-provider-azure-test
  • /test pull-cluster-api-provider-azure-verify

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-provider-azure-apidiff
  • /test pull-cluster-api-provider-azure-apiversion-upgrade
  • /test pull-cluster-api-provider-azure-capi-e2e
  • /test pull-cluster-api-provider-azure-conformance
  • /test pull-cluster-api-provider-azure-conformance-custom-builds
  • /test pull-cluster-api-provider-azure-conformance-dual-stack-with-ci-artifacts
  • /test pull-cluster-api-provider-azure-conformance-ipv6-with-ci-artifacts
  • /test pull-cluster-api-provider-azure-conformance-with-ci-artifacts
  • /test pull-cluster-api-provider-azure-e2e-optional
  • /test pull-cluster-api-provider-azure-e2e-workload-upgrade
  • /test pull-cluster-api-provider-azure-windows-containerd-upstream-with-ci-artifacts-serial-slow
  • /test pull-cluster-api-provider-azure-windows-custom-builds
  • /test pull-cluster-api-provider-azure-windows-with-ci-artifacts

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-provider-azure-apidiff
  • pull-cluster-api-provider-azure-build
  • pull-cluster-api-provider-azure-ci-entrypoint
  • pull-cluster-api-provider-azure-e2e
  • pull-cluster-api-provider-azure-e2e-aks
  • pull-cluster-api-provider-azure-test
  • pull-cluster-api-provider-azure-verify

In response to this:

/test optional

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sonasingh46
Copy link
Contributor Author

/test pull-cluster-api-provider-azure-e2e-optional

@sonasingh46
Copy link
Contributor Author

Working on e2e test,
trying to get around with this issue https://kubernetes.slack.com/archives/CEX9HENG7/p1700128923684359

@dtzar dtzar modified the milestones: v1.12, v1.13 Nov 16, 2023
Copy link
Member

@aramase aramase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a discussion with @sonasingh46 this morning about the PR.

This PR as-is is not sufficient for using workload identity (WI) with the workload clusters. The --service-account-issuer-url in the workload cluster will need to be configured with a valid URL and FIC needs to be configured for cloud provider in workload cluster to use WI.

  1. We'll need to come up with the design for using the existing storage account to host the issuer URL (discovery document and JWKS endpoint).
  2. Need to configure that URL as --service-account-issuer-url in the Kube API server.
  3. Handle rotation of the signing keys + updating JWKS endpoint with the new keys (adding new keys to the list after rotation).

Summarizing few other details I mentioned to @sonasingh46

  1. Signing keys should be 1:1 with workload cluster and should not be re-used.
  2. Issuer URL is unique per workload cluster.

If we want to merge this PR as a first step, it would be fine. For testing in CI, we'll need to pre-create the signing keys and issuer URL (similar to the one we have for management cluster in CI) and use those as input for the workload cluster created to test WI.

Comment on lines +256 to +259
// secret is not needed in workload identity.
controlPlaneConfig.AadClientSecret = ""
controlPlaneConfig.UseFederatedWorkloadIdentityExtension = true
controlPlaneConfig.AADFederatedTokenFile = aadFederatedTokenFilePath
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this to actually work with the workload cluster, the --service-account-issuer-url needs to be configured with an OIDC issuer URL that's https and publicly accessible serving the discovery document and jwks endpoint.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workload cluster that will be created will have the --service-account-issuer-url provided via the workload cluster template.

Copying the yaml from the PR description

  kubeadmConfigSpec:
    clusterConfiguration:
      apiServer:
        extraArgs:
          cloud-provider: external
          # This flag needs to be passed for the workload cluster to be able to use workload identity
          service-account-issuer: <your-service-issuer-url>
          # This flag needs to be passed for the workload cluster to be able to use workload identity
          service-account-key-file: /etc/kubernetes/pki/sa.pub
          # This flag needs to be passed for the workload cluster to be able to use workload identity
          service-account-signing-key-file: /etc/kubernetes/pki/sa.key
        timeoutForControlPlane: 20m
      controllerManager:
        extraArgs:
          allocate-node-cidrs: "false"
          cloud-provider: external
          cluster-name: azwi-cluster
          # This flag needs to be passed for the workload cluster to be able to use workload identity
          service-account-private-key-file: /etc/kubernetes/pki/sa.key

@mboersma mboersma assigned mboersma and unassigned mboersma Dec 14, 2023
@sonasingh46
Copy link
Contributor Author

I did not get time to work on this. I will try to take a shot on this weekend and update. Also, if anyone feels that they can help move this faster than me,(and this issue is being a blocker) I am happy to assign this work and provide assistance here or on slack if needed. cc @CecileRobertMichon @mboersma

@jackfrancis jackfrancis modified the milestones: v1.13, next Jan 16, 2024
@mboersma mboersma modified the milestones: next, v1.14 Jan 19, 2024
@nojnhuh nojnhuh modified the milestones: v1.14, next Feb 29, 2024
@mboersma mboersma modified the milestones: next, v1.15 Mar 13, 2024
@mboersma
Copy link
Contributor

mboersma commented Apr 4, 2024

/assign

@mboersma mboersma mentioned this pull request Apr 25, 2024
4 tasks
@jackfrancis jackfrancis modified the milestones: v1.15, next May 2, 2024
@mboersma mboersma assigned jackfrancis and unassigned mboersma May 9, 2024
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 29, 2024
@willie-yao
Copy link
Contributor

Is this still being worked on or applicable with all the workload identity changes? @mboersma @sonasingh46

@nawazkh
Copy link
Member

nawazkh commented Nov 7, 2024

Hey @jackfrancis , what are your thoughts on this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
Status: Wait-On-Author
Development

Successfully merging this pull request may close these issues.

capz should support UX for using workload identity in cloud provider azure