Lab 9: Deploying SPIRE in High Availability Mode

Prerequisites

  • A 64-bit Linux environment (AMD, Intel, or Arm).
  • Basic familiarity with Kubernetes manifests and kubectl commands.
  • General understanding of helm commands is helpful.

Introduction

Welcome to Coastal Containers Ltd., where the seamless management of tug boat fleets is paramount to ensuring the efficient maneuvering of colossal freighters within bustling ports. In a world where security and availability are pivotal, SPIRE (the SPIFFE Runtime Environment) sails to the rescue, ensuring our tug boats, represented by workloads in our Kubernetes clusters, are continuously authenticated and authorized to communicate with the headquarters (HQ). Please check out the architecture diagram below, which provides a simplified overview of the setup we will be looking to deploy.

graph TD
    subgraph "Kubernetes Cluster"
        subgraph "spire ns"
            postgres-db[PostgreSQL DataStore]
            spire-server-0[SPIRE Server]
            spire-server-1[SPIRE Server]
            spire-server-2[SPIRE Server]
            spire-agent[SPIRE Agent]
            postgres-db <-->|Read/Write| spire-server-0
            postgres-db <-->|Read/Write| spire-server-1
            postgres-db <-->|Read/Write| spire-server-2
            spire-server-0 <-->|Node API| spire-agent
            spire-server-1 <-->|Node API| spire-agent
            spire-server-2 <-->|Node API| spire-agent
        end
        subgraph "default ns"
            subgraph "fleet-alpha"
                alpha-deployment((Tugboat Pods))
            end
            subgraph "fleet-beta"
                beta-deployment((Tugboat Pods))
            end
        end
    end
    spire-agent <-->|Workload API| fleet-alpha
    spire-agent <-->|Workload API| fleet-beta

In this lab, we'll embark on a journey where two fleets, fleet-alpha and fleet-beta, will be deployed, each representing a set of tug boats that need to authenticate themselves to the Coastal Containers Ltd. HQ. The HQ, represented by three SPIRE servers (spire-server-0, spire-server-1, and spire-server-2), will issue SPIFFE Verifiable Identity Documents (SVIDs) to authenticate each tug boat in the fleets, ensuring secure and trusted communication channels, while maintaining high availability through a shared Postgres datastore.

Step-by-Step Instructions

Step 1: Provision Infrastructure

To set sail, spin up the Kubernetes demo cluster using Kind by issuing the following make command while in the root lab directory:

make cluster-up

If your cluster is already running, you can skip this step and continue on with the lab.
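
Under the hood, the cluster-up target is assumed to wrap a standard Kind cluster creation; a rough sketch of the equivalent commands is shown below (the actual Makefile target may differ):

# Hypothetical equivalent of `make cluster-up`; check the Makefile for specifics.
kind create cluster                        # default cluster name "kind", context "kind-kind"
kubectl cluster-info --context kind-kind   # confirm the API server is reachable
kubectl get nodes                          # the control-plane node should report Ready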

Step 2: Add the SPIRE Helm Repository

Add the SPIRE Helm repository to your Helm client:

make spire-add-helm-repo

If this operation is successful, you should see an update to the spiffe chart repository and the output:

Update Complete. *Happy Helming!*
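
If you prefer to run the Helm commands yourself, the target is assumed to be roughly equivalent to the following; the repository URL shown is the commonly used SPIFFE Helm charts location, so verify it against your Makefile:

# Hypothetical equivalent of `make spire-add-helm-repo`.
helm repo add spiffe https://spiffe.github.io/helm-charts-hardened
helm repo update spiffe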

Step 3: Deploy the Postgres Database

Deploy a Postgres database that will be used as a shared datastore by the SPIRE servers to ensure high availability:

make deploy-postgres

Alternative database solutions can be investigated within the SPIRE Server Configuration Reference, and more specifically, the Server Plugin: DataStore "sql" documentation.

For the purposes of this demonstration, we will be using PostgreSQL due to its native resilience and scalability.
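
Once the target completes, you can confirm the datastore is healthy before moving on. The app=spire-db label and the spire-db Service name used below come from this lab's manifests (the same label is used to read the database logs later in Step 5):

kubectl get pods -n spire -l app=spire-db   # the Postgres pod should reach Running
kubectl get svc spire-db -n spire           # the Service the SPIRE Servers will connect to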

Step 4: Setup SPIRE with Helm

Deploy SPIRE using Helm, ensuring the SPIRE servers are configured for high availability and connected to the shared Postgres datastore:

make spire-helm-install

If SPIRE deploys successfully, you should see an output similar to:

🏗️ Installing SPIRE using Helm...
NAME: spire-crds
LAST DEPLOYED: Fri Mar  8 10:45:50 2024
NAMESPACE: spire
STATUS: deployed
REVISION: 1
TEST SUITE: None
NAME: spire
LAST DEPLOYED: Fri Mar  8 10:45:51 2024
NAMESPACE: spire
STATUS: deployed
REVISION: 1
NOTES:
Installed spire…

Spire CR's will be handled only if className is set to "spire-spire"
✔️ SPIRE installed using Helm.

Note: SPIRE may take a few minutes to get running, so please be patient until all pods are running in the spire namespace.
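
One way to wait for the deployment to settle is to watch the namespace until the server, agent, and CSI driver pods are all Running:

kubectl get pods -n spire --watch
# or block until every pod reports Ready (five minute timeout):
kubectl wait --for=condition=Ready pod --all -n spire --timeout=300s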

Before we move on, let's dissect the values.yaml file to understand the configuration that enables SPIRE to operate in a high-availability mode:

global:
  spire:
    clusterName: kind-kind
    trustDomain: coastal-containers.example
spire-server:
  replicaCount: 3
  ca_subject:
    country: UK
    organization: CoastalContainers
    common_name: Coastal Containers Ltd
  controllerManager:
    enabled: true
    image:
      registry: ""
      pullPolicy: Never
      tag: latest
  image:
    registry: ""
    pullPolicy: Never
    tag: latest
  dataStore:
    sql:
      databaseType: postgres
      databaseName: spire
      host: spire-db
      port: 5432
      username: spire
      password: coastal-containers
      options:
        - sslmode: disable
spiffe-oidc-discovery-provider:
  enabled: false
spire-agent:
  image:
    registry: ""
    pullPolicy: Never
    tag: latest
  waitForIt:
    image:
      registry: ""
      pullPolicy: Never
      tag: latest
spiffe-csi-driver:
  image:
    registry: ""
    pullPolicy: Never
    tag: latest
  nodeDriverRegistrar:
    image:
      registry: ""
      pullPolicy: Never
      tag: latest

Global Configuration:

  • clusterName: The name of the K8s cluster; set to kind-kind to match our Kind infrastructure setup.
  • trustDomain: The trust domain for SPIFFE IDs; set to coastal-containers.example per our scenario setup.

SPIRE Server Configuration:

  • replicaCount: Number of replicas for the SPIRE Server, ensuring high availability. This is set to 3 for our purposes.
  • ca_subject: Contains the CA subject information for generating self-signed certificates.
    • country: CA country name, set to UK for the Coastal Containers HQ.
    • organization: CA organization name, set to CoastalContainers for our scenario.
    • common_name: CA common name, set to Coastal Containers Ltd for our scenario setup.

DataStore Configuration:

  • sql: Configurations for the SQL datastore.
    • databaseType: Type of the database, postgres will be used for our HA setup.
    • databaseName: Name of the database, spire for demo purposes.
    • host: Hostname or IP address of the database; this is set to spire-db, the name of our Postgres Kubernetes Service.
    • username: Username for database access, set to spire for demo purposes.
    • password: Password for database access, set to coastal-containers for our scenario setup. (In production, use secure mechanisms like encrypted secrets passed through an environment variable, e.g., ${DBPW}, instead of hard coding.)
    • options: Additional options for the database connection; here we simply disable SSL for demo purposes.

High Availability Insights:

  • Replica Count: By setting replicaCount to 3, we ensure three instances of the SPIRE Server are running, providing redundancy and high availability.
  • DataStore: Using a shared postgres datastore allows the SPIRE Server instances to maintain a synchronized state, ensuring continuous operations even if one instance fails.
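
To see the replica count reflected in the cluster, check the SPIRE Server StatefulSet and its pods; the StatefulSet name spire-server is assumed here based on the spire-server-0/1/2 pod names used throughout this lab:

kubectl get statefulset spire-server -n spire   # READY should show 3/3
kubectl get pods -n spire -o wide               # spire-server-0, spire-server-1, and spire-server-2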

That's it! This step provides a detailed understanding of how the values.yaml file configures SPIRE for high availability and how the various components are interconnected. As always, make sure to adhere to best practices for managing sensitive data, especially in production environments.

📝Note: You can find a detailed break-down of the available SPIRE helm chart configuration via the values.yaml manifest provided within the official SPIRE Helm Chart Repository.

Step 5: View the Running Setup

With SPIRE up and running, we can now investigate the setup to see the deployed components and their configuration per our specification in the values.yaml file.

First, to view the SPIRE Server configuration, run the following:

make spire-view-server-config

Next, to view the SPIRE Agent configuration, run the following:

make spire-view-agent-config

These commands will output the set configurations within the respective server.conf and agent.conf files based on our Helm chart deployment.

To investigate the Postgres database and ensure it is operating smoothly, run the following:

kubectl logs -l app=spire-db -n spire
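
You can also open a psql session against the shared datastore to confirm the SPIRE Servers have created their tables. The database name and user (spire) come from the values.yaml shown earlier, and the pod is looked up by the same app=spire-db label; this assumes the standard postgres image, which ships with the psql client:

DB_POD=$(kubectl get pod -n spire -l app=spire-db -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n spire "$DB_POD" -- psql -U spire -d spire -c '\dt'   # list the tables SPIRE created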

Step 6: Build and Deploy the Workloads

Now, with SPIRE up and running, we can deploy the fleet-alpha and fleet-beta workloads, which represent our tug boat fleets. Before we deploy them, however, we must build the workload Docker image we will be using for their respective deployments. To do this, navigate to the lab root directory and run:

make cluster-build-load-image DIR=workload

This will create a Docker image with the tag workload:latest and load it into your kind cluster, which we will then use to deploy fleet-alpha and fleet-beta. Next, to deploy the two 'fleet' workloads, run:

make deploy-fleets

If the workloads deployed as expected, you should see:

Deploying tugboat fleet alpha...
serviceaccount/fleet-alpha created
deployment.apps/fleet-alpha created
Tugboat fleet alpha deployed.
Deploying tugboat fleet beta...
serviceaccount/fleet-beta created
deployment.apps/fleet-beta created
Tugboat fleet beta deployed.

Each tug boat fleet uses a simple go-spiffe workload image which watches for X509-SVID updates and outputs them in PEM format. This setup is based on the spiffe-watcher example.
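
To confirm the Deployments and ServiceAccounts from the output above landed as expected, a quick check before moving on:

kubectl get deployments fleet-alpha fleet-beta       # created in the default namespace
kubectl get serviceaccounts fleet-alpha fleet-beta   # the identities the SVIDs will map to
kubectl get pods | grep fleet                        # five tug boat pods per fleet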

Step 7: Check SPIRE Server Registration Entries

With the provided Helm chart deployment, the three SPIRE Servers will automatically attest the fleet workloads and assign SVIDs via the SPIRE Controller Manager. Before we move on, let's dive a little deeper into what the SPIRE Controller Manager is and how it helps us.

SPIRE Controller Manager

In a high-availability SPIRE deployment like the one you're working through in this lab, managing and rotating SPIFFE Verifiable Identity Documents (SVIDs) is a fundamental operation to ensure the continuity and security of workload communications. This is where the SPIRE Controller Manager comes into play.

The SPIRE Controller Manager serves as a bridge between your Kubernetes environment and the SPIRE Server, facilitating the automatic registration of workloads and the establishment of federation relationships. It manages the lifecycle of SVIDs, ensuring that they are issued, renewed, and rotated as necessary, in alignment with the SPIFFE specification. Functionally, the SPIRE Controller Manager works through multiple custom resources and reconciliation processes (outlined below) to provide automated management of SVIDs.

  • Custom Resources:

    • ClusterSPIFFEID: Describes the identity shape applied to workloads and selectors identifying the applicable workloads. The SPIRE Controller Manager watches this resource for updates. Upon deployment, the spire-controller-manager-service-account-based ClusterSPIFFEID is automatically created.
    • ClusterStaticEntry: Declares static SPIRE registration entries, generally for workloads outside the Kubernetes cluster but within the trust domain.
    • ClusterFederatedTrustDomain: Describes a federation relationship for the cluster.
  • Reconciliation:

    • Workload Registration: Triggered when changes on Pods, ClusterSPIFFEID, or ClusterStaticEntry resources are detected, ensuring SPIRE entries match the declared state.
    • Federation: Triggered when changes on ClusterFederatedTrustDomain resources are detected, ensuring SPIRE federation relationships match the declared state.

The SPIRE Controller Manager is designed to be deployed within the same pod as the SPIRE Server, communicating via a private Unix Domain Socket in a shared volume. By default, the SPIRE Helm Chart applies configuration settings for deploying the SPIRE Controller Manager alongside the SPIRE Server. As the provided values.yaml file for this lab relies on the default controllerManager configuration, here is a simple values.yaml snippet showing how to declaratively enable the SPIRE Controller Manager:

spire-server:
  enabled: true
  controllerManager:
    enabled: true

For a more in-depth demonstration of how the SPIRE Controller Manager works, consult the documented demo located on the spire-controller-manager repository. Once you're comfortable with your knowledge of the SPIRE Controller Manager, let's move on to the next steps of this exercise.
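
If you'd like to see the resource the controller manager acts on in this lab, you can list the ClusterSPIFFEID created automatically at install time; the CRD group used below (spire.spiffe.io) is an assumption to verify against your chart version:

kubectl get clusterspiffeids.spire.spiffe.io
kubectl get clusterspiffeids.spire.spiffe.io -o yaml   # inspect the SPIFFE ID template and pod selectors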

In order to verify the registration entries were created, we can check each SPIRE Server.

First, verify the registration entries within spire-server-0, by running:

kubectl exec -n spire spire-server-0 -- /opt/spire/bin/spire-server entry show

The output of this command should show registration entries for the spire-server (3), spiffe-csi-driver, spire-agent, and fleet-alpha (5) / fleet-beta (5) workload deployments.

Keep in mind that the structure of assigned SPIFFE IDs will follow the default K8s URI schema shown below:

spiffe://coastal-containers.example/ns/{namespace}/sa/{serviceaccount}

Five SVIDs will have been created for each of the fleet-alpha and fleet-beta workloads, given the number of replicas in each deployment. This is the expected and desired outcome.

Note: These registration entries are reflected in the other two SPIRE Server instances as well, and can be verified with the same command against spire-server-1 and spire-server-2.
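
You can also narrow the output to a single fleet by filtering on the SPIFFE ID built from the schema above, using the -spiffeID flag of spire-server entry show:

kubectl exec -n spire spire-server-0 -- /opt/spire/bin/spire-server entry show \
    -spiffeID spiffe://coastal-containers.example/ns/default/sa/fleet-alpha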

Step 8: Investigate Fleet Deployments

To dive even deeper, you can explore the logs of the fleet-alpha and fleet-beta deployments to observe the issued SVIDs.

Inspect fleet-alpha by running:

kubectl logs deployment/fleet-alpha

Inspect fleet-beta by running:

kubectl logs deployment/fleet-beta

Alternatively, you can inspect each individual tug boat pod by running:

kubectl logs fleet-<name>-<podid>

If everything is working properly, you will see an SVID output in PEM format that looks similar to this:

-----BEGIN CERTIFICATE-----
MIIDJjCCAg6gAwIBAgIQU6rttK6HtG5KnqGuP9OW1DANBgkqhkiG9w0BAQsFADBU
MQswCQYDVQQGEwJVSzEgMB4GA1UEChMXQ29hc3RhbCBDb250YWluZXJzIEx0ZC4x
IzAhBgNVBAMTGmNvYXN0YWwtY29udGFpbmVycy5leGFtcGxlMB4XDTIzMTAwNjE4
NDU0OVoXDTIzMTAwNjIyNDU1OVowSDELMAkGA1UEBhMCVVMxDjAMBgNVBAoTBVNQ
SVJFMSkwJwYDVQQtEyAwZGNjZmEwOTNlNjFmZWVkNzUyNzllNzM0M2FhYWJlNzBZ
MBMGByqGSM49AgEGCCqGSM49AwEHA0IABEtJnJ0UAcamPt/G91i6pUSTSWwtG0C4
mREnyBKIi08zJdNtSTWU2RQwBrVfp4ZQS0ZV+KyW9fi61eSrZyx0MUmjgcowgccw
DgYDVR0PAQH/BAQDAgOoMB0GA1UdJQQWMBQGCCsGAQUFBwMBBggrBgEFBQcDAjAM
BgNVHRMBAf8EAjAAMB0GA1UdDgQWBBT0dXY49uK3gVD7qw8uWFgp9cYs/zAfBgNV
HSMEGDAWgBRhGyRyDhionv6w4AayIYbRgqDwBDBIBgNVHREEQTA/hj1zcGlmZmU6
Ly9jb2FzdGFsLWNvbnRhaW5lcnMuZXhhbXBsZS9ucy9kZWZhdWx0L3NhL2ZsZWV0
LWFscGhhMA0GCSqGSIb3DQEBCwUAA4IBAQAO0Kkx6ogTGnqB/CBnQUi4k442smgl
DI0yHxKRT1dtQkkMaj+V436bP9aC8rSzdyAqLxktapxMrPVmk85vWxuI9Lj9giq6
EhOkAWWBrANgni4YYrh5o2TaMJthS42OW6eBpi08JNSszodCjVf8kY4gCVDydhLN
MVxbZokCrX+lXSnBX/azztTga2fF6ZNgRDV6smpiZLM+uVVUIQrq1bHn4Ujmwdza
z+qFtOiqs2rEERBfdqHfK3ByvwP18lGLphsvFiCIeAdzHNAWMEPrUUotF1EK08hu
18YJxC4ZaweZdRjgajsmYDI2YXCC7MqRJZuLTP8HpUwcCiW7WSuEnedT
-----END CERTIFICATE-----
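
To confirm the SPIFFE ID embedded in the certificate rather than reading the raw PEM, you can pipe the log output through openssl, which decodes the first certificate it finds in the stream; the grep pulls out the URI SAN carrying the SPIFFE ID:

kubectl logs deployment/fleet-alpha | openssl x509 -noout -text | grep URI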

Step 9: Cleanup

To tear down the entire Kind cluster, run:

make cluster-down
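
If you'd rather keep the Kind cluster and remove only the lab components, a hedged alternative is to uninstall the Helm releases and delete the fleet workloads directly; the release names spire and spire-crds come from the install output in Step 4:

helm uninstall spire spire-crds -n spire
kubectl delete deployment,serviceaccount fleet-alpha fleet-beta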

Conclusion

Congratulations, Captain! 🚢 You've successfully navigated through the deployment of SPIRE in a high-availability mode, ensuring that the Coastal Containers Ltd. HQ can authenticate its tug boat fleets (fleet-alpha and fleet-beta) even in the face of partial system failures.

Your journey involved deploying SPIRE servers with a shared Postgres datastore, ensuring that if one server encounters turbulent waters, the other can seamlessly take over, ensuring no disruption in issuing SVIDs to the fleets. The tug boats, represented by Kubernetes pods, continuously receive their rotated SVIDs, ensuring secure and authenticated communication with the HQ.

Final Considerations: Cert Manager Setup

Cert Manager is an extensible X.509 certificate controller made for Kubernetes and OpenShift workloads. Fundamentally, cert-manager manages the automated issuance and renewal of certificates, thereby helping to build Public Key Infrastructure (PKI) by securing ingress and pod-to-pod communication. As cert-manager operates on custom Certificate and Issuer resources within K8s, it is scalable and extensible by nature. Because of this, cert-manager can be considered a supplemental tool when setting up SPIRE in high availability mode. Namely, cert-manager can be deployed as a shared root authority (via the UpstreamAuthority plugin), which distributes intermediate certificates to each SPIRE Server instance.

For those looking to further secure their SPIRE architecture by integrating cert-manager as an upstream CA, consider the following snippet in your values.yaml for the SPIRE Helm chart:

  upstreamAuthority:
    certManager:
      enabled: true
      issuerName: "spire-server-ca"
      issuerKind: "Issuer"
      issuerGroup: "cert-manager.io"
      namespace: "cert-manager"

Note: Make sure to tailor this manifest configuration to your cert-manager setup and assigned issuers.
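
If cert-manager is not already present in your cluster, the standard Helm installation looks like the following; you would then create an Issuer named spire-server-ca in the cert-manager namespace to match the snippet above (issuer creation itself depends on your PKI and is left out here):

helm repo add jetstack https://charts.jetstack.io
helm repo update jetstack
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --set installCRDs=true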