Skip to content

Latest commit

 

History

History
271 lines (177 loc) · 27.1 KB

README.md

File metadata and controls

271 lines (177 loc) · 27.1 KB

KMS Vault operator

Black Lives Matter CircleCI GitHub tag (latest SemVer) Docker Pulls Keybase BTC Keybase PGP GitHub

Intro

We all know (or should know) that keeping secrets in plain text in source control is definitely a big no-no. That forces us to keep sensitive text in a different way like a password manager, or worse, in an encrypted file in the same repository (but where do we keep the password for that?!). So we either have a "first secret" problem, or a "source of truth" problem, or an automation problem.

KMS provides a nice solution to the encryption/decryption problem so we can keep encrypted secrets in source control (and thus closer to the code that depends on it). But more often than not, said secret is not very valuable by itself and needs to be decrypted and put where it can be consumed at runtime, e.g. in Vault.

One pattern that can accomplish this is KMS + Vault + Terraform. By creating a KMS key (which can be done with Terraform), we can encrypt a secret that can be safely stored in source control. It can then be decrypted by a Terraform aws_kms_secrets data source and passed to a vault_generic_secret resource to be written into Vault. Top it off with the inmem storage backend and the secret will never be in plaintext in durable storage.

While this gets the job done for storing secrets in a secure way, it doesn't lend itself to automation. Options like Terraform Enterprise are often prohibitively expensive, and ad-hoc solutions like running terraform apply on a cron job are fragile and insecure.

The goal of this operator is to minimize the security risk of secret exposure, while leveraging Kubernetes to provide the automation to minimize configuration drift.

Description

This operator manages KMSVaultSecret CRDs containing secrets encrypted with a KMS key (and base64-encoded) and stores them in Vault. This allows you to securely manage your Vault secrets in source control and because the decryption only happens at runtime, it's only in the operator memory, minimizing the exposure of the plain text secret. The operator resources and scaffolding code are managed and generated automatically with the operator-sdk framework.

The AWS credentials to do the decryption operation use aws-sdk-go and follows the default credential precedence order, so if you're going to inject environment variables or configuration files, you should do so on the operator's Deployment manifest (e.g. deploy/operator.yaml).

Each secret will define a path, a set of kvSettings, and a list of secrets. Each secret will have a key and encryptedSecret field of type string, and a secretContext field that's an arbitrary set of key-value pairs corresponding to the encryption context with which the secret was encrypted.

This first version of the operator supports authenticating to Vault via the Kubernetes auth method, the Userpass authe method, or directly via a Vault token. Support for more authentication methods will be added in the future. Note that the configuration required for the operator to perform KMS and Vault operations is not done on the KMSVaultSecret CR but on the operator Deployment itself, and is documented below.

Configuration

AWS

As stated above, the required configuration to allow the operator to decrypt secrets should be injected via the operator Deployment manifest (as AWS_* environment variables or ~/.aws/credentials/~/.aws/config files). The aws-sdk-go library will also discover IAM roles at runtime if the operator is running on an EC2 instance with an instance role, in which case no additional configuration should be required on the operator.

Vault

In addition to the AWS configuration, Vault configuration should be injected too. The common VAULT_* environment variables will be read and used by the client. The variables that would need to be set will vary depending on your environment, but you'll typically want to at least set VAULT_ADDR, and either VAULT_CACERT/VAULT_CAPATH (pointing to a correspoinging mounted file or directory) or VAULT_SKIP_VERIFY. This is all done on the operator Deployment manifest.

The following sections document the environment variables used by each authentication method and their defaults. They are used in addition to the AWS and base Vault variables described before.

Kubernetes authentication method (--vault-authentication-method=k8s)

Environment variable Required? Default Description
VAULT_K8S_ROLE N kms-vault-operator The Vault role that the client should authenticate as on the Kubernetes login endpoint.
VAULT_K8S_LOGIN_ENDPOINT N auth/kubernetes/login The Kubernetes authentication endpoint in Vault

Vault token authentication method (--vault-authentication-method=token)

This method simply follows the convention and uses the VAULT_TOKEN environment variable to authenticate.

Environment variable Required? Default Description
VAULT_TOKEN Y The Vault token used to perform operations on Vault.

Vault userpass authentication method (--vault-authentication-method=userpass)

Environment variable Required? Default Description
VAULT_USERNAME Y The Vault username to use for authentication
VAULT_PASSWORD Y The password corresponding to VAULT_USERNAME

Vault approle authentication method (--vault-authentication-method=approle)

Environment variable Required? Default Description
VAULT_APPROLE_ROLE_ID Y The AppRole role id to use for authentication
VAULT_APPROLE_SECRET_ID Y The AppRole secret id to use for authentication
VAULT_APPROLE_ENDPOINT N auth/approle/login The Vault endpoint to use for this authentication method

Vault github authentication method (--vault-authentication-method=github)

Environment variable Required? Default Description
VAULT_GITHUB_TOKEN Y The GitHub token to use for authentication
VAULT_GITHUB_AUTH_ENDPOINT N auth/github/login The Vault endpoint to use for this authentication method

Vault iam authentication method (--vault-authentication-method=iam)

This authentication method uses Vault AWS auth method, but specifically the iam method only. Support for the ec2 method might be added in the future.

Since the operator itself is already assumed to be running with some form of access to IAM credentials for decrypting operations (via environment variables, kube2iam, kiam, etc.), those same credentials can be be used by default to log in to Vault using the aws method. However, you can also inject static credentials via the environment variables below if wish to keep decryption credentials separate from login credentials.

Environment variable Required? Default Description
VAULT_IAM_AWS_ACCESS_KEY_ID N No default, but if not specified, a dynamic access key id will be retrieved at runtime using the standard AWS credentials provider chain, assuming one is available. A static value to use as the access key ID for Vault login purposes.
VAULT_IAM_AWS_SECRET_ACCESS_KEY N No default, but if not specified, a dynamic secret access key will be retrieved at runtime using the standard AWS credentials provider chain, assuming one is available. A static value to use as the secret access key for Vault login purposes.
VAULT_IAM_ROLE N No default, but if a role is not specified, Vault will try to guess the role name based on the principal name associated with the credentials (e.g. the IAM user name or role). The name of a Vault role to assume with the IAM credentials provided. This is the Vault role that must be previously configured, not the IAM role you may be authenticating with.
VAULT_IAM_AUTH_ENDPOINT N auth/aws/login The Vault endpoint to use for this authentication method

NOTE: the remote Vault instance will also require runtime permissions to perform the IAM validation actions. Those credentials cannot be set by the operator and must be set directly in the target Vault cluster by other means. Refer to the official Vault documentation for the recommended IAM policy.

Command-line flags

Flag Default Description
--vault-authentication-method token Method to be used for the controller to authenticate with Vault.
--sync-period-seconds 120 Amount of time in seconds to wait between before syncing the secret to Vault

Creating a secret

Setting up the AWS resources required to encrypt/decrypt a KMS secret are outside of the scope of this documentation (but you can read about KMS here), so we're going to assume that the KMS key already exists and that the credentials used to encrypt the secret offline, as well as those use to decrypt it at runtime have the required permissions to perform those actions. If you're Terraform-oriented, you might find these modules useful for dealing with KMS.

From the command line, you can use the aws kms encrypt command, e.g.

aws kms encrypt --key-id <key-id-or-alias> --plaintext "Hello world" --output text --query CiphertextBlob

Then take that output and set it as an item the spec.secrets array of your KMSVaultSecret resource, e.g.

apiVersion: k8s.patoarvizu.dev/v1alpha1
kind: KMSVaultSecret
metadata:
  name: example-kmsvaultsecret
  namesace: default
spec:
  path: secret/test/kms-vault-secret
  kvSettings:
    engineVersion: v1
  secrets:
    - key: test
      encryptedSecret: <kms-encrypted-secret>

After that, assuming the resource is in deploy/example-kms-vault-secret.yaml, just run:

kubectl apply -f deploy/example-kms-vault-secret.yaml

Partial secrets

In addition to managing KMSVaultSecret custom resources, this operator also handles a second type of resource called PartialKMSVaultSecret. This CRD is similar to KMSVaultSecret but only supports the secrets field, and doesn't have its own controller. Instead, the purpose of this resource is to hold secrets that can be included in a KMSVaultSecret, via the includeSecrets field. The single kmsvaultsecret_controller.go will aggregate the included secrets along with those of the resource itself and write them all together as a single item in Vault. To keep things as simple as possible, the first iteration of this feature won't support nesting PartialKMSVaultSecrets (e.g. by including PartialKMSVaultSecrets in other PartialKMSVaultSecrets). Rather, the way to include multiple partial secrets is to just list them all in the includeSecrets field of the KMSVaultSecret resource.

Because of their abstract nature, PartialKMSVaultSecrets don't have a path, Vault authenticating method, or KV settings, but they do support the KMS secret encryption context, which is passed down to the concrete KMSVaultSecret object.

Empty secrets

Although rarely an empty string is required as a secret, sometimes it is needed for backwards compatibility or as a placeholder. Since an empty string is not a valid KMS-encrypted string, the CRD includes a field that signals to the operator that an empty string should be put in the indicated path and field. To do this, simply set emptySecret: true to each individual item under secrets that you want to inject as a an empty string. Note that when you do this, the operator will ignore anything set in the encryptedSecret field, even if it's a valid KMS-encrypted string.

Validating webhook

The Docker image contains another binary (kms-vault-validating-webhook) that can be used as a server that a ValidatingWebhookConfiguration calls to validate either KMSVaultSecrets or PartialKMSVaultSecrets and prevent them from being picked up by the controller in the first place. Since this binary is separate from the main one, it would need to be deployed either as a sidecar or as a separate Deployment, as well as requiring its own Service. You can find an example of how to deploy it as a sidecar here.

Keep in mind that a ValidatingWebhookConfiguration requires a valid CA bundle to trust the webhook over TLS. While this can be any certificate generated offline, you can also use cert-manager to make it easy to generate certificates as Kubernetes Secrets and mount them on containers (like the webhook), or to inject the corresponding CA bundle in ValidatingWebhookConfigurations.

Auto-reloading certificate

The webhook performs a hot reload if the underlying TLS certificate (indicated by the -tls-cert-file flag) on disk is modified. This is helpful when using automatic certificate provisioners like cert-manager that will do automatic rotation of the certificates but can't control the lifecycle of the workloads using the certificate.

The way this is achieved is by initially loading the certificate and keeping it in a local cache, then using the radovskyb/watcher library to watch for changes on the file and updating the cached version if the file changes.

Monitoring

If your Kubernetes cluster is running the Prometheus operator, this operator will automatically create an additional Service called kms-vault-operator-metrics and a corresponding ServiceMonitor of the same name. This monitor will scrape the operator for metrics on two different ports. Port 8383 will post general metrics about the running process, while port 8686 will post metrics about the custom resources managed by the operator. More information can be found on the Operator SDK website.

Up until version v0.14.0, this operator was using a version of the operator-sdk that supported automatic creation a Service and ServiceMonitor objects to scrape Prometheus metrics, but that functionality has been removed. If you're running the Prometheus operator in your cluster and you want to scrape metrics for this operator, you're going to have to explicitly create them yourself, querying the /metrics endpoint on port :8080.

For security nerds

NOTE: Due to technical issues with the Notary client, starting on January 4th 2023 and until further notice new images will NOT be signed. The images will still be built for multi-architecture, and will include the Git and GPG metadata, but they won't pass Docker Content Trust validation if you have it enabled.

Docker images are signed and published to Docker Hub's Notary server

The Notary project is a CNCF incubating project that aims to provide trust and security to software distribution. Docker Hub runs a Notary server at https://notary.docker.io for the repositories it hosts.

Docker Content Trust is the mechanism used to verify digital signatures and enforce security by adding a validating layer.

You can inspect the signed tags for this project by doing docker trust inspect --pretty docker.io/patoarvizu/kms-vault-operator, or (if you already have notary installed) notary -d ~/.docker/trust/ -s https://notary.docker.io list docker.io/patoarvizu/kms-vault-operator.

If you run docker pull with DOCKER_CONTENT_TRUST=1, the Docker client will only pull images that come from registries that have a Notary server attached (like Docker Hub).

Docker images are labeled with Git and GPG metadata

In addition to the digital validation done by Docker on the image itself, you can do your own human validation by making sure the image's content matches the Git commit information (including tags if there are any) and that the GPG signature on the commit matches the key on the commit on github.com.

For example, if you run docker pull patoarvizu/kms-vault-operator:898c158e9c3c313984d9a0c7947f0f3178501cf2 to pull the image tagged with that commit id, then run docker inspect patoarvizu/kms-vault-operator:898c158e9c3c313984d9a0c7947f0f3178501cf2 | jq -r '.[0].ContainerConfig.Labels' (assuming you have jq installed) you should see that the GIT_COMMIT label matches the tag on the image. Furthermore, if you go to https://github.com/patoarvizu/kms-vault-operator/commit/898c158e9c3c313984d9a0c7947f0f3178501cf2 (notice the matching commit id), and click on the Verified button, you should be able to confirm that the GPG key ID used to match this commit matches the value of the SIGNATURE_KEY label, and that the key belongs to the AUTHOR_EMAIL label. When an image belongs to a commit that was tagged, it'll also include a GIT_TAG label, to further validate that the image matches the content.

Keep in mind that this isn't tamper-proof. A malicious actor with access to publish images can create one with malicious content but with values for the labels matching those of a valid commit id. However, when combined with Docker Content Trust, the certainty of using a legitimate image is increased because the chances of a bad actor having both the credentials for publishing images, as well as Notary signing credentials are significantly lower and even in that scenario, compromised signing keys can be revoked or rotated.

Here's the list of included Docker labels:

  • AUTHOR_EMAIL
  • COMMIT_TIMESTAMP
  • GIT_COMMIT
  • GIT_TAG
  • SIGNATURE_KEY

Multi-architecture images

Manifests published with the semver tag (e.g. patoarvizu/kms-vault-operator:v0.15.0), as well as latest are multi-architecture manifest lists. In addition to those, there are architecture-specific tags that correspond to an image manifest directly, tagged with the corresponding architecture as a suffix, e.g. v0.15.0-amd64. Both types (image manifests or manifest lists) are signed with Notary as described above.

Here's the list of architectures the images are being built for, and their corresponding suffixes for images:

  • linux/amd64, -amd64
  • linux/arm64, -arm64
  • linux/arm/v7, -arm7

Important notes by this project

Kubernetes namespaces and Vault namespaces

The KMSVaultSecret CRD is a namespaced resource, but please note that the Kubernetes namespace doesn't map to a Vault namespace. Support for Vault namespaces is outside of the scope of this project, since they're only available in Vault enterprise for now.

Additionally, the PartialKMSVaultSecret CRD is also namespaced, but the includeSecrets field on a KMSVaultSecret object will only discover partial secrets within the same namespace. Support for referencing secrets from another namespace will be added in a future version.

Multiple secrets writing to the same location

Also, the operator doesn't make any guarantees or checks about KMSVaultSecrets in different namespaces writing to the same Vault paths. The operator is designed to continuously write the secret, so if two or more resources are pointing to the same location, the operator will constantly overwrite them.

No validation on target path

Because the controller is designed to write the secret to Vault continuously, it doesn't perform any validation on what may exist on the configured path before writing to it. Be careful when deploying a KMSVaultSecret to make sure you don't overwrite your existing secrets.

Removing secrets when a KMSVaultSecret is deleted.

The kms-vault-operator controller supports removing secrets from Vault by setting delete.k8s.patoarvizu.dev as a Kubernetes finalizer. Support for this for K/V V1 is simple since secrets are not versioned, but when the secret is for K/V V2, deleting a KMSVaultSecret object will delete ALL of its versions and metadata from Vault, so handle it with care. If the secret is V2, the path for the DELETE operation is the same as the input one, replacing secret/data/ with secret/metadata/. There is currently no support for removing a single version of a K/V V2 secret.

Decryption or decoding errors are ignored (but logged)

If the validating webhook mentioned above is deployed, then the controller won't (in theory) need to deal with erroneous secrets since they're never committed to storage. However, if the webhook is not in place, and a secret is incorrectly encoded or encrypted (including if the encryption context doesn't match the secret), the operator will log the error, skip those secrets, and continue writing the rest. This applies to individual items in the secrets list, i.e. the controller will still apply other secrets within the same KMSVaultSecret even if one of them fails. The controller, however, will trigger an event of type Warning for each encryptedSecret that it wasn't able to decode or decrypt.

Support for K/V V2 is limited (as of this version)

The KMSVaultSecret CRD supports specifying kvSettings.engineVersion: v2 and a check-and-set index with kvSettings.casIndex but support for it is limited. For example, the operator doesn't doesn't enforce or validate that the path is V2-friendly, and no metadata operations are available.

Partial secrets don't validate keys

The operator doesn't do any validation in regards to keys included both in the KMSVaultSecret object and in any included PartialKMSVaultSecrets (or if they exist in multiple included PartialKMSVaultSecrets), and it also doesn't provide any guarantees regarding the order of precedence of secrets. Because of this, you should make sure that your secret aggregation avoids these overlaps. Support for validating this can be added in a future version.

Partial secrets don't support finalizers (yet)

Because PartialKMSVaultSecrets don't have their own controller (as of the latest version), it's not possible to handle finalizers on them (specifically the delete.k8s.patoarvizu.dev finalizer). That means that the presence of finalizers on those objects won't have the same effect as setting them on KMSVaultSecret objects. If a PartialKMSVaultSecret object is deleted, the direct effect is that any KMSVaultSecret that includes a deleted PartialKMSVaultSecret could see the included keys deleted from Vault! This would only happen if the Vault KV secret backend is v1, since to update v2 secrets you need to update the casIndex field. Supporting finalizers for partial secrets could be added to a future version.

Help wanted!

I (the author of this operator) am not a "real" golang developer and the code probably shows it. It's also my first venture into writing a Kubernetes operator. I'd welcome any Issues or PRs on this repo, including (and specially) support for additional Vault authentication methods.