This repository has been archived by the owner on Jan 25, 2023. It is now read-only.

add one prod ready example #166

Open. pgold30 wants to merge 6 commits into master.

Contributor

pgold30 commented Oct 11, 2019

Private Vault cluster with KMS auto unseal, plus Consul with ACLs and encryption

Tested with the latest versions at the time: Vault 1.2.3 and Consul 1.6.1.

This example combines the following examples: Vault auto unseal, Vault private cluster, and Consul with encryption. It also adds some things that we expect in a prod-ready working state, mainly:

  • Secure the instances with security groups and remove [0.0.0.0/0]
  • Enable ACLs on Consul and pass the token securely using AWS Secrets Manager
  • Enable encryption on Consul (based on https://github.com/hashicorp/terraform-aws-consul/tree/master/examples/example-with-encryption)
  • Auto-generate gossip_encryption_key and store it in AWS Secrets Manager
  • Only allow SSH from the VPN server
  • Create an ELB in front of the Consul instances
  • Create an ELB in front of the Vault instances
  • Remove the block that deploys the cluster in the default VPC and availability zones; these must now be set explicitly to enforce security

This example uses Terragrunt to resolve some dependencies, but it can also be used with plain Terraform.

Thanks @mkrull for the ACL token provisioning

Collaborator

brikis98 commented Oct 15, 2019

@pgold30 Thanks for the PR! This is awesome work. However, a few gotchas:

  1. This code includes references to a bunch of proprietary Gruntwork code. That means the examples won't work for many open source users, leading to lots of 404's, confusion, etc. We should only use open source code in these open source repos.
  2. It looks like this code is meant to be used in the Gruntwork Reference Architecture: e.g., it pulls in VPC data via terraform_remote_state. While very handy, it again won't work for many open source users who aren't using the Reference Architecture, Terragrunt, etc.
  3. Each example in the /examples folder needs a corresponding test in the /test folder. Note that creating a KMS master key for each test run will be a bit too expensive ($1 per run!), so we'll need not only new test code, but also some tweaks to the module to make it more testable.
  4. Side note: there's something odd/ironic about using AWS Secrets Manager to bootstrap Vault 😬

If you want to simplify this example to something any open source user can use and make it testable, we can put it in this repo. Alternatively, we are planning to create a separate repo later this quarter for Gruntwork customers, where we can put production-ready examples that include proprietary code. So another option is to wait until that's ready and move your PR to that repo. Let me know what you'd prefer!
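As a rough sketch of what points 2 and 3 could look like in Terraform: the example could take the VPC details as plain input variables instead of reading them via terraform_remote_state, and look up an existing KMS key by alias instead of creating a new one on every run. All variable, data source, and output names below are illustrative assumptions, not code from the PR.

```hcl
# Hypothetical inputs: names are illustrative, not taken from the PR.
variable "vpc_id" {
  description = "ID of an existing VPC to deploy Vault and Consul into."
  type        = string
}

variable "subnet_ids" {
  description = "IDs of existing private subnets for the cluster instances."
  type        = list(string)
}

variable "auto_unseal_kms_key_alias" {
  description = "Alias (without the 'alias/' prefix) of an existing KMS key used for auto unseal."
  type        = string
}

# Look up a pre-created key instead of creating a new key per test run.
data "aws_kms_alias" "vault_auto_unseal" {
  name = "alias/${var.auto_unseal_kms_key_alias}"
}

output "auto_unseal_kms_key_id" {
  description = "ID of the KMS key that the alias points to."
  value       = data.aws_kms_alias.vault_auto_unseal.target_key_id
}
```

With inputs like these, a test could pass in any VPC and a shared, long-lived key alias, so nothing in the example depends on the Reference Architecture or Terragrunt.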

Contributor Author

pgold30 commented Oct 15, 2019

Hi @brikis98, thanks for your feedback!
1- Indeed, we use Gruntwork, Terragrunt, and other tooling. I didn't want to remove it because I assumed a prod-ready example would be useful for people using the same stack. I will remove it in a new commit.
2- I will remove those references.
3- I don't know how many tests we run, but having a prod-ready example seems appropriate. The issue seems to lie in the test code rather than in this approach, since the auto-unseal example makes use of this as well.
4- Indeed, but I wanted a way to bootstrap Vault and to use ACLs and a token in Consul without hardcoding anything, and this looks like a nice, simple solution. Let me know if you think there is a better way.

I will apply the corrections to make it open source here, and I will be happy to help when you create the Gruntwork-specific prod-ready example.

Contributor Author

pgold30 commented Nov 7, 2019

@brikis98 Please let me know if the comments make sense or if I need to modify anything.

Collaborator

Apologies, am currently traveling. Will check this out as soon as I can!

Collaborator

brikis98 left a comment

Apologies again for the long delay. This is a large PR that required a fair amount of thought, and I just was not able to carve out time to get to it until now. I finally went through and left a bunch of comments, especially around a lot of copy/paste.

A candid piece of feedback: I'm a bit worried about this example being the "prod-ready" one. One issue is there are no automated tests; all of the code in our repos must be tested, or we can't maintain it or feel confident it works as expected. But the even bigger issue is that this example code is still missing a number of items that you would need for going to prod, especially related to server-hardening, monitoring, and logging. Adding all of these items is out of scope for this repo, but without those items, I feel like we'd be misleading users by saying this is production-ready code. What do you think of, as an alternative, writing up a set of docs that go through a (relatively comprehensive) list of the considerations to keep in mind when going to prod, rather than partial example code? It could be a step-by-step guide that includes many of the items you have in this example (e.g., enabling gossip encryption, passing secrets securely, etc), plus any items that are missing.

examples/vault-prod-ready/README.md (outdated)
* Enable ACL on consul and pass the token in a secure way using AWS Secrets manager
* Enable encryption on Consul (based on https://github.com/hashicorp/terraform-aws-consul/tree/master/examples/example-with-encryption)
* Auto generate gossip_encryption_key and store it in AWS Secrets manager
* Only allow ssh from Vpn server
Collaborator

Not everyone who uses Vault will have a VPN server. Perhaps this should be a more generic list of security group IDs to whitelist? You could put the security group ID of a VPN server in that list, or a bastion host, or whatever else your organization is using.
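A minimal sketch of that suggestion, assuming the example exposes the security group of the Vault instances: SSH access is granted to a caller-supplied list of security group IDs rather than to a VPN server specifically. The variable and resource names here are hypothetical.

```hcl
# Hypothetical inputs: names are illustrative, not taken from the PR.
variable "vault_security_group_id" {
  description = "ID of the security group attached to the Vault instances."
  type        = string
}

variable "allowed_ssh_security_group_ids" {
  description = "Security groups (VPN server, bastion host, etc.) allowed to SSH to the instances."
  type        = list(string)
  default     = []
}

# One ingress rule per allowed source security group. No CIDR blocks are opened.
resource "aws_security_group_rule" "allow_ssh_from_security_groups" {
  count = length(var.allowed_ssh_security_group_ids)

  type                     = "ingress"
  from_port                = 22
  to_port                  = 22
  protocol                 = "tcp"
  source_security_group_id = var.allowed_ssh_security_group_ids[count.index]
  security_group_id        = var.vault_security_group_id
}
```

An organization with a VPN would pass the VPN server's security group ID in the list; one with a bastion host would pass that instead.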

Contributor Author

Agree

* Auto generate gossip_encryption_key and store it in AWS Secrets manager
* Only allow ssh from Vpn server
* Creates an ELB in front of Consul instances
* Creates an ELB in front of Vault instances
Collaborator

Why the ELBs?

Contributor Author

It could be any load balancer. The idea is to balance requests and still have a single record entry when there are multiple instances.

* Creates an ELB in front of Vault instances
* Removed the block for deploying cluster in default VPC and availability zones, this needs to be set explicitly to enforce security

This example use Terragrunt to solve some dependencies but can by used with terraform also.
Collaborator

We probably should keep these examples to Terraform-only. Anything that works with Terraform works with Terragrunt, but not necessarily vice versa.

# HOWTO
* 1 - Run terragrunt
* 2 - run `vault operator init` on one of the vault servers
* 3 - To login in consul look for master token under aws secrets manager
Collaborator

Not sure I understand what this is saying. Could you try to clarify the language a bit?

Contributor Author

Sure, I will change it to make it clearer. Basically, it says that you need to SSH into a Vault server and run that command to get the token, and that the Consul token is stored in AWS Secrets Manager.

# Secret version updated with the random uuid
resource "aws_secretsmanager_secret_version" "consul_token" {
  secret_id     = aws_secretsmanager_secret.consul_token.id
  secret_string = random_uuid.consul_token.result
Collaborator

This results in the secret being stored in Terraform state, which is not great. Would be better to generate this secret out of band and to use an IAM role that gives your servers permissions to read the secret.
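One possible shape for this, sketched under the assumption that the secret was created in AWS Secrets Manager outside of Terraform: the configuration only looks up the secret's metadata (never its value) and grants the instance role permission to read it at boot. The variable names and the policy name are hypothetical.

```hcl
# Hypothetical inputs: names are illustrative, not taken from the PR.
variable "consul_token_secret_name" {
  description = "Name of a Secrets Manager secret that was created out of band."
  type        = string
}

variable "instance_iam_role_id" {
  description = "ID of the IAM role attached to the Consul/Vault instances."
  type        = string
}

# Reads only the secret's metadata (ARN, name). The secret value itself is
# never requested, so it does not end up in Terraform state.
data "aws_secretsmanager_secret" "consul_token" {
  name = var.consul_token_secret_name
}

data "aws_iam_policy_document" "read_consul_token" {
  statement {
    effect    = "Allow"
    actions   = ["secretsmanager:GetSecretValue"]
    resources = [data.aws_secretsmanager_secret.consul_token.arn]
  }
}

# Lets the servers fetch the token themselves, for example from user data.
resource "aws_iam_role_policy" "read_consul_token" {
  name   = "read-consul-token"
  role   = var.instance_iam_role_id
  policy = data.aws_iam_policy_document.read_consul_token.json
}
```

The same pattern would apply to the gossip encryption key: store it out of band and give the instance role read access to that secret's ARN.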

}

# Random uuid used as gossip encryption key
resource "random_string" "gossip_encryption_key" {
Collaborator

This too ends up stored in state. Again, would be better to add it to secrets manager out of band.

Contributor Author

Agree

# Policy to allow Consul to write the consul_token secret and gossip encryption key
resource "aws_iam_policy" "secretsmanager_get_token" {
  name   = var.consul_cluster_name
  policy = <<EOF
Collaborator

Please use the aws_iam_policy data source.
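Presumably this refers to the `aws_iam_policy_document` data source, which builds the policy JSON from HCL instead of an inline heredoc. A sketch of how the resource above might look with it follows. The heredoc's contents are not shown in this thread, so the action and the `secret_arns` variable are placeholders, not the PR's actual policy.

```hcl
variable "consul_cluster_name" {
  description = "Name of the Consul cluster, reused as the policy name as in the PR."
  type        = string
}

# Hypothetical input: ARNs of the secrets the policy should cover.
variable "secret_arns" {
  description = "ARNs of the Secrets Manager secrets the instances may access."
  type        = list(string)
}

# Placeholder statement: substitute the actions from the original heredoc.
data "aws_iam_policy_document" "secretsmanager_get_token" {
  statement {
    sid       = "AccessConsulSecrets"
    effect    = "Allow"
    actions   = ["secretsmanager:GetSecretValue"]
    resources = var.secret_arns
  }
}

resource "aws_iam_policy" "secretsmanager_get_token" {
  name   = var.consul_cluster_name
  policy = data.aws_iam_policy_document.secretsmanager_get_token.json
}
```

Compared with a heredoc, the data source lets Terraform validate the policy structure at plan time and makes it easy to reference other resources' ARNs.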

# DEPLOY THE CLUSTERS IN THE DEFAULT VPC AND AVAILABILITY ZONES
# Using the default VPC and subnets makes this example easy to run and test, but it means Consul and Vault are
# accessible from the public Internet. In a production deployment, we strongly recommend deploying into a custom VPC
# and private subnets.
Collaborator

I thought the docs said you weren't using a default VPC for this example?

exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1

# These variables are passed in via Terraform template interpolation
#/opt/consul/bin/run-consul --server --cluster-tag-key "${consul_cluster_tag_key}" --cluster-tag-value "${consul_cluster_tag_value}"
Collaborator

?

Contributor Author

pgold30 commented Jan 10, 2020

Sorry, I just got the chance to read all your suggestions. Next week I will make all the requested changes. Regarding the first part, we can rename "prod ready" to something like "vault consul auto unseal with encryption" or similar.
I can also write a guide on GW on how to deploy this in a production-grade way, if you think that's the better approach, since otherwise I will need to remove some modules from here. I can do both.
Thanks for your time on this PR. I will make it cleaner and address the changes next week.

hashicorp-cla commented Mar 12, 2022

CLA assistant check

Thank you for your submission! We require that all contributors sign our Contributor License Agreement ("CLA") before we can accept the contribution. Read and sign the agreement

Learn more about why HashiCorp requires a CLA and what the CLA includes


Pablo Loschi seems not to be a GitHub user.
You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

Have you signed the CLA already but the status is still pending? Recheck it.
