-
Notifications
You must be signed in to change notification settings - Fork 466
add one prod ready example #166
base: master
Are you sure you want to change the base?
Conversation
@pgold30 Thanks for the PR! This is awesome work. However, a few gotchas:
If you want to simplify this example to something any open source user can use and make it testable, we can put it in this repo. Alternatively, we are planning to create an alternative repo later this quarter for Gruntwork customers where we can put such production-ready examples that include proprietary code, so another option is to wait until that's ready, and move your PR that to that repo. Let me know what you'd prefer! |
Hi @brikis98 thanks for your feedback! Will apply the corrections to make it open source here and will be happy to help when you make the specific Gruntwork prod-ready example |
@brikis98 Please let me know if the comments make sense or if i need to modify some stuff |
Apologies, am currently traveling. Will check this out as soon as I can! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies again for the long delay. This is a large PR that required a fair amount of thought, and I just was not able to carve out time to get to it until now. I finally went through and left a bunch of comments, especially around a lot of copy/paste.
A candid piece of feedback: I'm a bit worried about this example being the "prod-ready" one. One issue is there are no automated tests; all of the code in our repos must be tested, or we can't maintain it or feel confident it works as expected. But the even bigger issue is that this example code is still missing a number of items that you would need for going to prod, especially related to server-hardening, monitoring, and logging. Adding all of these items is out of scope for this repo, but without those items, I feel like we'd be misleading users by saying this is production-ready code. What do you think of, as an alternative, writing up a set of docs that go through a (relatively comprehensive) list of the considerations to keep in mind when going to prod, rather than partial example code? It could be a step-by-step guide that includes many of the items you have in this example (e.g., enabling gossip encryption, passing secrets securely, etc), plus any items that are missing.
* Enable ACL on consul and pass the token in a secure way using AWS Secrets manager | ||
* Enable encryption on Consul (based on https://github.com/hashicorp/terraform-aws-consul/tree/master/examples/example-with-encryption) | ||
* Auto generate gossip_encryption_key and store it in AWS Secrets manager | ||
* Only allow ssh from Vpn server |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not everyone who uses Vault will have a VPN server. Perhaps this should be a more generic list of security group IDs to whitelist? You could put the security group ID of a VPN server in that list, or a bastion host, or whatever else your organization is using.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree
* Auto generate gossip_encryption_key and store it in AWS Secrets manager | ||
* Only allow ssh from Vpn server | ||
* Creates an ELB in front of Consul instances | ||
* Creates an ELB in front of Vault instances |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the ELBs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be any load balancer , to balance request and still have one record entry when having multiple instances?
* Creates an ELB in front of Vault instances | ||
* Removed the block for deploying cluster in default VPC and availability zones, this needs to be set explicitly to enforce security | ||
|
||
This example use Terragrunt to solve some dependencies but can by used with terraform also. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably should keep these examples to Terraform-only. Anything that works with Terraform works with Terragrunt, but not necessarily vice versa.
# HOWTO | ||
* 1 - Run terragrunt | ||
* 2 - run `vault operator init` on one of the vault servers | ||
* 3 - To login in consul look for master token under aws secrets manager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand what this is saying. Could you try to clarify the language a bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will change it to make it more clear. But basically it says that you need to ssh vault and run that command to get the token and that the consul token will be stored under aws secret manager
# Secret version updated with the random uuid | ||
resource "aws_secretsmanager_secret_version" "consul_token" { | ||
secret_id = aws_secretsmanager_secret.consul_token.id | ||
secret_string = random_uuid.consul_token.result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This results in the secret being stored in Terraform state, which is not great. Would be better to generate this secret out of band and to use an IAM role that gives your servers permissions to read the secret.
} | ||
|
||
# Random uuid used as gossip encryption key | ||
resource "random_string" "gossip_encryption_key" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This too ends up stored in state. Again, would be better to add it to secrets manager out of band.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree
# Policy to allow Consul to write the consul_token secret and gossip encryption key | ||
resource "aws_iam_policy" "secretsmanager_get_token" { | ||
name = var.consul_cluster_name | ||
policy = <<EOF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the aws_iam_policy
data source.
# DEPLOY THE CLUSTERS IN THE DEFAULT VPC AND AVAILABILITY ZONES | ||
# Using the default VPC and subnets makes this example easy to run and test, but it means Consul and Vault are | ||
# accessible from the public Internet. In a production deployment, we strongly recommend deploying into a custom VPC | ||
# and private subnets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the docs said you weren't using a default VPC for this example?
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1 | ||
|
||
# These variables are passed in via Terraform template interpolation | ||
#/opt/consul/bin/run-consul --server --cluster-tag-key "${consul_cluster_tag_key}" --cluster-tag-value "${consul_cluster_tag_value}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
Co-Authored-By: Yevgeniy Brikman <[email protected]>
Sorry just got the chance to read all your suggestions , next week i will make all the changes requested, regarding the first part we can change prod ready for something like vault consul auto unseal with encryption or something similar. |
Thank you for your submission! We require that all contributors sign our Contributor License Agreement ("CLA") before we can accept the contribution. Read and sign the agreement Learn more about why HashiCorp requires a CLA and what the CLA includes Pablo Loschi seems not to be a GitHub user. Have you signed the CLA already but the status is still pending? Recheck it. |
Vault auto unseal cluster private with kms and acl for consul with encryption
Tested with latest version of Vault 1.2.3 and consul 1.6.1
This example combines the following examples: Vaul auto unseal - Vault private cluster - Consul with encryption and also adds some stuff that we expect in a prod ready working state, mainly:
This example use Terragrunt to solve some dependencies but can by used with terraform also.
Thanks @mkrull for the ACL token provision