AWS Managed Prometheus (AMP) (#368)
sekka1 authored Aug 16, 2022
1 parent 862e28d commit ac99d0b
Showing 6 changed files with 360 additions and 1 deletion.
```
@@ -1,6 +1,6 @@
 variable helm_version {
   type = string
-  default = "34.1.1"
+  default = "39.6.0"
   description = "Helm chart version"
 }
```

199 changes: 199 additions & 0 deletions terraform-modules/aws/prometheus/README.md
@@ -0,0 +1,199 @@
# AWS Managed Prometheus (AMP)

AMP Docs: https://docs.aws.amazon.com/prometheus/latest/userguide/what-is-Amazon-Managed-Service-Prometheus.html

This module is geared towards creating a centralized Prometheus collector.

```
┌───────────────────────────────────────────────────────────────┐
│ │
│ │
│ │
│ │
┌──────────────────────────────┐ │ ┌──────────────────────────────────────────────┐ │
│ │ │ │ │ │
│ │ │ │ │ │
│ │ │ │ │ │
│ ┌─────────────────┐ │ │ │ ┌─────────────────────────────┐ │ │
│ │ AWS │ │ │ │ │ │ │ │
│ │ Managed │ │ (3) remote │ │ │ kube-prometheus-stack │ │ │
│ │ Prometheus │◄─────┼───────────────┼─────────┼────────┤ │ │ │
│ │ │ │ write │ │ │ │ │ │
│ └─────────────────┘ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │
│ ┌──────────┐ │ │ │ │ ┌──────────┐ │ │ │
│ │ AWS │ │ (1) trust │ │ │ │ service │ │ │ │
│ │ identity ├────────────┼───────────────┼────────►│ │ │ account │ │ │ │
│ │ provider │ │ relationship│ │ │ └─────▲────┘ │ │ │
│ └──────────┘ │ to EKS │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │
│ ┌───────────┐ │ │ │ └────────────┼────────────────┘ │ │
│ │ IAM │ │ (2) Service │ │ │ │ │
│ │ role ├───────────┼───────────────┼─────────┼─────────────────────┘ │ │
│ │Federation │ │ account │ │ │ │
│ └───────────┘ │ assume │ │ │ │
│ │ role │ │ │ │
│ │ │ │ EKS Cluster │ │
│ VPC │ │ │ │ │
│ AWS Account #1 │ │ └──────────────────────────────────────────────┘ │
└──────────────────────────────┘ │ │
│ │
│ │
│ VPC │
│ AWS Account #2 │
└───────────────────────────────────────────────────────────────┘
```

The main use case is that when we deploy kubernetes-ops we will be creating at least two or three environments: one for dev, one for qa, one for sandbox/uat/etc., and one for prod (maybe more, depending on how many levels we want to go through). The goal is to avoid having to go to each environment's Prometheus to get metrics, and/or to offload the metrics to another environment for redundancy/backup/compliance reasons. There could also be more than one of these setups, for example one for pre-production environments and one for production environments, depending on the requirements for separating the data.
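
For reference, a minimal sketch of what calling this module from the collector account (AWS account #1) could look like. The `source` path is hypothetical and every value is a placeholder; the field names and sample values are taken from the Inputs table further down:

```
module "prometheus" {
  source = "../../terraform-modules/aws/prometheus" # adjust to where this module lives

  account_id      = "111111111111"
  workspace_alias = "prometheus-test"
  instance_name   = "ops"
  tags = {
    environment = "ops"
  }

  # One entry per remote EKS cluster allowed to remote write into this AMP
  iam_access_grant_list = [
    {
      description                   = "To grant access to the remote EKS cluster"
      environment_name              = "dev"
      instance_name                 = "dev cluster"
      namespace                     = "monitoring"
      kube_service_account_name     = "kube-prometheus-stack-prometheus"
      eks_cluster_oidc_issuer_url   = "https://xxxxxxxxxxxxxxxxxxxxx.sk1.us-east-1.eks.amazonaws.com"
      oidc_provider_url             = "https://oidc.eks.us-east-1.amazonaws.com/id/B4EA44BE30ABD91AC23C475F3237111"
      oidc_provider_client_id_list  = ["sts.amazonaws.com"]
      oidc_provider_thumbprint_list = ["9e99a48a9960b14926bb7f3b02e22da2b0ab7280"]
    }
  ]
}
```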

# How
AWS has a blog post that does essentially what we want, but for an S3 bucket:

https://aws.amazon.com/blogs/containers/cross-account-iam-roles-for-kubernetes-service-accounts/

We are going to take this process and alter it slightly for our needs.

This module creates an AWS Identity Provider in the account where AMP is created and grants access
to one or more external EKS clusters that are not in the same AWS account (although the EKS cluster can
also be in the same AWS account). The IAM role it creates uses this Identity Provider to authenticate the
remote EKS pod trying to assume the role, and the role is scoped down to the EKS Kubernetes service account
and Kubernetes namespace.

From the picture above:

(1) An AWS Identity Provider is created in AWS account #1 that references the identity provider (IdP) of AWS
account #2's EKS cluster (each EKS cluster creates an IdP). At this point AWS account #1 knows about AWS
account #2's EKS cluster: it knows its endpoint, but we haven't yet applied any rules about what to trust
from it. We can now reference this Identity Provider in the upcoming IAM roles, and when AWS validates an
incoming authentication request it will go to this EKS cluster's IdP to verify the identity.
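
In Terraform terms, step (1) corresponds to the `aws_iam_openid_connect_provider.this` resource listed under Resources below; a minimal sketch with placeholder values taken from the defaults in the Inputs table:

```
# Registers account #2's EKS cluster OIDC issuer as an identity provider
# in account #1. One of these is needed per remote EKS cluster.
resource "aws_iam_openid_connect_provider" "this" {
  url             = "https://oidc.eks.us-east-1.amazonaws.com/id/B4EA44BE30ABD91AC23C475F3237111"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["9e99a48a9960b14926bb7f3b02e22da2b0ab7280"]
}
```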

(2) There are two things happening here.

The first is that an IAM role is created in AWS account #1 with certain attributes (see the trust-policy sketch after this list):
* The Trust Relationship JSON points to the AWS Identity Provider we created in the previous step. It says
that when something tries to assume this IAM role, this is how we will validate it.
* The Trust Relationship JSON validates the attributes in the authentication JWT sent to us. These items
are cryptographically signed by the source EKS cluster, so they are very difficult to spoof without control
of the source EKS cluster. We limit access to that particular EKS cluster and require that the caller is a
Kubernetes service account, coming from a specific Kubernetes namespace, with a specific service account
name. This scopes the set of possible source identities down very tightly.
* The permissions, the IAM policy attached to this role, grant permission to assume roles and to write
to the AWS Managed Prometheus. The policy also scopes the resource to the ARN of the AMP workspace that
this module creates.
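
A hedged sketch of the trust policy those bullets describe, written as a standalone `aws_iam_role` for illustration (the module actually delegates this to the `iam-assumable-role-with-oidc` module listed under Modules below; the role name, account ID, and OIDC host are placeholders):

```
resource "aws_iam_role" "amp_remote_write" {
  name = "aws-managed-prometheus-ops"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = "sts:AssumeRoleWithWebIdentity"
      # Validate callers against the Identity Provider from step (1)
      Principal = {
        Federated = "arn:aws:iam::111111111111:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/B4EA44BE30ABD91AC23C475F3237111"
      }
      # Pin the caller to one namespace and one service account in that cluster
      Condition = {
        StringEquals = {
          "oidc.eks.us-east-1.amazonaws.com/id/B4EA44BE30ABD91AC23C475F3237111:sub" = "system:serviceaccount:monitoring:kube-prometheus-stack-prometheus"
        }
      }
    }]
  })
}
```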

The second thing is on the AWS account #2 side: a service account is created and the EKS IRSA assume-role
annotation is placed onto it, with the role ARN from the AWS account #1 side. The service account will have
all of the attributes we specified previously, and the Prometheus running under this service account identity
will assume the IAM role that was created in AWS account #1. It sends its identity, along with the
cryptographically signed claims about that identity (which we meticulously crafted to match in the previous
section), to AWS STS. AWS STS then validates the identity to ensure the caller is who they say they are: it
goes to AWS account #2's EKS cluster's IdP, per the information in AWS account #1's Identity Provider. This
is important because at this point AWS account #1 can NOT trust anything in the assume-role request from AWS
account #2; it could be any random auth request from the internet. Using the identity provider information
previously configured in AWS account #1, STS verifies the request, and once verification succeeds it returns
to AWS account #2's Prometheus a set of temporary AWS credentials it can use to access AWS account #1's AMP,
per the IAM policy attached to the role.
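
A sketch of that annotated service account using the Terraform `kubernetes` provider; the names are the defaults from the Inputs table, and in practice the service account is often created and annotated through kube-prometheus-stack's Helm values instead:

```
resource "kubernetes_service_account" "prometheus" {
  metadata {
    name      = "kube-prometheus-stack-prometheus"
    namespace = "monitoring"
    annotations = {
      # IRSA: point at the IAM role created in AWS account #1
      "eks.amazonaws.com/role-arn" = "arn:aws:iam::111111111111:role/aws-managed-prometheus-ops"
    }
  }
}
```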

TODO:
* A ladder diagram might make this flow easier to express and understand?
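
Step (3) in the diagram, the remote write itself, is configured on the kube-prometheus-stack side. A hedged sketch as a Terraform `helm_release`; the chart version matches the `helm_version` bump at the top of this commit (assumed to be for kube-prometheus-stack), and the workspace URL and region are placeholders:

```
resource "helm_release" "kube_prometheus_stack" {
  name       = "kube-prometheus-stack"
  repository = "https://prometheus-community.github.io/helm-charts"
  chart      = "kube-prometheus-stack"
  version    = "39.6.0"
  namespace  = "monitoring"

  values = [<<-EOT
    prometheus:
      serviceAccount:
        # Reuse the IRSA-annotated service account from step (2)
        create: false
        name: kube-prometheus-stack-prometheus
      prometheusSpec:
        remoteWrite:
          # AMP remote write endpoint, signed with SigV4 via the assumed role
          - url: https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-EXAMPLE/api/v1/remote_write
            sigv4:
              region: us-east-1
  EOT
  ]
}
```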

# Troubleshooting
Getting this to work correctly can sometimes be challenging. A few things have to align exactly or assuming
the role will not work, and it is often hard to get information on which part is failing. The following helps
you get to a setup where you can test assuming the role from a remote EKS cluster's pod, so you can work out
the exact strings you have to pass into the module to make this work correctly.

The AWS blog above, in the section `Verify the procedure works`, has a good walk-through on how to test it
out (that is where this procedure was taken from).

1. Start an Ubuntu pod in the namespace where you are adding permissions for Prometheus (a pod sketch follows these steps).
2. Attach the Kubernetes service account to this Ubuntu pod.
3. Exec into the Ubuntu pod and install the AWS CLI.
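
A throwaway test pod, sketched with the Terraform `kubernetes` provider for consistency (plain `kubectl run` plus setting `serviceAccountName` works just as well; the names are placeholders):

```
resource "kubernetes_pod" "irsa_test" {
  metadata {
    name      = "irsa-test"
    namespace = "monitoring"
  }
  spec {
    # The service account carrying the IRSA role-arn annotation
    service_account_name = "kube-prometheus-stack-prometheus"
    container {
      name    = "ubuntu"
      image   = "ubuntu:22.04"
      command = ["sleep", "infinity"]
    }
  }
}
```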

Run:
```
aws sts assume-role-with-web-identity --role-arn $AWS_ROLE_ARN --role-session-name x-account --web-identity-token file://$AWS_WEB_IDENTITY_TOKEN_FILE --duration 1500
```

The `AWS_*` variables are automatically populated in your pod when you attach the service account and are running in AWS.

If the command is successful, you should get a response such as:
```
{
    "Credentials": {
        "AccessKeyId": "ASIATKVYE24LRQWGORWC",
        "SecretAccessKey": "bMkN8TCD7thnQPbJzOpbRUIbCJMuUV.....",
        "SessionToken": "IQoJb3JpZ2luX2VjEF8aCXVzLXdlc3QtMiJHMEUCIQDYGv0dkM4Rk9gePcs0wdSJcdLSmg9F+mG5Qt8iWtfGygIgBInTt0sYANyCWThIMgYy07wxkkSVHSc5hk0Xf9kyYZ4q+gQIyP//////////ARABGgwyMjkwOTMyMDE2ODciDAAHYiOCftTyXRSU7irOBOIeuZMMsIcpPd36icCkeDAYC2qw3HMxNmbT/21slnIEjLzuk5FYBV/SEDXck5ZwcAYxXKFY2iag2Rrr/bcsAblOsP2HH.........",
        "Expiration": "2022-08-15T23:31:53+00:00"
    },
    "SubjectFromWebIdentityToken": "system:serviceaccount:default:s3-shared-content",
    "AssumedRoleUser": {
        "AssumedRoleId": "AROATKVYE24L2HFUPPRWQ:x-account",
        "Arn": "arn:aws:sts::1111111111111:assumed-role/aws-managed-prometheus-ops/x-account"
    },
    "Provider": "arn:aws:iam::1111111111111:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/B4EA44BE30ABD91AC23C475F32379593",
    "Audience": "sts.amazonaws.com"
}
```

If not, you will receive an error.

Things to try:
* Look at the IAM role that was created and inspect its Trust Relationship:
  * The `Statement.Principal.Federated` value should be the ARN of the Identity Provider created by this module.
  * In `Statement.Condition.StringEquals`:
    * The key `oidc.eks.us-west-2.amazonaws.com/id/<Remote EKS ID>:sub` should have the ID of the remote EKS cluster as `<Remote EKS ID>`.
    * The value must be in the format `system:serviceaccount:<k8s namespace>:<k8s service account name>`, as in the trust-policy sketch above. It will not work if it is not.

## Requirements

No requirements.

## Providers

| Name | Version |
|------|---------|
| <a name="provider_aws"></a> [aws](#provider\_aws) | n/a |
| <a name="provider_template"></a> [template](#provider\_template) | n/a |

## Modules

| Name | Source | Version |
|------|--------|---------|
| <a name="module_iam_assumable_role_admin"></a> [iam\_assumable\_role\_admin](#module\_iam\_assumable\_role\_admin) | terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc | 3.6.0 |

## Resources

| Name | Type |
|------|------|
| [aws_iam_openid_connect_provider.this](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_openid_connect_provider) | resource |
| [aws_iam_policy.this](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource |
| [aws_prometheus_workspace.this](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/prometheus_workspace) | resource |
| [aws_caller_identity.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/caller_identity) | data source |
| [aws_region.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/region) | data source |
| [template_file.iam_policy](https://registry.terraform.io/providers/hashicorp/template/latest/docs/data-sources/file) | data source |

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_account_id"></a> [account\_id](#input\_account\_id) | The account\_id of your AWS Account. This allows sure the use of the account number in the role to mitigate issue of aws\_caller\_id showing *** by obtaining the value of account\_id | `string` | `null` | no |
| <a name="input_iam_access_grant_list"></a> [iam\_access\_grant\_list](#input\_iam\_access\_grant\_list) | The list of IAM roles for granting various EKS cluster(s) permissions to perform a remote write to this AMP instance. | `list` | <pre>[<br> {<br> "description": "To grant access to the remote EKS cluster",<br> "eks_cluster_oidc_issuer_url": "https://xxxxxxxxxxxxxxxxxxxxx.sk1.us-east-1.eks.amazonaws.com",<br> "environment_name": "dev",<br> "instance_name": "dev cluster",<br> "kube_service_account_name": "kube-prometheus-stack-prometheus",<br> "namespace": "monitoring",<br> "oidc_provider_client_id_list": [<br> "sts.amazonaws.com"<br> ],<br> "oidc_provider_thumbprint_list": [<br> "9e99a48a9960b14926bb7f3b02e22da2b0ab7280"<br> ],<br> "oidc_provider_url": "https://oidc.eks.us-east-1.amazonaws.com/id/B4EA44BE30ABD91AC23C475F3237111"<br> }<br>]</pre> | no |
| <a name="input_instance_name"></a> [instance\_name](#input\_instance\_name) | An instance name to attach to some resources. Optional only needed if you are going to create more than one of these items in an AWS account | `string` | `"env"` | no |
| <a name="input_tags"></a> [tags](#input\_tags) | AWS tags | `map(any)` | n/a | yes |
| <a name="input_workspace_alias"></a> [workspace\_alias](#input\_workspace\_alias) | n/a | `string` | `"prometheus-test"` | no |

## Outputs

| Name | Description |
|------|-------------|
| <a name="output_arn"></a> [arn](#output\_arn) | ARN |
| <a name="output_id"></a> [id](#output\_id) | Identifier of the workspace |
| <a name="output_prometheus_endpoint"></a> [prometheus\_endpoint](#output\_prometheus\_endpoint) | Prometheus endpoint available for this workspace. |
17 changes: 17 additions & 0 deletions terraform-modules/aws/prometheus/iam-policy.tpl.json
@@ -0,0 +1,17 @@
```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "aps:RemoteWrite",
        "aps:GetSeries",
        "aps:GetLabels",
        "aps:GetMetricMetadata",
        "sts:AssumeRole"
      ],
      "Resource": [
        "arn:aws:aps:${awsRegion}:${awsAccountID}:workspace/${workspace_id}"
      ]
    }
  ]
}
```
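
For context, this is how the module appears to render the template, per the `template_file.iam_policy` data source and the other entries in the Resources table above (a sketch; the exact wiring may differ):

```
data "template_file" "iam_policy" {
  template = file("${path.module}/iam-policy.tpl.json")

  vars = {
    awsRegion    = data.aws_region.current.name
    awsAccountID = data.aws_caller_identity.current.account_id
    workspace_id = aws_prometheus_workspace.this.id
  }
}
```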
