This folder contains the variant to use when deploying on AWS with an EKS cluster.
This module can be declared by adding the following block to your Terraform configuration:
module "thanos" {
  source = "git::https://github.com/camptocamp/devops-stack-module-thanos.git//eks?ref=<RELEASE>"
  cluster_name = module.eks.cluster_name
  base_domain = module.eks.base_domain
  cluster_issuer = local.cluster_issuer
  argocd_namespace = module.argocd_bootstrap.argocd_namespace
  metrics_storage = {
    bucket_id = resource.aws_s3_bucket.thanos_metrics_storage.id
    create_role = true
    cluster_oidc_issuer_url = module.eks.cluster_oidc_issuer_url
  }
  thanos = {
    oidc = module.oidc.oidc
  }
  dependency_ids = {
    argocd = module.argocd_bootstrap.id
    traefik = module.traefik.id
    cert-manager = module.cert-manager.id
    oidc = module.oidc.id
  }
}
As you can see, this module requires at a minimum an S3 bucket and an OIDC provider (more information below).
IMPORTANT
You are in charge of creating an S3 bucket for Thanos to store the archived metrics. We decided to keep the creation of this bucket outside of this module, mainly because the persistence of the data should not be tied to the instantiation of the module itself.
However, the IAM role that gives the Thanos components access to the bucket can be created by the module itself. If you want the module to create the role, set the attribute create_role to true. If you already have a role, you can pass its ARN to the module using the attribute iam_role_arn.
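The bucket itself is a plain AWS provider resource. The following is a minimal sketch, assuming the resource address aws_s3_bucket.thanos_metrics_storage used in the examples on this page; the bucket name is a hypothetical placeholder, and the prevent_destroy guard and public access block are optional hardening choices, not something the module requires.

```hcl
# Bucket for the Thanos archived metrics. The resource address matches the one
# referenced by metrics_storage.bucket_id in the module examples.
resource "aws_s3_bucket" "thanos_metrics_storage" {
  bucket = "my-cluster-thanos-metrics" # hypothetical name, must be globally unique

  # Refuse to destroy the bucket (and the archived metrics) by accident.
  lifecycle {
    prevent_destroy = true
  }
}

# Block every form of public access to the archived metrics.
resource "aws_s3_bucket_public_access_block" "thanos_metrics_storage" {
  bucket = aws_s3_bucket.thanos_metrics_storage.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```

Keeping the bucket in your own configuration, as above, means removing the Thanos module later leaves the stored metrics untouched.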
Tip
Check the EKS deployment example to see how to create the S3 bucket and to better understand the values passed in the example above.
Note
Do not forget that the bucket configuration also needs to be passed to the kube-prometheus-stack module.
Although the declaration above gives you a barebones Thanos deployment, we highly recommend customizing a few settings for a production-ready deployment. At a minimum, you should configure the resource requirements of some of the Thanos components and the size of the persistent volume used by the compactor. You can also configure the compactor retention times, as in the example below.
module "thanos" {
  source = "git::https://github.com/camptocamp/devops-stack-module-thanos.git//eks?ref=<RELEASE>"
  cluster_name = module.eks.cluster_name
  base_domain = module.eks.base_domain
  cluster_issuer = local.cluster_issuer
  argocd_namespace = module.argocd_bootstrap.argocd_namespace
  metrics_storage = {
    bucket_id = resource.aws_s3_bucket.thanos_metrics_storage.id
    create_role = true
    cluster_oidc_issuer_url = module.eks.cluster_oidc_issuer_url
  }
  thanos = {
    # OIDC configuration
    oidc = module.oidc.oidc
    # Configuration of the persistent volume for the compactor
    compactor_persistent_size = "100Gi"
    # Resources configuration for the pods
    compactor_resources = {
      limits = {
        memory = "1Gi"
      }
      requests = {
        cpu = "0.5"
        memory = "512Mi"
      }
    }
    storegateway_resources = {
      limits = {
        memory = "1Gi"
      }
      requests = {
        cpu = "0.5"
        memory = "1Gi"
      }
    }
    query_resources = {
      limits = {
        memory = "1Gi"
      }
      requests = {
        cpu = "0.5"
        memory = "512Mi"
      }
    }
    # Retention settings for the compactor
    compactor_retention = {
      raw = "60d"
      five_min = "120d"
      one_hour = "240d"
    }
  }
  dependency_ids = {
    argocd = module.argocd_bootstrap.id
    traefik = module.traefik.id
    cert-manager = module.cert-manager.id
    oidc = module.oidc.id
  }
}
As you can see in the examples above, the variable thanos provides an interface to customize the most frequently used settings. This variable is merged with the local value thanos_defaults, which contains some sensible defaults for a barebones working deployment. You can check the default values in the local.tf file.
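To illustrate what such a merge does, here is a minimal sketch assuming the module combines the two values with Terraform's built-in merge() function (check local.tf for the actual expression). The default values below are hypothetical and do not reflect the module's real defaults.

```hcl
locals {
  # Hypothetical defaults, NOT the module's actual values.
  thanos_defaults = {
    compactor_persistent_size = "50Gi"
    compactor_retention = {
      raw      = "30d"
      five_min = "60d"
      one_hour = "120d"
    }
  }

  # What a user might pass through the thanos variable.
  thanos_user = {
    compactor_retention = {
      raw = "60d"
    }
  }

  # merge() is shallow: top-level keys from the later argument win, and a nested
  # map passed by the user replaces the default map as a whole.
  thanos = merge(local.thanos_defaults, local.thanos_user)
  # Resulting compactor_retention is { raw = "60d" } (five_min and one_hour are gone),
  # while compactor_persistent_size keeps the default "50Gi".
}
```

If the module does use a plain merge(), it is safer to set every key of a nested map (such as compactor_retention) whenever you override it.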
If you need to configure something beyond the common settings shown above, you can customize the chart's values.yaml by adding a Helm configuration as an HCL structure:
module "thanos" {
  source = "git::https://github.com/camptocamp/devops-stack-module-thanos.git//eks?ref=<RELEASE>"
  cluster_name = module.eks.cluster_name
  base_domain = module.eks.base_domain
  cluster_issuer = local.cluster_issuer
  argocd_namespace = module.argocd_bootstrap.argocd_namespace
  metrics_storage = {
    bucket_id = resource.aws_s3_bucket.thanos_metrics_storage.id
    create_role = true
    cluster_oidc_issuer_url = module.eks.cluster_oidc_issuer_url
  }
  thanos = {
    oidc = module.oidc.oidc
  }
  helm_values = [{ # Note the curly brackets here
    thanos = {
      map = {
        string = "string"
        bool = true
      }
      sequence = [
        {
          key1 = "value1"
          key2 = "value2"
        },
        {
          key1 = "value1"
          key2 = "value2"
        },
      ]
      sequence2 = [
        "string1",
        "string2"
      ]
    }
  }]
  dependency_ids = {
    argocd = module.argocd_bootstrap.id
    traefik = module.traefik.id
    cert-manager = module.cert-manager.id
    oidc = module.oidc.id
  }
}
Thanos needs an S3 bucket to store the archived metrics. Create the bucket and pass its ID to the module, along with the attribute create_role, which must be set explicitly. Set it to true if you want the module to create the required IAM role.
However, if you want to create and manage this IAM role yourself, set the attribute create_role to false and pass the ARN of the role to the module using the attribute iam_role_arn.
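A sketch of a self-managed role, assuming you reuse the same terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc module (~> 5.0) that this variant calls internally. The role and policy names, the thanos namespace and the service account names are hypothetical placeholders: check which service accounts the chart actually creates. The S3 actions are the ones the Thanos object storage documentation lists for S3 access.

```hcl
# Read/write access to the metrics bucket for the Thanos components.
data "aws_iam_policy_document" "thanos_s3" {
  statement {
    actions = [
      "s3:ListBucket",
      "s3:GetObject",
      "s3:DeleteObject",
      "s3:PutObject",
    ]
    resources = [
      aws_s3_bucket.thanos_metrics_storage.arn,
      "${aws_s3_bucket.thanos_metrics_storage.arn}/*",
    ]
  }
}

resource "aws_iam_policy" "thanos_s3" {
  name   = "thanos-s3-access" # hypothetical name
  policy = data.aws_iam_policy_document.thanos_s3.json
}

# Role assumable by the Thanos service accounts through the EKS OIDC provider.
module "thanos_iam_role" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
  version = "~> 5.0"

  create_role      = true
  role_name        = "thanos-s3-access" # hypothetical name
  provider_url     = replace(module.eks.cluster_oidc_issuer_url, "https://", "")
  role_policy_arns = [aws_iam_policy.thanos_s3.arn]

  # Hypothetical namespace and service account names.
  oidc_fully_qualified_subjects = [
    "system:serviceaccount:thanos:thanos-storegateway",
    "system:serviceaccount:thanos:thanos-compactor",
  ]
}

module "thanos" {
  # ... other arguments as in the examples above ...
  metrics_storage = {
    bucket_id    = aws_s3_bucket.thanos_metrics_storage.id
    create_role  = false
    iam_role_arn = module.thanos_iam_role.iam_role_arn
  }
}
```

With create_role set to false, cluster_oidc_issuer_url can be omitted from metrics_storage, since the OIDC trust is configured on your own role.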
Tip
The code in this example should help you create the IAM policy and role with the required permissions.
Note
This module was developed with OIDC in mind.
An OIDC proxy container is deployed as a sidecar in each pod that has a web interface. Consequently, the variable thanos is expected to contain a map oidc with at least the issuer URL, the client ID, and the client secret.
You can pass these values by referencing an output of another module (as above), or by defining them explicitly:
module "thanos" {
  ...
  thanos = {
    oidc = {
      issuer_url = "<URL>"
      client_id = "<ID>"
      client_secret = "<SECRET>"
    }
  }
  ...
}
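Instead of hardcoding the client secret, you could declare the OIDC settings as input variables and mark the secret as sensitive, so Terraform redacts it from plan and apply output. A sketch; the variable names are hypothetical, and sensitive values are still written to the Terraform state.

```hcl
# Hypothetical input variables holding the OIDC client settings.
variable "thanos_oidc_issuer_url" {
  type = string
}

variable "thanos_oidc_client_id" {
  type = string
}

variable "thanos_oidc_client_secret" {
  type      = string
  sensitive = true # redacted from CLI output, but still stored in the state
}

module "thanos" {
  # ... other arguments as in the examples above ...
  thanos = {
    oidc = {
      issuer_url    = var.thanos_oidc_issuer_url
      client_id     = var.thanos_oidc_client_id
      client_secret = var.thanos_oidc_client_secret
    }
  }
}
```

The secret can then be supplied at run time, for example through the TF_VAR_thanos_oidc_client_secret environment variable.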
Because resource requirements differ from one deployment to another, and because the consumed resources also affect cost, we refrained from configuring default resource requirements for the Thanos components. We did, however, set memory limits for some of the pods (query, storegateway and compactor all have a 1 GB memory limit). We recommend that you customize these values as you see fit.
Important
At the very least, you should configure the size of the persistent volume used by the compactor.
This value MUST be configured, otherwise the compactor will NOT work in a production deployment. The Thanos documentation recommends a size of 100-300 GB.
This module requires Argo CD to be already running in the cluster so that the application can be created.
This module has multiple ingresses, so it must be deployed after the traefik and cert-manager modules.
The following requirements are needed by this module:
The following Modules are called:
Source: terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc
Version: ~> 5.0
The following resources are used by this module:
- aws_iam_policy.thanos (resource)
- aws_iam_policy_document.thanos (data source)
- aws_s3_bucket.thanos (data source)
The following input variables are required:
Description: AWS S3 bucket configuration values for the bucket where the archived metrics will be stored.
An IAM role is required to give the Thanos components read and write access to the S3 bucket. You can create this role yourself or let the module create it for you. If you want the module to create the role, you need to provide the OIDC issuer’s URL for the EKS cluster. If you create the role yourself, you need to provide the ARN of the IAM role you created.
Type:
object({
  bucket_id = string
  create_role = bool
  iam_role_arn = optional(string, null)
  cluster_oidc_issuer_url = optional(string, null)
})
Description: Name given to the cluster. Value used for the ingress' URL of the application.
Type: string
Description: Base domain of the cluster. Value used for the ingress' URL of the application.
Type: string
The following input variables are optional (have default values):
Description: Subdomain of the cluster. Value used for the ingress' URL of the application.
Type: string
Default: "apps"
Description: Name of the Argo CD AppProject where the Application should be created. If not set, the Application will be created in a new AppProject only for this Application.
Type: string
Default: null
Description: Labels to attach to the Argo CD Application resource.
Type: map(string)
Default: {}
Description: Destination cluster where the application should be deployed.
Type: string
Default: "in-cluster"
Description: Override of target revision of the application chart.
Type: string
Default: "v5.0.0"
Description: SSL certificate issuer to use. Usually you would configure this value as letsencrypt-staging or letsencrypt-prod in your root *.tf files.
Type: string
Default: "selfsigned-issuer"
Description: Helm chart value overrides. They should be passed as a list of HCL structures.
Type: any
Default: []
Description: A boolean flag to enable/disable appending lists instead of overwriting them.
Type: bool
Default: false
Description: Automated sync options for the Argo CD Application resource.
Type:
object({
  allow_empty = optional(bool)
  prune = optional(bool)
  self_heal = optional(bool)
})
Default:
{
"allow_empty": false,
"prune": true,
"self_heal": true
}
Description: IDs of the other modules on which this module depends.
Type: map(string)
Default: {}
Description: Most frequently used Thanos settings. This variable is merged with the local value thanos_defaults, which contains some sensible defaults. You can check the default values in the local.tf file. If anything else needs to be customized, you can always pass configuration values using the variable helm_values.
Type: any
Default: {}
Description: Resource limits and requests for Thanos' components. Refer to the official documentation for the format of the values.
Important
These are not production values. You should always adjust them to your needs.
Type:
object({
  query = optional(object({
    requests = optional(object({
      cpu = optional(string, "250m")
      memory = optional(string, "512Mi")
    }), {})
    limits = optional(object({
      cpu = optional(string)
      memory = optional(string, "512Mi")
    }), {})
  }), {})
  query_frontend = optional(object({
    requests = optional(object({
      cpu = optional(string, "250m")
      memory = optional(string, "256Mi")
    }), {})
    limits = optional(object({
      cpu = optional(string)
      memory = optional(string, "512Mi")
    }), {})
  }), {})
  bucketweb = optional(object({
    requests = optional(object({
      cpu = optional(string, "50m")
      memory = optional(string, "128Mi")
    }), {})
    limits = optional(object({
      cpu = optional(string)
      memory = optional(string, "128Mi")
    }), {})
  }), {})
  compactor = optional(object({
    requests = optional(object({
      cpu = optional(string, "250m")
      memory = optional(string, "256Mi")
    }), {})
    limits = optional(object({
      cpu = optional(string)
      memory = optional(string, "512Mi")
    }), {})
  }), {})
  storegateway = optional(object({
    requests = optional(object({
      cpu = optional(string, "250m")
      memory = optional(string, "512Mi")
    }), {})
    limits = optional(object({
      cpu = optional(string)
      memory = optional(string, "512Mi")
    }), {})
  }), {})
  redis = optional(object({
    requests = optional(object({
      cpu = optional(string, "200m")
      memory = optional(string, "256Mi")
    }), {})
    limits = optional(object({
      cpu = optional(string)
      memory = optional(string, "512Mi")
    }), {})
  }), {})
})
Default: {}
Description: Boolean to enable the deployment of a service monitor for Prometheus. This also enables the deployment of default Prometheus rules and Grafana dashboards, which are embedded inside the chart templates and are taken from the official Thanos examples, available here.
Type: bool
Default: false
The following outputs are exported:
Description: ID to pass to other modules in order to refer to this module as a dependency. It takes the ID that comes from the main module and passes it along to the code that called this variant in the first place.
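A sketch of how another module could declare a dependency on this one, following the dependency_ids convention used in the examples above and assuming, as with module.traefik.id and module.oidc.id, that the exported output is named id. The downstream module name and source are hypothetical.

```hcl
module "my_downstream_app" {
  source = "git::https://example.com/my-downstream-module.git" # hypothetical module

  # Makes Terraform wait for the Thanos module before creating this one.
  dependency_ids = {
    thanos = module.thanos.id
  }
}
```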
= Requirements
| Name | Version |
|---|---|
| | >= 5 |
| | >= 3 |
| | >= 3 |
| | >= 1 |
= Providers
| Name | Version |
|---|---|
| | n/a |
= Modules
| Name | Source | Version |
|---|---|---|
| | terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc | ~> 5.0 |
= Resources
| Name | Type |
|---|---|
| aws_iam_policy.thanos | resource |
| aws_iam_policy_document.thanos | data source |
| aws_s3_bucket.thanos | data source |
= Inputs
| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| | AWS S3 bucket configuration values for the bucket where the archived metrics will be stored. An IAM role is required to give the Thanos components read and write access to the S3 bucket. You can create this role yourself or let the module create it for you. If you want the module to create the role, you need to provide the OIDC issuer’s URL for the EKS cluster. If you create the role yourself, you need to provide the ARN of the IAM role you created. | object | n/a | yes |
| | Name given to the cluster. Value used for the ingress' URL of the application. | string | n/a | yes |
| | Base domain of the cluster. Value used for the ingress' URL of the application. | string | n/a | yes |
| | Subdomain of the cluster. Value used for the ingress' URL of the application. | string | "apps" | no |
| | Name of the Argo CD AppProject where the Application should be created. If not set, the Application will be created in a new AppProject only for this Application. | string | null | no |
| | Labels to attach to the Argo CD Application resource. | map(string) | {} | no |
| | Destination cluster where the application should be deployed. | string | "in-cluster" | no |
| | Override of target revision of the application chart. | string | "v5.0.0" | no |
| | SSL certificate issuer to use. Usually you would configure this value as letsencrypt-staging or letsencrypt-prod in your root *.tf files. | string | "selfsigned-issuer" | no |
| | Helm chart value overrides. They should be passed as a list of HCL structures. | any | [] | no |
| | A boolean flag to enable/disable appending lists instead of overwriting them. | bool | false | no |
| | Automated sync options for the Argo CD Application resource. | object | {"allow_empty": false, "prune": true, "self_heal": true} | no |
| | IDs of the other modules on which this module depends. | map(string) | {} | no |
| | Most frequently used Thanos settings. This variable is merged with the local value thanos_defaults, which contains some sensible defaults. You can check the default values in the local.tf file. If anything else needs to be customized, you can always pass configuration values using the variable helm_values. | any | {} | no |
| | Resource limits and requests for Thanos' components. Refer to the official documentation for the format of the values. These are not production values; you should always adjust them to your needs. | object | {} | no |
| | Boolean to enable the deployment of a service monitor for Prometheus. This also enables the deployment of default Prometheus rules and Grafana dashboards, which are embedded inside the chart templates and are taken from the official Thanos examples, available here. | bool | false | no |
= Outputs
| Name | Description |
|---|---|
| | ID to pass to other modules in order to refer to this module as a dependency. It takes the ID that comes from the main module and passes it along to the code that called this variant in the first place. |