Skip to content

Commit

Permalink
feat: Added SQS queue creation with events for Karpenter (aws-ia#1458)
Browse files Browse the repository at this point in the history
  • Loading branch information
vara-bonthu authored and Gumar Minibaev committed Mar 17, 2023
1 parent f4e1c43 commit 6ae9423
Show file tree
Hide file tree
Showing 10 changed files with 187 additions and 32 deletions.
1 change: 0 additions & 1 deletion examples/karpenter/main.tf
Original file line number Diff line number Diff line change
Expand Up @@ -171,7 +171,6 @@ module "eks_blueprints_kubernetes_addons" {
}
karpenter_node_iam_instance_profile = module.karpenter.instance_profile_name
karpenter_enable_spot_termination_handling = true
karpenter_sqs_queue_arn = module.karpenter.queue_arn

tags = local.tags
}
Expand Down
4 changes: 3 additions & 1 deletion modules/kubernetes-addons/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -277,7 +277,6 @@
| <a name="input_karpenter_helm_config"></a> [karpenter\_helm\_config](#input\_karpenter\_helm\_config) | Karpenter autoscaler add-on config | `any` | `{}` | no |
| <a name="input_karpenter_irsa_policies"></a> [karpenter\_irsa\_policies](#input\_karpenter\_irsa\_policies) | Additional IAM policies for a IAM role for service accounts | `list(string)` | `[]` | no |
| <a name="input_karpenter_node_iam_instance_profile"></a> [karpenter\_node\_iam\_instance\_profile](#input\_karpenter\_node\_iam\_instance\_profile) | Karpenter Node IAM Instance profile id | `string` | `""` | no |
| <a name="input_karpenter_sqs_queue_arn"></a> [karpenter\_sqs\_queue\_arn](#input\_karpenter\_sqs\_queue\_arn) | (Optional) ARN of SQS used by Karpenter when native node termination handling is enabled | `string` | `""` | no |
| <a name="input_keda_helm_config"></a> [keda\_helm\_config](#input\_keda\_helm\_config) | KEDA Event-based autoscaler add-on config | `any` | `{}` | no |
| <a name="input_keda_irsa_policies"></a> [keda\_irsa\_policies](#input\_keda\_irsa\_policies) | Additional IAM policies for a IAM role for service accounts | `list(string)` | `[]` | no |
| <a name="input_kube_prometheus_stack_helm_config"></a> [kube\_prometheus\_stack\_helm\_config](#input\_kube\_prometheus\_stack\_helm\_config) | Community kube-prometheus-stack Helm Chart config | `any` | `{}` | no |
Expand Down Expand Up @@ -314,6 +313,9 @@
| <a name="input_spark_history_server_irsa_policies"></a> [spark\_history\_server\_irsa\_policies](#input\_spark\_history\_server\_irsa\_policies) | Additional IAM policies for a IAM role for service accounts | `list(string)` | `[]` | no |
| <a name="input_spark_history_server_s3a_path"></a> [spark\_history\_server\_s3a\_path](#input\_spark\_history\_server\_s3a\_path) | s3a path with prefix for Spark history server e.g., s3a://<bucket\_name>/<spark\_event\_logs> | `string` | `""` | no |
| <a name="input_spark_k8s_operator_helm_config"></a> [spark\_k8s\_operator\_helm\_config](#input\_spark\_k8s\_operator\_helm\_config) | Spark on K8s Operator Helm Chart config | `any` | `{}` | no |
| <a name="input_sqs_queue_kms_data_key_reuse_period_seconds"></a> [sqs\_queue\_kms\_data\_key\_reuse\_period\_seconds](#input\_sqs\_queue\_kms\_data\_key\_reuse\_period\_seconds) | The length of time, in seconds, for which Amazon SQS can reuse a data key to encrypt or decrypt messages before calling AWS KMS again | `number` | `null` | no |
| <a name="input_sqs_queue_kms_master_key_id"></a> [sqs\_queue\_kms\_master\_key\_id](#input\_sqs\_queue\_kms\_master\_key\_id) | The ID of an AWS-managed customer master key (CMK) for Amazon SQS or a custom CMK | `string` | `null` | no |
| <a name="input_sqs_queue_managed_sse_enabled"></a> [sqs\_queue\_managed\_sse\_enabled](#input\_sqs\_queue\_managed\_sse\_enabled) | Enable server-side encryption (SSE) for a SQS queue | `bool` | `true` | no |
| <a name="input_strimzi_kafka_operator_helm_config"></a> [strimzi\_kafka\_operator\_helm\_config](#input\_strimzi\_kafka\_operator\_helm\_config) | Kafka Strimzi Helm Chart config | `any` | `{}` | no |
| <a name="input_sysdig_agent_helm_config"></a> [sysdig\_agent\_helm\_config](#input\_sysdig\_agent\_helm\_config) | Sysdig Helm Chart config | `any` | `{}` | no |
| <a name="input_tags"></a> [tags](#input\_tags) | Additional tags (e.g. `map('BusinessUnit`,`XYZ`) | `map(string)` | `{}` | no |
Expand Down
17 changes: 14 additions & 3 deletions modules/kubernetes-addons/karpenter/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,30 +28,41 @@ For more details checkout [Karpenter](https://karpenter.sh/docs/getting-started/

| Name | Type |
|------|------|
| [aws_cloudwatch_event_rule.this](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_event_rule) | resource |
| [aws_cloudwatch_event_target.this](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_event_target) | resource |
| [aws_iam_policy.karpenter](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource |
| [aws_arn.queue](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/arn) | data source |
| [aws_sqs_queue.this](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sqs_queue) | resource |
| [aws_sqs_queue_policy.this](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sqs_queue_policy) | resource |
| [aws_iam_policy_document.karpenter](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_iam_policy_document.sqs_queue](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_partition.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/partition) | data source |

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_addon_context"></a> [addon\_context](#input\_addon\_context) | Input configuration for the addon | <pre>object({<br> aws_caller_identity_account_id = string<br> aws_caller_identity_arn = string<br> aws_eks_cluster_endpoint = string<br> aws_partition_id = string<br> aws_region_name = string<br> eks_cluster_id = string<br> eks_oidc_issuer_url = string<br> eks_oidc_provider_arn = string<br> tags = map(string)<br> irsa_iam_role_path = string<br> irsa_iam_permissions_boundary = string<br> })</pre> | n/a | yes |
| <a name="input_enable_spot_termination_handling"></a> [enable\_spot\_termination\_handling](#input\_enable\_spot\_termination\_handling) | Determines whether to enable native spot termination handling | `bool` | `false` | no |
| <a name="input_enable_spot_termination"></a> [enable\_spot\_termination](#input\_enable\_spot\_termination) | Determines whether to enable native spot termination handling | `bool` | `false` | no |
| <a name="input_helm_config"></a> [helm\_config](#input\_helm\_config) | Helm provider config for the Karpenter | `any` | `{}` | no |
| <a name="input_irsa_policies"></a> [irsa\_policies](#input\_irsa\_policies) | Additional IAM policies for a IAM role for service accounts | `list(string)` | `[]` | no |
| <a name="input_manage_via_gitops"></a> [manage\_via\_gitops](#input\_manage\_via\_gitops) | Determines if the add-on should be managed via GitOps. | `bool` | `false` | no |
| <a name="input_node_iam_instance_profile"></a> [node\_iam\_instance\_profile](#input\_node\_iam\_instance\_profile) | Karpenter Node IAM Instance profile id | `string` | `""` | no |
| <a name="input_path"></a> [path](#input\_path) | Path in which to create the Karpenter policy | `string` | `"/"` | no |
| <a name="input_sqs_queue_arn"></a> [sqs\_queue\_arn](#input\_sqs\_queue\_arn) | (Optional) ARN of SQS used by Karpenter when native node termination handling is enabled | `string` | `""` | no |
| <a name="input_sqs_queue_kms_data_key_reuse_period_seconds"></a> [sqs\_queue\_kms\_data\_key\_reuse\_period\_seconds](#input\_sqs\_queue\_kms\_data\_key\_reuse\_period\_seconds) | The length of time, in seconds, for which Amazon SQS can reuse a data key to encrypt or decrypt messages before calling AWS KMS again | `number` | `null` | no |
| <a name="input_sqs_queue_kms_master_key_id"></a> [sqs\_queue\_kms\_master\_key\_id](#input\_sqs\_queue\_kms\_master\_key\_id) | The ID of an AWS-managed customer master key (CMK) for Amazon SQS or a custom CMK | `string` | `null` | no |
| <a name="input_sqs_queue_managed_sse_enabled"></a> [sqs\_queue\_managed\_sse\_enabled](#input\_sqs\_queue\_managed\_sse\_enabled) | Enable server-side encryption (SSE) for a SQS queue | `bool` | `true` | no |

## Outputs

| Name | Description |
|------|-------------|
| <a name="output_argocd_gitops_config"></a> [argocd\_gitops\_config](#output\_argocd\_gitops\_config) | Configuration used for managing the add-on with ArgoCD |
| <a name="output_event_rules"></a> [event\_rules](#output\_event\_rules) | Map of the event rules created and their attributes |
| <a name="output_irsa_arn"></a> [irsa\_arn](#output\_irsa\_arn) | IAM role ARN for the service account |
| <a name="output_irsa_name"></a> [irsa\_name](#output\_irsa\_name) | IAM role name for the service account |
| <a name="output_release_metadata"></a> [release\_metadata](#output\_release\_metadata) | Map of attributes of the Helm release metadata |
| <a name="output_service_account"></a> [service\_account](#output\_service\_account) | Name of Kubernetes service account |
| <a name="output_sqs_queue_arn"></a> [sqs\_queue\_arn](#output\_sqs\_queue\_arn) | The ARN of the SQS queue |
| <a name="output_sqs_queue_name"></a> [sqs\_queue\_name](#output\_sqs\_queue\_name) | The name of the created Amazon SQS queue |
| <a name="output_sqs_queue_url"></a> [sqs\_queue\_url](#output\_sqs\_queue\_url) | The URL for the created Amazon SQS queue |
<!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
27 changes: 20 additions & 7 deletions modules/kubernetes-addons/karpenter/data.tf
Original file line number Diff line number Diff line change
@@ -1,8 +1,4 @@
data "aws_arn" "queue" {
count = var.enable_spot_termination_handling ? 1 : 0

arn = var.sqs_queue_arn
}
data "aws_partition" "current" {}

data "aws_iam_policy_document" "karpenter" {
statement {
Expand Down Expand Up @@ -89,7 +85,7 @@ data "aws_iam_policy_document" "karpenter" {
}

dynamic "statement" {
for_each = var.sqs_queue_arn != "" ? [1] : []
for_each = var.enable_spot_termination ? [1] : []

content {
actions = [
Expand All @@ -98,7 +94,24 @@ data "aws_iam_policy_document" "karpenter" {
"sqs:GetQueueUrl",
"sqs:ReceiveMessage",
]
resources = [var.sqs_queue_arn]
resources = [aws_sqs_queue.this[0].arn]
}
}
}

data "aws_iam_policy_document" "sqs_queue" {
count = var.enable_spot_termination ? 1 : 0

statement {
sid = "SqsWrite"
actions = ["sqs:SendMessage"]
principals {
type = "Service"
identifiers = [
"events.${local.dns_suffix}",
"sqs.${local.dns_suffix}"
]
}
resources = [aws_sqs_queue.this[0].arn]
}
}
48 changes: 46 additions & 2 deletions modules/kubernetes-addons/karpenter/locals.tf
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ locals {
clusterName: ${var.addon_context.eks_cluster_id}
clusterEndpoint: ${var.addon_context.aws_eks_cluster_endpoint}
defaultInstanceProfile: ${var.node_iam_instance_profile}
interruptionQueueName: ${try(data.aws_arn.queue[0].resource, "")}
interruptionQueueName: ${try(aws_sqs_queue.this[0].name, "")}
EOT
]
description = "karpenter Helm Chart for Node Autoscaling"
Expand All @@ -48,6 +48,50 @@ locals {
serviceAccountName = local.service_account
controllerClusterEndpoint = var.addon_context.aws_eks_cluster_endpoint
awsDefaultInstanceProfile = var.node_iam_instance_profile
awsInterruptionQueueName = try(data.aws_arn.queue[0].resource, "")
awsInterruptionQueueName = try(aws_sqs_queue.this[0].name, "")
}

dns_suffix = data.aws_partition.current.dns_suffix

# Karpenter Spot Interruption Event rules
event_rules = {
health_event = {
name = "HealthEvent"
description = "Karpenter Interrupt - AWS health event for EC2"
event_pattern = {
source = ["aws.health"]
detail-type = ["AWS Health Event"]
detail = {
service = ["EC2"]
}
}
}
spot_interupt = {
name = "SpotInterrupt"
description = "Karpenter Interrupt - A spot interruption warning was triggered for the node"
event_pattern = {
source = ["aws.ec2"]
detail-type = ["EC2 Spot Instance Interruption Warning"]
}
}
instance_rebalance = {
name = "InstanceRebalance"
description = "Karpenter Interrupt - A spot rebalance recommendation was triggered for the node"
event_pattern = {
source = ["aws.ec2"]
detail-type = ["EC2 Instance Rebalance Recommendation"]
}
}
instance_state_change = {
name = "InstanceStateChange"
description = "Karpenter interrupt - EC2 instance state-change notification"
event_pattern = {
source = ["aws.ec2"]
detail-type = ["EC2 Instance State-change Notification"]
detail = {
state = ["stopping", "terminated", "shutting-down", "stopped"] #ignored pending and running
}
}
}
}
}
40 changes: 40 additions & 0 deletions modules/kubernetes-addons/karpenter/main.tf
Original file line number Diff line number Diff line change
Expand Up @@ -13,3 +13,43 @@ resource "aws_iam_policy" "karpenter" {
policy = data.aws_iam_policy_document.karpenter.json
path = var.path
}

#tfsec:ignore:aws-sqs-enable-queue-encryption
resource "aws_sqs_queue" "this" {
count = var.enable_spot_termination ? 1 : 0

name = "karpenter-${var.addon_context.eks_cluster_id}"
message_retention_seconds = 300
sqs_managed_sse_enabled = var.sqs_queue_managed_sse_enabled
kms_master_key_id = var.sqs_queue_kms_master_key_id
kms_data_key_reuse_period_seconds = var.sqs_queue_kms_data_key_reuse_period_seconds

tags = var.addon_context.tags
}

resource "aws_sqs_queue_policy" "this" {
count = var.enable_spot_termination ? 1 : 0

queue_url = aws_sqs_queue.this[0].id
policy = data.aws_iam_policy_document.sqs_queue[0].json
}

resource "aws_cloudwatch_event_rule" "this" {
for_each = { for k, v in local.event_rules : k => v if var.enable_spot_termination }

name = each.value.name
description = each.value.description
event_pattern = jsonencode(each.value.event_pattern)
tags = merge(
{ "ClusterName" : var.addon_context.eks_cluster_id },
var.addon_context.tags,
)
}

resource "aws_cloudwatch_event_target" "this" {
for_each = { for k, v in local.event_rules : k => v if var.enable_spot_termination }

rule = aws_cloudwatch_event_rule.this[each.key].name
arn = aws_sqs_queue.this[0].arn
target_id = "KarpenterInterruptionQueueTarget"
}
20 changes: 20 additions & 0 deletions modules/kubernetes-addons/karpenter/outputs.tf
Original file line number Diff line number Diff line change
Expand Up @@ -22,3 +22,23 @@ output "service_account" {
description = "Name of Kubernetes service account"
value = module.helm_addon.service_account
}

output "sqs_queue_arn" {
description = "The ARN of the SQS queue"
value = try(aws_sqs_queue.this[0].arn, null)
}

output "sqs_queue_name" {
description = "The name of the created Amazon SQS queue"
value = try(aws_sqs_queue.this[0].name, null)
}

output "sqs_queue_url" {
description = "The URL for the created Amazon SQS queue"
value = try(aws_sqs_queue.this[0].url, null)
}

output "event_rules" {
description = "Map of the event rules created and their attributes"
value = aws_cloudwatch_event_rule.this
}
27 changes: 19 additions & 8 deletions modules/kubernetes-addons/karpenter/variables.tf
Original file line number Diff line number Diff line change
Expand Up @@ -22,19 +22,12 @@ variable "node_iam_instance_profile" {
default = ""
}

# tflint-ignore: terraform_unused_declarations
variable "enable_spot_termination_handling" {
variable "enable_spot_termination" {
description = "Determines whether to enable native spot termination handling"
type = bool
default = false
}

variable "sqs_queue_arn" {
description = "(Optional) ARN of SQS used by Karpenter when native node termination handling is enabled"
type = string
default = ""
}

variable "addon_context" {
description = "Input configuration for the addon"
type = object({
Expand All @@ -57,3 +50,21 @@ variable "path" {
type = string
default = "/"
}

variable "sqs_queue_managed_sse_enabled" {
description = "Enable server-side encryption (SSE) for a SQS queue"
type = bool
default = true
}

variable "sqs_queue_kms_master_key_id" {
description = "The ID of an AWS-managed customer master key (CMK) for Amazon SQS or a custom CMK"
type = string
default = null
}

variable "sqs_queue_kms_data_key_reuse_period_seconds" {
description = "The length of time, in seconds, for which Amazon SQS can reuse a data key to encrypt or decrypt messages before calling AWS KMS again"
type = number
default = null
}
17 changes: 10 additions & 7 deletions modules/kubernetes-addons/main.tf
Original file line number Diff line number Diff line change
Expand Up @@ -317,13 +317,15 @@ module "karpenter" {

count = var.enable_karpenter ? 1 : 0

helm_config = var.karpenter_helm_config
irsa_policies = var.karpenter_irsa_policies
node_iam_instance_profile = var.karpenter_node_iam_instance_profile
enable_spot_termination_handling = var.karpenter_enable_spot_termination_handling
sqs_queue_arn = var.karpenter_sqs_queue_arn
manage_via_gitops = var.argocd_manage_add_ons
addon_context = local.addon_context
helm_config = var.karpenter_helm_config
irsa_policies = var.karpenter_irsa_policies
node_iam_instance_profile = var.karpenter_node_iam_instance_profile
enable_spot_termination = var.karpenter_enable_spot_termination_handling
manage_via_gitops = var.argocd_manage_add_ons
addon_context = local.addon_context
sqs_queue_managed_sse_enabled = var.sqs_queue_managed_sse_enabled
sqs_queue_kms_master_key_id = var.sqs_queue_kms_master_key_id
sqs_queue_kms_data_key_reuse_period_seconds = var.sqs_queue_kms_data_key_reuse_period_seconds
}

module "keda" {
Expand Down Expand Up @@ -530,6 +532,7 @@ module "secrets_store_csi_driver" {
manage_via_gitops = var.argocd_manage_add_ons
addon_context = local.addon_context
}

module "aws_privateca_issuer" {
count = var.enable_aws_privateca_issuer ? 1 : 0
source = "./aws-privateca-issuer"
Expand Down
18 changes: 15 additions & 3 deletions modules/kubernetes-addons/variables.tf
Original file line number Diff line number Diff line change
Expand Up @@ -911,10 +911,22 @@ variable "karpenter_enable_spot_termination_handling" {
default = false
}

variable "karpenter_sqs_queue_arn" {
description = "(Optional) ARN of SQS used by Karpenter when native node termination handling is enabled"
variable "sqs_queue_managed_sse_enabled" {
description = "Enable server-side encryption (SSE) for a SQS queue"
type = bool
default = true
}

variable "sqs_queue_kms_master_key_id" {
description = "The ID of an AWS-managed customer master key (CMK) for Amazon SQS or a custom CMK"
type = string
default = ""
default = null
}

variable "sqs_queue_kms_data_key_reuse_period_seconds" {
description = "The length of time, in seconds, for which Amazon SQS can reuse a data key to encrypt or decrypt messages before calling AWS KMS again"
type = number
default = null
}

#-----------KEDA ADDON-------------
Expand Down

0 comments on commit 6ae9423

Please sign in to comment.