feat: Multi-cluster architecture to increase resiliency and reduce inter-az data transfer charges #1802

Closed · wants to merge 5 commits
47 changes: 47 additions & 0 deletions patterns/cell-based-eks/0.vpc/main.tf
@@ -0,0 +1,47 @@
provider "aws" {
Contributor:

I don't believe the cluster-per-AZ design requires splitting up the Terraform configurations into multiple directories. We should collapse this back down to a single directory, but have multiple cluster definitions - one for each AZ used. This can be shown with a set of definitions split into multiple files - for example:

  • az1.tf
  • az1.yaml
  • az2.tf
  • az2.yaml
  • az3.tf
  • az3.yaml

Within each of these AZ-specific Terraform files we'll have:

  • EKS cluster definition
  • Addons definition
  • Kubernetes and Helm aliased providers scoped to that cluster and addon definition

Each of the AZ-specific YAML files will then contain the Karpenter manifests for that AZ and the cluster within it.

Thoughts?
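As a rough illustration of that layout (not code from this PR; the module name `eks_az1` and the addons inputs are assumed for the sketch), an `az1.tf` might wire the aliased providers and addons like so:

```hcl
# Sketch only: one EKS cluster per AZ, with Kubernetes/Helm provider aliases
# scoped to that cluster so each addons definition targets its own cell.
provider "kubernetes" {
  alias                  = "az1"
  host                   = module.eks_az1.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks_az1.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.eks_az1.cluster_name]
  }
}

provider "helm" {
  alias = "az1"

  kubernetes {
    host                   = module.eks_az1.cluster_endpoint
    cluster_ca_certificate = base64decode(module.eks_az1.cluster_certificate_authority_data)

    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args        = ["eks", "get-token", "--cluster-name", module.eks_az1.cluster_name]
    }
  }
}

module "eks_blueprints_addons_az1" {
  source = "aws-ia/eks-blueprints-addons/aws"

  cluster_name      = module.eks_az1.cluster_name
  cluster_endpoint  = module.eks_az1.cluster_endpoint
  cluster_version   = module.eks_az1.cluster_version
  oidc_provider_arn = module.eks_az1.oidc_provider_arn

  # Route all Kubernetes/Helm API calls for these addons to the AZ1 cluster
  providers = {
    kubernetes = kubernetes.az1
    helm       = helm.az1
  }
}
```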

Author:

Done, made the changes as suggested. We were previously following the Istio multi-cluster pattern structure.

  region = local.region
}

data "aws_availability_zones" "available" {}

locals {
  cluster_name = format("%s-%s", basename(path.cwd), "shared")
Contributor:

Let's follow the current norm used in other patterns:

Suggested change:
- cluster_name = format("%s-%s", basename(path.cwd), "shared")
+ name = basename(path.cwd)

Author:

Done

region = "us-west-2"

vpc_cidr = "10.0.0.0/16"
azs = slice(data.aws_availability_zones.available.names, 0, 3)

tags = {
Blueprint = local.cluster_name
Contributor:

Suggested change:
- Blueprint = local.cluster_name
+ Blueprint = local.name

Author:

Done

    GithubRepo = "github.com/aws-ia/terraform-aws-eks-blueprints"
  }
}

################################################################################
# VPC
################################################################################

module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "~> 5.0"

name = local.cluster_name
Contributor:

Suggested change:
- name = local.cluster_name
+ name = local.name

Author:

Done

  cidr = local.vpc_cidr

  azs = local.azs
  # One /20 private and one /24 public subnet per AZ, carved out of the 10.0.0.0/16 VPC CIDR
  private_subnets = [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 4, k)]
  public_subnets  = [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 8, k + 48)]

  enable_nat_gateway = true
  single_nat_gateway = true

  public_subnet_tags = {
    "kubernetes.io/role/elb" = 1
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = 1
  }

  tags = local.tags
}
14 changes: 14 additions & 0 deletions patterns/cell-based-eks/0.vpc/outputs.tf
@@ -0,0 +1,14 @@
output "vpc_id" {
description = "Amazon EKS VPC ID"
value = module.vpc.vpc_id
}

output "subnet_ids" {
description = "Amazon EKS Subnet IDs"
value = module.vpc.private_subnets
}

output "vpc_cidr" {
description = "Amazon EKS VPC CIDR Block."
value = local.vpc_cidr
}
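These outputs are what the per-cell stacks consume from the shared VPC. As a hedged sketch (the actual wiring in this pattern may differ), a cell's configuration could read them back with a `terraform_remote_state` data source pointed at the VPC stack's S3 backend; the bucket, region, and key below are placeholders:

```hcl
# Hypothetical consumer in a cell directory (e.g. 1.cell1)
data "terraform_remote_state" "vpc" {
  backend = "s3"

  config = {
    bucket = "<BUCKET_NAME>"
    region = "<AWS_REGION>"
    key    = "<VPC_STATE_KEY>"
  }
}

locals {
  vpc_id     = data.terraform_remote_state.vpc.outputs.vpc_id
  subnet_ids = data.terraform_remote_state.vpc.outputs.subnet_ids
  vpc_cidr   = data.terraform_remote_state.vpc.outputs.vpc_cidr
}
```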
17 changes: 17 additions & 0 deletions patterns/cell-based-eks/0.vpc/versions.tf
@@ -0,0 +1,17 @@
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.47"
    }
  }

  # ## Used for end-to-end testing on project; update to suit your needs
  # backend "s3" {
  #   bucket = "<BUCKET_NAME>"
  #   region = "<AWS_REGION>"
  #   key    = "e2e/istio-multi-cluster-vpc/terraform.tfstate"
  # }
}
181 changes: 181 additions & 0 deletions patterns/cell-based-eks/1.cell1/README.md
@@ -0,0 +1,181 @@
# Cell-Based Architecture for Amazon EKS

This example shows how to provision a cell-based Amazon EKS cluster.

* Deploy an EKS cluster with one managed node group in a single VPC and AZ
Contributor:

What is the motivation for mixing Fargate, managed node groups, and Karpenter in this design?

Author:

It was about showing how to use them in a single-AZ pattern. Removed Fargate and now using one managed node group + Karpenter.

* Deploy Fargate profiles to run `coredns`, `aws-load-balancer-controller`, and `karpenter` addons
* Deploy Karpenter `Provisioner` and `AWSNodeTemplate` resources and configure them to run in AZ1 (see the sketch below)
* Deploy sample deployment `inflate` with 0 replicas

Refer to the [AWS Solution Guidance](https://aws.amazon.com/solutions/guidance/cell-based-architecture-for-amazon-eks/) for more details.
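
The AZ pinning mentioned above is expressed through Karpenter scheduling requirements. A minimal sketch, assuming the `kubectl_manifest` resource from the kubectl Terraform provider and the v1alpha5 `Provisioner` API matching the Karpenter v0.30.0 release shown later in this README (the manifest itself is illustrative, not copied from this PR):

```hcl
# Sketch only: pins Karpenter-launched capacity to a single AZ (us-west-2a).
# The Provisioner is paired with an AWSNodeTemplate named "default" (not shown).
resource "kubectl_manifest" "karpenter_provisioner_az1" {
  yaml_body = <<-YAML
    apiVersion: karpenter.sh/v1alpha5
    kind: Provisioner
    metadata:
      name: default
    spec:
      requirements:
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-west-2a"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
      providerRef:
        name: default
      ttlSecondsAfterEmpty: 30
  YAML
}
```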

## Prerequisites

Ensure that you have the following tools installed locally:

1. [aws cli](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html)
2. [kubectl](https://Kubernetes.io/docs/tasks/tools/)
3. [terraform](https://learn.hashicorp.com/tutorials/terraform/install-cli)
4. [helm](https://helm.sh/docs/helm/helm_install/)

## Deploy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the other pattern readmes for the "standard" README structure

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


To provision this example:

```sh
terraform init
terraform apply
```

Enter `yes` at the command prompt to apply.

## Validate

The following command will update the `kubeconfig` on your local machine and allow you to interact with your EKS Cluster using `kubectl` to validate the deployment.

1. Run `update-kubeconfig` command:

```sh
aws eks --region <REGION> update-kubeconfig --name <CLUSTER_NAME>
```

2. List the nodes currently running:

```sh
kubectl get node -o custom-columns='NODE_NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status,INSTANCE-TYPE:.metadata.labels.node\.kubernetes\.io/instance-type,AZ:.metadata.labels.topology\.kubernetes\.io/zone,VERSION:.status.nodeInfo.kubeletVersion,OS-IMAGE:.status.nodeInfo.osImage,INTERNAL-IP:.metadata.annotations.alpha\.kubernetes\.io/provided-node-ip'
```

```
# Output should look like below
NODE_NAME READY INSTANCE-TYPE AZ VERSION OS-IMAGE INTERNAL-IP
fargate-ip-10-0-13-93.us-west-2.compute.internal True <none> us-west-2a v1.28.2-eks-f8587cb Amazon Linux 2 <none>
fargate-ip-10-0-14-95.us-west-2.compute.internal True <none> us-west-2a v1.28.2-eks-f8587cb Amazon Linux 2 <none>
fargate-ip-10-0-15-86.us-west-2.compute.internal True <none> us-west-2a v1.28.2-eks-f8587cb Amazon Linux 2 <none>
fargate-ip-10-0-8-178.us-west-2.compute.internal True <none> us-west-2a v1.28.2-eks-f8587cb Amazon Linux 2 <none>
fargate-ip-10-0-8-254.us-west-2.compute.internal True <none> us-west-2a v1.28.2-eks-f8587cb Amazon Linux 2 <none>
fargate-ip-10-0-8-73.us-west-2.compute.internal True <none> us-west-2a v1.28.2-eks-f8587cb Amazon Linux 2 <none>
ip-10-0-12-14.us-west-2.compute.internal True m5.large us-west-2a v1.28.1-eks-43840fb Amazon Linux 2 10.0.12.14
ip-10-0-14-197.us-west-2.compute.internal True m5.large us-west-2a v1.28.1-eks-43840fb Amazon Linux 2 10.0.14.197
```

3. List the pods currently running:

```sh
kubectl get pods,svc -n kube-system
```

```
# Output should look like below
NAME READY STATUS RESTARTS AGE
pod/aws-load-balancer-controller-776868b4fb-2j9t6 1/1 Running 0 13h
pod/aws-load-balancer-controller-776868b4fb-bzkrr 1/1 Running 0 13h
pod/aws-node-2zhpc 2/2 Running 0 16h
pod/aws-node-w897r 2/2 Running 0 16h
pod/coredns-5c9679c87-bp6ws 1/1 Running 0 16h
pod/coredns-5c9679c87-lw468 1/1 Running 0 16h
pod/kube-proxy-6wp2k 1/1 Running 0 16h
pod/kube-proxy-n8qtq 1/1 Running 0 16h

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/aws-load-balancer-webhook-service ClusterIP 172.20.44.77 <none> 443/TCP 14h
service/kube-dns ClusterIP 172.20.0.10 <none> 53/UDP,53/TCP 17h
```

4. Verify all the Helm releases installed:

```sh
helm list -A
```

```
# Output should look like below
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
aws-load-balancer-controller kube-system 2 2023-10-18 23:07:36.089372 -0400 EDT deployed aws-load-balancer-controller-1.6.1 v2.6.1
karpenter karpenter 14 2023-10-19 08:25:12.313094 -0400 EDT deployed karpenter-v0.30.0 0.30.0
```

## Test

1. Verify that both the Fargate nodes and the EKS managed node group worker nodes are deployed to a single AZ:

```sh
kubectl get node -o custom-columns='NODE_NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status,INSTANCE-TYPE:.metadata.labels.node\.kubernetes\.io/instance-type,AZ:.metadata.labels.topology\.kubernetes\.io/zone,VERSION:.status.nodeInfo.kubeletVersion,OS-IMAGE:.status.nodeInfo.osImage,INTERNAL-IP:.metadata.annotations.alpha\.kubernetes\.io/provided-node-ip'
```

```
NODE_NAME READY INSTANCE-TYPE AZ VERSION OS-IMAGE INTERNAL-IP
fargate-ip-10-0-13-93.us-west-2.compute.internal True <none> us-west-2a v1.28.2-eks-f8587cb Amazon Linux 2 <none>
fargate-ip-10-0-14-95.us-west-2.compute.internal True <none> us-west-2a v1.28.2-eks-f8587cb Amazon Linux 2 <none>
fargate-ip-10-0-15-86.us-west-2.compute.internal True <none> us-west-2a v1.28.2-eks-f8587cb Amazon Linux 2 <none>
fargate-ip-10-0-8-178.us-west-2.compute.internal True <none> us-west-2a v1.28.2-eks-f8587cb Amazon Linux 2 <none>
fargate-ip-10-0-8-254.us-west-2.compute.internal True <none> us-west-2a v1.28.2-eks-f8587cb Amazon Linux 2 <none>
fargate-ip-10-0-8-73.us-west-2.compute.internal True <none> us-west-2a v1.28.2-eks-f8587cb Amazon Linux 2 <none>
ip-10-0-12-14.us-west-2.compute.internal True m5.large us-west-2a v1.28.1-eks-43840fb Amazon Linux 2 10.0.12.14
ip-10-0-14-197.us-west-2.compute.internal True m5.large us-west-2a v1.28.1-eks-43840fb Amazon Linux 2 10.0.14.197
```

2. Scale the `inflate` deployment to 20 replicas and watch Karpenter launch EKS worker nodes in the correct AZ:

```sh
kubectl scale deployment inflate --replicas 20
```

```
deployment.apps/inflate scaled
```

3. Wait for the pods to become ready:

```sh
kubectl wait --for=condition=ready pods --all --timeout 2m
```

```
pod/inflate-75d744d4c6-5r5cv condition met
pod/inflate-75d744d4c6-775wm condition met
pod/inflate-75d744d4c6-7t225 condition met
pod/inflate-75d744d4c6-945p4 condition met
pod/inflate-75d744d4c6-b52gp condition met
pod/inflate-75d744d4c6-d99fn condition met
pod/inflate-75d744d4c6-dmnwm condition met
pod/inflate-75d744d4c6-hrvvr condition met
pod/inflate-75d744d4c6-j4hkl condition met
pod/inflate-75d744d4c6-jwknj condition met
pod/inflate-75d744d4c6-ldwts condition met
pod/inflate-75d744d4c6-lqnr5 condition met
pod/inflate-75d744d4c6-pctjh condition met
pod/inflate-75d744d4c6-qdlkc condition met
pod/inflate-75d744d4c6-qnzc5 condition met
pod/inflate-75d744d4c6-r2cwj condition met
pod/inflate-75d744d4c6-srmkb condition met
pod/inflate-75d744d4c6-wf45j condition met
pod/inflate-75d744d4c6-x9mwl condition met
pod/inflate-75d744d4c6-xlbhl condition met
```

4. Check that all the nodes are in the correct AZ:

```sh
kubectl get node -o custom-columns='NODE_NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status,INSTANCE-TYPE:.metadata.labels.node\.kubernetes\.io/instance-type,AZ:.metadata.labels.topology\.kubernetes\.io/zone,VERSION:.status.nodeInfo.kubeletVersion,OS-IMAGE:.status.nodeInfo.osImage,INTERNAL-IP:.metadata.annotations.alpha\.kubernetes\.io/provided-node-ip'
```
```
NODE_NAME READY INSTANCE-TYPE AZ VERSION OS-IMAGE INTERNAL-IP
fargate-ip-10-0-13-93.us-west-2.compute.internal True <none> us-west-2a v1.28.2-eks-f8587cb Amazon Linux 2 <none>
fargate-ip-10-0-14-95.us-west-2.compute.internal True <none> us-west-2a v1.28.2-eks-f8587cb Amazon Linux 2 <none>
fargate-ip-10-0-15-86.us-west-2.compute.internal True <none> us-west-2a v1.28.2-eks-f8587cb Amazon Linux 2 <none>
fargate-ip-10-0-8-178.us-west-2.compute.internal True <none> us-west-2a v1.28.2-eks-f8587cb Amazon Linux 2 <none>
fargate-ip-10-0-8-254.us-west-2.compute.internal True <none> us-west-2a v1.28.2-eks-f8587cb Amazon Linux 2 <none>
fargate-ip-10-0-8-73.us-west-2.compute.internal True <none> us-west-2a v1.28.2-eks-f8587cb Amazon Linux 2 <none>
ip-10-0-12-14.us-west-2.compute.internal True m5.large us-west-2a v1.28.1-eks-43840fb Amazon Linux 2 10.0.12.14
ip-10-0-14-197.us-west-2.compute.internal True m5.large us-west-2a v1.28.1-eks-43840fb Amazon Linux 2 10.0.14.197
ip-10-0-3-161.us-west-2.compute.internal True c6gn.8xlarge us-west-2a v1.28.1-eks-43840fb Amazon Linux 2 10.0.3.161
```

## Destroy

To tear down and remove the resources created in this example:

```sh
terraform destroy -target="module.eks_blueprints_addons" -auto-approve
terraform destroy -auto-approve
```