
feat: config karpenter TDE-903 #196

Merged Oct 24, 2023 (24 commits from feat/config-karpenter-tde-903 into master)

Commits
- 61d75bc feat: upgrade cdk8s-cli (paulfouquet, Oct 13, 2023)
- c469211 wip (paulfouquet, Oct 16, 2023)
- a28ec4d feat(cdk8s): retrieve eks cluster information (paulfouquet, Oct 17, 2023)
- 6edffe8 Merge branch 'master' into feat/config-karpenter-tde-903 (paulfouquet, Oct 18, 2023)
- 6e505b5 wip (paulfouquet, Oct 19, 2023)
- eff13ce Merge branch 'master' into feat/config-karpenter-tde-903 (paulfouquet, Oct 19, 2023)
- 8478bd3 fix: delete tsconfig.json (paulfouquet, Oct 19, 2023)
- 9704ae3 wip (paulfouquet, Oct 19, 2023)
- 3323bae feat: init cdk8s karpenter and provisioners (paulfouquet, Oct 20, 2023)
- d019878 feat: import config (blacha, Oct 20, 2023)
- be19119 fix: allow any for imported files (blacha, Oct 20, 2023)
- 397cc9c fix: allow provisioners to be deployed (blacha, Oct 20, 2023)
- 4bb282d fix: taint all karpenter instances (blacha, Oct 20, 2023)
- 7fe0ee2 wip: hack around a bit to get karpenter to start (blacha, Oct 23, 2023)
- 38dac74 fix: allow ipv6 address creation (blacha, Oct 23, 2023)
- 75dfe4a fix: lint fails (paulfouquet, Oct 23, 2023)
- ac33309 fix: need to use something to determine what subnets to use Name: * s… (blacha, Oct 24, 2023)
- c5dfc33 fix: override coredns to fix AAAA records being resolved for external… (blacha, Oct 24, 2023)
- 2e14936 docs: initial docs for debugging dns resolution (blacha, Oct 24, 2023)
- 6fa300d docs: add small details to dns.configuration (paulfouquet, Oct 24, 2023)
- 38d87a2 docs: add missing dots (paulfouquet, Oct 24, 2023)
- 3b3a60f refactor: share cluster name (paulfouquet, Oct 24, 2023)
- 5c56550 docs: update readme (paulfouquet, Oct 24, 2023)
- df7a413 Merge branch 'master' into feat/config-karpenter-tde-903 (paulfouquet, Oct 24, 2023)
2 changes: 1 addition & 1 deletion .gitignore
@@ -1,4 +1,4 @@
node_modules/
dist/
cdk.out/
cdk.context.json
2 changes: 1 addition & 1 deletion README.md
@@ -1,6 +1,6 @@
# Topo Workflows

Topo workflows are run on an AWS EKS Cluster using [Argo Workflows](https://argoproj.github.io/argo-workflows/). The detailed configuration is available in [this repo](./config/).

To get set up you need access to the Argo user role inside the EKS cluster; contact someone from Topo Data Engineering for access. All Imagery maintainers will already have access.

3 changes: 3 additions & 0 deletions cdk8s.yaml
@@ -1,2 +1,5 @@
app: npx tsx config/cdk8s.ts
language: typescript
imports:
- https://raw.githubusercontent.com/aws/karpenter/main/pkg/apis/crds/karpenter.sh_provisioners.yaml
- https://raw.githubusercontent.com/aws/karpenter/main/pkg/apis/crds/karpenter.k8s.aws_awsnodetemplates.yaml
42 changes: 34 additions & 8 deletions config/README.md
@@ -1,24 +1,46 @@
# Topo-Workflows Infrastructure

The infrastructure running the workflows is mainly based on a Kubernetes (EKS) cluster and Argo Workflows. It is currently run on AWS.
Generally, all Kubernetes resources are defined with cdk8s, and anything that needs AWS interactions, such as service accounts, is defined with CDK.

## EKS Cluster / AWS CDK

The EKS Cluster base configuration is defined in `./cdk.ts` using [`aws-cdk`](https://aws.amazon.com/cdk/).

## Kubernetes resources / CDK8s

The additional components (or Kubernetes resources) running on the EKS cluster are defined in `./cdk8s` using [`cdk8s`](https://cdk8s.io/).

Main entry point: [app](./cdk8s.ts)
- Argo - Argo workflows for use with [linz/topo-workflows](https://github.com/linz/topo-workflows)
- Karpenter

### Argo Workflows

#### Semaphores

ConfigMap that lists the synchronization limits for parallel execution of the workflows.
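
A minimal sketch of such a semaphore ConfigMap written with cdk8s-plus; the key name, limit, and namespace are illustrative, not the repository's actual values:

```typescript
import { App, Chart } from 'cdk8s';
import * as kplus from 'cdk8s-plus-27';

const app = new App();
const chart = new Chart(app, 'semaphore-example');

// Argo treats each key of the ConfigMap as a named semaphore; the value is
// the maximum number of workflows allowed to hold it at the same time.
new kplus.ConfigMap(chart, 'semaphores', {
  metadata: { name: 'semaphores', namespace: 'argo' },
  data: { bulk: '2' }, // hypothetical: at most 2 "bulk" workflows in parallel
});

app.synth();
```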

### Karpenter

TODO

### Generate code

It is possible to generate a typed Helm construct for a component if its chart includes a `values.schema.json`. This is useful to provide typing hints when specifying the component's configuration (<https://github.com/cdk8s-team/cdk8s/blob/master/docs/cli/import.md#values-schema>).

To generate the Helm construct for a specific chart, follow the instructions [here](https://github.com/cdk8s-team/cdk8s/blob/master/docs/cli/import.md#values-schema) and specify the output directory for the imports:

`--output config/imports/`
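
As a hypothetical illustration (the generated module path, class name, and values shape all depend on the chart; none of these names come from this repository), a construct generated this way gives compile-time checking of chart values:

```typescript
// Hypothetical sketch: assumes `cdk8s import helm:<chart-url> --output config/imports/`
// generated a typed `ExampleChart` construct from the chart's values.schema.json.
import { App, Chart } from 'cdk8s';

import { ExampleChart } from './imports/example-chart'; // assumed generated module

const app = new App();
const chart = new Chart(app, 'example');

// Misspelled or mistyped keys in `values` now fail at compile time
// instead of surfacing later as a broken deployment.
new ExampleChart(chart, 'component', {
  values: { replicas: 2 },
});

app.synth();
```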

However, some of the component Helm charts do not have a `values.schema.json`. For those we don't generate any code and instead use the default `Helm` construct:

- aws-for-fluent-bit (<https://github.com/aws/eks-charts/issues/1011>)
- Karpenter

## Usage (for testing)

@@ -45,3 +67,7 @@ kubectl apply -f dist/
## Deployment

The deployment of the K8s config is managed by GitHub Actions in [main](../.github/workflows/main.yml).

## Troubleshoot

- [DNS](../docs/dns.configuration.md)
5 changes: 4 additions & 1 deletion config/cdk.ts
@@ -1,11 +1,14 @@
import { App } from 'aws-cdk-lib';

import { CLUSTER_NAME } from './constants';
import { LinzEksCluster } from './eks/cluster';

const app = new App();

async function main(): Promise<void> {
new LinzEksCluster(app, CLUSTER_NAME, {
env: { region: 'ap-southeast-2', account: process.env.CDK_DEFAULT_ACCOUNT },
});

app.synth();
}
31 changes: 31 additions & 0 deletions config/cdk8s.ts
@@ -1,11 +1,42 @@
import { App } from 'cdk8s';

import { ArgoSemaphore } from './charts/argo.semaphores';
import { Karpenter, KarpenterProvisioner } from './charts/karpenter';
import { CoreDns } from './charts/kube-system.coredns';
import { CfnOutputKeys, CLUSTER_NAME } from './constants';
import { getCfnOutputs } from './util/cloud.formation';

const app = new App();

async function main(): Promise<void> {
// Get cloudformation outputs
const cfnOutputs = await getCfnOutputs(CLUSTER_NAME);
const missingKeys = [...Object.values(CfnOutputKeys.Karpenter)].filter((f) => cfnOutputs[f] == null);
if (missingKeys.length > 0) {
throw new Error(`Missing CloudFormation Outputs for keys ${missingKeys.join(', ')}`);
}

new ArgoSemaphore(app, 'semaphore', {});
new CoreDns(app, 'Dns', {});

const karpenter = new Karpenter(app, 'karpenter', {
clusterName: CLUSTER_NAME,
clusterEndpoint: cfnOutputs[CfnOutputKeys.Karpenter.ClusterEndpoint],
saRoleName: cfnOutputs[CfnOutputKeys.Karpenter.ServiceAccountName],
saRoleArn: cfnOutputs[CfnOutputKeys.Karpenter.ServiceAccountRoleArn],
instanceProfile: cfnOutputs[CfnOutputKeys.Karpenter.DefaultInstanceProfile],
});

const karpenterProvisioner = new KarpenterProvisioner(app, 'karpenter-provisioner', {
clusterName: CLUSTER_NAME,
clusterEndpoint: cfnOutputs[CfnOutputKeys.Karpenter.ClusterEndpoint],
saRoleName: cfnOutputs[CfnOutputKeys.Karpenter.ServiceAccountName],
saRoleArn: cfnOutputs[CfnOutputKeys.Karpenter.ServiceAccountRoleArn],
instanceProfile: cfnOutputs[CfnOutputKeys.Karpenter.DefaultInstanceProfile],
});

karpenterProvisioner.addDependency(karpenter);

app.synth();
}
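
`getCfnOutputs` itself is not part of this diff (it lives in `config/util/cloud.formation.ts`); a minimal sketch of what it might look like, assuming the AWS SDK v3 CloudFormation client:

```typescript
import { CloudFormationClient, DescribeStacksCommand } from '@aws-sdk/client-cloudformation';

export async function getCfnOutputs(stackName: string): Promise<Record<string, string>> {
  const client = new CloudFormationClient({});
  const res = await client.send(new DescribeStacksCommand({ StackName: stackName }));

  // Flatten the stack's outputs into a key/value map, so callers can do
  // lookups like cfnOutputs[CfnOutputKeys.Karpenter.ClusterEndpoint].
  const outputs: Record<string, string> = {};
  for (const o of res.Stacks?.[0]?.Outputs ?? []) {
    if (o.OutputKey != null && o.OutputValue != null) outputs[o.OutputKey] = o.OutputValue;
  }
  return outputs;
}
```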
5 changes: 0 additions & 5 deletions config/cfn.output.ts

This file was deleted.

2 changes: 1 addition & 1 deletion config/charts/argo.semaphores.ts
@@ -2,7 +2,7 @@ import { Chart, ChartProps } from 'cdk8s';
import * as kplus from 'cdk8s-plus-27';
import { Construct } from 'constructs';

import { applyDefaultLabels } from '../util/labels.js';

export class ArgoSemaphore extends Chart {
constructor(scope: Construct, id: string, props: ChartProps) {
128 changes: 128 additions & 0 deletions config/charts/karpenter.ts
@@ -0,0 +1,128 @@
import { Chart, ChartProps, Duration, Helm } from 'cdk8s';
import { Construct } from 'constructs';

import { AwsNodeTemplateSpec } from '../imports/karpenter.k8s.aws.js';
import { Provisioner, ProvisionerSpecLimitsResources } from '../imports/karpenter.sh.js';
import { applyDefaultLabels } from '../util/labels.js';

export interface KarpenterProps {
clusterName: string;
saRoleName: string;
saRoleArn: string;
clusterEndpoint: string;
instanceProfile: string;
}

export class Karpenter extends Chart {
constructor(scope: Construct, id: string, props: KarpenterProps & ChartProps) {
super(scope, id, applyDefaultLabels(props, 'karpenter', 'v0.31.0', 'karpenter', 'workflows'));

// Deploying the CRD
new Helm(this, 'karpenter-crd', {
chart: 'oci://public.ecr.aws/karpenter/karpenter-crd',
namespace: 'karpenter',
version: 'v0.31.0',
});

// Karpenter uses `oci` rather than a regular Helm repo: https://gallery.ecr.aws/karpenter/karpenter.
// This Helm construct is tricked into using `oci`: the `oci` repo is passed
// inside `chart` instead of `repo`, so the generated `helm` command is the following:
// [
// 'template',
// '-f',
// '/tmp/cdk8s-helm-keYZCA/overrides.yaml',
// '--version',
// 'v0.31.0',
// '--namespace',
// 'karpenter',
// 'karpenter-c870a560',
// 'oci://public.ecr.aws/karpenter/karpenter'
// ]
new Helm(this, 'karpenter', {
chart: 'oci://public.ecr.aws/karpenter/karpenter',
namespace: 'karpenter',
version: 'v0.31.0',
values: {
serviceAccount: {
create: false,
name: props.saRoleName,
annotations: { 'eks.amazonaws.com/role-arn': props.saRoleArn },
},
settings: {
aws: {
clusterName: props.clusterName,
clusterEndpoint: props.clusterEndpoint,
defaultInstanceProfile: props.instanceProfile,
},
},
},
});
}
}

export class KarpenterProvisioner extends Chart {
constructor(scope: Construct, id: string, props: KarpenterProps & ChartProps) {
super(scope, id, applyDefaultLabels(props, 'karpenter', 'v0.31.0', 'karpenter', 'workflows'));

// Subnets need to be opted into, ideally a tag on subnets would be the best bet here
// but CDK does not easily allow us to tag Subnets that are not created by us
const subnetSelector = { Name: '*' };

const provider: AwsNodeTemplateSpec = {
amiFamily: 'Bottlerocket',
subnetSelector,
securityGroupSelector: { [`kubernetes.io/cluster/${props.clusterName}`]: 'owned' },
instanceProfile: props.instanceProfile,
blockDeviceMappings: [
// {
// deviceName: '/dev/xvdb',
// ebs: {
// volumeType: 'gp3',
// volumeSize: '200Gi',
// deleteOnTermination: true,
// },
// },
],
};

new Provisioner(this, 'ClusterAmd64WorkerNodes', {
metadata: { name: `eks-karpenter-${props.clusterName}-amd64`.toLowerCase(), namespace: 'karpenter' },
spec: {
// Ensure only pods that tolerate spot run on spot instance types
// to prevent long running pods (eg kube-dns) being moved.
taints: [{ key: 'karpenter.sh/capacity-type', value: 'spot', effect: 'NoSchedule' }],
requirements: [
{ key: 'karpenter.sh/capacity-type', operator: 'In', values: ['spot'] },
{ key: 'kubernetes.io/arch', operator: 'In', values: ['amd64'] },
{ key: 'karpenter.k8s.aws/instance-family', operator: 'In', values: ['c5', 'c6i', 'c6a'] },
],
limits: { resources: { cpu: ProvisionerSpecLimitsResources.fromString('20000m') } },
provider,
ttlSecondsAfterEmpty: Duration.minutes(1).toSeconds(), // optional, but never scales down if not set
},
});

new Provisioner(this, 'ClusterArmWorkerNodes', {
metadata: { name: `eks-karpenter-${props.clusterName}-arm64`.toLowerCase(), namespace: 'karpenter' },
spec: {
taints: [
// Instances that want ARM have to tolerate the arm taint
// This prevents some pods from accidentally trying to start on ARM
{ key: 'kubernetes.io/arch', value: 'arm64', effect: 'NoSchedule' },
// Ensure only pods that tolerate spot run on spot instance types
// to prevent long running pods (eg kube-dns) being moved.
{ key: 'karpenter.sh/capacity-type', value: 'spot', effect: 'NoSchedule' },
],
requirements: [
{ key: 'karpenter.sh/capacity-type', operator: 'In', values: ['spot'] },
{ key: 'kubernetes.io/arch', operator: 'In', values: ['arm64'] },
{ key: 'karpenter.k8s.aws/instance-family', operator: 'In', values: ['c7g', 'c6g'] },
],
limits: { resources: { cpu: ProvisionerSpecLimitsResources.fromString('20000m') } },
provider,
ttlSecondsAfterEmpty: Duration.minutes(1).toSeconds(), // optional, but never scales down if not set
},
});
}
}
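
Because both provisioners taint their capacity, a workload has to tolerate those taints before it can be scheduled onto a Karpenter-managed node. A hypothetical pod manifest (name, image, and command are illustrative) expressed with the raw cdk8s `ApiObject`:

```typescript
import { ApiObject, App, Chart } from 'cdk8s';

const app = new App();
const chart = new Chart(app, 'toleration-example');

// Hypothetical workload that opts into the tainted spot capacity above.
// Without the toleration the scheduler keeps it off Karpenter nodes;
// the arm64 pool additionally requires tolerating kubernetes.io/arch=arm64.
new ApiObject(chart, 'spot-worker', {
  apiVersion: 'v1',
  kind: 'Pod',
  metadata: { name: 'spot-worker' },
  spec: {
    tolerations: [
      { key: 'karpenter.sh/capacity-type', operator: 'Equal', value: 'spot', effect: 'NoSchedule' },
    ],
    containers: [{ name: 'worker', image: 'busybox', command: ['sleep', '3600'] }],
  },
});

app.synth();
```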
65 changes: 65 additions & 0 deletions config/charts/kube-system.coredns.ts
@@ -0,0 +1,65 @@
import { Chart, ChartProps } from 'cdk8s';
import * as kplus from 'cdk8s-plus-27';
import { Construct } from 'constructs';

import { applyDefaultLabels } from '../util/labels.js';

/**
* This cluster is set up as dual ipv4/ipv6, where ipv4 is used for external traffic
* and ipv6 for internal traffic.
*
* This means all traffic ingressing/egressing the cluster must go over ipv4.
*
* By default coredns will resolve AAAA (ipv6) records for external hosts even though
* the cluster is unable to reach them.
*
* This configuration splits coredns into two zones:
*
* - `.cluster.local` - internal to the cluster
* - `.` - everything else
*
* The internal zone allows `ipv6` resolutions, while `.` prevents `AAAA` resolutions using
* `rewrite stop type AAAA A`.
*
*/
export class CoreDns extends Chart {
constructor(scope: Construct, id: string, props: ChartProps) {
super(scope, id, applyDefaultLabels(props, 'coredns', 'v1', 'kube-dns', 'kube-dns'));

new kplus.ConfigMap(this, 'coredns', {
metadata: { name: 'coredns', namespace: 'kube-system' },
data: {
// FIXME: is there a better way of handling config files inside of cdk8s
Corefile: `
cluster.local:53 {
log
errors
health
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
}
prometheus :9153
forward . /etc/resolv.conf
cache 30
loop
reload
loadbalance
}

.:53 {
log
errors
health
rewrite stop type AAAA A
prometheus :9153
forward . /etc/resolv.conf
cache 30
loop
reload
loadbalance
}`,
},
});
}
}
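
A quick, hypothetical way to verify the rewrite from inside the cluster: with `rewrite stop type AAAA A` active in the `.` zone, AAAA lookups for external hosts should come back empty. A sketch using Node's resolver (run as an ES module, e.g. with `npx tsx`, from a pod):

```typescript
// Hypothetical in-cluster check: external AAAA lookups should return nothing
// once the rewrite is in place, while A lookups keep working.
import { promises as dns } from 'node:dns';

const aaaa = await dns.resolve6('example.com').catch(() => [] as string[]);
console.log(aaaa.length === 0 ? 'AAAA suppressed as expected' : aaaa);
```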
12 changes: 12 additions & 0 deletions config/constants.ts
@@ -0,0 +1,12 @@
/* Cluster name */
export const CLUSTER_NAME = 'Workflows';

/* CloudFormation Output to access from CDK8s */
export const CfnOutputKeys = {
Karpenter: {
ServiceAccountName: 'KarpenterServiceAccountName',
ServiceAccountRoleArn: 'KarpenterServiceAccountRoleArn',
ClusterEndpoint: 'ClusterEndpoint',
DefaultInstanceProfile: 'DefaultInstanceProfile',
},
};