Merge branch 'aws-samples:main' into ci2flux
ybezsonov authored Sep 26, 2023
2 parents 1d9c3d6 + 64ef398 commit 2fe4244
Showing 62 changed files with 834 additions and 258 deletions.
35 changes: 0 additions & 35 deletions .github/workflows/helm-update.yaml

This file was deleted.

2 changes: 2 additions & 0 deletions .gitignore
@@ -31,3 +31,5 @@ env
*.zip

cdk.out

.envrc
2 changes: 0 additions & 2 deletions README.md
@@ -1,7 +1,5 @@
# Amazon Elastic Kubernetes Service Workshop

![Tests](https://github.com/aws-samples/eks-workshop-v2/actions/workflows/ci.yaml/badge.svg?branch=main)

Welcome to the repository for the [Amazon Elastic Kubernetes Services workshop](https://eksworkshop.com). This contains the source for the website content as well as the accompanying infrastructure-as-code to set up a workshop lab environment in your AWS account. Please review the [Introduction](https://www.eksworkshop.com/docs/introduction/) chapter of the workshop for more details.

## Introduction
60 changes: 35 additions & 25 deletions governance/steering.md
@@ -1,49 +1,59 @@
# Steering Committee and Module Leads

## Steering Committee Members

The Steering Committee is a 6 member body, overseeing the governance of the EKS Workshop.

### Terms end in February 2024
|Name|Profile|Role|
|:----|:-------|:----|
|Sai Vennam|[@svennam92](https://github.com/svennam92)|Principal EKS DA
|Niall Thomson|[@niallthomson](https://github.com/niallthomson)|Specialist Solution Architect, Containers|
|Ray Krueger|[@raykrueger](https://github.com/raykrueger)|Principal Container Specialist|
|Ameet Naik|[@ameetnaik](https://github.com/ameetnaik)|Technical Account Manager|
|Kamran Habib|[@kmhabib](https://github.com/kmhabib)|Solution Architect (TFC at large)|
|Theo Salvo|[@buzzsurfr](https://github.com/buzzsurfr)|Container Specialist (TFC core team member)|

| Name | Profile | Role |
| :------------ | :----------------------------------------------- | :------------------------------------------ |
| Sai Vennam | [@svennam92](https://github.com/svennam92) | Principal EKS DA |
| Niall Thomson | [@niallthomson](https://github.com/niallthomson) | Specialist Solution Architect, Containers |
| Ray Krueger | [@raykrueger](https://github.com/raykrueger) | Principal Container Specialist |
| Ameet Naik | [@ameetnaik](https://github.com/ameetnaik) | Technical Account Manager |
| Kamran Habib | [@kmhabib](https://github.com/kmhabib) | Solution Architect (TFC at large) |
| Theo Salvo | [@buzzsurfr](https://github.com/buzzsurfr) | Container Specialist (TFC core team member) |

## Working Groups

The working groups are led by chairs (6 month terms) and maintainers (6 month terms).

|Working Group|Chair|Maintainers|
|:----|:-------|:----|
|Infrastructure|[Niall Thomson](https://github.com/niallthomson)||
|Fundamentals|[Sai Vennam](https://github.com/svennam92)|[Bijith Nair](https://github.com/bijithnair), [Tolu Okuboyejo](https://github.com/oktab1), [Hemanth AVS](https://github.com/hemanth-avs)|
|Autoscaling|[Sanjeev Ganjihal](https://github.com/sanjeevrg89)||
|Automation|[Carlos Santana](https://github.com/csantanapr)|[Tsahi Duek](https://github.com/tsahiduek), [Christina Andonov](https://github.com/candonov), [Sébastien Allamand](https://github.com/allamand)|
|Machine Learning|[Masatoshi Hayashi](https://github.com/literalice)||
|Networking|[Sheetal Joshi](https://github.com/sheetaljoshi)|[Umair Ishaq](https://github.com/umairishaq)|
|Observability|[Nirmal Mehta](https://github.com/normalfaults)|[Steven David](https://github.com/StevenDavid)|
|Security|[Rodrigo Bersa](https://github.com/rodrigobersa)| |
|Storage|[Eric Heinrichs](https://github.com/heinrichse)|[Andrew Peng](https://github.com/pengc99)|
| Working Group | Chair | Maintainers |
| :--------------- | :------------------------------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------- |
| Infrastructure | [Niall Thomson](https://github.com/niallthomson) | |
| Fundamentals | [Sai Vennam](https://github.com/svennam92) | [Bijith Nair](https://github.com/bijithnair), [Tolu Okuboyejo](https://github.com/oktab1), [Hemanth AVS](https://github.com/hemanth-avs) |
| Autoscaling | [Sanjeev Ganjihal](https://github.com/sanjeevrg89) | |
| Automation | [Carlos Santana](https://github.com/csantanapr) | [Tsahi Duek](https://github.com/tsahiduek), [Sébastien Allamand](https://github.com/allamand), [Yuriy Bezsonov](https://github.com/ybezsonov) |
| Machine Learning | [Masatoshi Hayashi](https://github.com/literalice) | [Benjamin Gardiner](https://github.com/bkgardiner) |
| Networking | [Sheetal Joshi](https://github.com/sheetaljoshi) | [Umair Ishaq](https://github.com/umairishaq) |
| Observability | [Nirmal Mehta](https://github.com/normalfaults) | [Steven David](https://github.com/StevenDavid) |
| Security | [Rodrigo Bersa](https://github.com/rodrigobersa) | |
| Storage | [Eric Heinrichs](https://github.com/heinrichse) | [Andrew Peng](https://github.com/pengc99) |

## Wranglers

Wranglers will work across all topic areas and serve for at least 6 months.
|Name|Profile|Role|
|:----|:-------|:----|
|Math Bruneau|[@ROunofF](https://github.com/ROunofF)|Specialist Solution Architect, Containers|


## Emeritus
|Name|Profile|Role|
|:----|:-------|:----|
|Jeremy Cowan|[@jicowan](https://github.com/jicowan)|EKS DA manager|

| Name | Profile | Role |
| :----------- | :------------------------------------- | :------------- |
| Jeremy Cowan | [@jicowan](https://github.com/jicowan) | EKS DA manager |

## Meetings

### Schedule and Cadence

The steering committee will host a public meeting every third Thursday of the month at 9AM CT. <!--update with Chime link-->

### Resources
* <!--add links to meeting notes and recordings-->

- <!--add links to meeting notes and recordings-->

## Contact
* Mailing List: <[email protected]>

- Mailing List: <[email protected]>
2 changes: 1 addition & 1 deletion helm/src/requirements.txt
@@ -4,4 +4,4 @@ idna==3.4
 PyYAML==6.0
 requests==2.31.0
 semantic-version==2.10.0
-urllib3==2.0.2
+urllib3==2.0.3
8 changes: 7 additions & 1 deletion lab/bin/use-cluster
@@ -35,4 +35,10 @@ EKS_IP_FAMILY=ipv4
 set +a
 EOT

-aws eks update-kubeconfig --name $cluster_name > /dev/null
+aws eks update-kubeconfig --name $cluster_name > /dev/null 2>&1
+
+if [[ -v C9_USER ]]; then
+  echo "Granting C9_USER access to the cluster via the AWS Console ${C9_USER}"
+  eksctl create iamidentitymapping --cluster $cluster_name --arn arn:aws:iam::${AWS_ACCOUNT_ID}:user/${C9_USER} --username console-iam-user --group system:masters > /dev/null
+  eksctl create iamidentitymapping --cluster $cluster_name --arn arn:aws:iam::${AWS_ACCOUNT_ID}:role/${C9_USER} --username console-iam-role --group system:masters > /dev/null
+fi
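
Reviewer note on the `use-cluster` change: the new block only runs when `C9_USER` is set, and it registers the same name twice, once as an IAM user ARN and once as an IAM role ARN. A minimal sketch of that pattern follows. The `iam_arn` helper and the placeholder account ID are hypothetical and not part of the commit, and the sketch only prints what it would map instead of calling `eksctl`.

```shell
#!/bin/bash
# Hypothetical helper, not part of the commit: build an IAM principal ARN.
# The "aws" partition is hardcoded here, as it is in the script above.
iam_arn() {
  local account_id=$1 kind=$2 name=$3   # kind is "user" or "role"
  printf 'arn:aws:iam::%s:%s/%s\n' "$account_id" "$kind" "$name"
}

# [[ -v NAME ]] is true when NAME is set, even to an empty string (bash 4.2+).
if [[ -v C9_USER ]]; then
  for kind in user role; do
    # The account ID fallback is a placeholder for illustration only.
    echo "would map $(iam_arn "${AWS_ACCOUNT_ID:-000000000000}" "$kind" "$C9_USER")"
  done
fi
```

Looping over `user` and `role` keeps the two mappings in step if the ARN format ever needs to change.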
25 changes: 25 additions & 0 deletions manifests/modules/aiml/inferentia/.workshop/cleanup.sh
@@ -0,0 +1,25 @@
#!/bin/bash

set -e

echo "Deleting AIML resources..."

kubectl delete namespace aiml > /dev/null

echo "Deleting Karpenter provisioners..."

kubectl delete provisioner --all > /dev/null
kubectl delete awsnodetemplate --all > /dev/null

echo "Waiting for Karpenter nodes to be removed..."

EXIT_CODE=0

timeout --foreground -s TERM 30 bash -c \
'while [[ $(kubectl get nodes --selector=type=karpenter -o json | jq -r ".items | length") -gt 0 ]];\
do sleep 5;\
done' || EXIT_CODE=$?

if [ $EXIT_CODE -ne 0 ]; then
echo "Warning: Karpenter nodes did not clean up"
fi
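
Reviewer note on `cleanup.sh`: the script polls `kubectl` inside `timeout ... bash -c` and downgrades a timeout to a warning. The same poll-until-deadline idea can be sketched as a plain bash function, which avoids the nested quoting. The name `wait_until` and its arguments are hypothetical, and this is an illustration under those assumptions, not the workshop's implementation.

```shell
#!/bin/bash
# Hypothetical helper, not part of the commit: run a command repeatedly until
# it succeeds or the deadline passes.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
wait_until() {
  local timeout_seconds=$1 interval_seconds=$2
  shift 2
  # SECONDS is a bash builtin counting seconds since the shell started.
  local deadline=$(( SECONDS + timeout_seconds ))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      return 124   # same exit code timeout(1) uses when the time limit is hit
    fi
    sleep "$interval_seconds"
  done
}

# Example call (nodes_gone would be a function wrapping the kubectl/jq check):
#   wait_until 30 5 nodes_gone || echo "Warning: Karpenter nodes did not clean up"
```

Because the call sits on the left of `||`, a timeout does not abort a script running under `set -e`, which matches the warning-only behaviour of the original.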
128 changes: 128 additions & 0 deletions manifests/modules/aiml/inferentia/.workshop/terraform/addon.tf
@@ -0,0 +1,128 @@
data "aws_subnets" "private" {
tags = {
created-by = "eks-workshop-v2"
env = local.addon_context.eks_cluster_id
}

filter {
name = "tag:Name"
values = ["*Private*"]
}
}

module "iam_assumable_role_inference" {
source = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
version = "~> v5.5.0"
create_role = true
role_name = "${local.addon_context.eks_cluster_id}-inference"
provider_url = local.addon_context.eks_oidc_issuer_url
role_policy_arns = [aws_iam_policy.inference.arn]
oidc_fully_qualified_subjects = ["system:serviceaccount:aiml:inference"]

tags = local.tags
}


resource "aws_iam_policy" "inference" {
name = "${local.addon_context.eks_cluster_id}-inference"
path = "/"
description = "IAM policy for the inference workload"

policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::${aws_s3_bucket.inference.id}",
"arn:aws:s3:::${aws_s3_bucket.inference.id}/*"
]
}
]
}
EOF
}

module "karpenter" {
source = "github.com/aws-ia/terraform-aws-eks-blueprints?ref=v4.25.0//modules/kubernetes-addons/karpenter"
addon_context = merge(local.addon_context, { default_repository = local.amazon_container_image_registry_uris[data.aws_region.current.name] })

node_iam_instance_profile = aws_iam_instance_profile.karpenter_node.name

helm_config = {
set = [{
name = "replicas"
value = "1"
}]
}
}

resource "aws_iam_instance_profile" "karpenter_node" {
name = "${local.addon_context.eks_cluster_id}-karpenter-node"
role = aws_iam_role.karpenter_node.name
}

resource "aws_iam_role" "karpenter_node" {
name = "${local.addon_context.eks_cluster_id}-karpenter-node"

assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Sid = ""
Principal = {
Service = "ec2.amazonaws.com"
}
},
]
})

managed_policy_arns = [
"arn:${local.addon_context.aws_partition_id}:iam::aws:policy/AmazonEKS_CNI_Policy",
"arn:${local.addon_context.aws_partition_id}:iam::aws:policy/AmazonEKSWorkerNodePolicy",
"arn:${local.addon_context.aws_partition_id}:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
"arn:${local.addon_context.aws_partition_id}:iam::aws:policy/AmazonSSMManagedInstanceCore"
]

tags = local.tags
}

data "http" "neuron_device_plugin_rbac_manifest" {
url = "https://raw.githubusercontent.com/aws-neuron/aws-neuron-sdk/v2.6.0/src/k8/k8s-neuron-device-plugin-rbac.yml"
}

data "http" "neuron_device_plugin_manifest" {
url = "https://raw.githubusercontent.com/aws-neuron/aws-neuron-sdk/v2.6.0/src/k8/k8s-neuron-device-plugin.yml"
}

data "kubectl_file_documents" "neuron_device_plugin_rbac_doc" {
content = data.http.neuron_device_plugin_rbac_manifest.response_body
}

data "kubectl_file_documents" "neuron_device_plugin_doc" {
content = data.http.neuron_device_plugin_manifest.response_body
}

resource "kubectl_manifest" "neuron_device_plugin_rbac" {
for_each = data.kubectl_file_documents.neuron_device_plugin_rbac_doc.manifests
yaml_body = each.value
}

resource "kubectl_manifest" "neuron_device_plugin" {
for_each = data.kubectl_file_documents.neuron_device_plugin_doc.manifests
yaml_body = each.value
}

output "environment" {
value = <<EOF
export AIML_NEURON_ROLE_ARN=${module.iam_assumable_role_inference.iam_role_arn}
export AIML_NEURON_BUCKET_NAME=${resource.aws_s3_bucket.inference.id}
export AIML_DL_IMAGE=763104351884.dkr.ecr.${data.aws_region.current.name}.amazonaws.com/pytorch-inference-neuron:1.13.1-neuron-py310-sdk2.12.0-ubuntu20.04
export AIML_SUBNETS=${data.aws_subnets.private.ids[0]},${data.aws_subnets.private.ids[1]},${data.aws_subnets.private.ids[2]}
export KARPENTER_NODE_ROLE="${aws_iam_role.karpenter_node.arn}"
EOF
}
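
Reviewer note on the `environment` output: `AIML_SUBNETS` is emitted as a single comma-separated string built from the first three private subnet IDs, so it assumes at least three matching subnets exist. A consumer that needs the individual IDs could split it as sketched below. The subnet IDs are made-up placeholders and `aiml_subnets` is a hypothetical variable name.

```shell
#!/bin/bash
# Placeholder value for illustration; real IDs come from the Terraform output.
AIML_SUBNETS="subnet-aaa,subnet-bbb,subnet-ccc"

# Split on commas into an array. Prefixing IFS to read limits the change
# to this one command, so the global IFS stays untouched.
IFS=',' read -r -a aiml_subnets <<< "$AIML_SUBNETS"

for subnet in "${aiml_subnets[@]}"; do
  echo "subnet: $subnet"
done
```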
@@ -0,0 +1,6 @@
resource "aws_s3_bucket" "inference" {
bucket_prefix = "eksworkshop-inference"
force_destroy = true

tags = local.tags
}
1 change: 1 addition & 0 deletions manifests/modules/aiml/inferentia/base/config.properties
@@ -0,0 +1 @@
AIML_NEURON_ROLE_ARN
25 changes: 25 additions & 0 deletions manifests/modules/aiml/inferentia/base/kustomization.yaml
@@ -0,0 +1,25 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
configMapGenerator:
- name: base-vars
namespace: aiml
env: config.properties
options:
disableNameSuffixHash: true
replacements:
- source:
kind: ConfigMap
name: base-vars
version: v1
namespace: aiml
fieldPath: data.AIML_NEURON_ROLE_ARN
targets:
- select:
kind: ServiceAccount
name: inference
namespace: aiml
fieldPaths:
- metadata.annotations.[eks.amazonaws.com/role-arn]
resources:
- serviceaccount.yaml
- namespace.yaml
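
Reviewer note on `config.properties` and this kustomization: the properties file lists `AIML_NEURON_ROLE_ARN` with no `=value`. As far as I know, kustomize reads such a key from the process environment when it generates the ConfigMap, and the `replacements` entry then copies that value into the ServiceAccount annotation. The bash sketch below only emulates the lookup rule for illustration. `resolve_env_file` is a hypothetical name and this is not kustomize's own code.

```shell
#!/bin/bash
# Hypothetical emulation of env-file resolution; not kustomize itself.
# Lines of the form KEY=VALUE pass through unchanged. A bare KEY is
# resolved from the current shell environment (empty if unset).
resolve_env_file() {
  local file=$1 line key
  while IFS= read -r line || [[ -n $line ]]; do
    # Skip blank lines and comments.
    [[ -z $line || $line == \#* ]] && continue
    if [[ $line == *=* ]]; then
      printf '%s\n' "$line"
    else
      key=$line
      # ${!key:-} is indirect expansion: the value of the variable named by $key.
      printf '%s=%s\n' "$key" "${!key:-}"
    fi
  done < "$file"
}
```

In practice this means the variable must be exported before the kustomization is built, for example by sourcing the `environment` output from the Terraform addon.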
4 changes: 4 additions & 0 deletions manifests/modules/aiml/inferentia/base/namespace.yaml
@@ -0,0 +1,4 @@
apiVersion: v1
kind: Namespace
metadata:
name: aiml
7 changes: 7 additions & 0 deletions manifests/modules/aiml/inferentia/base/serviceaccount.yaml
@@ -0,0 +1,7 @@
apiVersion: v1
kind: ServiceAccount
metadata:
name: inference
namespace: aiml
annotations:
eks.amazonaws.com/role-arn: ${AIML_NEURON_ROLE_ARN}
16 changes: 16 additions & 0 deletions manifests/modules/aiml/inferentia/compiler/compiler.yaml
@@ -0,0 +1,16 @@
apiVersion: v1
kind: Pod
metadata:
labels:
role: compiler
name: compiler
namespace: aiml
spec:
containers:
- command:
- sh
- -c
- sleep infinity
image: ${AIML_DL_IMAGE}
name: compiler
serviceAccountName: inference
@@ -0,0 +1 @@
AIML_DL_IMAGE