Commit

chore: Update and run pre-commit
bryantbiggs committed Aug 14, 2024
1 parent cbedadb commit 545c01a
Showing 10 changed files with 39 additions and 42 deletions.
6 changes: 3 additions & 3 deletions .pre-commit-config.yaml
@@ -1,11 +1,11 @@
repos:
- repo: https://github.com/streetsidesoftware/cspell-cli
- rev: v8.10.0
+ rev: v8.13.3
hooks:
- id: cspell
args: [--exclude, 'ADOPTERS.md', --exclude, '.pre-commit-config.yaml', --exclude, '.gitignore', --exclude, '*.drawio', --exclude, 'mkdocs.yml', --exclude, '.helmignore', --exclude, '.github/workflows/*', --exclude, 'patterns/istio-multi-cluster/*', --exclude, 'patterns/blue-green-upgrade/*']
- repo: https://github.com/macisamuele/language-formatters-pre-commit-hooks
- rev: v2.13.0
+ rev: v2.14.0
hooks:
- id: pretty-format-yaml
args: [--autofix, --indent, '2', --offset, '2', --preserve-quotes]
@@ -19,7 +19,7 @@ repos:
- id: detect-aws-credentials
args: [--allow-missing-credentials]
- repo: https://github.com/antonbabenko/pre-commit-terraform
- rev: v1.92.0
+ rev: v1.92.1
hooks:
- id: terraform_fmt
- id: terraform_docs
2 changes: 1 addition & 1 deletion docs/patterns/bottlerocket.md
@@ -1,5 +1,5 @@
---
- title: Bottlerocket
+ title: Bottlerocket
---

{%
@@ -25,4 +25,4 @@ spec:
encrypted: true
kmsKeyID: {{ .Values.ec2nodeclass.blockDeviceMappings.ebs.kmsKeyID }}
deleteOnTermination: true
- volumeSize: 20Gi
+ volumeSize: 20Gi
@@ -44,4 +44,4 @@ spec:
- key: kubernetes.io/os
operator: In
values:
- - linux
+ - linux
4 changes: 2 additions & 2 deletions patterns/bottlerocket/karpenter-resources/values.yaml
@@ -4,10 +4,10 @@ ec2nodeclass:
role: ""
securityGroupSelectorTerms:
tags:
- karpenter.sh/discovery: ""
+ karpenter.sh/discovery: ""
subnetSelectorTerms:
tags:
- karpenter.sh/discovery: ""
+ karpenter.sh/discovery: ""
blockDeviceMappings:
ebs:
kmsKeyID: ""
2 changes: 1 addition & 1 deletion patterns/ecr-pull-through-cache/addons.tf
@@ -1,5 +1,5 @@
locals {
- ecr_url = "${data.aws_caller_identity.current.account_id}.dkr.ecr.${local.region}.amazonaws.com"
+ ecr_url = "${data.aws_caller_identity.current.account_id}.dkr.ecr.${local.region}.amazonaws.com"
}

module "eks_blueprints_addons" {
40 changes: 20 additions & 20 deletions patterns/nvidia-gpu-efa/README.md
@@ -33,7 +33,7 @@ See [here](https://aws-ia.github.io/terraform-aws-eks-blueprints/getting-started

!!! note

- Desired instance type can be specified in [eks.tf](eks.tf#L36).
+ Desired instance type can be specified in [eks.tf](eks.tf#L36).
Values shown below will change based on the instance type selected (i.e. - `p5.48xlarge` has 8 GPUs and 32 EFA interfaces).
A list of EFA-enabled instance types is available [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html#efa-instance-types).
If you are using an on-demand capacity reservation (ODCR) for your instance type, please uncomment the `capacity_reservation_specification` block in `eks.tf`
@@ -92,15 +92,15 @@ See [here](https://aws-ia.github.io/terraform-aws-eks-blueprints/getting-started

This test prints a list of available EFA interfaces by using the `/opt/amazon/efa/bin/fi_info` utility.
The script [generate-efa-info-test.sh](generate-efa-info-test.sh) creates an MPIJob manifest file named `efa-info-test.yaml`. It assumes that there are two cluster nodes with 8 GPU's per node and 32 EFA adapters. If you are not using `p5.48xlarge` instances in your cluster, you may adjust the settings in the script prior to running it.
`NUM_WORKERS` - number of nodes you want to run the test on
`GPU_PER_WORKER` - number of GPUs available on each node
`EFA_PER_WORKER` - number of EFA interfaces available on each node
```sh
./generate-efa-info-test.sh
```
To start the test apply the generated manifest to the cluster:
```sh
@@ -109,7 +109,7 @@ See [here](https://aws-ia.github.io/terraform-aws-eks-blueprints/getting-started
```text
mpijob.kubeflow.org/efa-info-test created
```
```
Observe the pods in the current namespace. You should see a launcher pod and worker pods.
It is normal for the launcher pod to restart a few times until the worker pods are fully running.
@@ -137,7 +137,7 @@ See [here](https://aws-ia.github.io/terraform-aws-eks-blueprints/getting-started
efa-info-test-launcher-wm8pm 0/1 Completed 2 5m20s
```
- Once the test launcher pod enters status `Running` or `Completed`,
+ Once the test launcher pod enters status `Running` or `Completed`,
see the test logs using the command below:
```sh
@@ -153,9 +153,9 @@ See [here](https://aws-ia.github.io/terraform-aws-eks-blueprints/getting-started
[1,1]<stdout>: version: 120.10
[1,1]<stdout>: type: FI_EP_RDM
[1,1]<stdout>: protocol: FI_PROTO_EFA
...
[1,0]<stdout>:provider: efa
[1,0]<stdout>: fabric: efa
[1,0]<stdout>: domain: rdmap201s0-rdm
@@ -165,42 +165,42 @@ See [here](https://aws-ia.github.io/terraform-aws-eks-blueprints/getting-started
```
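As a rough way to check this output, the Python sketch below (not part of this pattern) groups the `domain:` lines by MPI rank. It assumes the second number in the `[1,0]` prefix is the rank, as with Open MPI's tagged output; the function name `efa_domains_per_rank` is hypothetical. The sample reuses only lines shown in the log excerpt above.

```python
import re
from collections import defaultdict

# Matches lines such as "[1,0]<stdout>: domain: rdmap201s0-rdm".
# Assumption: the second number inside the brackets is the MPI rank.
DOMAIN_LINE = re.compile(r"^\[\d+,(\d+)\]<stdout>:\s*domain:\s*(\S+)")


def efa_domains_per_rank(log_text):
    """Map each rank to the set of EFA domain names it reported."""
    domains = defaultdict(set)
    for line in log_text.splitlines():
        match = DOMAIN_LINE.match(line)
        if match:
            domains[int(match.group(1))].add(match.group(2))
    return dict(domains)


# Lines copied from the excerpt above; a full log contains many more entries.
sample = """\
[1,1]<stdout>: version: 120.10
[1,1]<stdout>: type: FI_EP_RDM
[1,1]<stdout>: protocol: FI_PROTO_EFA
[1,0]<stdout>:provider: efa
[1,0]<stdout>: fabric: efa
[1,0]<stdout>: domain: rdmap201s0-rdm
"""

for rank, names in sorted(efa_domains_per_rank(sample).items()):
    print(f"rank {rank}: {len(names)} domain(s): {sorted(names)}")
```

Whether the number of distinct domains matches `EFA_PER_WORKER` depends on how `fi_info` is invoked, so treat the count as a sanity check rather than an exact equality.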
Finally, remove the job:
```sh
kubectl delete -f ./efa-info-test.yaml
```
4. EFA NCCL test
- The EFA NCCL test is used to measure network bandwidth by running the `/opt/nccl-tests/build/all_reduce_perf` utility.
+ The EFA NCCL test is used to measure network bandwidth by running the `/opt/nccl-tests/build/all_reduce_perf` utility.
Create an MPIjob manifest by executing the script below:
```sh
./generate-efa-nccl-test.sh
```
This script creates a file named `efa-nccl-test.yaml`. Apply the manifest to start the EFA nccl test.
```sh
kubectl apply -f ./efa-nccl-test.yaml
```text
mpijob.kubeflow.org/efa-nccl-test created
```
```
Similarly to the EFA info test, a launcher and worker pods will be created. The launcher pod will be
- in CrashLoopBackoff mode until the worker pods enter Running state.
+ in CrashLoopBackoff mode until the worker pods enter Running state.
As soon as the launcher pod enters Running state as well, execute the following command to see the test logs:
```sh
kubectl logs -f $(kubectl get pods | grep launcher | cut -d ' ' -f 1)
```
```text
...
- [1,0]<stdout>:# out-of-place in-place
+ [1,0]<stdout>:# out-of-place in-place
[1,0]<stdout>:# size count type redop root time algbw busbw #wrong time algbw busbw #wrong
- [1,0]<stdout>:# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
+ [1,0]<stdout>:# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
[1,0]<stdout>: 0 0 float sum -1 0.13 0.00 0.00 0 0.12 0.00 0.00 0
[1,0]<stdout>: 0 0 float sum -1 0.12 0.00 0.00 0 0.12 0.00 0.00 0
[1,0]<stdout>: 4 1 float sum -1 65.43 0.00 0.00 0 65.82 0.00 0.00 0
@@ -234,17 +234,17 @@ See [here](https://aws-ia.github.io/terraform-aws-eks-blueprints/getting-started
[1,0]<stdout>: 1073741824 268435456 float sum -1 4553.6 235.80 442.13 0 4553.0 235.83 442.19 0
[1,0]<stdout>: 2147483648 536870912 float sum -1 9062.5 236.96 444.31 0 9060.4 237.02 444.41 0
[1,0]<stdout>:# Out of bounds values : 0 OK
- [1,0]<stdout>:# Avg bus bandwidth : 79.9352
+ [1,0]<stdout>:# Avg bus bandwidth : 79.9352
[1,0]<stdout>:#
```
- Columns 9 and 13 in the output table show the in-place and out-of-place bus bandwidth calculated for the data size listed in column 2.
+ Columns 9 and 13 in the output table show the in-place and out-of-place bus bandwidth calculated for the data size listed in column 2.
In this case it is at maximum 444.31 and 444.41 GB/s respectively.
Your actual results may be slightly different. The calculated average bus bandwidth is displayed at the end of the log.
In this test run the average bus bandwidth was 79.9352 GB/s.
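To illustrate how the table can be read programmatically, the Python sketch below (not part of this pattern) strips the `[1,0]<stdout>:` prefix from each log line and extracts both bus bandwidth values per result row. The function names are hypothetical. Following the header row, the out-of-place figures come before the in-place ones. It also applies the nccl-tests convention for all-reduce, busbw = algbw * 2 * (n - 1) / n, assuming n = 16 ranks (two nodes with 8 GPUs each).

```python
import re

# Prefix added to every log line, e.g. "[1,0]<stdout>:" (format taken from the log above).
PREFIX = re.compile(r"^\[\d+,\d+\]<stdout>:")


def parse_busbw(log_text):
    """Return (size_bytes, out_of_place_busbw, in_place_busbw) for each result row."""
    rows = []
    for line in log_text.splitlines():
        fields = PREFIX.sub("", line).split()
        # A result row has 13 fields and starts with the message size in bytes;
        # header and summary lines start with "#" and are skipped.
        if len(fields) != 13 or not fields[0].isdigit():
            continue
        rows.append((int(fields[0]), float(fields[7]), float(fields[11])))
    return rows


def expected_busbw(algbw, ranks):
    """All-reduce convention in nccl-tests: busbw = algbw * 2 * (ranks - 1) / ranks."""
    return algbw * 2 * (ranks - 1) / ranks


# Two result rows and the summary line copied from the log above.
sample = """\
[1,0]<stdout>: 1073741824 268435456 float sum -1 4553.6 235.80 442.13 0 4553.0 235.83 442.19 0
[1,0]<stdout>: 2147483648 536870912 float sum -1 9062.5 236.96 444.31 0 9060.4 237.02 444.41 0
[1,0]<stdout>:# Avg bus bandwidth : 79.9352
"""

rows = parse_busbw(sample)
peak_out = max(row[1] for row in rows)
peak_in = max(row[2] for row in rows)
print(f"peak out-of-place busbw: {peak_out} GB/s")
print(f"peak in-place busbw: {peak_in} GB/s")
# Assumption: 16 ranks, i.e. 2 nodes x 8 GPUs as in the p5.48xlarge setup.
print(f"busbw derived from algbw: {expected_busbw(236.96, 16):.2f} GB/s")
```

For the sample rows, the peaks are 444.31 and 444.41 GB/s, and the formula gives about 444.30 GB/s for an algbw of 236.96 GB/s, which is within rounding of the reported value.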
Lastly, delete the MPIJob:
```sh
kubectl delete -f ./efa-nccl-test.yaml
```
1 change: 0 additions & 1 deletion patterns/nvidia-gpu-efa/generate-efa-info-test.sh
@@ -90,4 +90,3 @@ spec:
nvidia.com/gpu: ${GPU_PER_WORKER}
vpc.amazonaws.com/efa: ${EFA_PER_WORKER}
EOF

@@ -148,7 +148,7 @@ In this setup, we used a Kyverno rule to inject iptables rules, and an envoy sid
kubectl --context eks-cluster1 exec -it deploy/demo-cluster1-v1 -c envoy-sigv4 -n apps -- cat /usr/local/bin/launch_envoy.sh
```

- Output:
+ Output:
```
#!/bin/sh

@@ -187,7 +187,7 @@ In this setup, we used a Kyverno rule to inject iptables rules, and an envoy sid

> If the VPC was not able to destroy, you may want to re-run the destroy command a second time

- If the VPC lattice service network still exists, you can remove it with the following command:
+ If the VPC lattice service network still exists, you can remove it with the following command:

```bash
SN=$(aws vpc-lattice list-service-networks --query 'items[?name==`lattice-gateway`].id' --output text)
@@ -42,14 +42,12 @@ spec:
securityContext:
runAsGroup: 0
env:
- - name: APP_DOMAIN
- value: "example.com"
- - name: CA_ARN
- value: "{{ .Values.acmpCAArn }}"
- args: [
- "-l", "info"
- ]
+ - name: APP_DOMAIN
+ value: "example.com"
+ - name: CA_ARN
+ value: "{{ .Values.acmpCAArn }}"
+ args: ["-l", "info"]
ports:
- - containerPort: 8080
- name: proxy
- protocol: TCP
+ - containerPort: 8080
+ name: proxy
+ protocol: TCP
