Add support for HyperPod nodes #1175

Open: wants to merge 4 commits into master

@surajkota commented Oct 23, 2024

Description of changes

SageMaker HyperPod recently launched EKS integration. This commit adds the SageMaker instance types and a toleration for running deep health checks, so customers can install the EFA device plugin Helm chart on HyperPod nodes without modifying it, unless their setup requires it.
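To illustrate what the change is meant to achieve, here is a minimal Python sketch of the two scheduling conditions involved: the DaemonSet's node affinity on an instance-type allow-list, and a simplified version of Kubernetes taint/toleration matching. It assumes that HyperPod nodes report SageMaker-style `ml.`-prefixed instance types (e.g. `ml.g5.8xlarge`) and that a deep health check taints the node with a `NoSchedule` effect. The taint key `example.com/deep-health-check`, its value, and the helper names are hypothetical, not the chart's actual values.

```python
# Minimal sketch (not the chart's implementation): models whether a DaemonSet pod
# could land on a node, given an instance-type allow-list and tolerations.
# The taint key/value and instance types below are hypothetical/illustrative.
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key + "Exists" tolerates every taint
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified Kubernetes toleration-matching rule."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if not tol.key:
        return tol.operator == "Exists"
    if tol.key != taint.key:
        return False
    return tol.operator == "Exists" or tol.value == taint.value


def can_schedule(instance_type: str, taints: Iterable[Taint],
                 allowed_types: set, tolerations: list) -> bool:
    """True if the node passes the instance-type affinity and every hard taint is tolerated."""
    if instance_type not in allowed_types:
        return False
    for taint in taints:
        if taint.effect == "PreferNoSchedule":  # soft taint, never blocks scheduling
            continue
        if not any(tolerates(t, taint) for t in tolerations):
            return False
    return True


# Illustrative values: EC2 type names plus their SageMaker "ml."-prefixed twins (assumption).
EC2_TYPES = {"g5.8xlarge"}
ALLOWED_TYPES = EC2_TYPES | {"ml." + t for t in EC2_TYPES}

# Hypothetical taint applied while a deep health check runs.
HEALTH_CHECK_TAINT = Taint("example.com/deep-health-check", "InProgress", "NoSchedule")
TOLERATIONS = [Toleration(key=HEALTH_CHECK_TAINT.key, operator="Exists", effect="NoSchedule")]

if __name__ == "__main__":
    print(can_schedule("ml.g5.8xlarge", [HEALTH_CHECK_TAINT], ALLOWED_TYPES, TOLERATIONS))  # True
    print(can_schedule("ml.g5.8xlarge", [HEALTH_CHECK_TAINT], ALLOWED_TYPES, []))           # False
    print(can_schedule("ml.g5.8xlarge", [], EC2_TYPES, TOLERATIONS))                        # False
```

The third call shows why the allow-list needs the `ml.`-prefixed entries: with EC2 names alone, a HyperPod node is filtered out by node affinity before tolerations are even considered.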

Checklist

  • Added/modified documentation as required (such as the README.md for modified charts)
  • Incremented the chart version in Chart.yaml for the modified chart(s)
  • Manually tested. Describe what testing was done in the testing section below
  • Make sure the title of the PR is a good description that can go into the release notes

Testing

Installed the EFA device plugin from my GitHub branch and verified that pods are scheduled on g5.8xlarge instances for both EC2 and HyperPod without modifying the chart. I didn't run workloads because I didn't change the DaemonSet itself.

helm install aws-efa-k8s-device-plugin .
NAME: aws-efa-k8s-device-plugin
LAST DEPLOYED: Wed Oct 23 09:23:44 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
EFA device plugin is installed, it can be requested as `vpc.amazonaws.com/efa` resource.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@surajkota requested a review from dims as a code owner on October 23, 2024 15:20
@bryantbiggs (Member)

partial dupe of #1129

@surajkota (Author)

Ack, I can work with Nathan to close the other PR

@surajkota (Author) commented Oct 24, 2024

@bryantbiggs, is there also a Kustomize version of the EFA device plugin maintained by EKS/AWS, or any other high-traffic download source where a similar change should also go?

I checked the aws-samples repo, and it's deprecated in favor of this chart.

@bryantbiggs (Member)

No, this is the source of truth for deploying the EFA device plugin into an EKS cluster.
