
[Bug] Degraded managedNodeGroups when using a pathed instanceRoleARN #7846

Open
matschaffer-roblox opened this issue Jun 24, 2024 · 6 comments

Comments

@matschaffer-roblox

What were you trying to accomplish?

We launch EKS clusters using instanceRoleARN to attach managed policies (AmazonEKSWorkerNodePolicy, AmazonEKS_CNI_Policy, AmazonEC2ContainerRegistryReadOnly) to our node group instances.

We provided a path on these roles of "/eks/" for organizational purposes. We'd like to be able to manage these node groups, but the pathing seems to cause a degradation in node group health.

What happened?

The cluster creates as expected, but after about an hour the node group shows up as degraded.

(Screenshot: Screenshot_2024-06-23_at_9_16_37 PM — node group shown as Degraded in the EKS console)

(Screenshot: Screenshot_2024-06-23_at_9_17_41 PM — health issue details, including the "Affected resources" column, with redactions)

It's a little tough to tell with the redactions, but the ARN shown in the "Affected resources" column lacks the /eks/ path prefix.

Removing the path parameter from the role seems to avoid the issue.

How to reproduce it?

We use an eksctl config template like this:

managedNodeGroups:
  - name: stable-{{ .CLUSTER_NAME_WITH_HYPHENS }}
    instanceType: r5.8xlarge
    desiredCapacity: 2
    minSize: 2
    maxSize: 2
    privateNetworking: true
    volumeSize: 40
    volumeType: gp3
    volumeEncrypted: true
    labels:
      stable: "true"
    tags:
      <<: *tags
    iam:
      instanceRoleARN: {{ .STABLE_NODES_ROLE_ARN }}

Where the instance role ARN is "arn:aws:iam::ACCOUNT:role/eks/ROLE_NAME"
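The suspected failure mode can be sketched in a few lines. This is illustrative only (not eksctl's actual code): if the role name is extracted from the ARN and the ARN is later rebuilt from just that name, the path segment is silently dropped.

```python
# Illustrative sketch of how a pathed IAM role ARN can lose its path when
# only the role name is extracted and the ARN is rebuilt from it.
arn = "arn:aws:iam::ACCOUNT:role/eks/ROLE_NAME"

resource = arn.split(":role/", 1)[1]      # "eks/ROLE_NAME" (path + name)
role_name = resource.rsplit("/", 1)[-1]   # "ROLE_NAME" -- path dropped here

rebuilt = f"arn:aws:iam::ACCOUNT:role/{role_name}"
print(rebuilt)  # arn:aws:iam::ACCOUNT:role/ROLE_NAME
```

The rebuilt ARN matches the stripped form shown in the "Affected resources" column.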

Logs

Output from eksctl during creation is normal.

Anything else we need to know?

What OS are you using? macOS
Are you using a downloaded binary or did you compile eksctl? downloaded via asdf
What type of AWS credentials are you using (i.e. default/named profile, MFA)? SSO

Versions

❯ eksctl info   
eksctl version: 0.183.0
kubectl version: v1.30.2
OS: darwin
Contributor

Hello matschaffer-roblox 👋 Thank you for opening an issue in the eksctl project. The team will review the issue and aim to respond within 1-5 business days. In the meantime, please read the Contribution and Code of Conduct guidelines here. You can find more information about eksctl on our website.

@matschaffer-roblox
Author

Removing the /eks/ path from the role seems to be a viable workaround (arn:aws:iam::ACCOUNT:role/ROLE_NAME)

AWS support provided some steps for their reproduction of the issue:


Step 1 => I created a trust policy with the below mentioned content:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["sts:AssumeRole"],
      "Principal": {
        "Service": ["ec2.amazonaws.com"]
      }
    }
  ]
}

Step 2 => I created a role with path using the below mentioned command:

aws iam create-role --role-name test-node-role --assume-role-policy-document file://assume-role-doc.json --path /eks/
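For reference, the `--path` value becomes part of the role's ARN. A small sketch of how the path and role name combine in the resulting ARN (the account ID below is a placeholder):

```python
# How the --path value combines with the role name in the ARN returned by
# create-role (account ID is a placeholder, not from the issue).
account = "111122223333"
path = "/eks/"            # value passed to --path
role_name = "test-node-role"

arn = f"arn:aws:iam::{account}:role{path}{role_name}"
print(arn)  # arn:aws:iam::111122223333:role/eks/test-node-role
```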

Step 3 => I created an EKS cluster and nodegroup with the below mentioned config file, using "eksctl create cluster -f test.yaml":

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: my-cluster2
  region: ap-south-1
  version: "1.29"

accessConfig:
  bootstrapClusterCreatorAdminPermissions: true
  authenticationMode: API

managedNodeGroups:
  - name: ng-2
    instanceType: t3.large
    desiredCapacity: 2
    volumeSize: 20
    iam:
      instanceRoleARN: "arn:aws:iam::55555555555:role/eks/test-node-role"

Step 4 => The nodegroup that was created shows the IAM role as "arn:aws:iam::55555555555:role/test-node-role" on the EKS console. The access entry that is created automatically includes the complete path "/eks/", but it is stripped from the node group. The CreateNodegroup API call and CloudFormation stack show the below mentioned configuration for the node role passed:

CFN:
"NodeRole": "arn:aws:iam::55555555555:role/test-node-role",
"NodegroupName": "ng-2",

Cloudtrail:
"nodeRole": "arn:aws:iam::55555555555:role/test-node-role",
"name": "my-cluster2",

So, eksctl seems to be stripping the path from the node role which is eventually leading to health issues on the node with the error "access entry not found in cluster".
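The mismatch can be sketched as a simple exact-match lookup (illustrative only, not the EKS health-check code): the access entry keeps the full pathed ARN while the node group ends up with the stripped one, so the lookup fails.

```python
# ARNs taken from the repro above: the access entry has the pathed ARN,
# the node group has the stripped one, and an exact-match lookup fails,
# consistent with the "access entry not found in cluster" error.
access_entries = {"arn:aws:iam::55555555555:role/eks/test-node-role"}
nodegroup_role = "arn:aws:iam::55555555555:role/test-node-role"
print(nodegroup_role in access_entries)  # False
```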

Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the stale label Jul 28, 2024
@matschaffer-roblox
Author

Bump for stalebot

@github-actions github-actions bot removed the stale label Jul 29, 2024
Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the stale label Aug 28, 2024
@matschaffer-roblox
Author

Bump for stalebot
