[Bug]: 1.3.0 aws-provider breaks pod identity credential resolution #1252

Closed
1 task done
david-kirby opened this issue Apr 3, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@david-kirby

Is there an existing issue for this?

  • I have searched the existing issues

Affected Resource(s)

  • aws.upbound.io/v1beta1
  • pkg.crossplane.io/v1
  • sqs.aws.upbound.io/v1beta1 - queue

Resource MRs required to reproduce the bug

In short, the below manifests:

  1. set up an aws-sqs provider whose deployment is patched to run under a service account named crossplane
  2. set up the AWS ProviderConfig named prod to use IRSA as its credentials source (IMPORTANT NOTE: this was working with a Pod Identity association in EKS. The IAM role is configured for Pod Identity, not for IRSA)
  3. deploy a simple SQS queue
apiVersion: sqs.aws.upbound.io/v1beta1
kind: Queue
metadata:
  name: my-queue
spec:
  forProvider:
    name: my-queue
    region: us-east-1
  providerConfigRef:
    name: prod
---
apiVersion: aws.upbound.io/v1beta1
kind: ProviderConfig
metadata:
  name: prod
spec:
  credentials:
    source: IRSA
---
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws-sqs
spec:
  package: xpkg.upbound.io/upbound/provider-aws-sqs:v1.2.1
  runtimeConfigRef:
    name: patch-service-account
---
apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: patch-service-account
spec:
  deploymentTemplate:
    spec:
      selector: {}
      template:
        spec:
          serviceAccountName: crossplane
          containers: []

Steps to Reproduce

I had the above manifests deployed and working. After updating the SQS provider to 1.3.0, deploying a new queue (or deleting and recreating the example queue) fails: the queue cannot be created.

What happened?

Using version 1.2.1, the queue is created. After upgrading to 1.3.0, the resource is not created. I tested this with other resources as well (sns, eks:clusterauth) and got the same error message whenever the provider version was 1.3.0.

Relevant Error Output Snippet

message: 'connect failed: cannot initialize the Terraform plugin SDK async external
        client: cannot get terraform setup: cache manager failure: cannot calculate
        the hash for the credentials file: token file name cannot be empty'


Crossplane Version

1.15.1

Provider Version

1.3.0

Kubernetes Version

No response

Kubernetes Distribution

No response

Additional Info

No response
@david-kirby david-kirby added the bug Something isn't working label Apr 3, 2024
@haarchri
Member

haarchri commented Apr 3, 2024

Can you try the following:

apiVersion: aws.upbound.io/v1beta1
kind: ProviderConfig
metadata:
  name: demo-pod-identity
spec:
  credentials:
    source: WebIdentity
    webIdentity:
      roleARN: arn:aws:iam::12345678910:role/demo
      tokenConfig:
        fs:
          path: /var/run/secrets/pods.eks.amazonaws.com/serviceaccount/eks-pod-identity-token
        source: Filesystem

@david-kirby david-kirby changed the title [Bug]: ProviderConfig [Bug]: 1.3.0 aws-provider breaks pod identity credential resolution Apr 3, 2024
@david-kirby
Author

david-kirby commented Apr 3, 2024

I tried the above ProviderConfig with the 1.3.0 aws-sqs provider and also updated my IAM role to allow the sts:AssumeRoleWithWebIdentity action. After trying to create the SQS queue again, I received this error:

    message: 'connect failed: cannot initialize the Terraform plugin SDK async external
      client: cannot get terraform setup: cache manager failure: cannot retrieve the
      AWS credentials: failed to refresh cached credentials, failed to retrieve credentials,
      operation error STS: AssumeRoleWithWebIdentity, exceeded maximum number of attempts,
      3, https response error StatusCode: 400, RequestID: 3abbae7e-25fa-4c0f-915d-929ae090c3f2,
      InvalidIdentityToken: Incorrect token audience

I double-checked that the audience is set correctly on the AWS OIDC provider for the cluster:

    "ClientIDList": [
        "sts.amazonaws.com"
    ],

@david-kirby
Author

david-kirby commented Apr 3, 2024

Alrighty, I think I got this fixed. After decoding the token at /var/run/secrets/pods.eks.amazonaws.com/serviceaccount/eks-pod-identity-token, I found that it carries an audience of pods.eks.amazonaws.com.

Once I added that audience to the OIDC provider configuration for the cluster, the SQS queue could be created.
The command aws iam get-open-id-connect-provider --open-id-connect-provider-arn <USE_ARN> should return:

   "ClientIDList": [
        "sts.amazonaws.com",
        "pods.eks.amazonaws.com"
    ],
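For anyone who wants to repeat the token check, here is a minimal Python sketch of it. It decodes the payload segment of a JWT without verifying the signature, reads the `aud` claim, and reports any audience that is missing from the `ClientIDList` returned by the command above. The function names `token_audiences` and `missing_audiences` are made up for this example and are not part of any provider or AWS tooling.

```python
import base64
import json


def token_audiences(jwt: str) -> list[str]:
    """Return the `aud` claim of a JWT as a list (signature is NOT verified)."""
    parts = jwt.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload_b64 = parts[1]
    # JWT segments are base64url-encoded with padding stripped; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    aud = claims.get("aud", [])
    # The claim may be a single string or a list of strings.
    return [aud] if isinstance(aud, str) else list(aud)


def missing_audiences(jwt: str, client_id_list: list[str]) -> list[str]:
    """Audiences present in the token but absent from the OIDC provider's ClientIDList."""
    return [a for a in token_audiences(jwt) if a not in client_id_list]
```

Inside the provider pod, the token text would come from reading the eks-pod-identity-token file mentioned above. An empty result from `missing_audiences` means every audience in the token is already registered on the OIDC provider.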

So for anyone else trying to use pod identity, here's my final setup to get it working with 1.3.0:

  1. Your OIDC provider must have pods.eks.amazonaws.com in its Audiences list
  2. Create an IAM role (e.g. crossplane-role) with a trust policy that allows sts:AssumeRoleWithWebIdentity and trusts the pods.eks.amazonaws.com service principal
  3. Since each AWS service provider you install (e.g. aws-sqs, aws-sns) gets its own unique service account name, I patched them so that they all share the same crossplane service account. That way I need only a single Pod Identity association, mapping the crossplane-role IAM role to the crossplane service account
  4. Deploy a 1.3.0 AWS service provider
  5. Deploy a ProviderConfig with the WebIdentity configuration that @haarchri suggested
  6. Deploy the SQS queue
apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: patch-service-account
spec:
  deploymentTemplate:
    spec:
      selector: {}
      template:
        spec:
          serviceAccountName: crossplane
          containers: []
---
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws-sqs
spec:
  package: xpkg.upbound.io/upbound/provider-aws-sqs:v1.3.0
  runtimeConfigRef:
    name: patch-service-account
---
apiVersion: aws.upbound.io/v1beta1
kind: ProviderConfig
metadata:
  name: prod
spec:
  credentials:
    source: WebIdentity
    webIdentity:
      roleARN: arn:aws:iam::<ACCOUNT_ID>:role/crossplane-role
      tokenConfig:
        fs:
          path: /var/run/secrets/pods.eks.amazonaws.com/serviceaccount/eks-pod-identity-token
        source: Filesystem
---
apiVersion: sqs.aws.upbound.io/v1beta1
kind: Queue
metadata:
  name: demo-queue
spec:
  forProvider:
    name: demo-queue
    region: us-east-1
  providerConfigRef:
    name: prod

IAM role trust policy for my crossplane-role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::<ACCOUNT_ID>:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/UNIQUE_ID"
            },
            "Action": "sts:AssumeRoleWithWebIdentity"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "pods.eks.amazonaws.com"
            },
            "Action": [
                "sts:AssumeRole",
                "sts:TagSession"
            ]
        }
    ]
}

@truongnht

@david-kirby your solution would work, but it is not a Pod Identity solution. It is rather a workaround that makes Pod Identity work through IRSA, since I still see the IRSA OIDC provider in the setup.
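To make the IRSA versus Pod Identity distinction concrete, here is a rough Python sketch that reports which mechanism a pod's environment appears to be wired for. It relies on my understanding of the environment variables EKS injects: `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` for IRSA, and `AWS_CONTAINER_CREDENTIALS_FULL_URI` and `AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE` for Pod Identity. The function name is hypothetical. Whether a missing `AWS_WEB_IDENTITY_TOKEN_FILE` is what produces the "token file name cannot be empty" error in 1.3.0 is an assumption and is not confirmed in this thread.

```python
import os
from typing import Mapping


def detect_eks_credential_mechanisms(env: Mapping[str, str] = os.environ) -> list[str]:
    """Guess which EKS credential mechanisms the environment is set up for."""
    found = []
    # IRSA: a role ARN plus the path of a projected service account token.
    if env.get("AWS_ROLE_ARN") and env.get("AWS_WEB_IDENTITY_TOKEN_FILE"):
        found.append("IRSA")
    # Pod Identity: a container credentials endpoint plus an authorization token file.
    if env.get("AWS_CONTAINER_CREDENTIALS_FULL_URI") and env.get(
        "AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE"
    ):
        found.append("PodIdentity")
    return found
```

Run inside the provider pod, a result of `["PodIdentity"]` would suggest that only the Pod Identity association is in effect and that an IRSA-style lookup has no token file to read.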

@magic-madrigal

magic-madrigal commented Aug 8, 2024

Thank you @david-kirby for sharing this information. Not sure if I did something incorrectly, but to get this fully working, I needed to have a different trust relationship on the IAM role.

This is what mine ended up looking like.

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Effect": "Allow",
			"Principal": {
				"Federated": "arn:aws:iam::<ACCOUNT_ID>:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/<UNIQUE_ID>"
			},
			"Action": "sts:AssumeRoleWithWebIdentity"
		},
		{
			"Effect": "Allow",
			"Principal": {
				"Service": "pods.eks.amazonaws.com"
			},
			"Action": [
				"sts:AssumeRole",
				"sts:TagSession"
			]
		},
		{
			"Effect": "Allow",
			"Action": "sts:AssumeRoleWithWebIdentity",
			"Principal": {
				"Federated": "arn:aws:iam::<ACCOUNT_ID>:oidc-provider/oidc.eks.us-west-1.amazonaws.com/id/<UNIQUE_ID>"
			},
			"Condition": {
				"StringEquals": {
					"oidc.eks.us-west-1.amazonaws.com/id/<UNIQUE_ID>:aud": [
						"pods.eks.amazonaws.com"
					]
				}
			}
		}
	]
}
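The third statement in the policy above ties the Federated principal and the `aud` condition key to the same OIDC issuer. Here is a small Python sketch that builds such a statement so the two stay in sync. The helper name `web_identity_statement` and its parameters are invented for illustration, and the account ID and issuer in the usage note are placeholders.

```python
import json


def web_identity_statement(account_id: str, oidc_issuer: str, audience: str) -> dict:
    """Build one trust-policy statement for AssumeRoleWithWebIdentity, pinned to an audience.

    oidc_issuer is the cluster's OIDC issuer, with or without the https:// scheme.
    """
    # IAM refers to the issuer without scheme or trailing slash.
    issuer = oidc_issuer.removeprefix("https://").rstrip("/")
    return {
        "Effect": "Allow",
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Principal": {
            "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"
        },
        "Condition": {
            # The condition key is the issuer followed by ":aud".
            "StringEquals": {f"{issuer}:aud": [audience]}
        },
    }
```

For example, `json.dumps(web_identity_statement("<ACCOUNT_ID>", "oidc.eks.us-west-1.amazonaws.com/id/<UNIQUE_ID>", "pods.eks.amazonaws.com"), indent=2)` prints a statement with the same shape as the third one above.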

@haarchri
Member

pod identity option: #1459

4 participants