[Bug]: Replicationgroup.elasticache.aws.upbound.io in async after aws provider upgrade from 1.3.1 to 1.6.0 #1370

Open
1 task done
fola-ooo opened this issue Jun 18, 2024 · 8 comments
Labels
bug Something isn't working needs:triage

Comments

@fola-ooo

Is there an existing issue for this?

  • I have searched the existing issues

Affected Resource(s)

ReplicationGroup.elasticache.aws.upbound.io/v1beta2

Resource MRs required to reproduce the bug

apiVersion: elasticache.aws.upbound.io/v1beta2
kind: ReplicationGroup
metadata:
  name: ***-app
spec:
  deletionPolicy: Delete
  forProvider:
    autoMinorVersionUpgrade: "true"
    automaticFailoverEnabled: false
    description: Elasticache-ReplicationGroup
    engine: redis
    engineVersion: "7.1"
    ipDiscovery: ipv4
    maintenanceWindow: mon:08:00-mon:11:00
    multiAzEnabled: false
    networkType: ipv4
    nodeType: cache.t2.micro
    numCacheClusters: 1
    parameterGroupName: default.redis7
    port: 6379
    region: eu-central-1
    replicasPerNodeGroup: 0
    snapshotWindow: 03:00-04:00
    subnetGroupName: ***-app
    subnetGroupNameRef:
      name: ***-app
    subnetGroupNameSelector:
      matchControllerRef: true
  providerConfigRef:
    name: provider-aws
  writeConnectionSecretToRef:
    name: ***-app-elasticache-connection
    namespace: default
 

Steps to Reproduce

Using the manifest above, create a replication group with all Upbound providers and the AWS family provider at version 1.3.1, then upgrade the ElastiCache provider to 1.6.0.

What happened?

The replication groups went out of sync (the Synced condition became False).

Relevant Error Output Snippet

conditions:
  - lastTransitionTime: "2024-06-18T14:34:01Z"
    message: "update failed: async update failed: failed to update the resource: [{0
      modifying ElastiCache Replication Group (***-app) authentication: InvalidParameterValue:
      Invalid AUTH token provided. Please check valid AUTH token format.\n\tstatus
      code: 400, request id: fdf564b9-8fee-4ada-b54d-737dc0bbb738  []}]"
    reason: ReconcileError
    status: "False"
    type: Synced
  - lastTransitionTime: "2024-06-18T14:30:30Z"
    reason: Available
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-06-18T14:34:01Z"
    message: "async update failed: failed to update the resource: [{0 modifying ElastiCache
      Replication Group (***-app) authentication: InvalidParameterValue: Invalid
      AUTH token provided. Please check valid AUTH token format.\n\tstatus code: 400,
      request id: fdf564b9-8fee-4ada-b54d-737dc0bbb738  []}]"
    reason: AsyncUpdateFailure
    status: "False"
    type: LastAsyncOperation

Crossplane Version

1.15.3

Provider Version

1.6.0

Kubernetes Version

1.29.2

Kubernetes Distribution

EKS

Additional Info

No response

@fola-ooo fola-ooo added bug Something isn't working needs:triage labels Jun 18, 2024
@caiofralmeida

@fola-ooo Version v1.6.1 adds a new field, autoGenerateAuthToken. Explicitly setting it to false might solve this issue.
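
A minimal sketch of that suggestion, assuming the field sits under spec.forProvider (the describe output further down the thread lists it there). The resource name is a placeholder, and nothing here confirms that this setting resolves the error:

```yaml
apiVersion: elasticache.aws.upbound.io/v1beta2
kind: ReplicationGroup
metadata:
  name: example-app            # hypothetical name, not from the issue
spec:
  forProvider:
    region: eu-central-1
    engine: redis
    # Explicitly opt out of auth token generation (available from v1.6.1,
    # per the comment above).
    autoGenerateAuthToken: false
  providerConfigRef:
    name: provider-aws
```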

@dbs-gong

dbs-gong commented Jun 23, 2024

This issue is also apparent on 1.7.0.
It is no longer possible to configure a Redis ElastiCache cluster if user groups are assigned and the auth token is disabled.
After the creation process, the replication group goes out of sync with the following error:

Warning  CannotUpdateExternalResource  2m13s (x46 over 41m)  managed/elasticache.aws.upbound.io/v1beta2, kind=replicationgroup  (combined from similar events): async update failed: failed to update the resource: [{0 modifying ElastiCache Replication Group (new-test-v2) authentication: InvalidParameterCombination: Auth tokens can't be enabled with a user group already associated. Pass RemoveUserGroups to proceed.
           status code: 400, request id: ZZZZ

I have tested all combinations to get this object synced and ready.
Here is my forProvider config:

For Provider:
    Apply Immediately:           false
    At Rest Encryption Enabled:  true
    Auth Token Secret Ref:
      Key:
      Name:
      Namespace:
    Auto Generate Auth Token:    false
    Auto Minor Version Upgrade:  true
    Automatic Failover Enabled:  true
    Description:                 new-test-v2
    Engine:                      redis
    Engine Version:              7.1
    Ip Discovery:                ipv4
    Maintenance Window:          sun:05:00-sun:09:00
    Network Type:                ipv4
    Node Type:                   cache.t4g.micro
    Num Node Groups:             3
    Parameter Group Name:        new-test-v2-parameter-group
    Port:                        6379
    Region:                      eu-west-1
    Replicas Per Node Group:     1
    Security Group Id Refs:
      Name:  <REMOVED>
    Security Group Id Selector:
      Match Controller Ref:  true
    Security Group Ids:
      <REMOVED>
    Snapshot Retention Limit:  0
    Snapshot Window:           00:00-01:00
    Subnet Group Name:         new-test-v2-subnet-group
    Subnet Group Name Ref:
      Name:  new-test-v2-subnet-group
    Subnet Group Name Selector:
      Match Controller Ref:  true
    Tags:
      Crossplane - Kind:            replicationgroup.elasticache.aws.upbound.io
      Crossplane - Name:            new-test-v2
      Crossplane - Providerconfig:  default
    Transit Encryption Enabled:     true
    Transit Encryption Mode:        required
    User Group Ids:
      new-test-v2-user-group
  Init Provider:
    User Group Ids:
      new-test-v2-user-group
  Management Policies:
    *
  Provider Config Ref:
    Name:  default
Status:
  At Provider:
    Apply Immediately:               false
    Arn:                             <REMOVED>
    At Rest Encryption Enabled:      true
    Auto Minor Version Upgrade:      true
    Automatic Failover Enabled:      true
    Cluster Enabled:                 true
    Configuration Endpoint Address:  <REMOVED>
    Data Tiering Enabled:            false
    Description:                     new-test-v2
    Engine:                          redis
    Engine Version:                  7.1
    Engine Version Actual:           7.1.0
    Id:                              new-test-v2
    Ip Discovery:                    ipv4
    Kms Key Id:
    Maintenance Window:              sun:05:00-sun:09:00
    Member Clusters:
      new-test-v2-0001-001
      new-test-v2-0001-002
      new-test-v2-0002-001
      new-test-v2-0002-002
      new-test-v2-0003-001
      new-test-v2-0003-002
    Multi Az Enabled:         false
    Network Type:             ipv4
    Node Type:                cache.t4g.micro
    Num Cache Clusters:       6
    Num Node Groups:          3
    Parameter Group Name:     new-test-v2-parameter-group
    Port:                     6379
    Replicas Per Node Group:  1
    Security Group Ids:
      <REMOVED>
    Snapshot Retention Limit:  0
    Snapshot Window:           00:00-01:00
    Subnet Group Name:         new-test-v2-subnet-group
    Tags:
      Crossplane - Kind:            replicationgroup.elasticache.aws.upbound.io
      Crossplane - Name:            new-test-v2
      Crossplane - Providerconfig:  default
    Tags All:
      Crossplane - Kind:            replicationgroup.elasticache.aws.upbound.io
      Crossplane - Name:            new-test-v2
      Crossplane - Providerconfig:  default
    Transit Encryption Enabled:     true
    Transit Encryption Mode:        required
    User Group Ids:
      new-test-v2-user-group
  Conditions:
    Last Transition Time:  2024-06-23T13:15:26Z
    Reason:                Available
    Status:                True
    Type:                  Ready
    Last Transition Time:  2024-06-23T15:17:53Z
    Message:               update failed: async update failed: failed to update the resource: [{0 modifying ElastiCache Replication Group (new-test-v2) authentication: InvalidParameterCombination: Auth tokens can't be enabled with a user group already associated. Pass RemoveUserGroups to proceed.
                           status code: 400, request id: <REMOVED>  []}]
    Reason:                ReconcileError
    Status:                False
    Type:                  Synced
    Last Transition Time:  2024-06-23T15:17:53Z
    Message:               async update failed: failed to update the resource: [{0 modifying ElastiCache Replication Group (new-test-v2) authentication: InvalidParameterCombination: Auth tokens can't be enabled with a user group already associated. Pass RemoveUserGroups to proceed.

I tried:

  1. not passing autoGenerateAuthToken
  2. passing autoGenerateAuthToken: false
  3. passing an empty reference to Auth Token Secret Ref
  4. not passing any auth parameters
  5. passing the userGroupIds only in initProvider and not in forProvider

All of these attempts ended the same way: after the replication group is created, the reconcile step fails.
This is critical because in this state you cannot change any other setting of the replication group, such as scaling or upgrading the version.

Please help.

@WolfGanGeRTech

Facing the same issue.

@mbbush
Collaborator

mbbush commented Aug 9, 2024

I believe this is caused by hashicorp/terraform-provider-aws#38209

@mbbush
Collaborator

mbbush commented Aug 9, 2024

I reproduced this issue on v1.11.0.

At least for my use case, I can work around this by setting spec.initProvider.authTokenUpdateStrategy to "". I would expect setting the same value in spec.forProvider would also work.

Can some of the affected users with different configurations try setting that parameter and see if it helps?

What's going on is that the terraform provider introduced a breaking change when they added the authTokenUpdateStrategy with a default value of "ROTATE", because that created a diff on existing resources to try to update the parameter value from "" to "ROTATE". Because of that diff, the terraform provider tries to set the auth token, which fails, because there's no auth token. Explicitly setting authTokenUpdateStrategy overrides the default in the terraform provider.
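
As a rough sketch of the workaround described above: only the authTokenUpdateStrategy line is the actual change, while the resource name and the remaining fields are placeholders standing in for an existing manifest.

```yaml
apiVersion: elasticache.aws.upbound.io/v1beta2
kind: ReplicationGroup
metadata:
  name: example-app            # hypothetical name, not from the issue
spec:
  initProvider:
    # An explicit empty string overrides the "ROTATE" default, so no diff
    # from "" to "ROTATE" is produced and no auth token update is attempted.
    authTokenUpdateStrategy: ""
  forProvider:
    region: eu-central-1
    engine: redis
    # ...remaining forProvider fields unchanged...
  providerConfigRef:
    name: provider-aws
```

Whether the same value under spec.forProvider behaves identically is expected but was not tested here.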

@WolfGanGeRTech

Hello @mbbush ,

Thanks for looking into this issue.

That configuration does allow importing existing ElastiCache clusters without setting the auth token. However, adding the initProvider config to the composition breaks the creation of new ElastiCache clusters: they will always be created without an auth token, meaning without credentials.

Ideally we should be able to:

  1. Create new ElastiCache clusters with credentials by setting the auth token.
  2. Import existing ElastiCache clusters without needing to set the auth token (for instance, those created using autoGenerateAuthToken: true).


github-actions bot commented Nov 8, 2024

This provider repo does not have enough maintainers to address every issue. Since there has been no activity in the last 90 days it is now marked as stale. It will be closed in 14 days if no further activity occurs. Leaving a comment starting with /fresh will mark this issue as not stale.

@github-actions github-actions bot added the stale label Nov 8, 2024
@alexinthesky
Contributor

alexinthesky commented Nov 20, 2024

> I reproduced this issue on v1.11.0.
>
> At least for my use case, I can work around this by setting spec.initProvider.authTokenUpdateStrategy to "". I would expect setting the same value in spec.forProvider would also work.
>
> Can some of the affected users with different configurations try setting that parameter and see if it helps?
>
> What's going on is that the terraform provider introduced a breaking change when they added the authTokenUpdateStrategy with a default value of "ROTATE", because that created a diff on existing resources to try to update the parameter value from "" to "ROTATE". Because of that diff, the terraform provider tries to set the auth token, which fails, because there's no auth token. Explicitly setting authTokenUpdateStrategy overrides the default in the terraform provider.

Hi, I'm trying to implement your workaround. It works fine when directly applying or editing the ReplicationGroup resource,
but I can't get it to work within a composition, because Crossplane entirely removes keys whose values are empty. Any ideas?

@github-actions github-actions bot removed the stale label Nov 21, 2024