Add support for msk connect resources #1162

mbbush · 2024-02-16T05:00:53Z

Description of your changes

Adds support for the three MSK Connect resources: Connector, CustomPlugin, and WorkerConfiguration.

Note: AWS has very limited API support for these resources. Other than the scaling object, every field in the Terraform schema has ForceNew: true, so from Crossplane's perspective the resources are mostly immutable once created. If you want to make a change, you have to delete the resource and create a new one.
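A minimal sketch of what ForceNew on nearly every field means in practice. It assumes a simplified model where a spec change is a list of changed top-level field names and the scaling block (called `capacity` here) is the only field that can be updated in place. The field names and the `requiresReplacement` helper are hypothetical and do not come from the provider's code.

```go
package main

import "fmt"

// updatableFields lists top-level fields that can change without replacement.
// Assumption: per the note above, only the scaling block (modeled here as
// "capacity") lacks ForceNew; every other field forces a new resource.
var updatableFields = map[string]bool{
	"capacity": true,
}

// requiresReplacement reports whether applying the changed fields would force
// the external resource to be deleted and recreated.
func requiresReplacement(changed []string) bool {
	for _, f := range changed {
		if !updatableFields[f] {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(requiresReplacement([]string{"capacity"}))                           // false
	fmt.Println(requiresReplacement([]string{"capacity", "connectorConfiguration"})) // true
}
```

Any change outside the scaling block therefore cannot be reconciled in place; the managed resource has to be deleted and recreated.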

The WorkerConfiguration resource also has no delete operation in the AWS API, so once you've created a configuration with a given name, it exists permanently.

There is some surprising lifecycle behavior around the Connector resource that I'm not sure can be resolved.

The AWS API exposes the state of a connector, which is one of CREATING, RUNNING, UPDATING, DELETING, or FAILED. The Terraform provider uses this state to determine whether a create call succeeded, but it does not expose the state as an attribute of the resource, so I can't think of a way to get it into Crossplane.

When I create a broken Connector resource (as in the connector-nokafka.yaml example, where the Kafka cluster doesn't exist), the create call runs for just over 20 minutes (I had to increase the timeout slightly beyond the 20 minutes configured in the Terraform provider), then fails and sets the following conditions:

  - lastTransitionTime: "2024-02-17T23:47:54Z"
    reason: ReconcileSuccess
    status: "True"
    type: Synced
  - lastTransitionTime: "2024-02-17T23:47:54Z"
    reason: Creating
    status: "False"
    type: Ready
  - lastTransitionTime: "2024-02-18T00:08:23Z"
    message: 'async create failed: failed to create the resource: [{0 waiting for
      MSK Connect Connector (arn:aws:kafkaconnect:us-east-2:905418119848:connector/connector-broken/4f63f714-30c2-4dd7-bf0c-edced7778d4e-2)
      create: unexpected state ''FAILED'', wanted target ''RUNNING''. last error:
      UnknownError.Unknown: The last operation failed. Retry the operation.  []}]'
    reason: AsyncCreateFailure
    status: "False"
    type: LastAsyncOperation

The provider then immediately runs an Observe, which finds that the connector does exist and that its attributes match the spec (since the state is not exposed in the Terraform schema). It therefore concludes that creation succeeded after all, fills in status.atProvider, and sets the following conditions:

  - lastTransitionTime: "2024-02-17T23:47:54Z"
    reason: ReconcileSuccess
    status: "True"
    type: Synced
  - lastTransitionTime: "2024-02-18T00:08:25Z"
    reason: Available
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-02-18T00:08:25Z"
    reason: Success
    status: "True"
    type: LastAsyncOperation
  - lastTransitionTime: "2024-02-18T00:08:25Z"
    reason: UpToDate
    status: "True"
    type: Test
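A minimal sketch of why the Observe step reports the connector as up to date. It assumes a heavily reduced, hypothetical view of the Terraform schema (`connectorParams`) and a simplified `observe` function; neither is the provider's or upjet's actual code. Because the schema carries no state field, a FAILED connector whose visible attributes match the spec is indistinguishable from a healthy one.

```go
package main

import "fmt"

// connectorParams is a hypothetical, reduced view of the connector's schema
// attributes. There is deliberately no State field, mirroring the fact that
// the Terraform schema does not expose the AWS connector state.
type connectorParams struct {
	Name         string
	KafkaCluster string
	McuCount     int
}

// observation is what this simplified Observe step reports back.
type observation struct {
	Exists   bool
	UpToDate bool
}

// observe treats the resource as existing and up to date whenever the
// external object is found and its visible attributes equal the spec.
func observe(desired connectorParams, actual *connectorParams) observation {
	if actual == nil {
		return observation{Exists: false}
	}
	return observation{Exists: true, UpToDate: desired == *actual}
}

func main() {
	desired := connectorParams{Name: "connector-broken", KafkaCluster: "does-not-exist", McuCount: 1}
	// The connector object exists in AWS (in state FAILED), and every
	// attribute visible in the schema matches the spec.
	actual := desired
	fmt.Printf("%+v\n", observe(desired, &actual)) // {Exists:true UpToDate:true}
}
```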

I'm not sure how best to reconcile this behavior of the AWS resource and the Terraform provider with what Crossplane expects.

Fixes #374
Fixes #1146

I have:

Run make reviewable test to ensure this PR is ready for review.

How has this code been tested

Manually, in kind, using the manifests I committed. I was able to get the connector into the RUNNING state. It did nothing, because there was no traffic and no actual working code to run, but the infrastructure was all healthy.

I'll run uptest on the manifests that don't require manual intervention.

Output from local run with the connector.yaml example:

NAME                                       READY   SYNCED   EXTERNAL-NAME   AGE
group.cloudwatchlogs.aws.upbound.io/test   True    True     op-vggf5w1o     154m

NAME                                           READY   SYNCED   EXTERNAL-NAME       AGE
securitygrouprule.ec2.aws.upbound.io/egress    True    True     sgrule-2572981504   154m
securitygrouprule.ec2.aws.upbound.io/ingress   True    True     sgrule-2191228865   154m

NAME                                  READY   SYNCED   EXTERNAL-NAME          AGE
securitygroup.ec2.aws.upbound.io/sg   True    True     sg-0b03fcec08eeca5d9   154m

NAME                                   READY   SYNCED   EXTERNAL-NAME              AGE
subnet.ec2.aws.upbound.io/subnet-az1   True    True     subnet-04e98dd98ea750f9f   154m
subnet.ec2.aws.upbound.io/subnet-az2   True    True     subnet-040975500b81b55f1   154m

NAME                         READY   SYNCED   EXTERNAL-NAME           AGE
vpc.ec2.aws.upbound.io/vpc   True    True     vpc-0319cb8653512d954   154m

NAME                                               READY   SYNCED   EXTERNAL-NAME                                                             AGE
deliverystream.firehose.aws.upbound.io/connector   True    True     arn:aws:firehose:us-east-2:226209437353:deliverystream/example-absdfsdf   154m

NAME                                READY   SYNCED   EXTERNAL-NAME   AGE
role.iam.aws.upbound.io/connector   True    True     connector       154m

NAME                                   READY   SYNCED   EXTERNAL-NAME                                                                                 AGE
cluster.kafka.aws.upbound.io/example   True    True     arn:aws:kafka:us-east-2:226209437353:cluster/example/d9f0e042-f520-4e34-a6d6-1ac5fce3e82d-4   154m

NAME                                         READY   SYNCED   EXTERNAL-NAME                                                                                       AGE
configuration.kafka.aws.upbound.io/example   True    True     arn:aws:kafka:us-east-2:226209437353:configuration/example/eae7df28-446c-4c93-9ecb-ab0737de0e01-4   154m

NAME                                                     READY   SYNCED   EXTERNAL-NAME                            AGE
connector.kafkaconnect.aws.upbound.io/connector          True    True     c6c84280-7796-42cc-9f4b-42607692741a-2   29m

NAME                                               READY   SYNCED   EXTERNAL-NAME                            AGE
customplugin.kafkaconnect.aws.upbound.io/example   True    True     29fec958-2694-4251-9a62-b2f7de64f7fc-2   154m

NAME                                                        READY   SYNCED   EXTERNAL-NAME                            AGE
workerconfiguration.kafkaconnect.aws.upbound.io/connector   True    True     2d5604c3-ec17-4394-be2f-8b07131963c3-2   154m

NAME                                 READY   SYNCED   EXTERNAL-NAME   AGE
bucket.s3.aws.upbound.io/connector   True    True     op-bonq5b6d     154m

NAME                               READY   SYNCED   EXTERNAL-NAME   AGE
object.s3.aws.upbound.io/example   True    True     empty.zip       154m

mbbush · 2024-02-16T07:18:42Z

/test-examples="examples/kafkaconnect/v1beta1/customplugin.yaml"

mbbush · 2024-02-16T07:18:51Z

/test-examples="examples/kafkaconnect/v1beta1/workerconfiguration.yaml"

mbbush · 2024-02-16T07:19:36Z

/test-examples="examples/kafkaconnect/v1beta1/connector-broken.yaml"

mbbush · 2024-02-16T07:54:02Z

/test-examples="examples/kafkaconnect/v1beta1/connector-broken.yaml"

The previous run actually failed because the AWS account hit its VPC limit. It would be great if someone from Upbound could request an AWS quota increase for the number of VPCs per region.

mbbush · 2024-03-01T18:28:31Z

/test-examples="examples/kafkaconnect/v1beta1/connector-broken.yaml"

mbbush · 2024-03-01T19:27:15Z

The previous run failed with

2024-03-01T19:10:45.0295557Z     logger.go:42: 19:10:44 | case/0-apply |       message: 'connect failed: cannot initialize the Terraform plugin SDK async external
2024-03-01T19:10:45.0296976Z     logger.go:42: 19:10:44 | case/0-apply |         client: cannot get terraform setup: cannot get referenced Provider: default:
2024-03-01T19:10:45.0298532Z     logger.go:42: 19:10:44 | case/0-apply |         Timeout: failed waiting for *v1beta1.ProviderConfig Informer to sync'

I haven't seen this particular failure before, but it doesn't seem at all related to this PR.

Maybe it will pass a second time?

/test-examples="examples/kafkaconnect/v1beta1/connector-broken.yaml"

mbbush · 2024-03-02T01:18:04Z

The previous failure was caused by upbound/universal-crossplane#442. Pinning the UXP version in the Makefile resolved the issue.

/test-examples="examples/kafkaconnect/v1beta1/connector-nokafka.yaml"

mbbush · 2024-03-02T04:22:21Z

Uptest is failing because of too many VPCs. This is a problem in the upbound AWS account, which I don't have access to.

@jeanduplessis Could you please take a look at why cloud-nuke doesn't seem to be cleaning them up? Requesting a limit increase from AWS on the number of VPCs per region would also be very helpful.

jeanduplessis · 2024-03-04T17:18:39Z

/test-examples="examples/kafkaconnect/v1beta1/connector-nokafka.yaml"

jeanduplessis · 2024-03-04T17:22:02Z

@mbbush it should be fine again now. Ping me directly if the uptest run fails again with that issue.

mbbush · 2024-03-05T00:08:26Z

/test-examples="examples/kafkaconnect/v1beta1/connector-nokafka.yaml"

Makefile

mbbush · 2024-03-06T18:25:17Z

I believe this is ready for review. @turkenf do you think you could take a look before the next provider-aws release?

I don't see any path forward to improving the provider's behavior when creating a Connector leaves the connector in the FAILED state in AWS. The state is not exposed by the Terraform provider, so the only way I could query it would be to extract the AWS client from the provider and make an explicit API call.

I tried that, and was able to extract the AWS client, but it was wrapped in a different type, and I couldn't figure out how to get at the underlying AWS SDK client (it's probably possible; I just don't know how). Regardless of whether it's possible, I'm not sure it's wise.

My preference would be to merge this as-is and release it, with the known issue that the provider does not surface the connector's error states. If there's consensus that it is wise to add the connector's state field to the schema by invoking the SDK client directly, I can work on that for a future release, since it would be a non-breaking change.

Nothing significant has changed, so I'm confident the e2e tests will still pass. I'll rerun the ones that can be tested automatically, but the full example, which requires manual intervention, is kind of a pain to run and takes a very long time, so I'd rather only do it one more time after this is otherwise approved.

mbbush · 2024-03-06T18:25:36Z

/test-examples="examples/kafkaconnect/v1beta1/customplugin.yaml"

mbbush · 2024-03-11T02:14:38Z

/test-examples="examples/kafkaconnect/v1beta1/connector-nokafka.yaml"

jeanduplessis · 2024-03-14T18:02:52Z

/test-examples="examples/kafkaconnect/v1beta1/connector-nokafka.yaml"

turkenf

Thank you for your effort in this PR @mbbush, I left a few initial review comments for you to consider.

config/kafkaconnect/config.go

examples/kafkaconnect/v1beta1/connector-nokafka.yaml

examples/kafkaconnect/v1beta1/connector.yaml

turkenf

Thank you @mbbush, we can merge after resolving the conflicts.

Signed-off-by: Matt Bush <[email protected]>

Signed-off-by: Matt Bush <[email protected]> Co-authored-by: Fatih Türken <[email protected]>

mbbush · 2024-03-28T16:36:11Z

@turkenf Rebased and resolved conflicts. It's not clear to me if you've already started the process of releasing v1.3.0. It would be great if this made it in, but it's not a huge problem for me if it doesn't.

Signed-off-by: Matt Bush <[email protected]>

turkenf · 2024-03-28T16:55:38Z

@mbbush, we will include this PR.

mbbush · 2024-03-28T16:58:53Z

Thanks! I'm rebasing/committing from the train on my way in to the office, so I didn't have the bandwidth to run make generate locally. But I applied the diff from the failed check-diff CI job using patch, so it should pass now.

turkenf · 2024-03-28T17:06:34Z

/test-examples="examples/ec2/v1beta1/vpc.yaml"

Uptest run: https://github.com/crossplane-contrib/provider-upjet-aws/actions/runs/8470993317

mbbush requested review from ulucinar, sergenyalcin and turkenf as code owners February 16, 2024 05:00

mbbush force-pushed the msk-connect branch from a841323 to e02c6e2 February 16, 2024 05:10

mbbush force-pushed the msk-connect branch from e02c6e2 to 47271ae March 2, 2024 01:15

turkenf reviewed Mar 5, 2024

Makefile (outdated, resolved)

mbbush force-pushed the msk-connect branch from 201cd02 to e5f6bd2 March 6, 2024 18:13

mbbush force-pushed the msk-connect branch from e5f6bd2 to 0a0b6a2 March 14, 2024 01:16

turkenf reviewed Mar 14, 2024

mbbush force-pushed the msk-connect branch 2 times, most recently from 4e6dfb0 to 7a22d4a March 16, 2024 06:42

mbbush requested a review from turkenf March 18, 2024 18:02

turkenf approved these changes Mar 28, 2024

mbbush added 2 commits March 28, 2024 09:31

add msk connect resources with normalized external names

0122793

Signed-off-by: Matt Bush <[email protected]>

Fix idiosynchrasies of msk connect resources

9fcd787

Signed-off-by: Matt Bush <[email protected]>

mbbush and others added 5 commits March 28, 2024 09:31

examples for msk connect resources

6b986a8

Signed-off-by: Matt Bush <[email protected]>

reduce timeout on connector-nokafka example

c5d4e6c

Signed-off-by: Matt Bush <[email protected]>

clean up examples

55a50c7

Signed-off-by: Matt Bush <[email protected]>

Document reason for timeout

600e04d

Signed-off-by: Matt Bush <[email protected]>

Fix example id

06e7f3c

Signed-off-by: Matt Bush <[email protected]> Co-authored-by: Fatih Türken <[email protected]>

mbbush force-pushed the msk-connect branch from 7a22d4a to c0ee039 March 28, 2024 16:34

codegen

7387a5f

Signed-off-by: Matt Bush <[email protected]>

mbbush force-pushed the msk-connect branch from c0ee039 to 7387a5f March 28, 2024 16:50

turkenf merged commit 9c77960 into crossplane-contrib:main Mar 28, 2024
12 checks passed
