Add reference for Configuration to kafka cluster #877

Merged

Conversation

@mbbush (Collaborator) commented Sep 13, 2023

Description of your changes

Partially Fixes #489

It seems that creating a kafka cluster sometimes fails, with an error related to the configuration, as described in #489. One workaround we've implemented is to always define both the Configuration.kafka and the Cluster.kafka in the same composition, and patch from the Configuration to the composite, then from the composite to the Cluster with fromFieldPath: Required. This is cumbersome, but it works.
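Roughly, that workaround looks like the following composition fragment (a minimal sketch: the composite status fields kafkaConfigurationArn and kafkaConfigurationRevision are made up for illustration and would have to exist in the XRD, and the resource specs are omitted):

    resources:
      - name: kafka-configuration
        base:
          apiVersion: kafka.aws.upbound.io/v1beta1
          kind: Configuration
          # forProvider spec omitted
        patches:
          # Copy the created Configuration's ARN and latest revision onto the composite.
          - type: ToCompositeFieldPath
            fromFieldPath: status.atProvider.arn
            toFieldPath: status.kafkaConfigurationArn
          - type: ToCompositeFieldPath
            fromFieldPath: status.atProvider.latestRevision
            toFieldPath: status.kafkaConfigurationRevision
      - name: kafka-cluster
        base:
          apiVersion: kafka.aws.upbound.io/v1beta1
          kind: Cluster
          # forProvider spec omitted
        patches:
          # The Required policy keeps the Cluster from being composed until both
          # values have been populated on the composite.
          - type: FromCompositeFieldPath
            fromFieldPath: status.kafkaConfigurationArn
            toFieldPath: spec.forProvider.configurationInfo[0].arn
            policy:
              fromFieldPath: Required
          - type: FromCompositeFieldPath
            fromFieldPath: status.kafkaConfigurationRevision
            toFieldPath: spec.forProvider.configurationInfo[0].revision
            policy:
              fromFieldPath: Required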

This seems like potentially a better solution, allowing the use of a selector to get essentially the same behavior, but the cost is that it requires you to specify the same selector twice. What I really wanted to do was to be able to extract both the ARN and the revision from the same object, but it seems like the output of the ExtractValueFn must always be a string, and make generate panics if I have more than one selector with the same name, so I couldn't see a way to do it.
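With this reference in place, the Cluster example can select the Configuration with something along these lines (a sketch using upjet's generated Ref/Selector field naming, not copied from the merged example file):

    apiVersion: kafka.aws.upbound.io/v1beta1
    kind: Cluster
    metadata:
      name: example
    spec:
      forProvider:
        configurationInfo:
          - arnSelector:
              matchLabels:
                testing.upbound.io/example-name: example
            # The same selector has to be repeated to resolve the revision.
            revisionSelector:
              matchLabels:
                testing.upbound.io/example-name: example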

I have:

  • Run make reviewable test to ensure this PR is ready for review.

How has this code been tested

make uptest UPTEST_EXAMPLE_LIST='examples/kafka/cluster.yaml'
--- PASS: kuttl (2680.69s)
    --- PASS: kuttl/harness (0.00s)
        --- PASS: kuttl/harness/case (2656.51s)
PASS
13:38:27 [ OK ] running automated tests

@mbbush force-pushed the matt/kafka-configuration-ref-4.67 branch from e620a35 to 5db43cd on September 13, 2023 01:36
@mbbush marked this pull request as ready for review on September 13, 2023 01:37
@turkenf (Collaborator) commented Sep 13, 2023

/test-examples="examples/kafka/cluster.yaml"

turkenf (Collaborator) commented on the new reference configuration:

    r.References["configuration_info.revision"] = config.Reference{
        Type:      "Configuration",
        Extractor: "GetConfigurationRevision()",
    }

Suggested change:

    -   Extractor: "GetConfigurationRevision()",
    +   Extractor: `github.com/upbound/upjet/pkg/resource.ExtractParamPath("latest_revision",true)`,

We have a function for such situations; could you please try the configuration above?

mbbush (Author) replied:

Yes! I thought I remembered seeing something like that, and looked for it, but couldn't find it. This is much better.

@turkenf (Collaborator) commented Sep 13, 2023

Since it takes a long time to create and delete this resource, can you add a 2-hour timeout (as in the linked example) to extend the uptest run time?
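(That is, the example manifest would carry an annotation along these lines; the annotation key is assumed here from uptest's convention, with the value in seconds:)

    apiVersion: kafka.aws.upbound.io/v1beta1
    kind: Cluster
    metadata:
      name: example
      annotations:
        # Assumed uptest annotation: allow up to 2 hours for the Cluster to
        # become Ready (and later be deleted) before the test times out.
        uptest.upbound.io/timeout: "7200"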

@mbbush (Collaborator, Author) commented Sep 14, 2023

I updated the example to include the 2-hour timeout (oof) and also fixed a schema error. I don't know why configurationInfo is an array; probably something about the way it's set up in Terraform? It certainly seems like it would make more sense as an object in YAML, but I guess changing it would be a breaking schema change to a v1beta1 resource, which probably isn't worth it.

For some reason when I try to run the e2e test locally I get this error:

    case.go:363: DeliveryStream.firehose.aws.upbound.io "test-stream" is invalid: spec: Invalid value: "object": no such key: initProvider evaluating rule: name is a required parameter

Any idea why that might be?

@turkenf (Collaborator) commented Sep 14, 2023

Any idea why that might be?

Can you use the example linked here for the DeliveryStream.firehose resource? The example in the Cluster.kafka resource has not been updated and contains deprecated parameters.

@mbbush (Collaborator, Author) commented Sep 14, 2023

OK, that's fixed. Now the problem is that the Cluster.kafka is hitting a reconcile error:

    Last Transition Time:  2023-09-14T21:20:46Z
    Message:               cannot resolve references: mg.Spec.ForProvider.ConfigurationInfo[i3].Revision: referenced field was empty (referenced resource may not yet be ready)
    Reason:                ReconcileError
    Status:                False
    Type:                  Synced

even though the Configuration is Ready:


kubectl get -o yaml configuration.kafka
apiVersion: v1
items:
- apiVersion: kafka.aws.upbound.io/v1beta1
  kind: Configuration
  metadata:
    annotations:
      crossplane.io/external-create-succeeded: "2023-09-14T21:18:28Z"
      crossplane.io/external-name: arn:aws:kafka:us-west-1:REDACTED:configuration/example/c8c1a33a-54f0-44ef-8a5c-a99ce0f33319-2
      upjet.crossplane.io/provider-meta: "null"
      upjet.upbound.io/test: "true"
    creationTimestamp: "2023-09-14T21:18:28Z"
    finalizers:
    - finalizer.managedresource.crossplane.io
    generation: 2
    labels:
      testing.upbound.io/example-name: example
    name: example
    resourceVersion: "153241"
    uid: aa1f90fe-51b0-46d3-b9d1-51c91490bbb7
  spec:
    deletionPolicy: Delete
    forProvider:
      kafkaVersions:
      - 2.6.0
      name: example
      region: us-west-1
      serverProperties: |
        auto.create.topics.enable = true
        delete.topic.enable = true
    initProvider: {}
    managementPolicies:
    - '*'
    providerConfigRef:
      name: default
  status:
    atProvider:
      arn: arn:aws:kafka:us-west-1:REDACTED:configuration/example/c8c1a33a-54f0-44ef-8a5c-a99ce0f33319-2
      description: ""
      id: arn:aws:kafka:us-west-1:REDACTED:configuration/example/c8c1a33a-54f0-44ef-8a5c-a99ce0f33319-2
      kafkaVersions:
      - 2.6.0
      latestRevision: 1
      name: example
      serverProperties: |
        auto.create.topics.enable = true
        delete.topic.enable = true
    conditions:
    - lastTransitionTime: "2023-09-14T21:18:34Z"
      reason: Available
      status: "True"
      type: Ready
    - lastTransitionTime: "2023-09-14T21:18:29Z"
      reason: ReconcileSuccess
      status: "True"
      type: Synced
    - lastTransitionTime: "2023-09-14T21:18:31Z"
      reason: Success
      status: "True"
      type: LastAsyncOperation
    - lastTransitionTime: "2023-09-14T21:18:31Z"
      reason: Finished
      status: "True"
      type: AsyncOperation
    - lastTransitionTime: "2023-09-14T21:18:45Z"
      reason: UpToDate
      status: "True"
      type: Test
kind: List
metadata:
  resourceVersion: ""

@mbbush (Collaborator, Author) commented Sep 14, 2023

Oh, I bet the problem is that status.atProvider.latestRevision is a number, not a string.

@turkenf (Collaborator) commented Sep 15, 2023

Then we could consider dropping the reference definition for the revision parameter and passing the value directly as revision: 1. What do you say?
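(In other words, keep the reference for the ARN and pass the revision inline, roughly as in this sketch; per the status output above, a newly created Configuration reports latestRevision: 1:)

    spec:
      forProvider:
        configurationInfo:
          - arnSelector:
              matchLabels:
                testing.upbound.io/example-name: example
            # Hard-coded: the latestRevision of a freshly created Configuration is 1.
            revision: 1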

@mbbush (Collaborator, Author) commented Sep 15, 2023

That will make the test pass. I already tried that locally and it worked. It does make the reference substantially less useful, but I suppose it's still better than nothing?

How deeply baked into the code for resolving references is the requirement that the resolved value must be a string? What I'd really like to be able to do is use a Resolver that returns an object, which I can customize. Something like (pseudocode)

def resolveConfiguration(mr: xp.ManagedResource) = {
  return {
    "arn": mr.status.atProvider.arn,
    "revision": mr.status.atProvider.latestRevision
  }
}

This seems useful in general as a flexible (ideally user-definable) way to define arbitrary dependencies between managed resources. Are you aware of any design discussions about possibly supporting this, or about why it would be prohibitively difficult?

@turkenf (Collaborator) commented Sep 20, 2023

/test-examples="examples/kafka/cluster.yaml"

@turkenf (Collaborator) commented Sep 21, 2023

For checking uptest logs: crossplane-contrib/provider-upjet-gcp#391 (comment)

@mbbush (Collaborator, Author) commented Sep 21, 2023

As far as I can tell, the test failed because one or more of the subnets didn't become ready.

2023-09-20T08:29:51Z	DEBUG	provider-aws	apply async ended	{"workspace": "/tmp/88afe8fa-8857-45c1-883e-737f847b19d4", "out": "{\"@level\":\"info\",\"@message\":\"Terraform 1.5.5\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-09-20T08:29:50.081714Z\",\"terraform\":\"1.5.5\",\"type\":\"version\",\"ui\":\"1.1\"}\n{\"@level\":\"info\",\"@message\":\"aws_subnet.subnet-az2: Plan to create\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-09-20T08:29:51.314176Z\",\"change\":{\"resource\":{\"addr\":\"aws_subnet.subnet-az2\",\"module\":\"\",\"resource\":\"aws_subnet.subnet-az2\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_subnet\",\"resource_name\":\"subnet-az2\",\"resource_key\":null},\"action\":\"create\"},\"type\":\"planned_change\"}\n{\"@level\":\"info\",\"@message\":\"Plan: 1 to add, 0 to change, 0 to destroy.\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-09-20T08:29:51.314247Z\",\"changes\":{\"add\":1,\"change\":0,\"import\":0,\"remove\":0,\"operation\":\"plan\"},\"type\":\"change_summary\"}\n{\"@level\":\"info\",\"@message\":\"aws_subnet.subnet-az2: Creating...\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-09-20T08:29:51.800935Z\",\"hook\":{\"resource\":{\"addr\":\"aws_subnet.subnet-az2\",\"module\":\"\",\"resource\":\"aws_subnet.subnet-az2\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_subnet\",\"resource_name\":\"subnet-az2\",\"resource_key\":null},\"action\":\"create\"},\"type\":\"apply_start\"}\n{\"@level\":\"info\",\"@message\":\"aws_subnet.subnet-az2: Creation errored after 0s\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-09-20T08:29:51.946235Z\",\"hook\":{\"resource\":{\"addr\":\"aws_subnet.subnet-az2\",\"module\":\"\",\"resource\":\"aws_subnet.subnet-az2\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_subnet\",\"resource_name\":\"subnet-az2\",\"resource_key\":null},\"action\":\"create\",\"elapsed_seconds\":0},\"type\":\"apply_errored\"}\n{\"@level\":\"error\",\"@message\":\"Error: creating EC2 Subnet: InvalidParameterValue: Value (REDACTEDc) for parameter availabilityZone is invalid. Subnets can currently only be created in the following availability zones: REDACTEDa, REDACTEDb.\\n\\tstatus code: 400, request id: c748bd40-b8df-4902-89ee-8ac002585416\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-09-20T08:29:51.947437Z\",\"diagnostic\":{\"severity\":\"error\",\"summary\":\"creating EC2 Subnet: InvalidParameterValue: Value (REDACTEDc) for parameter availabilityZone is invalid. 
Subnets can currently only be created in the following availability zones: REDACTEDa, REDACTEDb.\\n\\tstatus code: 400, request id: c748bd40-b8df-4902-89ee-8ac002585416\",\"detail\":\"\",\"address\":\"aws_subnet.subnet-az2\",\"range\":{\"filename\":\"main.tf.json\",\"start\":{\"line\":1,\"column\":443,\"byte\":442},\"end\":{\"line\":1,\"column\":444,\"byte\":443}},\"snippet\":{\"context\":\"resource.aws_subnet.subnet-az2\",\"code\":\"{\\\"provider\\\":{\\\"aws\\\":{\\\"access_key\\\":\\\"REDACTED\\\",\\\"region\\\":\\\"REDACTED\\\",\\\"secret_key\\\":\\\"REDACTED\\\",\\\"token\\\":\\\"\\\"}},\\\"resource\\\":{\\\"aws_subnet\\\":{\\\"subnet-az2\\\":{\\\"availability_zone\\\":\\\"REDACTEDc\\\",\\\"cidr_block\\\":\\\"192.168.1.0/24\\\",\\\"lifecycle\\\":{\\\"prevent_destroy\\\":true},\\\"tags\\\":{\\\"crossplane-kind\\\":\\\"subnet.ec2.aws.upbound.io\\\",\\\"crossplane-name\\\":\\\"subnet-az2\\\",\\\"crossplane-providerconfig\\\":\\\"default\\\"},\\\"vpc_id\\\":\\\"vpc-071bb13dcb266c3d4\\\"}}},\\\"terraform\\\":{\\\"required_providers\\\":{\\\"aws\\\":{\\\"source\\\":\\\"hashicorp/aws\\\",\\\"version\\\":\\\"4.67.0\\\"}}}}\",\"start_line\":1,\"highlight_start_offset\":442,\"highlight_end_offset\":443,\"values\":[]}},\"type\":\"diagnostic\"}\n"}

Maybe for some reason your AWS account is only allowed to use the us-west-1a and us-west-1b AZs, and not us-west-1c? Or maybe AWS was having a problem in us-west-1c yesterday?

I ran the test twice locally on my laptop. The first time, after it had requested creation of the cluster, one of the AWS API calls failed with a network error (probably because I was switching between Wi-Fi and Ethernet), which caused the provider to taint the Kafka cluster in Terraform. The cluster was created in AWS, but it never became ready in Crossplane until I logged into the provider pod and manually ran terraform untaint on it. Then it worked fine.

I reran the test a second time and left my computer plugged into Ethernet the whole time. The test passed without intervention, and those are the results I posted above.

@turkenf (Collaborator) commented Sep 21, 2023

Maybe for some reason your AWS account is only allowed to use the us-west-1a and us-west-1b AZs, and not us-west-1c? Or maybe AWS was having a problem in us-west-1c yesterday?

Yes, it is. If this run also fails for the same reason, you can use us-west-1a and us-west-1b; I will trigger the test now.

@turkenf (Collaborator) commented Sep 21, 2023

/test-examples="examples/kafka/cluster.yaml"

@mbbush (Collaborator, Author) commented Sep 21, 2023

I moved the whole example to us-east-2 and switched to availability zones a and b just before you triggered the test, so 🤞 it will pass this time.

@turkenf (Collaborator) left a review:

Thanks so much for your work in this PR @mbbush, LGTM.

@turkenf merged commit 1228da6 into crossplane-contrib:main on Sep 22, 2023
8 checks passed

Successfully merging this pull request may close these issues.

kafka: Cluster resource is not able to get to the Ready state