[Bug]: Resources gets stuck creating forever with 1.8.0 #611

Closed
1 task done
DMarby opened this issue Sep 6, 2024 · 6 comments · Fixed by #614
Labels
bug Something isn't working impact:high is:triaged

Comments

@DMarby

DMarby commented Sep 6, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Affected Resource(s)

  • cloudplatform.gcp.upbound.io/v1beta1 - Project, ProjectService, seemingly all resources

Resource MRs required to reproduce the bug

No response

Steps to Reproduce

  • Install Crossplane 1.16.0 and version 1.8.0 of the gcp provider
  • Try to create a CloudPlatform ProjectService or other resource

What happened?

After upgrading from 1.7.0 to 1.8.0, new resources sometimes no longer get created. They get stuck in an infinite loop of "Successfully requested creation of external resource" and "Waiting for external resource existence to be confirmed".
After downgrading the provider to 1.7.0, everything functions again.
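
To spot the loop described above, the sketch below takes a list of event messages for one managed resource and flags the resource as probably stuck when both messages keep repeating. The function name `looks_stuck` and the `min_repeats` threshold are assumptions made for illustration; collecting the events from the cluster is left out.

```python
from collections import Counter

# The two event messages quoted in the report above.
REQUESTED = "Successfully requested creation of external resource"
WAITING = "Waiting for external resource existence to be confirmed"


def looks_stuck(messages, min_repeats=3):
    """Return True when both messages appear at least `min_repeats` times.

    `min_repeats` is an arbitrary threshold chosen for this sketch,
    not a value taken from Crossplane or the provider.
    """
    counts = Counter(messages)
    return counts[REQUESTED] >= min_repeats and counts[WAITING] >= min_repeats


if __name__ == "__main__":
    # Hypothetical event history for a single resource.
    history = [REQUESTED, WAITING] * 4
    print(looks_stuck(history))
```

A resource that shows each message only once or twice may simply be slow to create, so the threshold is a judgment call.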

Relevant Error Output Snippet

No response

Crossplane Version

1.16.0

Provider Version

1.8.0

Kubernetes Version

1.29

Kubernetes Distribution

GKE

Additional Info

No response

@DMarby DMarby added bug Something isn't working needs:triage labels Sep 6, 2024
@DMarby DMarby changed the title [Bug]: Project resources never gets created with 1.8.0 [Bug]: Resources sometimes gets stuck creating forever with 1.8.0 Sep 6, 2024
@DMarby DMarby changed the title [Bug]: Resources sometimes gets stuck creating forever with 1.8.0 [Bug]: Resources gets stuck creating forever with 1.8.0 Sep 6, 2024
@turkenf
Collaborator

turkenf commented Sep 6, 2024

Thank you for the issue report @DMarby

I can reproduce the issue using the ProjectIAMMember resource. We also observed similar issues when adding new resources in provider-upjet-azure (PRs: 805, 810). It gets stuck at creating: false without any errors: https://crossplane.slack.com/archives/C05E4LDNNG5/p1725459419760769

@turkenf
Collaborator

turkenf commented Sep 6, 2024

I looked at the changes between v1.7.0 and v1.8.0 to find the cause of the issue. It seems that two changes are responsible.

  • This change is the main reason why the ProjectIAMMember resource and probably other resources are not being created. When I revert this change, the resource is created successfully.

  • What makes things even more complicated is that a change we made in Upjet hides the error messages, which broke our ability to observe them. (The PRs in provider-upjet-azure that I mentioned above get stuck at creating while adding new resources. These probably have incomplete configurations in the resources they are trying to add and get stuck creating and we can't see the error message.) When I revert the change, I get the following error for the ProjectIAMMember resource:

  - lastTransitionTime: "2024-09-06T16:33:28Z"
    message: 'create failed: async create failed: failed to create the resource: [{0
      Parent context of request iam-project-official-provider-testing modifyIamPolicy
      canceled  []}]'
    reason: ReconcileError
    status: "False"
    type: Synced
  - lastTransitionTime: "2024-09-06T16:33:28Z"
    message: 'async create failed: failed to create the resource: [{0 Parent context
      of request iam-project-official-provider-testing modifyIamPolicy canceled  []}]'
    reason: AsyncCreateFailure
    status: "False"
    type: LastAsyncOperation

When I revert the change made in internal/clients/gcp.go, the error goes away.
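
Once the error is visible again, the conditions quoted above can be checked programmatically. The sketch below copies the `type`, `status`, and `reason` fields from that output into plain dictionaries (the `message` fields are omitted for brevity) and lists every condition whose status is "False". The helper name `failing_conditions` is an assumption for illustration.

```python
# Conditions copied from the status output above, with message fields omitted.
conditions = [
    {"type": "Synced", "status": "False", "reason": "ReconcileError"},
    {"type": "LastAsyncOperation", "status": "False", "reason": "AsyncCreateFailure"},
]


def failing_conditions(conds):
    """Return (type, reason) pairs for every condition whose status is "False"."""
    return [
        (c.get("type"), c.get("reason"))
        for c in conds
        if c.get("status") == "False"
    ]


if __name__ == "__main__":
    for cond_type, reason in failing_conditions(conditions):
        print(f"{cond_type}: {reason}")
```

With the Upjet change in place, neither condition reports a failure, so a check like this returns nothing even though creation never completes.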

@drew0ps

drew0ps commented Sep 9, 2024

Hi all, I believe this issue is closely related, and may be a duplicate.
Thanks for looking at this, although I have a different understanding regarding:

These probably have incomplete configurations in the resources they are trying to add and get stuck creating and we can't see the error message

I am not sure if this is the reason since the resources get created successfully using our own subs with the same examples, running make e2e and also with make run locally.
The same is true for PRs 810 and 805.

@turkenf
Collaborator

turkenf commented Sep 9, 2024

I am not sure if this is the reason since the resources get created successfully using our own subs with the same examples, running make e2e and also with make run locally.

I think such situations are caused by insufficient quota in the subscription. Instead of returning an error such as insufficient quota, the resource gets stuck at creating.

@drew0ps

drew0ps commented Sep 9, 2024

If that is the case, it should be pretty straightforward to troubleshoot by monitoring the resource group activity while the pipeline job is running. I would be happy to assist, but I have no access to the Azure subscription.

@duizabojul

duizabojul commented Sep 11, 2024

Disclaimer: we maintain a fork with the google-beta provider that we keep up to date with this repo.

After some resources hung as described in this issue (async create), I reverted this particular commit, but nothing changed. Then I reverted the upjet update, and now my resources are unstuck and I get the real underlying error of the async create operation.

Are you sure this single revert is the cause of the problem?

Edit: OK, after reading the thread, it makes sense that in my case I also needed to revert upjet :D My 2 cents here is that we should maybe revert the upjet bump too.
