Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race conditions when creating cluster: 409 response or missing installURL #229

Open
davidgubler opened this issue Sep 3, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@davidgubler
Copy link

There are at least two race conditions when creating a cluster:

  • API returns 409 "conflict" with body '{"reason":"Operation cannot be fulfilled on clusters.syn.tools 'c-unicornfarts2': the object has been modified; please apply your changes to the latest version and try again"}' or similar
  • API returns new cluster object without the installURL being set

I assume the two problems are related.

Steps to Reproduce the Problem

Create a cluster with valid data, e.g. the problem has been reproduced by POSTing
{"id":"c-unicornfarts7","displayName":"Unicorn FARTS 7!","annotations":{},"tenant":"t-sg-test","facts":{"cloud":"c","service_level":"best_effort","access_policy":"regular","release_channel":"a","sales_order":"NONE","distribution":"b"},"gitRepo":{"type":"auto"}}
to /clusters.

Being a race condition the problem is hard to reproduce reliably.

Actual Behavior

Sometimes it works, sometimes I get back a cluster that doesn't have an installURL, sometimes I get back a 409 conflict with body {"reason":"Operation cannot be fulfilled on clusters.syn.tools 'c-unicornfarts7': the object has been modified; please apply your changes to the latest version and try again"}

Expected Behavior

I reliably get back a cluster object with installURL set.

Moreover the 409 response code does not appear in https://syn.tools/lieutenant-api/references/index.html#_createcluster . Whether that's an implementation bug or a documentation bug I don't know, but it should be fixed one way or another.

@davidgubler davidgubler added the bug Something isn't working label Sep 3, 2024
@bastjan
Copy link
Contributor

bastjan commented Sep 3, 2024

The API does not do an atomic create (two requests):

if err := ctx.client.Create(ctx.Request().Context(), cluster); err != nil {

if err := ctx.client.Status().Update(ctx.Request().Context(), cluster); err != nil {

And it also does not wait for the install token to be set.

The status update can be removed most likely. And the API should either wait for the token to be set or we do the token creation in a mutating webhook. I believe the mutating webhook is the nicer way to use the manifests but the lieutenant code is such a horrible mess we might just wait in the API.

Previously it might have worked ok since the time between the two requests was just enough to create the token.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants