StatefulSet restarter always restarts replica 0 immediately after initial rollout #111

Open · 1 of 6 tasks
siegfriedweber (Member) opened this issue Sep 19, 2022 · 25 comments

Current behaviour

New restarter-enabled StatefulSets have their replica 0 restarted after the initial rollout is complete.

Expected behaviour

The initial rollout should be completed "normally" with no extra restarts.

Why does this happen?

There is a race condition between Kubernetes' StatefulSet controller creating the first replica Pod and commons' StatefulSet restart controller adding the restart trigger labels. If the restarter loses the race, the first replica is created without the metadata, and a restart is triggered once the metadata is added.
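
For illustration, this is roughly what the restarter writes; the annotation keys are taken from the pod annotations shown in the original ticket below, while the name and the UID/resourceVersion values are placeholders. Any change to spec.template counts as a new revision, so a pod created before the patch gets recreated:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: simple-superset-node-default
  labels:
    restarter.stackable.tech/enabled: "true"    # set by the product operator on creation
spec:
  template:
    metadata:
      annotations:
        # patched in later by the commons restart controller; values are <uid>/<resourceVersion>
        configmap.restarter.stackable.tech/simple-superset-node-default: <uid>/<resourceVersion>
        secret.restarter.stackable.tech/simple-superset-credentials: <uid>/<resourceVersion>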

What can we do about it?

Add a mutating webhook (see the spike) that adds the relevant metadata. The webhook must not replace the existing controller, since webhook delivery is not reliable.

However, serving a webhook requires a bunch of extra infrastructure that we do not currently have, namely:

  1. We must implement the K8s admission webhook HTTPS API (https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/)
  2. We must generate a TLS certificate and write it into the MutatingWebhookConfiguration (MWC); a rough sketch of such an MWC follows this list
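
A rough sketch of what such an MWC could look like (the names, service reference, and path are placeholders, not a final design):

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: restarter.commons.stackable.tech          # placeholder name
webhooks:
  - name: statefulset.restarter.commons.stackable.tech   # placeholder
    failurePolicy: Ignore         # fail open: a broken webhook must not block StatefulSet writes
    sideEffects: None
    admissionReviewVersions: ["v1"]
    rules:
      - apiGroups: ["apps"]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["statefulsets"]
    clientConfig:
      caBundle: <base64-encoded CA certificate>   # provisioned and renewed by the commons operator
      service:
        name: commons-operator    # placeholder service name
        namespace: stackable      # placeholder namespace
        path: /mutate             # placeholder path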

Definition of done

  • Webhook certificate is managed (provisioned and renewed) by the commons operator
  • The webhook should apply the same podTemplate annotations as the controller currently does
    • Initial STS rollout does not cause a restart (STS.metadata.generation should stay 1)
  • The controller must still apply metadata if the webhook is disabled and/or fails (at the cost of still doing the extra restart in this case)
  • The webhook must fail open (failurePolicy: Ignore)
  • A Kuttl test should verify the above (maybe minus the failurePolicy); a sketch of a possible assert follows this list
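
A minimal sketch of such a kuttl assert (file name and StatefulSet name are hypothetical); kuttl waits until the live object is a superset of these fields, so the test fails if the generation has moved past 1 by the time the replica is ready:

# 01-assert.yaml (hypothetical file in the kuttl test case directory)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: restart-test          # placeholder name of the StatefulSet under test
  generation: 1               # must still be 1, i.e. no post-rollout template change
status:
  readyReplicas: 1
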
Original ticket

The StatefulSet of a Superset cluster is immediately restarted after its creation. This should not be necessary and should be prevented.

$ kubectl describe statefulset simple-superset-node-default
...
Events:
  Type    Reason            Age                    From                    Message
  ----    ------            ----                   ----                    -------
  Normal  SuccessfulDelete  3m31s                  statefulset-controller  delete Pod simple-superset-node-default-0 in StatefulSet simple-superset-node-default successful
  Normal  SuccessfulCreate  2m59s (x2 over 3m31s)  statefulset-controller  create Pod simple-superset-node-default-0 in StatefulSet simple-superset-node-default successful

After the restart, the Superset pods are annotated as follows:

annotations:
  configmap.restarter.stackable.tech/simple-superset-node-default: cf60300e-0c45-4ee2-b60c-de53b0084182/21998
  secret.restarter.stackable.tech/simple-superset-credentials: e0e4b781-46f9-44f4-80c8-a0876b91ed8b/16909

This could be an indication that the restart controller of the commons-operator is involved.

The commons-operator is busy while the StatefulSet is restarted:

2022-09-16T10:02:50.228971Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="pod.restarter.commons.stackable.tech" object=Pod.v1./simple-superset-node-default-0.default
2022-09-16T10:02:50.236144Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="pod.restarter.commons.stackable.tech" object=Pod.v1./simple-superset-node-default-0.default
2022-09-16T10:02:50.239371Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="statefulset.restarter.commons.stackable.tech" object=StatefulSet.v1.apps/simple-superset-node-default.default
2022-09-16T10:02:50.255219Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="statefulset.restarter.commons.stackable.tech" object=StatefulSet.v1.apps/simple-superset-node-default.default
2022-09-16T10:02:50.258511Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="pod.restarter.commons.stackable.tech" object=Pod.v1./simple-superset-node-default-0.default
2022-09-16T10:02:50.266146Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="pod.restarter.commons.stackable.tech" object=Pod.v1./simple-superset-node-default-0.default
2022-09-16T10:02:50.274647Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="statefulset.restarter.commons.stackable.tech" object=StatefulSet.v1.apps/simple-superset-node-default.default
2022-09-16T10:02:51.433621Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="pod.restarter.commons.stackable.tech" object=Pod.v1./simple-superset-node-default-0.default
2022-09-16T10:02:51.449009Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="statefulset.restarter.commons.stackable.tech" object=StatefulSet.v1.apps/simple-superset-node-default.default
@siegfriedweber siegfriedweber self-assigned this Oct 20, 2022
@siegfriedweber (Member, Author)

Actually this is how the restart controller is supposed to work. The Superset operator creates a stateful set labeled with restarter.stackable.tech/enabled=true. The restart controller of the commons operator watches this labeled stateful set and adds the UIDs and resource versions of the referenced config maps and secrets as annotations to the pod template. This triggers a restart of the stateful set right after its first creation. The commons operator watches these config maps and secrets and updates the annotations whenever these resources change which in turn triggers a restart. This ensures that Superset always runs with the latest configuration.

Even if the immediate restart is confusing, I do not see an elegant solution to prevent this, so I propose to close this issue.

@fhennig (Contributor) commented Oct 24, 2022

If the outcome is "this is how it is supposed to work", then it might not be a bug, but we should still go back to the drawing board and look at what we've built, because this immediate restart is really something that shouldn't happen.

Maybe the title of the issue and also its place in this repo are not correct anymore.

After talking to sigi, I'm moving it back to "Refinement: In Progress"

@vsupalov (Contributor) commented Oct 25, 2022

Taking a step back. Outcomes from the refinement discussion:

Observed Situation

  • A clean system is easier to monitor, debug, and operate
  • Random restarts are surprising in a bad way
    • Logs do not mention the reason for this behaviour - just the restart is visible
  • We want our systems to be understandable, and observable (know why restart happens)
  • The restart controller causes a restart, which might be avoided
  • Two operators working on the same resource sometimes interact in a hard-to-understand fashion (badly designed system?)

Investigation Worthy Threads

  • Can we avoid this kind of initial restart? Are they strictly necessary?
    • Mutating admission controller / webhook (?) link
    • Port restarting logic into the operator-rs framework (it is a library!)? Reusable functions.
  • Can we add logs to understand what's going on?
    • Add logging outputs to justify / announce restarts
  • Root cause analysis: how was the decision made that we want to have this kind of pattern?
    • Possible "design guideline" which could have been helpful: "only one operator is modifying a single resource"

Possible Next Steps

  • Create a spike investigating whether the restart is necessary?
  • Address "how was the restarting decision made" in an architecture meeting?
  • Do we need somebody who feels responsible for higher-level architecture patterns?

@vsupalov (Contributor)

@soenkeliebau @lfrancke unclear how to proceed here, some input would be great. Do we leave things as they are here, or do we pursue one of the investigation threads?

This ticket can be closed in favour of these other investigation threads.

@nightkr (Member) commented Oct 25, 2022

That the commons-operator is reconciling doesn't really say much; that will happen whenever the STS is modified externally, too. If you want to check whether it has triggered a restart, check the STS YAML for whether the restarter annotations change.
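
For example, one way to inspect the template generation and the restarter annotations on the STS (name taken from the original report):

$ kubectl get statefulset simple-superset-node-default \
    -o jsonpath='{.metadata.generation}{"\n"}{.spec.template.metadata.annotations}{"\n"}'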

I'd also argue that the restart controller isn't the cause of the problem here, just a symptom. The restart controller will trigger when the app's config changes. Check for why that happens if you want to avoid this.

@nightkr (Member) commented Oct 25, 2022

(Case in point, if the restart controller is at fault then this will happen for all product operators, not just Superset...)

@lfrancke (Member)

Thanks Teo.

The restart controller of the commons operator watches this labeled stateful set and adds the UIDs and resource versions of the referenced config maps and secrets as annotations to the pod template. This triggers a restart of the stateful set right after its first creation.

I'm a bit confused now, though. This sounds like the answer that you were looking for, no?
I must be missing something still. I thought the restart happens because the restart controller changes the configmaps which in turn triggers a restart of the STS?

@nightkr (Member) commented Oct 25, 2022

The restart controller just watches the configmaps for changes, and triggers STS restarts based on that. The restart controller is just a messenger for the actual issue (that the CM changed). It never modifies the CMs itself.

@lfrancke (Member)

Understood, thank you.
So, the described behavior has to come from somewhere else then.

@siegfriedweber @fhennig @vsupalov - sorry, back to you for now?

@siegfriedweber (Member, Author)

If you want to check whether it has triggered a restart then you'll want to check the STS YAMLs for whether the restarter annotations change.

The restarter annotations are initially added to the STS YAMLs by the restart controller. This triggers the restart.

The restart controller will trigger when the app's config changes.

The restart controller also triggers when a new StatefulSet with the annotation restarter.stackable.tech/enabled=true is created.

(Case in point, if the restart controller is at fault then this will happen for all product operators, not just Superset...)

It happens for all StatefulSets which are annotated with restarter.stackable.tech/enabled=true. That is, it also happens for the Airflow operator. The other operators do not use this annotation yet.

I think that I accurately described the issue in my first comment:

Actually this is how the restart controller is supposed to work. The Superset operator creates a stateful set labeled with restarter.stackable.tech/enabled=true. The restart controller of the commons operator watches this labeled stateful set and adds the UIDs and resource versions of the referenced config maps and secrets as annotations to the pod template. This triggers a restart of the stateful set right after its first creation. The commons operator watches these config maps and secrets and updates the annotations whenever these resources change which in turn triggers a restart. This ensures that Superset always runs with the latest configuration.

Please tell me if I am wrong.

@soenkeliebau (Member)

@siegfriedweber and @teozkr will have a chat about this

@nightkr (Member) commented Oct 27, 2022

Hm, the label "should" still be added quickly enough that the pods shouldn't have been scheduled yet, but maybe my memory fails me on how much of a window we have; I will have to do some testing tomorrow... If that is indeed the sad case, then a fail-open mutating webhook (since we'd still need the controller as well, the worst that could happen would be falling back to the current extra restart) probably makes sense at some point, and it should be completely transparent to the operators.

The downside is that doing a first webhook means doing a bunch of new scaffolding from scratch (certificates, kube API registration, etc)...

@nightkr (Member) commented Oct 27, 2022

Ok, having tried it out now it does look like the first replica is restarted while all other replicas are created late enough that they are up to date. Avoiding that glitch will probably require us to do the webhook...

@nightkr (Member) commented Oct 28, 2022

I have validated that a mutating webhook is able to avoid the initial restart for a simple STS... 0cbf72c

Moving this ticket to commons since this is a bug in the restarter, not in superset-op specifically.

@nightkr nightkr transferred this issue from stackabletech/superset-operator Oct 28, 2022
@nightkr nightkr changed the title Prevent unnecessary restart of the StatefulSet StatefulSet restarter always restarts replica 0 immediately after initial rollout Oct 28, 2022
@fhennig (Contributor) commented Oct 31, 2022

So, is the mutating webhook the accepted solution to this problem? I see that this ticket is in refinement acceptance; I'd like to know the up- and downsides of this solution. Is there another way? How bad is:

The downside is that doing a first webhook means doing a bunch of new scaffolding from scratch (certificates, kube API registration, etc)...

I personally wouldn't call this ticket refined, as I wouldn't know what to do. The ticket doesn't have tasks or even acceptance criteria.

@lfrancke (Member) commented Nov 1, 2022

I agree, this came up during the acceptance. So I'll move it back for now.

@lfrancke lfrancke moved this from Refinement Acceptance: In Progress to Refinement: Waiting for in Stackable Engineering Nov 1, 2022
@lfrancke lfrancke moved this from Refinement: Waiting for to Next in Stackable Engineering Nov 7, 2022
@nightkr nightkr moved this from Next to Refinement: In Progress in Stackable Engineering Nov 8, 2022
@nightkr nightkr self-assigned this Nov 8, 2022
@nightkr nightkr moved this from Refinement: In Progress to Refinement Acceptance: Waiting for in Stackable Engineering Nov 8, 2022
@lfrancke (Member) commented Nov 8, 2022

A few questions:

  1. Do I understand it correctly that this will "only" fix the issue in most cases since delivery of webhooks is not reliable?
  2. Do webhooks require any kind of extra privileges that we currently don't need?
  3. Do webhooks work on OpenShift?
  4. Does this need to be presented to the team again? I assume no because I believe this is the outcome of the last team session, right?

@lfrancke lfrancke moved this from Refinement Acceptance: Waiting for to Refinement Acceptance: In Progress in Stackable Engineering Nov 8, 2022
@lfrancke lfrancke self-assigned this Nov 8, 2022
@nightkr (Member) commented Nov 8, 2022

Do I understand it correctly that this will "only" fix the issue in most cases since delivery of webhooks is not reliable?

Yes, good old CAP strikes again. There's no way that we could possibly hook into an API in a way that is all of the following:

  1. Synchronous (happens before the write completes)
  2. Exhaustive (never misses any events)
  3. Reliable (doesn't bring down (parts of) the hooked API when it fails)

We could technically sacrifice any of these, but (to me) 2 and 3 are both far more vital than 1.

Do webhooks require any kind of extra privileges that we currently don't need?

The operator and installer both need permission to work with MutatingWebhookConfiguration objects. Additionally, webhooks do introduce the risk of bugs and/or misconfiguration wreaking havoc on the larger system.

This design would allow users who are uncomfortable with running our webhook to simply not deploy the MWC, at the cost of still having to deal with this extra restart.
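
For reference, the extra RBAC would be along these lines (role name and verb list are a sketch, not the final set):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: commons-operator-webhook      # placeholder name
rules:
  - apiGroups: ["admissionregistration.k8s.io"]
    resources: ["mutatingwebhookconfigurations"]
    verbs: ["get", "list", "watch", "create", "patch"]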

Do webhooks work on OpenShift?

I haven't tested the spike against OpenShift specifically, but Google seems to believe yes. I also remember noticing that certain parts of OpenShift itself are implemented using webhooks.

Does this need to be presented to the team again? I assume no because I believe this is the outcome of the last team session, right?

It might be worth bringing up at tomorrow's arch meeting. I'll add it to the list.

@lfrancke (Member) commented Nov 8, 2022

Yes, good old CAP strikes again. There's no way that we could possibly hook into an API in a way that is all of the following:

That's fine for me, as the worst that can happen is that we get the extra restart, right?
Similar to the answer to question two.

Thank you. If the team is fine with this then I am as well.

@fhennig (Contributor) commented Nov 8, 2022

I think we didn't really discuss this broadly. I'll put it on the agenda for tomorrow so we can have a brief look.

@nightkr (Member) commented Nov 9, 2022

That's fine for me, as the worst that can happen is that we get the extra restart, right?

Yes.

@nightkr (Member) commented Nov 10, 2022

After some discussion with @sbernauer and @siegfriedweber we decided to push on with the webhook approach.

@lfrancke (Member)

So this means refinement is done?

@nightkr (Member) commented Nov 10, 2022

I'd say so, yes. There's still the question of how we want to prioritize this, but I guess that's your department... :P

@lfrancke lfrancke moved this from Refinement Acceptance: In Progress to Ready for Development in Stackable Engineering Nov 11, 2022
@nightkr nightkr self-assigned this Nov 14, 2022
@nightkr nightkr moved this from Ready for Development to Development: In Progress in Stackable Engineering Nov 14, 2022
@sbernauer (Member)

Teo did a handover with Siggi
