
Add support for auto deployment for historicals #36

Draft
wants to merge 7 commits into master

Conversation

@rbankar7 (Member) commented Aug 3, 2023

Fixes #XXXX.

Description


This PR has:

  • been tested on a real K8S cluster to ensure creation of a brand new Druid cluster works.
  • been tested for backward compatibility on a real K8S cluster by applying the changes introduced here on an existing Druid cluster. If there are any backward incompatible changes then they have been noted in the PR description.
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added documentation for new or modified features or behaviors.

Key changed/added files in this PR
  • MyFoo
  • OurBar
  • TheirBaz

@cla-assistant

cla-assistant bot commented Aug 3, 2023

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@@ -11496,6 +11533,23 @@ spec:
status:
description: DruidStatus defines the observed state of Druid
properties:
HistoricalStatus:

Suggested change: HistoricalStatus: → historicalStatus:

return nil
}

m.Status.Historical.Replica = obj.(*appsv1.StatefulSet).Status.CurrentReplicas

If m.Status.Historical.Replica is used later on to scale back down to the original replica count, then it's better to record it at the beginning. Also, use statefulset.spec.replicas rather than the status field.
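
A minimal sketch of recording the original count up front, assuming obj is the StatefulSet already fetched in this reconcile pass (the names mirror the surrounding code; this is illustrative, not the PR's implementation):

sts := obj.(*appsv1.StatefulSet)
if sts.Spec.Replicas != nil {
	// Spec.Replicas is the desired count we want to restore after the
	// deployment; Status.CurrentReplicas fluctuates while pods roll.
	m.Status.Historical.Replica = *sts.Spec.Replicas
}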


Also, this feels redundant, as you have already updated this field here.

logger_drain_historical.Info("Waiting for pods to drain", "name", m.Name, "namespace", m.Namespace)
return nil
}
//delete corresponding nodegrabbers

This can be implemented only if we use local storage, which suggests that we need some kind of check to see whether this is a local-storage instance or not.
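
One possible shape for that check, assuming the nodeSpec exposes its volumeClaimTemplates and that the local storage class is named "local-storage" (both are assumptions, not the PR's actual wiring):

func usesLocalStorage(nodeSpec *v1alpha1.DruidNodeSpec) bool {
	for _, vct := range nodeSpec.VolumeClaimTemplates {
		// Hypothetical check: treat the node as local-storage backed if any
		// claim template references the assumed "local-storage" class.
		if vct.Spec.StorageClassName != nil && *vct.Spec.StorageClassName == "local-storage" {
			return true
		}
	}
	return false
}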

for m.Status.Historical.DecommissionedPods != nil {
//get the PVC for the pod to be disabled and delete it once the pod is deleted
for _, pod := range podList {
if pod.(*v1.Pod).Name == m.Status.Historical.DecommissionedPods[0] {

Suppose podList has pods with ordinals 2, 1, 0 and m.Status.Historical.DecommissionedPods has pods 0, 1, 2. In that case only pod 0 would be deleted, but the PVCs of pods 1 and 2 would also be deleted, which is not ideal.
Why not have a simple for loop over the pods in m.Status.Historical.DecommissionedPods and delete them?
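
A rough sketch of the simpler loop suggested above, iterating over the decommissioned pod names directly; deletePodAndPVC is a hypothetical helper, and sdk, m and podList mirror the surrounding code:

for _, name := range m.Status.Historical.DecommissionedPods {
	for _, obj := range podList {
		pod := obj.(*v1.Pod)
		if pod.Name != name {
			continue
		}
		// Delete the pod first, then its PVC, so each decommissioned pod
		// is handled independently of ordinal ordering.
		if err := deletePodAndPVC(sdk, m, pod); err != nil {
			return err
		}
	}
}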

}
}
}
for it := startPod; it <= endPod; it++ {

After you delete the PVCs, the replacement pod for the one deleted just before this block would still reference the old PVC, report that the PVC is missing, and hence fail to schedule. This is why we may need an additional round of deletion of the pods we deleted earlier.
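
A hedged sketch of that extra deletion round, assuming pods follow the usual <statefulset>-<ordinal> naming and that sdk is the controller-runtime client used elsewhere in this file:

for it := startPod; it <= endPod; it++ {
	podName := fmt.Sprintf("%s-%d", nodeSpecUniqueStr, it)
	pod := &v1.Pod{}
	if err := sdk.Get(context.TODO(), types.NamespacedName{Name: podName, Namespace: m.Namespace}, pod); err != nil {
		continue // already gone; the StatefulSet controller will recreate it
	}
	if pod.Status.Phase == v1.PodPending {
		// The replacement pod is stuck on the deleted PVC; delete it again so
		// the controller recreates both the pod and its claim.
		_ = sdk.Delete(context.TODO(), pod)
	}
}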


func deployHistorical(sdk client.Client, m *v1alpha1.Druid, nodeSpec *v1alpha1.DruidNodeSpec, nodeSpecUniqueStr string, emitEvent EventEmitter, batchSize int32, baseUrl string) error {
// patch the updateStrategy with onDelete
err := patchUpdateStrategy(sdk, m, nodeSpec, onDelete, emitEvent)

This means we would be patching the update strategy on every run. Can we patch it only at the start of a deployment process? Also, make sure to revert the patch applied here after the deployment completes.
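
A minimal sketch of guarding the patch so it only happens at the start of a deployment, assuming a hypothetical getStatefulSet helper that returns the live object:

sts, err := getStatefulSet(sdk, m, nodeSpecUniqueStr)
if err != nil {
	return err
}
if sts.Spec.UpdateStrategy.Type != appsv1.OnDeleteStatefulSetStrategyType {
	// Only patch when the strategy is not already OnDelete; a symmetric
	// revert back to RollingUpdate would run once the deployment finishes.
	if err := patchUpdateStrategy(sdk, m, nodeSpec, onDelete, emitEvent); err != nil {
		return err
	}
}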

return err
}

if obj.(*appsv1.StatefulSet).Status.ReadyReplicas != obj.(*appsv1.StatefulSet).Status.CurrentReplicas {

Add a similar check before starting the deployment to see if there's already another deployment in progress. You can use status.updateRevision and status.currentRevision to check for this.
If there's another deployment in progress, don't proceed at all.
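
A small sketch of that pre-flight check, assuming obj is the fetched StatefulSet as in the line above:

sts := obj.(*appsv1.StatefulSet)
if sts.Status.UpdateRevision != sts.Status.CurrentRevision {
	// Another rollout is still converging; don't start a new deployment.
	return nil
}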

@saithal-confluent (Member) left a comment


I'll just list some of the pending items on this PR:

  • Calling the deployer function for historicals from the handler.
  • Ensuring that the deployer function is only called when the deployment type is a node change; otherwise the normal rolling update strategy should take care of it (a speculative dispatch sketch follows this list).
  • Deployment can only be triggered when there are no ongoing updates in the cluster. This also means that if some other component that is ranked higher than the historicals is being rolled, it should be allowed to complete before a deployment is triggered.
  • Ensure the state of the cluster before the deployment is the same as after the deployment with respect to the resources created or modified.
  • We need to make the operator code for historical deployment as idempotent as possible. State can be rebuilt on each run using the config and status objects as needed.
  • Actions like patching the statefulset update strategy, which happens at the start, should only be called once in the lifecycle. The next time it's called should be after the deployment, to revert the changes.
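
For the first two items, a speculative sketch of the handler-side dispatch; isNodeChange is a hypothetical helper and none of this is the PR's actual wiring:

if nodeSpec.NodeType == historical && isNodeChange(sdk, m, nodeSpec, nodeSpecUniqueStr) {
	// Only the node-change case goes through the custom historical deployer.
	if err := deployHistorical(sdk, m, nodeSpec, nodeSpecUniqueStr, emitEvent, batchSize, baseUrl); err != nil {
		return err
	}
}
// Every other change type falls back to the normal rolling update handled
// by the StatefulSet controller.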
