
Add support for auto deployment for historicals #36

Draft
wants to merge 7 commits into master

Conversation

@rbankar7 (Member) commented Aug 3, 2023

Fixes #XXXX.

Description


This PR has:

  • been tested on a real K8S cluster to ensure creation of a brand new Druid cluster works.
  • been tested for backward compatibility on a real K8S cluster by applying the changes introduced here on an existing Druid cluster. If there are any backward incompatible changes then they have been noted in the PR description.
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added documentation for new or modified features or behaviors.

Key changed/added files in this PR
  • MyFoo
  • OurBar
  • TheirBaz

@cla-assistant

cla-assistant bot commented Aug 3, 2023

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@@ -11496,6 +11533,23 @@ spec:
status:
description: DruidStatus defines the observed state of Druid
properties:
HistoricalStatus:

Suggested change: HistoricalStatus: → historicalStatus:

return nil
}

m.Status.Historical.Replica = obj.(*appsv1.StatefulSet).Status.CurrentReplicas

If m.Status.Historical.Replica is used later on to scale back down to the original replica count, then it's better to record it at the beginning. Also, use statefulset.spec.replicas rather than the status field.
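
A minimal sketch of recording the original count up front, assuming obj is the StatefulSet already fetched in this reconcile pass (the names mirror the surrounding code; this is illustrative, not the PR's implementation):

sts := obj.(*appsv1.StatefulSet)
if sts.Spec.Replicas != nil {
	// Spec.Replicas is the desired count we want to restore after the
	// deployment; Status.CurrentReplicas fluctuates while pods roll.
	m.Status.Historical.Replica = *sts.Spec.Replicas
}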


Also, this feels redundant, as you have already updated this field here.

logger_drain_historical.Info("Waiting for pods to drain", "name", m.Name, "namespace", m.Namespace)
return nil
}
//delete corresponding nodegrabbers

This can be implemented only if we use local storage, which suggests that we need some kind of check to see whether this is a local-storage instance or not.
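
One possible shape for that check, assuming the nodeSpec exposes its volumeClaimTemplates and that the local storage class is named "local-storage" (both are assumptions, not the PR's actual wiring):

func usesLocalStorage(nodeSpec *v1alpha1.DruidNodeSpec) bool {
	for _, vct := range nodeSpec.VolumeClaimTemplates {
		// Hypothetical check: treat the node as local-storage backed if any
		// claim template references the assumed "local-storage" class.
		if vct.Spec.StorageClassName != nil && *vct.Spec.StorageClassName == "local-storage" {
			return true
		}
	}
	return false
}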

for m.Status.Historical.DecommissionedPods != nil {
//get the PVC for the pod to be disabled and delete it once the pod is deleted
for _, pod := range podList {
if pod.(*v1.Pod).Name == m.Status.Historical.DecommissionedPods[0] {

Suppose podList has pods with ordinals 2, 1, 0 and m.Status.Historical.DecommissionedPods has pods 0, 1, 2. In that case only pod 0 would be deleted, but the PVCs of pods 1 and 2 would also be deleted, which is not ideal.
Why not have a simple for loop over the pods in m.Status.Historical.DecommissionedPods and delete them?
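
A rough sketch of the simpler loop suggested above, iterating over the decommissioned pod names directly; deletePodAndPVC is a hypothetical helper, and sdk, m and podList mirror the surrounding code:

for _, name := range m.Status.Historical.DecommissionedPods {
	for _, obj := range podList {
		pod := obj.(*v1.Pod)
		if pod.Name != name {
			continue
		}
		// Delete the pod first, then its PVC, so each decommissioned pod
		// is handled independently of ordinal ordering.
		if err := deletePodAndPVC(sdk, m, pod); err != nil {
			return err
		}
	}
}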

}
}
}
for it := startPod; it <= endPod; it++ {

After you delete the PVCs, the replacement pod for the one deleted just before this block would still reference the old PVC, report that the PVC is missing, and hence fail to schedule. This is why we may need an additional round of deletion of the pods we deleted earlier.
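
A hedged sketch of that extra deletion round, assuming pods follow the usual <statefulset>-<ordinal> naming and that sdk is the controller-runtime client used elsewhere in this file:

for it := startPod; it <= endPod; it++ {
	podName := fmt.Sprintf("%s-%d", nodeSpecUniqueStr, it)
	pod := &v1.Pod{}
	if err := sdk.Get(context.TODO(), types.NamespacedName{Name: podName, Namespace: m.Namespace}, pod); err != nil {
		continue // already gone; the StatefulSet controller will recreate it
	}
	if pod.Status.Phase == v1.PodPending {
		// The replacement pod is stuck on the deleted PVC; delete it again so
		// the controller recreates both the pod and its claim.
		_ = sdk.Delete(context.TODO(), pod)
	}
}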


func deployHistorical(sdk client.Client, m *v1alpha1.Druid, nodeSpec *v1alpha1.DruidNodeSpec, nodeSpecUniqueStr string, emitEvent EventEmitter, batchSize int32, baseUrl string) error {
// patch the updateStrategy with onDelete
err := patchUpdateStrategy(sdk, m, nodeSpec, onDelete, emitEvent)

This means we would be patching the update strategy on every run. Can we patch it only at the start of a deployment process? Also, make sure to revert the patch applied here after the deployment completes.
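
A minimal sketch of guarding the patch so it only happens at the start of a deployment, assuming a hypothetical getStatefulSet helper that returns the live object:

sts, err := getStatefulSet(sdk, m, nodeSpecUniqueStr)
if err != nil {
	return err
}
if sts.Spec.UpdateStrategy.Type != appsv1.OnDeleteStatefulSetStrategyType {
	// Only patch when the strategy is not already OnDelete; a symmetric
	// revert back to RollingUpdate would run once the deployment finishes.
	if err := patchUpdateStrategy(sdk, m, nodeSpec, onDelete, emitEvent); err != nil {
		return err
	}
}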

return err
}

if obj.(*appsv1.StatefulSet).Status.ReadyReplicas != obj.(*appsv1.StatefulSet).Status.CurrentReplicas {

Add a similar check before starting the deployment to see if there's already another deployment in progress. You can use status.updateRevision and status.currentRevision to check for this.
If there's another deployment in progress, don't proceed at all.
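
A small sketch of that pre-flight check, assuming obj is the fetched StatefulSet as in the line above:

sts := obj.(*appsv1.StatefulSet)
if sts.Status.UpdateRevision != sts.Status.CurrentRevision {
	// Another rollout is still converging; don't start a new deployment.
	return nil
}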

@saithal-confluent (Member) left a comment


I'll just list some of the pending items on this PR:

  • Calling the deployer function for historicals from the handler.
  • Ensuring that the deployer function is only called when the deployment type is a node change; otherwise the normal rolling update strategy should take care of it (a speculative dispatch sketch follows this list).
  • Deployment can only be triggered when there are no ongoing updates in the cluster. This also means that if some other component that is ranked higher than the historicals is being rolled, it should be allowed to complete before a deployment is triggered.
  • Ensure the state of the cluster before the deployment is the same as after the deployment with respect to the resources created or modified.
  • We need to make the operator code for historical deployment as idempotent as possible. State can be rebuilt on each run using the config and status objects as needed.
  • Actions like patching the statefulset update strategy, which happens at the start, should only be called once in the lifecycle. The next time it's called should be after the deployment, to revert the changes.
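
For the first two items, a speculative sketch of the handler-side dispatch; isNodeChange is a hypothetical helper and none of this is the PR's actual wiring:

if nodeSpec.NodeType == historical && isNodeChange(sdk, m, nodeSpec, nodeSpecUniqueStr) {
	// Only the node-change case goes through the custom historical deployer.
	if err := deployHistorical(sdk, m, nodeSpec, nodeSpecUniqueStr, emitEvent, batchSize, baseUrl); err != nil {
		return err
	}
}
// Every other change type falls back to the normal rolling update handled
// by the StatefulSet controller.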
