✨ Operator managed quota. #199

Merged: jortel merged 3 commits into konveyor:master on Aug 30, 2024

Conversation

@jortel (Contributor) commented Aug 15, 2024

No description provided.

Signed-off-by: Jeff Ortel <[email protected]>
@jortel changed the title ✨ Add enhancement. → ✨ Operator manged quota. Aug 15, 2024
@jortel added this to the v0.6.0 milestone Aug 15, 2024
Signed-off-by: Jeff Ortel <[email protected]>
@aufi (Member) left a comment


Looks good, just added a few notes.


## Design Details

### Test Plan
@aufi (Member):

Some kind of scale test should probably be defined to prove the quota helps with handling higher loads.

@aufi (Member):

Also, do we have any upper / lower bounds? 99%, 100% would probably give basically no protection, while 1%, 2% etc. might make it just about impossible to schedule anything.

@jortel (Contributor, Author):

Perhaps, but it is hard to say. 2% of a huge cluster may be enough. On the flip side, 100% is essentially worthless but not broken.

@jortel (Contributor, Author) commented Aug 28, 2024:

> Some kind of scale test should probably be defined to prove the quota helps with handling higher loads.

This would be very complicated.

(Resolved review thread on enhancements/operator-quota/README.md)
@jortel changed the title ✨ Operator manged quota. → ✨ Operator managed quota. Aug 21, 2024
@jmontleon (Member) commented Aug 21, 2024

The logic (mostly for me later, if we decide to do this) to get total CPUs on worker nodes is something like this. This should be easy enough to reproduce with Ansible.

$ tot_cpu=0; for n in $(oc get node --selector='node-role.kubernetes.io/worker=' -o name); do node_cpu=$(oc get $n -o jsonpath='{.status.capacity.cpu}'); ((tot_cpu+=node_cpu)); done; echo $tot_cpu
16

And memory (in Ki):

tot_mem=0; for n in $(oc get node --selector='node-role.kubernetes.io/worker=' -o name); do node_mem=$(oc get $n -o jsonpath='{.status.capacity.memory}'); ((tot_mem+=$(echo $node_mem | sed 's/Ki//') )); done; echo $tot_mem
65825548

https://access.redhat.com/solutions/6968485
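
For illustration, the totals above could be turned into quota values by applying a percentage; a rough sketch (the 50% figure, the quota name, and use of the konveyor-tackle namespace are assumptions for the example, not something defined by this PR):

pct=50                                   # example percentage; the real value would come from configuration
quota_cpu=$((tot_cpu * pct / 100))       # tot_cpu and tot_mem computed as above
quota_mem="$((tot_mem * pct / 100))Ki"
oc create quota task-quota -n konveyor-tackle --hard=limits.cpu=${quota_cpu},limits.memory=${quota_mem}

oc create quota is shorthand for creating a ResourceQuota object with the given hard limits.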

@shawn-hurley (Contributor) left a comment

Overall I like the idea. I agree with others that there should be documentation around best practices and that it should probably default to off.

I had one other concern, but I think it is manageable, and may even be a concern for a different time.


## Proposal

Enhance the operator to optionally manage a Resource Quota in the namespace.
@shawn-hurley (Contributor):

One thing that I would be concerned about, but I think there are options around this, is this scenario:

  1. Pods are created up to the MAX of the resource quota.
  2. The hub pod is killed and is now unable to be scheduled.
  3. This leads to konveyor being taken down for some period of time (as the task pods complete, space will free up, IIUC).

I think one way to get around this is to have the hub/ui pods use a higher priority class, have all the task pods use another priority class, and apply the resource quota to just the task pods using the resource quota's priority class scope.

Just a thought, and it may be a premature optimization, but it is a concern.
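
For reference, the kind of priority-class-scoped quota being suggested might look roughly like the sketch below (the quota name, priority class name, and limits are illustrative assumptions, not part of the enhancement):

oc apply -f - <<'EOF'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: task-pod-quota              # hypothetical name
  namespace: konveyor-tackle
spec:
  hard:
    limits.cpu: "8"
    limits.memory: 16Gi
  scopeSelector:                    # only pods using the task priority class count against this quota
    matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values: ["konveyor-task-priority"]   # hypothetical priority class assigned to task pods
EOF

This assumes the task pods are created with a matching priorityClassName; the cluster-scoped nature of priority classes is the sticking point discussed in the replies below.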

@jortel (Contributor, Author) commented Aug 27, 2024:

The quota limits the creation of the Pod resource. So, if the hub crashes or is OOM killed, the quota would not impact restarting the container. I think the potentially concerning use case would be related to the operator needing to restart pods, such as an upgrade while tasks are in flight, or an edit of the Tackle CR causing the operator to update a deployment.

Some options (in no particular order). Though, I think option 4 is the simplest/cleanest.

Option 1

Perhaps this could be mitigated by having the operator reconcile in the following order:

  • Delete the quota.
  • Reconcile everything else.
  • Reconcile the quota.

This seems fragile and does not address all of the use cases.

Option 2

Priority classes (suggested by @shawn-hurley)
Unfortunately, priority classes are cluster scoped, and without a default priority class (which does not exist by default), pods without an assigned class default to zero (0). IIUC, this means that tackle pods would run at a higher priority than all other pods in the cluster (including controllers). Also, our operator would need permissions to create the cluster-scoped priority classes, which I highly doubt would be acceptable. All of this adds up to: we can't use/rely on priority classes.

$ oc get pc
NAME                      VALUE        GLOBAL-DEFAULT   AGE
system-cluster-critical   2000000000   false            33d
system-node-critical      2000001000   false            33d

Option 3

It is also unfortunate that quotas don't support label selectors.

The only option I have considered to address this concern is for task pods (and quota) to be created in another namespace. Example: konveyor-task. This would have the benefits of:

  • Would eliminate clutter in our main namespace, konveyor-tackle.
  • Quota would only affect task pods.

But with the following drawbacks:

  • Mainly that dealing with multiple namespaces would be kind of a pain, and perhaps confusing for users.

Option 4

Perhaps a better option would be to implement the quota in the hub. The hub would list the nodes and calculate the allocatable cpu and memory. Then, (optionally) create task pods within the requested quota, also expressed as a % and passed to the hub in an ENVAR. I had a PR for this at one time but discarded it in favor of ResourceQuota. The algorithm/calculation is straightforward. Given the limitations of ResourceQuota, this may be the better way to go.
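
For illustration, a rough shell equivalent of that calculation (a sketch only; the 50% figure is an example, and in the hub this would be done through the Kubernetes API with the percentage supplied via an environment variable):

pct=50                                    # example; the proposal is to pass this to the hub in an ENVAR
alloc_cpu=0; alloc_mem=0
for n in $(oc get node --selector='node-role.kubernetes.io/worker=' -o name); do
  cpu=$(oc get $n -o jsonpath='{.status.allocatable.cpu}')
  [[ $cpu == *m ]] && cpu=${cpu%m} || cpu=$((cpu * 1000))    # normalize whole cores to millicores
  mem=$(oc get $n -o jsonpath='{.status.allocatable.memory}')
  ((alloc_cpu+=cpu)); ((alloc_mem+=${mem%Ki}))
done
echo "task budget: cpu=$((alloc_cpu * pct / 100))m memory=$((alloc_mem * pct / 100))Ki"

The hub would then stop creating new task pods once the pods it has already created add up to that budget.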

@shawn-hurley (Contributor):

I think that if the other suggestions don't work, then as described it makes sense; sorry for the wild goose chase. I just thought there was maybe something that we could do.

With that said, the use case I am worried about is an edge case, and barring good options (1-3 I agree are out) and not wanting to re-create a scheduler (option 4), I think what we have is sufficient.

@jortel (Contributor, Author) commented Aug 27, 2024:

Well, we already have a scheduler. Option 4 only proposed to make it a little smarter.

Signed-off-by: Jeff Ortel <[email protected]>
@jortel merged commit 3084f00 into konveyor:master Aug 30, 2024
3 checks passed