leaderelection: configure all timeouts via environment variables #305
CATTLE_ELECTION_LEASE_DURATION
CATTLE_ELECTION_RENEW_DEADLINE
CATTLE_ELECTION_RETRY_PERIOD
See: rancher/wrangler#305
Signed-off-by: Silvio Moioli <[email protected]>
One nit, otherwise LGTM.
fb22944 to 57d388e
@rmweir fixed. Do changes in wrangler require two approvers? If so, who could be the right person?
LGTM
@moio yes, I assigned @KevinJoiner. Once approved it will be good to merge.
Also, I don't know what 🎯 policy is for unit tests.
Well, the Bullseye Team does not have a policy at this point. There is a guiding principle, which is to adapt to each Team's patterns/habits/processes/policies to the extent reasonable and possible - the goal being to help as well as possible. The question becomes: would you normally add (or ask to add) tests in a case like this? If so, how would you propose to set them up? Currently the code is basically a "main" method, and the effect of setting the environment variables is only detectable through timing - not really amenable to unit tests. One possibility would be to refactor out the main added block: to a separate function returning the three values or
@KevinJoiner do you have a comment on the last update?
@KevinJoiner, @rmweir What's the current state here?
…to separate function Signed-off-by: Silvio Moioli <[email protected]>
26372af to 7c35dd2
@KevinJoiner: rebased, refactored per your suggestion, added unit tests. Please review again.
eadda4d to ef1a21e
@rmweir do we already know when a new version of wrangler could be tagged? I am asking because of a fleet issue that would benefit from this change.
We have observed in fleet-agent user cases (SURE-6125/fleet#1465) that the leader election can be lost due to timeouts - not because pods are not working, but because the control plane is overwhelmed by load.
While of course having an overloaded control plane is a problem in and of itself, losing the election means that Wrangler-based pods will restart, which in turn means a new full listing of all interesting resources, creating additional load and exacerbating symptoms.
Ideally, the situation should improve when API Priority and Fairness goes out of beta and becomes commonplace - as it should ensure leader election API calls get served with higher priority, thus returning correctly within reasonable time frames even if the control plane is overloaded.
In the meantime, pretty much the only other pragmatic option is to give the election more time before it fails.
This PR allows setting all critical parameters via environment variables - with default values staying the same as before.
Note: this will also address (SURE-3805/#1491)