
Use maximum between liveness and readiness for fail_time #196

Closed

Conversation

herodes1991
Contributor

@herodes1991 herodes1991 requested a review from a team as a code owner November 15, 2022 12:26
@herodes1991 herodes1991 requested a review from a team November 25, 2022 11:41
@oyvindio
Member

It is possible that I don't have all the details, but based on my current understanding of the issue in #195, it seems to me like the changes in this PR as it currently stands could have some potential drawbacks:

ReadyCheck tries to set a timeout based on the expected duration of a successful rollout at the Kubernetes level. During a rollout it is necessary to wait initial_delay_seconds from the readiness probe for each pod to become ready, to ensure some number of pods is always available. As far as I understand, the rollout process does not wait for the initial_delay_seconds set on the liveness probe, because that probe's default state is success.

This means the readiness probe's initial_delay_seconds influences rollout duration directly, but to my knowledge the liveness probe's initial_delay_seconds does not. Because of this difference, it seems to me that a ReadyCheck timeout based on the liveness probe's initial_delay_seconds would not necessarily be a good estimate of the rollout duration. Even if initial_delay_seconds on the liveness probe is higher than the same parameter on the readiness probe, the way I see it a longer timeout isn't always better: it means it takes longer before the FAILED result is set on the application status when a rollout fails, and before the user deploying the application gets feedback about it.
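
To make the trade-off concrete, here is a minimal sketch (hypothetical code, not the actual fiaas-deploy-daemon implementation; the function names, the per-replica formula and the allowance parameter are assumptions) comparing a fail_time estimated from the readiness probe only with one based on the maximum of both probes, as this PR proposes:

```python
# Hypothetical sketch of the two ways to estimate the ReadyCheck timeout
# discussed above; names and the formula are illustrative only.

def fail_time_readiness_only(readiness_initial_delay, replicas, allowance=10):
    # Current behaviour as described: only the readiness probe's
    # initial_delay_seconds influences the expected rollout duration,
    # because the rollout waits for each new pod to become ready.
    return (readiness_initial_delay + allowance) * replicas


def fail_time_max_of_probes(readiness_initial_delay, liveness_initial_delay,
                            replicas, allowance=10):
    # Behaviour proposed in this PR: take the maximum of the two probes'
    # initial_delay_seconds. This can only make the timeout longer, which
    # delays the FAILED status (and user feedback) when a rollout actually
    # fails, since the liveness delay does not gate pod readiness.
    return (max(readiness_initial_delay, liveness_initial_delay) + allowance) * replicas


# Example: readiness delay 10s, liveness delay 60s, 3 replicas.
print(fail_time_readiness_only(10, 3))      # 60 seconds
print(fail_time_max_of_probes(10, 60, 3))   # 210 seconds
```

With these illustrative numbers, basing the timeout on the liveness probe's delay more than triples the time before a failed rollout is reported as FAILED, which is the drawback described above.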

@herodes1991
Contributor Author

herodes1991 commented Nov 29, 2022

I agree with you @oyvindio, the only odd thing is that we hit a scenario where the deployment rolled out successfully at the Kubernetes level but the FIAAS application was still marked as FAILED 😔. I will close this for now and encourage our users to increase the initial_delay to be more "flexible" 😄
