track whether a Stage was ever healthy at any point following each promotion #2847
Comments
The determination of whether the next Promotion may proceed looks at verification status and, absent any verification process, looks at Stage health. Stage health is partially informed by Application state(s), but the decision to proceed to the next promotion or not is entirely unaware of what's going on with Argo CD. i.e. This issue becomes simpler (only in relative terms) if we leave Argo CD out of the equation. (If it seems like I'm being pedantic about this, it's only because we're trying very hard to decouple most of Kargo from Argo CD. In an ideal world, the promotion step that updates Argo CD Apps would be the only component of Kargo with any Argo CD awareness.)

What you are proposing is something I had, in fact, considered at one point. My approach had simply been that in the absence of any user-defined verification process that would leave behind a

I believe @hiddeco is actively working on refactoring this bit of code, which I've mentioned is quite complex and difficult to reason over. I'd like him to weigh in on how this may be best resolved. To be clear, I believe we do need to better account for the scenario you described; I am just not positive what that looks like yet.
That all makes sense, thank you for the breakdown. I could definitely see that being confusing, but I guess the flip side is the exact opposite, which is what prompted my comment here - basically "We're not waiting on any verification, why isn't this promotion running?". I'll follow along as that work progresses, and happy to test any proposed solutions as that comes together.
Next week I will be in a better position to form a decent opinion on what options are worth looking into, but what I can already say is that verification and health checks do need to change in some way to make things more pleasant in multiple scenarios.
Description
Currently, if a stage is responsible for syncing ArgoCD apps, those apps must be healthy before either:
This is problematic for projects with stages that handle applications that autoscale, as the applications frequently go into a progressing state as they scale. That progressing state is observed by the Kargo controller whether or not the app already reported healthy for the given promotion, blocking new promotions. This effectively means the ability to deploy is dependent on an app's load. Breaking down stages to handle fewer applications is one option, but even a single app could have a deployment scaling from a few hundred to a few thousand pods throughout a day.
I wonder if it might make sense to track that a given app has reported healthy to a specific promotion, and then not allow it to go back to progressing? It could potentially just ignore further progressing status, but still consider other statuses like error, unknown, etc. if the concern is that the app becomes unhealthy after the rollout completes.
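To make the idea concrete, here is a rough sketch of what I have in mind. This is not Kargo's actual code or API: `HealthState`, `PromotionHealth`, and `Observe` are hypothetical names invented for illustration, and the health values are a simplified stand-in for whatever the controller really tracks. The assumption is that the controller keeps one small record per Stage, resets it when a new promotion starts, and ignores `Progressing` once `Healthy` has been seen for that promotion.

```go
package main

import "fmt"

// HealthState is a hypothetical, simplified health enum for this sketch.
type HealthState string

const (
	Healthy     HealthState = "Healthy"
	Progressing HealthState = "Progressing"
	Unhealthy   HealthState = "Unhealthy"
	Unknown     HealthState = "Unknown"
)

// PromotionHealth is hypothetical bookkeeping: it remembers whether the
// Stage was ever observed healthy since the current promotion began.
type PromotionHealth struct {
	PromotionName string
	EverHealthy   bool
}

// Observe records the health observed for the given promotion and returns
// the state that would be used when deciding whether to proceed.
// Progressing is reported as Healthy once Healthy has been seen for this
// promotion; every other state (Unhealthy, Unknown, ...) passes through.
func (p *PromotionHealth) Observe(promotion string, observed HealthState) HealthState {
	if promotion != p.PromotionName {
		// A new promotion starts with a clean slate.
		p.PromotionName = promotion
		p.EverHealthy = false
	}
	if observed == Healthy {
		p.EverHealthy = true
	}
	if observed == Progressing && p.EverHealthy {
		return Healthy
	}
	return observed
}

func main() {
	var h PromotionHealth
	// An autoscaling app: rolls out, becomes healthy, then scales again.
	for _, s := range []HealthState{Progressing, Healthy, Progressing, Unhealthy} {
		fmt.Println(s, "->", h.Observe("promo-1", s))
	}
}
```

With something like this, scaling events after a successful rollout would no longer block the next promotion, while a later error or unknown status would still be surfaced.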