The Elastic Agent upgrade watcher currently considers the state of the agent itself and of each started component process when deciding whether the upgrade was successful.
The agent upgrade watcher should stop considering the state of component processes when deciding whether it should roll back the upgrade. There are multiple reasons for this:
The agent should not trust component processes to be well behaved at all times. Components have had, and will continue to have, unexpected runtime errors and panics that may be transient and do not indicate that the upgrade itself failed.
Any problem causing a component to fail at startup is a bug that we would want to address immediately. By automatically rolling back the upgrade, we create the need for two investigations instead of one: first into why the upgrade failed, then into why the component failed. It is much simpler to present the component failure immediately after the upgrade.
It should still be possible to easily roll back an upgrade; it should just not be done automatically based on a brief sampling of the component state.
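To make the proposed change concrete, here is a minimal Go sketch contrasting the two decision rules. It is not the code in `error_checker.go`: the `State`, `Status`, `shouldRollbackCurrent`, and `shouldRollbackProposed` names are hypothetical, and treating only a `Failed` state as a rollback trigger is a simplifying assumption.

```go
package main

import "fmt"

// State is a simplified, hypothetical health state; the real agent has its own types.
type State int

const (
	Healthy State = iota
	Degraded
	Failed
)

// Status is a hypothetical snapshot of what the watcher samples after an upgrade.
type Status struct {
	Agent      State
	Components map[string]State
}

// shouldRollbackCurrent approximates the behavior described in this issue:
// a failure of the agent or of any started component triggers a rollback.
func shouldRollbackCurrent(s Status) bool {
	if s.Agent == Failed {
		return true
	}
	for _, st := range s.Components {
		if st == Failed {
			return true
		}
	}
	return false
}

// shouldRollbackProposed approximates the proposal: only the agent's own
// state is considered, and component failures are surfaced but not acted on.
func shouldRollbackProposed(s Status) bool {
	return s.Agent == Failed
}

func main() {
	// A healthy agent with one failed component (the component name is illustrative).
	s := Status{
		Agent:      Healthy,
		Components: map[string]State{"endpoint": Failed},
	}
	fmt.Println("current rule rolls back:", shouldRollbackCurrent(s))
	fmt.Println("proposed rule rolls back:", shouldRollbackProposed(s))
}
```

With a healthy agent and one failed component, the first rule reports a rollback and the second does not, which is exactly the trade-off debated in the comments.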
The agent should not trust component processes to be well behaved at all times. Components have and will continue to have unexpected runtime errors and panics that may be transient and do not indicate that the upgrade itself failed.
Do we really want that? It means that if you upgrade the Elastic Agent and, say, Endpoint Security is broken, it will remain broken and will not be rolled back automatically. I understand that with this change it's less likely that Elastic Agent will be blamed for a bad upgrade, but I don't know if we necessarily want to make rollback a manual process.
I'm no longer convinced we want this. A better solution is to make it much more obvious that an upgrade has rolled back, along with the reason why. I'm going to close this.
The component state check discussed in this issue is at elastic-agent/internal/pkg/agent/application/upgrade/error_checker.go, lines 81 to 90 in c097697.
We need elastic/kibana#172745 to be completed first.