Wait for the old CRD Manager to stop before starting a new one #1778
PR Description
This PR fixes an issue that was reported on the community Slack. A user ran into this error when doing a config reload:
I could not reproduce the error, but I believe it happens because the new CRD Manager starts up before the old one has had a chance to stop and unregister its metrics. I'm not sure how to cover this in a unit test, since we'd need some way to make the CRD Manager stop slowly. We'd probably have to refactor the code to make it more unit-testable, so for now I hope we can fix the bug without a unit test.
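The general shape of the fix can be sketched with a `sync.WaitGroup`: the component cancels the old manager's context, waits for its goroutine to fully exit (which is when it unregisters its metrics), and only then starts the new manager. This is a minimal illustration, not the actual Alloy code; the `component` and `crdManager` types and their methods here are hypothetical stand-ins.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// crdManager is a hypothetical stand-in for the component's CRD Manager.
type crdManager struct{ id int }

// run blocks until the context is cancelled, then cleans up.
// In the real component this is where metrics would be unregistered.
func (m *crdManager) run(ctx context.Context) {
	<-ctx.Done()
	// ... unregister metrics here before returning ...
}

// component restarts its manager on every config update.
type component struct {
	mut    sync.Mutex
	cancel context.CancelFunc
	wg     sync.WaitGroup
	nextID int
}

// update stops the old manager and waits for it to fully exit before
// starting a new one, so metric registration never overlaps.
func (c *component) update() {
	c.mut.Lock()
	defer c.mut.Unlock()

	if c.cancel != nil {
		c.cancel()
	}
	// Wait for the old manager to stop and unregister its metrics.
	c.wg.Wait()

	ctx, cancel := context.WithCancel(context.Background())
	c.cancel = cancel
	c.nextID++
	m := &crdManager{id: c.nextID}

	c.wg.Add(1)
	go func() {
		defer c.wg.Done()
		m.run(ctx)
	}()
}

func main() {
	c := &component{}
	c.update() // start manager 1
	c.update() // stop manager 1, wait, start manager 2
	c.cancel()
	c.wg.Wait()
	fmt.Println("managers started:", c.nextID)
}
```

Without the `c.wg.Wait()` call, the second `update` could register the new manager's metrics while the old manager still holds the same registrations, which matches the duplicate-registration error being reported.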
I tested my change locally with a config like this, just to make sure the WaitGroup functions ok:

Alloy config

I changed the `clustering/enabled` value, then triggered a config reload via `curl localhost:12345/-/reload`. Then I shut down Alloy using Ctrl + C. The reload and shutdown both went ok.

PR Checklist