Scaler fails only when failing to get counts from all the interceptor endpoints #903
Provide a description of what has been changed
We observe that the scaler fails and exits its loop when it fails to get counts from any single interceptor replica.

I am not sure this is the intended behavior, but sometimes an interceptor replica is down only because it runs on a spot node. When that node goes away and the endpoints of the interceptor service have not been updated yet, the scaler still tries to fetch from an endpoint that no longer exists, even though the killed interceptor pod will usually recover on its own.
Checklist
README.md
docs/ directory

Fixes #
The change makes the scaler fail only when fetching the counts from all interceptor endpoints has failed.
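
For illustration, here is a minimal Go sketch of the "fail only when every endpoint fails" aggregation this PR aims for. The `fetchCount` type, `aggregateCounts` function, and plain string endpoints are hypothetical stand-ins, not the actual scaler code:

```go
package main

import (
	"errors"
	"fmt"
)

// fetchCount is a hypothetical stand-in for the call that asks a single
// interceptor replica for its pending request count.
type fetchCount func(endpoint string) (int, error)

// aggregateCounts queries every interceptor endpoint and sums the counts.
// Partial failures (e.g. a replica on a terminated spot node) are tolerated;
// an error is returned only when every endpoint failed.
func aggregateCounts(endpoints []string, fetch fetchCount) (int, error) {
	var (
		total   int
		errs    []error
		success bool
	)
	for _, ep := range endpoints {
		count, err := fetch(ep)
		if err != nil {
			errs = append(errs, fmt.Errorf("endpoint %s: %w", ep, err))
			continue
		}
		total += count
		success = true
	}
	if !success && len(endpoints) > 0 {
		// Only fail when no interceptor replica could be reached.
		return 0, errors.Join(errs...)
	}
	return total, nil
}
```

With this shape, a single unreachable replica only reduces the reported count for that cycle instead of aborting the whole scaling loop.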
Comment:
I am new to this codebase and am not sure whether the existing version is the intended behavior. Please let me know if there is a better way to handle this, or if it can already be handled by a config value I am not aware of. Appreciated.