Sensible log behavior when redis is unavailable #15466
SUMMARY
This picks up on the "redis" part of a prior PR #12698 and goes further.
That prior PR was too unfocused, trying to solve both the redis and receptor problems. Backing up: why am I looking at this problem in the first place? Because it personally gets in my way when I try to diagnose other problems. When I look at logs, they are swamped for 2 main reasons:
This PR is only concerned with the 2nd bullet point.
So when you get the log file you want (finally, after wading through the rest of the SOS report), you find that 95% of that log is stuff you don't want. Even worse, the noise is all "Traceback:" entries, which is a real problem when the thing you are looking for is itself a stack trace in that format.
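One way to stop a well-known failure from adding yet another "Traceback:" entry is to catch that specific exception and log a single line, keeping full tracebacks (Python's logger.exception) for errors nobody expected. The following is a minimal sketch under stated assumptions, not the project's actual code: RedisUnavailable, stash_stats, the write callable and the logger name are all hypothetical, and RedisUnavailable merely plays the role of a redis client's connection error.

```python
import json
import logging

# Hypothetical logger name, for illustration only.
logger = logging.getLogger("dispatcher.stats")


class RedisUnavailable(ConnectionError):
    """Hypothetical stand-in for a redis client's connection error."""


def stash_stats(write, stats):
    """Try to store statistics via `write`; return True on success.

    `write` is a hypothetical callable that pushes a string payload to redis.
    """
    payload = json.dumps(stats, sort_keys=True)
    try:
        write(payload)
        return True
    except RedisUnavailable as exc:
        # Known, expected failure: one line describing the error (no traceback),
        # then the data we are about to drop, so it is not lost silently.
        logger.error("Could not stash dispatcher statistics in redis: %s", exc)
        logger.error("Dropping statistics: %s", payload)
        return False
    except Exception:
        # Unexpected failure: keep the full traceback, since it may be a real bug.
        logger.exception("Unexpected error while stashing dispatcher statistics")
        return False


if __name__ == "__main__":
    def broken_write(payload):
        raise RedisUnavailable("Connection refused")

    logging.basicConfig(level=logging.INFO)
    # Emits two one-line ERROR messages and no traceback.
    stash_stats(broken_write, {"workers": 4, "queued": 0})
```

The design choice is to keep the except clause narrow: only the expected connection failure is downgraded to a one-line message plus the dropped payload, so a genuinely new bug (for example, in connection pool handling) still surfaces with its full traceback.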
With that segue, here's a demo of the log behavior after taking redis down:
This is still noisy in one respect, but that respect is actually an interesting one.
The dispatcher "statistics" are used for the --status command. So if we can't stash the statistics into redis, what do we do? Before, we would drop that data on the floor and then log a giant stack trace. But since redis connection errors are a very well-known quantity, it is better to show the details of the error and then print the data that we're dropping. Right? There's a non-zero chance that we have a pool management bug while also hitting this, which would confound debugging even more.
ISSUE TYPE
COMPONENT NAME