Add an e2e test that detects error logs during etcd bootstrap #17329
Comments
Hi, I'd like to take this up!
Thanks, feel free to reach out to me on Slack (get an invite) if you need any help.
#17423 covered a single-node cluster; we should also cover:

We can raise a new PR for it. WDYT?
SG, thanks
Also, when we are done, we should consider backporting the test.
Still remaining to be completed:

I think I can take this. /assign
Done 🙂 I think this can be closed now.
Awesome, thanks @ghouscht!
It looks like the test doesn't enable TLS by default. Should we also extend the test to cover the case where TLS is enabled?
Good point, I missed #17329 (comment).
Oh, I missed that as well, sorry! I'll have a look later on and will prepare another PR.
Something like #18819? Or did you have something different in mind?
What would you like to be added?
etcd should not write any error logs during normal operation. We could add a test that starts an etcd cluster, performs some simple operations, and verifies that no error-level logs were written.
This could be further improved by running the check in all e2e tests when the cluster/server is closed. Because some tests intentionally inject errors, we could make it a configurable option that is enabled by default and disabled in the tests where we expect errors to be logged.
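To make the proposal a bit more concrete, here is a rough Go sketch of the check itself. It assumes the server logs are JSON lines with a `level` field (etcd's structured logger output); `logEntry`, `checkOptions`, `AllowErrorLogs`, and `findErrorLogs` are hypothetical names made up for illustration and are not part of etcd's e2e framework. The sample log lines are invented as well.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// logEntry holds the only field this sketch inspects. It assumes
// JSON-structured log lines that carry a "level" key.
type logEntry struct {
	Level string `json:"level"`
}

// checkOptions is a hypothetical knob: the check runs by default, and a test
// that injects failures on purpose can switch it off.
type checkOptions struct {
	AllowErrorLogs bool
}

// findErrorLogs returns every line whose level is "error".
// Blank lines and lines that are not valid JSON are skipped.
func findErrorLogs(lines []string, opts checkOptions) []string {
	if opts.AllowErrorLogs {
		return nil
	}
	var found []string
	for _, line := range lines {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		var e logEntry
		if err := json.Unmarshal([]byte(line), &e); err != nil {
			continue
		}
		if strings.EqualFold(e.Level, "error") {
			found = append(found, line)
		}
	}
	return found
}

func main() {
	// Made-up sample lines for illustration only; not real etcd output.
	lines := []string{
		`{"level":"info","msg":"starting member"}`,
		`{"level":"error","msg":"something went wrong"}`,
		`{"level":"warn","msg":"slow request"}`,
	}
	fmt.Println(len(findErrorLogs(lines, checkOptions{})))
	fmt.Println(len(findErrorLogs(lines, checkOptions{AllowErrorLogs: true})))
}
```

In an actual e2e test, the lines would presumably come from the captured process output, and the test would fail if the returned slice is non-empty when the cluster is closed.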
Why is this needed?
Logs provide useful debugging information to users; however, mislabeled logs can cause users to lose trust and start ignoring errors. For error logs to be useful, they should be emitted only when something is actually wrong.
We should prevent issues like #17245, where a code change started generating a large number of error logs.
Proposed in #17249 (comment) with agreement between maintainers.