Add fatal LoggingThread error backout strategy #253

Merged

crungehottman
Member

In #169, fatal LoggingThread errors were logged to the worker's local log file before exiting.

In #248, a more drastic measure was taken: all exceptions were indefinitely retried and the ability to write exceptions to the worker's local log file was removed. This approach could prevent the LoggingThread from terminating when encountering a fatal error.

This commit combines the two approaches: the LoggingThread backs out and exits only once the error has proven to be persistent and fatal.

We distinguish a fatal LoggingThread error from a temporary, non-fatal one by retrying the upload_task_log method for a fixed interval (set by the LoggingThread's _timeout attribute).

If the method keeps failing for the whole interval (i.e., it has not succeeded by the time the timeout is reached), we consider the error fatal. At that point, we instead attempt to write the error to the worker's local log file, and then raise the exception.
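
The retry-then-back-out behaviour described above can be sketched roughly as follows. This is an illustration only, not the code from kobo/worker/logger.py: the function name `upload_with_backout`, its parameters, and the log message are hypothetical, and `upload` stands in for the call to upload_task_log.

```python
import time


def upload_with_backout(upload, timeout, interval=0.0, fallback_logger=None,
                        clock=time.monotonic, sleep=time.sleep):
    """Retry `upload` until it succeeds or `timeout` seconds have passed.

    Hypothetical helper: `upload` is any zero-argument callable and
    `fallback_logger` is any object with an `error(msg, *args)` method
    (for example a logging.Logger writing to the worker's local log file).
    """
    deadline = clock() + timeout
    while True:
        try:
            return upload()
        except Exception as exc:
            if clock() < deadline:
                # Still inside the interval: assume the error is temporary.
                sleep(interval)
                continue
            # The interval has elapsed without a success: treat as fatal.
            if fallback_logger is not None:
                try:
                    fallback_logger.error("Fatal error in LoggingThread: %s", exc)
                except Exception:
                    pass  # writing to the local log is best effort only
            raise  # re-raise the original upload error
```

Injecting `clock` and `sleep` is a design choice made here so the sketch can be exercised without real waiting; the actual implementation may be structured differently.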

Note that the timeout can be configured using the
KOBO_LOGGING_THREAD_TIMEOUT environment variable.
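
As a minimal sketch of how such a setting might be read, assuming the value is a whole number of seconds: the helper name `get_logging_thread_timeout` and the default of 600 are assumptions made for illustration, not values taken from this pull request.

```python
import os

# Hypothetical default; the value kobo actually uses is not stated here.
DEFAULT_TIMEOUT = 600


def get_logging_thread_timeout(environ=os.environ, default=DEFAULT_TIMEOUT):
    """Return the timeout in seconds from KOBO_LOGGING_THREAD_TIMEOUT.

    Falls back to `default` when the variable is unset or not an integer.
    """
    raw = environ.get("KOBO_LOGGING_THREAD_TIMEOUT")
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        return default
```

With this sketch, starting the worker with `KOBO_LOGGING_THREAD_TIMEOUT=30` in its environment would make the helper return 30.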

@crungehottman crungehottman force-pushed the logging-thread-retries branch 2 times, most recently from f589853 to 37a86eb Compare March 15, 2024 13:53
@crungehottman crungehottman merged commit 8c7fef3 into release-engineering:master Mar 18, 2024
19 checks passed
3 participants