Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

worker: limit the size of Queue in LoggingThread #249

Merged
merged 2 commits into from
Mar 14, 2024

Conversation

lzaoral
Copy link
Contributor

@lzaoral lzaoral commented Mar 12, 2024

This change fixes two issues:

  1. By limiting the maximum size of the Queue, we avoid consuming all
    available memory on given machine to store its contents.

  2. By limiting the amount of retrieved elements from the Queue, we avoid
    consuming all available memory to construct the byte string to send and
    we avoid stalling the upload of logs to the hub.

Also, log all skipped exceptions caught in LoggingThread to ease debugging.

@kdudka
Copy link
Contributor

kdudka commented Mar 12, 2024

Just to provide some context. We hit this bug with OpenScanHub, which calls kobo.shortcut.run() with buffer_size=2: https://github.com/openscanhub/openscanhub/blob/a39168977cca6f89799a32a50c02625e6a92800b/osh/worker/csmock_runner.py#L85

The outcome was that scanning tasks that produced big amount of output in a small period of time never finished and stayed in the OPEN state forever consuming 100% CPU on the worker despite the corresponding scan results were already uploaded to the hub.

Nevertheless, the problems being fixed may happen regardless of buffer_size, perhaps just with lower probability. I do not think that kobo should read output of unlimited size from a subprocess and put everything into memory without applying any bound.

kobo/worker/logger.py Outdated Show resolved Hide resolved
This change fixes two issues:

1. By limiting the maximum size of the Queue, we avoid consuming all
   available memory on given machine to store its contents.

2. By limiting the amount of retrieved elements from the Queue, we avoid
   consuming all available memory to construct the byte string to send and
   we avoid stalling the upload of logs to the hub.
@rohanpm rohanpm merged commit c5acba4 into release-engineering:master Mar 14, 2024
19 checks passed
@lzaoral lzaoral deleted the worker-logger-ratelimit branch March 15, 2024 09:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants