WQ: worker does not seem to respect --idle-timeout
#4012
Comments
Hello Sander - We use the Two things to check:
Workers are started using the Parsl provider, and the
Getting back to this after the holiday break... I have been doing some testing here, and from my end, the idle-timeout feature works as expected. It is baked right into the main loop of the program, and it doesn't interact with any other features. When activated, the worker should send an
The only actions that reset the idle timer are:
So I am a bit puzzled as to what is happening.
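To make the description above concrete, here is a minimal sketch of how an idle timer can sit inside a worker's main loop. It is not taken from the cctools source: the names IdleTimer, run_worker_loop and poll_for_task are hypothetical, and the sketch assumes that handling a task is the only activity that resets the timer, which may not match the real worker.

```python
import time


class IdleTimer:
    """Hypothetical idle timer; illustrative only, not the cctools code."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_activity = clock()

    def reset(self):
        # Call this whenever the worker does something that counts as activity.
        self.last_activity = self.clock()

    def expired(self):
        # True once timeout_s seconds have passed since the last reset.
        return self.clock() - self.last_activity >= self.timeout_s


def run_worker_loop(timer, poll_for_task, max_iterations=1000):
    """Handle tasks until the idle timer expires; return the number handled.

    poll_for_task is a hypothetical callable that returns a zero-argument
    task, or None when no work is available.
    """
    handled = 0
    for _ in range(max_iterations):
        if timer.expired():
            break  # idle for too long: leave the loop so the worker can exit
        task = poll_for_task()
        if task is not None:
            task()
            handled += 1
            timer.reset()  # assumption: only task activity resets the timer
    return handled
```

The clock is injectable so the timeout logic can be checked without actually waiting for 20 seconds.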
How about this: Run your application again, and this time run a
If that times out as expected, then our problem likely lies in passing the command line properly through all the layers of Parsl, Slurm, etc.
If it does not time out as expected, then the problem likely lies in WQ, and the debug log will give us some insight into that.
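One way to run that check without watching the clock by hand is a small helper that reports how long a process took to exit, or that it never did. This is only a sketch: seconds_until_exit is a hypothetical helper, and STAND_IN_CMD is a placeholder rather than the worker command from this thread, which should be substituted in unchanged.

```python
import subprocess
import sys
import time

# Placeholder command for illustration; replace it with the real worker
# launch command, passed as a list of arguments.
STAND_IN_CMD = [sys.executable, "-c", "import time; time.sleep(0.2)"]


def seconds_until_exit(cmd, give_up_after_s):
    """Run cmd and return how many seconds it took to exit.

    Returns None if the process was still running after give_up_after_s
    seconds, in which case it is killed.
    """
    start = time.monotonic()
    proc = subprocess.Popen(cmd)
    try:
        proc.wait(timeout=give_up_after_s)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()  # reap the killed process
        return None
    return time.monotonic() - start
```

Calling seconds_until_exit with the worker command and a limit comfortably above the configured idle timeout separates the two cases: a number close to the timeout means the worker exited on its own, while None means it was still alive when the limit ran out.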
A worker was started using the following launch command:
Based on the --help output, I would think that this worker should quit after 20 seconds of not being given any task. However, the debug_log of the WQ process which accepts the incoming worker connections looks like this:
i.e. there is only one task running, but all the other workers remain alive and somehow don't disconnect due to an idle timeout. Is this intentional behavior? I would expect that if there's only one task running, the idling workers should be killed after some time.
This is cctools==7.11.0, used in combination with a WorkQueueExecutor from Parsl. I quickly checked in with @benclifford about this but he didn't seem to spot an immediate mistake on my end here.