-
Notifications
You must be signed in to change notification settings - Fork 254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wait for all threads to finish on parallel runner #535
base: master
Are you sure you want to change the base?
Conversation
If there's an error in one thread, we wait for earlier hosts to complete but not for later ones. Ensure consistent behaviour across all hosts by saving the exception and joining each thread in turn before returning.
private | ||
|
||
def wait_for_threads(threads) | ||
exception = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps storing all of the exceptions would be more informative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would be a nice improvement - if something fails on multiple hosts it can be tempting to assume the problem is specific just to the host that the error is reported on.
We'd need a way to do it without breaking the current API though which is an ExecuteError with a single cause. We could store multiple causes I guess and have ExecuteError#cause return the first one?
But I think that would be something for another PR.
Hi @djmb , could you give me a little more backstory on this proposal? My interpretation is that SSHKit's parallel runner, in its current form, is effectively "fail-fast". In other words, as soon a host fails, an exception is raised right away. If the exception is not rescued, then the Ruby process exits and any subsequent hosts are not allowed to run to completion. The change being proposed in this PR is to remove "fail-fast" for I hesitate to accept this PR as-is because this seems like a significant change that could be considered breaking. Although I was not present for the original creation of SSHKit, I have to assume that the fail-fast behavior was an intentional design decision. Are there other solutions that wouldn't involve changing the default behavior? Could you register a custom runner for your particular use case? For example, you should be able to set the default using: SSHKit.config.default_runner = MyRunner Or as a one-off: on(hosts, in: MyRunner) do
# ...
end Would either of those work? |
HI @mattbrictson, My reasoning for the change is that the current behaviour is not really fail fast - the parallel runner joins the threads in turn so if the second host thread has an exception, we wait for the first host thread, but not the third. Since we sometimes wait and sometimes do not, it seems more consistent to always wait. However always waiting is also the behaviour that I'm looking for, so maybe there's some motivated reasoning going on here 😅. Using the default runner doesn't work completely in my case, as the group runner uses the parallel runner internally so I need create a custom parallel runner and a custom group runner. But I can certainly work around this if we want to keep the current behaviour. |
If there's an error in one thread, we wait for earlier hosts to complete but not for later ones.
Ensure consistent behaviour across all hosts by saving the exception and joining each thread in turn before returning.