Wait for all threads to finish on parallel runner #535

djmb · 2024-04-04T10:14:54Z

If there's an error in one thread, we wait for earlier hosts to complete but not for later ones.

Ensure consistent behaviour across all hosts by saving the exception and joining each thread in turn before returning.
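
As a rough illustration of "save the exception and join each thread in turn", here is a minimal standalone sketch in plain Ruby. The helper name `join_all_then_raise` is hypothetical and the body is an assumption about the general approach, not the code in this PR.

```ruby
# Hypothetical helper: join every thread, remember the first failure,
# and re-raise it only once all threads have finished.
def join_all_then_raise(threads)
  exception = nil
  threads.each do |thread|
    begin
      thread.join # Thread#join re-raises an exception that ended the thread
    rescue StandardError => e
      exception ||= e # keep only the first failure seen
    end
  end
  raise exception if exception
end

# Three stand-in "hosts": the second fails, the third outlives the failure.
threads = [
  Thread.new { sleep 0.05 },
  Thread.new do
    Thread.current.report_on_exception = false # keep stderr quiet for the demo
    raise "host2 failed"
  end,
  Thread.new { sleep 0.1 }
]

raised = begin
  join_all_then_raise(threads)
  nil
rescue RuntimeError => e
  e
end

all_finished = threads.none?(&:alive?)
```

With this shape, the failure is still reported, but only after every thread has been joined, so later hosts are treated the same as earlier ones.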

Sija · 2024-04-04T11:05:35Z

lib/sshkit/runners/parallel.rb

+      private
+
+      def wait_for_threads(threads)
+        exception = nil


Perhaps storing all of the exceptions would be more informative?

djmb · 2024-04-04

I think that would be a nice improvement - if something fails on multiple hosts it can be tempting to assume the problem is specific just to the host that the error is reported on.

We'd need a way to do it without breaking the current API, though, which is an ExecuteError with a single cause. We could store multiple causes, I guess, and have ExecuteError#cause return the first one?

But I think that would be something for another PR.
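
To make the "multiple causes, with #cause returning the first" idea concrete, here is a small standalone sketch. `MultiCauseError` and its `causes` reader are hypothetical names for illustration only; this is not SSHKit's ExecuteError and does not reflect any agreed design.

```ruby
# Hypothetical error class: keeps every underlying failure but still
# answers #cause with a single exception, as existing callers expect.
class MultiCauseError < StandardError
  attr_reader :causes

  def initialize(causes)
    @causes = causes
    super("#{causes.size} host(s) failed: #{causes.map(&:message).join('; ')}")
  end

  # Backwards-compatible accessor: the first recorded failure.
  def cause
    causes.first
  end
end

error = MultiCauseError.new([
  RuntimeError.new("host1 failed"),
  IOError.new("host3 failed")
])

first_cause_message = error.cause.message
all_cause_messages = error.causes.map(&:message)
```

Callers that only read `cause` would keep working, while new code could inspect `causes` to see whether a failure was specific to one host.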

mattbrictson · 2024-04-08T23:38:03Z

Hi @djmb , could you give me a little more backstory on this proposal?

My interpretation is that SSHKit's parallel runner, in its current form, is effectively "fail-fast". In other words, as soon as a host fails, an exception is raised right away. If the exception is not rescued, then the Ruby process exits and any subsequent hosts are not allowed to run to completion.

The change being proposed in this PR is to remove "fail-fast" for in: :parallel execution and instead explicitly wait for all hosts to run to completion. Once all hosts are allowed to complete, then an exception is raised corresponding to the first failure. (However, if in: :groups is used, the first group that fails would still short-circuit subsequent groups and prevent them from running.)

I hesitate to accept this PR as-is because this seems like a significant change that could be considered breaking. Although I was not present for the original creation of SSHKit, I have to assume that the fail-fast behavior was an intentional design decision.

Are there other solutions that wouldn't involve changing the default behavior? Could you register a custom runner for your particular use case? For example, you should be able to set the default using:

SSHKit.config.default_runner = MyRunner

Or as a one-off:

on(hosts, in: MyRunner) do
  # ...
end

Would either of those work?

djmb · 2024-04-15T15:25:04Z

Hi @mattbrictson,

My reasoning for the change is that the current behaviour is not really fail fast - the parallel runner joins the threads in turn so if the second host thread has an exception, we wait for the first host thread, but not the third.

Since we sometimes wait and sometimes do not, it seems more consistent to always wait.
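
The effect of joining in turn can be reproduced with plain Ruby threads. The sketch below is a standalone illustration, not SSHKit code; the three threads stand in for hosts, and the `gate` queue is only there to keep the third one busy in a deterministic way.

```ruby
# Standalone illustration using plain Ruby threads; no SSHKit code is involved.
gate = Queue.new # holds the third "host" open until we have inspected it

threads = [
  Thread.new { sleep 0.1 }, # host 1: succeeds after a short delay
  Thread.new do             # host 2: fails immediately
    Thread.current.report_on_exception = false # keep stderr quiet for the demo
    raise "host2 failed"
  end,
  Thread.new { gate.pop }   # host 3: still busy until released
]

error = nil
begin
  threads.each(&:join) # join in turn; Thread#join re-raises a thread's exception
rescue RuntimeError => e
  error = e
end

host1_finished = !threads[0].alive?      # we waited for the earlier host
host3_still_running = threads[2].alive?  # but not for the later one

gate << :release # let host 3 finish so the demo exits cleanly
threads[2].join
```

Even though host 2 fails first, the exception only surfaces after host 1 has been joined, and host 3 is never waited for.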

However always waiting is also the behaviour that I'm looking for, so maybe there's some motivated reasoning going on here 😅.

Using the default runner doesn't work completely in my case, as the group runner uses the parallel runner internally, so I need to create a custom parallel runner and a custom group runner. But I can certainly work around this if we want to keep the current behaviour.

Wait for all threads to finish on parallel runner

f076660

djmb mentioned this pull request Apr 4, 2024

Remove the healthcheck step basecamp/kamal#740

Merged

Sija reviewed Apr 4, 2024
