Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add --fail-fast #38

Merged
merged 7 commits into from
Nov 27, 2024
Merged

Add --fail-fast #38

merged 7 commits into from
Nov 27, 2024

Conversation

lgarrison
Copy link
Member

This adds the --fail-fast option to exit promptly if any task fails. It seems to work, but I have low confidence that this is the right implementation. In particular, engines with pending tasks fail currently, rather than gracefully closing. I'm not even 100% sure what the desired semantics are—do we want to wait for running tasks to finish, or kill them immediately?

@lgarrison lgarrison requested a review from njcarriero November 21, 2024 01:22
@lgarrison lgarrison linked an issue Nov 21, 2024 that may be closed by this pull request
@njcarriero
Copy link
Collaborator

Looks straightforward. The only thing I would recommend is adding a comment (which I should have done in the code fragment on which I am guessing this is modeled, around 1580) to the effect that "break" at 1532 in this context breaks out of the main driver control loop and drops into the exit code.

Thanks!

@lgarrison
Copy link
Member Author

I wasn't sure whether to be worried about the fact that disBatch exits with a non-zero exit code when failing fast (along with the message Some engine processes failed -- please check the logs). Normally a task failure would only cause a non-zero exit code with -e. But the current fail-fast behavior is to interrupt running tasks, so maybe the user does want to be notified that this has happened.

I think what I'm leaning towards is keeping the exit code and probably the Some engine processes failed message, and adding one more stderr message that says disBatch quit early due to task failure with --fail-fast or something like that. With just the former error message, it looks like an internal failure unless one goes spelunking in the logs for the "fail fast" message.

@lgarrison lgarrison merged commit 3879bd2 into master Nov 27, 2024
7 checks passed
@lgarrison lgarrison deleted the fail-fast branch November 27, 2024 20:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

option to quit if any task fails
2 participants