Add example Flux jobscripts #589

Merged: 5 commits, Apr 8, 2024

jameshcorbett
Collaborator

As mentioned in #553, there are no Flux jobscripts. This PR adds them.

[WIP until I can test thoroughly.]

Problem: there is no SCR batch job script for Flux.

Add one.

Problem: the jobscript documentation does not mention
`scr_flux_run`.

Add a line describing it.

Problem: there is no example job script for running under Flux and
recovering from node failures.

Add one.

Problem: there is no documentation for the `scr_flux_run_loop.sh`
example script.

Mention it in the jobscripts README.

Problem: the Flux ResourceManager subclass returns a list of strings
from the 'down_nodes' method but the expected return value is a
mapping from strings (hostnames) to strings giving a reason for
why the node is down.

Return a mapping, as expected.
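
The list-versus-mapping distinction in the last commit message can be illustrated with a small sketch. This is not the actual SCR code: the helper name `down_nodes_as_mapping`, the sample hostnames, and the reason string are all hypothetical, and only the expected return shape (hostname to reason) comes from the commit message.

```python
from typing import Dict, Iterable


def down_nodes_as_mapping(hostnames: Iterable[str], reason: str = "down") -> Dict[str, str]:
    """Map each down hostname to a reason string (hypothetical helper)."""
    return {host: reason for host in hostnames}


# Before: a bare list of hostnames, which is what the commit message
# says the Flux subclass used to return.
down_list = ["node1", "node3"]

# After: a hostname -> reason mapping, which is what callers expect.
# The reason text here is a placeholder, not SCR's real wording.
down_map = down_nodes_as_mapping(down_list, reason="reported down by resource manager")
```

A caller that iterates over `down_map.items()` then gets both the hostname and the reason, whereas iterating over the old list would only yield hostnames.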
@jameshcorbett marked this pull request as ready for review April 8, 2024 02:21
@jameshcorbett
Collaborator Author

Ok it took a lot of trial and error (some nodes just wouldn't die on command) but I finally got a solid test of this and it seems to work! Ready for review now @mcfadden8

Collaborator

@mcfadden8 left a comment

Nice work @jameshcorbett! This looks great.

@mcfadden8 merged commit ea8f98a into LLNL:develop Apr 8, 2024
@jameshcorbett deleted the flux-jobscripts branch April 8, 2024 20:50