2 approaches to the async problem

1. The comprehensive one, which would enable all the jobs to be actual Slurm jobs with dependencies that can be submitted ahead of time (see the sketch after this list). This would allow:
   - all the current configurations to run async
   - users to write arbitrary scripts and turn them automatically into such jobs
2. The targeted approach, where we only focus on some jobs of interest. This would have the following limitations:
   - only jobs that explicitly submit a corresponding Slurm job can run async
   - this also implies that any job in an async config must be implemented this way, including user-provided jobs
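To make the "submitted ahead of time" idea concrete, here is a minimal sketch of chaining Slurm jobs through `--dependency=afterok`, assuming `sbatch` is on the path; the `submit` helper and the script names are illustrative, not part of the current code base.

```python
import subprocess

def submit(script, dependencies=None):
    """Submit a batch script and return its Slurm job id."""
    cmd = ["sbatch", "--parsable"]  # --parsable makes sbatch print only the id
    if dependencies:
        cmd.append("--dependency=afterok:" + ":".join(dependencies))
    cmd.append(script)
    return subprocess.run(
        cmd, capture_output=True, text=True, check=True
    ).stdout.strip()

# Both jobs are queued immediately; Slurm holds the second
# until the first completes successfully.
prep_id = submit("prepare_data.job")
icon_id = submit("icon.job", dependencies=[prep_id])
```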
Implications
Although approach 1 seems more appealing, it implies a lot of refactoring, as it contradicts many design choices. In particular, the current structure assumes that:
- jobs have access to a single running Python interpreter and its memory
- jobs run sequentially
In order to generate Slurm jobs out of any of these jobs, we'd need to ensure that:
- the job's Python module is transferred to the working directory along with all its imported modules
- the configuration objects are dumped to files in that working directory

all of this knowing that some of the jobs currently act one after the other in the same directory... A sketch of what this staging could look like follows.
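Here is a hypothetical sketch of that staging, assuming the configuration object is picklable; `stage_job` and all file names are made up for this example, and transferring the module's own imports (the hard part) is glossed over.

```python
import pickle
import shutil
from pathlib import Path

def stage_job(job_module_path, cfg, workdir):
    """Stage one job so a fresh interpreter in a Slurm job can run it."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    module = Path(job_module_path).stem

    # 1. transfer the job module to the working directory
    #    (its imported helper modules would need the same treatment)
    shutil.copy(job_module_path, workdir)

    # 2. dump the configuration object to a file in that directory
    with open(workdir / "cfg.pkl", "wb") as f:
        pickle.dump(cfg, f)

    # 3. write a tiny runner that reloads both in a new interpreter
    (workdir / "run_job.py").write_text(
        "import pickle\n"
        f"import {module} as job\n"
        "with open('cfg.pkl', 'rb') as f:\n"
        "    cfg = pickle.load(f)\n"
        "job.main(cfg)\n"
    )

    # 4. write the batch script that Slurm will execute
    script = workdir / "job.sh"
    script.write_text(
        "#!/bin/bash\n"
        f"#SBATCH --job-name={module}\n"
        f"cd {workdir}\n"
        "python run_job.py\n"
    )
    return script
```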
Proposed roadmap

In conclusion, here is the proposed roadmap:
1. Target the jobs of interest, namely `icon` and `prepare_data`, and make their `main` function return the job id(s) that they submitted.
2. Implement the dependency mechanism in `run_chain`, with an error when trying to run async with jobs not ready for it (see the sketch below).
3. Later: make as many jobs async as possible so that configurations other than `icon` can be made async. Most of the work is in `prepare_data`: it will need to be either broken into pieces or equipped with error messages for all the pieces not ready for async yet.
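A minimal sketch of how points 1 and 2 could fit together, assuming each async-ready job's `main` accepts a `dependencies` argument and returns the id(s) of the Slurm job(s) it submitted; the exact signatures are assumptions, not the current API.

```python
def run_chain(jobs, cfg, is_async=True):
    """Run the chain, threading Slurm job ids through as dependencies."""
    dep_ids = []
    for job in jobs:
        if is_async:
            # An async-ready job submits its Slurm job(s) with a
            # --dependency on dep_ids and returns the new id(s).
            ids = job.main(cfg, dependencies=dep_ids)
            if not ids:
                raise RuntimeError(
                    f"job '{job.__name__}' is not ready for async execution"
                )
            dep_ids = ids  # the next job waits on these
        else:
            job.main(cfg)  # current, sequential behaviour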