Make (part of) processing-chain async #51

Open
leclairm opened this issue Nov 24, 2023 · 0 comments · Fixed by #52 or #54

leclairm commented Nov 24, 2023

Two approaches to the async problem

  1. The comprehensive one, which would turn all the jobs into actual Slurm jobs with dependencies that can be submitted ahead of time. This would allow:
    • all the current configurations to run async
    • users to write arbitrary scripts and have them turned automatically into such jobs.
  2. The targeted approach where we only focus on some jobs of interest. This would have the following limitations:
    • Only jobs that explicitly submit a corresponding slurm job can run async
    • This also implies that every job in an async config must be implemented this way, including user-provided jobs.

Implications

Although approach 1 seems more appealing, it implies a lot of refactoring because it contradicts many of the current design choices. In particular, the current structure assumes that

  • jobs have access to the single running Python interpreter and its memory
  • jobs run sequentially

To generate Slurm jobs out of any of these jobs, we'd need to ensure that

  • the job python module is transferred to the working directory along with all its imported modules
  • the configuration objects are dumped to files in that working directory

all of this knowing that some of the jobs currently act one after another in the same directory...
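
To make the second requirement concrete, here is a minimal sketch of dumping a configuration object to the working directory and reading it back from a separately started job. The `Config` class, its fields, and the `config.json` file name are assumptions made up for illustration; the real configuration objects may hold things that JSON cannot serialize directly.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Config:
    """Hypothetical stand-in for the chain's configuration object."""
    casename: str
    startdate: str
    work_dir: str


def dump_config(cfg: Config, work_dir: str) -> Path:
    """Write the config as JSON into the job's working directory."""
    path = Path(work_dir) / "config.json"  # file name is an assumption
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(asdict(cfg), indent=2))
    return path


def load_config(path: Path) -> Config:
    """Rebuild the config inside a job that runs in its own interpreter."""
    return Config(**json.loads(Path(path).read_text()))
```

A job script submitted to Slurm would then call `load_config` on the file in its working directory instead of relying on objects held in the memory of the launching interpreter.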

Proposed Road map

In conclusion, here is the proposed road map:

  1. Target the jobs of interest, namely icon and prepare_data, and make their main functions return the job id(s) that they submitted.
  2. Implement the dependency mechanism in run_chain, raising an error when an async run includes jobs that are not ready for it.
  3. Later: make as many jobs async as possible so that configs other than icon can be made async.

Most of the work is in prepare_data. It will need to be either broken into pieces or equipped with error messages for all the pieces that are not ready for async yet.
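
Steps 1 and 2 of the road map could look roughly like the sketch below. It only assumes that each async-ready job is a callable that submits its Slurm job(s) and returns their ids; the names `run_chain_async`, `dependency_flag` and `NotAsyncReadyError` are hypothetical and not taken from the code base. The only Slurm feature relied on is the `--dependency=afterok:<id>[:<id>...]` option of `sbatch`.

```python
from typing import Callable, Dict, List, Optional


class NotAsyncReadyError(RuntimeError):
    """Raised when an async run includes a job that cannot run async yet."""


def dependency_flag(job_ids: List[int]) -> Optional[str]:
    """Return the sbatch dependency option for the given job ids, if any."""
    if not job_ids:
        return None
    return "--dependency=afterok:" + ":".join(str(i) for i in job_ids)


def run_chain_async(
    job_names: List[str],
    dependencies: Dict[str, List[str]],
    async_jobs: Dict[str, Callable[[Optional[str]], List[int]]],
) -> Dict[str, List[int]]:
    """Submit jobs in order, wiring each one to the ids of its dependencies.

    job_names must list every job after the jobs it depends on.
    """
    # Fail before anything is submitted if a job is not async-ready.
    not_ready = [name for name in job_names if name not in async_jobs]
    if not_ready:
        raise NotAsyncReadyError(
            "jobs not ready for async: " + ", ".join(not_ready)
        )

    submitted: Dict[str, List[int]] = {}
    for name in job_names:
        # Collect the ids returned by all jobs this one depends on.
        dep_ids = [
            jid for dep in dependencies.get(name, []) for jid in submitted[dep]
        ]
        # Each job receives the dependency option and returns its own ids.
        submitted[name] = async_jobs[name](dependency_flag(dep_ids))
    return submitted
```

Checking readiness before the first submission avoids leaving a half-submitted chain in the queue when a later job turns out not to support async.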

@leclairm leclairm linked a pull request Nov 28, 2023 that will close this issue
5 tasks
@mjaehn mjaehn linked a pull request Jan 29, 2024 that will close this issue
18 tasks