[Feature request]: HPC Resource Estimation Tooling #445

Open
TimothyWillard opened this issue Jan 6, 2025 · 1 comment
Labels
batch Relating to batch processing. enhancement Request for improvement or addition of new feature(s). medium priority Medium priority.

Comments

@TimothyWillard
Contributor

TimothyWillard commented Jan 6, 2025

Label

batch, enhancement

Priority Label

medium priority

Is your feature request related to a problem? Please describe.

Users often have a difficult time estimating the HPC resources (time/CPU/memory) required for a batch job. There are useful rules of thumb that have been documented, but time and memory in particular depend heavily on the model itself. CPU is less of a problem, since constraints on the inference method usually determine it.

Is your feature request related to a new application, scenario round, pathogen? Please describe.

No response

Describe the solution you'd like

Add an --estimate flag to the future flepimop batch command (described in GH-440) that submits a small number of sample jobs while varying the underlying parameters (for example, with flepimop batch calibrate the chains and samples would be varied). The time and peak memory usage of each sample job would be measured and used to fit a multivariate linear regression, and the 95% prediction interval would then serve as the bounds on time and memory for a larger job. An example might look like:

# --estimate=time is specifically for time, but could leave empty for time & memory
$ flepimop batch --estimate=time --vary=chains:2,4,6,8 --vary=iterations:100,200,300,400
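As a rough illustration of the fit-then-predict idea described above, here is a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration and not part of flepimop: the helper names (`parse_vary`, `fit`, `predict_interval`), the way a `--vary` value is parsed, and the timings, which are synthetic stand-ins for measurements that would really come from the sample jobs. It also uses a normal quantile in place of the Student-t quantile, which makes the interval slightly too narrow when only a few sample jobs are run.

```python
import itertools
import math
import random
from statistics import NormalDist


def parse_vary(spec):
    """Parse a hypothetical --vary value such as 'chains:2,4,6,8'."""
    name, _, values = spec.partition(":")
    return name, [int(v) for v in values.split(",")]


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(rows, y):
    """Ordinary least squares with an intercept."""
    x = [[1.0, *row] for row in rows]
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(p)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(x, y)]
    sigma2 = sum(e * e for e in resid) / (len(y) - p)
    return beta, sigma2, xtx


def predict_interval(model, row, level=0.95):
    """Point prediction and approximate prediction interval for one new job."""
    beta, sigma2, xtx = model
    x0 = [1.0, *row]
    mean = sum(b * v for b, v in zip(beta, x0))
    # Leverage x0' (X'X)^-1 x0, obtained by solving (X'X) w = x0.
    leverage = sum(a * b for a, b in zip(x0, solve(xtx, x0)))
    # Normal quantile as a stdlib-only stand-in for the Student-t quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(sigma2 * (1.0 + leverage))
    return mean - half, mean, mean + half


specs = ["chains:2,4,6,8", "iterations:100,200,300,400"]
names, levels = zip(*(parse_vary(s) for s in specs))
grid = list(itertools.product(*levels))  # 16 sample jobs

# Synthetic wall times in seconds, NOT real measurements.
rng = random.Random(0)
times = [30 + 5 * c + 0.4 * i + rng.gauss(0, 2) for c, i in grid]

model = fit(grid, times)
low, mid, high = predict_interval(model, (16, 2000))
print(f"estimated time: {mid:.0f}s (95% PI {low:.0f}s to {high:.0f}s)")
```

For a time limit only the upper end of the interval matters, and predicting far outside the sampled grid (as the 16 chains and 2000 iterations here do) leans heavily on the assumption that the cost really is linear in the varied parameters.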

I don't particularly like the --vary syntax as written above, so I'm open to suggestions. We also need to think about how the results will be output. We could:

  1. Produce a CSV/JSON that can be read in by the batch command,
  2. Output a modified configuration file with the changes made to it that encode this information, or
  3. Something else entirely?
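To make option 1 a bit more concrete, here is one possible shape for a JSON file of estimates that a batch command could read back in. The `ResourceEstimate` record, its field names, and the numbers are all placeholders invented for this sketch, not an agreed format or real results.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ResourceEstimate:
    """Hypothetical record for one estimated resource."""

    resource: str  # e.g. "time" or "memory"
    unit: str  # e.g. "seconds" or "MiB"
    lower: float  # lower end of the prediction interval
    upper: float  # upper end of the prediction interval
    level: float = 0.95  # coverage level of the interval


def dump_estimates(estimates):
    """Serialize estimates to a JSON string."""
    return json.dumps({"estimates": [asdict(e) for e in estimates]}, indent=2)


def load_estimates(text):
    """Parse the JSON string produced by dump_estimates."""
    return [ResourceEstimate(**item) for item in json.loads(text)["estimates"]]


# Placeholder values for illustration only.
estimates = [
    ResourceEstimate("time", "seconds", 850.0, 1010.0),
    ResourceEstimate("memory", "MiB", 1800.0, 2400.0),
]
text = dump_estimates(estimates)
restored = load_estimates(text)
print(text)
```

A flat record like this would also map directly onto CSV rows if that turns out to be the more convenient format.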

I think the answer to the above, as well as the exact syntax to use, will become clearer and more natural after implementing GH-440, so it's not worth thinking too hard about at the moment. Blocked by GH-440.

@TimothyWillard TimothyWillard added enhancement Request for improvement or addition of new feature(s). batch Relating to batch processing. medium priority Medium priority. labels Jan 6, 2025
@pearsonca
Contributor

Hmm - do we want --estimate to be a batch option specifically, or should it be usable without batch? It seems potentially like one of the action module ABC elements -- like a simulate module should say how it implements doing an estimate.

Agree it's mostly entangled with batch, so it might be best to have it work that way (and then action modules don't have to implement their own estimations - batch does it for them by running them with some diagnostic infrastructure).

I think I've talked myself into having it be part of batch, but it seems worth bringing up some alternatives.
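For comparison, the two alternatives could be sketched roughly as below. None of these names exist in flepimop: `ActionModule`, `estimate`, `measure`, and `ToySimulate` are hypothetical and only illustrate the split between a module supplying its own estimate and batch measuring a sample run on the module's behalf. The measurement uses `tracemalloc`, which only sees Python-level allocations in the current process, so it stands in for whatever diagnostics a real job would report.

```python
import time
import tracemalloc
from abc import ABC, abstractmethod


class ActionModule(ABC):
    """Hypothetical base class for an action such as simulate or calibrate."""

    @abstractmethod
    def run(self, **params):
        """Do the actual work for the given parameters."""

    def estimate(self, **params):
        """Alternative A: a module may override this with its own estimate."""
        raise NotImplementedError


def measure(module, **params):
    """Alternative B: batch runs a sample job and measures it itself."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        module.run(**params)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return {"seconds": elapsed, "peak_bytes": peak}


class ToySimulate(ActionModule):
    """Stand-in module that only implements run()."""

    def run(self, **params):
        return sum(i * i for i in range(params.get("iterations", 1000)))


sample = measure(ToySimulate(), iterations=10_000)
print(sample)
```

With the second approach a module such as `ToySimulate` needs nothing beyond `run`, which matches the point that action modules would not have to implement their own estimation.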

Re outputs: yeah, 90% option 1 - this is mostly about getting data out for use by batch. We should also print a usefully formatted version of it to stdout for users, but the design should start from how this would be consumed by batch.

2 participants