[Feature request]: HPC Resource Estimation Tooling #445

TimothyWillard · 2025-01-06T16:24:48Z

Label

batch, enhancement

Priority Label

medium priority

Is your feature request related to a problem? Please describe.

Users often have a difficult time estimating the HPC resources (time, CPU, memory) required for a batch job. There are documented rules of thumb, but time and memory in particular depend heavily on the model itself. CPU is less of an issue, since constraints of the inference method usually determine it.

Is your feature request related to a new application, scenario round, pathogen? Please describe.

No response

Describe the solution you'd like

Add an --estimate flag to the future flepimop batch command (described in GH-440) that submits a small number of sample jobs while varying the underlying parameters (for example, with flepimop batch calibrate, the number of chains and samples would be varied). The time and peak memory usage of each job would be measured and used to fit a multivariate linear regression, and the 95% prediction interval would then serve as the bounds on time and memory for a larger job. An example might look like:

# --estimate=time is specifically for time, but could leave empty for time & memory
$ flepimop batch --estimate=time --vary=chains:2,4,6,8 --vary=iterations:100,200,300,400
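
To make the estimation step concrete, here is a minimal sketch of the regression idea using only the Python standard library. Everything in it is an assumption for illustration: `parse_vary`, `solve`, and `fit_and_bound` are hypothetical helpers, not part of flepimop; the timings are synthetic placeholders, not measurements; and the interval uses a normal quantile in place of Student's t, which understates the bound somewhat for so few sample jobs.

```python
import itertools
import math
from statistics import NormalDist


def parse_vary(spec):
    """Turn a hypothetical --vary value like 'chains:2,4,6,8' into ('chains', [2, 4, 6, 8])."""
    name, _, values = spec.partition(":")
    return name, [int(v) for v in values.split(",")]


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_and_bound(rows, y, new_row, level=0.95):
    """Fit y ~ intercept + predictors by ordinary least squares.

    Returns (point prediction, upper end of the two-sided prediction interval)
    for new_row.
    """
    X = [[1.0, *r] for r in rows]
    x0 = [1.0, *new_row]
    n, p = len(X), len(x0)
    xtx = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)] for i in range(p)]
    xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    beta = solve(xtx, xty)
    resid = [y[k] - sum(b * v for b, v in zip(beta, X[k])) for k in range(n)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance
    leverage = sum(a * b for a, b in zip(x0, solve(xtx, x0)))  # x0' (X'X)^-1 x0
    se = math.sqrt(s2 * (1.0 + leverage))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation to the t quantile
    pred = sum(b * v for b, v in zip(beta, x0))
    return pred, pred + z * se


# Expand the varied parameters into the grid of sample jobs.
vary = dict(parse_vary(s) for s in ["chains:2,4,6,8", "iterations:100,200,300,400"])
grid = list(itertools.product(*vary.values()))

# Synthetic placeholder timings in seconds (NOT real measurements).
seconds = [30 + 5 * c + 0.2 * i + ((c * i) % 7 - 3) for c, i in grid]

# Extrapolate to a larger job: 16 chains, 1000 iterations.
pred, upper = fit_and_bound(grid, seconds, (16, 1000))
print(f"predicted {pred:.0f}s, request up to {upper:.0f}s")
```

The same fit could be repeated with peak memory as the response. Whether a purely additive model in chains and iterations is adequate would need checking against real runs.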

I don't particularly like the --vary syntax as written above, so I'm open to suggestions. We also need to think about how the results will be output. We could:

1. Produce a CSV/JSON file that can be read in by the batch command,
2. Output a modified configuration file with changes that encode this information, or
3. Something else entirely?

I think the answer to the above, as well as the exact syntax to use, will become clearer and more natural after implementing GH-440, so it's not worth thinking too hard about at the moment. Blocked by GH-440.


pearsonca · 2025-01-06T16:38:11Z

Hmm - do we want --estimate to be a batch option specifically, or should it be usable without batch? It seems potentially like one of the action module ABC elements -- like a simulate module should say how it implements doing an estimate.

Agree it's mostly entangled with batch, so it might be best to have it work that way (then action modules don't have to implement their own estimation; batch does it for them by running them with some diagnostic infrastructure).

I think I've talked myself into having it be part of batch, but it seems worth bringing up some alternatives.

Re outputs: yeah, 90% option 1 - this is mostly about getting data out for use by batch. We should also print a usefully formatted version to stdout for users, but we should be thinking first about how this would be consumed by batch.

TimothyWillard added the enhancement, batch, and medium priority labels on Jan 6, 2025
