[Feature request]: introduce a `process` action + associated configuration #454

pearsonca · 2025-01-08T16:03:02Z

Label

enhancement, meta/workflow, post-processing

Priority Label

medium priority

Is your feature request related to a problem? Please describe.

Users currently perform post-processing steps (e.g. rendering notebooks, running transformation scripts) manually, outside of the pipeline. To make these steps more efficient for power users and more accessible for lay users, flepimop should support post-processing within the pipeline.

Is your feature request related to a new application, scenario round, pathogen? Please describe.

No response

Describe the solution you'd like

Processing steps are likely to be highly specific to any given analysis, so this is not a request to implement lots of generalized code for post-processing.

Rather, flepimop should support a workflow like:

$ flepimop simulate someconfig.yml # do the simulation + produce some results
$ flepimop process someconfig.yml # execute some post processing analysis, e.g. render a notebook

with a someconfig.yml section like

process:
  jupyter:
    file: somenotebook.ipynb
  rmarkdown:
    file: someothernotebook.rmd
    args: etc

which reads as "for the process action, use the 'jupyter' module (which knows to look for a file + other arguments) to render X, and the rmarkdown module (which, again, knows to look for a file + other arguments) to render Y".

In general, process modules should support executing some standard action (e.g. jupyter notebook rendering) on a target file (e.g. a notebook) providing the configuration information in a standard way (likely as the path to the file itself) + additional arguments as specified in the configuration file.
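As a rough sketch of that module idea (not an existing flepimop API), each module could be a small function that turns its configuration mapping into a command line. The names `MODULES`, `build_commands`, `jupyter_command` and `rmarkdown_command` are hypothetical, and the sketch ignores the `args` key because the issue does not yet say how extra arguments are passed. The underlying commands, `jupyter nbconvert --execute` and `rmarkdown::render` via `Rscript -e`, are the standard ways to render each notebook type.

```python
from __future__ import annotations

from typing import Callable


def jupyter_command(cfg: dict) -> list[str]:
    # `jupyter nbconvert --execute` runs the notebook and writes a rendered copy.
    return ["jupyter", "nbconvert", "--to", "html", "--execute", cfg["file"]]


def rmarkdown_command(cfg: dict) -> list[str]:
    # Rscript evaluates a one-line call to rmarkdown::render on the target file.
    return ["Rscript", "-e", f"rmarkdown::render('{cfg['file']}')"]


# Hypothetical registry: module name in the `process` section -> command builder.
MODULES: dict[str, Callable[[dict], list[str]]] = {
    "jupyter": jupyter_command,
    "rmarkdown": rmarkdown_command,
}


def build_commands(process_cfg: dict) -> list[list[str]]:
    """Return one argv list per configured module, in configuration order."""
    commands = []
    for name, cfg in process_cfg.items():
        if name not in MODULES:
            raise ValueError(f"unknown process module: {name!r}")
        commands.append(MODULES[name](cfg))
    return commands


# The `process` section from the example above, as it would look once parsed.
process_cfg = {
    "jupyter": {"file": "somenotebook.ipynb"},
    "rmarkdown": {"file": "someothernotebook.rmd"},
}
for argv in build_commands(process_cfg):
    print(" ".join(argv))
```

Returning argv lists rather than running anything keeps the same code usable for a dry run and for real execution (for example with `subprocess.run(argv, check=True)`).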

For typical use cases (e.g. notebooks, R or python post-processing scripts), we should include in the library some examples / templates that show argument parsing so users don't have to reinvent that every time they make a new notebook.
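A template for a Python post-processing script might look something like the following. It is a minimal sketch that assumes the script receives the configuration file path as its first positional argument; the `--output-dir` option and its default are purely illustrative.

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments a post-processing script receives from the pipeline."""
    parser = argparse.ArgumentParser(description="Example post-processing script.")
    # Assumption: the pipeline passes the configuration file path first.
    parser.add_argument("config", help="path to the flepimop configuration file")
    # Illustrative extra option; real scripts would define their own.
    parser.add_argument("--output-dir", default="processed",
                        help="directory in which to write processed results")
    return parser.parse_args(argv)


# In a real script, call parse_args() with no arguments to read sys.argv.
args = parse_args(["someconfig.yml", "--output-dir", "figures"])
print(f"config={args.config} output_dir={args.output_dir}")
```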

I think the obvious initial "modules" are:

execute a particular bash command (series of bash commands?)
run a bash script
run an R script (series of?)
run a python script (series of?)
render Rmarkdown
render ipynb

Dry-running process should report what will be executed. It should probably also report which other steps in the configuration appear incomplete (e.g. if simulate hasn't been run, there are no output results). I don't think we want the configuration to specify which processing steps depend on which other steps being done (and we definitely don't want to try to introspect that), but we should alert users that something else in this configuration doesn't appear to have happened, so if processing depends on it, it's not going to work.
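One possible shape for that dry-run report is sketched below. The function name `dry_run_report` and the idea of passing in a list of `expected_outputs` paths are assumptions; how flepimop would actually know where earlier steps write their results is not specified here.

```python
from pathlib import Path


def dry_run_report(commands, expected_outputs):
    """List the commands that would run, plus a warning for missing inputs."""
    lines = ["Would execute:"]
    lines += [f"  {' '.join(argv)}" for argv in commands]
    # Only check for existence; no attempt to work out which step needs what.
    missing = [str(p) for p in expected_outputs if not Path(p).exists()]
    if missing:
        lines.append("Warning: earlier steps may not have run; not found:")
        lines += [f"  {m}" for m in missing]
    return lines


commands = [["jupyter", "nbconvert", "--to", "html", "--execute", "somenotebook.ipynb"]]
# Hypothetical location of simulation results.
for line in dry_run_report(commands, ["model_output"]):
    print(line)
```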

This issue depends upon having completed #451


pearsonca · 2025-01-08T16:17:55Z

For process, users are likely to want to specify multiple steps, but not necessarily to run them all every time.

I imagine the default being: run all specified modules, in specified order.

We might also want to support a steps or stages key (basically, the "scenarios" equivalent), which allows an order specification. Something like:

process:
  steps: [jupyter, rmarkdown, exec]

which reads as: run all of the jupyter steps, then rmarkdown, then exec. If we want to have modules support multiple internal steps (e.g. multiple notebooks to render), then we could do something like [jupyter::1, rmarkdown, jupyter::2] to express: jupyter step 1 first, then all the rmarkdown step(s), then jupyter step 2.

We'd also need to support dynamically overriding the steps from the command line:

$ flepimop process someconfig.yml steps=jupyter::2 # just run the second jupyter module step
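The `module::n` step syntax and the command-line override could be parsed along these lines. This is only a sketch: `parse_step` and `resolve_steps` are hypothetical names, and accepting a comma-separated override string is an assumption, since the example above only shows a single step.

```python
from __future__ import annotations


def parse_step(spec: str) -> tuple[str, int | None]:
    """Split 'module::n' into (module, n); a bare 'module' means all its steps."""
    name, sep, index = spec.strip().partition("::")
    if not name:
        raise ValueError(f"empty module name in step spec: {spec!r}")
    if sep and not index.isdigit():
        raise ValueError(f"step index must be a number in: {spec!r}")
    return name, (int(index) if sep else None)


def resolve_steps(configured: list[str],
                  override: str | None = None) -> list[tuple[str, int | None]]:
    """Use the command-line override when given, else the configured order."""
    specs = override.split(",") if override else configured
    return [parse_step(s) for s in specs]


configured = ["jupyter::1", "rmarkdown", "jupyter::2"]
print(resolve_steps(configured))
print(resolve_steps(configured, override="jupyter::2"))
```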

pearsonca · 2025-01-08T16:32:51Z

question to potential users @saraloo @MacdonaldJoshuaCaleb @alsnhll: do we want to also / instead support syntax like

$ flepimop simulate someconfig.yml --render=somenotebook.rmd

which reads as "simulate the model in someconfig.yml and then render the notebook somenotebook.rmd"?

I see that as a likely typical use case, and it should be relatively easy to implement for very low-flexibility options (that is, just render rmd or ipynb with the configuration file as an argument and no other customizability).

We can likely do that as "sugar" syntax to replace the distinct simulate (or whatever) / process steps. But we're also likely to support in the future:

$ flepimop simulate someconfig.yml | flepimop process

i.e. simulate this configuration and then pipe the workflow output (a configuration file), to a processing action
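For the piped form, the process action would need to take its configuration from standard input when no file argument is given. A minimal sketch follows, assuming the upstream action prints the path to the configuration file as a single line; whether the pipe would carry a path or the configuration contents is not settled above, and `config_from_args_or_stdin` is a hypothetical helper.

```python
import io
import sys


def config_from_args_or_stdin(argv, stdin=sys.stdin):
    """Return the config path from the arguments, else from the first stdin line."""
    if argv:
        return argv[0]
    # Assumption: the upstream action wrote the configuration path to stdout.
    piped = stdin.readline().strip()
    if not piped:
        raise ValueError("no configuration given as an argument or on stdin")
    return piped


# Explicit argument, as in `flepimop process someconfig.yml`.
print(config_from_args_or_stdin(["someconfig.yml"]))
# Piped input, simulated here with an in-memory stream.
print(config_from_args_or_stdin([], stdin=io.StringIO("someconfig.yml\n")))
```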

pearsonca · 2025-01-13T19:56:47Z

Noted in the 13 Jan developer meeting: there is likely also some appetite for doing process steps at the start of a pipeline (e.g. to estimate some feature in a non-gempyor model, and then use that estimate as a parameter).

anjalika-nande · 2025-01-13T20:49:41Z

> question to potential users @saraloo @MacdonaldJoshuaCaleb @alsnhll: do we want to also / instead support syntax like
>
> $ flepimop simulate someconfig.yml --render=somenotebook.rmd
>
> which reads as "simulate the model in someconfig.yml and then render the notebook somenotebook.rmd"?
>
> I see that as a likely typical use case, and it should be relatively easy to implement for very low-flexibility options (that is, just render rmd or ipynb with the configuration file as an argument and no other customizability).
>
> We can likely do that as "sugar" syntax to replace the distinct simulate (or whatever) / process steps. But we're also likely to support in the future:
>
> $ flepimop simulate someconfig.yml | flepimop process
>
> i.e. simulate this configuration and then pipe the workflow output (a configuration file), to a processing action

I think including this syntax in addition to the above will be useful

saraloo · 2025-01-13T20:58:11Z

Thanks for thinking through all this. I agree, the render syntax from the CLI would be useful, i.e.:

$ flepimop simulate someconfig.yml --render=somenotebook.rmd

I can imagine this being super useful for testing purposes, when notebooks and configs are going through iterative changes and whatnot.

I don't immediately see a need for the sequential and module support. It might be useful in the long term, but I don't think it's super necessary right now.

TimothyWillard added enhancement post-processing medium priority labels Jan 10, 2025

TimothyWillard added this to the Post-Processing And Scenario Analysis Tools milestone Jan 10, 2025

TimothyWillard added the cli label Jan 10, 2025
