[Feature request]: introduce a process action + associated configuration #454
Comments
I imagine the default being: run all specified modules, in the specified order. We might also want to support a

process:
  steps: [jupyter, rmarkdown, exec]

which reads as "run all of the jupyter steps, then rmarkdown, then exec". If we want modules to support multiple internal steps (e.g. multiple notebooks to render), then we could do something like

We'd also need to support dynamically overriding the steps from the command line:

$ flepimop process someconfig.yml steps=jupyter::2 # just run the second jupyter module step
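To make the ordering and override rules above concrete, here is a minimal sketch of how step selectors might be resolved. Everything in it is an assumption for illustration: the names `StepSelector`, `parse_step_selector`, and `resolve_steps` are hypothetical, the `module::N` index is taken to be 1-based (so that `jupyter::2` means the second jupyter step, as in the example command), and modules are represented as a plain mapping from module name to a list of targets.

```python
from typing import NamedTuple, Optional


class StepSelector(NamedTuple):
    """Hypothetical parsed form of a selector such as 'jupyter' or 'jupyter::2'."""

    module: str
    index: Optional[int]  # 1-based position within the module; None means "all"


def parse_step_selector(text: str) -> StepSelector:
    """Parse 'jupyter' or 'jupyter::2' into a StepSelector."""
    module, sep, index = text.partition("::")
    if not module:
        raise ValueError(f"missing module name in {text!r}")
    if not sep:
        return StepSelector(module, None)
    if not index.isdecimal() or int(index) < 1:
        raise ValueError(f"step index must be a positive integer in {text!r}")
    return StepSelector(module, int(index))


def resolve_steps(
    modules: dict[str, list[str]], steps: Optional[list[str]] = None
) -> list[tuple[str, str]]:
    """Expand selectors into an ordered list of (module, target) pairs.

    With no selectors, fall back to the assumed default: every module,
    in the order the modules were specified.
    """
    selectors = list(modules) if steps is None else steps
    plan: list[tuple[str, str]] = []
    for selector in map(parse_step_selector, selectors):
        if selector.module not in modules:
            raise KeyError(f"unknown module {selector.module!r}")
        targets = modules[selector.module]
        if selector.index is None:
            plan.extend((selector.module, target) for target in targets)
        elif selector.index > len(targets):
            raise IndexError(
                f"{selector.module!r} has only {len(targets)} step(s)"
            )
        else:
            plan.append((selector.module, targets[selector.index - 1]))
    return plan
```

Under these assumptions, `steps=jupyter::2` on the command line would simply replace the configured step list with `["jupyter::2"]` before resolution.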
Question to potential users @saraloo @MacdonaldJoshuaCaleb @alsnhll: do we want to also / instead support syntax like

$ flepimop simulate someconfig.yml --render=somenotebook.rmd

which reads as "simulate the model in someconfig.yml and then render the notebook somenotebook.rmd"? I see that as a likely typical use case, and it should be relatively easy to implement for very low-flexibility options (that is, just render rmd or ipynb with the configuration file as an argument and no other customizability). We can likely do that as "sugar" syntax to replace the distinct simulate (or whatever) / process steps. But we're also likely to support in the future:

$ flepimop simulate someconfig.yml | flepimop process

i.e. simulate this configuration and then pipe the workflow output (a configuration file) to a processing action.
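As a rough illustration of how the low-flexibility `--render` sugar could expand into a single process step, the sketch below picks a module from the notebook's file extension. The mapping `RENDER_MODULES` and the function `render_to_step` are hypothetical names, not part of flepimop, and only the two formats mentioned above (rmd and ipynb) are handled.

```python
from pathlib import Path

# Hypothetical mapping from notebook extension to process module name.
RENDER_MODULES = {".rmd": "rmarkdown", ".ipynb": "jupyter"}


def render_to_step(notebook: str) -> tuple[str, str]:
    """Translate a --render=<notebook> value into a (module, target) step."""
    suffix = Path(notebook).suffix.lower()  # accept .Rmd as well as .rmd
    if suffix not in RENDER_MODULES:
        supported = ", ".join(sorted(RENDER_MODULES))
        raise ValueError(
            f"cannot render {notebook!r}: expected one of {supported}"
        )
    return RENDER_MODULES[suffix], notebook
```

The point of the design is that the sugar adds no new behavior: it only produces the same kind of step that a process section in the configuration file would describe.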
Noted in the 13 Jan developer meeting: there is likely also some appetite for doing process steps at the start of a pipeline (e.g. to estimate some feature in a non-gempyor model, and then use that estimate as a parameter).
I think including this syntax in addition to the above will be useful.
Thanks for thinking through all this. I agree, the render syntax from the CLI would be useful.
I can imagine this being super useful for testing purposes, when notebooks and configs are going through iterative changes and whatnot. I don't immediately see a need for the sequential and module support. It might be useful in the long term, but I don't think it's super necessary right now.
Label
enhancement, meta/workflow, post-processing
Priority Label
medium priority
Is your feature request related to a problem? Please describe.
Users currently perform post-processing steps (e.g. rendering notebooks, running transformation scripts) manually outside of the pipeline. To make these steps more efficient for power users and more accessible for lay users, flepimop should support doing post-processing in the pipeline.
Is your feature request related to a new application, scenario round, pathogen? Please describe.
No response
Describe the solution you'd like
Processing steps are likely to be highly specific to any given analysis, so this is not a request to implement lots of generalized code for post processing.
Rather, flepimop should support a workflow like:
with a someconfig.yml section like
which reads as "for the process action, use the 'jupyter' module (which knows to look for a file + other arguments) to render X, and the rmarkdown module (again, which knows to look for a file + other arguments) to render Y"
In general, process modules should support executing some standard action (e.g. jupyter notebook rendering) on a target file (e.g. a notebook), providing the configuration information in a standard way (likely as the path to the file itself) + additional arguments as specified in the configuration file.
For typical use cases (e.g. notebooks, R or python post-processing scripts), we should include in the library some examples / templates that show argument parsing so users don't have to reinvent that every time they make a new notebook.
I think the obvious initial "modules" are:
Dry-running process should report what will be executed. It should probably also report which other steps in the configuration appear incomplete (as in, if simulate hasn't been run => no output results). I don't think we want the configuration to specify which processing steps depend on which other steps being done (and we definitely don't want to try to introspect that), but we should alert users that something else in this configuration doesn't appear to have happened, so if processing depends on that being done, it's not going to work.
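The dry-run behavior described above could be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed implementation: `dry_run_report` is a hypothetical function, the plan is assumed to be a list of (module, target) pairs, and "appears incomplete" is approximated by checking whether an expected output path exists. No dependencies between steps are modeled, in line with the intent above.

```python
from pathlib import Path


def dry_run_report(
    plan: list[tuple[str, str]], expected_outputs: dict[str, str]
) -> list[str]:
    """List what would be executed, then warn about apparently incomplete actions.

    plan: ordered (module, target) pairs that process would run.
    expected_outputs: hypothetical mapping of other actions in the
        configuration (e.g. "simulate") to the path where their output
        would be found.
    """
    lines = [f"would run: {module} on {target}" for module, target in plan]
    for action, output in expected_outputs.items():
        if not Path(output).exists():
            lines.append(
                f"warning: no output found for '{action}' at {output}; "
                "processing steps that rely on it will not work"
            )
    return lines
```

Nothing is executed and nothing is blocked: the warning only tells the user that something else in the configuration does not appear to have happened.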
This issue depends upon having completed #451