LASER Modeling in R
There have been questions about how tightly LASER will be tied to the Python language. This document attempts to show one way that modelers could work in R to run LASER. Note that the model in question here is just one of the LASER prototypes.
Please note that nothing in here attempts to solve the problem of extending the core LASER with R. This work just shows how one could create input files, run an existing model, and post-process outputs in R.
We will mostly be referencing code in this commit: https://github.com/jonathanhhb/laser/commit/54d9d75946267069100f713f5f75015ff59b0616
The basic workflow has three steps:
- Create input CSV files.
- Run model as a webservice.
- Post-process output CSV into results of interest.
Note that all 3 of those steps can be done just as easily in R as in Python. Dataframe manipulation and communicating with RESTful webservices don't require Python.
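To make the point concrete, here is a minimal sketch of the three steps in Python using only the standard library. Nothing in it is Python-specific: it is just CSV handling plus one HTTP POST, which R covers with data frames and an HTTP client. The endpoint URL, the parameter name and the column names are hypothetical, not taken from the commit, and the request is built but never sent.

```python
import csv
import io
import json
import urllib.request

# Hypothetical endpoint; the real model service URL will differ.
SERVICE_URL = "http://model-service.invalid/run"


def write_input_csv(rows, fieldnames):
    """Step 1: serialize input rows to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def build_run_request(params):
    """Step 2: build (but do not send) a JSON POST to the model service."""
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(
        SERVICE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_output(csv_text, column):
    """Step 3: reduce an output CSV to one result of interest (a column's peak)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return max(float(row[column]) for row in reader)
```

Sending the request would be a single `urllib.request.urlopen(req)` call once a real service URL is filled in.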
At this point, the model service actually has the input files bundled with it, so this simplified workflow is papering over a step. If one wants to unpack that part further, there are 3 options:
1. Document the tools and workflow to publish the new input files; rebundle them into a new version of the Docker image; republish a new instance of the model service.
2. Upload the new input files to the service.
3. Upload the new input files to a shared path and notify the service of the path.
I prefer option 3. The details of that workflow would be:
- Upload the new input files to idm-data in Artifactory using curl (might need to get permission).
- Include the full or partial URL in the params sent to the model webservice, along with the existing params.
- The model webservice downloads the new input files from the public data location before kicking off the simulation.
It would be pretty trivial to mock this up, but it somehow doesn't feel completely worthwhile unless someone really wants to see it all for themselves.
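For reference, the service-side half of option 3 could look roughly like the sketch below. It is not code from the model service: the base URL, the `input_files` parameter name and both function names are hypothetical. A partial URL is resolved against the shared data location, a full URL is used as given, and each file is downloaded before the simulation would start.

```python
import os
import urllib.request
from urllib.parse import urljoin, urlparse

# Hypothetical base location for shared input files.
DATA_BASE_URL = "https://artifactory.invalid/idm-data/laser/"


def resolve_input_url(value, base=DATA_BASE_URL):
    """Use a full URL as given, or resolve a partial path against the base."""
    return urljoin(base, value)


def fetch_inputs(params, dest_dir, key="input_files"):
    """Download every file listed under params[key]; return the local paths."""
    os.makedirs(dest_dir, exist_ok=True)
    local_paths = []
    for value in params.get(key, []):
        url = resolve_input_url(value)
        filename = os.path.basename(urlparse(url).path)
        target = os.path.join(dest_dir, filename)
        urllib.request.urlretrieve(url, target)
        local_paths.append(target)
    return local_paths
```

The service would call `fetch_inputs(params, workdir)` first and then point the model at the returned paths instead of the bundled files.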
The workflow outlined above, like the existing functionality of the model webservice, doesn't return the simulation_output.csv file from the model. It is calibration-oriented: a post-processing Python script runs at the end and calculates and returns a small dictionary of key-value pairs, essentially a few key metrics from the simulation that can be used for calibration.
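As an illustration of what such a post-processing step does, here is a minimal sketch that reduces an output CSV to a small dictionary. The column names (`Timestep`, `Infected`) and the particular metrics are assumptions for the example, not the ones the actual script computes.

```python
import csv
import io


def key_metrics(csv_text, time_col="Timestep", infected_col="Infected"):
    """Reduce a simulation output CSV to a few scalar metrics.

    Column names are assumptions. Rows are summed per timestep, so a file
    with one row per node per timestep is collapsed to totals first.
    """
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        t = int(row[time_col])
        totals[t] = totals.get(t, 0.0) + float(row[infected_col])
    if not totals:
        return {}
    peak_t = max(totals, key=totals.get)
    return {
        "peak_infected": totals[peak_t],
        "peak_timestep": peak_t,
        "mean_infected": sum(totals.values()) / len(totals),
    }
```

A dictionary like this serializes directly to JSON, which is all the webservice needs to send back to a calibration client in R or Python.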
A more general workflow would involve one of:
1. Returning the full CSV file itself;
2. Returning a URL to the full CSV file, so it can be downloaded as a separate step;
3. Allowing the user to upload a custom post-processing script in R or Python and still return the output of that. Note that when I say "upload" here, you can substitute any of the 3 meanings of "upload" from the section above.
None of these 3 alternatives is currently demoed, but option 3 was fully implemented in EMOD in 2022. The hardest part was creating a fully working R base image with the necessary R packages pre-installed.