Swap config to a class with static fields #98

natemcintosh · 2024-11-21T18:22:36Z

Goal

Create a Config class with static fields for all the required input parameters for the model.

Context

Configuration files for this pipeline are generated by this very handy config generator. However, because it is in a different repo, differences in the required fields between the repos can sneakily make their way in, only to be found at run time. From personal experience doing a test run of the pipeline, I found two fields that had gotten out of sync, and had to manually fix them in the config files.

At the moment, all of the requried parameters that go in the input configuration JSON file can be found by reading through the code, and seeing what fields are accessed from the config list. For example, grepping through the repo for config[[:

~/dev/cfa-epinow2-pipeline> rg --fixed-strings --glob "**/*.R" "config[["
R/pipeline.R
80:    job_id = config[["job_id"]],
81:    task_id = config[["task_id"]]
87:    config[["job_id"]],
89:    config[["task_id"]]
106:  cli::cli_alert_info("Using job id {.field {config[['job_id']]}}")
107:  cli::cli_alert_info("Using task id {.field {config[['task_id']]}}")
147:    data_path = config[["data"]][["path"]],
... (many more lines cut out)

tests/testthat/test-pipeline.R
40:  job_path <- file.path(output_dir, config[["job_id"]])
41:  task_path <- file.path(job_path, "tasks", config[["task_id"]])
50:          config[["task_id"]],
... (many more lines cut out)

This works, but could be easier.

Since this pipeline is the "source of truth" for all the input parameters, I suggest creating a Config class, where all of the fields are explicitly defined, all in one place. This would make it much simpler to keep both repos in sync, ideally moving config sync issues to dev time vs. run time.

The goal of this is not to lock down the config generator repo into this class only. In the very near future, we hope to have other models doing Rt estimation for us, and the symmetric difference of their parameters will probably not be the empty set. The goal is only to make the job ensuring the two are synced up simpler, and remove run time errors.

Requirements

A Config class (maybe R6 or S7?) with all of the fields defined in one place.
A function to convert the raw list data from a JSON file into a Config instance.
Use the resulting Config instance throughout the code.

Nice to haves

Method for serialization of an instance into JSON.
Method for creating a JSON schema based on the class definition. Would make syncing the repos even simpler.
Automatic validation of used parameters, letting in unused parameters that might be necessary for other models. Perhaps also warning the user that that those parameters are not being used.

Out of scope

Anything outside of replacing the current config list that is used.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Swap config to a class with static fields #98

Swap config to a class with static fields #98

natemcintosh commented Nov 21, 2024 •

edited

Loading

Swap config to a class with static fields #98

Swap config to a class with static fields #98

Comments

natemcintosh commented Nov 21, 2024 • edited Loading

Goal

Context

Requirements

Nice to haves

Out of scope

Related

natemcintosh commented Nov 21, 2024 •

edited

Loading