generated from CDCgov/template
Commit
Add Config class with static fields (#99)
* First go mostly generated by o1-mini based on a sample JSON file. I went through by hand and fixed most of the obvious issues. Will return tomorrow.
* fixed up docs and types. Also ran check(), but still a few issues there for sure. For some reason it thinks no changes were made to NEWS.md even though a new line was clearly added?
* [pre-commit.ci] auto fixes from pre-commit.com hooks. For more information, see https://pre-commit.ci
* a rough first attempt at using Config in run_pipeline. This is not intended as the final version, only for rough testing. The next step is to create a function for reading a JSON file into a Config class
* [pre-commit.ci] auto fixes from pre-commit.com hooks. For more information, see https://pre-commit.ci
* Can now read JSON into a Config!! Was a little worried about future-proofing this as parameters change. It should do everything automatically, EXCEPT for the case where we add another nested class to Config. In that case, we'll have to add it to str2class in read_json_into_config(). If we forget, we should at least get a warning about an unused parameter
* small basic changes
* Config class is now operational. This took a little more finessing than expected. Particularly around using lists for the sampler opts and the priors, I went back to lists from S7 objects, and added a default list that shows the desired keys and the expected types.
* [pre-commit.ci] auto fixes from pre-commit.com hooks. For more information, see https://pre-commit.ci
* use nice pluralization
* config generator puts as_of_date inside parameters
* fix placement of as_of_date
* use super type with field inheritance. Also fix placement of as_of_date in the Config object itself
* prep for a test on config from config generator
* changes from document()
* [pre-commit.ci] auto fixes from pre-commit.com hooks. For more information, see https://pre-commit.ci
* add identical row, but with covid as disease
* all tests now passing
* forgot to add the test file
* forgot to change path to parameters for test input
* now error out on invalid config files
* [pre-commit.ci] auto fixes from pre-commit.com hooks. For more information, see https://pre-commit.ci
* tell lintr to ignore UpperCamelCase classes
* doc formatting changes
* [pre-commit.ci] auto fixes from pre-commit.com hooks. For more information, see https://pre-commit.ci
* try ignoring Rd line endings
* remove the default list value. This was a merry goose chase! Until S7 is better supported in Roxygen, I think we should rely on the documentation to see what fields are required for these lists
* forgot to update comments

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
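The "now error out on invalid config files" step comes down to set arithmetic on field names: anything expected but absent is either optional (and falls back to a default) or required (and should abort). The base-R sketch below illustrates that idea only. The helper name `check_fields` and its arguments are hypothetical and do not appear in this commit.

```r
# Hypothetical helper, not part of the commit: classify absent fields as
# either optional (safe to skip) or required (the caller should abort).
check_fields <- function(expected, supplied, optional = character(0)) {
  missing_fields <- setdiff(expected, supplied)
  list(
    skipped = intersect(optional, missing_fields),
    required_missing = setdiff(missing_fields, optional)
  )
}

# Illustrative field names borrowed from the Config class.
res <- check_fields(
  expected = c("job_id", "seed", "model"),
  supplied = c("job_id"),
  optional = "model"
)

res$skipped           # optional fields that were absent
res$required_missing  # required fields that were absent
```

In the commit itself the same split is reported through `cli::cli_alert_info()` for optional fields and `cli::cli_abort()` for required ones.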
1 parent 54fec40, commit a163788
Showing 21 changed files with 641 additions and 67 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,245 @@ | ||
character_or_null <- S7::new_union(S7::class_character, NULL) | ||
|
||
#' Exclusions Class | ||
#' | ||
#' Represents exclusion criteria for the pipeline. | ||
#' | ||
#' @param path A string specifying the path to a CSV file containing exclusion | ||
#' data. It should include at least the columns: `reference_date`, | ||
#' `report_date`, `state_abb`, `disease`. | ||
#' @param blob_storage_container Optional. The name of the blob storage | ||
#' container to get it from. If NULL, will look locally. | ||
#' @export | ||
Exclusions <- S7::new_class( # nolint: object_name_linter | ||
  "Exclusions", | ||
  properties = list( | ||
    path = character_or_null, | ||
    blob_storage_container = character_or_null | ||
  ) | ||
) | ||
|
||
#' Interval Class | ||
#' | ||
#' Represents a generic interval. Meant to be subclassed. | ||
#' | ||
#' @param path A string specifying the path to the interval CSV file. | ||
#' @param blob_storage_container Optional. The name of the blob storage | ||
#' container to get it from. If NULL, will look locally. | ||
#' @name Interval | ||
Interval <- S7::new_class( # nolint: object_name_linter | ||
  "Interval", | ||
  properties = list( | ||
    path = character_or_null, | ||
    blob_storage_container = character_or_null | ||
  ) | ||
) | ||
|
||
#' GenerationInterval Class | ||
#' | ||
#' Represents the generation interval parameters. | ||
#' @rdname Interval | ||
#' @export | ||
GenerationInterval <- S7::new_class( # nolint: object_name_linter | ||
  "GenerationInterval", | ||
  parent = Interval, | ||
) | ||
|
||
#' DelayInterval Class | ||
#' | ||
#' Represents the delay interval parameters. | ||
#' @rdname Interval | ||
#' @export | ||
DelayInterval <- S7::new_class( # nolint: object_name_linter | ||
  "DelayInterval", | ||
  parent = Interval, | ||
) | ||
|
||
#' RightTruncation Class | ||
#' | ||
#' Represents the right truncation parameters. | ||
#' @rdname Interval | ||
#' @export | ||
RightTruncation <- S7::new_class( # nolint: object_name_linter | ||
  "RightTruncation", | ||
  parent = Interval, | ||
) | ||
|
||
#' Parameters Class | ||
#' | ||
#' Holds all parameter-related configurations for the pipeline. | ||
#' @param as_of_date A string representing the as-of date. Formatted as | ||
#' "YYYY-MM-DD". | ||
#' @param generation_interval An instance of `GenerationInterval` class. | ||
#' @param delay_interval An instance of `DelayInterval` class. | ||
#' @param right_truncation An instance of `RightTruncation` class. | ||
#' @export | ||
Parameters <- S7::new_class( # nolint: object_name_linter | ||
  "Parameters", | ||
  properties = list( | ||
    as_of_date = S7::class_character, | ||
    generation_interval = S7::S7_class(GenerationInterval()), | ||
    delay_interval = S7::S7_class(DelayInterval()), | ||
    right_truncation = S7::S7_class(RightTruncation()) | ||
  ) | ||
) | ||
|
||
#' Data Class | ||
#' | ||
#' Represents the data-related configurations. | ||
#' | ||
#' @param path A string specifying the path to the data Parquet file. | ||
#' @param blob_storage_container Optional. The name of the blob storage | ||
#' container to which the data file will be uploaded. If NULL, no upload will | ||
#' occur. | ||
#' @param report_date A character vector of report dates. | ||
#' @param reference_date A character vector of reference dates. | ||
#' @param production_date A character vector of production dates. | ||
#' @export | ||
Data <- S7::new_class( # nolint: object_name_linter | ||
  "Data", | ||
  properties = list( | ||
    path = S7::class_character, | ||
    blob_storage_container = character_or_null, | ||
    report_date = S7::class_character, | ||
    reference_date = S7::class_character, | ||
    production_date = S7::class_character | ||
  ) | ||
) | ||
|
||
#' Config Class | ||
#' | ||
#' Represents the complete configuration for the pipeline. | ||
#' | ||
#' @param job_id A string specifying the job. | ||
#' @param task_id A string specifying the task. | ||
#' @param min_reference_date A string representing the minimum reference | ||
#' date. Formatted as "YYYY-MM-DD". | ||
#' @param max_reference_date A string representing the maximum reference | ||
#' date. Formatted as "YYYY-MM-DD". | ||
#' @param disease A string specifying the disease being modeled. | ||
#' @param geo_value A string specifying the geographic value, usually a state. | ||
#' @param geo_type A string specifying the geographic type, usually "state". | ||
#' @param data An instance of `Data` class containing data configurations. | ||
#' @param seed An integer for setting the random seed. | ||
#' @param horizon An integer specifying the forecasting horizon. | ||
#' @param priors A list of lists. The first level should contain the key `rt` | ||
#' with elements `mean` and `sd` and the key `gp` with element `alpha_sd`. | ||
#' @param parameters An instance of `Parameters` class containing parameter | ||
#' configurations. | ||
#' @param sampler_opts A list. The Stan sampler options to be passed through | ||
#' EpiNow2. It has required keys: `cores`, `chains`, `iter_warmup`, | ||
#' `iter_sampling`, `max_treedepth`, and `adapt_delta`. | ||
#' @param exclusions An instance of `Exclusions` class containing exclusion | ||
#' criteria. | ||
#' @param config_version A string specifying the configuration version. | ||
#' @param quantile_width A vector of numeric values representing the desired | ||
#' quantiles. | ||
#' @param model A string specifying the model to be used. | ||
#' @param report_date A string representing the report date. Formatted as | ||
#' "YYYY-MM-DD". | ||
#' @export | ||
Config <- S7::new_class( # nolint: object_name_linter | ||
  "Config", | ||
  properties = list( | ||
    job_id = S7::class_character, | ||
    task_id = S7::class_character, | ||
    min_reference_date = S7::class_character, | ||
    max_reference_date = S7::class_character, | ||
    report_date = S7::class_character, | ||
    disease = S7::class_character, | ||
    geo_value = S7::class_character, | ||
    geo_type = S7::class_character, | ||
    seed = S7::class_integer, | ||
    horizon = S7::class_integer, | ||
    model = S7::new_property(S7::class_character, default = "EpiNow2"), | ||
    config_version = S7::class_character, | ||
    quantile_width = S7::new_property(S7::class_vector, default = c(0.5, 0.95)), | ||
    data = S7::S7_class(Data()), | ||
    # Using a list instead of an S7 object, because EpiNow2 expects a list, and | ||
    # because it reduces changes to the pipeline code. | ||
    # Would add default values, but Roxygen isn't happy about them yet. | ||
    priors = S7::class_list, | ||
    parameters = S7::S7_class(Parameters()), | ||
    # Using a list instead of an S7 object, because Stan expects a list, and | ||
    # because it reduces changes to the pipeline code. | ||
    # Would add default values, but Roxygen isn't happy about them yet. | ||
    sampler_opts = S7::class_list, | ||
    exclusions = S7::S7_class(Exclusions()) | ||
  ) | ||
) | ||
|
||
#' Read JSON Configuration into Config Object | ||
#' | ||
#' Reads a JSON file from the specified path and converts it into a `Config` | ||
#' object. | ||
#' | ||
#' @param config_path A string specifying the path to the JSON configuration | ||
#' file. | ||
#' @param optional_fields A list of strings specifying the optional fields in | ||
#' the JSON file. If a field is not present in the JSON file, and is marked as | ||
#' optional, it will be set to either the empty type (e.g. `character(0)`), or | ||
#' NULL. | ||
#' If a field is not present in the JSON file, and is not marked as optional, an | ||
#' error will be thrown. | ||
#' @return An instance of the `Config` class populated with the data from the | ||
#' JSON file. | ||
#' @export | ||
read_json_into_config <- function(config_path, optional_fields) { | ||
  # First, our hard coded, flattened, map from strings to Classes. If any new | ||
  # subclasses are added above, they will also need to be added here. If we | ||
  # create a more automated way to do this, we can remove this. | ||
  str2class <- list( | ||
    data = Data, | ||
    parameters = Parameters, | ||
    exclusions = Exclusions, | ||
    generation_interval = GenerationInterval, | ||
    delay_interval = DelayInterval, | ||
    right_truncation = RightTruncation | ||
  ) | ||
|
||
  # Next, read the JSON file into a list | ||
  raw_input <- jsonlite::read_json(config_path, simplifyVector = TRUE) | ||
|
||
  # Check what top level properties were not in the raw input | ||
  missing_properties <- setdiff(S7::prop_names(Config()), names(raw_input)) | ||
  # Remove any optional fields from the missing properties, give info message | ||
  # about what is being given a default arg. | ||
  not_need_but_missing <- intersect(optional_fields, missing_properties) | ||
  if (length(not_need_but_missing) > 0) { | ||
    cli::cli_alert_info( | ||
      "Optional field{?s} not in config file: {.var {not_need_but_missing}}" | ||
    ) | ||
  } | ||
  missing_properties <- setdiff(missing_properties, optional_fields) | ||
  # Error out if missing any fields | ||
  if (length(missing_properties) > 0) { | ||
    cli::cli_abort(c( | ||
      "Propert{?y/ies} not in the config file: {.var {missing_properties}}" | ||
    )) | ||
  } | ||
|
||
  inner <- function(raw_data, class_to_fill) { | ||
    # For each property, check if it is a regular value, or an S7 object. | ||
    # If it is an S7 object, we need to create an instance of that class, and do | ||
    # all the same checks for properties that we did above. If not, just add it | ||
    # to the config object. | ||
    config <- class_to_fill() | ||
    for (prop_name in names(raw_data)) { | ||
      if (prop_name %in% names(str2class)) { | ||
        # This is a class, call inner() again to recursively build it. | ||
        S7::prop(config, prop_name) <- inner( | ||
          raw_data[[prop_name]], str2class[[prop_name]] | ||
        ) | ||
      } else if (!(prop_name %in% S7::prop_names(config))) { | ||
        cli::cli_alert_info( | ||
          "No Config field matching {.var {prop_name}}. Not using." | ||
        ) | ||
      } else { | ||
        # Else set it directly | ||
        S7::prop(config, prop_name) <- raw_data[[prop_name]] | ||
      }
    } | ||
    config | ||
  } | ||
|
||
  inner(raw_input, Config) | ||
} |
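
The recursive fill performed by `inner()` can be tried in isolation with two toy classes. The sketch below is a minimal illustration, assuming the S7 package is installed. The classes `Inner` and `Outer`, the helper `fill()`, and the sample list are hypothetical stand-ins, not code from this commit, and a plain list takes the place of the parsed JSON.

```r
# Toy classes (hypothetical), far smaller than the real Config hierarchy.
Inner <- S7::new_class(
  "Inner",
  properties = list(path = S7::class_character)
)
Outer <- S7::new_class(
  "Outer",
  properties = list(name = S7::class_character, inner = Inner)
)

# Same idea as str2class: map a property name to the class that fills it.
str2class <- list(inner = Inner)

# Hypothetical helper mirroring the shape of inner() in the diff.
fill <- function(raw, cls) {
  obj <- cls()
  for (nm in names(raw)) {
    if (nm %in% names(str2class)) {
      # Nested object: recurse with the matching class.
      S7::prop(obj, nm) <- fill(raw[[nm]], str2class[[nm]])
    } else if (nm %in% S7::prop_names(obj)) {
      # Plain value: set it directly.
      S7::prop(obj, nm) <- raw[[nm]]
    }
    # Names matching no property (here "extra") are silently skipped.
  }
  obj
}

# A plain list standing in for the output of jsonlite::read_json().
raw <- list(name = "demo", inner = list(path = "gi.csv"), extra = 1)
cfg <- fill(raw, Outer)
S7::prop(S7::prop(cfg, "inner"), "path")
```

Unlike this sketch, the function in the diff reports unmatched names through `cli::cli_alert_info()` and checks for missing top-level fields before recursing.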