Add Config class with static fields (#99)
* First go

mostly generated by o1-mini based on a sample JSON file. I went through by hand and fixed most of the obvious issues. Will return tomorrow.

* fixed up docs and types

Also ran check(), but still a few issues there for sure. For some reason it thinks no changes were made to NEWS.md even though a new line was clearly added?

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* a rough first attempt at using Config in run_pipeline

This is not intended as the final version, only for rough testing. The next step is to create a function for reading a JSON file into a Config class

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Can now read JSON into a Config!!

Was a little worried about future-proofing this as parameters change. It should do everything automatically, EXCEPT for the case where we add another nested class to Config. In that case, we'll have to add it to str2class in read_json_into_config(). If we forget, we should at least get a warning about an unused parameter.

* small basic changes

* Config class is now operational

This took a little more finessing than expected, particularly around using lists for the sampler opts and the priors. I went back from S7 objects to lists, and added a default list that shows the desired keys and the expected types.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use nice pluralization

* config generator puts as_of_date inside parameters

* fix placement of as_of_date

* use super type with field inheritance.

Also fix placement of as_of_date in the Config object itself

* prep for a test on config from config generator

* changes from document()

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add identical row, but with covid as disease

* all tests now passing

* forgot to add the test file

* forgot to change path to parameters for test input

* now error out on invalid config files

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* tell lintr to ignore UpperCamelCase classes

* doc formatting changes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try ignoring Rd line endings

* remove the default list value

This was a merry goose chase! Until S7 is better supported in Roxygen, I think we should rely on the documentation to see which fields are required for these lists.

* forgot to update comments

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
natemcintosh and pre-commit-ci[bot] authored Dec 4, 2024
1 parent 54fec40 commit a163788
Showing 21 changed files with 641 additions and 67 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -31,7 +31,7 @@ repos:
       - id: mixed-line-ending
         args: ['--fix=lf']
       - id: trailing-whitespace
-        exclude: 'tests/testthat/_snaps/'
+        exclude: '(tests/testthat/_snaps/)|(\.Rd)'
   - repo: https://github.com/pre-commit-ci/pre-commit-ci-config
     rev: v1.6.1
     hooks:
1 change: 1 addition & 0 deletions DESCRIPTION
@@ -35,6 +35,7 @@ Imports:
     jsonlite,
     rlang,
     rstan,
+    S7,
     tidybayes
 Additional_repositories:
     https://stan-dev.r-universe.dev
8 changes: 8 additions & 0 deletions NAMESPACE
@@ -1,5 +1,12 @@
 # Generated by roxygen2: do not edit by hand

+export(Config)
+export(Data)
+export(DelayInterval)
+export(Exclusions)
+export(GenerationInterval)
+export(Parameters)
+export(RightTruncation)
 export(apply_exclusions)
 export(download_from_azure_blob)
 export(execute_model_logic)
@@ -18,5 +25,6 @@ export(read_data)
 export(read_disease_parameters)
 export(read_exclusions)
 export(read_interval_pmf)
+export(read_json_into_config)
 export(write_model_outputs)
 export(write_output_dir_structure)
3 changes: 3 additions & 0 deletions NEWS.md
@@ -1,5 +1,8 @@
 # CFAEpiNow2Pipeline (development version)

+* Creating a Config class to make syncing configuration differences easier.
+* Add a JSON reader for the Config class.
+* Use the Config class throughout the pipeline.
 * Adding a script to setup the Azure Batch Pool to link the container.
 * Adding new action to post a comment on PRs with a link to the rendered pkgdown site.
 * Add inner pipeline responsible for running the model fitting process
245 changes: 245 additions & 0 deletions R/config.R
@@ -0,0 +1,245 @@
character_or_null <- S7::new_union(S7::class_character, NULL)

#' Exclusions Class
#'
#' Represents exclusion criteria for the pipeline.
#'
#' @param path A string specifying the path to a CSV file containing exclusion
#' data. It should include at least the columns: `reference_date`,
#' `report_date`, `state_abb`, `disease`.
#' @param blob_storage_container Optional. The name of the blob storage
#' container to get it from. If NULL, will look locally.
#' @export
Exclusions <- S7::new_class( # nolint: object_name_linter
  "Exclusions",
  properties = list(
    path = character_or_null,
    blob_storage_container = character_or_null
  )
)

#' Interval Class
#'
#' Represents a generic interval. Meant to be subclassed.
#'
#' @param path A string specifying the path to the interval CSV file.
#' @param blob_storage_container Optional. The name of the blob storage
#' container to get it from. If NULL, will look locally.
#' @name Interval
Interval <- S7::new_class( # nolint: object_name_linter
  "Interval",
  properties = list(
    path = character_or_null,
    blob_storage_container = character_or_null
  )
)

#' GenerationInterval Class
#'
#' Represents the generation interval parameters.
#' @rdname Interval
#' @export
GenerationInterval <- S7::new_class( # nolint: object_name_linter
  "GenerationInterval",
  parent = Interval
)

#' DelayInterval Class
#'
#' Represents the delay interval parameters.
#' @rdname Interval
#' @export
DelayInterval <- S7::new_class( # nolint: object_name_linter
  "DelayInterval",
  parent = Interval
)

#' RightTruncation Class
#'
#' Represents the right truncation parameters.
#' @rdname Interval
#' @export
RightTruncation <- S7::new_class( # nolint: object_name_linter
  "RightTruncation",
  parent = Interval
)

#' Parameters Class
#'
#' Holds all parameter-related configurations for the pipeline.
#' @param as_of_date A string representing the as-of date. Formatted as
#' "YYYY-MM-DD".
#' @param generation_interval An instance of `GenerationInterval` class.
#' @param delay_interval An instance of `DelayInterval` class.
#' @param right_truncation An instance of `RightTruncation` class.
#' @export
Parameters <- S7::new_class( # nolint: object_name_linter
  "Parameters",
  properties = list(
    as_of_date = S7::class_character,
    generation_interval = S7::S7_class(GenerationInterval()),
    delay_interval = S7::S7_class(DelayInterval()),
    right_truncation = S7::S7_class(RightTruncation())
  )
)

#' Data Class
#'
#' Represents the data-related configurations.
#'
#' @param path A string specifying the path to the data Parquet file.
#' @param blob_storage_container Optional. The name of the blob storage
#' container to which the data file will be uploaded. If NULL, no upload will
#' occur.
#' @param report_date A list of strings representing report dates.
#' @param reference_date A list of strings representing reference dates.
#' @param production_date A list of strings representing production dates.
#' @export
Data <- S7::new_class( # nolint: object_name_linter
  "Data",
  properties = list(
    path = S7::class_character,
    blob_storage_container = character_or_null,
    report_date = S7::class_character,
    reference_date = S7::class_character,
    production_date = S7::class_character
  )
)

#' Config Class
#'
#' Represents the complete configuration for the pipeline.
#'
#' @param job_id A string specifying the job.
#' @param task_id A string specifying the task.
#' @param min_reference_date A string representing the minimum reference
#' date. Formatted as "YYYY-MM-DD".
#' @param max_reference_date A string representing the maximum reference
#' date. Formatted as "YYYY-MM-DD".
#' @param disease A string specifying the disease being modeled.
#' @param geo_value A string specifying the geographic value, usually a state.
#' @param geo_type A string specifying the geographic type, usually "state".
#' @param data An instance of `Data` class containing data configurations.
#' @param seed An integer for setting the random seed.
#' @param horizon An integer specifying the forecasting horizon.
#' @param priors A list of lists. The first level should contain the key `rt`
#' with elements `mean` and `sd` and the key `gp` with element `alpha_sd`.
#' @param parameters An instance of `Parameters` class containing parameter
#' configurations.
#' @param sampler_opts A list. The Stan sampler options to be passed through
#' EpiNow2. It has required keys: `cores`, `chains`, `iter_warmup`,
#' `iter_sampling`, `max_treedepth`, and `adapt_delta`.
#' @param exclusions An instance of `Exclusions` class containing exclusion
#' criteria.
#' @param config_version A string specifying the configuration version.
#' @param quantile_width A vector of numeric values representing the desired
#' quantiles.
#' @param model A string specifying the model to be used.
#' @param report_date A string representing the report date. Formatted as
#' "YYYY-MM-DD".
#' @export
Config <- S7::new_class( # nolint: object_name_linter
  "Config",
  properties = list(
    job_id = S7::class_character,
    task_id = S7::class_character,
    min_reference_date = S7::class_character,
    max_reference_date = S7::class_character,
    report_date = S7::class_character,
    disease = S7::class_character,
    geo_value = S7::class_character,
    geo_type = S7::class_character,
    seed = S7::class_integer,
    horizon = S7::class_integer,
    model = S7::new_property(S7::class_character, default = "EpiNow2"),
    config_version = S7::class_character,
    quantile_width = S7::new_property(S7::class_vector, default = c(0.5, 0.95)),
    data = S7::S7_class(Data()),
    # Using a list instead of an S7 object, because EpiNow2 expects a list, and
    # because it reduces changes to the pipeline code.
    # Would add default values, but Roxygen isn't happy about them yet.
    priors = S7::class_list,
    parameters = S7::S7_class(Parameters()),
    # Using a list instead of an S7 object, because stan expects a list, and
    # because it reduces changes to the pipeline code.
    # Would add default values, but Roxygen isn't happy about them yet.
    sampler_opts = S7::class_list,
    exclusions = S7::S7_class(Exclusions())
  )
)

#' Read JSON Configuration into Config Object
#'
#' Reads a JSON file from the specified path and converts it into a `Config`
#' object.
#'
#' @param config_path A string specifying the path to the JSON configuration
#' file.
#' @param optional_fields A list of strings specifying the optional fields in
#' the JSON file. If a field is not present in the JSON file, and is marked as
#' optional, it will be set to either the empty type (e.g. `character(0)`), or
#' NULL. If a field is not present in the JSON file, and is not marked as
#' optional, an error will be thrown.
#' @return An instance of the `Config` class populated with the data from the
#' JSON file.
#' @export
read_json_into_config <- function(config_path, optional_fields) {
  # Hard-coded, flattened map from property names to classes. If any new
  # subclasses are added above, they will also need to be added here. If we
  # create a more automated way to do this, we can remove this.
  str2class <- list(
    data = Data,
    parameters = Parameters,
    exclusions = Exclusions,
    generation_interval = GenerationInterval,
    delay_interval = DelayInterval,
    right_truncation = RightTruncation
  )

  # Read the JSON file into a list
  raw_input <- jsonlite::read_json(config_path, simplifyVector = TRUE)

  # Check what top level properties were not in the raw input
  missing_properties <- setdiff(S7::prop_names(Config()), names(raw_input))
  # Remove any optional fields from the missing properties, give info message
  # about what is being given a default arg.
  not_need_but_missing <- intersect(optional_fields, missing_properties)
  if (length(not_need_but_missing) > 0) {
    cli::cli_alert_info(
      "Optional field{?s} not in config file: {.var {not_need_but_missing}}"
    )
  }
  missing_properties <- setdiff(missing_properties, optional_fields)
  # Error out if any required fields are missing
  if (length(missing_properties) > 0) {
    cli::cli_abort(c(
      "Propert{?y/ies} not in the config file: {.var {missing_properties}}"
    ))
  }

  inner <- function(raw_data, class_to_fill) {
    # For each property, check if it is a regular value, or an S7 object.
    # If it is an S7 object, create an instance of that class and fill it
    # recursively. If it is a known regular value, set it directly. Names that
    # match no property are reported and skipped.
    config <- class_to_fill()
    for (prop_name in names(raw_data)) {
      if (prop_name %in% names(str2class)) {
        # This is a class, call inner() again to recursively build it.
        S7::prop(config, prop_name) <- inner(
          raw_data[[prop_name]], str2class[[prop_name]]
        )
      } else if (!(prop_name %in% S7::prop_names(config))) {
        cli::cli_alert_info(
          "No Config field matching {.var {prop_name}}. Not using."
        )
      } else {
        # Else set it directly
        S7::prop(config, prop_name) <- raw_data[[prop_name]]
      }
    }
    config
  }

  inner(raw_input, Config)
}
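
The recursive fill in read_json_into_config() can be tried without the package or a JSON file. The following is a minimal sketch of the same pattern, not the package's code: the classes `Inner` and `Outer`, the helper `fill()`, and the sample input list are hypothetical stand-ins for `Data`, `Config`, `inner()`, and the parsed JSON. It assumes only that the S7 package is installed.

```r
library(S7)

# Hypothetical two-level classes standing in for Data and Config.
Inner <- new_class("Inner", properties = list(path = class_character))
Outer <- new_class(
  "Outer",
  properties = list(
    disease = class_character,
    data = Inner
  )
)

# Map from property name to the nested class used to fill it
# (plays the role of str2class in read_json_into_config()).
str2class <- list(data = Inner)

fill <- function(raw_data, class_to_fill) {
  # Start from an instance holding the class defaults.
  obj <- class_to_fill()
  for (prop_name in names(raw_data)) {
    if (prop_name %in% names(str2class)) {
      # Nested class: recurse into the sub-list.
      prop(obj, prop_name) <- fill(raw_data[[prop_name]], str2class[[prop_name]])
    } else if (prop_name %in% prop_names(obj)) {
      # Plain value: set it directly, S7 validates the type.
      prop(obj, prop_name) <- raw_data[[prop_name]]
    }
    # Names matching no property are skipped in this sketch.
  }
  obj
}

# Stand-in for the list jsonlite::read_json() would return.
raw <- list(
  disease = "COVID-19",
  data = list(path = "data.parquet"),
  extra = 1
)
cfg <- fill(raw, Outer)
```

Because the lookup is keyed on property names, a new nested class only takes effect once it is added to the map, which is the caveat the commit message raises about str2class.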