Add Config class with static fields #99

Merged
merged 32 commits into from
Dec 4, 2024

Conversation

natemcintosh
Collaborator

@natemcintosh natemcintosh commented Nov 21, 2024

Goal

As stated in #98, the goal is to create a Config class with static fields for all the required input parameters for the model.

You can see all of the fields for the class by creating an empty instance of it:

r$> Config()
<CFAEpiNow2Pipeline::Config>
 @ job_id            : chr(0) 
 @ task_id           : chr(0) 
 @ min_reference_date: chr(0) 
 @ max_reference_date: chr(0) 
 @ report_date       : chr(0) 
 @ as_of_date        : chr(0) 
 @ disease           : chr(0) 
 @ geo_value         : chr(0) 
 @ geo_type          : chr(0) 
 @ seed              : int(0) 
 @ horizon           : int(0) 
 @ model             : chr "EpiNow2"
 @ config_version    : chr(0) 
 @ quantile_width    : num [1:2] 0.5 0.95
 @ data              : <CFAEpiNow2Pipeline::Data>
 .. @ path                  : chr(0) 
 .. @ blob_storage_container: chr(0) 
 .. @ report_date           : chr(0) 
 .. @ reference_date        : chr(0) 
 .. @ production_date       : chr(0) 
 @ priors            :List of 2
 .. $ rt:List of 2
 ..  ..$ mean: <S7_union>: <integer> or <double>
 ..  ..$ sd  : <S7_union>: <integer> or <double>
 .. $ gp:List of 1
 ..  ..$ alpha_sd: <S7_union>: <integer> or <double>
 @ parameters        : <CFAEpiNow2Pipeline::Parameters>
 .. @ generation_interval: <CFAEpiNow2Pipeline::GenerationInterval>
 .. .. @ path                  : chr(0) 
 .. .. @ blob_storage_container: chr(0) 
 .. @ delay_interval     : <CFAEpiNow2Pipeline::DelayInterval>
 .. .. @ path                  : chr(0) 
 .. .. @ blob_storage_container: chr(0) 
 .. @ right_truncation   : <CFAEpiNow2Pipeline::RightTruncation>
 .. .. @ path                  : chr(0) 
 .. .. @ blob_storage_container: chr(0) 
 @ sampler_opts      :List of 6
 .. $ cores        : <S7_base_class>: <integer>
 .. $ chains       : <S7_base_class>: <integer>
 .. $ iter_warmup  : <S7_base_class>: <integer>
 .. $ iter_sampling: <S7_base_class>: <integer>
 .. $ max_treedepth: <S7_base_class>: <integer>
 .. $ adapt_delta  : <S7_union>: <integer> or <double>
 @ exclusions        : <CFAEpiNow2Pipeline::Exclusions>
 .. @ path                  : chr(0) 
 .. @ blob_storage_container: chr(0)

Hopefully this should make syncing changes to the fields easier.
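As a rough illustration of what "static fields" means here, the following is a minimal sketch of an S7 class with a nested class. The trimmed-down `Data` and `Config` definitions are hypothetical stand-ins, not the package's actual definitions in R/config.R, and the sketch assumes R >= 4.3 so that `@` works on S7 objects.

```r
library(S7)

# Hypothetical, trimmed-down stand-ins for the classes printed above.
Data <- new_class(
  "Data",
  properties = list(
    path = class_character,
    blob_storage_container = class_character
  )
)

Config <- new_class(
  "Config",
  properties = list(
    job_id = class_character,
    seed = class_integer,
    data = Data # nested class used as a property type
  )
)

# Fields are fixed by the class definition and type-checked on construction.
cfg <- Config(job_id = "job-1", seed = 42L, data = Data(path = "data.parquet"))
cfg@data@path

# A value of the wrong type is rejected by S7's property validation.
bad <- tryCatch(Config(seed = "not an integer"), error = function(e) e)
inherits(bad, "error")
```

Because the set of properties lives in one class definition, adding or renaming a field is a change in a single place.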

Gotchas

To read a JSON file into the Config class, I ran into an issue created by the nested classes (e.g. the Data class inside the Config class), so this uses a hard-coded solution that maps the name of the field in the Config class to the nested class, e.g. list(data = Data). If we ever add more sub-classes, they will have to be added to this hard-coded list. If anyone knows an easier way, I'd love to use that.
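The hard-coded mapping described above could look roughly like the sketch below. It is not the PR's actual read_json_into_config(): the helper name `list_to_config` and the reduced `Data` and `Config` classes are hypothetical, and a plain nested list stands in for the parsed JSON so the example needs no JSON package.

```r
library(S7)

# Hypothetical reduced classes for illustration only.
Data <- new_class("Data", properties = list(path = class_character))
Config <- new_class(
  "Config",
  properties = list(job_id = class_character, data = Data)
)

# Hard-coded map from Config property name to the nested class to build.
str2class <- list(data = Data)

# Hypothetical helper: turn a parsed-JSON-style nested list into a Config.
list_to_config <- function(raw, cls = Config) {
  known <- names(cls@properties)
  unused <- setdiff(names(raw), known)
  if (length(unused) > 0) {
    warning("Unused parameters: ", paste(unused, collapse = ", "))
  }
  args <- raw[intersect(names(raw), known)]
  # Build each nested class from its sub-list before building the parent.
  for (nm in intersect(names(args), names(str2class))) {
    args[[nm]] <- do.call(str2class[[nm]], args[[nm]])
  }
  do.call(cls, args)
}

raw <- list(job_id = "job-1", data = list(path = "data.parquet"))
cfg <- list_to_config(raw)
cfg@data@path

# An unknown key is reported instead of being silently dropped.
msg <- tryCatch(
  list_to_config(c(raw, list(typo_field = 1))),
  warning = function(w) conditionMessage(w)
)
```

The limitation mentioned above is visible in `str2class`: a new nested class has to be registered there by hand.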

Defaults

Note that S7 adds the "empty value" of the type as the default, hence the many chr(0) and int(0). In a few places, I told S7 that a field could be a union of string and NULL, to deal with reading in NULL values from the JSON. I also added defaults of actual values for

  • model: "EpiNow2"
  • quantile_width: c(0.5, 0.95)

Feel free to change / add / remove these as necessary.
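A small sketch of the three kinds of defaults described above, using a hypothetical `Example` class rather than the real Config. Which fields the PR actually declares as unions with NULL is not shown in this thread, so `as_of_date` here is only an assumed example.

```r
library(S7)

# Hypothetical class showing the three default styles discussed above.
Example <- new_class(
  "Example",
  properties = list(
    # No default given: S7 uses the type's "empty value".
    job_id = class_character,
    seed = class_integer,
    # Union with NULL, so a JSON null can be stored as NULL.
    as_of_date = new_union(class_character, NULL),
    # Explicit defaults holding actual values.
    model = new_property(class_character, default = "EpiNow2"),
    quantile_width = new_property(class_numeric, default = c(0.5, 0.95))
  )
)

x <- Example()
x@job_id         # empty character vector
x@seed           # empty integer vector
x@model
x@quantile_width

# NULL is accepted for the union-typed property.
y <- Example(as_of_date = NULL)
is.null(y@as_of_date)
```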

Use of lists for sampler_opts and priors

For these two, I had originally created an S7 class each, but ran into issues that required changing the pipeline code more than desired. Instead, I went back to lists, but created default values that show all of the required keys, and set the values to be the expected types. As you can see above, it should make it easy to see what is required. Because the values are classes (e.g. S7::class_integer), and not actual values like 10, it should fail loudly if the default values are used.
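To make the "class objects as placeholders" idea concrete, here is a minimal sketch with a shortened template list. The `check_filled` helper is hypothetical and not part of the PR; it only makes explicit why a placeholder cannot be mistaken for a real value.

```r
library(S7)

# Template list: the keys document what is required, and the values are
# S7 class objects rather than usable numbers.
sampler_opts_template <- list(
  cores = class_integer,
  chains = class_integer,
  adapt_delta = class_numeric
)

# A placeholder is not a number, so numeric code cannot silently use it.
is.numeric(sampler_opts_template$cores)

# Hypothetical explicit check that names any keys still left as placeholders.
check_filled <- function(opts) {
  placeholder <- !vapply(opts, is.numeric, logical(1))
  if (any(placeholder)) {
    stop(
      "Unset sampler options: ",
      paste(names(opts)[placeholder], collapse = ", ")
    )
  }
  invisible(opts)
}

err <- tryCatch(
  check_filled(sampler_opts_template),
  error = function(e) conditionMessage(e)
)

# Filling in real values on top of the template passes the check.
filled <- utils::modifyList(
  sampler_opts_template,
  list(cores = 4L, chains = 4L, adapt_delta = 0.99)
)
check_filled(filled)
```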

mostly generated by o1-mini based on a sample JSON file. I went through by hand and fixed most of the obvious issues. Will return tomorrow.
@natemcintosh natemcintosh linked an issue Nov 21, 2024 that may be closed by this pull request
6 tasks
natemcintosh and others added 2 commits November 22, 2024 09:55
Also ran check(), but still a few issues there for sure. For some reason it thinks no changes were made to NEWS.md even though a new line was clearly added?
@natemcintosh natemcintosh changed the title First go Add Config class with static fields Nov 22, 2024
natemcintosh and others added 2 commits November 22, 2024 12:10
This is not intended as the final version, only for rough testing. The next step is to create a function for reading a JSON file into a Config class

github-actions bot commented Nov 22, 2024

Thank you for your contribution @natemcintosh 🚀! Your pkgdown-site is ready for download 👉 here 👈!
(The artifact expires on 2024-12-10T22:21:44Z. You can re-generate it by re-running the workflow here.)


codecov bot commented Nov 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

natemcintosh and others added 5 commits November 22, 2024 16:45
Was a little worried about future-proofing this as parameters change. It should do everything automatically, EXCEPT for the case where we add another nested class to Config. In that case, we'll have to add it to str2class in read_json_into_config(). If we forget, we should at least get a warning about an unused parameter.
This took a little more finessing than expected, particularly around using lists for the sampler opts and the priors. I went back to lists from S7 objects, and added a default list that shows the desired keys and the expected types.
@natemcintosh
Collaborator Author

Looks like lintr is failing because the class names start with a capital. I had thought classes should start with a capital letter? Am willing to change this if they should be lower case.

@zsusswein
Collaborator

Nah this is just a lintr default. What do the S7 docs say about naming conventions?

@zsusswein
Collaborator

e.g., R6 docs follow the standard naming convention for generators and class objects but S3 uses a different style (non-encapsulated) and doesn't use that convention.

@natemcintosh natemcintosh marked this pull request as ready for review November 26, 2024 17:04
@natemcintosh
Collaborator Author

natemcintosh commented Nov 26, 2024

The S7 docs seem to use UpperCamelCase in their examples. I couldn't find explicit instructions on naming anywhere though.

Collaborator

@zsusswein zsusswein left a comment

This is super cool. A few inline questions as I try to get my head around S7.

Can you add tests to bring patch coverage up to 100%?

DESCRIPTION
R/config.R (outdated)
R/config.R (outdated)
R/config.R
@natemcintosh
Collaborator Author

Can you add tests to bring patch coverage up to 100%?

😅 my brain conveniently forgot to write tests. Will add tests.

@natemcintosh
Collaborator Author

Ok, things are looking pretty good. I am left confused by the warning from R CMD check however. There's some formatting difference between docstrings and the .Rd files? Searching for this online leads me down rabbit holes, and no direct answers.

@natemcintosh
Collaborator Author

After reviewing the differences in a differ, they are indeed all formatting differences.
[Three screenshots of the diffs, taken 2024-11-29]

Ahhhhhhh

@natemcintosh
Collaborator Author

Ok, it looks like running pre-commit run -a insists on removing trailing whitespace at the end of lines, and running devtools::check() then changes Config.Rd back to what it was before.

@zsusswein
Collaborator

zsusswein commented Dec 2, 2024

I think you need to run Rscript -e "roxygen2::roxygenize()" to re-render the docs (that should fix the R CMD check problems) and then add https://github.com/CDCgov/cfa-gam-rt/blob/2f5f94a57e16b3c5e51213d01da562bddc2f05ec/.pre-commit-config.yaml#L29 to the hooks to prevent it from messing with the formatting.

@natemcintosh
Collaborator Author

natemcintosh commented Dec 2, 2024

It looks like that line is already in the repo. Sadly, running roxygen2::roxygenize(); devtools::check() doesn't seem to fix the issue.

@zsusswein
Collaborator

Ok noting that I can reproduce the R CMD check warning. Looking into it a bit.

@natemcintosh
Collaborator Author

natemcintosh commented Dec 3, 2024

Looks like it might just be that Roxygen2 does not yet support S7. Thanks @damonbayer for noticing this.

If this is the issue, then I think our choices are:

  1. Stick with S7, and merge over Roxygen's protests
  2. Re-write in e.g. S3, and don't have to deal with this
  3. Remove the default values for the list fields. This will make it harder to know what fields are required in them, but our documentation can supplement that until S7 and Roxygen get along better.

At this point, I think I would go for 3.

@damonbayer
Contributor

damonbayer commented Dec 3, 2024

@natemcintosh you could manually modify the usage section of the documentation files.

@natemcintosh
Collaborator Author

@natemcintosh you could manually modify the usage section of the documentation.

As in manually change whitespace differences?

@damonbayer
Contributor

@natemcintosh you could manually modify the usage section of the documentation.

As in manually change whitespace differences?

I thought the problem was all of this bonus stuff in the usage section of the Rd file:

.data, validator = function (object)
{
if (base_class(object) != name) {
sprintf("Underlying data must be <\%s> not <\%s>", name, base_class(object))
}
}), class = "S7_base_class"), structure(list(class = "double", constructor_name =
"double", constructor = function (.data = numeric(0))
.data, validator = function
(object)
{
if (base_class(object) != name) {
sprintf("Underlying data must be <\%s> not <\%s>", name, base_class(object))
}
}), class = "S7_base_class"))), class = "S7_union"), sd = structure(list(classes =
list(structure(list(class = "integer", constructor_name = "integer", constructor =
function (.data = integer(0))
.data, validator = function (object)
{
if
(base_class(object) != name) {
sprintf("Underlying data must be <\%s> not <\%s>", name, base_class(object))
}
}), class = "S7_base_class"), structure(list(class = "double", constructor_name =
"double", constructor = function (.data = numeric(0))
.data, validator = function
(object)
{
if (base_class(object) != name) {
sprintf("Underlying data must be <\%s> not <\%s>", name, base_class(object))
}
}), class = "S7_base_class"))), class = "S7_union")), gp = list(alpha_sd =
structure(list(classes = list(structure(list(class = "integer", constructor_name =
"integer", constructor = function (.data = integer(0))
.data, validator = function
(object)
{
if (base_class(object) != name) {
sprintf("Underlying data must be <\%s> not <\%s>", name, base_class(object))
}
}), class = "S7_base_class"), structure(list(class = "double", constructor_name =
"double", constructor = function (.data = numeric(0))
.data, validator = function
(object)
{
if (base_class(object) != name) {
sprintf("Underlying data must be <\%s> not <\%s>", name, base_class(object))
}
}), class = "S7_base_class"))), class = "S7_union"))),
parameters = Parameters(),
sampler_opts = list(cores = structure(list(class = "integer", constructor_name =
"integer", constructor = function (.data = integer(0))
.data, validator = function
(object)
{
if (base_class(object) != name) {
sprintf("Underlying data must be <\%s> not <\%s>", name, base_class(object))
}
}), class = "S7_base_class"), chains = structure(list(class = "integer",
constructor_name = "integer", constructor = function (.data = integer(0))
.data,
validator = function (object)
{
if (base_class(object) != name) {
sprintf("Underlying data must be <\%s> not <\%s>", name, base_class(object))
}
}), class = "S7_base_class"), iter_warmup = structure(list(class = "integer",
constructor_name = "integer", constructor = function (.data = integer(0))
.data,
validator = function (object)
{
if (base_class(object) != name) {
sprintf("Underlying data must be <\%s> not <\%s>", name, base_class(object))
}
}), class = "S7_base_class"), iter_sampling = structure(list(class = "integer",
constructor_name = "integer", constructor = function (.data = integer(0))
.data,
validator = function (object)
{
if (base_class(object) != name) {
sprintf("Underlying data must be <\%s> not <\%s>", name, base_class(object))
}
}), class = "S7_base_class"), max_treedepth = structure(list(class = "integer",
constructor_name = "integer", constructor = function (.data = integer(0))
.data,
validator = function (object)
{
if (base_class(object) != name) {
sprintf("Underlying data must be <\%s> not <\%s>", name, base_class(object))
}
}), class = "S7_base_class"), adapt_delta = structure(list(classes =
list(structure(list(class = "integer", constructor_name = "integer", constructor =
function (.data = integer(0))
.data, validator = function (object)
{
if
(base_class(object) != name) {
sprintf("Underlying data must be <\%s> not <\%s>", name, base_class(object))
}
}), class = "S7_base_class"), structure(list(class = "double", constructor_name =
"double", constructor = function (.data = numeric(0))
.data, validator = function
(object)
{
if (base_class(object) != name) {
sprintf("Underlying data must be <\%s> not <\%s>", name, base_class(object))
}
}), class = "S7_base_class"))), class = "S7_union")),
exclusions = Exclusions()
)
}

My suggestion was to remove that by hand (or automate its removal). I could be misunderstanding the issue, though.

@natemcintosh
Collaborator Author

After a little more playing around, I discovered that removing the default values for two lists inside Config resolved the issue. I'll add more detail on the issue for adding S7 support in Roxygen.

But I will also update the list of possible actions we can take above.

This was a merry goose chase! Until S7 is better supported in Roxygen, I think we should rely on the documentation to see what fields are required for these lists.
Collaborator

@kgostic kgostic left a comment

I asked one question but otherwise LGTM. Please have Zack look too before merge!

😎

R/pipeline.R
@zsusswein zsusswein merged commit a163788 into main Dec 4, 2024
11 checks passed
@zsusswein zsusswein deleted the nam-static-config-class branch December 4, 2024 23:08
Successfully merging this pull request may close these issues.

Swap config to a class with static fields
4 participants