-
Notifications
You must be signed in to change notification settings - Fork 235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor convert_input
to Perform tasks via helper function
#3338
base: develop
Are you sure you want to change the base?
Refactor convert_input
to Perform tasks via helper function
#3338
Conversation
Return from convert_input via a helper function Update corresponding test files and add tests to ensure do_conversions isn't affected by current applied changes Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
base/db/R/convert_input.R
Outdated
result_sizes <- checked.missing.files$result_sizes; | ||
outlist <- checked.missing.files$outlist; | ||
existing.input <- checked.missing.files$existing.input; | ||
existing.dbfile <- checked.missing.files$existing.dbfile; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is their any other way to unwrap these variables in a single context line. For example, inheriting the above list and utilising the list$var
syntax in-place. A lot of variable operations(assignment and reassignment) would consume extra memory in the heap. Should we also implement a Garbage Collection (gc()
) or is it automatically applied in R ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than try to unwrap them in one line I'd consider whether they need to be assigned separate names at all -- e.g. can all places below that refer to result_sizes
refer to checked.missing.files$result_sizes
instead? If that's what you're suggesting by "utilising the list$var syntax in-place", then yes, I like that approach.
That said these objects are all very small and the runtime/memory overhead from reassignment will be negligible and R's reference-counting garbage collection will take care of the cleanup for us. Happily though, the same optimization gives (what I think is) a substantial improvement in code clarity, so I endorse it on those grounds.
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
This comment was marked as outdated.
This comment was marked as outdated.
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
#' @return list of machine, input, and dbfile records | ||
#' @author Betsy Cowdery, Michael Dietze, Ankur Desai, Tony Gardella, Luke Dramko | ||
|
||
get.machine.info <- function(host, input.args, input.id = NULL, con = NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well use underscore instead of dot in the name here to avoid possible confusion with S3 methods. These days I only use dot where it's needed for consistency with existing code, and the db package is already inconsistent about it 😉
get.machine.info <- function(host, input.args, input.id = NULL, con = NULL) { | |
get_machine_info <- function(host, input.args, input.id = NULL, con = NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to change the usages below to match, naturally
#' @param con database connection | ||
#' @return list of machine host and machine information | ||
#' @author Abhinav Pandey | ||
get.machine.host <- function(host, con = NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- rename as discussed for get.machine.info above
- db.query will fail if con = NULL, might as well make it mandatory by providing no default value
get.machine.host <- function(host, con = NULL) { | |
get_machine_host <- function(host, con) { |
return(NULL) | ||
} | ||
|
||
if (missing(input.id) || is.na(input.id) || is.null(input.id)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need the missing()
here because get.machine.info now defaults it to NULL
if (missing(input.id) || is.na(input.id) || is.null(input.id)) { | |
if (is.na(input.id) || is.null(input.id)) { |
|
||
if (nrow(machine) == 0) { | ||
PEcAn.logger::logger.error("machine not found", host$name) | ||
return(NULL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that returning from get.machine.info could have different behavior than returning from convert.input. I'll need to read to the end to know whether that difference matters, but mentioning it now before I forget
# Print the structure of `res` for debugging | ||
str(res) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Print the structure of `res` for debugging | |
str(res) |
# This is to tell input.insert if we are writing ensembles | ||
# Why does it need it? Because it checks for inputs with the same time period, site, and machine | ||
# and if it returns something it does not insert anymore, but for ensembles, it needs to bypass this condition | ||
ens.flag <- if (!is.null(ensemble) | is.null(ensemble_name)) TRUE else FALSE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ens.flag <- if (!is.null(ensemble) | is.null(ensemble_name)) TRUE else FALSE | |
ens.flag <- (!is.null(ensemble) || is.null(ensemble_name)) |
} | ||
} # End for loop | ||
|
||
successful <- TRUE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like successful
will need to be passed back to convert.input
somehow -- it uses it to determine what files to clean up when it exits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a quick skim it looks like none of the cases that can set successful = FALSE
occur inside add.database.entries
. Maybe a possible solution is to remove them here and have convert.input set successful
after it calls add.database.entries
and before returning?
mocked_res <- mockery::mock(data.frame(file = c("A", "B"), file_size = c(100, 200), missing = c(FALSE, FALSE), empty = c(FALSE, FALSE))) | ||
mockery::stub(check_missing_files, "purrr::map_dfr", mocked_res) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what happens inside map_dfr is part of what we want to be testing here -- might be better to stub out file.size
since that's the only place that would be looking at the filesystem.
mocked_res <- mockery::mock(data.frame(file = c("A", "B"), file_size = c(100, 200), missing = c(FALSE, FALSE), empty = c(FALSE, FALSE))) | |
mockery::stub(check_missing_files, "purrr::map_dfr", mocked_res) | |
mocked_size <- mockery::mock(100,200) | |
mockery::stub(check_missing_files, "file.size", mocked_res) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would also be good to include cases with missing and empty files and confirm that it produces the expected error
@@ -0,0 +1,19 @@ | |||
test_that("`check_missing_files()` able to return correct missing files", { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filename should match function name (so use _
not .
)
Come to think of it, same for the files containing the function definitions
existing.input = list(data.frame(file = character(0))), | ||
existing.dbfile = list(data.frame(file = character(0))) | ||
)) | ||
mockery::stub(convert_input, "add.database.entries", list(input.id = 1, dbfile.id = 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like a lot of mocks and stubs, but I don't see any that aren't needed. Hopefully as we break up the function further we'll be able to simplify these.
Description
Following changes were performed :
run.meta.analysis
Motivation and Context
The main motive for these changes is to simplify convert_input by trying to break some of its components and branch it to other functions.
This PR may fix a task within #3307
Review Time Estimate
Types of changes
Checklist: