Refactor `convert_input` to Perform tasks via helper function #3338

Sweetdevil144 · 2024-07-18T07:46:07Z

Description

The following changes were made:

Shift functions to check for missing files
Return from convert_input via a helper function
Refactor extra variables in run.meta.analysis
Update corresponding test files and add tests to ensure do_conversions isn't affected by current applied changes

Motivation and Context

The main motivation for these changes is to simplify convert_input by breaking some of its components out into separate functions.

This PR may fix a task within #3307

Review Time Estimate

Immediately
Within one week
When possible

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

My change requires a change to the documentation.
My name is in the list of CITATION.cff
I have updated the CHANGELOG.md.
I have updated the documentation accordingly.
I have read the CONTRIBUTING document.
I have added tests to cover my changes.
All new and existing tests passed.

Return from convert_input via a helper function Update corresponding test files and add tests to ensure do_conversions isn't affected by current applied changes Signed-off-by: Abhinav Pandey <[email protected]>

Signed-off-by: Abhinav Pandey <[email protected]>

Sweetdevil144 · 2024-07-19T07:22:10Z

base/db/R/convert_input.R

+ result_sizes <- checked.missing.files$result_sizes;
+ outlist <- checked.missing.files$outlist;
+ existing.input <- checked.missing.files$existing.input;
+ existing.dbfile <- checked.missing.files$existing.dbfile;


Is there any other way to unwrap these variables in a single line? For example, keeping the list above and using the list$var syntax in place. Many variable operations (assignment and reassignment) would consume extra memory on the heap. Should we also call the garbage collector (gc()) explicitly, or is that applied automatically in R?

Rather than try to unwrap them in one line I'd consider whether they need to be assigned separate names at all -- e.g. can all places below that refer to result_sizes refer to checked.missing.files$result_sizes instead? If that's what you're suggesting by "utilising the list$var syntax in-place", then yes, I like that approach.

That said, these objects are all very small, the runtime/memory overhead from reassignment will be negligible, and R's automatic garbage collection will take care of the cleanup for us. Happily though, the same optimization gives (what I think is) a substantial improvement in code clarity, so I endorse it on those grounds.
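To make the two styles concrete, here is a minimal sketch. The list contents are invented for illustration; only the field names come from the diff above.

```r
# Hypothetical stand-in for the list returned by the missing-file check.
# Field names mirror the diff above; the values are made up.
checked.missing.files <- list(
  result_sizes = data.frame(file = c("a.nc", "b.nc"), file_size = c(100, 200)),
  outlist = c("a.nc", "b.nc"),
  existing.input = list(),
  existing.dbfile = list()
)

# Unpacked style: one extra name per field.
result_sizes <- checked.missing.files$result_sizes
total_unpacked <- sum(result_sizes$file_size)

# In-place style: refer to the list element at the point of use.
total_in_place <- sum(checked.missing.files$result_sizes$file_size)

# Both styles compute the same value; only the number of names differs.
identical(total_unpacked, total_in_place)
```

Neither style needs an explicit gc() call; the difference is readability rather than memory use.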

base/db/R/convert_input.R

base/db/R/check.missing.files.R

Signed-off-by: Abhinav Pandey <[email protected]>

infotroph · 2024-09-21T04:34:04Z

base/db/R/get.machine.info.R

+#' @return list of machine, input, and dbfile records
+#' @author Betsy Cowdery, Michael Dietze, Ankur Desai, Tony Gardella, Luke Dramko
+
+get.machine.info <- function(host, input.args, input.id = NULL, con = NULL) {


Might as well use underscore instead of dot in the name here to avoid possible confusion with S3 methods. These days I only use dot where it's needed for consistency with existing code, and the db package is already inconsistent about it 😉

Suggested change

get.machine.info <- function(host, input.args, input.id = NULL, con = NULL) {

get_machine_info <- function(host, input.args, input.id = NULL, con = NULL) {

Need to change the usages below to match, naturally
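As a rough illustration of the S3 concern, the sketch below uses a hypothetical class "info" and hypothetical function names (none of them from the PEcAn code). It shows how a dotted name can be picked up by an S3 generic, while an underscore name cannot.

```r
# Hypothetical helper whose dotted name looks like an S3 method:
# generic `format`, class "info".
format.info <- function(x, ...) "dispatched to format.info"

obj <- structure(list(name = "localhost"), class = "info")

# format() is an S3 generic, so it finds format.info purely by its name.
dispatched <- format(obj)

# An underscore name can only be called explicitly.
format_info <- function(x) paste("machine:", x$name)
plain <- format_info(obj)
```

Here `dispatched` holds the string returned by `format.info`, even though that function was never called directly.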

infotroph · 2024-09-21T04:35:50Z

base/db/R/get.machine.info.R

+#' @param con database connection
+#' @return list of machine host and machine information
+#' @author Abhinav Pandey
+get.machine.host <- function(host, con = NULL) {


rename as discussed for get.machine.info above

db.query will fail if con = NULL, might as well make it mandatory by providing no default value

Suggested change

get.machine.host <- function(host, con = NULL) {

get_machine_host <- function(host, con) {
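A small sketch of the difference, using hypothetical stand-in functions rather than the real PEcAn code: with a NULL default the failure only appears wherever the connection is first used, whereas a required argument fails with R's own "missing argument" error.

```r
# Hypothetical stand-in: a NULL default defers the failure to the query.
lookup_with_default <- function(host, con = NULL) {
  if (is.null(con)) stop("query failed: con is NULL")  # simulates the query failing
  "ok"
}

# Hypothetical stand-in: no default, so a missing con is caught by R itself.
lookup_required <- function(host, con) {
  force(con)  # evaluating a missing argument raises an error
  "ok"
}

err_default <- tryCatch(lookup_with_default("h"), error = function(e) e)
err_required <- tryCatch(lookup_required("h"), error = function(e) e)
```

Both calls fail, but `err_required` points at the omitted argument instead of at a later query.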

infotroph · 2024-09-21T05:03:12Z

base/db/R/get.machine.info.R

+ return(NULL)
+ }
+
+ if (missing(input.id) || is.na(input.id) || is.null(input.id)) {


Don't need the missing() here because get.machine.info now defaults it to NULL

Suggested change

if (missing(input.id) || is.na(input.id) || is.null(input.id)) {

if (is.na(input.id) || is.null(input.id)) {
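One detail worth keeping in mind with this condition: `is.na(NULL)` returns a zero-length logical rather than TRUE or FALSE. The sketch below uses a hypothetical helper name and checks `is.null()` first, so that `||` short-circuits before `is.na()` ever sees NULL. This is one possible ordering, not necessarily what the PR ended up using.

```r
# is.na(NULL) is logical(0), not a single TRUE/FALSE value.
na_of_null_length <- length(is.na(NULL))

# Hypothetical helper: test is.null() first so is.na() never receives NULL.
id_is_unset <- function(input.id) {
  is.null(input.id) || is.na(input.id)
}

unset_null <- id_is_unset(NULL)
unset_na <- id_is_unset(NA)
unset_value <- id_is_unset(42)
```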

infotroph · 2024-09-21T05:05:33Z

base/db/R/get.machine.info.R

+
+ if (nrow(machine) == 0) {
+ PEcAn.logger::logger.error("machine not found", host$name)
+ return(NULL)


Note that returning from get.machine.info could have different behavior than returning from convert.input. I'll need to read to the end to know whether that difference matters, but mentioning it now before I forget
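The behavior difference can be sketched as follows, with hypothetical function names: `return(NULL)` inside a helper only leaves the helper, so the caller keeps running unless it checks the result and returns explicitly.

```r
# Hypothetical helper: returns NULL when the machine is not found.
helper <- function(found) {
  if (!found) return(NULL)  # exits the helper only
  list(id = 1)
}

# Caller that propagates the early exit explicitly.
caller_checked <- function(found) {
  machine <- helper(found)
  if (is.null(machine)) return(NULL)
  "continued"
}

# Caller that ignores the NULL and carries on.
caller_unchecked <- function(found) {
  machine <- helper(found)
  "continued"
}

checked <- caller_checked(FALSE)
unchecked <- caller_unchecked(FALSE)
```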

infotroph · 2024-09-21T05:12:04Z

base/db/tests/testthat/test.check.missing.files.R

+ # Print the structure of `res` for debugging
+ str(res)


Suggested change

# Print the structure of `res` for debugging

str(res)

infotroph · 2024-09-21T06:37:42Z

base/db/R/add.database.entries.R

+ # This is to tell input.insert if we are writing ensembles
+ # Why does it need it? Because it checks for inputs with the same time period, site, and machine
+ # and if it returns something it does not insert anymore, but for ensembles, it needs to bypass this condition
+ ens.flag <- if (!is.null(ensemble) | is.null(ensemble_name)) TRUE else FALSE


Suggested change

ens.flag <- if (!is.null(ensemble) | is.null(ensemble_name)) TRUE else FALSE

ens.flag <- (!is.null(ensemble) || is.null(ensemble_name))
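For context on why `||` is preferred in a scalar condition, a brief sketch with made-up values: `|` always evaluates both sides, `||` short-circuits, and wrapping a logical in `if (...) TRUE else FALSE` adds nothing.

```r
calls <- 0
rhs <- function() {
  calls <<- calls + 1  # count how often the right-hand side is evaluated
  TRUE
}

invisible(TRUE | rhs())   # elementwise OR: the right-hand side is evaluated
calls_after_single <- calls

invisible(TRUE || rhs())  # short-circuit OR: the right-hand side is skipped
calls_after_double <- calls

# Made-up inputs, for illustration only.
ensemble <- NULL
ensemble_name <- NULL
verbose_flag <- if (!is.null(ensemble) | is.null(ensemble_name)) TRUE else FALSE
concise_flag <- (!is.null(ensemble) || is.null(ensemble_name))
```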

infotroph · 2024-09-21T06:42:12Z

base/db/R/add.database.entries.R

+ }
+ } # End for loop
+
+ successful <- TRUE


Looks like successful will need to be passed back to convert.input somehow -- it uses it to determine what files to clean up when it exits

On a quick skim it looks like none of the cases that can set successful = FALSE occur inside add.database.entries. Maybe a possible solution is to remove them here and have convert.input set successful after it calls add.database.entries and before returning?
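A minimal sketch of that idea, assuming the cleanup is driven by an `on.exit()` handler that reads `successful`. The thread does not confirm this mechanism, and the function below is a hypothetical stand-in for convert.input.

```r
# Hypothetical stand-in for convert.input: the flag is set by the caller,
# after the database step, and read by the exit handler.
convert_sketch <- function(path, fail = FALSE) {
  successful <- FALSE
  on.exit(if (!successful) unlink(path), add = TRUE)  # clean up on failure

  writeLines("data", path)
  if (fail) return(invisible(NULL))  # early exit leaves successful = FALSE

  # ... the call to add.database.entries would go here ...
  successful <- TRUE
  invisible(path)
}

kept <- tempfile()
convert_sketch(kept)

dropped <- tempfile()
convert_sketch(dropped, fail = TRUE)

kept_exists <- file.exists(kept)
dropped_exists <- file.exists(dropped)
```

Keeping the flag in the caller means the helper does not need to return it.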

infotroph · 2024-09-21T06:59:52Z

base/db/tests/testthat/test.check.missing.files.R

+ mocked_res <- mockery::mock(data.frame(file = c("A", "B"), file_size = c(100, 200), missing = c(FALSE, FALSE), empty = c(FALSE, FALSE)))
+ mockery::stub(check_missing_files, "purrr::map_dfr", mocked_res)


I think what happens inside map_dfr is part of what we want to be testing here -- might be better to stub out file.size since that's the only place that would be looking at the filesystem.

Suggested change

mocked_res <- mockery::mock(data.frame(file = c("A", "B"), file_size = c(100, 200), missing = c(FALSE, FALSE), empty = c(FALSE, FALSE)))

mockery::stub(check_missing_files, "purrr::map_dfr", mocked_res)

mocked_size <- mockery::mock(100, 200)

mockery::stub(check_missing_files, "file.size", mocked_size)

It would also be good to include cases with missing and empty files and confirm that it produces the expected error
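One way to build missing and empty cases without stubbing is to use temporary files, if touching the filesystem is acceptable in the test. The helper below is hypothetical and only mirrors the columns in the mocked data frame above; it is not the real `check_missing_files()`.

```r
# Hypothetical helper producing the same columns as the mocked data frame.
summarize_files <- function(files) {
  sizes <- file.size(files)  # NA for files that do not exist
  data.frame(
    file = files,
    file_size = sizes,
    missing = is.na(sizes),
    empty = !is.na(sizes) & sizes == 0,
    stringsAsFactors = FALSE
  )
}

full <- tempfile()
writeLines("x", full)      # non-empty file

empty <- tempfile()
invisible(file.create(empty))  # zero-byte file

absent <- tempfile()       # path that is never created

res <- summarize_files(c(full, empty, absent))
```

A test could then assert on `res$missing` and `res$empty`, or on the error the real function raises for such inputs.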

infotroph · 2024-09-21T07:04:48Z

base/db/tests/testthat/test.check.missing.files.R

@@ -0,0 +1,19 @@
+test_that("`check_missing_files()` able to return correct missing files", {


Filename should match function name (so use _ not .)
Come to think of it, same for the files containing the function definitions

infotroph · 2024-09-21T07:09:08Z

base/db/tests/testthat/test.convert_input.R

+ existing.input = list(data.frame(file = character(0))),
+ existing.dbfile = list(data.frame(file = character(0)))
+ ))
+ mockery::stub(convert_input, "add.database.entries", list(input.id = 1, dbfile.id = 1))


This feels like a lot of mocks and stubs, but I don't see any that aren't needed. Hopefully as we break up the function further we'll be able to simplify these.

Shift functions to check for missing files

614b8f9

Return from convert_input via a helper function Update corresponding test files and add tests to ensure do_conversions isn't affected by current applied changes Signed-off-by: Abhinav Pandey <[email protected]>

github-actions bot added Tests Base labels Jul 18, 2024

Update CHANGELOG

838af61

Signed-off-by: Abhinav Pandey <[email protected]>

Sweetdevil144 commented Jul 19, 2024

infotroph reviewed Jul 23, 2024

infotroph reviewed Jul 23, 2024

Sweetdevil144 added 7 commits July 25, 2024 14:09

Merge branch 'develop' into gsoc/convert-input

d5e8d24

Remove unutilized variables from convert_input

f22b962

Signed-off-by: Abhinav Pandey <[email protected]>

Update logger statements in convert_input

d884203

Signed-off-by: Abhinav Pandey <[email protected]>

Added separate function to check machine info

68d9516

Signed-off-by: Abhinav Pandey <[email protected]>

Update input args to get machine info

5208b02

Signed-off-by: Abhinav Pandey <[email protected]>

Correct roxygen documentations

f570646

Signed-off-by: Abhinav Pandey <[email protected]>

Update tests

e479c46

Signed-off-by: Abhinav Pandey <[email protected]>

Sweetdevil144 and others added 7 commits July 31, 2024 11:48

Merge branch 'PecanProject:develop' into gsoc/convert-input

d9e911b

Merge branch 'develop' into gsoc/convert-input

ed581f7

Merge branch 'develop' into gsoc/convert-input

63ac964

Merge branch 'develop' into gsoc/convert-input

0f9ac13

Merge branch 'develop' into gsoc/convert-input

b98617f

Merge branch 'develop' into gsoc/convert-input

4b771d3

Refactor extra variables in run.meta.analysis

63f270f

Signed-off-by: Abhinav Pandey <[email protected]>

github-actions bot added the Modules label Aug 14, 2024

Sweetdevil144 and others added 7 commits August 16, 2024 22:45

Merge branch 'PecanProject:develop' into gsoc/convert-input

dbb7a6d

get existing machine info using helper function

74003d9

Signed-off-by: Abhinav Pandey <[email protected]>

Merge branch 'develop' into gsoc/convert-input

95fb810

Merge branch 'develop' into gsoc/convert-input

2bcb7c4

Merge branch 'develop' into gsoc/convert-input

fcae9bd

Merge branch 'develop' into gsoc/convert-input

c8e8a02

Merge branch 'develop' into gsoc/convert-input

766174f

infotroph requested changes Sep 21, 2024

Merge branch 'develop' into gsoc/convert-input

d9074df
