Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

{simulist} v0.3.0 full package review #117

Closed
wants to merge 487 commits into from
Closed

{simulist} v0.3.0 full package review #117

wants to merge 487 commits into from

Conversation

joshwlambert
Copy link
Member

This PR is to provide a platform to review the entirety of the package.

Once this review concludes I will release v0.3.0 on GitHub.

Please see the NEWS.md file for an overview of changes between v0.2.0 and v0.3.0.

This PR is unconventional as it is not intended for merging or for additional commits (unless minor) and instead comments will be converted to issues and these will be addressed in their own PRs.

actions-user and others added 30 commits January 9, 2024 17:06
* Add DOI to README

* Add CITATION file with DOI

* Update CITATION.cff

* Automatic readme update

* Add more info

* Update CITATION.cff

* Fix code syntax

* Update CITATION.cff

* Update CITATION.cff

* Automatic readme update

---------

Co-authored-by: GitHub Action <[email protected]>
…ew arguments (mean_contacts, contact_interval, prob_infect)
@joshwlambert joshwlambert added the pkg review Full package review label May 22, 2024
Copy link

This pull request:

  • Adds 45 new dependencies (direct and indirect)
  • Adds 6 new system dependencies
  • Removes 0 existing dependencies (direct and indirect)
  • Removes 0 existing system dependencies

(Note that results may be inacurrate if you branched from an outdated version of the target branch.)

@chartgerink chartgerink self-requested a review May 23, 2024 07:09
Copy link

This pull request:

  • Adds 49 new dependencies (direct and indirect)
  • Adds 7 new system dependencies
  • Removes 0 existing dependencies (direct and indirect)
  • Removes 0 existing system dependencies

Reach out on slack (#code-review or #help channels) to double check if there are base R alternatives to the new dependencies.

(Note that results may be inaccurate if you branched from an outdated version of the target branch.)

@joshwlambert
Copy link
Member Author

When I originally opened this PR the review branch was many commits behind main and I did not notice (I think this was because I previously had a review branch for the last full package review (#73) and the branch did not update even though I deleted it previously). I've now fixed this by rebasing review onto main and now this PR contains all the changes up to the HEAD of main. Apologies if this has caused any inconvenience.

Copy link
Member

@chartgerink chartgerink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given I previously approved simulist 0.2.0, I tried to focus on the changes for this review. I initially forgot about my previous review so pardon if I'm opening up old discussions - I tried to cross-check as much as I could.

Happy to approve again 💯 I left some comments for your consideration, but in no way are they blockers from my end.

tests/testthat/testdata/README.md Show resolved Hide resolved
R/sim_linelist.R Show resolved Hide resolved
Comment on lines +226 to +243
if (is.data.frame(hosp_risk)) {
hosp_risk <- .check_risk_df(
hosp_risk,
age_range = age_range
)
}
if (is.data.frame(hosp_death_risk)) {
hosp_death_risk <- .check_risk_df(
hosp_death_risk,
age_range = age_range
)
}
if (is.data.frame(non_hosp_death_risk)) {
non_hosp_death_risk <- .check_risk_df(
non_hosp_death_risk,
age_range = age_range
)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems repetitive and a good candidate for a refactor into a generalized (internal) function.

function (object, range) {
  if (is.data.frame(object)) {
    object <- .check_risk_df(object, range)
  }
}
# OR even more generic
function (object, fn, range) {
  if (is.data.frame(object)) {
    object <- fn(object, range)
  }
}

This way you don't have to repeat yourself here and any future changes need to also only be done once.

The second option does not check the function arguments so that may prove a hurdle down the line (regrettably R and strong types is not the best match).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this looks like a code smell. Thanks for the suggestions on how to refactor this chunk. I've made some changes from these suggestions on a branch called refactor_df_checks.

However, I'm not sure that I prefer the new structure. Although ideally we'd be able to remove the chain of if statements all performing the same check, when implemented in it's own function the risks need to be bundled into a list, the new function adds more code to the package, function documentation and unit tests (yet to write) are required, and then the checked objects require unpacking in the sim_*() functions.

Note the .check_risk_df() does some <data.frame> formatting and so is not called purely for side-effects of warnings or errors. Therefore returning the objects and reassigning them to variables in required.

Please let me know your thoughts on this refactor and which you prefer, or if the refactor I've implemented is different from what you had in mind. I will leave the refactor_df_checks branch open on the package so if we decide to make this change it can be merged after the v0.3.0 release.

R/sim_linelist.R Show resolved Hide resolved
README.Rmd Show resolved Hide resolved
names_mf
}

#' Anonymise names
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice I expect this to be .anonymize and I realize that we do not have any conventions for naming within Epiverse. I don't want to start a language discussion as I know this is personal preference, but wonder what your thoughts are on this? Do you expect functions to be named in en-GB, en-US or not do you have no implicit assumptions for yourself?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've chosen to use en-GB as the language for the package: https://github.com/epiverse-trace/simulist/blob/main/DESCRIPTION#L50. This impacts the spell checking we perform in the package testing.

There is no Epiverse policy on which language to pick (see epiverse-trace/packagetemplate#103). There was also a discussion on the internal slack which I can send you if you're interested.

My personal policy is that if the function is exported and there is a difference in spelling between US and GB spelling then I export both. See https://github.com/epiverse-trace/epiparameter/blob/main/R/epidist.R#L674-L680. (Tidyverse do the same). For internal functions like .anonymise() I only have a GB spelling, but would not have a problem if other developers wanted their internal functions to use en-US.

#'
#' @return A vector of `character` strings of equal length to the input.
#' @keywords internal
.anonymise <- function(x, string_len = 10) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sometimes have a bit of friction with our minimal dependency requirement, as I have here - it seems like a lot of work to create a md5sum/shasum for strings, which does not seem possible in base R. It makes sense under minimal dependencies to create a custom function.

digest provides this functionality as well and could save you these lines of code. It also seems quite widely used and may meet the availability expectations (although certainly not as available as ggplot2).

Yet your implementation seems pretty good as well, so I am not going to argue for adding a dependency. Surfacing my thoughts on this so you can check-in on them if you'd like. Happy to discuss, but not a hard want 😊

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick search of packages that offer hashing and several looked appropriate. I decided to write an internal implementation as the package does not need all of the complexity offered by these packages and the complexity of the function was not that difficult/time consuming to write, and I don't think it adds much of a maintenance burden to the package.

To me this kind of thing can go either way without much issue, either keep the internal function or import {digest} or equivalent package. I'll leave the current implementation for now as it seems to work fine, but we can reopen this discussion if things change. 😄

@pratikunterwegs pratikunterwegs self-assigned this May 28, 2024
Copy link

@pratikunterwegs pratikunterwegs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening @joshwlambert - I've added only a few small technical suggestions in the files below; I think apart from one regarding regexes, the others don't prevent a release. I haven't been able to look at the tests or vignettes in much detail as there are quite a few files, so hopefully others can pitch in to review those.

DESCRIPTION Show resolved Hide resolved
DESCRIPTION Show resolved Hide resolved
https://epiverse-trace.github.io/simulist/
BugReports: https://github.com/epiverse-trace/simulist/issues
Depends:
R (>= 3.6.0)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've forgotten why there's an R version dependency; could you add this to the design decisions vignette?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember exactly why this version was initially chosen. It does (roughly) match up with the Epiverse policy to support the last 4 R versions. It also seems from some discussions found online that working out which is the minimum R version that the package supports is not easy (without setting up many CI runs on many older versions of R).

For now I'll leave the dependency on R >= 3.6.0 as is, and will consider adding a note to the dependencies section of the design principles vignette, although not sure we want to replicate info from blueprints in every package.

LICENSE Show resolved Hide resolved
R/add_cols.R Show resolved Hide resolved
R/checkers.R Show resolved Hide resolved
R/checkers.R Show resolved Hide resolved
R/checkers.R Show resolved Hide resolved

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some distinct logical blocks in this function - suggest adding comments that explain what the blocks do and why they are separated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I see the logical blocks you mention. Could you please explain where you think the comments should be added? I don't mind if you do this here or as an issue.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were a number of if statements which seemed to be handling different cases, and it would be good to add short comments to explain them. It's not a priority and could be added in a future release.

inst/CITATION Show resolved Hide resolved
@joshwlambert
Copy link
Member Author

A big thank you to both reviewers for providing many helpful comments and suggestions. As always these reviews have improved the package.

Most comments have been responded to, linking to commit hashes or PRs that implement the change. Some changes that require more time and thought have been logged as issues and will be tackled in development for the next version. Some comments are left unresolved so that anyone can chip in and share their opinions.

I will now close this full package review PR and move on to the release checklist and release v0.3.0. 🚀

Thanks again for the speedy and enjoyable review! 😄

@joshwlambert joshwlambert deleted the review branch May 31, 2024 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pkg review Full package review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants