Review #126
This pull request:
(Note that results may be inaccurate if you branched from an outdated version of the target branch.)
Thanks for your work on this. I like the reporting functionality, and I think it can provide a unique value compared to other data cleaning packages.
I've left quite a lot of comments. Please focus on the ones regarding the user interface before the CRAN release.
Two general comments applicable throughout the codebase:
- when possible, please try to address the edge cases as part of the general cases. Otherwise, if every edge case is special-cased, the code becomes very long and difficult to follow.
- in general, user input should be properly formatted as standard R objects. We don't want to have to clean and parse user input on top of already messy data.
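One possible reading of the second point, sketched for a date argument: require a `Date` object and fail early, rather than parsing free-form input inside the function. The helper name `check_end_date()` and its arguments are hypothetical and do not come from the package.

```r
# Hypothetical helper (not part of the package under review):
# accept only Date input and fail early on anything else.
check_end_date <- function(end_date, n_rows) {
  stopifnot(
    "end_date must be a Date object" = inherits(end_date, "Date"),
    "end_date must have length 1 or one value per row" =
      length(end_date) %in% c(1L, n_rows)
  )
  invisible(end_date)
}

# A proper Date passes silently; a character string is rejected.
ok  <- check_end_date(as.Date("2023-01-01"), n_rows = 10L)
bad <- tryCatch(
  check_end_date("2023-01-01", n_rows = 10L),
  error = function(e) conditionMessage(e)
)
```

With this approach the caller is responsible for converting input with `as.Date()` beforehand, which keeps the function body free of parsing branches.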
R/span.R
Outdated
# end_date can be a column of the input data or
# a vector of Date values with the same length as number of row in data or
# a Date value
if (is.character(end_date) && end_date %in% colnames(data)) {
  span_result <- abs(unclass(data[[target_column]]) -
                       unclass(data[[end_date]]))
} else {
  span_result <- abs(unclass(data[[target_column]]) - unclass(end_date))
}
units <- c(365.25, 30.0, 7.0, 1.0)
names(units) <- c("years", "months", "weeks", "days")
if (!is.null(span_remainder_unit)) {
  data[, span_column_name] <- floor(span_result / units[span_unit])
  data[, sprintf("remainder_%s", span_remainder_unit)] <- round(
    (span_result %% units[span_unit]) / units[span_remainder_unit],
    digits = 2L)
} else {
  data[, span_column_name] <- round(span_result / units[span_unit],
                                    digits = 2L)
}
I believe lubridate or base R have good functionality to deal with date differences. Any reasons to not use these?
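As a rough illustration of that suggestion, the sketch below computes a date difference with base R `difftime()` and with `lubridate::interval()` plus `lubridate::time_length()`. It assumes {lubridate} is installed; the dates and variable names are made up for the example and are not taken from the package.

```r
library(lubridate)

# Hypothetical example dates (not from the package under review)
start <- as.Date("2020-01-01")
end   <- as.Date("2021-03-15")

# Base R: exact difference in days
days_elapsed <- as.numeric(difftime(end, start, units = "days"))

# lubridate: calendar-aware length of the interval in years or months
span <- interval(start, end)
years_elapsed  <- time_length(span, unit = "years")
months_elapsed <- time_length(span, unit = "months")

# Whole years plus the remaining whole months
whole_years      <- floor(years_elapsed)
remaining_months <- floor(months_elapsed) - 12 * whole_years
```

Because the interval keeps its start and end dates, leap years and unequal month lengths are taken into account when converting to years or months.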
I was making use of lubridate functions, which @pratikunterwegs polished further.
At some point we did a proof of concept on reducing dependencies by writing the function with base R only. This function happened to be the test case, and we managed to achieve the same results as with lubridate, so we decided to keep this version.
If we are including lubridate anyway (as it currently stands), keeping the custom function because it works does not seem like a convincing reason to include it.
A custom function also increases maintenance complexity. This custom function is not as well validated and tested as lubridate, which risks us having to deal with bugs and edge cases down the line.
I would recommend tracking this further in an issue, so that we don't lose track of it and a decision on this does not block the package review process. As a nice side benefit: using lubridate directly could also resolve #134 😊
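To make the edge-case concern concrete, here is a small base R sketch that applies the fixed unit lengths from the snippet under review (365.25 days per year, 30 days per month) to two hypothetical date pairs. The dates and variable names are illustrative assumptions only.

```r
# Fixed unit lengths, as in the snippet under review
units <- c(years = 365.25, months = 30, weeks = 7, days = 1)

# Exactly one calendar year, but it contains a leap day (366 days)
span_days <- abs(unclass(as.Date("2021-01-01")) - unclass(as.Date("2020-01-01")))
whole_years    <- floor(span_days / units[["years"]])
remainder_days <- round((span_days %% units[["years"]]) / units[["days"]], digits = 2L)

# Exactly two calendar months (January and February 2023, 59 days)
month_days  <- abs(unclass(as.Date("2023-03-01")) - unclass(as.Date("2023-01-01")))
span_months <- round(month_days / units[["months"]], digits = 2L)
```

Here one calendar year comes out as 1 year with a remainder of 0.75 days, and two calendar months come out as 1.97 months. A calendar-aware approach would report exactly 1 year and 2 months.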
I have restored the usage of {lubridate} functionalities in this function and changed the function name from span() to timespan(). See commit 1a448d1.
Thanks @Karim-Mane - the most critical issue has been resolved from my end (creating and deleting tmp folders). The dependencies are something that can (and in my opinion, should) be revisited at a later time.
…ck_timeframe() function
…n() into timespan()
…n() into timespan()
…mat for every date value.
…mat for every date value.
@Bisaloo, @chartgerink - I have made some changes to account for your latest reviews. Kindly have a look and let me know your thoughts. I am aiming to close the PR by Wednesday at the latest.
Restarting conversation thread as it's getting lost in the noise otherwise:
I still don't follow. The example in the vignette reads:
But if I already know that rows 33 and 55 are to be removed, why would I use
Thanks @Bisaloo for pointing these out.
Without this argument, the
After looking into the
I suggest keeping the function and continuing to use it. We could delete it and write a new function, but the code in that new function would not be very different (if at all) from what is there currently. I have added some code to output rows where the date values comply with multiple formats. This information is returned as a data frame and will be shown in the report to help the user decide whether to confirm or amend the result.
No, it still doesn't really make sense to me. I understand and I agree
There are many other packages with solid functionality for just this (e.g. the anytime R package). In order to not get stuck on this discussion and delay the release, let's please:
I guess what I was trying to say was that I will delete this
Issue #133 can be used for this
Sounds good to me.
…riginal column names
This is a PR for a full package review of {cleanepi}.