-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes ids_get() Default Parameters for counterparts and start_date to Reduce Data Volume and Align with User Needs #50 #57
base: main
Are you sure you want to change the base?
Conversation
This reverts commit aa48566. on suggestion of Christopher Smith, who is trying to help me fix my Git mistakes Reverting on suggestion of Christopher Smith who is helping me navigate my GitMistakes
- Changed example code for ids_bulk() to use `\dontrun` instead of `\dontest`. because of CMD Check error related to #51. - Updated function title and parameter descriptions for clarity. - Set new default values for `counterparts` and `start_date`. - changed date logic so it would work with new default start date (previously it would give from 1970 unless a end_date was also specified). - Added filtering logic to remove rows with NA values beyond the latest year of observed data, (2023) while allowing projections (2031). - Enhanced tests to validate new defaults and filtering behavior.
#' Common options: | ||
#' * "WLD" - World total (aggregated across all creditors) | ||
#' * "all" - Retrieve data broken down by all creditors | ||
#' * Individual creditors use numeric codes (e.g., "730" for China) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important to be careful about the terms "numeric" and "text" since these have a special meaning in R. (I.e., are we supposed to provide 730 as a numeric variable but "907" as a string?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we have to provide identifiers as string (I guess because there can also be leading 0s?). I also believe that we should just write about text codes even if they are numbers.
#' data volume. For historical analysis, explicitly set to 1970. | ||
#' | ||
#' @param end_date A numeric value representing the ending year (default: NULL). | ||
#' Must be >= 1970 and cannot be earlier than start_date. If NULL, returns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually could remove this constraint, replace any sub-1970 year with 1970, and just raise a warning flag to let the user know we did so.
#' bondholders) | ||
#' Cannot contain NA values. | ||
#' | ||
#' @param start_date A numeric value representing the starting year (default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think that if you set the default start date to 2000, you risk misleading casual users into thinking no earlier data is available.
} else { | ||
"all" | ||
} | ||
} | ||
|
||
filter_post_actual_na <- function(data) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take it you're trying to drop NAs outside the coverage period, but preserve NAs within the coverage period? To avoid hardcoding and continually updating the start and end years, I would suggest:
- Get the first and last years in the retrieved dataset with a non-NA value, and treat these as the boundaries of the data, then
- Drop all NAs outside those boundaries.
I pushed a fix for a typo and a missing mock that were breaking some of the tests. Approved the PR but also added some comments above on the |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #57 +/- ##
===========================================
- Coverage 100.00% 96.31% -3.69%
===========================================
Files 10 10
Lines 235 244 +9
===========================================
Hits 235 235
- Misses 0 9 +9 ☔ View full report in Codecov by Sentry. |
Closes #50 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good mostly! There is a test coverage gap now according to the codecov report: the multi-page request is not fired during testing because we don't pull enough data through ids_get() - need to include at least one test / example where we download a lot of data from the API.
#' or `FALSE`. | ||
#' @param geographies A character vector of geography identifiers representing | ||
#' debtor countries and aggregates. Must use `geography_id` from | ||
#' `ids_list_geographies()`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you use \link{ids_list_geographies} instead of ids_list_geographies()
, then you get a hyperlink to the function in the docs and I believe this is desired :)
#' Common options: | ||
#' * "WLD" - World total (aggregated across all creditors) | ||
#' * "all" - Retrieve data broken down by all creditors | ||
#' * Individual creditors use numeric codes (e.g., "730" for China) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we have to provide identifiers as string (I guess because there can also be leading 0s?). I also believe that we should just write about text codes even if they are numbers.
Changed default parameters for
start_date
andcounterparts
forids_get()
.start_date
to 2000, it would still provide data from 1970 if end_date was empty (it worked as expected when end_date was specified). I fixed this, and created filtering logic so that debt service data with projections through 2031 would show up, but for data without projections it filtered out NA values after 2023 (last observed data)ids_list_years()
and create some logic that pulled the years from there automatically. I figured we have more pressing issues, but might want to address that later.This is my first pull request! Any feedback welcome.