Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes ids_get() Default Parameters for counterparts and start_date to Reduce Data Volume and Align with User Needs #50 #57

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

t-emery
Copy link
Contributor

@t-emery t-emery commented Dec 6, 2024

Changed default parameters for start_date and counterparts for ids_get().

  • Improved function documentation & examples, partially addressing Enhancement: Add documentation to clarify country coverage of IDS + add related informative error handling #47.
  • changed date behavior. Initially despite changing start_date to 2000, it would still provide data from 1970 if end_date was empty (it worked as expected when end_date was specified). I fixed this, and created filtering logic so that debt service data with projections through 2031 would show up, but for data without projections it filtered out NA values after 2023 (last observed data)
  • Fixing this required using hard-coded dates. These can be updated every year (Perhaps we can document the "yearly data update checklist in the wiki? Another option would be to add a ids_list_years() and create some logic that pulled the years from there automatically. I figured we have more pressing issues, but might want to address that later.

This is my first pull request! Any feedback welcome.

This reverts commit aa48566.
on suggestion of Christopher Smith, who is trying to help me fix my Git mistakes

Reverting on suggestion of Christopher Smith who is helping me navigate my GitMistakes
- Changed example code for ids_bulk() to use `\dontrun` instead of `\dontest`. because of CMD Check error related to #51.
- Updated function title and parameter descriptions for clarity.
- Set new default values for `counterparts` and `start_date`.
- changed date logic so it would work with new default start date (previously it would give from 1970  unless a end_date was also specified).
- Added filtering logic to remove rows with NA values beyond the latest year of observed data, (2023) while allowing projections (2031).
- Enhanced tests to validate new defaults and filtering behavior.
@t-emery t-emery added documentation Improvements or additions to documentation enhancement New feature or request labels Dec 6, 2024
#' Common options:
#' * "WLD" - World total (aggregated across all creditors)
#' * "all" - Retrieve data broken down by all creditors
#' * Individual creditors use numeric codes (e.g., "730" for China)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important to be careful about the terms "numeric" and "text" since these have a special meaning in R. (I.e., are we supposed to provide 730 as a numeric variable but "907" as a string?)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we have to provide identifiers as string (I guess because there can also be leading 0s?). I also believe that we should just write about text codes even if they are numbers.

#' data volume. For historical analysis, explicitly set to 1970.
#'
#' @param end_date A numeric value representing the ending year (default: NULL).
#' Must be >= 1970 and cannot be earlier than start_date. If NULL, returns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually could remove this constraint, replace any sub-1970 year with 1970, and just raise a warning flag to let the user know we did so.

#' bondholders)
#' Cannot contain NA values.
#'
#' @param start_date A numeric value representing the starting year (default:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think that if you set the default start date to 2000, you risk misleading casual users into thinking no earlier data is available.

} else {
"all"
}
}

filter_post_actual_na <- function(data) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it you're trying to drop NAs outside the coverage period, but preserve NAs within the coverage period? To avoid hardcoding and continually updating the start and end years, I would suggest:

  1. Get the first and last years in the retrieved dataset with a non-NA value, and treat these as the boundaries of the data, then
  2. Drop all NAs outside those boundaries.

@chriscarrollsmith
Copy link
Contributor

I pushed a fix for a typo and a missing mock that were breaking some of the tests. Approved the PR but also added some comments above on the ids_get documentation that you should take a look at before merging. If you'd like me to tackle any additional adjustments, let me know.

Copy link

codecov bot commented Dec 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.31%. Comparing base (b6f1b9e) to head (6f86aa4).

Additional details and impacted files
@@             Coverage Diff             @@
##              main      #57      +/-   ##
===========================================
- Coverage   100.00%   96.31%   -3.69%     
===========================================
  Files           10       10              
  Lines          235      244       +9     
===========================================
  Hits           235      235              
- Misses           0        9       +9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@chriscarrollsmith
Copy link
Contributor

Closes #50

Copy link
Collaborator

@christophscheuch christophscheuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good mostly! There is a test coverage gap now according to the codecov report: the multi-page request is not fired during testing because we don't pull enough data through ids_get() - need to include at least one test / example where we download a lot of data from the API.

#' or `FALSE`.
#' @param geographies A character vector of geography identifiers representing
#' debtor countries and aggregates. Must use `geography_id` from
#' `ids_list_geographies()`:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use \link{ids_list_geographies} instead of ids_list_geographies(), then you get a hyperlink to the function in the docs and I believe this is desired :)

#' Common options:
#' * "WLD" - World total (aggregated across all creditors)
#' * "all" - Retrieve data broken down by all creditors
#' * Individual creditors use numeric codes (e.g., "730" for China)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we have to provide identifiers as string (I guess because there can also be leading 0s?). I also believe that we should just write about text codes even if they are numbers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request
Projects
None yet
3 participants