Changes ids_get() Default Parameters for counterparts and start_date to Reduce Data Volume and Align with User Needs #50 #57

Open: wants to merge 4 commits into base branch main
2 changes: 1 addition & 1 deletion R/ids_bulk.R
@@ -29,7 +29,7 @@
#'
#' @export
#' @examplesIf curl::has_internet() && rlang::is_installed("readxl")
#' \donttest{
#' \dontrun{
#' available_files <- ids_bulk_files()
#' data <- ids_bulk(
#' available_files$file_url[1]
181 changes: 124 additions & 57 deletions R/ids_get.R
@@ -1,79 +1,119 @@
#' Fetch Debt Statistics from the World Bank International Debt Statistics API
#' Fetch Data from the World Bank International Debt Statistics (IDS) API
#'
#' This function returns a tibble with debt statistics data fetched from the
#' World Bank International Debt Statistics (IDS) API. The data can be filtered
#' by geographies, series, counterparts, and time periods.
#' Retrieves standardized debt statistics from the World Bank's International
#' Debt Statistics (IDS) database, which provides comprehensive data on the
#' external debt of low- and middle-income countries. The function handles
#' country identification, data validation, and unit standardization, making it
#' easier to conduct cross-country debt analysis and monitoring.
#'
#' @param geographies A character vector representing the geographic codes
#' (e.g., "ZMB" for Zambia). This argument is required and cannot contain NA
#' values.
#' @param series A character vector representing the series codes (e.g.,
#' "DT.DOD.DPPG.CD"). This argument is required and cannot contain NA values.
#' @param counterparts A character vector representing counterpart areas (e.g.,
#' "all", "001"). This argument is required and cannot contain NA values
#' (default: "all").
#' @param start_date An optional numeric value representing the starting year
#' (e.g., 2015). It must be greater than or equal to 1970. If not provided, the
#' entire time range is used.
#' @param end_date An optional numeric value representing the ending year (e.g.,
#' 2020). It must be greater than or equal to 1970 and cannot be earlier than
#' `start_date`. If not provided, the entire available time range is used.
#' @param progress A logical value indicating whether to display a progress
#' message during the request process (default: `FALSE`). Must be either `TRUE`
#' or `FALSE`.
#' @param geographies A character vector of geography identifiers representing
#' debtor countries and aggregates. Must use `geography_id` from
#' `ids_list_geographies()`:
#' * For individual countries, use ISO3C codes (e.g., "GHA" for Ghana)
Collaborator comment:
If you use \link{ids_list_geographies} instead of ids_list_geographies(), then you get a hyperlink to the function in the docs and I believe this is desired :)

#' * For aggregates, use World Bank codes (e.g., "LIC" for low income
#' countries)
#' The IDS database covers low- and middle-income countries and related
#' aggregates only. Cannot contain NA values.
#'
#' @param series A character vector of debt statistics series identifiers that
#' must match the `series_id` column from `ids_list_series()`. Each series
#' represents a specific debt statistic (e.g., "DT.DOD.DECT.CD" for total
#' external debt stocks, "DT.TDS.DECT.CD" for debt service payments). Cannot
#' contain NA values.
#'
#' @param counterparts A character vector of creditor identifiers that must
#' match the `counterpart_id` column from `ids_list_counterparts()`. The
#' default "WLD" returns aggregated global totals across all creditors.
#' Common options:
#' * "WLD" - World total (aggregated across all creditors)
#' * "all" - Retrieve data broken down by all creditors
#' * Individual creditors use numeric codes (e.g., "730" for China)
#' * Special creditors have text codes (e.g., "907" for IMF, "BND" for
Contributor comment:
It's important to be careful with the terms "numeric" and "text", since these have a special meaning in R. (I.e., are we supposed to provide 730 as a numeric value but "907" as a string?)

Collaborator comment:
Yes, identifiers have to be provided as strings (I guess because some codes can have leading zeros?). I also think we should just call them text codes, even when they look like numbers.

#' bondholders)
#' Cannot contain NA values.
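The review thread above settles on passing every identifier as a string. A short base-R sketch of why that matters: routing codes through `as.numeric()` silently turns `"001"` (a code used in the previous documentation) into `1` and alphabetic codes such as `"BND"` into `NA`. The `check_ids()` guard is hypothetical and only illustrates one way to reject non-character input; it is not the package's actual validator.

```r
# Counterpart identifiers as they appear in the documentation above
codes <- c("001", "730", "907", "BND")

# Coercing to numeric loses information: the leading zeros of "001" are
# dropped and "BND" cannot be parsed, so it becomes NA
as_numbers <- suppressWarnings(as.numeric(codes))
round_trip <- as.character(as_numbers)
print(round_trip)

# Hypothetical guard: accept identifiers only as character vectors
# without NA values
check_ids <- function(x, arg = "counterparts") {
  if (!is.character(x) || anyNA(x)) {
    stop(sprintf("`%s` must be a character vector without NA values.", arg),
         call. = FALSE)
  }
  invisible(x)
}

check_ids(codes)             # passes silently
try(check_ids(c(730, 907)))  # numeric input is rejected with an error
```

Treating all codes as text, as suggested in the thread, avoids having to explain which creditors use "numeric" and which use "text" identifiers.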
#'
#' @param start_date A numeric value representing the starting year (default:
#' 2000). Must be >= 1970. The default focuses on modern data while reducing
Contributor comment:
I do think that if you set the default start date to 2000, you risk misleading casual users into thinking no earlier data is available.

#' data volume. For historical analysis, set `start_date = 1970` explicitly.
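To put the "reducing data volume" trade-off in numbers, here is a standalone copy of the `create_time()` helper added later in this diff, with `latest_year_projections` set to 2031 as the PR does for the 2024-12 release. The body of the error branch is collapsed in the diff view, so the `stop()` message below is a placeholder assumption.

```r
# Release constant copied from the PR (2024-12 IDS release)
latest_year_projections <- 2031

# Standalone copy of create_time(): builds the "YR<year>" time codes
create_time <- function(start_date, end_date) {
  if (!is.null(start_date) && !is.null(end_date)) {
    if (start_date > end_date) {
      # Placeholder message; the PR's actual error call is not shown
      stop("`start_date` cannot be later than `end_date`.", call. = FALSE)
    }
    paste0("YR", seq(start_date, end_date, by = 1))
  } else if (!is.null(start_date)) {
    # No end_date: request through the last projection year
    paste0("YR", seq(start_date, latest_year_projections, by = 1))
  } else {
    "all"
  }
}

length(create_time(2000, NULL))  # new default: YR2000 through YR2031
length(create_time(1970, NULL))  # full history: YR1970 through YR2031
```

With the new default, each geography/series/counterpart combination requests 32 years instead of 62. Note also that when `start_date` is `NULL`, `create_time()` returns `"all"` and ignores any `end_date` that was supplied.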
#'
#' @param end_date A numeric value representing the ending year (default: NULL).
#' Must be >= 1970 and cannot be earlier than start_date. If NULL, returns
#' data through the most recent available year. Some debt service related
Contributor comment:
We actually could remove this constraint, replace any sub-1970 year with 1970, and just raise a warning flag to let the user know we did so.

#' series include projections of debt service. For the 2024 data release,
#' debt service projections are available through 2031.
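The reviewer's alternative to rejecting pre-1970 years could look roughly like the sketch below: replace any earlier year with 1970 and warn instead of erroring. The helper name `clamp_year()` is hypothetical, and a package following its existing conventions might use a cli-style warning rather than base `warning()`.

```r
# Hypothetical helper: clamp a year to the earliest supported year (1970)
# and warn the user, instead of raising an error
clamp_year <- function(year, min_year = 1970, arg = "start_date") {
  if (!is.null(year) && year < min_year) {
    warning(
      sprintf("`%s` = %s is before %s, the earliest supported year; using %s.",
              arg, year, min_year, min_year),
      call. = FALSE
    )
    year <- min_year
  }
  year
}

clamp_year(1960)  # warns and returns 1970
clamp_year(2000)  # returned unchanged
clamp_year(NULL)  # NULL passes through (e.g. no end_date given)
```

The same helper could be applied to `end_date` with `arg = "end_date"`, before the `start_date > end_date` check runs.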
#'
#' @param progress A logical value indicating whether to display progress
#' messages during data retrieval (default: FALSE).
#'
#' @return A tibble containing debt statistics with the following columns:
#' \describe{
#' \item{geography_id}{The unique identifier for the geography (e.g., "ZMB").}
#' \item{series_id}{The unique identifier for the series (e.g.,
#' "DT.DOD.DPPG.CD").}
#' \item{counterpart_id}{The unique identifier for the counterpart (e.g.,
#' "all").}
#' \item{year}{The year corresponding to the data (e.g., 2020).}
#' \item{value}{The numeric value representing the statistic for the given
#' geography, series, counterpart, and year.}
#' \item{geography_id}{The identifier for the debtor geography (e.g., "GHA"
#' for Ghana, "LIC" for low income countries)}
#' \item{series_id}{The identifier for the debt statistic series (e.g.,
#' "DT.DOD.DECT.CD" for total external debt stocks)}
#' \item{counterpart_id}{The identifier for the creditor (e.g., "WLD" for
#' world total, "730" for China)}
#' \item{year}{The year of the observation}
#' \item{value}{The numeric value of the debt statistic, standardized to the
#' units specified in the series definition (typically current US dollars)}
#' }
#'
#' @export
#' @section Data Coverage and Validation:
#' The IDS database provides detailed debt statistics for low- and middle-income
#' countries, including:
#' * Debt stocks and flows
#' * Debt service and interest payments
#' * Creditor composition
#' * Terms and conditions of new commitments
#'
#' @examplesIf curl::has_internet()
#' \donttest{
#' # Fetch data for a series without specifying a time range or counterpart
#' ids_get(
#' geographies = "ZMB",
#' series = "DT.DOD.DPPG.CD",
#' )
#' To ensure valid queries:
#' * Use `ids_list_geographies()` to find valid debtor geography codes
#' * Use `ids_list_series()` to explore available debt statistics
#' * Use `ids_list_counterparts()` to see available creditor codes
#'
#' # Fetch specific debt statistics for Zambia from 2015 to 2020
#' ids_get(
#' geographies = "ZMB",
#' series = c("DT.DOD.DPPG.CD", "BM.GSR.TOTL.CD"),
#' start_date = 2015,
#' end_date = 2020
#' @examples
#' \donttest{
#' # Get total external debt stocks for a single country from 2000 onward
#' ghana_debt <- ids_get(
#' geographies = "GHA",
#' series = "DT.DOD.DECT.CD" # External debt stocks, total
#' )
#'
#' # Fetch data for specific counterparts
#' ids_get(
#' geographies = "ZMB",
#' series = "DT.DOD.DPPG.CD",
#' counterparts = c("216", "231")
#' # Compare debt service metrics across income groups
#' income_groups <- ids_get(
#' geographies = c("LIC", "LMC", "UMC"), # Income group aggregates
#' series = "DT.TDS.DECT.CD", # Total debt service
#' start_date = 2010
#' )
#'
#' # Fetch data for multiple geographies and counterparts
#' ids_get(
#' geographies = c("ZMB", "CHN"),
#' series = "DT.DOD.DPPG.CD",
#' counterparts = c("216", "231"),
#' start_date = 2019,
#' end_date = 2020
#' # Analyze debt composition by major creditors
#' creditor_analysis <- ids_get(
#' geographies = c("KEN", "ETH"), # Kenya and Ethiopia
#' series = c(
#' "DT.DOD.DECT.CD", # Total external debt
#' "DT.TDS.DECT.CD" # Total debt service
#' ),
#' counterparts = c(
#' "WLD", # World total
#' "730", # China
#' "907", # IMF
#' "BND" # Bondholders
#' ),
#' start_date = 2015
#' )
#' }
#'
#' @seealso
#' * `ids_list_geographies()` for available debtor geography codes
#' * `ids_list_series()` for available debt statistics series codes
#' * `ids_list_counterparts()` for available creditor codes
#'
#' @export
ids_get <- function(
geographies,
series,
counterparts = "all",
start_date = NULL,
counterparts = "WLD",
start_date = 2000,
end_date = NULL,
progress = FALSE
) {
@@ -97,6 +137,10 @@ ids_get <- function(
.progress = progress
)


# Apply specific filtering logic for years beyond latest actual data
debt_statistics <- filter_post_actual_na(debt_statistics)

debt_statistics
}

@@ -201,6 +245,12 @@ validate_progress <- function(progress) {
}
}


# to be updated manually with each release
# for the 2024-12 IDS release:
latest_year_observied <- 2023
latest_year_projections <- 2031

create_time <- function(start_date, end_date) {
if (!is.null(start_date) && !is.null(end_date)) {
if (start_date > end_date) {
@@ -209,7 +259,24 @@ create_time <- function(start_date, end_date) {
)
}
paste0("YR", seq(start_date, end_date, by = 1))
} else if (!is.null(start_date)) {
paste0("YR", seq(start_date, latest_year_projections, by = 1))
} else {
"all"
}
}

filter_post_actual_na <- function(data) {
# Identify rows after the latest actual year
Contributor comment:
I take it you're trying to drop NAs outside the coverage period, but preserve NAs within the coverage period? To avoid hardcoding and continually updating the start and end years, I would suggest:

  1. Get the first and last years in the retrieved dataset with a non-NA value, and treat these as the boundaries of the data, then
  2. Drop all NAs outside those boundaries.

data_after_actual <- data |>
filter(.data$year > latest_year_observied)

# Check if all rows for these years have NA in `value`
if (all(is.na(data_after_actual$value))) {
# Remove these rows from the data
data <- data |>
filter(.data$year <= latest_year_observied)
}

data
}
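The two-step approach suggested in the review comment on `filter_post_actual_na()` (find the first and last years with a non-NA value, then drop NA rows outside that window) can be sketched in base R as follows. `trim_na_outside_coverage()` is a hypothetical name, and the data frame uses made-up toy values, not real IDS data.

```r
# Hypothetical alternative to filter_post_actual_na(): derive the coverage
# window from the data instead of hard-coded release years
trim_na_outside_coverage <- function(data) {
  observed_years <- data$year[!is.na(data$value)]
  # No observed values at all: every row lies outside the coverage window
  if (length(observed_years) == 0) {
    return(data[0, , drop = FALSE])
  }
  first_year <- min(observed_years)
  last_year <- max(observed_years)
  # Keep every row inside the window, including NAs within the period
  data[data$year >= first_year & data$year <= last_year, , drop = FALSE]
}

example <- data.frame(
  geography_id = "GHA",
  series_id = "DT.DOD.DECT.CD",
  counterpart_id = "WLD",
  year = 2018:2025,
  value = c(NA, 10, NA, 12, 13, NA, NA, NA)  # toy values
)

# Keeps 2019 through 2022, including the NA inside the window (2020)
trim_na_outside_coverage(example)
```

As written, the window is computed over the whole retrieved dataset, which matches the comment. For results that mix several series or geographies, applying the same logic per group (for example with `split()` or a grouped `dplyr::filter()`) would give each series its own window, so projection series keep their years beyond the last observed year.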
2 changes: 1 addition & 1 deletion man/ids_bulk.Rd
