generated from CDCgov/template
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add PRISM categorization #31
Draft
dylanhmorris
wants to merge
19
commits into
main
Choose a base branch
from
dhm-add-categorization
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Changes from all commits
Commits
Show all changes
19 commits
Select commit
Hold shift + click to select a range
708c5d0
Add categorize_vector
dylanhmorris c34d9de
Add prism thresholds as raw data
dylanhmorris 2a2d0a8
Add prism threshold dataframe
dylanhmorris f944682
Add explanatory comments
dylanhmorris bc31183
Tweak explanatory comment formatting
dylanhmorris 6fc011e
Add get_prism_cutpoints()
dylanhmorris 385934d
Make hubverse tests more platform robust, improve categorize vector f…
dylanhmorris a41574c
Merge branch 'main' into dhm-add-categorization
dylanhmorris 8f067d5
Document get_prism_cutpoints
dylanhmorris af11743
Fix get_prism_cutpoints docs
dylanhmorris 18ba933
Merge branch 'main' into dhm-add-categorization
dylanhmorris 2e87fe9
Merge branch 'main' into dhm-add-categorization
dylanhmorris 95f52cb
Merge branch 'main' into dhm-add-categorization
dylanhmorris 587a57a
Add correctness tests for categorize_prism
dylanhmorris e9fb850
Add some details on output format for get_prism_cutpoints
dylanhmorris 7032ea9
Update tests
dylanhmorris b618805
Better testthat setup
dylanhmorris 6338db1
Higher precision expectations
dylanhmorris 1690c6d
Checkpoint categorize prism
dylanhmorris File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,75 @@ | ||
#' Get PRISM activity level cutpoints for given | ||
#' diseases and locations | ||
#' | ||
#' @param diseases disease(s) for which to return the cutpoints. | ||
#' One of `"ARI"`, `"COVID-19"`, `"Influenza"`, or `"RSV"`, or | ||
#' an array of those values. | ||
#' @param locations location(s) for which to return the cutpoints. | ||
#' A location two-letter abbreviation as in the `short_name` | ||
#' column of [forecasttools::us_location_table], or an array | ||
#' of those abbreviations. | ||
#' @return The cutpoints, as an ordered list of vectors. | ||
#' | ||
#' @examples | ||
#' get_prism_cutpoints("WA", "Influenza") | ||
#' | ||
#' get_prism_cutpoints(c("US", "WA"), "COVID-19") | ||
#' | ||
#' get_prism_cutpoints(c("US", "WA"), c("ARI", "RSV")) | ||
#' @export | ||
get_prism_cutpoints <- function(locations, diseases) { | ||
checkmate::assert_names( | ||
diseases, | ||
subset.of = | ||
dimnames(forecasttools::prism_thresholds)$disease, | ||
what = "disease" | ||
) | ||
checkmate::assert_names( | ||
locations, | ||
subset.of = | ||
dimnames(forecasttools::prism_thresholds)$location, | ||
what = "location" | ||
) | ||
|
||
return(purrr::map2(locations, diseases, \(x, y) { | ||
forecasttools::prism_thresholds[x, y, ] | ||
})) | ||
} | ||
|
||
|
||
|
||
#' Categorize a vector of values into PRISM | ||
#' activity level bins. | ||
#' | ||
#' @param values values to categorize | ||
#' @param locations vector of locations of length equal to | ||
#' `values` or a single location for all `values`. | ||
#' @param diseases vector of diseases of length equal to | ||
#' `values` or a single disease for all `values`. | ||
#' @param prism_bin_names Bin names for the PRISM bins. | ||
#' in order from lowest to highest. Must be a vector of | ||
#' length 5. `list(prism_bin_names)` will be passed as the | ||
#' `label_sets` argument to [categorize_vector()]. | ||
#' Defaults to the standard PRISM bin names in title case: | ||
#' `c("Very Low", "Low", "Moderate", "High", "Very High")`. | ||
#' @return A factor vector equal in length to `values` of | ||
#' the categories, as the output of [categorize_vector()]. | ||
#' @export | ||
categorize_prism <- function(values, | ||
locations, | ||
diseases, | ||
prism_bin_names = c( | ||
"Very Low", | ||
"Low", | ||
"Moderate", | ||
"High", | ||
"Very High" | ||
)) { | ||
cutpoints <- get_prism_cutpoints(locations, diseases) | ||
|
||
return(categorize_vector( | ||
values, | ||
break_sets = cutpoints, | ||
label_sets = list(prism_bin_names) | ||
)) | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,58 @@ | ||
#' Catgeorize entries in a numeric vector according | ||
#' to a list of breakpoints and labels. | ||
#' | ||
#' Vectorized version of [base::cut()] that | ||
#' allows each entry in the vector to use different | ||
#' break points and labels. | ||
#' | ||
#' @param values Vector of values to categorize. | ||
#' @param break_sets sets of category breakpoints, | ||
#' with one set for each entry of `values`, as a list of | ||
#' numeric vectors. Alternatively, a single set of category | ||
#' breakpoints to use for all values, as a one-entry list | ||
#' containing a single numeric vector. | ||
#' @param label_sets sets of labels to associate, | ||
#' with the corresponding breakpoints in `break_sets`, | ||
#' with one set for each entry of `values`, as a list of | ||
#' character vectors. Alternatively, a single set of labels | ||
#' to use for all values, as a one-entry list containing a | ||
#' single character vector. | ||
#' @param include.lowest Passed to [base::cut()]. Default `TRUE`. | ||
#' @param order Passed to [base::cut()]. Default `TRUE`. | ||
#' @param right Passed to [base::cut()]. Default `TRUE`. | ||
#' @param ... Additional keyword arguments passed to | ||
#' [base::cut()]. | ||
#' @return A categorized vector, as vector of ordered | ||
#' factors. | ||
#' @export | ||
categorize_vector <- function(values, | ||
break_sets, | ||
label_sets, | ||
include.lowest = TRUE, # nolint | ||
order = TRUE, | ||
right = TRUE, | ||
...) { | ||
n_vals <- length(values) | ||
|
||
checkmate::assert_list(break_sets, | ||
types = "numeric" | ||
) | ||
checkmate::assert_list(label_sets, | ||
types = "character" | ||
) | ||
checkmate::qassert(break_sets, c("l1", glue::glue("l{n_vals}"))) | ||
checkmate::qassert(label_sets, c("l1", glue::glue("l{n_vals}"))) | ||
|
||
return(purrr::pmap_vec( | ||
list( | ||
x = values, | ||
breaks = break_sets, | ||
labels = label_sets, | ||
include.lowest = include.lowest, | ||
order = order, | ||
right = right, | ||
... | ||
), | ||
cut | ||
)) | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
#' When there is an appropriate online endpoint for these | ||
#' thresholds, this this script will be updated to point there. | ||
#' For now, it reads from a bundled .tsv in inst/extdata | ||
|
||
thresholds <- readr::read_tsv( | ||
fs::path( | ||
"inst", | ||
"extdata", | ||
"prism_thresholds", | ||
ext = "tsv" | ||
), | ||
show_col_types = FALSE | ||
) | ||
|
||
## Transform thresholds from percentage to proprotion | ||
prop_thresholds <- thresholds |> | ||
dplyr::transmute( | ||
disease, | ||
location = state_abb, | ||
prop_lower_bound = 0, | ||
prop_low = perc_level_low / 100, | ||
prop_moderate = perc_level_moderate / 100, | ||
prop_high = perc_level_high / 100, | ||
prop_very_high = perc_level_very_high / 100, | ||
prop_upper_bound = 1 | ||
) | ||
|
||
#' Transform thresholds from flat table to | ||
#' multi-dimensional array, via a nested list. | ||
#' | ||
#' Method for conversion to multi-dim array: | ||
#' 1. Transform the long-form tabular data to a | ||
#' nested named list (via nest() and deframe()) | ||
#' 2. Transform the nested named list to a multi-dimensional | ||
#' array with dimension names (via simplify2array() and unlist()) | ||
#' 3. Order and name the dimensions of that array. | ||
|
||
thresholds_nested_list <- prop_thresholds |> | ||
tidyr::nest(breaks = dplyr::starts_with("prop_")) |> | ||
tidyr::nest(loc_breaks = c(location, breaks)) |> | ||
tibble::deframe() |> | ||
purrr::map(deframe) # yields a nested list of tibbles | ||
|
||
prism_thresholds <- thresholds_nested_list |> | ||
simplify2array() |> # yields a 2D array of length-1 tibbles | ||
apply(1:2, unlist) |> # yields a 3D array | ||
aperm(c(2, 3, 1)) # orders 3D array dimensions as desired | ||
|
||
names(dimnames(prism_thresholds)) <- c("location", "disease", "breaks") | ||
|
||
usethis::use_data(prism_thresholds, | ||
overwrite = TRUE | ||
) |
Binary file not shown.
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels inefficient but equally it's readable and premature optimization is bad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find any way to avoid it because how R interprets array slicing.