-
Notifications
You must be signed in to change notification settings - Fork 2
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add simulated data from Gostic, 2020 for benchmarking (#37)
* Add simulated data from Gostic, 2020 for benchmarking This commit re-uses the data processing and documentation from CDCgov/cfa-epinow2-pipeline#17. That repo is open source and public domain by nature of being USG property. I think it's convenient to re-use, but @athowes and @kgostic there's a little text in there that you both suggested, so I'd appreciate if you could give your permission for the re-use here! I've made sure to credit you both in the commit as it's partially your writing. It's probably best practice to make the data prep and processing here fully reproducible in `data-raw/` but it's quite a pain to do, with a mixture of shells cripting and R scripting needed. I skipped it out of convenience, but if you have a really strong feeling that it's required @seabbs, let me know and we can revisit. Closes #9 Co-authored-by: Adam Howes <[email protected]> Co-authored-by: Katie Gostic <[email protected]> * Bump NEWS * Coerce new `obs_incidence` col to integer from double * Generate GI PMF in `data-raw/` * Point to specific CFAEpiNow2Pipeline commit * Remove copied & pasted unrelated text * Re-render roxygen docs * Add primarycensoreddist to remotes * Typo * Rename synthetic dataset and GI dist * Pin primarycensorreddist to R-universe not github --------- Co-authored-by: Adam Howes <[email protected]> Co-authored-by: Katie Gostic <[email protected]>
- Loading branch information
1 parent
f80eefe
commit bd6d033
Showing
13 changed files
with
205 additions
and
3 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,5 +5,6 @@ | |
^\.pre-commit-config\.yaml$ | ||
^_pkgdown\.yml$ | ||
^codecov\.yml$ | ||
^data-raw$ | ||
^docs$ | ||
^pkgdown$ |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,76 @@ | ||
#' Synthetic dataset of stochastic SIR system with known Rt | ||
#' | ||
#' A dataset from Gostic, Katelyn M., et al. "Practical considerations for | ||
#' measuring the effective reproductive number, Rt." PLoS Computational Biology | ||
#' 16.12 (2020): e1008409. The data are simulated from a stochastic SEIR | ||
#' compartmental model. | ||
#' | ||
#' This synthetic dataset has a number of desirable properties: | ||
#' | ||
#' 1. The force of infection changes depending on the Rt, allowing for sudden | ||
#' changes in the Rt. This allows for modeling of sudden changes in infection | ||
#' dynamics, which might otherwise be difficult to capture. Rt estimation | ||
#' framework | ||
#' | ||
#' 2. The realized Rt is known at each timepoint | ||
#' | ||
#' 3. The dataset incorporates a simple generation interval and a reporting | ||
#' delay. | ||
#' | ||
#' Gostic et al. benchmark the performance of a number of Rt estimation | ||
#' frameworks, providing practical guidance on how to use this dataset to | ||
#' evaluate Rt estimates. | ||
#' | ||
#' In practice, we've found that the amount of observation noise in the | ||
#' incidence and/or observed cases is often undesirably low for testing. Many | ||
#' empirical datasets are much noisier. As a result, models built with these | ||
#' settings in mind can perform poorly on this dataset or fail to converge. We | ||
#' manually add observation noise with `rnbinom(299, mu = | ||
#' stochastic_sir_rt[["obs_cases"]], size = 10)` and the random seed 123456 and | ||
#' store it in the `obs_incidence` column. | ||
#' | ||
#' @name stochastic_sir_rt | ||
#' @format `stochastic_sir_rt` A data frame with 301 rows and 12 columns: | ||
#' \describe{ | ||
#' \item{time}{Timestep of the discrete-time stochastic SEIR simulation} | ||
#' \item{date}{Added from the original Gostic, 2020 dataset. A date | ||
#' corresponding to the assigned `time`. Arbitrarily starts on January 1st, | ||
#' 2023.} | ||
#' \item{S, E, I, R}{The realized state of the stochastic SEIR system} | ||
#' \item{dS, dEI, DIR}{The stochastic transition between compartments} | ||
#' \item{incidence}{The true incidence in the `I` compartment at time t} | ||
#' \item{obs_cases}{The observed number of cases at time t from | ||
#' forward-convolved incidence.} | ||
#' \item{obs_incidence}{Added from the original Gostic, 2020 dataset. The | ||
#' `incidence` column with added negative-binomial observation noise. | ||
#' Created with `set.seed(123456)` and the call | ||
#' `rnbinom(299, mu = [["incidence"]], size = 10)` Useful for | ||
#' testing.} | ||
#' \item{true_r0}{The initial R0 of the system (i.e., 2)} | ||
#' \item{true_rt}{The known, true Rt of the epidemic system} | ||
#' } | ||
#' @source | ||
#' <https://github.com/cobeylab/Rt_estimation/tree/d9d8977ba8492ac1a3b8287d2f470b313bfb9f1d> # nolint | ||
#' <https://github.com/CDCgov/cfa-epinow2-pipeline/pull/17> # nolint | ||
"stochastic_sir_rt" | ||
|
||
#' Generation interval corresponding to the sample `stochastic_sir_rt` dataset | ||
#' | ||
#' Gostic et al., 2020 simulates data from a stochastic SEIR model. Residence | ||
#' time in both the E and the I compartments is exponentially distributed, with | ||
#' a mean of 4 days (or a rate/inverse-scale of 1/4). These residence times | ||
#' imply a gamma-distributed generation time distribution with a shape of 2 and | ||
#' a rate of 1/4. The distribution can be regenerated in | ||
#' `data-raw/sir_gt_pmf.R`. | ||
#' | ||
#' From this parametric specification, we produce a double-censored, | ||
#' left-truncated probability mass function of the generation interval | ||
#' distribution. We produce the PMF using | ||
#' [primarycensoreddist::dpcens()] with version 0.4.0. See | ||
#' https://doi.org/10.1101/2024.01.12.24301247 for more information on | ||
#' double-censoring biases and corrections. | ||
#' | ||
#' @name sir_gt_pmf | ||
#' @format `sir_gt_pmf` A numeric vector of length 26 that sums to one within | ||
#' numerical tolerance | ||
"sir_gt_pmf" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
# E and I compartments both with exponentially distributed residence times | ||
# with a mean of 4 days. | ||
shape <- 2 | ||
rate <- 1 / 4 | ||
|
||
sir_gt_pmf <- primarycensoreddist::dpcens(0:26, | ||
pgamma, | ||
shape = shape, | ||
rate = rate, | ||
D = 27 | ||
) # v0.4.0 | ||
|
||
# Drop first element because GI can't have same-day transmission | ||
sir_gt_pmf <- sir_gt_pmf[2:27] | ||
|
||
# Renormalize to a proper PMF | ||
while (abs(sum(sir_gt_pmf) - 1) > 1e-10) { | ||
sir_gt_pmf <- sir_gt_pmf / sum(sir_gt_pmf) | ||
} | ||
|
||
usethis::use_data(sir_gt_pmf, overwrite = TRUE) |
Binary file not shown.
Binary file not shown.
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Oops, something went wrong.
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Oops, something went wrong.