Skip to content

Commit

Permalink
Add simulated data from Gostic, 2020 for benchmarking (#37)
Browse files Browse the repository at this point in the history
* Add simulated data from Gostic, 2020 for benchmarking

This commit re-uses the data processing and documentation from CDCgov/cfa-epinow2-pipeline#17. That repo is open source and public domain by nature of being USG property. I think it's convenient to re-use, but @athowes and @kgostic there's a little text in there that you both suggested, so I'd appreciate if you could give your permission for the re-use here! I've made sure to credit you both in the commit as it's partially your writing.

It's probably best practice to make the data prep and processing here fully reproducible in `data-raw/` but it's quite a pain to do, with a mixture of shells cripting and R scripting needed. I skipped it out of convenience, but if you have a really strong feeling that it's required @seabbs, let me know and we can revisit.

Closes #9

Co-authored-by: Adam Howes <[email protected]>
Co-authored-by: Katie Gostic <[email protected]>

* Bump NEWS

* Coerce new `obs_incidence` col to integer from double

* Generate GI PMF in `data-raw/`

* Point to specific CFAEpiNow2Pipeline commit

* Remove copied & pasted unrelated text

* Re-render roxygen docs

* Add primarycensoreddist to remotes

* Typo

* Rename synthetic dataset and GI dist

* Pin primarycensorreddist to R-universe not github

---------

Co-authored-by: Adam Howes <[email protected]>
Co-authored-by: Katie Gostic <[email protected]>
  • Loading branch information
3 people authored Sep 12, 2024
1 parent f80eefe commit bd6d033
Show file tree
Hide file tree
Showing 13 changed files with 205 additions and 3 deletions.
1 change: 1 addition & 0 deletions .Rbuildignore
Original file line number Diff line number Diff line change
Expand Up @@ -5,5 +5,6 @@
^\.pre-commit-config\.yaml$
^_pkgdown\.yml$
^codecov\.yml$
^data-raw$
^docs$
^pkgdown$
1 change: 1 addition & 0 deletions .github/workflows/R-CMD-check.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@ jobs:
r-version: ${{ matrix.config.r }}
http-user-agent: ${{ matrix.config.http-user-agent }}
use-public-rspm: true
extra-repositories: 'https://epinowcast.r-universe.dev'

- uses: r-lib/actions/setup-r-dependencies@v2
with:
Expand Down
1 change: 1 addition & 0 deletions .github/workflows/pkgdown.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ jobs:
- uses: r-lib/actions/setup-r@v2
with:
use-public-rspm: true
extra-repositories: 'https://epinowcast.r-universe.dev'

- uses: r-lib/actions/setup-r-dependencies@v2
with:
Expand Down
1 change: 1 addition & 0 deletions .github/workflows/test-coverage.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ jobs:
- uses: r-lib/actions/setup-r@v2
with:
use-public-rspm: true
extra-repositories: 'https://epinowcast.r-universe.dev'

- uses: r-lib/actions/setup-r-dependencies@v2
with:
Expand Down
1 change: 0 additions & 1 deletion .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,6 @@ repos:
- id: style-files
args: [--style_pkg=styler, --style_fun=tidyverse_style,
--cache-root=styler-perm]
- id: roxygenize
- id: use-tidy-description
- id: lintr
- id: readme-rmd-rendered
Expand Down
11 changes: 9 additions & 2 deletions DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -25,13 +25,20 @@ BugReports: https://github.com/cdcgov/cfa-gam-rt/issues
Suggests:
testthat (>= 3.0.0),
pkgdown,
withr
withr,
usethis,
primarycensoreddist
Config/testthat/edition: 3
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.1
RoxygenNote: 7.3.2
Imports:
cli,
glue,
mgcv,
rlang
Depends:
R (>= 2.10)
LazyData: true
Additional_repositories:
https://epinowcast.r-universe.dev
1 change: 1 addition & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
# RtGam (development version)

* Add data from Gostic, 2020 for testing and benchmarking (#37)
* Fix math rendering in pkgdown (#36)
* CI status in README badges (#27)
* Update pre-commit hooks from template repo to newest version (#25)
Expand Down
76 changes: 76 additions & 0 deletions R/data.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
#' Synthetic dataset of stochastic SIR system with known Rt
#'
#' A dataset from Gostic, Katelyn M., et al. "Practical considerations for
#' measuring the effective reproductive number, Rt." PLoS Computational Biology
#' 16.12 (2020): e1008409. The data are simulated from a stochastic SEIR
#' compartmental model.
#'
#' This synthetic dataset has a number of desirable properties:
#'
#' 1. The force of infection changes depending on the Rt, allowing for sudden
#' changes in the Rt. This allows for modeling of sudden changes in infection
#' dynamics, which might otherwise be difficult to capture. Rt estimation
#' framework
#'
#' 2. The realized Rt is known at each timepoint
#'
#' 3. The dataset incorporates a simple generation interval and a reporting
#' delay.
#'
#' Gostic et al. benchmark the performance of a number of Rt estimation
#' frameworks, providing practical guidance on how to use this dataset to
#' evaluate Rt estimates.
#'
#' In practice, we've found that the amount of observation noise in the
#' incidence and/or observed cases is often undesirably low for testing. Many
#' empirical datasets are much noisier. As a result, models built with these
#' settings in mind can perform poorly on this dataset or fail to converge. We
#' manually add observation noise with `rnbinom(299, mu =
#' stochastic_sir_rt[["obs_cases"]], size = 10)` and the random seed 123456 and
#' store it in the `obs_incidence` column.
#'
#' @name stochastic_sir_rt
#' @format `stochastic_sir_rt` A data frame with 301 rows and 12 columns:
#' \describe{
#' \item{time}{Timestep of the discrete-time stochastic SEIR simulation}
#' \item{date}{Added from the original Gostic, 2020 dataset. A date
#' corresponding to the assigned `time`. Arbitrarily starts on January 1st,
#' 2023.}
#' \item{S, E, I, R}{The realized state of the stochastic SEIR system}
#' \item{dS, dEI, DIR}{The stochastic transition between compartments}
#' \item{incidence}{The true incidence in the `I` compartment at time t}
#' \item{obs_cases}{The observed number of cases at time t from
#' forward-convolved incidence.}
#' \item{obs_incidence}{Added from the original Gostic, 2020 dataset. The
#' `incidence` column with added negative-binomial observation noise.
#' Created with `set.seed(123456)` and the call
#' `rnbinom(299, mu = [["incidence"]], size = 10)` Useful for
#' testing.}
#' \item{true_r0}{The initial R0 of the system (i.e., 2)}
#' \item{true_rt}{The known, true Rt of the epidemic system}
#' }
#' @source
#' <https://github.com/cobeylab/Rt_estimation/tree/d9d8977ba8492ac1a3b8287d2f470b313bfb9f1d> # nolint
#' <https://github.com/CDCgov/cfa-epinow2-pipeline/pull/17> # nolint
"stochastic_sir_rt"

#' Generation interval corresponding to the sample `stochastic_sir_rt` dataset
#'
#' Gostic et al., 2020 simulates data from a stochastic SEIR model. Residence
#' time in both the E and the I compartments is exponentially distributed, with
#' a mean of 4 days (or a rate/inverse-scale of 1/4). These residence times
#' imply a gamma-distributed generation time distribution with a shape of 2 and
#' a rate of 1/4. The distribution can be regenerated in
#' `data-raw/sir_gt_pmf.R`.
#'
#' From this parametric specification, we produce a double-censored,
#' left-truncated probability mass function of the generation interval
#' distribution. We produce the PMF using
#' [primarycensoreddist::dpcens()] with version 0.4.0. See
#' https://doi.org/10.1101/2024.01.12.24301247 for more information on
#' double-censoring biases and corrections.
#'
#' @name sir_gt_pmf
#' @format `sir_gt_pmf` A numeric vector of length 26 that sums to one within
#' numerical tolerance
"sir_gt_pmf"
21 changes: 21 additions & 0 deletions data-raw/sir_gt_pmf.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
# E and I compartments both with exponentially distributed residence times
# with a mean of 4 days.
shape <- 2
rate <- 1 / 4

sir_gt_pmf <- primarycensoreddist::dpcens(0:26,
pgamma,
shape = shape,
rate = rate,
D = 27
) # v0.4.0

# Drop first element because GI can't have same-day transmission
sir_gt_pmf <- sir_gt_pmf[2:27]

# Renormalize to a proper PMF
while (abs(sum(sir_gt_pmf) - 1) > 1e-10) {
sir_gt_pmf <- sir_gt_pmf / sum(sir_gt_pmf)
}

usethis::use_data(sir_gt_pmf, overwrite = TRUE)
Binary file added data/sir_gt_pmf.rda
Binary file not shown.
Binary file added data/stochastic_sir_rt.rda
Binary file not shown.
30 changes: 30 additions & 0 deletions man/sir_gt_pmf.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

64 changes: 64 additions & 0 deletions man/stochastic_sir_rt.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

0 comments on commit bd6d033

Please sign in to comment.