Add functions to simulate datasets to the seroprevalence_data module.… #83

Closed
wants to merge 116 commits into from

ntorresd (Member) commented Jul 4, 2023

This PR partly closes #57

  • fix: add an exception to plot_foi() so that it can plot a FoI trend alongside the data when their lengths don't match

  • clean test_visualisation

  • feature: add three functions to simulate datasets. get_sim_counts() generates a list of simulated counts by age following a binomial distribution. generate_sim_data() uses the counts from get_sim_counts() to build a dataframe with the structure required by the other functions of the package. group_sim_data() groups the resulting dataset by age; for now it uses 5-year age groups.

  • add test_simulate_data to test the data simulation functions in the seroprevalence_data module
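To make the binomial simulation step concrete, here is a minimal R sketch. It is not the serofoi implementation: `sim_counts_sketch()` and its column names are hypothetical, and it assumes that `foi[j]` is the FoI in calendar year j (oldest first) and that someone aged a at the survey was exposed during the last a years, so their seropositivity probability is 1 - exp(-cumulative FoI).

```r
# Hypothetical sketch, not serofoi's get_sim_counts(). Assumption: element a of
# cumsum(rev(foi)) is the FoI accumulated by someone aged a at the survey.
sim_counts_sketch <- function(foi, sample_size_by_age, seed = 1234) {
  ages <- seq_along(foi)
  # Catalytic model: probability of having seroconverted by age a
  prob_seropos <- 1 - exp(-cumsum(rev(foi)))
  set.seed(seed)  # fix the RNG so the simulated dataset is reproducible
  # One binomial draw per single year of age
  counts <- rbinom(length(ages), size = sample_size_by_age, prob = prob_seropos)
  data.frame(age = ages, total = sample_size_by_age, counts = counts,
             prob = prob_seropos)
}

sim <- sim_counts_sketch(foi = rep(0.02, 50), sample_size_by_age = 5)
head(sim)
```

Returning the theoretical probability next to the counts is a design choice for testability: it lets a test compare simulated proportions against the value they were drawn from.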

tracelac and others added 30 commits August 8, 2022 18:45
* fix R/stanmodels folders

* Final functions pending modification

* Delete Funciones eliminadas.docx

* Delete Explicación del código del paquete SEROFOI.docx

* Documentation in English and Spanish

* Documentation ready

* modify package structure

* modify package structure

* modify names and folder test

* Creation of general modules and revision of functions

* refactoring visualization functions and define some general pck structure

* update readme file

* update readme file

* general settings

* modify dependencies, document and name functions

Co-authored-by: Zulma M. Cucunubá
Co-authored-by: megamezl
minor correction to function `get_age_group` documentation
ntorresd changed the base branch from main to dev August 22, 2023 22:54
ntorresd and others added 4 commits August 22, 2023 18:09
* remove serodata .Rdata and .Rd files

* remove R/serodata.R

* doc: update functions  documentation replacing  for  in examples

* doc: update serofoi logo

* doc: update README and vignettes
This commit replaces the removed preloaded dataset `serodata` with the identical `chagas2012`.

* fix: minor correction to test_modelling
remove function `group_sim_data` from export and the corresponding
example
Remove unnecessary parameters from `get_age_group`.
Now the function takes an age vector as input rather than a dataframe
containing a specific age column.
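A minimal base-R sketch of what grouping an age vector into 5-year bins and then summing counts per bin could look like. `age_group_sketch()` and the toy dataset are hypothetical illustrations, not serofoi's `get_age_group()` or `group_sim_data()`, whose exact labels and columns may differ.

```r
# Hypothetical sketch, not serofoi's get_age_group()/group_sim_data():
# bin ages into left-closed intervals of width `step`, then sum counts per bin.
age_group_sketch <- function(age, step = 5) {
  # Breaks extend past max(age) so the oldest age falls in the last bin
  breaks <- seq(from = min(age), to = max(age) + step, by = step)
  cut(age, breaks = breaks, right = FALSE)
}

# Toy simulated dataset: 10 people sampled per single year of age
sim <- data.frame(age = 1:12, total = 10,
                  counts = c(0, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5))
sim$age_group <- age_group_sketch(sim$age, step = 5)

# Sum sample sizes and seropositive counts within each age group
grouped <- aggregate(cbind(total, counts) ~ age_group, data = sim, FUN = sum)
grouped
```

Taking a plain vector keeps the binning logic independent of any particular column name, which matches the motivation of the commit above.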
ntorresd mentioned this pull request Aug 23, 2023
jpavlich and others added 5 commits August 24, 2023 10:09
 Simplify fit_seromodel() output and related refactorings
Remove data writing from `test_simulate_data.R` as well as the corresponding
.csv files.
Remove unused data paths from `test_simulate_data`
Add tests for `get_sim_probability` and `get_sim_n_seropositive`
Modify test for `generate_sim_data` and `group_sim_data` functions.
Remove unnecessary model running.
Remove `expect_doppelganger` tests.
ben18785 (Collaborator) left a comment

Thanks @ntorresd -- looking better. We still need much more extensive tests of the functions though as these functions are really key to lots of other functionality (including how we test that the Stan models are themselves working):

  • The get_sim_probability function needs value-based testing, i.e. do the probabilities it outputs match theoretically derived ones?
  • Same for get_sim_data.
  • There are a variety of internal functions that also need checking, which I think are currently lacking unit tests.
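One way to write value-based tests is to pick FoI trends whose seroprevalence has a closed form. The sketch below uses a hypothetical `sim_prob_sketch()` in place of `get_sim_probability()` and assumes the cumulative FoI of someone aged a is the sum of the last a yearly values; the real function's indexing and signature may differ.

```r
# Hypothetical helper, not serofoi's get_sim_probability(). Assumption: foi[j]
# is the FoI in year j (oldest first) and someone aged a at the survey was
# exposed during the last a years, i.e. element a of cumsum(rev(foi)).
sim_prob_sketch <- function(foi) 1 - exp(-cumsum(rev(foi)))

ages <- 1:50

# Case 1: constant FoI lambda gives P(seropositive | age a) = 1 - exp(-lambda * a)
lambda <- 0.02
p_const <- sim_prob_sketch(rep(lambda, 50))

# Case 2: no transmission for 40 years, then FoI = 0.1 for the last 10 years.
# People aged <= 10 have cumulative FoI 0.1 * a; everyone older has 0.1 * 10 = 1
p_step <- sim_prob_sketch(c(rep(0, 40), rep(0.1, 10)))

stopifnot(
  isTRUE(all.equal(p_const, 1 - exp(-lambda * ages))),
  isTRUE(all.equal(p_step[1:10], 1 - exp(-0.1 * (1:10)))),
  isTRUE(all.equal(p_step[11:50], rep(1 - exp(-1), 40)))
)
```

The step-change case is useful because it checks the time indexing: swapping the direction of the cumulative sum would make the older ages, not the younger ones, follow 1 - exp(-0.1 a).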

R/seroprevalence_data.R (outdated)
step = 5) {
age <- sim_data[[col_age]]
sim_data$age_group <- get_age_group(age = age, step = step)
sim_data_grouped <- sim_data %>% group_by(age_group) %>%
Collaborator comment:

Minor thing, but can we move the `group_by` to the next line for consistency when using the pipe?

tests/testthat/test_get_sim.R (outdated)
Comment on lines 22 to 27
sim_data <- generate_sim_data(foi = foi_sim,
sample_size_by_age = sample_size_by_age,
tsur = 2050,
birth_year_min = 2000,
survey_label = 'foi_sim',
seed = seed)
Collaborator comment:

I know this function is stochastic but we can still test it works by varying the FOIs and checking that in limits it behaves as we'd like it to.
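A stochastic simulator can still be tested deterministically at its limits. This sketch uses hypothetical helpers (`sim_prob_sketch()`, `sim_counts_sketch()`) standing in for the package functions, under the same assumed age/FoI indexing: with zero FoI nobody seroconverts, with a very large FoI everyone does, and a larger FoI never lowers seroprevalence.

```r
# Hypothetical helpers, not serofoi's API. Assumption: element a of
# cumsum(rev(foi)) is the FoI accumulated by someone aged a at the survey.
sim_prob_sketch <- function(foi) 1 - exp(-cumsum(rev(foi)))

sim_counts_sketch <- function(foi, n_per_age, seed) {
  set.seed(seed)
  rbinom(length(foi), size = n_per_age, prob = sim_prob_sketch(foi))
}

# Limit 1: with zero FoI the probability is exactly 0, so no positives
counts_zero <- sim_counts_sketch(rep(0, 50), n_per_age = 100, seed = 1)

# Limit 2: with a huge FoI, 1 - exp(-50) rounds to 1 in double precision,
# so every sampled individual is seropositive
counts_high <- sim_counts_sketch(rep(50, 50), n_per_age = 100, seed = 1)

# Monotonicity: a larger constant FoI gives a higher probability at every age
p_low <- sim_prob_sketch(rep(0.02, 50))
p_high <- sim_prob_sketch(rep(0.04, 50))

stopifnot(all(counts_zero == 0), all(counts_high == 100), all(p_high > p_low))
```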

ntorresd (Member, author) commented Sep 27, 2023

Right now I'm relying on the model implementation to test this functionality. My idea is to use suitable models for different FoI trends. A constant foi_sim should be correctly approximated by the "constant" model at all times, so I check that foi_sim lies within the 90% credible interval (5th to 95th posterior percentiles) obtained by fitting this model:

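# Assumes tsur, birth_year_min, sample_size_by_age, seed and n_iters are
# defined earlier in the test file
# Define constant FoI for simulations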
case_label <- "constant_foi_"
foi_model <- "constant"
foi_sim <- rep(0.02, tsur - birth_year_min)
max_lambda <- 0.035

# Generate simulated data and run the "constant" model
sim_data <- generate_sim_data(foi = foi_sim,
                              sample_size_by_age = sample_size_by_age,
                              tsur = tsur,
                              birth_year_min = birth_year_min,
                              survey_label = 'foi_sim',
                              seed = seed)
sim_seromodel <- run_seromodel(sim_data, foi_model = foi_model, n_iters = n_iters)

# Check that foi_sim lies within the 90% credible interval (5th-95th
# posterior percentiles) of the fitted FoI in every year
foi <- rstan::extract(sim_seromodel$seromodel_fit, "foi", inc_warmup = FALSE)[[1]]
foi_lower <- apply(foi, 2, function(x) quantile(x, 0.05))
foi_upper <- apply(foi, 2, function(x) quantile(x, 0.95))
expect_true(all((foi_sim >= foi_lower) & (foi_sim <= foi_upper)))

[Image: plot_foi_const_test]

I implemented this for a constant FOI and for the following smooth-decreasing FOI (red lines in the images):

case_label <- "smth_dec_foi_" # Smooth-decreasing FoI
foi_model = "tv_normal"
foi_max = 0.2
stretch = 0.15
x <- 1:(tsur - birth_year_min)
foi_sim <- (-foi_max * (atan(stretch * (x - 25))) / (0.5 * pi) + foi_max) / 2

[Image: plot_foi_smth_desc_test]
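The shape of this smooth-decreasing FoI follows directly from the arctangent: atan() maps to (-π/2, π/2), so the curve stays in (0, foi_max), decreases monotonically and equals foi_max / 2 at x = 25. A short R check, assuming tsur = 2050 and birth_year_min = 2000 as in the generate_sim_data() call quoted earlier in this thread (the values used in this particular test are not shown):

```r
# Assumed values, taken from the earlier generate_sim_data() snippet
tsur <- 2050
birth_year_min <- 2000
foi_max <- 0.2
stretch <- 0.15
x <- 1:(tsur - birth_year_min)
foi_sim <- (-foi_max * atan(stretch * (x - 25)) / (0.5 * pi) + foi_max) / 2

# atan(...) / (pi / 2) lies in (-1, 1), so foi_sim lies in (0, foi_max);
# at the midpoint x = 25, atan(0) = 0 and foi_sim = foi_max / 2
range(foi_sim)
foi_sim[25]
```

`stretch` controls how sharp the transition is, and the constant 25 sets where it happens, so both are natural knobs when testing how well "tv_normal" recovers fast versus slow declines.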

I think it's worth opening a separate issue to discuss this in more detail, in case you're not convinced by my approach.

Collaborator comment:

We should be able to test whether the data we obtain by simulating from these models is as expected without resorting to solving the inverse problem (which leaves us open to a number of issues, e.g. the FOIs may not be identifiable from the data). We have analytical results for all of the models, which should allow us to test them directly. For example, for a constant FOI model we know (approximately) the probability that someone aged x is seropositive; if we use a large sample size, we can check that the simulated proportion is close to that value.
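A minimal R sketch of the check described above, for a constant FoI λ, where the analytical seroprevalence at age a is 1 - exp(-λa). `sim_prob_sketch()` is a hypothetical stand-in for the package's simulation step; in the package test one would call the real simulation function instead. The tolerance of 5 binomial standard errors is an arbitrary choice that keeps false failures very unlikely.

```r
# Hypothetical helper, not serofoi's API (same assumed age/FoI indexing)
sim_prob_sketch <- function(foi) 1 - exp(-cumsum(rev(foi)))

lambda <- 0.02
ages <- 1:50
n_per_age <- 1e5  # large sample so sampling noise is small

set.seed(2023)
counts <- rbinom(length(ages), size = n_per_age,
                 prob = sim_prob_sketch(rep(lambda, length(ages))))
p_hat <- counts / n_per_age

# Analytical seroprevalence under a constant FoI
p_theory <- 1 - exp(-lambda * ages)
# Binomial standard error of each simulated proportion
se <- sqrt(p_theory * (1 - p_theory) / n_per_age)

# Every simulated proportion should lie within a few standard errors
stopifnot(all(abs(p_hat - p_theory) < 5 * se))
```

Unlike the credible-interval approach, this tests the simulator alone, so a failure cannot be caused by an identifiability problem or a poorly converged Stan fit.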

ntorresd requested review from ben18785 and removed request for ben18785 September 11, 2023 17:11
jpavlich and others added 6 commits September 15, 2023 03:52
* Enabling CMD check when doing pull request on `dev`

* Removed tidyverse dependency
Add `get_sim_probability` to export to enable testing.
Test the values of the probabilities.
The idea of the test is to make sure that the foi trend used to simulate
the data is in the confidence interval of a suitable model.

Add possibility to test for:
- constant FoI
- smooth-decreasing FoI

Minor clean up of `plot_foi`.
Development

Successfully merging this pull request may close these issues.

Simulate from time-varying FOI serocatalytic models
9 participants