Add functions to simulate datasets to the seroprevalence_data module.… #83
creating base line of package
…nto dev-feature-strmodel
…rofoi into dev-feature-strmodel-miguel
* fix R/stanmodels folders
* Final functions, pending modification
* Delete Funciones eliminadas.docx
* Delete Explicación del código del paquete SEROFOI.docx
* Documentation in English and Spanish
* Documentation ready
* modify package structure
* modify package structure
* modify names and folder test
* Creation of general modules and revision of functions
* refactoring visualization functions and define some general pck structure
* update readme file
* update readme file
* general settings
* modify dependencies, document and name functions
Co-authored-by: Zulma M. Cucunubá <[email protected]>
Co-authored-by: megamezl <[email protected]>
Co-authored-by: megamezl <[email protected]>
…ction separation by model.
minor correction to function `get_age_group` documentation
* remove serodata .Rdata and .Rd files
* remove R/serodata.R
* doc: update functions documentation replacing for in examples
* doc: update serofoi logo
* doc: update README and vignettes. This commit replaces the removed preloaded dataset `serodata` with the identical `chagas2012`.
* fix: minor correction to test_modelling
remove function `group_sim_data` from export and the corresponding example
Remove unnecessary parameters from `get_age_group`. The function now takes an age vector as input rather than a dataframe containing a specific age column.
Simplify fit_seromodel() output and related refactoring
Remove data writing from `test_simulate_data.R` as well as the corresponding .csv files.
Remove unused data paths from `test_simulate_data`
Add tests for `get_sim_probability` and `get_sim_n_seropositive`
Modify test for `generate_sim_data` and `group_sim_data` functions. Remove unnecessary model running. Remove `expect_doppelganger` tests.
6eff129 to 8d0ebba
Thanks @ntorresd, this is looking better. We still need much more extensive tests of these functions, though, as they are key to lots of other functionality (including how we test that the Stan models themselves are working):
- The `get_sim_probability` function needs value-based testing, i.e. do the probabilities it outputs match theoretically derived ones?
- Same for `get_sim_data`.
- There are a variety of internal functions that also need checking, which I think currently lack unit tests.
R/seroprevalence_data.R
Outdated
    step = 5) {
  age <- sim_data[[col_age]]
  sim_data$age_group <- get_age_group(age = age, step = step)
  sim_data_grouped <- sim_data %>% group_by(age_group) %>%
Minor thing, but can we move the `group_by` to the next line for consistency when using the pipe?
tests/testthat/test_sim_data.R
Outdated
sim_data <- generate_sim_data(foi = foi_sim,
                              sample_size_by_age = sample_size_by_age,
                              tsur = 2050,
                              birth_year_min = 2000,
                              survey_label = 'foi_sim',
                              seed = seed)
I know this function is stochastic but we can still test it works by varying the FOIs and checking that in limits it behaves as we'd like it to.
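One way to read "behaves in the limits" is sketched below in base R. It does not call the package's `generate_sim_data()`; instead it uses a hypothetical stand-in, `sim_n_seropositive()`, which assumes that someone aged a at the survey is seropositive with probability 1 - exp(-(FOI summed over the last a years)) and that counts are binomial. With the real simulator, the same three checks would apply to its output.

```r
# Hypothetical stand-in for the package's simulator (an assumption, not serofoi code).
# foi: yearly force of infection, ordered from the earliest birth year to the survey.
# Returns one binomial draw of seropositive counts for ages 1..length(foi).
sim_n_seropositive <- function(foi, sample_size_by_age, seed = 1234) {
  set.seed(seed)
  exposure <- cumsum(rev(foi))   # element a = FOI summed over the last a years
  prob <- 1 - exp(-exposure)     # probability of being seropositive at age a
  rbinom(length(foi), size = sample_size_by_age, prob = prob)
}

n_years <- 50
sample_size_by_age <- 100

# Limit 1: with zero FOI nobody is ever infected.
n_zero <- sim_n_seropositive(rep(0, n_years), sample_size_by_age)
stopifnot(all(n_zero == 0))

# Limit 2: with a very large FOI everyone is seropositive.
n_high <- sim_n_seropositive(rep(50, n_years), sample_size_by_age)
stopifnot(all(n_high == sample_size_by_age))

# Ordering: a much larger FOI should give many more seropositives overall.
n_small <- sim_n_seropositive(rep(0.01, n_years), sample_size_by_age)
n_large <- sim_n_seropositive(rep(0.1, n_years), sample_size_by_age)
stopifnot(sum(n_large) > sum(n_small))
```

None of these checks requires fitting a model, so they stay fast and do not depend on sampler settings.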
Right now I'm relying on the model implementation to test this functionality. My idea is to use suitable models for different FOI trends. A constant `foi_sim` should be correctly approximated by the "constant" model at all times, so I check that it lies within the confidence interval obtained by fitting this model:
# Define constant FoI for simulations
case_label <- "constant_foi_"
foi_model <- "constant"
foi_sim <- rep(0.02, tsur - birth_year_min)
max_lambda <- 0.035
# Generate simulated data and run the "constant" model
sim_data <- generate_sim_data(foi = foi_sim,
sample_size_by_age = sample_size_by_age,
tsur = tsur,
birth_year_min = birth_year_min,
survey_label = 'foi_sim',
seed = seed)
sim_seromodel <- run_seromodel(sim_data, foi_model = foi_model, n_iters = n_iters)
# Check consistency between sim_foi and the fitted foi
foi <- rstan::extract(sim_seromodel$seromodel_fit, "foi", inc_warmup = FALSE)[[1]]
foi_lower <- apply(foi, 2, function(x) quantile(x, 0.05))
foi_upper <- apply(foi, 2, function(x) quantile(x, 0.95))
expect_true(all((foi_sim >= foi_lower) & (foi_sim <= foi_upper)))
I implemented this for a constant FOI and for the following smooth-decreasing FOI (red lines in the images):
case_label <- "smth_dec_foi_" # Smooth-decreasing FoI
foi_model <- "tv_normal"
foi_max <- 0.2
stretch <- 0.15
x <- 1:(tsur - birth_year_min)
foi_sim <- (-foi_max * (atan(stretch * (x - 25))) / (0.5 * pi) + foi_max) / 2
I think it's worth opening a separate issue to discuss this in more detail, in case you're not convinced by my approach.
We should be able to test whether the data we obtain by simulating from these models is as expected without resorting to solving the inverse problem (which leaves us liable to a number of issues, e.g. the FOIs aren't identifiable given the data). We have analytical results for all of the models, which should allow us to test these directly. E.g. for a constant FOI model we know approximately the probability that someone aged x is seropositive; if we use a large sample size, we can check that the simulated proportion is near that value.
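A minimal sketch of such a direct check for the constant case, in base R. It assumes the usual catalytic result with no seroreversion, P(seropositive at age a) = 1 - exp(-lambda * a), and uses `rbinom()` as a stand-in for the package's simulator; in the real test the simulated counts would come from the package function instead. The variable names and the tolerance of five standard errors are illustrative choices.

```r
# Assumption: constant FOI lambda, no seroreversion, so
# P(seropositive at age a) = 1 - exp(-lambda * a).
set.seed(2023)
lambda <- 0.02
ages <- 1:50
n_per_age <- 1e5   # large sample so that sampling noise is small

prob_theory <- 1 - exp(-lambda * ages)

# Stand-in for the simulator output: binomial counts of seropositives by age.
n_seropositive <- rbinom(length(ages), size = n_per_age, prob = prob_theory)
prop_sim <- n_seropositive / n_per_age

# Binomial standard error of each simulated proportion.
se <- sqrt(prob_theory * (1 - prob_theory) / n_per_age)

# The simulated proportions should sit within a few standard errors of theory.
stopifnot(all(abs(prop_sim - prob_theory) < 5 * se))
```

Because the tolerance scales with the sample size, the same check becomes stricter as `n_per_age` grows, which avoids hand-tuned thresholds.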
* Enabling CMD check when doing pull request on `dev`
* Removed tidyverse dependency
Add `get_sim_probability` to export to enable testing. Test the values of the probabilities.
The idea of the test is to make sure that the FoI trend used to simulate the data lies within the confidence interval of a suitable model. Add the possibility to test for:
- constant FoI
- smooth-decreasing FoI
Minor clean up of `plot_foi`.
This PR partly closes #57
fix: add exception to function plot_foi() to plot a FOI trend with different length along with the data for the case when their sizes don't coincide
clean test_visualisation
feature: add three functions to simulate datasets. get_sim_counts() generates a list with simulated counts by age following a binomial distribution. generate_sim_data() uses the counts generated by get_sim_counts() to create a dataframe with the necessary structure to use other functions of the package. group_sim_data() serves to group the previously generated dataset by age group; right now it groups the data by periods of 5 years.
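The 5-year grouping step described above can be illustrated with a small base-R sketch. This is not the package's `group_sim_data()`; the column names (`age`, `total`, `counts`), the toy values, and the interval labels produced by `cut()` are all assumptions made for the example.

```r
# Toy simulated dataset: one row per age (illustrative values only).
sim_data <- data.frame(
  age = 1:20,
  total = rep(10, 20),                 # sample size per age
  counts = c(rep(1, 10), rep(3, 10))   # seropositive individuals per age
)

step <- 5
# Breaks at 0, 5, 10, ... covering the oldest age in the data.
upper <- ceiling(max(sim_data$age) / step) * step
breaks <- seq(0, upper, by = step)

# Right-closed intervals: (0,5], (5,10], (10,15], (15,20].
sim_data$age_group <- cut(sim_data$age, breaks = breaks)

# Sum sample sizes and seropositive counts within each age group.
grouped <- aggregate(cbind(total, counts) ~ age_group, data = sim_data, FUN = sum)
```

A useful invariant for a unit test is that grouping preserves the overall totals, i.e. `sum(grouped$total)` equals `sum(sim_data$total)`, and likewise for `counts`.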
add test_simulate_data to test the data simulation functions in the seroprevalence_data module