From efe2645a622d22eb165da5ed128bfcbb9bf5cc7f Mon Sep 17 00:00:00 2001 From: jsocolar Date: Wed, 18 Oct 2023 16:17:34 -0400 Subject: [PATCH 1/4] renaming stuff --- inst/cache_vignettes.R | 22 +++++++++++++++++++ ...dels.Rmd => augmented_models.Rmd.orig.Rmd} | 0 ...rial.Rmd => flocker_tutorial.Rmd.orig.Rmd} | 0 ...dels.Rmd => nonlinear_models.Rmd.orig.Rmd} | 0 4 files changed, 22 insertions(+) create mode 100644 inst/cache_vignettes.R rename vignettes/{augmented_models.Rmd => augmented_models.Rmd.orig.Rmd} (100%) rename vignettes/{flocker_tutorial.Rmd => flocker_tutorial.Rmd.orig.Rmd} (100%) rename vignettes/{nonlinear_models.Rmd => nonlinear_models.Rmd.orig.Rmd} (100%) diff --git a/inst/cache_vignettes.R b/inst/cache_vignettes.R new file mode 100644 index 0000000..b316a38 --- /dev/null +++ b/inst/cache_vignettes.R @@ -0,0 +1,22 @@ +# This script creates .Rmd vignettes that include pre-computed R output so that +# whent they get built on CI it's a lightweight operation. + +# This function should be run every time there is an update to the vignettes +# or an update to the package for which the vignettes are a desirable +# part of the test harness (but ideally the actual test harness should be +# sufficient for testing, and we shouldn't rely on vignettes). + +setwd(paste0(dirname(rstudioapi::getActiveDocumentContext()$path), "/..")) + +knitr::knit( + "vignettes/augmented_models.Rmd.orig", + output = "vignettes/augmented_models.Rmd" + ) +knitr::knit( + "vignettes/flocker_tutorial.Rmd.orig", + output = "vignettes/flocker_tutorial.Rmd" +) +knitr::knit( + "vignettes/nonlinear_models.Rmd.orig", + output = "vignettes/nonlinear_models.Rmd" +) \ No newline at end of file diff --git a/vignettes/augmented_models.Rmd b/vignettes/augmented_models.Rmd.orig.Rmd similarity index 100% rename from vignettes/augmented_models.Rmd rename to vignettes/augmented_models.Rmd.orig.Rmd diff --git a/vignettes/flocker_tutorial.Rmd b/vignettes/flocker_tutorial.Rmd.orig.Rmd similarity index 100% rename from vignettes/flocker_tutorial.Rmd rename to vignettes/flocker_tutorial.Rmd.orig.Rmd diff --git a/vignettes/nonlinear_models.Rmd b/vignettes/nonlinear_models.Rmd.orig.Rmd similarity index 100% rename from vignettes/nonlinear_models.Rmd rename to vignettes/nonlinear_models.Rmd.orig.Rmd From c53c89cbf0f645b84f75976e5932296ab2737e63 Mon Sep 17 00:00:00 2001 From: jsocolar Date: Wed, 18 Oct 2023 16:19:29 -0400 Subject: [PATCH 2/4] correct file naming --- .../{augmented_models.Rmd.orig.Rmd => augmented_models.Rmd.orig} | 0 .../{flocker_tutorial.Rmd.orig.Rmd => flocker_tutorial.Rmd.orig} | 0 .../{nonlinear_models.Rmd.orig.Rmd => nonlinear_models.Rmd.orig} | 0 3 files changed, 0 insertions(+), 0 deletions(-) rename vignettes/{augmented_models.Rmd.orig.Rmd => augmented_models.Rmd.orig} (100%) rename vignettes/{flocker_tutorial.Rmd.orig.Rmd => flocker_tutorial.Rmd.orig} (100%) rename vignettes/{nonlinear_models.Rmd.orig.Rmd => nonlinear_models.Rmd.orig} (100%) diff --git a/vignettes/augmented_models.Rmd.orig.Rmd b/vignettes/augmented_models.Rmd.orig similarity index 100% rename from vignettes/augmented_models.Rmd.orig.Rmd rename to vignettes/augmented_models.Rmd.orig diff --git a/vignettes/flocker_tutorial.Rmd.orig.Rmd b/vignettes/flocker_tutorial.Rmd.orig similarity index 100% rename from vignettes/flocker_tutorial.Rmd.orig.Rmd rename to vignettes/flocker_tutorial.Rmd.orig diff --git a/vignettes/nonlinear_models.Rmd.orig.Rmd b/vignettes/nonlinear_models.Rmd.orig similarity index 100% rename from vignettes/nonlinear_models.Rmd.orig.Rmd rename to vignettes/nonlinear_models.Rmd.orig From 716dd58cc1618bc142912ebe8c82afc94b6f741e Mon Sep 17 00:00:00 2001 From: jsocolar Date: Wed, 18 Oct 2023 17:17:02 -0400 Subject: [PATCH 3/4] pre-computed vignettes --- inst/cache_vignettes.R | 2 +- vignettes/augmented_models.Rmd | 79 ++++ vignettes/flocker_tutorial.Rmd | 797 +++++++++++++++++++++++++++++++++ vignettes/nonlinear_models.Rmd | 265 +++++++++++ 4 files changed, 1142 insertions(+), 1 deletion(-) create mode 100644 vignettes/augmented_models.Rmd create mode 100644 vignettes/flocker_tutorial.Rmd create mode 100644 vignettes/nonlinear_models.Rmd diff --git a/inst/cache_vignettes.R b/inst/cache_vignettes.R index b316a38..70ef67b 100644 --- a/inst/cache_vignettes.R +++ b/inst/cache_vignettes.R @@ -19,4 +19,4 @@ knitr::knit( knitr::knit( "vignettes/nonlinear_models.Rmd.orig", output = "vignettes/nonlinear_models.Rmd" -) \ No newline at end of file +) diff --git a/vignettes/augmented_models.Rmd b/vignettes/augmented_models.Rmd new file mode 100644 index 0000000..1e34158 --- /dev/null +++ b/vignettes/augmented_models.Rmd @@ -0,0 +1,79 @@ +--- +title: "Data-augmented models in flocker" +author: "Jacob Socolar" +date: "2023-10-18" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Data-augmented models in flocker} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + + + + + +When modeling multiple ecologically comparable species simultaneously, occupancy models often assume conditional exchangeability across species and fit species-specific terms as random effects. Data-augmented multi-species models leverage this exchangeability assumption to estimate the number and prevalence of never-detected species within the study region. To do so, the dataset is augmented with a large number of pseudospecies with all-zero detection histories, each species (both the observed species and the augmented pseudospecies) is ascribed a common parameter $\omega$ giving the Bernoulli probability that a given (pseudo)species occurs in the study area, and random-effects exchangeability assumptions are assumed to hold for all species-specific modeled terms (i.e. intercepts and slopes for occupancy and detection). + +For a data-augmented multi-species model, we marginalize over the occupancy status of a closure-unit as for a single-season model yielding the unit-wise likelihood $\mathcal{L}_i$, and we additionally marginalize over the availability of each (pseudo)species, yielding the species-wise likelihood +\[ \mathcal{N}_s = + \begin{cases} + B(1 | \omega)\prod\limits_{i \textrm{ in } I_s}{\mathcal{L}_i} & \text{if $r_s = 1$} \\ + B(0 | \omega) + B(1 | \omega)\prod\limits_{i \textrm{ in } I_s}{\mathcal{L}_i}& \text{if $r_s = 0$} + \end{cases} +\] +where $s$ indexes the species, $\omega$ is the fitted availability probability of a species, $I_s$ is the set of all closure-unit indices $i$ pertaining to species $s$, $r_s$ is an indicator that takes the value $1$ if there is at least one positive detection of the (pseudo)species in the entire dataset and $0$ otherwise. + +Fitting the data-augmented model in \mintinline{r}{flocker} requires passing the observed data as a three-dimensional array with sites along the first dimension, visits along the second, and species along the third. Additionally, we must supply the `n_aug` argument to `make_flocker_data()`, specifying how many all-zero pseudospecies to augment the data with. + + +``` +#> Formatting data for a single-season multispecies occupancy model with data augmentation for never-observed species. For details, see make_flocker_data_augmented. All warnings and error messages should be interpreted in the context of make_flocker_data_augmented +#> formatting rep indices +#> | | | 0% | | | 1% | |= | 1% | |= | 2% | |== | 2% | |== | 3% | |=== | 3% | |=== | 4% | |==== | 4% | |==== | 5% | |===== | 5% | |===== | 6% | |====== | 6% | |====== | 7% | |======= | 7% | |======= | 8% | |======== | 8% | |======== | 9% | |========= | 9% | |========= | 10% | |========== | 10% | |========== | 11% | |========== | 12% | |=========== | 12% | |=========== | 13% | |============ | 13% | |============ | 14% | |============= | 14% | |============= | 15% | |============== | 15% | |============== | 16% | |=============== | 16% | |=============== | 17% | |================ | 17% | |================ | 18% | |================= | 18% | |================= | 19% | |================== | 19% | |================== | 20% | |=================== | 20% | |=================== | 21% | |==================== | 21% | |==================== | 22% | |==================== | 23% | |===================== | 23% | |===================== | 24% | |====================== | 24% | |====================== | 25% | |======================= | 25% | |======================= | 26% | |======================== | 26% | |======================== | 27% | |========================= | 27% | |========================= | 28% | |========================== | 28% | |========================== | 29% | |=========================== | 29% | |=========================== | 30% | |============================ | 30% | |============================ | 31% | |============================= | 31% | |============================= | 32% | |============================== | 32% | |============================== | 33% | |============================== | 34% | |=============================== | 34% | |=============================== | 35% | |================================ | 35% | |================================ | 36% | |================================= | 36% | |================================= | 37% | |================================== | 37% | |================================== | 38% | |=================================== | 38% | |=================================== | 39% | |==================================== | 39% | |==================================== | 40% | |===================================== | 40% | |===================================== | 41% | |====================================== | 41% | |====================================== | 42% | |======================================= | 42% | |======================================= | 43% | |======================================== | 43% | |======================================== | 44% | |========================================= | 45% | |========================================= | 46% | |========================================== | 46% | |========================================== | 47% | |=========================================== | 47% | |=========================================== | 48% | |============================================ | 48% | |============================================ | 49% | |============================================= | 49% | |============================================= | 50% | |============================================== | 50% | |============================================== | 51% | |=============================================== | 51% | |=============================================== | 52% | |================================================ | 52% | |================================================ | 53% | |================================================= | 53% | |================================================= | 54% | |================================================== | 54% | |================================================== | 55% | |=================================================== | 56% | |=================================================== | 57% | |==================================================== | 57% | |==================================================== | 58% | |===================================================== | 58% | |===================================================== | 59% | |====================================================== | 59% | |====================================================== | 60% | |======================================================= | 60% | |======================================================= | 61% | |======================================================== | 61% | |======================================================== | 62% | |========================================================= | 62% | |========================================================= | 63% | |========================================================== | 63% | |========================================================== | 64% | |=========================================================== | 64% | |=========================================================== | 65% | |============================================================ | 65% | |============================================================ | 66% | |============================================================= | 66% | |============================================================= | 67% | |============================================================= | 68% | |============================================================== | 68% | |============================================================== | 69% | |=============================================================== | 69% | |=============================================================== | 70% | |================================================================ | 70% | |================================================================ | 71% | |================================================================= | 71% | |================================================================= | 72% | |================================================================== | 72% | |================================================================== | 73% | |=================================================================== | 73% | |=================================================================== | 74% | |==================================================================== | 74% | |==================================================================== | 75% | |===================================================================== | 75% | |===================================================================== | 76% | |====================================================================== | 76% | |====================================================================== | 77% | |======================================================================= | 77% | |======================================================================= | 78% | |======================================================================= | 79% | |======================================================================== | 79% | |======================================================================== | 80% | |========================================================================= | 80% | |========================================================================= | 81% | |========================================================================== | 81% | |========================================================================== | 82% | |=========================================================================== | 82% | |=========================================================================== | 83% | |============================================================================ | 83% | |============================================================================ | 84% | |============================================================================= | 84% | |============================================================================= | 85% | |============================================================================== | 85% | |============================================================================== | 86% | |=============================================================================== | 86% | |=============================================================================== | 87% | |================================================================================ | 87% | |================================================================================ | 88% | |================================================================================= | 88% | |================================================================================= | 89% | |================================================================================= | 90% | |================================================================================== | 90% | |================================================================================== | 91% | |=================================================================================== | 91% | |=================================================================================== | 92% | |==================================================================================== | 92% | |==================================================================================== | 93% | |===================================================================================== | 93% | |===================================================================================== | 94% | |====================================================================================== | 94% | |====================================================================================== | 95% | |======================================================================================= | 95% | |======================================================================================= | 96% | |======================================================================================== | 96% | |======================================================================================== | 97% | |========================================================================================= | 97% | |========================================================================================= | 98% | |========================================================================================== | 98% | |========================================================================================== | 99% | |===========================================================================================| 99% | |===========================================================================================| 100% +#> Compiling Stan program... +#> Start sampling +#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable. +#> Running the chains for more iterations may help. See +#> https://mc-stan.org/misc/warnings.html#bulk-ess +``` + +Here, the random effect of species is specified using the special grouping keyword `ff_species` (names beginning with `ff_` are reserved in `flocker` and are not allowed as names for user-supplied covariates). + + +```r +summary(fm) +#> Family: occupancy_augmented +#> Links: mu = identity; occ = identity; Omega = identity +#> Formula: ff_y | vint(ff_n_unit, ff_n_rep, ff_Q, ff_n_sp, ff_superQ, ff_species, ff_rep_index1, ff_rep_index2, ff_rep_index3, ff_rep_index4) ~ uc1 + ec1 + (1 + uc1 + ec1 | ff_species) +#> occ ~ (1 | ff_species) +#> Omega ~ 1 +#> Data: data (Number of observations: 25600) +#> Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; +#> total post-warmup draws = 4000 +#> +#> Group-Level Effects: +#> ~ff_species (Number of levels: 128) +#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS +#> sd(Intercept) 1.60 0.27 1.15 2.21 1.00 813 1716 +#> sd(uc1) 0.26 0.11 0.07 0.50 1.00 1245 1708 +#> sd(ec1) 0.39 0.09 0.23 0.59 1.00 1976 2720 +#> sd(occ_Intercept) 1.57 0.31 1.08 2.29 1.01 785 1751 +#> cor(Intercept,uc1) 0.75 0.21 0.18 0.97 1.00 2064 2635 +#> cor(Intercept,ec1) 0.57 0.20 0.10 0.86 1.00 2356 2740 +#> cor(uc1,ec1) 0.60 0.25 0.00 0.95 1.00 904 1860 +#> +#> Population-Level Effects: +#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS +#> Intercept 0.49 0.33 -0.16 1.15 1.01 377 783 +#> occ_Intercept 0.26 0.33 -0.40 0.90 1.01 495 951 +#> Omega_Intercept -1.27 0.22 -1.71 -0.87 1.00 5876 3077 +#> uc1 0.01 0.07 -0.14 0.15 1.00 906 1634 +#> ec1 0.07 0.09 -0.11 0.26 1.00 1114 1747 +#> +#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS +#> and Tail_ESS are effective sample size measures, and Rhat is the potential +#> scale reduction factor on split chains (at convergence, Rhat = 1). +``` + +`flocker` enables users to fit data-augmented models using arbitrary `brms` formulas for the occupancy and detection components. However, we caution that continuous covariates in the occupancy sub-model can lead to pitfalls in interpretation. A seemingly straightforward application, for example, might be to ask how many species are present along an elevational gradient, fitting species-specific quadratic elevation-occupancy relationships. In our experience, a data-augmented model that includes quadratic elevation terms will place an arbitrarily large number of pseudo-species along the gradient, but will do so by placing pseudospecies' estimated elevational ranges entirely outside the range covered by the sampling effort ([Socolar et al 2022](https://onlinelibrary.wiley.com/doi/full/10.1002/ece3.9328)). The model is in effect trying to estimate how many species occur in a landscape with elevations ranging from negative to positive infinity. This extrapolation is unprincipled, most obviously because it does not account for hard limits imposed by the physical termina of the gradient (valley floor and mountain peak). Thus, although `flocker` provides functionality to fit continuous covariates in the occupancy term, we recommend extreme caution in interpreting patterns estimated for never-observed species. diff --git a/vignettes/flocker_tutorial.Rmd b/vignettes/flocker_tutorial.Rmd new file mode 100644 index 0000000..10ebe7a --- /dev/null +++ b/vignettes/flocker_tutorial.Rmd @@ -0,0 +1,797 @@ +--- +title: "Fitting occupancy models with flocker" +author: Jacob Socolar & Simon Mills +date: "2023-10-18" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Fitting occupancy models with flocker} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + + + + + +`flocker` is an R package for fitting [occupancy models](https://jsocolar.github.io/closureOccupancy/) that incorportate +sophisticated effects structures using simple formula-based syntax. `flocker` is +built on R package `brms`, which in turn is a front-end for `Stan`. + +This vignette is intended as a companion to Socolar & Mills 2023 (INSERT LINK), +where we provide details of the models and post-processing functionality +available in `flocker` in greater detail. Here, we provide illustrative R code +for several types of model, demonstrating data simulation, model fitting, +and model post-processing. We also showcase the `brms` syntax +that `flocker` can use to fit a variety of sophisticated effect +structures. + +## Terms and definitions +Socolar & Mills (2023) introduce several terms that figure importantly in this +vignette, including: + +* **closure-unit**: The groupings of observations over which + [closure](https://jsocolar.github.io/closureOccupancy/) is assumed. In + single-species models, a closure-unit corresponds to a "site" or "point". In + multi-species models, a closure-unit is a species-site combination. In dynamic + (multi-season) models, a closure-unit is a site-season combination (or + species-site-season in a multi-species dynamic model). + +* **rep-constant**, **rep-varying**: We refer to models that assume constant + detection probabilities across repeat visits within closure-units as + *rep-constant models*, as contrasted with *rep-varying models* that incorporate + event-specific detection covariates. It turns out that rep-constant models + enable a more efficient parametrization of the likelihood than rep-varying models. + +* **unit covariates**, **event covariates**: We refer to any covariate that does + not vary across sampling events within closure-units as a "unit covariate". + This includes covariates that are intrinsically properties of single + closure-units (e.g. the elevations of sites in a single-species model), + covariates that are intrinsically properties of groups of closure units (e.g. + elevations of sites in a multi-species model), and covariates that are + intrinsically properties of sampling events but happen to be constant within + all closure-units (e.g. observer in a sampling design where every site is + visited by exactly one observer). We refer to any covariate that varies across + sampling events within covariates as an "event covariate". Note that while unit + covariates may appear in either the occupancy or the detection formula, event + covariates are restricted to the detection formula. Models that incorporate + event covariates are *rep-varying* (see above); those that do not are + *rep-constant*. + +## Installation and feedback +[Installation instructions are available here](https://jsocolar.github.io/flocker). +To request features or report bugs (much appreciated!), please [open an issue on GitHub](https://github.com/jsocolar/flocker/issues). + +To make `flocker` and `brms` functions globally available within an R session +run: + + +## Data simulation +General purpose data simulation is provided via `simulate_flocker_data()`, which +by default will simulate a dataset with 30 species sampled at 50 sites using +four replicate surveys (i.e. a single-season multi-species dataset). Non-default +arguments will simulate example data for other likelihoods, including +multi-season and data-augmented occupancy models. + + +```r +d <- simulate_flocker_data() +``` + +The simulated data `d` are in list form, with elements for the +detection/non-detection observations `d$obs`, unit covariates +`d$unit_covs`, and event covariates `d$event_covs`. +`d$obs` is a matrix where rows are species-site combinations, +columns are replicate visits, and entries are `1` (detection), `0` +(nondetection), or `NA` (no visit). `d$unit_covs` is a dataframe +containing covariates that vary across the rows of obs (i.e. by closure-unit) +but not across the columns within any given row (i.e. do not vary across +replicate visits). `event_covs` is a named list of matrices, with each matrix +having the same dimensions as the observation matrix. Each list element +corresponds to a covariate that varies across the columns of `d$obs` (i.e. +varies between replicate visits). + +## Data formatting +`flock()`, the main function in `flocker` for fitting occupancy models, +expects a highly specific data format. The function `make_flocker_data()` +formats data for use with `flock()` automatically. For single-season models, +`make_flocker_data()` takes as input a matrix or dataframe of +detection/non-detection data. Rows represent closure-units, columns represent +repeated sampling events within closure-units, and entries must be `0` +(nondetection), `1` (detection), or `NA` (no corresponding sampling event). The +data must be formatted so that all `NA`s are trailing within their rows. For +example, if some units were sampled four times and other three times, the three +sampling events must be treated as events 1, 2, and 3 (with the fourth event +`NA`) rather than as events 1, 3, and 4 (with the second event `NA`) or any +other combination. + +Many occupancy models also include covariates that influence occupancy or +detection probabilities. Unit covariates (see *Terms and definitions* above) can +be passed to `make_flocker_data()` as a dataframe with the same number of rows +as the observation matrix and data in the same order as the rows of the +observation matrix. Columns are covariates, and we recommend using informative +column names. *Event covariates* (see *Terms and definitions* above) can be +passed as a named list of matrices whose elements `[i, j]` are the covariate +values for the sampling event represented by the corresponding position of the +observation matrix. Again, we recommend using informative names for the list +elements. If the corresponding observation is `NA`, then the value of the event +covariate does not matter. + +To pass data to `flocker`, we first pass the output from +`simulate_flocker_data()` to `make_flocker_data()`, which will repackage data +and apply the necessary formatting: + + +```r +fd_rep_varying <- make_flocker_data( + obs = d$obs, + unit_covs = d$unit_covs, + event_covs = d$event_covs + ) +#> Formatting data for a single-season occupancy model. For details, see make_flocker_data_static. All warnings and error messages should be interpreted in the context of make_flocker_data_static +``` + +The function `make_flocker_data()` outputs an object of class `flocker_data` +that we can pass to flocker's model fitting function `flock()`. Note that this +is the general workflow users will need to follow with real data. Alternative +inputs to `make_flocker_data()` and `flock()` enable the user to readily fit +multi-season models as well as multi-species models with data augmentation +(see below). + +## Model fitting +### The single-season rep-varying model +To fit a model, in this case a single-season multi-species occupancy model, we +use the function `flock()`. By supplying different arguments to this function, +all flavors of occupancy model available in `flocker` can be fitted. Formulas +for the different distributional parameters in the model (occupancy, detection, +colonization, extinction, and autologistic terms as applicable) are provided +as one-sided formulas to the relevant arguments of `flock()` (`f_occ`, `f_det`, +`f_col`, `f_ex`, and `f_auto` as applicable). + + +```r +rep_varying <- flock( + f_occ = ~ uc1 + (1 + uc1 | species), + f_det = ~ uc1 + ec1 + (1 + uc1 + ec1 | species), + flocker_data = fd_rep_varying, + cores = 4, + silent = 2, + refresh = 0 + ) +``` + +Arguments supplied to `flock()` define formulas using `brms` syntax for the +occupancy (`f_occ`) and detection (`f_det`) components, and also provide the +formatted data. At this stage, the full flexibility and power of `brms` formula +syntax are available to the user (see following sections for some examples). +`rep_varying` is a `brmsfit` object from package `brms` and also a `flockerfit` +object from package `flocker`. Post-processing functions from `brms` will +typically not work with this object and are instead replaced by `flocker` +equivalents. + +### The single-season rep-constant model +`make_flocker_data()` will automatically format the data for a rep-constant +model when `event_covs = NULL` and the desired model is a single-season model +without data augmentation. To take advantage of the efficiency gains and +post-processing functionality of the rep-constant model, it is necessary to +supply `event_covs = NULL` to `make_flocker_data()` at the moment of data +formatting; it is insufficient to omit event covariates from the detection +formula supplied to `flock()` after formatting the data for a rep-varying model. + + +```r +fd_rep_constant <- make_flocker_data( + obs = d$obs, + unit_covs = d$unit_covs + ) +#> Formatting data for a single-season occupancy model. For details, see make_flocker_data_static. All warnings and error messages should be interpreted in the context of make_flocker_data_static +rep_constant <- flock( + f_occ = ~ uc1 + (1 + uc1 | species), + f_det = ~ uc1 + (1 + uc1 | species), + flocker_data = fd_rep_constant, + save_pars = save_pars(all = TRUE), # for loo with moment matching + silent = 2, + refresh = 0, + cores = 4 + ) +``` + +Note that within-chain parallelization is available (uniquely so) for the +rep-constant mode: + +```r +rep_constant <- flock( + f_occ = ~ uc1 + (1 + uc1 | species), + f_det = ~ uc1 + (1 + uc1 | species), + flocker_data = fd_rep_constant, + silent = 2, + refresh = 0, + chains = 2, + cores = 2, + threads = 2 + ) +``` + +### Multi-season models +Here we provide code examples to complement the companion publication INSERT LINK. +For a more complete vignette on multi-season models in `flocker`, see the [multiseason models vignette](https://jsocolar.github.io/flocker/articles/multiseason_models.html). + +First, we simulate some data that are valid for use with multi-season models. +Here, we will simulate data for three seasons with one unit covariate and one +event covariate. The data will be simulated under a colonization-extinction +model with explicit inits, but we will be able to fit other models +(autologistic, equilibrium inits) to the same data (note that +`simulate_flocker_data()` can also simulate directly from these other model +types). + + +```r +multi_data <- simulate_flocker_data( + n_season = 3, + n_pt = 300, + n_sp = 1, + multiseason = "colex", + multi_init = "explicit", + seed = 1 + ) + +fd_multi <- make_flocker_data( + multi_data$obs, + multi_data$unit_covs, + multi_data$event_covs, + type = "multi", + quiet = TRUE + ) +``` + +Below, we fit the colonization-extinction model with an explicit model for +occupancy in the first timestep. Depending on hardware, fitting this model might +take several minutes. + + +```r +multi_colex <- flock( + f_occ = ~ uc1, + f_det = ~ uc1 + ec1, + f_col = ~ uc1, + f_ex = ~ uc1, + flocker_data = fd_multi, + multiseason = "colex", + multi_init = "explicit", + cores = 4, + silent = 2, + refresh = 0 +) +``` + +Here is the colonization-extinction model using equilibrium occupancy +probabilities in the first timestep: + + +```r +multi_colex_eq <- flock( + f_det = ~ uc1 + ec1, + f_col = ~ uc1, + f_ex = ~ uc1, + flocker_data = fd_multi, + multiseason = "colex", + multi_init = "equilibrium", + cores = 4, + silent = 2, + refresh = 0 + ) +``` + +Here is the autologistic model with explicit occupancy probabilities in the +first timestep. To reflect the stereotypical autologistic model with a constant +logit-scale offset separating colonization and persistence probabilities, we use +the formula `f_auto = ~ 1`, but it is fine to relax this constraint and use, +e.g. `f_auto = ~ uc1`. + + +```r +multi_auto <- flock( + f_occ = ~ uc1, + f_det = ~ uc1 + ec1, + f_col = ~ uc1, + f_auto = ~ 1, + flocker_data = fd_multi, + multiseason = "autologistic", + multi_init = "explicit", + cores = 4, + silent = 2, + refresh = 0 + ) +``` + +And the autologistic model with equilibrium occupancy probabilities in the +first timestep: + + +```r +multi_auto_eq <- flock( + f_det = ~ uc1 + ec1, + f_col = ~ uc1, + f_auto = ~ 1, + flocker_data = fd_multi, + multiseason = "autologistic", + multi_init = "equilibrium", + cores = 4, + silent = 2, + refresh = 0 +) +``` + +### Data-augmented multi-species models +Here we provide code examples to complement the companion publication INSERT LINK. +For a more complete unified vignette on data-augmented models in `flocker`, see +the [data-augmented models vignette](https://jsocolar.github.io/flocker/articles/augmented_models.html). + +Fitting the data-augmented model in `flocker` requires passing the observed +data as a three-dimensional array with sites along the first dimension, visits +along the second, and species along the third. Additionally, we must supply the +`n_aug` argument to `make_flocker_data()`, specifying how many all-zero +pseudospecies to augment the data with. + + +```r +augmented_data <- simulate_flocker_data( + augmented = TRUE + ) +fd_augmented <- make_flocker_data( + augmented_data$obs, augmented_data$unit_covs, augmented_data$event_covs, + type = "augmented", n_aug = 100, + quiet = TRUE + ) +augmented <- flock( + f_occ = ~ (1 | ff_species), + f_det = ~ uc1 + ec1 + (1 + uc1 + ec1 | ff_species), + augmented = TRUE, + flocker_data = fd_augmented, + cores = 4, + silent = 2, + refresh = 0 + ) +#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable. +#> Running the chains for more iterations may help. See +#> https://mc-stan.org/misc/warnings.html#bulk-ess +``` + +Here, the random effect of species is specified using the special grouping +keyword `ff_species` (names beginning with `ff_` are reserved in `flocker` and +are not allowed as names for user-supplied covariates). + +## Post-processing +`flocker` provides functions for four main types of bespoke post-processing +for occupancy models. `fitted_flocker()` + +`get_Z()` provides the posterior distribution for the +latent occupancy state. `predict_flocker()` provides posterior predictions at +the observed points (e.g. for use in posterior predictive checking) or for new +data. `loo_flocker()` and `loo_compare_flocker()` both provide functionality for +model comparison. See below for details on all four types of post-processing. +Both posterior predictions and model comparison rely on subtle aspects of the +occupancy model likelihood that we explain in more detail in the companion paper +INSERT LINK and [here](https://jsocolar.github.io/likelihoodOccupancy/). + +### brms-native post-processing +All post-processing functions from `brms` work on single-season rep-constant +models, but do not work on any other model types. For example: + + +```r +predictions_rep_constant <- brms::posterior_predict(rep_constant) +loo_rep_constant <- brms::loo(rep_constant, moment_match = TRUE) +brms::conditional_effects(rep_constant) +``` + +The following functions work on all model types available in `flocker`. + +### Fitted values +Fitted values for any of the distributional parameter (one or more of occupancy, +detection, colonization, extinction, autologistic, and/or Omega, the fitted +probability that a given (pseudo)species occurrs in the metacommunity) are +available via `fitted_flocker`. For example: + + +```r +fitted_flocker(rep_constant) +fitted_flocker(rep_varying) +fitted_flocker(multi_colex) +fitted_flocker(augmented) +``` + +### The posterior occupancy state +The function `get_Z()` returns the posterior distribution of occupancy probabilities across the closure-units. The shape of the output depends on the class of model, and is an array in the shape of the first visit in `obs` as passed to `make_flocker_data`, with posterior iterations stacked along the final dimension. Thus, for a single-season rep-varying model, the output is a matrix where rows are posterior iterations, columns are closure-units, and values are draws from the posterior distribution of occupancy probabilities: + + +```r +get_Z(rep_varying) +``` + +For all model types, `get_Z()` accepts an optional `new_data` argument. Leaving +the default `new_data = NULL` supplies the posterior for the true occupancy state +at the locations of the data used to fit the model. Otherwise, the posterior is +computed over the new data. For single-season models, `new_data` can be supplied +as a dataframe of unit covariate values or as a `flocker_data` object. For +multi-season models, only a `flocker_data` object is allowed. Note that if +predictions are desired at sites without observations, it is acceptable to pass +an array of dummy observations (e.g. all zeros) to `make_flocker_data()` and +then to set `history_condition = FALSE` in the call to `get_Z()`. + +`get_Z()` accepts several additional arguments that control the way that posterior is obtained and the values returned. See the companion paper INSERT LINK and +`?get_Z` for details. + +### Posterior prediction +The function `predict_flocker()` provides posterior predictions. By default, +predictions are provided for the covariate data to which the model were fit, but predictions to new data are also possible via the `new_data` argument. The +output differs by model type. For single-season rep-constant models, the return +is a matrix where rows are iterations, columns are units, and values are the +number of detections. For single-season rep-varying models, the return is an +array whose first dimension is units, second dimension is sampling events, +third dimension is iterations, and values are `1`, `0`, or `NA`, representing +detection, nondetection, and no corresponding sampling event. For example: + + +```r +predict_flocker(rep_varying) +``` + +`predict_flocker()` accepts several additional arguments that control the way +that posterior is obtained and the values of returned. See the companion paper +INSERT LINK and `?predict_flocker` for details. + +### Model comparison +The most straightforward way to compare models fit with `flocker` is the +function `loo_compare_flocker()`. This function takes a list of flocker_fit +objects as its argument and returns a model comparison table based on the +difference in the expected log predictive density (elpd) between models. This +table is a `compare.loo` object from `loo::loo_compare()`. The "leave-one-out" +holdouts consist of entire closure-units (single-season models), series +(multi-season models), or species (augmented models), not single sampling events +(see the companion paper INSERT LINK and +[here](https://jsocolar.github.io/likelihoodOccupancy/) for details of why). + +`loo_compare_flocker()` accepts as input a list of `flockerfit` objects +and outputs a model comparison table. For example, we can compare the +rep-constant and rep-varying models that we fit to the same initial data. Recall +that the data were simulated with event-covariate effects on detection, and as +expected the rep-varying model performs best. Note that we ensure that these +comparisons between rep-constant and rep-varying models are valid by omitting +the binomial coefficient when computing the log-likelihood for the rep-constant +model. + + +```r +loo_compare_flocker( + list(rep_constant, rep_varying) +) +#> Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details. +#> Warning: Some Pareto k diagnostic values are slightly high. See help('pareto-k-diagnostic') for details. +#> elpd_diff se_diff +#> model2 0.0 0.0 +#> model1 -156.4 17.9 +``` + +Likewise, we can compare the four flavors of multi-season model that we fit +above. Recall that the data were simulated under colonization-extinction +dynamics (rather than autologistic) and under explicit initial occupancy +probabilities (rather than equilibrium). As expected, the `multi_colex` model +performs best: + + +```r +loo_compare_flocker( + list(multi_colex, multi_colex_eq, multi_auto, multi_auto_eq) +) +#> elpd_diff se_diff +#> model1 0.0 0.0 +#> model2 -17.2 5.9 +#> model3 -53.1 10.1 +#> model4 -65.1 10.3 +``` + +Flocker also provides the function `loo_flocker()` to return a table of +`elpd_loo`, `p_loo`, and `looic` estimates from `loo::loo()` or `brms::loo()` +(the latter for single-season rep-constant models only). + +## `brms` tips and tricks +Mastering advanced occupancy modeling via `flocker` is mostly a matter of +mastering the syntax available in `brms`. Here are some useful pieces of syntax: + +### Priors +Priors can be implemented as they would with any `brms` model. Priors can be +specified using `set_prior()`, with priors specified for groups of parameters +(via `class`) or individual parameters (via `coef`). The priors used for a +particular model can be retrieved using `brms::prior_summary()`, and the +names of the parameters and their default priors can be displayed prior to +model fitting using `get_flocker_prior()` which is a drop-in replacement for +`brms::get_prior()`. + + +```r +get_flocker_prior( + f_occ = ~ uc1 + (1 + uc1 | species), + f_det = ~ uc1 + ec1 + (1 + uc1 + ec1 | species), + flocker_data = fd_rep_varying + ) +#> prior class coef group resp dpar nlpar lb ub source +#> (flat) b default +#> (flat) b ec1 (vectorized) +#> (flat) b uc1 (vectorized) +#> lkj(1) cor default +#> lkj(1) cor species (vectorized) +#> student_t(3, 0, 2.5) Intercept default +#> student_t(3, 0, 2.5) sd 0 default +#> student_t(3, 0, 2.5) sd species 0 (vectorized) +#> student_t(3, 0, 2.5) sd ec1 species 0 (vectorized) +#> student_t(3, 0, 2.5) sd Intercept species 0 (vectorized) +#> student_t(3, 0, 2.5) sd uc1 species 0 (vectorized) +#> (flat) b occ default +#> (flat) b uc1 occ (vectorized) +#> (flat) Intercept occ default +#> student_t(3, 0, 2.5) sd occ 0 default +#> student_t(3, 0, 2.5) sd species occ 0 (vectorized) +#> student_t(3, 0, 2.5) sd Intercept species occ 0 (vectorized) +#> student_t(3, 0, 2.5) sd uc1 species occ 0 (vectorized) +brms::prior_summary(rep_varying) +#> prior class coef group resp dpar nlpar lb ub source +#> (flat) b default +#> (flat) b ec1 (vectorized) +#> (flat) b uc1 (vectorized) +#> (flat) b occ default +#> (flat) b uc1 occ (vectorized) +#> student_t(3, 0, 2.5) Intercept default +#> (flat) Intercept occ default +#> lkj_corr_cholesky(1) L default +#> lkj_corr_cholesky(1) L species (vectorized) +#> student_t(3, 0, 2.5) sd 0 default +#> student_t(3, 0, 2.5) sd occ 0 default +#> student_t(3, 0, 2.5) sd species 0 (vectorized) +#> student_t(3, 0, 2.5) sd ec1 species 0 (vectorized) +#> student_t(3, 0, 2.5) sd Intercept species 0 (vectorized) +#> student_t(3, 0, 2.5) sd uc1 species 0 (vectorized) +#> student_t(3, 0, 2.5) sd species occ 0 (vectorized) +#> student_t(3, 0, 2.5) sd Intercept species occ 0 (vectorized) +#> student_t(3, 0, 2.5) sd uc1 species occ 0 (vectorized) +``` + +Note that if there are parameters shared between both the occupancy and detection model formulas, e.g. + +```r +get_flocker_prior( + f_occ = ~ uc1 + (1|species), + f_det = ~ uc1 + ec2 + (1|species), + flocker_data = fd_rep_varying + ) +``` + +then there will be two entries for each of the shared parameters in the prior +table (`uc1` in this example). Specifying a priors for parameters in formulas +other than detection can be done with reference to the `dpar` column, e.g.: + + +```r +user_prior <- c(brms::set_prior("normal(0, 1)", coef = "uc1"), + brms::set_prior("normal(0, 3)", coef = "uc1", dpar = "occ")) +``` +where the `uc1` parameter in the occupancy component is specified by the +addition of the `dpar` argument, and the `uc1` parameter in the detection +component is specified without reference to `dpar`. + +For more on priors in `brms`, see `?brms::set_prior`. + +Users should understand the implications of the default `brms` behavior to +internally center the design matrix, which affects the prior on the intercept +gets set (see `?brms::set_prior`). Here is an example, based on a +single-season rep-varying model, wherein we set a logistic prior on the value of +the intercepts (flat on the probability scale) when all predictors are held at +their means and a moderately regularizing prior on the coefficients: + + +```r +rep_varying_prior1 <- flock( + f_occ = ~ uc1, + f_det = ~ ec1, + flocker_data = fd_rep_varying, + prior = + brms::set_prior("logistic(0,1)", class = "Intercept") + + brms::set_prior("logistic(0,1)", class = "Intercept", dpar = "occ") + + brms::set_prior("normal(0,2)", class = "b") + + brms::set_prior("normal(0,2)", class = "b", dpar = "occ"), + cores = 4, + silent = 2, + refresh = 0 + ) +brms::prior_summary(rep_varying_prior1) +#> prior class coef group resp dpar nlpar lb ub source +#> normal(0,2) b user +#> normal(0,2) b ec1 (vectorized) +#> normal(0,2) b occ user +#> normal(0,2) b uc1 occ (vectorized) +#> logistic(0,1) Intercept user +#> logistic(0,1) Intercept occ user +``` + +Here is an example where we set informative priors on the intercepts when all covariates are fixed to zero and the same moderately regularizing prior on the coefficients: + + +```r +rep_varying_prior2 <- flock( + f_occ = ~ 0 + Intercept + uc1, + f_det = ~ 0 + Intercept + ec1, + flocker_data = fd_rep_varying, + prior = + brms::set_prior("normal(0,2)", class = "b") + + brms::set_prior("normal(0,2)", class = "b", dpar = "occ") + + brms::set_prior("normal(1, 1)", class = "b", coef = "Intercept") + + brms::set_prior("normal(-1, 1)", class = "b", coef = "Intercept", dpar = "occ"), + cores = 4, + silent = 2, + refresh = 0 + ) +brms::prior_summary(rep_varying_prior2) +#> prior class coef group resp dpar nlpar lb ub source +#> normal(0,2) b user +#> normal(0,2) b ec1 (vectorized) +#> normal(1, 1) b Intercept user +#> normal(0,2) b occ user +#> normal(-1, 1) b Intercept occ user +#> normal(0,2) b uc1 occ (vectorized) +``` + +### Model formulas +Simple formulas follow the same syntax as R's `lm()` function. For example: + +```r +mod1 <- flock( + f_occ = ~ uc1 + (1|species), + f_det = ~ 1, + flocker_data = fd_rep_constant + ) +``` + +### Random effects +Simple random effects follow `lme4` syntax, including advanced `lme4` syntax +like `||` for uncorrelated effects and `/` and `:` for expansion of multiple +grouping terms. Here's a simple example: + +```r +mod2 <- flock( + f_occ = ~ uc1 + (1|species), + f_det = ~ 1, + flocker_data = fd_rep_constant + ) +``` + +When a model includes multiple random effects with the same grouping term, by +default they are modeled as correlated *within* the occupancy or detection +formulas, but as uncorrelated *between* formulas. For example, the code below +estimates a single correlation for the intercept and slope in the occupancy +sub-model. + +```r +mod3 <- flock( + f_occ = ~ uc1 + (1 + uc1 | species), + f_det = ~ ec1 + (1 | species), + flocker_data = fd_rep_varying + ) +``` +However, this assumption can easily be relaxed using the `||` syntax from +`brms`. The `` is an arbitrary character string representing a group of +terms to model as correlated. The below code, for example, models correlated +intercepts in the occupancy and detection sub-models, and correlated effects of +`sc1` on occupancy and `vc1` on detection, but no correlations between the +intercepts and the slopes in either sub-model: + +```r +mod4 <- flock( + f_occ = ~ uc1 + (1 |g1| species) + (0 + uc1 |g2| species), + f_det = ~ ec1 + (1 |g1| species) + (0 + ec1 |g2| species), + flocker_data = fd_rep_varying + ) +``` +For more on `brms` syntax for random effects syntax, see the [documentation here](https://cran.r-project.org/web/packages/brms/vignettes/brms_multilevel.pdf). + +### Nonlinear models +Via `brms`, `flocker` supports Gaussian processes of arbitrary dimensionality +(`brms::gp()`) as well as `mgcv` syntax for thin-plate regression splines +(`brms::s()`) and tensor product smooths (`brms::t2()`), and `brms` syntax for +monotonic effects of ordinal factors via `brms::mo()` ([see here](https://paul-buerkner.github.io/brms/articles/brms_monotonic.html)). For +example: + +```r +mod5 <- flock( + f_occ = ~ s(uc1), + f_det = ~ t2(uc1, ec1), + flocker_data = fd_rep_varying + ) + +mod6 <- flock( + f_occ = ~ 1, + f_det = ~ gp(uc1, ec1), + flocker_data = fd_rep_varying + ) +``` + +In addition, `brms` provides the ability to estimate models wherein the +predictors (e.g. for occupancy and detection) are parametric nonlinear functions +whose parameters have their own covariate-based linear predictors. For more +details and an example, see the [nonlinear models vignette](https://jsocolar.github.io/flocker/articles/nonlinear_models.html). + +### Phylogenetic models +Phylogenetic effects can be included by providing a covariance matrix as a +`data2` argument and using the `brms::gr()` function to link species identities +in `flocker_data` with the supplied covariance matrix. Note that phylogenetic +effects can be included in either the occupancy component, the detection +component, or both! In our experience, it can be computationally tractable to +include multiple phylogenetic effects within a single occupancy model (see +[Mills et al. 2022](https://doi.org/10.1002/ecy.3867)). + + +```r +# simulate an example phylogeny +phylogeny <- ape::rtree(30, tip.label = paste0("sp_", 1:30)) + +# calculate covariance matrix +A <- ape::vcv.phylo(phylogeny) + +mod8 <- flock( + f_occ = ~ 1 + (1|gr(species, cov = A)), + f_det = ~ 1 + ec1 + (1|species), + flocker_data = fd_rep_varying, + data2 = list(A = A) + ) + +mod9 <- flock( + f_occ = ~ 1 + (1|gr(species, cov = A)), + f_det = ~ 1 + ec1 + (1|gr(species, cov = A)), + flocker_data = fd_rep_varying, + data2 = list(A = A) + ) +``` + +[See here](https://paul-buerkner.github.io/brms/articles/brms_phylogenetics.html) for further details about specifying phylogenetic effects in `brms`. + +### Spatial and autoregressive structures +In addition to spatial Gaussian processes, `brms` provides a variety of +autoregressive structures, both one-dimensional (see `brms::ar()`, +`brms::arma()`) and two-dimensional (see `brms::car()`, `brms::sar()`. [See here](https://paul-buerkner.github.io/brms/reference/car.html) for details about conditional autoregressive (CAR) models in `brms`, and note that `flock()` +accepts a `data2` argument that it can pass to `brms` as necessary. + +Our principle caution for users is that these autoregressive structures might +lead to degenerate models when applied at the visit level (in detection +formulas) or at the closure-unit level (in occupancy, colonization, extinction, +or autologistic formulas) because observation-level random effects are often +degenerate in regressions with Bernoulli responses. Thus we recommend applying +autoregressive terms to groupings of multiple visits (detection formula) or +multiple closure-units (other formulas). However, we note that `flocker` *does* +provide a well-identified one-dimensional first-order autoregressive structure +for occupancy across closure-units in a single-season model. This is achieved by +co-opting the autologistic parameterization of the multi-season model and +applying it instead to closure-units arranged along a one-dimensional spatial +transect, yielding a one-dimensional analog of a spatial autologistic occupancy +model. + +A second caution is to remind users that in multi-species models, users will +likely want to fit separate spatial terms by species ([Doser et al 2022](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.13897)). +For Gaussian processes, this can be achieved via the `by` argument to +`brms::gp()`. For some conditional autoregressive structures (those that allow +disconnected islands), this can be achieved by passing a block-diagonal +adjacency matrix wherein species are disconnected components. + +We note that gaussian process priors for spatially varying coefficients are +readily achieved via the nonlinear formula syntax of `brms`. For more +details and an example, see the [nonlinear models vignette](https://jsocolar.github.io/flocker/articles/nonlinear_models.html). + +### Measurement error in covariates +[See here](http://paul-buerkner.github.io/brms/reference/me.html) for relevant +`brms` documentation. + +## Additional fitting arguments +`flock` will pass any relevant parameters forward to `brms::brm()`, giving the +user important control over the algorithmic details of how the model is fit. See +`?brms::brm` for details. To speed up the execution, we recommend supplying the +argument `backend = "cmdstanr"`. This requires the cmdstanr package and a +working installation of cmdstan; [see here](https://mc-stan.org/cmdstanr/) for +instructions to get started and further details. + diff --git a/vignettes/nonlinear_models.Rmd b/vignettes/nonlinear_models.Rmd new file mode 100644 index 0000000..5c6ee30 --- /dev/null +++ b/vignettes/nonlinear_models.Rmd @@ -0,0 +1,265 @@ +--- +title: "Nonlinear models in flocker" +author: "Jacob Socolar" +date: "2023-10-18" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Nonlinear models in flocker} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + + + + + +Here we show how we can use `flocker` to fit nonlinear occupancy models via +`brms`. In most occupancy models, occupancy and detection probabilties are +modeled as logit-linear combinations of covariates. In some models (e.g. those +with splines or Gaussian processes), probabilities are modeled as the sum of +more flexibile functions of covariates. These are straightforward to fit in +`flocker` using the `brms` functions `s()`, `t2()`, and `gp()`; see the [flocker tutorial vignette](https://jsocolar.github.io/flocker/articles/flocker_tutorial.html) for +details. + +This vignette focuses on more complicated nonlinear models that require the use +of special nonlinear `brms` formulas. We showcase two models. The first fits +a parameteric nonlinear predictor. The second fits a model with a spatially +varying coefficient that is given a gaussian process prior. + +## Parameteric nonlinear predictor +In this scenario, we consider a model where the response is a specific nonlinear +parametric function whose parameters are fitted and might or might not depend on covariates. Suppose for example that an expanding population of a territorial +species undergoes logistic growth, and also that some unknown proportion of +territories are unsuitable due to an unobserved factor, such that occupancy +asymptotes at some probability less than one. Thus, occupancy probability +changes through time as $\frac{L}{1 + e^{-k(t-t_0)}}$, where $L$ is the +asymptote, $k$ is a growth rate, $t$ is time, and $t_0$ is the timing of the +inflection point. At multiple discrete times, we randomly sample several sites +to survey, and survey each of those sites over several repeat visits. + + +```r +library(flocker); library(brms) +set.seed(3) + +L <- 0.5 +k <- .1 +t0 <- -5 +t <- seq(-15, 15, 1) +n_site_per_time <- 30 +n_visit <- 3 +det_prob <- .3 + +data <- data.frame( + t = rep(t, n_site_per_time) +) + +data$psi <- L/(1 + exp(-k*(t - t0))) +data$Z <- rbinom(nrow(data), 1, data$psi) +data$v1 <- data$Z * rbinom(nrow(data), 1, det_prob) +data$v2 <- data$Z * rbinom(nrow(data), 1, det_prob) +data$v3 <- data$Z * rbinom(nrow(data), 1, det_prob) + +fd <- make_flocker_data( + obs = as.matrix(data[,c("v1", "v2", "v3")]), + unit_covs = data.frame(t = data[,c("t")]), + event_covs <- list(dummy = matrix(rnorm(n_visit*nrow(data)), ncol = 3)) +) +``` + +We wish to fit an occupancy model that recovers the unknown parameters $L$, $k$, +and $t_0$. We can achieve this using the nonlinear formula syntax provided by +`brms` via `flocker`. + +`flocker` will always assume that the occupancy formula is provided on the logit +scale. Thus, we need to convert our nonlinear function giving the occupancy +probability to a function giving the logit occupancy probability. A bit of +simplification via Wolfram Alpha and we arrive at +$\log(\frac{L}{1 + e^{-k(t - t_0)} - L})$. We then write a `brms` formula +representing occupancy via this function. To specify a formula wherein a +distributional parameter (`occ` in this case, referring to occupancy) is +nonlinear we need to use `brms::set_nl()` rather than merely providing the +`nl = TRUE` argument to `brms::bf()`. + +`flocker`'s main fitting function `flock()` accepts `brmsformula` inputs to its +`f_det` argument. When supplying a `brmsformula` to `f_det` (rather than the +typical one-sided detection formula), the following behaviors are triggered: + +* Several input checks are turned off. For example, `flocker` no longer checks +to ensure that event covariates are absent from the occupancy formula. +`flocker` also no longer explicitly checks that formulas are provided for all of +the required distributional terms for a given family (detection, occupancy, +colonization, extinction, and autologistic terms, depending on the family). + +* All inputs to `f_occ`, `f_col`, `f_ex`, `f_auto` are silently ignored. It is +obligatory to pass the entire formula for all distributional parameters as a +single `brmsformula` object. This means in turn that the user must be familiar +with `flocker`'s internal naming conventions for all of the relevant +distributional parameters (`det` and one or more of `occ`, `colo`, `ex`, +`autologistic`, `Omega`). If fitting a data-augmented model, it will be requried +to pass the `Omega ~ 1` formula within the `brmsformula` (When passing the +traditional one-sided formula to `f_det`, `flocker` includes the formula for +`Omega` internally and automatically). + +* Nonlinear formulas that involve data that are required to be positive might +fail! Internally, some irrelevant data positions get filled with `-99`, but +these positions might still get evaluated by the nonlinear formula, even though +they make no contribution to the likelihood. + +With all of that said, we can go ahead and fit this model! + + +```r +fit <- flock(f_det = brms::bf( + det ~ 1 + dummy, + occ ~ log(L/(1 + exp(-k*(t - t0)) - L)), + L ~ 1, + k ~ 1, + t0 ~ 1 + ) + + brms::set_nl(dpar = "occ"), + prior = + c( + prior(normal(0, 5), nlpar = "t0"), + prior(normal(0, 1), nlpar = "k"), + prior(beta(1, 1), nlpar = "L", lb = 0, ub = 1) + ), + flocker_data = fd, + control = list(adapt_delta = 0.9), + cores = 4) +#> Warning: There were 1 divergent transitions after warmup. See +#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup +#> to find out why this is a problem and how to eliminate them. +#> Warning: Examine the pairs() plot to diagnose sampling problems +``` + + +```r +summary(fit) +#> Warning: There were 1 divergent transitions after warmup. Increasing adapt_delta above 0.9 may help. +#> See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup +#> Family: occupancy_single +#> Links: mu = identity; occ = identity +#> Formula: ff_y | vint(ff_n_unit, ff_n_rep, ff_Q, ff_rep_index1, ff_rep_index2, ff_rep_index3) ~ 1 + dummy +#> occ ~ log(L/(1 + exp(-k * (t - t0)) - L)) +#> L ~ 1 +#> k ~ 1 +#> t0 ~ 1 +#> Data: data (Number of observations: 2790) +#> Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; +#> total post-warmup draws = 4000 +#> +#> Population-Level Effects: +#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS +#> Intercept -0.90 0.13 -1.16 -0.66 1.00 2294 2302 +#> L_Intercept 0.50 0.10 0.37 0.75 1.00 1221 977 +#> k_Intercept 0.19 0.08 0.07 0.37 1.00 1200 1719 +#> t0_Intercept -5.90 3.41 -10.41 3.03 1.00 1223 1078 +#> dummy -0.00 0.08 -0.15 0.15 1.00 2752 2526 +#> +#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS +#> and Tail_ESS are effective sample size measures, and Rhat is the potential +#> scale reduction factor on split chains (at convergence, Rhat = 1). +``` + +It works! + +Note that if desired, we could fit more complicated formulas than `~ 1` for any of the nonlinear parameters. For more see the [brms nonlinear model vignette](https://cran.r-project.org/web/packages/brms/vignettes/brms_nonlinear.html). + +## Spatially varying coefficients via a Gaussian process +The `gp()` function in `brms` includes a Gaussian process of arbitary dimension +in the linear predictor. We can use the nonlinear formula syntax to tell `brms` +to include a Gaussian process prior on a coefficient as well. + +First we simulate some data wherein the logit of the occupancy probability +depends on a covariate, and the slope of the dependency is modeled via a +two-dimensional spatial Gaussian process: + + +```r +set.seed(1) +n <- 2000 # sample size +lscale <- 0.3 # square root of l of the gaussian kernel +sigma_gp <- 1 # sigma of the gaussian kernel +intercept <- 0 # occupancy logit-intercept +det_intercept <- -1 # detection logit-intercept +n_visit <- 4 + +# covariate data for the model +gp_data <- data.frame( + x = rnorm(n), + y = rnorm(n), + covariate = rnorm(n) + ) + +# get distance matrix +dist.mat <- as.matrix( + stats::dist(gp_data[,c("x", "y")]) + ) + +# get covariance matrix +cov.mat <- sigma_gp^2 * exp(- (dist.mat^2)/(2*lscale^2)) + +# simulate occupancy data +gp_data$coef <- mgcv::rmvn(1, rep(0, n), cov.mat) +gp_data$lp <- intercept + gp_data$coef * gp_data$covariate +gp_data$psi <- boot::inv.logit(gp_data$lp) +gp_data$Z <- rbinom(n, 1, gp_data$psi) + +# simulate visit data +obs <- matrix(nrow = n, ncol = n_visit) +for(j in 1:n_visit){ + obs[,j] <- gp_data$Z * rbinom(n, 1, boot::inv.logit(det_intercept)) +} +``` + + +And here's how we can fit this model in `flocker`! It turns out that we need +quite a lot of data points to constrain the standard deviation of the Gaussian +process, and so we use a Hilbert space approximate Gaussian process for +computational efficiency. + + +```r +fd2 <- make_flocker_data(obs = obs, unit_covs = gp_data[, c("x", "y", "covariate")]) +svc_mod <- flock( + f_det = brms::bf( + det ~ 1, + occ ~ occint + g * covariate, + occint ~ 1, + g ~ 0 + gp(x, y, scale = FALSE, k = 20, c = 1.25) + ) + + brms::set_nl(dpar = "occ"), + flocker_data = fd2, + cores = 4 +) +``` + + +```r +summary(svc_mod) +#> Family: occupancy_single_C +#> Links: mu = identity; occ = identity +#> Formula: ff_n_suc | vint(ff_n_trial) ~ 1 +#> occ ~ occint + g * covariate +#> occint ~ 1 +#> g ~ 0 + gp(x, y, scale = FALSE, k = 20, c = 1.25) +#> Data: data (Number of observations: 2000) +#> Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; +#> total post-warmup draws = 4000 +#> +#> Gaussian Process Terms: +#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS +#> sdgp(g_gpxy) 1.69 0.78 0.73 3.67 1.00 3368 3076 +#> lscale(g_gpxy) 0.23 0.11 0.08 0.50 1.00 4128 3128 +#> +#> Population-Level Effects: +#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS +#> Intercept -1.04 0.06 -1.15 -0.93 1.00 6089 2975 +#> occint_Intercept 0.09 0.09 -0.08 0.27 1.00 6221 3132 +#> +#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS +#> and Tail_ESS are effective sample size measures, and Rhat is the potential +#> scale reduction factor on split chains (at convergence, Rhat = 1). +``` +Again, it worked! From 6753e7a2d178360bf4a4ae46abfe914a48466737 Mon Sep 17 00:00:00 2001 From: "Jacob B. Socolar" Date: Wed, 18 Oct 2023 17:34:52 -0400 Subject: [PATCH 4/4] turn back on vignette building on CI --- .github/workflows/check-standard.yaml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/check-standard.yaml b/.github/workflows/check-standard.yaml index db25cd4..47b1fd1 100644 --- a/.github/workflows/check-standard.yaml +++ b/.github/workflows/check-standard.yaml @@ -47,5 +47,5 @@ jobs: - uses: r-lib/actions/check-r-package@v2 with: upload-snapshots: true - build_args: 'c("--no-manual", "--no-build-vignettes")' - args: 'c("--no-manual", "--as-cran", "--ignore-vignettes")' + build_args: 'c("--no-manual")' + args: 'c("--no-manual", "--as-cran")'