Commit

4/9 vignettes fixed

avehtari committed Feb 2, 2024
1 parent 90ab04a commit cfe1f74
Showing 6 changed files with 37 additions and 40 deletions.
2 changes: 1 addition & 1 deletion R/loo_moment_matching.R
@@ -78,7 +78,7 @@ loo_moment_match.default <- function(x, loo, post_draws, log_lik_i,
checkmate::assertFunction(log_prob_upars)
checkmate::assertFunction(log_lik_i_upars)
checkmate::assertNumber(max_iters)
checkmate::assertNumber(k_threshold)
checkmate::assertNumber(k_threshold, null.ok=TRUE)
checkmate::assertLogical(split)
checkmate::assertLogical(cov)
checkmate::assertNumber(cores)
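As a hedged aside on this change: with `null.ok = TRUE` the check now accepts `k_threshold = NULL`, so callers can omit the threshold and let a default be chosen downstream. A minimal sketch of the two behaviors, using only `checkmate`'s documented `null.ok` argument:

```r
# With null.ok = TRUE, NULL passes the assertion, so k_threshold may be unset
checkmate::assertNumber(NULL, null.ok = TRUE)  # passes silently
checkmate::assertNumber(0.7, null.ok = TRUE)   # numbers still pass
# The old line rejected NULL:
# checkmate::assertNumber(NULL)                # error: may not be NULL
```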
6 changes: 6 additions & 0 deletions tests/testthat/test_loo_moment_matching.R
@@ -147,6 +147,12 @@ test_that("loo_moment_match.default warnings work", {
k_thres = 0.5, split = FALSE,
cov = TRUE, cores = 1), "The accuracy of self-normalized importance sampling")

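# the same warning should also be raised when k_threshold is left at its NULL default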
expect_warning(loo_moment_match(x, loo_manual, post_draws_test, log_lik_i_test,
unconstrain_pars_test, log_prob_upars_test,
log_lik_i_upars_test, max_iters = 30L,
split = FALSE,
cov = TRUE, cores = 1), "The accuracy of self-normalized importance sampling")

expect_no_warning(loo_moment_match(x, loo_manual, post_draws_test, log_lik_i_test,
unconstrain_pars_test, log_prob_upars_test,
log_lik_i_upars_test, max_iters = 30L,
19 changes: 9 additions & 10 deletions vignettes/loo2-example.Rmd
@@ -30,7 +30,7 @@ encourage readers to refer to the following papers for more details:

* Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. _Statistics and Computing_. 27(5), 1413--1432. \doi:10.1007/s11222-016-9696-4. Links: [published](https://link.springer.com/article/10.1007/s11222-016-9696-4) | [arXiv preprint](https://arxiv.org/abs/1507.04544).

* Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2019). Pareto smoothed importance sampling. [arXiv preprint arXiv:1507.02646](https://arxiv.org/abs/1507.02646).
* Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2022). Pareto smoothed importance sampling. [arXiv preprint arXiv:1507.02646](https://arxiv.org/abs/1507.02646).


# Setup
@@ -179,22 +179,20 @@ leave-one-out cross-validation marginal posterior predictive checks [Gabry et al
(2018)](https://arxiv.org/abs/1709.01449). LOO-PIT values are cumulative
probabilities for $y_i$ computed using the LOO marginal predictive distributions
$p(y_i|y_{-i})$. For a good model, the distribution of LOO-PIT values should be
uniform. In the following plot the distribution (smoothed density estimate) of
the LOO-PIT values for our model (thick curve) is compared to many
independently generated samples (each the same size as our dataset) from the
standard uniform distribution (thin curves).
uniform. In the following QQ-plot the LOO-PIT values for our model (y-axis) are
compared to the standard uniform distribution (x-axis).

```{r ppc_loo_pit_overlay}
yrep <- posterior_predict(fit1)
ppc_loo_pit_overlay(
ppc_loo_pit_qq(
y = roaches$y,
yrep = yrep,
lw = weights(loo1$psis_object)
)
```

The excessive number of values close to 0 indicates that the model is
The excessive number of LOO-PIT values close to 0 indicates that the model is
under-dispersed compared to the data, and we should consider a model that allows
for greater dispersion.
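For intuition, the LOO-PIT values themselves can be sketched by hand from the PSIS weights. This is a hedged illustration, reusing `yrep` and `loo1` from above and ignoring the extra randomization that discrete count data strictly require:

```r
# S x N matrix of normalized PSIS weights on the natural scale
w <- weights(loo1$psis_object, log = FALSE)
# LOO-PIT for observation i: weighted Pr(yrep_i <= y_i) under p(y_i | y_{-i})
loo_pit <- vapply(seq_along(roaches$y), function(i) {
  sum(w[, i] * (yrep[, i] <= roaches$y[i]))
}, numeric(1))
```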

@@ -219,7 +217,8 @@ print(loo2)
plot(loo2, label_points = TRUE)
```
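
As a complement to the plot, the flagged observations can also be listed programmatically. A small sketch using loo's exported helpers `pareto_k_ids()` and `pareto_k_values()` (the threshold value shown is illustrative):

```r
# indices of observations whose Pareto k exceeds the chosen threshold
flagged <- pareto_k_ids(loo2, threshold = 0.7)
# their estimated k values
pareto_k_values(loo2)[flagged]
```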

Using the `label_points` argument will label any $k$ values larger than 0.7 with
Using the `label_points` argument will label any $k$ values larger than the
diagnostic threshold with
the index of the corresponding data point. These high values are often the
result of model misspecification and frequently correspond to data points that
would be considered ``outliers'' in the data and surprising according to the
@@ -253,7 +252,7 @@ still some degree of model misspecification, but this is much better than the
For further model checking we again examine the LOO-PIT values.
```{r ppc_loo_pit_overlay-negbin}
yrep <- posterior_predict(fit2)
ppc_loo_pit_overlay(roaches$y, yrep, lw = weights(loo2$psis_object))
ppc_loo_pit_qq(roaches$y, yrep, lw = weights(loo2$psis_object))
```

The plot for the negative binomial model looks better than the Poisson plot, but
@@ -272,7 +271,7 @@ loo_compare(loo1, loo2)

The difference in ELPD is much larger than several times the estimated standard
error of the difference, again indicating that the negative-binomial model is
expected to have better predictive performance than the Poisson model. However,
according to the LOO-PIT checks there is still some misspecification, and a
reasonable guess is that a hurdle or zero-inflated model would be an improvement
(we leave that for another case study).
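
For reference, the object returned by `loo_compare()` is a matrix whose first two columns carry exactly the quantities discussed above; a minimal sketch:

```r
cmp <- loo_compare(loo1, loo2)
# ELPD difference (relative to the best model) and its standard error
cmp[, c("elpd_diff", "se_diff")]
```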
36 changes: 14 additions & 22 deletions vignettes/loo2-large-data.Rmd
@@ -35,7 +35,7 @@ Proceedings of the 23rd International Conference on Artificial Intelligence and

* Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. _Statistics and Computing_. 27(5), 1413--1432. \doi:10.1007/s11222-016-9696-4. Links: [published](https://link.springer.com/article/10.1007/s11222-016-9696-4) | [arXiv preprint](https://arxiv.org/abs/1507.04544).

* Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2019). Pareto smoothed importance sampling. [arXiv preprint arXiv:1507.04544](https://arxiv.org/abs/1507.04544).
* Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2022). Pareto smoothed importance sampling. [arXiv preprint arXiv:1507.04544](https://arxiv.org/abs/1507.04544).

which provide important background for understanding the methods implemented in
the package.
@@ -195,8 +195,9 @@ p_loo 3.1 0.1 0.4
looic 3936.9 31.2 0.6
------
Monte Carlo SE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.9, 1.0]).
All Pareto k estimates are good (k < 0.5).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
```
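
The added `MCSE and ESS estimates assume MCMC draws (r_eff in [0.9, 1.0])` line reflects that relative efficiencies were passed to `loo()`. A hedged sketch of that step, where `log_lik_array` is an illustrative name for an iterations x chains x observations array:

```r
# relative MCMC efficiencies of exp(log-lik), one value per observation
r_eff <- relative_eff(exp(log_lik_array))
# loo() then reports MCSE and ESS under the MCMC-draws assumption
loo_1 <- loo(log_lik_array, r_eff = r_eff)
```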

@@ -246,8 +247,9 @@ p_loo 3.2 0.1 0.4
looic 3936.7 31.2 0.5
------
Monte Carlo SE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.9, 1.0]).
All Pareto k estimates are good (k < 0.5).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
```

@@ -290,8 +292,9 @@ p_loo 3.5 0.2 0.5
looic 3937.9 30.7 1.1
------
Monte Carlo SE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.9, 1.0]).
All Pareto k estimates are good (k < 0.5).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
```

@@ -343,15 +346,9 @@ looic 3936.8 31.2
------
Posterior approximation correction used.
Monte Carlo SE of elpd_loo is 0.0.
MCSE and ESS estimates assume independent draws (r_eff=1).
Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     2989  99.0%   1827
 (0.5, 0.7]   (ok)         31   1.0%   1996
   (0.7, 1]   (bad)         0   0.0%   <NA>
   (1, Inf)   (very bad)    0   0.0%   <NA>
All Pareto k estimates are ok (k < 0.7).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
```

@@ -386,15 +383,9 @@ looic 3936.4 31.1 0.8
------
Posterior approximation correction used.
Monte Carlo SE of elpd_loo is 0.0.
MCSE and ESS estimates assume independent draws (r_eff=1).
Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)       97  97.0%   1971
 (0.5, 0.7]   (ok)          3   3.0%   1997
   (0.7, 1]   (bad)         0   0.0%   <NA>
   (1, Inf)   (very bad)    0   0.0%   <NA>
All Pareto k estimates are ok (k < 0.7).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
```

@@ -471,8 +462,9 @@ p_loo 2.6 0.1 0.3
looic 3903.9 32.4 0.4
------
Monte Carlo SE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [1.0, 1.1]).
All Pareto k estimates are good (k < 0.5).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
```

@@ -616,6 +608,6 @@ Computing_. 27(5), 1413--1432. \doi:10.1007/s11222-016-9696-4.
[online](https://link.springer.com/article/10.1007/s11222-016-9696-4),
[arXiv preprint arXiv:1507.04544](https://arxiv.org/abs/1507.04544).

Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2019). Pareto
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2022). Pareto
smoothed importance sampling.
[arXiv preprint arXiv:1507.02646](https://arxiv.org/abs/1507.02646).
8 changes: 4 additions & 4 deletions vignettes/loo2-lfo.Rmd
@@ -54,7 +54,7 @@ leave-one-out cross-validation (LOO-CV). For a data set with $N$ observations,
we refit the model $N$ times, each time leaving out one of the $N$ observations
and assessing how well the model predicts the left-out observation. LOO-CV is
very expensive computationally in most realistic settings, but the Pareto
smoothed importance sampling (PSIS, Vehtari et al, 2017, 2019) algorithm provided by
smoothed importance sampling (PSIS, Vehtari et al, 2017, 2022) algorithm provided by
the *loo* package allows for approximating exact LOO-CV with PSIS-LOO-CV.
PSIS-LOO-CV requires only a single fit of the full model and comes with
diagnostics for assessing the validity of the approximation.
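
A hedged sketch of the PSIS computation underlying this approximation, where `log_lik` is an illustrative name for an $S \times N$ log-likelihood matrix:

```r
library(loo)
# PSIS-smoothed importance ratios; for LOO the log ratios are -log_lik
psis_result <- psis(-log_lik)
# the per-observation Pareto k diagnostics mentioned above
pareto_k_values(psis_result)
# full PSIS-LOO-CV estimate from a single fit of the model
loo_result <- loo(log_lik)
```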
@@ -179,7 +179,7 @@ variability of the importance ratios $r_i^{(s)}$ will become too large and
importance sampling will fail. We will refer to this particular value of $i$ as
$i^\star_1$. To identify the value of $i^\star_1$, we check for which value of
$i$ does the estimated shape parameter $k$ of the generalized Pareto
distribution first cross a certain threshold $\tau$ (Vehtari et al, 2019). Only
distribution first cross a certain threshold $\tau$ (Vehtari et al, 2022). Only
then do we refit the model using the observations up to $i^\star_1$ and restart
the process from there by setting $\theta^{(s)} = \theta^{(s)}_{1:i^\star_1}$
and $i^\star = i^\star_1$ until the next refit.
@@ -188,7 +188,7 @@ In some cases we may only need to refit once and in other cases we will find a
value $i^\star_2$ that requires a second refitting, maybe an $i^\star_3$ that
requires a third refitting, and so on. We refit as many times as is required
(only when $k > \tau$) until we arrive at observation $i = N - M$.
For LOO, we recommend to use a threshold of $\tau = 0.7$ (Vehtari et al, 2017, 2019)
For LOO, assuming the posterior sample size is 4000 or larger, we recommend using a threshold of $\tau = 0.7$ (Vehtari et al, 2017, 2022)
and it turns out this is a reasonable threshold for LFO as well (Bürkner et al. 2020).
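
A heavily hedged pseudo-R sketch of this refit rule; `fit_upto()` and `log_ratios_for()` are hypothetical helpers (not loo functions), `psis()` and `pareto_k_values()` are real loo exports, and `L`, `M`, `N`, and `tau` are as in the text:

```r
tau <- 0.7
i_star <- L                           # index of the most recent refit
fit <- fit_upto(i_star)               # hypothetical: fit to y_{1:i_star}
for (i in (L + 1):(N - M)) {
  r_i <- log_ratios_for(fit, i)       # hypothetical: log ratios r_i^(s)
  if (pareto_k_values(psis(r_i)) > tau) {
    i_star <- i                       # k crossed tau, so refit here
    fit <- fit_upto(i_star)
  }
}
```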

## Autoregressive models
@@ -640,7 +640,7 @@ Bürkner P. C., Gabry J., & Vehtari A. (2020). Approximate leave-future-out cros

Vehtari A., Gelman A., & Gabry J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. *Statistics and Computing*, 27(5), 1413--1432. \doi:10.1007/s11222-016-9696-4. [Online](https://link.springer.com/article/10.1007/s11222-016-9696-4). [arXiv preprint arXiv:1507.04544](https://arxiv.org/abs/1507.04544).

Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2019). Pareto smoothed importance sampling. [arXiv preprint arXiv:1507.02646](https://arxiv.org/abs/1507.02646).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2022). Pareto smoothed importance sampling. [arXiv preprint arXiv:1507.02646](https://arxiv.org/abs/1507.02646).

<br />

6 changes: 3 additions & 3 deletions vignettes/loo2-moment-matching.Rmd
@@ -43,7 +43,7 @@ papers

* Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. _Statistics and Computing_. 27(5), 1413--1432. \doi:10.1007/s11222-016-9696-4. Links: [published](https://link.springer.com/article/10.1007/s11222-016-9696-4) | [arXiv preprint](https://arxiv.org/abs/1507.04544).

* Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2019).
* Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2022).
Pareto smoothed importance sampling.
[arXiv preprint arXiv:1507.02646](https://arxiv.org/abs/1507.02646).

@@ -168,7 +168,7 @@ __rstan__. It only requires setting the argument `moment_match` to `TRUE` in the
`loo()` function. Optionally, you can also set the argument `k_threshold` which
determines the Pareto $k$ threshold, above which moment matching is used. By
default, it operates on all observations whose Pareto $k$ value is larger than
0.7.
the sample-size-specific threshold $\min(1 - 1 / \log_{10}(S), 0.7)$, where $S$ is the posterior sample size (this equals $0.7$ for $S > 2200$).
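
The arithmetic behind that parenthetical is quick to verify before running the rstan example below:

```r
S <- 4000                   # posterior sample size
min(1 - 1 / log10(S), 0.7)  # 1 - 1/log10(4000) is about 0.72, so the min is 0.7
```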

```{r loo_moment_match}
# available in rstan >= 2.21
@@ -319,4 +319,4 @@ Implicitly adaptive importance sampling. _Statistics and Computing_, 31, 16.

Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. _Statistics and Computing_. 27(5), 1413--1432. \doi:10.1007/s11222-016-9696-4. Links: [published](https://link.springer.com/article/10.1007/s11222-016-9696-4) | [arXiv preprint](https://arxiv.org/abs/1507.04544).

Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2019). Pareto smoothed importance sampling. [arXiv preprint arXiv:1507.02646](https://arxiv.org/abs/1507.02646).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2022). Pareto smoothed importance sampling. [arXiv preprint arXiv:1507.02646](https://arxiv.org/abs/1507.02646).
