Commit

4/9 vignettes fixed

avehtari committed Feb 2, 2024
1 parent 90ab04a commit cfe1f74
Showing 6 changed files with 37 additions and 40 deletions.
2 changes: 1 addition & 1 deletion R/loo_moment_matching.R
@@ -78,7 +78,7 @@ loo_moment_match.default <- function(x, loo, post_draws, log_lik_i,
checkmate::assertFunction(log_prob_upars)
checkmate::assertFunction(log_lik_i_upars)
checkmate::assertNumber(max_iters)
checkmate::assertNumber(k_threshold)
checkmate::assertNumber(k_threshold, null.ok=TRUE)
checkmate::assertLogical(split)
checkmate::assertLogical(cov)
checkmate::assertNumber(cores)
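As a hedged aside on this change: with `null.ok = TRUE` the check now accepts `k_threshold = NULL`, so callers can omit the threshold and let a default be chosen downstream. A minimal sketch of the two behaviors, using only `checkmate`'s documented `null.ok` argument:

```r
# With null.ok = TRUE, NULL passes the assertion, so k_threshold may be unset
checkmate::assertNumber(NULL, null.ok = TRUE)  # passes silently
checkmate::assertNumber(0.7, null.ok = TRUE)   # numbers still pass
# The old line rejected NULL:
# checkmate::assertNumber(NULL)                # error: may not be NULL
```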
6 changes: 6 additions & 0 deletions tests/testthat/test_loo_moment_matching.R
@@ -147,6 +147,12 @@ test_that("loo_moment_match.default warnings work", {
k_thres = 0.5, split = FALSE,
cov = TRUE, cores = 1), "The accuracy of self-normalized importance sampling")

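# the same warning should also be raised when k_threshold is left at its NULL default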
expect_warning(loo_moment_match(x, loo_manual, post_draws_test, log_lik_i_test,
unconstrain_pars_test, log_prob_upars_test,
log_lik_i_upars_test, max_iters = 30L,
split = FALSE,
cov = TRUE, cores = 1), "The accuracy of self-normalized importance sampling")

expect_no_warning(loo_moment_match(x, loo_manual, post_draws_test, log_lik_i_test,
unconstrain_pars_test, log_prob_upars_test,
log_lik_i_upars_test, max_iters = 30L,
19 changes: 9 additions & 10 deletions vignettes/loo2-example.Rmd
@@ -30,7 +30,7 @@ encourage readers to refer to the following papers for more details:

* Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. _Statistics and Computing_. 27(5), 1413--1432. \doi:10.1007/s11222-016-9696-4. Links: [published](https://link.springer.com/article/10.1007/s11222-016-9696-4) | [arXiv preprint](https://arxiv.org/abs/1507.04544).

* Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2019). Pareto smoothed importance sampling. [arXiv preprint arXiv:1507.02646](https://arxiv.org/abs/1507.02646).
* Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2022). Pareto smoothed importance sampling. [arXiv preprint arXiv:1507.02646](https://arxiv.org/abs/1507.02646).


# Setup
@@ -179,22 +179,20 @@ leave-one-out cross-validation marginal posterior predictive checks [Gabry et al
(2018)](https://arxiv.org/abs/1709.01449). LOO-PIT values are cumulative
probabilities for $y_i$ computed using the LOO marginal predictive distributions
$p(y_i|y_{-i})$. For a good model, the distribution of LOO-PIT values should be
uniform. In the following plot the distribution (smoothed density estimate) of
the LOO-PIT values for our model (thick curve) is compared to many
independently generated samples (each the same size as our dataset) from the
standard uniform distribution (thin curves).
uniform. In the following QQ-plot the LOO-PIT values for our model (y-axis) are
compared to the standard uniform distribution (x-axis).

```{r ppc_loo_pit_overlay}
yrep <- posterior_predict(fit1)
ppc_loo_pit_overlay(
ppc_loo_pit_qq(
y = roaches$y,
yrep = yrep,
lw = weights(loo1$psis_object)
)
```

The excessive number of values close to 0 indicates that the model is
The excessive number of LOO-PIT values close to 0 indicates that the model is
under-dispersed compared to the data, and we should consider a model that allows
for greater dispersion.
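For intuition, the LOO-PIT values themselves can be sketched by hand from the PSIS weights. This is a hedged illustration, reusing `yrep` and `loo1` from above and ignoring the extra randomization that discrete count data strictly require:

```r
# S x N matrix of normalized PSIS weights on the natural scale
w <- weights(loo1$psis_object, log = FALSE)
# LOO-PIT for observation i: weighted Pr(yrep_i <= y_i) under p(y_i | y_{-i})
loo_pit <- vapply(seq_along(roaches$y), function(i) {
  sum(w[, i] * (yrep[, i] <= roaches$y[i]))
}, numeric(1))
```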

@@ -219,7 +217,8 @@ print(loo2)
plot(loo2, label_points = TRUE)
```
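
As a complement to the plot, the flagged observations can also be listed programmatically. A small sketch using loo's exported helpers `pareto_k_ids()` and `pareto_k_values()` (the threshold value shown is illustrative):

```r
# indices of observations whose Pareto k exceeds the chosen threshold
flagged <- pareto_k_ids(loo2, threshold = 0.7)
# their estimated k values
pareto_k_values(loo2)[flagged]
```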

Using the `label_points` argument will label any $k$ values larger than 0.7 with
Using the `label_points` argument will label any $k$ values larger than the
diagnostic threshold with
the index of the corresponding data point. These high values are often the
result of model misspecification and frequently correspond to data points that
would be considered ``outliers'' in the data and surprising according to the
@@ -253,7 +252,7 @@ still some degree of model misspecification, but this is much better than the
For further model checking we again examine the LOO-PIT values.
```{r ppc_loo_pit_overlay-negbin}
yrep <- posterior_predict(fit2)
ppc_loo_pit_overlay(roaches$y, yrep, lw = weights(loo2$psis_object))
ppc_loo_pit_qq(roaches$y, yrep, lw = weights(loo2$psis_object))
```

The plot for the negative binomial model looks better than the Poisson plot, but
@@ -272,7 +271,7 @@ loo_compare(loo1, loo2)

The difference in ELPD is much larger than several times the estimated standard
error of the difference, again indicating that the negative-binomial model is
expected to have better predictive performance than the Poisson model. However,
according to the LOO-PIT checks there is still some misspecification, and a
reasonable guess is that a hurdle or zero-inflated model would be an improvement
(we leave that for another case study).
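
For reference, the object returned by `loo_compare()` is a matrix whose first two columns carry exactly the quantities discussed above; a minimal sketch:

```r
cmp <- loo_compare(loo1, loo2)
# ELPD difference (relative to the best model) and its standard error
cmp[, c("elpd_diff", "se_diff")]
```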
36 changes: 14 additions & 22 deletions vignettes/loo2-large-data.Rmd
@@ -35,7 +35,7 @@ Proceedings of the 23rd International Conference on Artificial Intelligence and

* Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. _Statistics and Computing_. 27(5), 1413--1432. \doi:10.1007/s11222-016-9696-4. Links: [published](https://link.springer.com/article/10.1007/s11222-016-9696-4) | [arXiv preprint](https://arxiv.org/abs/1507.04544).

* Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2019). Pareto smoothed importance sampling. [arXiv preprint arXiv:1507.04544](https://arxiv.org/abs/1507.04544).
* Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2022). Pareto smoothed importance sampling. [arXiv preprint arXiv:1507.04544](https://arxiv.org/abs/1507.04544).

which provide important background for understanding the methods implemented in
the package.
@@ -195,8 +195,9 @@ p_loo 3.1 0.1 0.4
looic 3936.9 31.2 0.6
------
Monte Carlo SE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.9, 1.0]).
All Pareto k estimates are good (k < 0.5).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
```
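
The added `MCSE and ESS estimates assume MCMC draws (r_eff in [0.9, 1.0])` line reflects that relative efficiencies were passed to `loo()`. A hedged sketch of that step, where `log_lik_array` is an illustrative name for an iterations x chains x observations array:

```r
# relative MCMC efficiencies of exp(log-lik), one value per observation
r_eff <- relative_eff(exp(log_lik_array))
# loo() then reports MCSE and ESS under the MCMC-draws assumption
loo_1 <- loo(log_lik_array, r_eff = r_eff)
```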

@@ -246,8 +247,9 @@ p_loo 3.2 0.1 0.4
looic 3936.7 31.2 0.5
------
Monte Carlo SE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.9, 1.0]).
All Pareto k estimates are good (k < 0.5).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
```

@@ -290,8 +292,9 @@ p_loo 3.5 0.2 0.5
looic 3937.9 30.7 1.1
------
Monte Carlo SE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.9, 1.0]).
All Pareto k estimates are good (k < 0.5).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
```

@@ -343,15 +346,9 @@ looic 3936.8 31.2
------
Posterior approximation correction used.
Monte Carlo SE of elpd_loo is 0.0.
MCSE and ESS estimates assume independent draws (r_eff=1).
Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     2989  99.0%   1827
 (0.5, 0.7]   (ok)         31   1.0%   1996
   (0.7, 1]   (bad)         0   0.0%   <NA>
   (1, Inf)   (very bad)    0   0.0%   <NA>
All Pareto k estimates are ok (k < 0.7).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
```

@@ -386,15 +383,9 @@ looic 3936.4 31.1 0.8
------
Posterior approximation correction used.
Monte Carlo SE of elpd_loo is 0.0.
MCSE and ESS estimates assume independent draws (r_eff=1).
Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)       97  97.0%   1971
 (0.5, 0.7]   (ok)          3   3.0%   1997
   (0.7, 1]   (bad)         0   0.0%   <NA>
   (1, Inf)   (very bad)    0   0.0%   <NA>
All Pareto k estimates are ok (k < 0.7).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
```

@@ -471,8 +462,9 @@ p_loo 2.6 0.1 0.3
looic 3903.9 32.4 0.4
------
Monte Carlo SE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [1.0, 1.1]).
All Pareto k estimates are good (k < 0.5).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
```

@@ -616,6 +608,6 @@ Computing_. 27(5), 1413--1432. \doi:10.1007/s11222-016-9696-4.
[online](https://link.springer.com/article/10.1007/s11222-016-9696-4),
[arXiv preprint arXiv:1507.04544](https://arxiv.org/abs/1507.04544).

Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2019). Pareto
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2022). Pareto
smoothed importance sampling.
[arXiv preprint arXiv:1507.02646](https://arxiv.org/abs/1507.02646).
8 changes: 4 additions & 4 deletions vignettes/loo2-lfo.Rmd
@@ -54,7 +54,7 @@ leave-one-out cross-validation (LOO-CV). For a data set with $N$ observations,
we refit the model $N$ times, each time leaving out one of the $N$ observations
and assessing how well the model predicts the left-out observation. LOO-CV is
very expensive computationally in most realistic settings, but the Pareto
smoothed importance sampling (PSIS, Vehtari et al, 2017, 2019) algorithm provided by
smoothed importance sampling (PSIS, Vehtari et al, 2017, 2022) algorithm provided by
the *loo* package allows for approximating exact LOO-CV with PSIS-LOO-CV.
PSIS-LOO-CV requires only a single fit of the full model and comes with
diagnostics for assessing the validity of the approximation.
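
A hedged sketch of the PSIS computation underlying this approximation, where `log_lik` is an illustrative name for an $S \times N$ log-likelihood matrix:

```r
library(loo)
# PSIS-smoothed importance ratios; for LOO the log ratios are -log_lik
psis_result <- psis(-log_lik)
# the per-observation Pareto k diagnostics mentioned above
pareto_k_values(psis_result)
# full PSIS-LOO-CV estimate from a single fit of the model
loo_result <- loo(log_lik)
```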
@@ -179,7 +179,7 @@ variability of the importance ratios $r_i^{(s)}$ will become too large and
importance sampling will fail. We will refer to this particular value of $i$ as
$i^\star_1$. To identify the value of $i^\star_1$, we check for which value of
$i$ does the estimated shape parameter $k$ of the generalized Pareto
distribution first cross a certain threshold $\tau$ (Vehtari et al, 2019). Only
distribution first cross a certain threshold $\tau$ (Vehtari et al, 2022). Only
then do we refit the model using the observations up to $i^\star_1$ and restart
the process from there by setting $\theta^{(s)} = \theta^{(s)}_{1:i^\star_1}$
and $i^\star = i^\star_1$ until the next refit.
@@ -188,7 +188,7 @@ In some cases we may only need to refit once and in other cases we will find a
value $i^\star_2$ that requires a second refitting, maybe an $i^\star_3$ that
requires a third refitting, and so on. We refit as many times as is required
(only when $k > \tau$) until we arrive at observation $i = N - M$.
For LOO, we recommend to use a threshold of $\tau = 0.7$ (Vehtari et al, 2017, 2019)
For LOO, assuming the posterior sample size is 4000 or larger, we recommend using a threshold of $\tau = 0.7$ (Vehtari et al, 2017, 2022)
and it turns out this is a reasonable threshold for LFO as well (Bürkner et al. 2020).
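
A heavily hedged pseudo-R sketch of this refit rule; `fit_upto()` and `log_ratios_for()` are hypothetical helpers (not loo functions), `psis()` and `pareto_k_values()` are real loo exports, and `L`, `M`, `N`, and `tau` are as in the text:

```r
tau <- 0.7
i_star <- L                           # index of the most recent refit
fit <- fit_upto(i_star)               # hypothetical: fit to y_{1:i_star}
for (i in (L + 1):(N - M)) {
  r_i <- log_ratios_for(fit, i)       # hypothetical: log ratios r_i^(s)
  if (pareto_k_values(psis(r_i)) > tau) {
    i_star <- i                       # k crossed tau, so refit here
    fit <- fit_upto(i_star)
  }
}
```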

## Autoregressive models
@@ -640,7 +640,7 @@ Bürkner P. C., Gabry J., & Vehtari A. (2020). Approximate leave-future-out cros

Vehtari A., Gelman A., & Gabry J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. *Statistics and Computing*, 27(5), 1413--1432. \doi:10.1007/s11222-016-9696-4. [Online](https://link.springer.com/article/10.1007/s11222-016-9696-4). [arXiv preprint arXiv:1507.04544](https://arxiv.org/abs/1507.04544).

Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2019). Pareto smoothed importance sampling. [arXiv preprint arXiv:1507.02646](https://arxiv.org/abs/1507.02646).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2022). Pareto smoothed importance sampling. [arXiv preprint arXiv:1507.02646](https://arxiv.org/abs/1507.02646).

<br />

6 changes: 3 additions & 3 deletions vignettes/loo2-moment-matching.Rmd
@@ -43,7 +43,7 @@ papers

* Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. _Statistics and Computing_. 27(5), 1413--1432. \doi:10.1007/s11222-016-9696-4. Links: [published](https://link.springer.com/article/10.1007/s11222-016-9696-4) | [arXiv preprint](https://arxiv.org/abs/1507.04544).

* Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2019).
* Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2022).
Pareto smoothed importance sampling.
[arXiv preprint arXiv:1507.02646](https://arxiv.org/abs/1507.02646).

@@ -168,7 +168,7 @@ __rstan__. It only requires setting the argument `moment_match` to `TRUE` in the
`loo()` function. Optionally, you can also set the argument `k_threshold` which
determines the Pareto $k$ threshold, above which moment matching is used. By
default, it operates on all observations whose Pareto $k$ value is larger than
0.7.
the sample-size-specific threshold $\min(1 - 1 / \log_{10}(S), 0.7)$, where $S$ is the posterior sample size (this equals $0.7$ for $S > 2200$).
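
The arithmetic behind that parenthetical is quick to verify before running the rstan example below:

```r
S <- 4000                   # posterior sample size
min(1 - 1 / log10(S), 0.7)  # 1 - 1/log10(4000) is about 0.72, so the min is 0.7
```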

```{r loo_moment_match}
# available in rstan >= 2.21
@@ -319,4 +319,4 @@ Implicitly adaptive importance sampling. _Statistics and Computing_, 31, 16.

Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. _Statistics and Computing_. 27(5), 1413--1432. \doi:10.1007/s11222-016-9696-4. Links: [published](https://link.springer.com/article/10.1007/s11222-016-9696-4) | [arXiv preprint](https://arxiv.org/abs/1507.04544).

Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2019). Pareto smoothed importance sampling. [arXiv preprint arXiv:1507.02646](https://arxiv.org/abs/1507.02646).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2022). Pareto smoothed importance sampling. [arXiv preprint arXiv:1507.02646](https://arxiv.org/abs/1507.02646).
