03.Rmd


```{r, echo = FALSE, cache = FALSE}
knitr::opts_chunk$set(fig.retina = 2.5)
options(width = 100)
```

# Sampling the Imaginary

If you would like to know the probability someone is a vampire given they test positive to the blood-based vampire test, you compute

$$\text{Pr(vampire|positive)} = \frac{\text{Pr(positive|vampire) Pr(vampire)}}{\text{Pr(positive)}}.$$

We'll do so within a tibble.

```{r, message = F, warning = F}
library(tidyverse)

tibble(pr_positive_vampire = .95,
       pr_positive_mortal  = .01,
       pr_vampire          = .001) %>% 
  mutate(pr_positive = pr_positive_vampire * pr_vampire + pr_positive_mortal * (1 - pr_vampire)) %>% 
  mutate(pr_vampire_positive = pr_positive_vampire * pr_vampire / pr_positive) %>% 
  glimpse()
```

Here's the other way of tackling the vampire problem, this time using the frequency format.

```{r}
tibble(pr_vampire          = 100 / 100000,
       pr_positive_vampire = 95 / 100,
       pr_positive_mortal  = 99 / 99900) %>% 
  mutate(pr_positive = 95 + 999) %>% 
  mutate(pr_vampire_positive = pr_positive_vampire * 100 / pr_positive) %>% 
  glimpse()
```

> The posterior distribution is a probability distribution. And like all probability distributions, we can imagine drawing *samples* form it. The sampled events in this case are parameter values. Most parameters have no exact empirical realization. The Bayesian formalism treats parameter distributions as relative plausibility, not as any physical random process. In any event, randomness is always a property of information, never of the real world. But inside the computer, parameters are just as empirical as the outcome of a coin flip or a die toss or an agricultural experiment. The posterior defines the expected frequency that different parameter values will appear, once we start plucking parameters out of it. [@mcelreathStatisticalRethinkingBayesian2015, p. 50, *emphasis* in the original]

## Sampling from a grid-like approximate posterior

Once again, here we use grid approximation to generate samples.

```{r}
# how many grid points would you like?
n <- 1001
n_success <- 6
n_trials  <- 9

(
  d <-
  tibble(p_grid = seq(from = 0, to = 1, length.out = n),
         # note we're still using a flat uniform prior
         prior  = 1) %>% 
  mutate(likelihood = dbinom(n_success, size = n_trials, prob = p_grid)) %>% 
  mutate(posterior = (likelihood * prior) / sum(likelihood * prior))
)
```

Now we'll use the `dplyr::sample_n()` function to sample rows from `d`, saving them as `sample`.

```{r}
# how many samples would you like?
n_samples <- 1e4

# make it reproducible
set.seed(3)

samples <-
  d %>% 
  sample_n(size = n_samples, weight = posterior, replace = T)

glimpse(samples)
```

We'll plot the zigzagging left panel of Figure 3.1 with `geom_line()`. But before we do, we'll need to add a variable numbering the samples.

```{r, fig.width = 3.5, fig.height = 3}
samples %>% 
  mutate(sample_number = 1:n()) %>% 
  
  ggplot(aes(x = sample_number, y = p_grid)) +
  geom_line(linewidth = 1/10) +
  scale_y_continuous("proportion of water (p)", limits = c(0, 1)) +
  xlab("sample number")
```

We'll make the density in the right panel with `geom_density()`.

```{r, fig.width = 3.5, fig.height = 3}
samples %>% 
  ggplot(aes(x = p_grid)) +
  geom_density(fill = "black") +
  scale_x_continuous("proportion of water (p)", limits = c(0, 1))
```

That was based on `1e4` samples. On page 53, McElreath said the density would converge on the idealized shape if we keep increasing the number of samples. Here's what it looks like with `1e6`.

```{r, fig.width = 3.5, fig.height = 3}
set.seed(3)

d %>% 
  sample_n(size = 1e6, weight = posterior, replace = T) %>% 
  ggplot(aes(x = p_grid)) +
  geom_density(fill = "black") +
  scale_x_continuous("proportion of water (p)", limits = 0:1)
```

Yep, that's more ideal.

## Sampling to summarize

"Once your model produces a posterior distribution, the model's work is done. But your work has just begun. It is necessary to summarize and interpret the posterior distribution. Exactly how it is summarized depends upon your purpose" (p. 53).

### Intervals of defined boundaries.

To get the proportion of water less than some value of `p_grid` within the tidyverse, you might first `filter()` by that value and then take the `sum()` within `summarise()`.

```{r}
d %>% 
  filter(p_grid < .5) %>% 
  summarise(sum = sum(posterior))
```

To learn more about `dplyr::summarise()` and related functions, check out Baert's [*Data wrangling part 4: Summarizing and slicing your data*](https://suzan.rbind.io/2018/04/dplyr-tutorial-4/) and [Section 5.6](https://r4ds.had.co.nz/transform.html#grouped-summaries-with-summarise) of *R4DS* [@grolemundDataScience2017].

If what you want is a frequency based on filtering by `samples`, then you might use `n()` within `summarise()`.

```{r}
samples %>%
  filter(p_grid < .5) %>% 
  summarise(sum = n() / n_samples)
```

A more explicit approach for the same computation is to follow up `count()` with `mutate()`.

```{r}
samples %>% 
  count(p_grid < .5) %>% 
  mutate(probability = n / sum(n))
```

An even trickier approach for the same is to insert the logical statement `p_grid < .5` within the `mean()` function.

```{r}
samples %>%
  summarise(sum = mean(p_grid < .5))
```

Much like McElreath discussed in the **Overthinking: Counting with** `sum` box, this works "because R internally converts a logical expression, like `samples < 0/5`, to a vector of `TRUE` and `FALSE` results, one for each element of `samples`, saying whether or not each element matches the criterion" (p. 54). When we inserted that vector of `TRUE` and `FALSE` values within the `mean()` function, they were then internally converted to a vector of 1s and 0s, the mean of which was the probability. Tricky!

To determine the posterior probability between .5 and .75, you can use `&` within `filter()`.

```{r}
samples %>% 
  filter(p_grid > .5 & p_grid < .75) %>% 
  summarise(sum = n() / n_samples)
```

Just multiply that value by 100 to get a percent.

```{r}
samples %>% 
  filter(p_grid > .5 & p_grid < .75) %>% 
  summarise(sum     =       n() / n_samples,
            percent = 100 * n() / n_samples)
```

And, of course, you can do that with our `mean()` trick, too.

```{r}
samples %>%
  summarise(percent = 100 * mean(p_grid > .5 & p_grid < .75))
```

### Intervals of defined mass.

We'll create the upper two panels for Figure 3.2 with `geom_line()`, `geom_ribbon()`, some careful filtering, and a little patchwork syntax.

```{r, fig.width = 6, fig.height = 2.5}
# upper left panel
p1 <-
  d %>% 
  ggplot(aes(x = p_grid)) +
  geom_line(aes(y = posterior)) +
  geom_ribbon(data = d %>% filter(p_grid < .5),
              aes(ymin = 0, ymax = posterior)) +
  labs(x = "proportion of water (p)",
       y = "density")

# upper right panel
p2 <-
  d %>% 
  ggplot(aes(x = p_grid)) +
  geom_line(aes(y = posterior)) +
  # note this next line is the only difference in code from the last plot
  geom_ribbon(data = d %>% filter(p_grid < .75 & p_grid > .5),
              aes(ymin = 0, ymax = posterior)) +
  labs(x = "proportion of water (p)",
       y = "density")

library(patchwork)

p1 + p2
```

We'll come back for the lower two panels in a bit.

Since we saved our `p_grid` samples within the well-named `samples` tibble, we'll have to index with `$` within `quantile`.

```{r}
(q_80 <- quantile(samples$p_grid, prob = .8))
```

That value will come in handy for the lower left panel of Figure 3.2. But anyways, we could `select()` the `samples` vector, extract it from the tibble with `pull()`, and then pump it into `quantile()`.

```{r}
samples %>% 
  pull(p_grid) %>% 
  quantile(prob = .8)
```

And we might also use `quantile()` within `summarise()`.

```{r}
samples %>% 
  summarise(`80th percentile` = quantile(p_grid, p = .8))
```

Here's the `summarise()` approach with two probabilities.

```{r}
samples %>% 
  summarise(`10th percentile` = quantile(p_grid, p = .1),
            `90th percentile` = quantile(p_grid, p = .9))
```

The tidyverse approach is nice in that that family of functions typically returns a data frame. But sometimes you just want your values in a numeric vector for the sake of quick indexing. In that case, base R `quantile()` shines.

```{r}
(q_10_and_90 <- quantile(samples$p_grid, prob = c(.1, .9)))
```

Now we have our cutoff values saved as `q_80` and `q_10_and_90`, we're ready to make the bottom panels of Figure 3.2. For these plots, we'll replace the `geom_ribbon()` approach from above with `geom_area()`.

```{r, fig.width = 6, fig.height = 2.5}
# lower left panel
p1 <-
  d %>% 
  ggplot(aes(x = p_grid, y = posterior)) +
  geom_line() +
  geom_area(data = d %>% filter(p_grid < q_80)) +
  annotate(geom = "text",
           x = .25, y = .0025,
           label = "lower 80%") +
  labs(x = "proportion of water (p)",
       y = "density")

# lower right panel
p2 <-
  d %>% 
  ggplot(aes(x = p_grid, y = posterior)) +
  geom_line() +
  geom_area(data = d %>% filter(p_grid > q_10_and_90[1] & p_grid < q_10_and_90[2])) +
  annotate(geom = "text",
           x = .25, y = .0025,
           label = "middle 80%") +
  labs(x = "proportion of water (p)",
       y = "density")

p1 + p2
```

We've already defined `p_grid` and `prior` within `d`, above. Here we'll reuse them and update the rest of the columns.

```{r}
# here we update the `dbinom()` parameters
n_success <- 3
n_trials  <- 3

# update `d`
d <-
  d %>% 
  mutate(likelihood = dbinom(n_success, size = n_trials, prob = p_grid)) %>% 
  mutate(posterior = (likelihood * prior) / sum(likelihood * prior))

# make the next part reproducible
set.seed(3)

# here's our new samples tibble
(
  samples <-
    d %>% 
    sample_n(size = n_samples, weight = posterior, replace = T)
)
```

The `rethinking::PI()` function works like a nice shorthand for `quantile()`.

```{r}
quantile(samples$p_grid, prob = c(.25, .75))

rethinking::PI(samples$p_grid, prob = .5)
```

Now's a good time to introduce Matthew Kay's [-@R-tidybayes] [**tidybayes** package](https://mjskay.github.io/tidybayes/), which offers an array of convenience functions for summarizing Bayesian models of the type we'll be working with in this ebook. For all the brms-related deets, see Kay's [-@kayExtractingVisualizingTidy2021] vignette, [*Extracting and visualizing tidy draws from brms models*](https://mjskay.github.io/tidybayes/articles/tidy-brms.html). Here we start simple.

```{r, message = F, warning = F}
library(tidybayes)

median_qi(samples$p_grid, .width = .5)
```

The tidybayes package offers a [family of functions](https://mjskay.github.io/tidybayes/articles/tidybayes.html#point-summaries-and-intervals-with-the-point_interval-functions-medianmeanmode_qihdi) that make it easy to summarize a distribution with a measure of central tendency accompanied by intervals. With `median_qi()`, we asked for the median and quantile-based intervals--just like we've been doing with `quantile()`. Note how the `.width` argument within `median_qi()` worked the same way the `prob` argument did within `rethinking::PI()`. With `.width = .5`, we indicated we wanted a quantile-based 50% interval, which was returned in the `ymin` and `ymax` columns. The tidybayes framework makes it easy to request multiple types of intervals. E.g., here we'll request 50%, 80%, and 99% intervals.

```{r}
median_qi(samples$p_grid, .width = c(.5, .8, .99))
```

The `.width` column in the output indexed which line presented which interval. The value in the `y` column remained constant across rows. That's because that column listed the measure of central tendency, the median in this case.

Now let's use the `rethinking::HPDI()` function to return 50% highest posterior density intervals (HPDIs).

```{r}
rethinking::HPDI(samples$p_grid, prob = .5)
```

The reason I introduce tidybayes now is that the functions of the brms package only support percentile-based intervals of the type we computed with `quantile()` and `median_qi()`. But tidybayes also supports HPDIs.

```{r}
mode_hdi(samples$p_grid, .width = .5)
```

This time we used the mode as the measure of central tendency. With this family of tidybayes functions, you specify the measure of central tendency in the prefix (i.e., `mean`, `median`, or `mode`) and then the type of interval you'd like (i.e., `qi` or `hdi`).

If all you want are the intervals without the measure of central tendency or all that other technical information, tidybayes also offers the handy `qi()` and `hdi()` functions.

```{r}
qi(samples$p_grid, .width = .5)
hdi(samples$p_grid, .width = .5)
```

These are nice in that they return simple numeric vectors, making them particularly useful to use as references within ggplot2. Now we have that skill, we can use it to make Figure 3.3.

```{r, fig.width = 5.5, fig.height = 2.5}
# lower left panel
p1 <-
  d %>% 
  ggplot(aes(x = p_grid)) +
  # check out our sweet `qi()` indexing
  geom_ribbon(data = d %>% filter(p_grid > qi(samples$p_grid, .width = .5)[1] & 
                                    p_grid < qi(samples$p_grid, .width = .5)[2]),
              aes(ymin = 0, ymax = posterior),
              fill = "grey75") +
  geom_line(aes(y = posterior)) +
  labs(subtitle = "50% Percentile Interval",
       x = "proportion of water (p)",
       y = "density")

# lower right panel
p2 <-
  d %>% 
  ggplot(aes(x = p_grid, y = posterior)) +
  geom_area(data = . %>% filter(p_grid > hdi(samples$p_grid, .width = .5)[1] & 
                                    p_grid < hdi(samples$p_grid, .width = .5)[2]),
            fill = "grey75") +
  geom_line() +
  labs(subtitle = "50% HPDI",
       x = "proportion of water (p)",
       y = "density")

# combine!
p1 | p2
```

In the `geom_area()` line for the HPDI plot, did you notice how we replaced `d` with `.`? When using the pipe (i.e., `%>%`), you can use the `.` as a placeholder for the original data object. It's an odd and handy trick to have. Go [here](https://magrittr.tidyverse.org/reference/pipe.html) to learn more.

> The HPDI has some advantages over the PI. But in most cases, these two types of interval are very similar. They only look so different in this case because the posterior distribution is highly skewed. If we instead used samples from the posterior distribution for six waters in nine tosses, these intervals would be nearly identical. Try it for yourself, using different probability masses, such as `prob=0.8` and `prob=0.95`. When the posterior is bell shaped, it hardly matters when type of interval you use. (p. 57)

Let's try it out. First we'll update the simulation for six waters in nine tosses.

```{r}
n_success <- 6
n_trials  <- 9

new_d <-
  d %>% 
  mutate(likelihood = dbinom(n_success, size = n_trials, prob = p_grid)) %>% 
  mutate(posterior = (likelihood * prior) / sum(posterior))

set.seed(3)

new_samples <-
  new_d %>% 
  sample_n(size = n_samples, weight = posterior, replace = T)
```

Here are the intervals by `.width` and type of `.interval`.

```{r}
bind_rows(mean_hdi(new_samples$p_grid, .width = c(.8, .95)),
          mean_qi(new_samples$p_grid,  .width = c(.8, .95))) %>% 
  select(.width, .interval, ymin:ymax) %>% 
  arrange(.width) %>% 
  mutate_if(is.double, round, digits = 2)
```

We didn't need that last `mutate_if()` line. It just made it easier to compare the `ymin` and `ymax` values. Anyway, McElreath was right. This time the differences between the HPDIs and QIs were trivial. Here's a look at the posterior.

```{r, fig.width = 3, fig.height = 2.55}
new_d %>% 
  ggplot(aes(x = p_grid)) +
  geom_line(aes(y = posterior)) +
  labs(subtitle = "Six waters in nine tosses made\nfor a more symmetrical posterior",
       x = "proportion of water (p)",
       y = "density")
```

### Point estimates.

We've been calling point estimates measures of central tendency. If we `arrange()` our `d` tibble in descending order by `posterior`, we'll see the corresponding `p_grid` value for its MAP estimate.

```{r}
d %>% 
  arrange(desc(posterior))
```

To emphasize it, we can use `slice()` to select the top row.
 
```{r}
d %>% 
  arrange(desc(posterior)) %>% 
  slice(1)
```

Or we could use the handy `dplyr::top_n()` function.

```{r, message = F}
d %>% 
  select(posterior) %>% 
  top_n(n = 1)
```

We can get the mode with `mode_hdi()` or `mode_qi()`.

```{r, warning = F}
samples %>% mode_hdi(p_grid)
samples %>% mode_qi(p_grid)
```

Those returned a lot of output in addition to the mode. If all you want is the mode itself, you can just use `tidybayes::Mode()`.

```{r}
Mode(samples$p_grid)
```

Medians and means are typical measures of central tendency, too.

```{r}
samples %>% 
  summarise(mean   = mean(p_grid),
            median = median(p_grid))
```

We can inspect the three types of point estimate in the left panel of Figure 3.4. First we'll bundle the three point estimates together in a tibble.

```{r, warning = F}
(
  point_estimates <-
  bind_rows(samples %>% mean_qi(p_grid),
            samples %>% median_qi(p_grid),
            samples %>% mode_qi(p_grid)) %>% 
  select(p_grid, .point) %>% 
  # these last two columns will help us annotate  
  mutate(x = p_grid + c(-.03, .03, -.03),
         y = c(.001, .00175, .0025))
)
```

The plot:

```{r, fig.width = 3.5, fig.height = 3}
d %>% 
  ggplot(aes(x = p_grid)) +
  geom_area(aes(y = posterior),
            fill = "grey75") +
  geom_vline(xintercept = point_estimates$p_grid) +
  geom_text(data = point_estimates,
            aes(x = x, y = y, label = .point),
            angle = 90) +
  labs(x = "proportion of water (p)",
       y = "density") +
  theme(panel.grid = element_blank())
```

As it turns out "*different loss functions imply different point estimates*" (p. 59, *emphasis* in the original).

Let $p$ be the proportion of the Earth covered by water and $d$ be our guess. If McElreath pays us \$100 if we guess exactly right but subtracts money from the prize proportional to how far off we are, then our loss is proportional to $p - d$. If we decide $d = .5$, then our expected loss will be:

```{r}
d %>% 
  mutate(loss = posterior * abs(0.5 - p_grid)) %>% 
  summarise(`expected loss` = sum(loss))
```

What McElreath did with `sapply()`, we'll do with `purrr::map()`. If you haven't used it, `map()` is part of a family of similarly-named functions (e.g., `map2()`) from the [purrr package](https://purrr.tidyverse.org). The `map()` family is the tidyverse alternative to the family of `apply()` functions from the base-R framework. You can learn more about how to use the `map()` family [here](https://purrr.tidyverse.org/reference/map.html) or [here](https://jennybc.github.io/purrr-tutorial/ls01_map-name-position-shortcuts.html) or [here](https://data.library.virginia.edu/getting-started-with-the-purrr-package-in-r/).

```{r}
make_loss <- function(our_d) {
  d %>% 
    mutate(loss = posterior * abs(our_d - p_grid)) %>% 
    summarise(weighted_average_loss = sum(loss))
}

(
  l <-
  d %>% 
  select(p_grid) %>% 
  rename(decision = p_grid) %>% 
  mutate(weighted_average_loss = purrr::map(decision, make_loss)) %>% 
  unnest(weighted_average_loss) 
)
```

Now we're ready for the right panel of Figure 3.4.

```{r, fig.width = 3.5, fig.height = 3}
# this will help us find the x and y coordinates for the minimum value
min_loss <-
  l %>% 
  filter(weighted_average_loss == min(weighted_average_loss)) %>% 
  as.numeric()

# the plot
l %>%   
  ggplot(aes(x = decision, y = weighted_average_loss)) +
  geom_area(fill = "grey75") +
  geom_vline(xintercept = min_loss[1], color = "white", linetype = 3) +
  geom_hline(yintercept = min_loss[2], color = "white", linetype = 3) +
  ylab("expected proportional loss") +
  theme(panel.grid = element_blank())
```

We saved the exact minimum value as `min_loss[1]`, which is `r min_loss[1]`. Within sampling error, this is the posterior median as depicted by our `samples`.

```{r}
samples %>% 
  summarise(posterior_median = median(p_grid))
```

The quadratic loss $(d - p)^2$ suggests we should use the mean instead. Let's investigate.

```{r, fig.width = 3.5, fig.height = 3}
# amend our loss function
make_loss <- function(our_d) {
  d %>% 
    mutate(loss = posterior * (our_d - p_grid)^2) %>% 
    summarise(weighted_average_loss = sum(loss))
}

# remake our `l` data
l <-
  d %>% 
  select(p_grid) %>% 
  rename(decision = p_grid) %>% 
  mutate(weighted_average_loss = purrr::map(decision, make_loss)) %>% 
  unnest(weighted_average_loss)

# update to the new minimum loss coordinates
min_loss <-
  l %>% 
  filter(weighted_average_loss == min(weighted_average_loss)) %>% 
  as.numeric()

# update the plot
l %>%   
  ggplot(aes(x = decision, y = weighted_average_loss)) +
  geom_area(fill = "grey75") +
  geom_vline(xintercept = min_loss[1], color = "white", linetype = 3) +
  geom_hline(yintercept = min_loss[2], color = "white", linetype = 3) +
  ylab("expected proportional loss") +
  theme(panel.grid = element_blank())
```

Based on quadratic loss $(d - p)^2$, the exact minimum value is `r min_loss[1]`. Within sampling error, this is the posterior mean of our `samples`.

```{r}
samples %>% 
  summarise(posterior_meaan = mean(p_grid))
```

> Usually, research scientists don’t think about loss functions. And so any point estimate like the mean or MAP that they may report isn't intended to support any particular decision, but rather to describe the shape of the posterior. You might argue that the decision to make is whether or not to accept an hypothesis. But the challenge then is to say what the relevant costs and benefits would be, in terms of the knowledge gained or lost. Usually it's better to communicate as much as you can about the posterior distribution, as well as the data and the model itself, so that others can build upon your work. Premature decisions to accept or reject hypotheses can cost lives. (p. 61)

In the endnote (58) linked to the end of that quote in the text, McElreath wrote: "See [Hauer](https://statmodeling.stat.columbia.edu/wp-content/uploads/2013/03/1154-Hauer-The-harm-done-by-tests-of-significance.pdf) [-@hauerHarmDoneTests2004] for three tales from transportation safety in which testing resulted in premature incorrect decisions and a demonstrable and continuing loss of human life" (p. 448).

## Sampling to simulate prediction

McElreath's four good reasons for posterior simulation were

1. model checking,
2. software validation,
3. research design, and
4. forecasting.

### Dummy data.

Dummy data for the globe tossing model arise from the binomial likelihood. If you let $w$ be a count of water and $n$ be the number of tosses, the binomial likelihood is

$$\Pr(w|n, p) = \frac{n!}{w!(n - w)!} p^w (1 - p)^{n - w}$$

Letting $n = 2$, $p(w) = .7$, and $w_\text{observed} = 0 \text{ through }2$, the densities are as follows.

```{r}
tibble(n           = 2,
       probability = .7,
       w           = 0:2) %>% 
  mutate(density = dbinom(w, size = n, prob = probability))
```

If we're going to simulate, we should probably [set our seed](https://stackoverflow.com/questions/13605271/reasons-for-using-the-set-seed-function). Doing so makes the results reproducible.

```{r}
set.seed(3)
rbinom(1, size = 2, prob = .7)
```

Here are ten reproducible draws.

```{r}
set.seed(3)
rbinom(10, size = 2, prob = .7)
```

Now generate 100,000 (i.e., `1e5`) reproducible dummy observations.

```{r}
# how many would you like?
n_draws <- 1e5

set.seed(3)
d <- tibble(draws = rbinom(n_draws, size = 2, prob = .7))

d %>% 
  count(draws) %>% 
  mutate(proportion = n / nrow(d))
```

As McElreath mused in the text (p. 63), those simulated `proportion` values are very close to the analytically calculated values in our `density` column a few code blocks up.

Here's the simulation updated so $n = 9$, which we plot in our version of Figure 3.5.

```{r, fig.width = 3.5, fig.height = 3}
set.seed(3)
d <- tibble(draws = rbinom(n_draws, size = 9, prob = .7))

# the histogram
d %>% 
  ggplot(aes(x = draws)) +
  geom_histogram(binwidth = 1, center = 0,
                 color = "grey92", linewidth = 1/10) +
  scale_x_continuous("dummy water count", breaks = 0:4 * 2) +
  ylab("frequency") +
  coord_cartesian(xlim = c(0, 9)) +
  theme(panel.grid = element_blank())
```

McElreath suggested we play around with different values of `size` and `prob`. With the next block of code, we'll simulate nine conditions.

```{r}
n_draws <- 1e5

simulate_binom <- function(n, probability) {
  set.seed(3)
  rbinom(n_draws, size = n, prob = probability) 
}

d <-
  crossing(n           = c(3, 6, 9),
           probability = c(.3, .6, .9)) %>% 
  mutate(draws = map2(n, probability, simulate_binom)) %>% 
  ungroup() %>% 
  mutate(n           = str_c("n = ", n),
         probability = str_c("p = ", probability)) %>% 
  unnest(draws)

head(d)
```

Let's plot the simulation results.

```{r, fig.width = 6, fig.height = 5}
d %>% 
  ggplot(aes(x = draws)) +
  geom_histogram(binwidth = 1, center = 0,
                 color = "grey92", linewidth = 1/10) +
  scale_x_continuous("dummy water count",
                     breaks = seq(from = 0, to = 9, by = 2)) +
  ylab("frequency") +
  coord_cartesian(xlim = c(0, 9)) +
  theme(panel.grid = element_blank()) +
  facet_grid(n ~ probability)
```

### Model checking.

If you're new to applied statistics, you might be surprised how often mistakes arise.

#### Did the software work?

Let this haunt your dreams: "There is no way to really be sure that software works correctly" (p. 64).

If you'd like to dive deeper into these dark waters, check out one my favorite talks from StanCon 2018, [*Esther Williams in the Harold Holt memorial swimming pool*](https://youtu.be/pKZLJPrZLhU?t=26285), by the ineffable [Dan Simpson](https://twitter.com/dan_p_simpson). If Simpson doesn't end up drowning you, see Gabry and Simpson's talk at the Royal Statistical Society 2018, [*Visualization in Bayesian workflow*](https://youtu.be/E8vdXoJId8M), a follow-up blog [*Maybe it's time to let the old ways die; or We broke R-hat so now we have to fix it*](https://statmodeling.stat.columbia.edu/2019/03/19/maybe-its-time-to-let-the-old-ways-die-or-we-broke-r-hat-so-now-we-have-to-fix-it/), and that blog's associated paper by @vehtariRanknormalizationFoldingLocalization2021, [*Rank-normalization, folding, and localization: An improved $\widehat R$ for assessing convergence of MCMC*](https://stat.columbia.edu/~gelman/research/published/rhat.pdf).

#### Is the model adequate?

> The implied predictions of the model are uncertain in two ways, and it's important to be aware of both.
>
> First, there is observation uncertainty. For any unique value of the parameter $p$, there is a unique implied pattern of observations that the model expects. These patterns of observations are the same gardens of forking data that you explored in the previous chapter. These patterns are also what you sampled in the previous section. There is uncertainty in the predicted observations, because even if you know $p$ with certainty, you won’t know the next globe toss with certainty (unless $p = 0$ or $p = 1$).
>
> Second, there is uncertainty about $p$. The posterior distribution over $p$ embodies this uncertainty. And since there is uncertainty about $p$, there is uncertainty about everything that depends upon $p$. The uncertainty in $p$ will interact with the sampling variation, when we try to assess what the model tells us about outcomes.
>
> We'd like to *propagate* the parameter uncertainty--carry it forward--as we evaluate the implied predictions. All that is required is averaging over the posterior density for $p$, while computing the predictions. For each possible value of the parameter $p$, there is an implied distribution of outcomes. So if you were to compute the sampling distribution of outcomes at each value of $p$, then you could average all of these prediction distributions together, using the posterior probabilities of each value of $p$, to get a posterior predictive distribution. (p. 56, *emphasis* in the original)

All this is depicted in Figure 3.6. To get ready to make our version, let's first refresh our original grid approximation `d`.

```{r}
# how many grid points would you like?
n <- 1001
n_success <- 6
n_trials  <- 9

(
  d <-
  tibble(p_grid = seq(from = 0, to = 1, length.out = n),
         # note we're still using a flat uniform prior
         prior  = 1) %>% 
  mutate(likelihood = dbinom(n_success, size = n_trials, prob = p_grid)) %>% 
  mutate(posterior = (likelihood * prior) / sum(likelihood * prior))
)
```

We can make our version of the top of Figure 3.6 with a little tricky `filter`ing.

```{r, fig.width = 7, fig.height = 2}
d %>% 
  ggplot(aes(x = p_grid)) +
  geom_area(aes(y = posterior),
            color = "grey67", fill = "grey67") +
  geom_segment(data = . %>% 
                 filter(p_grid %in% c(seq(from = .1, to = .9, by = .1), 3 / 10)),
               aes(xend = p_grid,
                   y = 0, yend = posterior, linewidth = posterior),
               color = "grey33", show.legend = F) +
  geom_point(data = . %>%
               filter(p_grid %in% c(seq(from = .1, to = .9, by = .1), 3 / 10)),
             aes(y = posterior)) +
  annotate(geom = "text", x = .08, y = .0025,
           label = "Posterior probability") +
  scale_linewidth_continuous(range = c(0, 1)) +
  scale_x_continuous("probability of water", breaks = c(0:10) / 10) +
  scale_y_continuous(NULL, breaks = NULL) +
  theme(panel.grid = element_blank())
```

Note how we weighted the widths of the vertical lines by the `posterior` density.

We'll need to do a bit of wrangling before we're ready to make the plot in the middle panel of Figure 3.6.

```{r}
n_draws <- 1e5

simulate_binom <- function(probability) {
  set.seed(3)
  rbinom(n_draws, size = 9, prob = probability) 
}

d_small <-
  tibble(probability = seq(from = .1, to = .9, by = .1)) %>% 
  mutate(draws = purrr::map(probability, simulate_binom)) %>% 
  unnest(draws) %>% 
  mutate(label = str_c("p = ", probability))

head(d_small)
```

Now we're ready to plot.

```{r, fig.width = 8, fig.height = 1.75, message = F}
d_small %>%
  ggplot(aes(x = draws)) +
  geom_histogram(binwidth = 1, center = 0,
                 color = "grey92", linewidth = 1/10) +
  scale_x_continuous(NULL, breaks = seq(from = 0, to = 9, by = 3)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Sampling distributions") +
  coord_cartesian(xlim = c(0, 9)) +
  theme(panel.grid = element_blank()) +
  facet_wrap(~ label, ncol = 9) 
```

To make the plot at the bottom of Figure 3.6, we'll redefine our `samples`, this time including the `w` variable (see the R code 3.26 block in the text).

```{r}
# how many samples would you like?
n_samples <- 1e4

# make it reproducible
set.seed(3)

samples <-
  d %>% 
  sample_n(size = n_samples, weight = posterior, replace = T) %>% 
  mutate(w = purrr::map_dbl(p_grid, rbinom, n = 1, size = 9))

glimpse(samples)
```

Here's our histogram.

```{r, fig.width = 3, fig.height = 2}
samples %>% 
  ggplot(aes(x = w)) +
  geom_histogram(binwidth = 1, center = 0,
                 color = "grey92", linewidth = 1/10) +
  scale_x_continuous("number of water samples",
                     breaks = seq(from = 0, to = 9, by = 3)) +
  scale_y_continuous(NULL, breaks = NULL) +
  ggtitle("Posterior predictive distribution") +
  coord_cartesian(xlim = c(0, 9),
                  ylim = c(0, 3000)) +
  theme(panel.grid = element_blank())
```

In Figure 3.7, McElreath considered the longest sequence of the sample values. We've been using `rbinom()` with the size parameter set to 9 for our simulations. E.g.,

```{r}
rbinom(10, size = 9, prob = .6)
```

Notice this collapsed (i.e., aggregated) over the sequences within the individual sets of 9. What we need is to simulate nine individual trials many times over. For example, this

```{r}
rbinom(9, size = 1, prob = .6)
```

would be the disaggregated version of just one of the numerals returned by `rbinom()` when `size = 9`. So let's try simulating again with un-aggregated samples. We'll keep adding to our `samples` tibble. In addition to the disaggregated `draws` based on the $p$ values listed in `p_grid`, we'll also want to add a row index for each of those `p_grid` values--it'll come in handy when we plot.

```{r}
# make it reproducible
set.seed(3)

samples <-
  samples %>% 
  mutate(iter  = 1:n(),
         draws = purrr::map(p_grid, rbinom, n = 9, size = 1)) %>% 
  unnest(draws)

glimpse(samples)
```

The main action is in the `draws` column.

Now we have to count the longest sequences. The base R [`rle()` function](https://www.rdocumentation.org/packages/base/versions/3.3/topics/rle) will help with that. Consider McElreath's sequence of tosses.

```{r}
tosses <- c("w", "l", "w", "w", "w", "l", "w", "l", "w")
```

You can plug that into `rle()`.

```{r}
rle(tosses)
```

For our purposes, we're interested in `lengths`. That tells us the length of each sequences of the same value. The `3` corresponds to our run of three `w`s. The `max()` function will help us confirm it's the largest value.

```{r}
rle(tosses)$lengths %>% max()
```

Now let's apply our method to the data and plot.

```{r, fig.width = 3.5, fig.height = 3, message = F}
samples %>% 
  group_by(iter) %>% 
  summarise(longest_run_length = rle(draws)$lengths %>% max()) %>% 
  
  ggplot(aes(x = longest_run_length)) +
  geom_histogram(aes(fill = longest_run_length == 3),
                 binwidth = 1, center = 0,
                 color = "grey92", linewidth = 1/10) +
  scale_fill_viridis_d(option = "D", end = .9) +
  scale_x_continuous("longest run length", breaks = 0:3 * 3) +
  ylab("frequency") +
  coord_cartesian(xlim = c(0, 9)) +
  theme(legend.position = "none",
        panel.grid = element_blank())
```

Let's look at `rle()` again.

```{r}
rle(tosses)
```

We can use the length of the output (i.e., 7 in this example) as the numbers of switches from, in this case, "w" and "l".

```{r}
rle(tosses)$lengths %>% length()
```

With that new trick, we're ready to make the right panel of Figure 3.7.

```{r, fig.width = 3.5, fig.height = 3, message = F}
samples %>% 
  group_by(iter) %>% 
  summarise(longest_run_length = rle(draws)$lengths %>% length()) %>% 
  
  ggplot(aes(x = longest_run_length)) +
  geom_histogram(aes(fill = longest_run_length == 6),
                 binwidth = 1, center = 0,
                 color = "grey92", linewidth = 1/10) +
  scale_fill_viridis_d(option = "D", end = .9) +
  scale_x_continuous("number of switches", breaks = 0:3 * 3) +
  ylab("frequency") +
  coord_cartesian(xlim = c(0, 9)) +
  theme(legend.position = "none",
        panel.grid = element_blank())
```

## ~~Summary~~ Let's practice with brms

Open brms.

```{r, warning = F, message = F}
library(brms)
```

With brms, we'll fit the primary model of $w = 6$ and $n = 9$ much like we did in [Section 2.4.3][Markov chain Monte Carlo.].

```{r b3.1}
b3.1 <-
  brm(data = list(w = 6), 
      family = binomial(link = "identity"),
      w | trials(9) ~ 1,
      # this is a flat prior
      prior(beta(1, 1), class = Intercept, lb = 0, ub = 1),
      seed = 3,
      file = "fits/b03.01")
```

We'll learn more about the beta distribution in [Chapter 11][Monsters and Mixtures]. For now, here's the posterior summary for `b_Intercept`, the probability of a "w".

```{r}
posterior_summary(b3.1)["b_Intercept", ] %>% 
  round(digits = 2)
```

As we'll fully cover in the next chapter, `Estimate` is the posterior mean, the two `Q` columns are the quantile-based 95% intervals, and `Est.Error` is the posterior standard deviation.

Much like the way we used the `samples()` function to simulate probability values, above, we can do so with the `brms::fitted()` function. But we will have to specify `scale = "linear"` in order to return results in the probability metric. By default, `brms::fitted()` will return summary information. Since we want actual simulation draws, we'll specify `summary = F`.

```{r, warning = F}
f <-
  fitted(b3.1, 
         summary = F,
         scale = "linear") %>% 
  as_tibble() %>% 
  set_names("p")

glimpse(f)
```

By default, we have a generically-named vector `V1` of 4,000 samples. We'll explain the defaults in later chapters. For now, notice we can view these in a density.

```{r, fig.width = 7, fig.height = 2, message = F}
f %>% 
  ggplot(aes(x = p)) +
  geom_density(fill = "grey50", color = "grey50") +
  annotate(geom = "text", x = .08, y = 2.5,
           label = "Posterior probability") +
  scale_x_continuous("probability of water",
                     breaks = c(0, .5, 1),
                     limits = 0:1) +
  scale_y_continuous(NULL, breaks = NULL) +
  theme(panel.grid = element_blank())
```

Looks a lot like the posterior probability density at the top of Figure 3.6, doesn't it? Much like we did with `samples`, we can use this distribution of probabilities to predict histograms of `w` counts. With those in hand, we can make an analogue to the histogram in the bottom panel of Figure 3.6.

```{r, fig.width = 3, fig.height = 2, warning = F, message = F}
# the simulation
set.seed(3)

f <-
  f %>% 
  mutate(w = rbinom(n(), size = n_trials,  prob = p))

# the plot
f %>% 
  ggplot(aes(x = w)) +
  geom_histogram(binwidth = 1, center = 0,
                 color = "grey92", linewidth = 1/10) +
  scale_x_continuous("number of water samples", breaks = 0:3 * 3) +
  scale_y_continuous(NULL, breaks = NULL, limits = c(0, 1200)) +
  ggtitle("Posterior predictive distribution") +
  coord_cartesian(xlim = c(0, 9)) +
  theme(panel.grid = element_blank())
```

As you might imagine, we can use the output from `fitted()` to return disaggregated batches of 0's and 1's, too. And we could even use those disaggregated 0's and 1's to examine longest run lengths and numbers of switches as in the analyses for Figure 3.7. I'll leave those as exercises for the interested reader.

## Session info {-}

```{r}
sessionInfo()
```

```{r, echo = F}
rm(n, n_success, n_trials, d, n_samples, samples, p1, p2, q_80, q_10_and_90, new_d, new_samples, point_estimates, make_loss, l, min_loss, n_draws, simulate_binom, d_small, tosses, b3.1, f)
```

```{r, echo = F, message = F, warning = F, results = "hide"}
pacman::p_unload(pacman::p_loaded(), character.only = TRUE)
```