```{r, echo = FALSE, cache = FALSE}
knitr::opts_chunk$set(fig.retina = 2.5)
options(width = 110)
```
# Counting and Classification
> All over the world, every day, scientists throw away information. Sometimes this is through the removal of "outliers," cases in the data that offend the model and are exiled. More routinely, counted things are converted to proportions before analysis. Why does analysis of proportions throw away information? Because 10/20 and ½ are the same proportion, one-half, but have very different sample sizes. Once converted to proportions, and treated as outcomes in a linear regression, the information about sample size has been destroyed.
>
> It's easy to retain the information about sample size. All that is needed is to model what has actually been observed, the counts instead of the proportions. [@mcelreathStatisticalRethinkingBayesian2015, p. 291]

In this chapter, we focus on the two most common types of count models: the binomial and the Poisson.
Side note: For a nice Bayesian way to accommodate outliers in your Gaussian models, check out my blog post, [*Robust linear regression with Student's $t$-distribution*](https://solomonkurz.netlify.app/blog/2019-02-02-robust-linear-regression-with-student-s-t-distribution/).
## Binomial regression
The basic binomial model follows the form
$$y \sim \operatorname{Binomial}(n, p),$$
where $y$ is some count variable, $n$ is the number of trials, and $p$ is the probability a given trial was a 1, which is sometimes termed a *success*. When $n = 1$, then $y$ is a vector of 0s and 1s. Presuming the logit link, which we just covered in [Chapter 9][Linking linear models to distributions.], models of this type are commonly termed logistic regression. When $n > 1$, and still presuming the logit link, we might call our model an aggregated logistic regression model, or more generally an aggregated binomial regression model.
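To get a feel for the distinction, here's a quick simulation sketch (the specific numbers are just for illustration): the same 20 trials can enter the likelihood as 20 Bernoulli rows with $n = 1$ each, or as a single aggregated count with $n = 20$. Either way, the sample-size information a raw proportion would discard is retained.
```{r}
set.seed(10)
p <- .6

# logistic-regression style: one row per trial, n = 1 each
rbinom(20, size = 1, prob = p)

# aggregated-binomial style: one count across n = 20 trials
rbinom(1, size = 20, prob = p)
```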
### Logistic regression: Prosocial chimpanzees.
Load the @silkChimpanzeesAreIndifferent2005 `chimpanzees` data.
```{r, message = F, warning = F}
library(rethinking)
data(chimpanzees)
d <- chimpanzees
```
Switch from rethinking to brms.
```{r, message = F, warning = F}
detach(package:rethinking, unload = T)
library(brms)
rm(chimpanzees)
```
We start with the simple intercept-only logistic regression model, which follows the statistical formula
\begin{align*}
\text{pulled_left}_i & \sim \operatorname{Binomial}(1, p_i) \\
\operatorname{logit}(p_i) & = \alpha \\
\alpha & \sim \operatorname{Normal}(0, 10).
\end{align*}
In the `brm()` `formula` syntax, including a `|` bar on the left side of a formula indicates we have extra supplementary information about our criterion. In this case, that information is that each `pulled_left` value corresponds to a single trial (i.e., `trials(1)`), which itself corresponds to the $n = 1$ portion of the statistical formula, above.
```{r b10.1}
b10.1 <-
brm(data = d,
family = binomial,
pulled_left | trials(1) ~ 1,
prior = prior(normal(0, 10), class = Intercept),
seed = 10,
file = "fits/b10.01")
```
You might use `fixef()` to get a focused summary of the intercept.
```{r, message = F, warning = F}
library(tidyverse)
fixef(b10.1) %>%
round(digits = 2)
```
The `brms::inv_logit_scaled()` function will be our alternative to the `logistic()` function in rethinking. Here we use it to convert the 89% interval estimates McElreath reported on page 294.
```{r}
c(.18, .46) %>%
inv_logit_scaled()
```
Here we use it to convert our `fixef()` output (which contains 95% intervals).
```{r}
fixef(b10.1)[, -2] %>%
inv_logit_scaled()
```
With the next two chimp models, we add predictors in the usual way.
```{r b10.2}
b10.2 <-
brm(data = d,
family = binomial,
pulled_left | trials(1) ~ 1 + prosoc_left,
prior = c(prior(normal(0, 10), class = Intercept),
prior(normal(0, 10), class = b)),
seed = 10,
file = "fits/b10.02")
b10.3 <-
update(b10.2,
newdata = d,
formula = pulled_left | trials(1) ~ 1 + prosoc_left + condition:prosoc_left,
seed = 10,
file = "fits/b10.03")
```
Compute the WAIC for each model and save the results within the brmsfit objects.
```{r, warning = F, message = F}
b10.1 <- add_criterion(b10.1, "waic")
b10.2 <- add_criterion(b10.2, "waic")
b10.3 <- add_criterion(b10.3, "waic")
```
Compare them with the `loo_compare()` function, making sure to add the `criterion = "waic"` argument.
```{r}
w <- loo_compare(b10.1, b10.2, b10.3, criterion = "waic")
print(w, simplify = F)
```
Recall our `cbind()` trick to convert the differences from the $\text{elpd}$ metric to the WAIC metric.
```{r}
cbind(waic_diff = w[, 1] * -2,
se = w[, 2] * 2) %>%
round(digits = 2)
```
For this chapter, we'll take our color scheme from the `"Moonrise2"` palette from the [wesanderson package](https://cran.r-project.org/package=wesanderson) [@R-wesanderson].
```{r, message = F, fig.width = 3, fig.height = 1}
library(wesanderson)
wes_palette("Moonrise2")
wes_palette("Moonrise2")[1:4]
```
We'll also take a few formatting cues from Edward Tufte [-@tufteVisualDisplayQuantitative2001], courtesy of the [ggthemes package](https://cran.r-project.org/package=ggthemes). The `theme_tufte()` function will change the default font and remove some chart junk. The `theme_set()` function, below, will make these adjustments the default for all subsequent ggplot2 plots. To undo this, just execute `theme_set(theme_default())`.
```{r, message = F, warning = F}
library(ggthemes)
theme_set(
theme_default() +
theme_tufte() +
theme(plot.background = element_rect(fill = wes_palette("Moonrise2")[3],
color = wes_palette("Moonrise2")[3]))
)
```
Finally, here's our WAIC plot.
```{r, fig.width = 4.5, fig.height = 1.25}
w %>%
data.frame() %>%
rownames_to_column(var = "model") %>%
ggplot() +
geom_pointrange(aes(x = reorder(model, -waic), y = waic,
ymin = waic - se_waic,
ymax = waic + se_waic,
color = model),
shape = 16) +
scale_color_manual(values = wes_palette("Moonrise2")[c(1:2, 4)]) +
coord_flip() +
labs(title = "WAIC",
x = NULL,
y = NULL) +
theme(axis.ticks.y = element_blank(),
legend.position = "none")
```
The full model, `b10.3`, did not have the lowest WAIC value. Though note how wide those standard error bars are relative to the point estimates. There's a lot of model uncertainty there. Here are the WAIC weights.
```{r model_weights, message = F}
model_weights(b10.1, b10.2, b10.3,
weights = "waic")
```
Let's look at the parameter summaries for the theory-based model.
```{r}
print(b10.3)
```
Here's what the odds are multiplied by.
```{r, message = F}
fixef(b10.3)[2] %>%
exp()
```
Given an estimated value of 4, the probability of a pull, all else equal, would be close to 1.
```{r}
inv_logit_scaled(4)
```
Adding the coefficient, `fixef(b10.3)[2]`, would yield an even higher estimate.
```{r}
(4 + fixef(b10.3)[2]) %>%
inv_logit_scaled()
```
For our variant of Figure 10.2, we use `brms::pp_average()` in place of `rethinking::ensemble()`.
```{r, fig.height = 2.75, fig.width = 3, warning = F, message = F}
# the combined `fitted()` results of the three models weighted by their WAICs
ppa <-
pp_average(b10.1, b10.2, b10.3,
weights = "waic",
method = "fitted") %>%
as_tibble() %>%
bind_cols(b10.3$data) %>%
distinct(Estimate, Q2.5, Q97.5, condition, prosoc_left) %>%
mutate(x_axis = str_c(prosoc_left, condition, sep = "/")) %>%
mutate(x_axis = factor(x_axis, levels = c("0/0", "1/0", "0/1", "1/1"))) %>%
rename(pulled_left = Estimate)
# the empirically-based summaries
d_plot <-
d %>%
group_by(actor, condition, prosoc_left) %>%
summarise(pulled_left = mean(pulled_left)) %>%
mutate(x_axis = str_c(prosoc_left, condition, sep = "/")) %>%
mutate(x_axis = factor(x_axis, levels = c("0/0", "1/0", "0/1", "1/1")))
# the plot
ppa %>%
ggplot(aes(x = x_axis)) +
geom_smooth(aes(y = pulled_left, ymin = Q2.5, ymax = Q97.5, group = 0),
stat = "identity",
fill = wes_palette("Moonrise2")[2], color = "black",
alpha = 1, linewidth = 1/2) +
geom_line(data = d_plot,
aes(y = pulled_left, group = actor),
color = wes_palette("Moonrise2")[1], linewidth = 1/3) +
scale_x_discrete("prosoc_left/condition", expand = c(.03, .03)) +
ylab("proportion pulled left") +
coord_cartesian(ylim = 0:1) +
theme(axis.ticks.x = element_blank())
```
McElreath didn't show the actual pairs plot in the text. Here's ours using `mcmc_pairs()`.
```{r, message = F, warning = F, fig.height = 4, fig.width = 4.5}
library(bayesplot)
# this helps us set our custom color scheme
color_scheme_set(wes_palette("Moonrise2")[c(3, 1, 2, 2, 1, 1)])
# the actual plot
mcmc_pairs(x = as_draws_df(b10.3),
pars = vars(b_Intercept:`b_prosoc_left:condition`),
off_diag_args = list(size = 1/10, alpha = 1/6),
diag_fun = "dens")
```
As McElreath observed, the posterior looks multivariate Gaussian.
In equations, the next model follows the form
\begin{align*}
\text{pulled_left}_i & \sim \operatorname{Binomial}(1, p_i) \\
\operatorname{logit}(p_i) & = \alpha_{\text{actor}} + (\beta_1 + \beta_2 \text{condition}_i) \text{prosoc_left}_i \\
\alpha_{\text{actor}} & \sim \operatorname{Normal}(0, 10) \\
\beta_1 & \sim \operatorname{Normal}(0, 10) \\
\beta_2 & \sim \operatorname{Normal}(0, 10).
\end{align*}
Enclosing the `actor` variable within `factor()` will produce the indexing we need to get `actor`-specific intercepts. Also notice we're using the `0 + factor(actor)` part of the model `formula` to suppress the brms default intercept. As such, the priors for all parameters in the model will be of `class = b`. And since we're using the same Gaussian prior for each, we only need one line for the `prior` argument.
```{r b10.4}
b10.4 <-
brm(data = d, family = binomial,
pulled_left | trials(1) ~ 0 + factor(actor) + prosoc_left + condition:prosoc_left ,
prior(normal(0, 10), class = b),
iter = 2500, warmup = 500, chains = 2, cores = 2,
control = list(adapt_delta = 0.9),
seed = 10,
file = "fits/b10.04")
```
Within the tidyverse, the `distinct()` function returns the information you'd otherwise get from `unique()`.
```{r}
d %>%
distinct(actor)
```
With a single-level model like this, we have no need to use something like `depth=2` for our posterior summary.
```{r}
print(b10.4)
```
Correspondingly, `brms::as_draws_df()` returns an object for `b10.4` that doesn't quite follow the same structure as from `rethinking::extract.samples()`.
```{r}
post <- as_draws_df(b10.4)
post %>%
glimpse()
```
Here's our variant of Figure 10.3, the $\alpha$ density for `actor == 2`.
```{r, fig.height = 2.75, fig.width = 3}
post %>%
ggplot(aes(x = b_factoractor2)) +
geom_density(color = "transparent",
fill = wes_palette("Moonrise2")[1]) +
scale_x_continuous(NULL, limits = c(0, NA)) +
scale_y_continuous(NULL, breaks = NULL) +
labs(title = "Actor 2's large and uncertain intercept",
subtitle = "Once your log-odds are above, like, 4, it's all\npretty much a probability of 1.")
```
Figure 10.4 shows the idiographic trajectories for four of our chimps.
```{r, fig.height = 4, fig.width = 6, warning = F, message = F}
# subset the `d_plot` data
d_plot_4 <-
d_plot %>%
filter(actor %in% c(3, 5:7)) %>%
ungroup() %>%
mutate(actor = str_c("actor ", actor))
# compute the model-implied estimates with `fitted()` and wrangle
f <-
fitted(b10.4) %>%
as_tibble() %>%
bind_cols(b10.4$data) %>%
filter(actor %in% c(3, 5:7)) %>%
distinct(Estimate, Q2.5, Q97.5, condition, prosoc_left, actor) %>%
select(actor, everything()) %>%
mutate(actor = str_c("actor ", actor),
x_axis = str_c(prosoc_left, condition, sep = "/")) %>%
mutate(x_axis = factor(x_axis, levels = c("0/0", "1/0", "0/1", "1/1"))) %>%
rename(pulled_left = Estimate)
# plot
f %>%
ggplot(aes(x = x_axis, y = pulled_left, group = actor)) +
geom_smooth(aes(ymin = Q2.5, ymax = Q97.5),
stat = "identity",
fill = wes_palette("Moonrise2")[2], color = "black",
alpha = 1, linewidth = 1/2) +
geom_line(data = d_plot_4,
color = wes_palette("Moonrise2")[1], linewidth = 1.25) +
scale_x_discrete("prosoc_left/condition", expand = c(.03, .03)) +
scale_y_continuous("proportion pulled left",
breaks = c(0, .5, 1), limits = c(0, 1)) +
theme(axis.ticks.x = element_blank(),
panel.background = element_rect(fill = alpha("white", .075),
color = "transparent")) +
facet_wrap(~actor)
```
McElreath mused: "There are a number of loose ends with this analysis. Does model [`b10.4`], with its 6 additional parameters, still look good after estimating overfitting with WAIC?" (p. 302). We won't be following along with the practice problems at the end of the chapter, but we may as well just check the WAIC real quick.
```{r, warning = F, message = F}
b10.4 <- add_criterion(b10.4, "waic")
loo_compare(b10.1, b10.2, b10.3, b10.4, criterion = "waic") %>%
print(simplify = F)
```
Here are the updated WAIC weights.
```{r}
model_weights(b10.1, b10.2, b10.3, b10.4, weights = "waic") %>%
round(digits = 2)
```
Both the WAIC differences and the WAIC weights suggest our 9-parameter `b10.4` was substantially better than the previous models, even after correcting for overfitting. Sometimes group averages aren't good enough. When you have data with many occasions within cases, fitting models that allow for individual differences is generally the way to go.
#### Overthinking: Using the ~~by~~ `group_by()` function.
Instead of focusing on base R, let's work within the tidyverse. If you wanted to compute the proportion of trials `pulled_left == 1` for each combination of `prosoc_left`, `condition`, and chimp `actor`, you'd put those last three variables within `group_by()` and then compute the `mean()` of `pulled_left` within `summarise()`.
```{r, message = F}
d %>%
group_by(prosoc_left, condition, actor) %>%
summarise(`proportion pulled_left` = mean(pulled_left))
```
And since we're working within the tidyverse, that operation returns a tibble rather than a list.
### Aggregated binomial: Chimpanzees again, condensed.
With the tidyverse, we use `group_by()` and `summarise()` to achieve what McElreath did with `aggregate()`.
```{r, message = F}
d_aggregated <-
d %>%
select(-recipient, -block, -trial, -chose_prosoc) %>%
group_by(actor, condition, prosoc_left) %>%
summarise(x = sum(pulled_left))
d_aggregated %>%
filter(actor %in% c(1, 2))
```
To fit an aggregated binomial model in brms, we use the `<criterion> | trials()` syntax, where the value that goes in `trials()` is either a fixed number, as in this case, or a variable in the data indexing $n$ (we'll sketch the variable-based syntax below). Either way, at least some of those trials will have an $n > 1$. Here we'll use the hard-code method, just like McElreath did in the text.
```{r b10.5}
b10.5 <-
brm(data = d_aggregated,
family = binomial,
x | trials(18) ~ 1 + prosoc_left + condition:prosoc_left,
prior = c(prior(normal(0, 10), class = Intercept),
prior(normal(0, 10), class = b)),
iter = 2500, warmup = 500, cores = 2, chains = 2,
seed = 10,
file = "fits/b10.05")
```
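As promised, here's a sketch of the variable-based `trials()` syntax, where we save the trial count as its own column and point `trials()` at it. The `d_aggregated_n` and `b10.5_variable` names are just ours for the example, and we're leaving the chunk unevaluated; it should return essentially the same model as `b10.5`.
```{r, eval = F}
# save the trial count as a column
d_aggregated_n <-
  d_aggregated %>% 
  mutate(n_trials = 18)

# point `trials()` at that column
b10.5_variable <-
  brm(data = d_aggregated_n,
      family = binomial,
      x | trials(n_trials) ~ 1 + prosoc_left + condition:prosoc_left,
      prior = c(prior(normal(0, 10), class = Intercept),
                prior(normal(0, 10), class = b)),
      iter = 2500, warmup = 500, cores = 2, chains = 2,
      seed = 10)
```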
We might compare `b10.3` with `b10.5` like this.
```{r}
fixef(b10.3) %>% round(digits = 2)
fixef(b10.5) %>% round(digits = 2)
```
A coefficient plot can offer a complementary perspective.
```{r, fig.width = 4.5, fig.height = 2.5, warning = F, message = F}
# wrangle
bind_rows(
fixef(b10.3) %>%
data.frame() %>%
rownames_to_column("term"),
fixef(b10.5) %>%
data.frame() %>%
rownames_to_column("term")
) %>%
mutate(fit = rep(c("b10.3", "b10.5"), each = n() / 2)) %>%
# plot
ggplot() +
geom_pointrange(aes(x = fit, y = Estimate,
ymin = Q2.5, ymax = Q97.5,
color = term),
shape = 16) +
scale_color_manual(values = wes_palette("Moonrise2")[c(1:2, 4)]) +
labs(x = NULL, y = NULL) +
coord_flip() +
theme(axis.ticks.y = element_blank(),
legend.position = "none") +
facet_wrap(~term, ncol = 1)
```
The two are close within simulation error.
### Aggregated binomial: Graduate school admissions.
Load the infamous `UCBadmit` data [see @bickelSexBiasGraduate1975].
```{r, message = F, warning = F}
# detach(package:brms)
library(rethinking)
data(UCBadmit)
d <- UCBadmit
```
Switch from rethinking to brms.
```{r, message = F, warning = F}
detach(package:rethinking, unload = T)
library(brms)
rm(UCBadmit)
d
```
Now we'll make our dummy variable, `male`.
```{r}
d <-
d %>%
mutate(male = ifelse(applicant.gender == "male", 1, 0))
```
The univariable logistic model with `male` as the sole predictor of `admit` follows the form
\begin{align*}
n_{\text{admit}_i} & \sim \operatorname{Binomial}(n_i, p_i) \\
\operatorname{logit}(p_i) & = \alpha + \beta \text{male}_i \\
\alpha & \sim \operatorname{Normal}(0, 10) \\
\beta & \sim \operatorname{Normal}(0, 10).
\end{align*}
The second model omits the `male` predictor.
```{r b10.6}
b10.6 <-
brm(data = d,
family = binomial,
admit | trials(applications) ~ 1 + male ,
prior = c(prior(normal(0, 10), class = Intercept),
prior(normal(0, 10), class = b)),
iter = 2500, warmup = 500, cores = 2, chains = 2,
seed = 10,
# this line isn't usually necessary,
# but it will help with our `loo_moment_match()` code below
save_pars = save_pars(all = T),
file = "fits/b10.06")
b10.7 <-
brm(data = d,
family = binomial,
admit | trials(applications) ~ 1,
prior(normal(0, 10), class = Intercept),
iter = 2500, warmup = 500, cores = 2, chains = 2,
seed = 10,
save_pars = save_pars(all = T),
file = "fits/b10.07")
```
Compute the information criteria for each model and save the results within the brmsfit objects.
```{r, message = F}
b10.6 <- add_criterion(b10.6, "waic")
b10.7 <- add_criterion(b10.7, "waic")
```
Here's the WAIC comparison.
```{r}
w <- loo_compare(b10.6, b10.7, criterion = "waic")
print(w, simplify = F)
```
If you prefer the difference in the WAIC metric, use our `cbind()` conversion method from above. Here are the WAIC weights.
```{r}
model_weights(b10.6, b10.7, weights = "waic") %>%
round(digits = 2)
```
**Bonus: Information criteria digression.**
Let's see what happens if we switch to the LOO.
```{r}
b10.6 <- add_criterion(b10.6, "loo")
b10.7 <- add_criterion(b10.7, "loo")
```
If you just ape the text and use the WAIC, everything appears fine. But holy smokes look at those nasty warning messages from the `loo()`! One of the frightening but ultimately handy things about working with the PSIS-LOO is that it requires we estimate a Pareto $k$ parameter, which you can learn all about in the `loo-package` section of the [loo reference manual](https://cran.r-project.org/package=loo/loo.pdf) [@loo2022RM]. As it turns out, the Pareto $k$ [can be used as a diagnostic tool](https://cran.r-project.org/package=loo/vignettes/loo2-example.html#plotting-pareto-k-diagnostics) [@vehtariUsingLooPackage2022]. Each case in the data gets its own $k$ value and we like it when those $k$'s are low. The makers of the loo package get worried when those $k$'s exceed 0.7 and as a result, `loo()` spits out a warning message when they do.
First things first, if you explicitly open the loo package, you'll have access to some handy diagnostic functions.
```{r, message = F, warning = F}
library(loo)
```
We'll be leveraging those $k$ values with the `pareto_k_table()` and `pareto_k_ids()` functions. Both functions take objects created by the `loo()` or `psis()` functions. So, before we can get busy, we'll first make two objects with the `loo()`.
```{r}
l_b10.6 <- loo(b10.6)
l_b10.7 <- loo(b10.7)
```
There are those warning messages, again. Using the loo-object for model `b10.6`, which we've named `l_b10.6`, let's take a look at the `pareto_k_table()` function.
```{r}
pareto_k_table(l_b10.6)
```
You may have noticed that this same table pops out when you just do something like `loo(b10.6)`. Recall that this data set has 12 observations (i.e., execute `count(d)`). With `pareto_k_table()`, we see how the Pareto $k$ values have been categorized into bins ranging from "good" to "very bad". Clearly, we like low $k$'s. In this example, our observations are all over the place, with `r pareto_k_ids(l_b10.6, threshold = 0.7) %>% length()` in the "bad" to "very bad" $k$ ranges. We can take a closer look like this:
```{r, fig.width = 4.5, fig.height = 3.5}
plot(l_b10.6)
```
So when you `plot()` a loo object, you get a nice diagnostic plot for those $k$ values, ordered by observation number. Our plot indicates cases 1, 2, 3, 11, and 12 had "very bad" $k$ values for this model. If we wanted to further verify to ourselves which observations those were, we'd use the `pareto_k_ids()` function.
```{r}
pareto_k_ids(l_b10.6, threshold = 1)
```
Note our use of the `threshold` argument. Play around with different values to see how it works. If you want an explicit look at those $k$ values, you execute something like this.
```{r}
l_b10.6$diagnostics
```
The `pareto_k` values can be used to examine cases that are overly-influential on the model parameters, something like a Cook's $D_{i}$. See, for example [this discussion on stackoverflow.com](https://stackoverflow.com/questions/39578834/linear-model-diagnostics-for-bayesian-models-using-rstan/39595436) in which several members of the [Stan team](https://mc-stan.org/) weighed in. The issue is also discussed in @vehtariPracticalBayesianModel2017 and in [this presentation by Aki Vehtari](https://youtu.be/FUROJM3u5HQ).
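If it helps to see which rows those are, here's a quick sketch pairing each case's $k$ value, which the loo object stores in its `diagnostics$pareto_k` slot, with the corresponding row in the data.
```{r}
d %>% 
  mutate(pareto_k = l_b10.6$diagnostics$pareto_k) %>% 
  select(dept, applicant.gender, applications, pareto_k) %>% 
  arrange(desc(pareto_k))
```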
Anyway, the implication of all this is these values suggest model `b10.6` isn't a great fit for these data.
Part of the warning message for model `b10.6` read:
> It is recommended to set 'moment_match = TRUE' in order to perform moment matching for problematic observations.

Let's do that using the `brms::loo_moment_match()` function.
```{r l_b10.6_mm, warning = F, message = F, results = 'hide'}
l_b10.6_mm <- loo_moment_match(b10.6, loo = l_b10.6)
```
Check the results.
```{r}
l_b10.6_mm
```
Now that looks better. We'll do the same thing for model `b10.7`.
```{r l_b10.7_mm, warning = F, message = F, results = 'hide'}
l_b10.7_mm <- loo_moment_match(b10.7, loo = l_b10.7)
```
Okay, let's compare models with formal $\text{elpd}_\text{loo}$ differences before and after adjusting with the moment match approach.
```{r}
loo_compare(l_b10.6, l_b10.7)
loo_compare(l_b10.6_mm, l_b10.7_mm)
```
In this case, the results are similar. The standard errors for the differences are huge compared to the point estimates, suggesting large uncertainty. Watch out for this in your real-world data analyses, though.
But this has all been a tangent from the central thrust of this section.
**Back from our information criteria digression.**
Let's get back on track with the text. Here's a look at `b10.6`, the univariable model.
```{r}
print(b10.6)
```
Here's the relative difference in admission odds.
```{r, warning = F, message = F}
fixef(b10.6)[2] %>%
exp() %>%
round(digits = 2)
```
And now we'll compute the difference in admission probabilities.
```{r}
post <- as_draws_df(b10.6)
post %>%
mutate(p_admit_male = inv_logit_scaled(b_Intercept + b_male),
p_admit_female = inv_logit_scaled(b_Intercept)) %>%
mutate(diff_admit = p_admit_male - p_admit_female) %>%
summarise(`2.5%` = quantile(diff_admit, probs = .025),
`50%` = median(diff_admit),
`97.5%` = quantile(diff_admit, probs = .975))
```
Instead of the `summarise()` code, we could have also used `tidybayes::median_qi(diff_admit)`. It's good to have options.
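Here's a quick sketch of that alternative, assuming you have tidybayes installed. Since we never saved `diff_admit` within `post`, we recompute it on the fly.
```{r, warning = F, message = F}
library(tidybayes)

post %>% 
  mutate(diff_admit = inv_logit_scaled(b_Intercept + b_male) - inv_logit_scaled(b_Intercept)) %>% 
  median_qi(diff_admit)
```
Here's our version of Figure 10.5.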
```{r, fig.height = 2.5, fig.width = 4, message = F}
d <-
d %>%
mutate(case = factor(1:12))
p <-
predict(b10.6) %>%
as_tibble() %>%
bind_cols(d)
text <-
d %>%
group_by(dept) %>%
summarise(case = mean(as.numeric(case)),
admit = mean(admit / applications) + .05)
ggplot(data = d, aes(x = case, y = admit / applications)) +
geom_pointrange(data = p,
aes(y = Estimate / applications,
ymin = Q2.5 / applications ,
ymax = Q97.5 / applications),
color = wes_palette("Moonrise2")[1],
shape = 1, alpha = 1/3) +
geom_point(color = wes_palette("Moonrise2")[2]) +
geom_line(aes(group = dept),
color = wes_palette("Moonrise2")[2]) +
geom_text(data = text,
aes(y = admit, label = dept),
color = wes_palette("Moonrise2")[2],
family = "serif") +
labs(title = "Posterior validation check",
y = "Proportion admitted") +
coord_cartesian(ylim = c(0, 1)) +
theme(axis.ticks.x = element_blank())
```
As alluded to in all that LOO/`pareto_k` talk, above, this is not a great fit. So we'll ditch the last model paradigm for one that answers the new question "*What is the average difference in probability of admission between females and males within departments?*" (p. 307). The statistical formula for the full model follows the form
\begin{align*}
n_{\text{admit}_i} & \sim \operatorname{Binomial}(n_i, p_i) \\
\operatorname{logit}(p_i) & = \alpha_{\text{dept}_i} + \beta \text{male}_i \\
\alpha_{\text{dept}} & \sim \operatorname{Normal}(0, 10) \\
\beta & \sim \operatorname{Normal}(0, 10).
\end{align*}
We don't need to coerce an index like McElreath did in the text. But here are the models.
```{r b10.8}
b10.8 <-
brm(data = d,
family = binomial,
admit | trials(applications) ~ 0 + dept,
prior = prior(normal(0, 10), class = b),
iter = 2500, warmup = 500, cores = 2, chains = 2,
seed = 10,
file = "fits/b10.08")
b10.9 <-
update(b10.8,
newdata = d,
formula = admit | trials(applications) ~ 0 + dept + male,
seed = 10,
file = "fits/b10.09")
```
Let's make two more `loo()` objects, this time using the `reloo = TRUE` approach.
```{r reloo_b10.8_and_b10.9, message = F, warning = F, results = 'hide'}
l_b10.8_reloo <- loo(b10.8, reloo = T)
l_b10.9_reloo <- loo(b10.9, reloo = T)
```
Now compare them.
```{r}
loo_compare(l_b10.6_mm, l_b10.7_mm, l_b10.8_reloo, l_b10.9_reloo)
```
Here are the LOO weights.
```{r}
model_weights(b10.6, b10.7, b10.8, b10.9,
weights = "loo") %>%
round(digits = 3)
```
The parameters summaries for our multivariable model, `b10.9`, look like this.
```{r}
fixef(b10.9) %>% round(digits = 2)
```
And on the proportional odds scale, the posterior mean for `b_male` is:
```{r}
fixef(b10.9)[7, 1] %>% exp()
```
Since we've been using brms, there's no need to fit our version of McElreath's `m10.9stan`. We already have that in our `b10.9`. But just for kicks and giggles, here's another way to get the model summary.
```{r}
b10.9$fit
```
Here's our version of Figure 10.6, the posterior validation check.
```{r, fig.height = 2.5, fig.width = 4}
predict(b10.9) %>%
as_tibble() %>%
bind_cols(d) %>%
ggplot(aes(x = case, y = admit / applications)) +
geom_pointrange(aes(y = Estimate / applications,
ymin = Q2.5 / applications ,
ymax = Q97.5 / applications),
color = wes_palette("Moonrise2")[1],
shape = 1, alpha = 1/3) +
geom_point(color = wes_palette("Moonrise2")[2]) +
geom_line(aes(group = dept),
color = wes_palette("Moonrise2")[2]) +
geom_text(data = text,
aes(y = admit, label = dept),
color = wes_palette("Moonrise2")[2],
family = "serif") +
labs(title = "Posterior validation check",
y = "Proportion admitted") +
coord_cartesian(ylim = c(0, 1)) +
theme(axis.ticks.x = element_blank())
```
The model predictions are imperfect, but way more valid than before. The posterior looks reasonably multivariate Gaussian.
```{r, message = F, warning = F, fig.height = 7, fig.width = 7.5}
pairs(b10.9,
off_diag_args = list(size = 1/10, alpha = 1/6))
```
#### Rethinking: Simpson's paradox is not a paradox.
> This empirical example is a famous one in statistical teaching. It is often used to illustrate a phenomenon known as **Simpson's paradox**. Like most paradoxes, there is no violation of logic, just of intuition. And since different people have different intuition, Simpson's paradox means different things to different people. The poor intuition being violated in this case is that a positive association in the entire population should also hold within each department. (p. 309, **emphasis** in the original)

In my field of clinical psychology, Simpson's paradox is an important, if under-appreciated, phenomenon. If you're in the social sciences as well, I highly recommend spending more time thinking about it. To get you started, I blogged about it [here](https://solomonkurz.netlify.app/blog/2019-10-09-individuals-are-not-small-groups-i-simpson-s-paradox/) and @kievitSimpsonParadoxPsychological2013 wrote a great tutorial paper called [*Simpson's paradox in psychological science: a practical guide*](https://www.frontiersin.org/articles/10.3389/fpsyg.2013.00513/full).
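If a toy example helps, here's a little simulation sketch (all numbers are made up) in which the pooled association between `x` and `y` is positive even though the association within every group is negative.
```{r}
set.seed(10)

simpson <-
  tibble(group = rep(1:3, each = 100)) %>% 
  mutate(x = rnorm(n(), mean = 2 * group)) %>% 
  mutate(y = rnorm(n(), mean = 2 * group - 0.5 * (x - 2 * group)))

# the pooled OLS slope is positive...
simpson %>% 
  summarise(slope = cov(x, y) / var(x))

# ...but the within-group slopes are all negative
simpson %>% 
  group_by(group) %>% 
  summarise(slope = cov(x, y) / var(x))
```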
#### Overthinking: WAIC and aggregated binomial models.
McElreath wrote:
> The `WAIC` function in `rethinking` detects aggregated binomial models and automatically splits them apart into 0/1 Bernoulli trials, for the purpose of calculating WAIC. It does this, because WAIC is computed point by point (see [Chapter 6][Overthinking: WAIC calculation.]). So what you define as a "point" affects WAIC's value. In an aggregated binomial each "point" is a bunch of independent trials that happen to share the same predictor values. In order for the disaggregated and aggregated models to agree, it makes sense to use the disaggregated representation. (p. 309)

To my knowledge, `brms::waic()` and `brms::loo()` do not do this, which might well be why some of our values didn't match up with those in the text. If you have additional insight on this, please [share with the rest of the class](https://github.com/ASKurz/Statistical_Rethinking_with_brms_ggplot2_and_the_tidyverse/issues).
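If you wanted to force the two representations to agree, one workaround is to disaggregate the data yourself and refit. Here's a hedged sketch, left unevaluated, which uses `tidyr::uncount()` to expand each aggregated row into its 0/1 Bernoulli trials; the `d_bernoulli` and `b10.9_bernoulli` names are just ours for the example.
```{r, eval = F}
# expand each aggregated row into `admit` 1s and `reject` 0s
d_bernoulli <-
  d %>% 
  select(dept, male, admit, reject) %>% 
  pivot_longer(c(admit, reject), names_to = "outcome", values_to = "n") %>% 
  uncount(n) %>% 
  mutate(admit = ifelse(outcome == "admit", 1, 0))

# refit trial by trial so WAIC/LOO are computed per Bernoulli trial
b10.9_bernoulli <-
  brm(data = d_bernoulli,
      family = binomial,
      admit | trials(1) ~ 0 + dept + male,
      prior = prior(normal(0, 10), class = b),
      iter = 2500, warmup = 500, cores = 2, chains = 2,
      seed = 10)
```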
### Fitting binomial regressions with `glm()`.
We're not here to learn frequentist code, so we're going to skip most of this section. But model `m.good` is worth fitting. Here are the data.
```{r}
# outcome and predictor almost perfectly associated
y <- c(rep(0, 10), rep(1, 10))
x <- c(rep(-1, 9), rep(1, 11))
```
We are going to depart from McElreath's naming convention in the text and call this fit `b10.good`. It'll make it easier to find in this project's [`fits` folder](https://github.com/ASKurz/Statistical_Rethinking_with_brms_ggplot2_and_the_tidyverse/tree/master/fits).
```{r b10.good}
b10.good <-
brm(data = list(y = y, x = x),
family = binomial,
y | trials(1) ~ 1 + x,
prior = c(prior(normal(0, 10), class = Intercept),
prior(normal(0, 10), class = b)),
seed = 10,
file = "fits/b10.good")
```
Our model summary will differ a bit from the one in the text. It seems this is because of the MAP/HMC contrast and our choice of priors.
```{r}
print(b10.good)
```
You might experiment with different prior $\textit{SD}$'s to see how they influence the posterior $\textit{SD}$'s. Anyways, here's the `pairs()` plot McElreath excluded from the text.
```{r, fig.width = 3.25, fig.height = 3}
pairs(b10.good,
off_diag_args = list(size = 1/10, alpha = 1/6))
```
That posterior, my friends, is not multivariate Gaussian. The plot deserves an extensive quote from McElreath.
> Inspecting the pairs plot (~~not~~ shown) demonstrates just how subtle even simple models can be, once we start working with GLMs. I don't say this to scare the reader. But it's true that even simple models can behave in complicated ways. *How you fit the model is part of the model, and in principle no GLM is safe for MAP estimation*. (p. 311, *emphasis* added)
## Poisson regression
> When a binomial distribution has a very small probability of an event $p$ and a very large number of trials $n$, then it takes on a special shape. The expected value of a binomial distribution is just $np$, and its variance is $np(1 - p)$. But when $n$ is very large and $p$ is very small, then these are approximately the same. (p. 311)

Data of this kind are often called count data. Here we simulate some.
```{r}
set.seed(10) # make the results reproducible
tibble(y = rbinom(1e5, 1000, 1/1000)) %>%
summarise(y_mean = mean(y),
y_variance = var(y))
```
Yes, those statistics are virtually the same. When dealing with Poisson data, $\mu = \sigma^2$. When the number of trials $n$ is unknown or much larger than the counts observed in the data, the Poisson likelihood is a useful tool. We define it as
$$y \sim \operatorname{Poisson}(\lambda).$$
The parameter $\lambda$ expresses both the mean and the variance because, within this model, the variance scales right along with the mean. Since $\lambda$ is constrained to be positive, we typically use the log link. Thus the basic Poisson regression model is
\begin{align*}
y_i & \sim \operatorname{Poisson}(\lambda_i) \\
\log (\lambda_i) & = \alpha + \beta x_i,
\end{align*}
where all model parameters receive priors following the forms we've been practicing. We read further:
> The parameter $\lambda$ is the expected value, but it's also commonly thought of as a rate. Both interpretations are correct, and realizing this allows us to make Poisson models for which the exposure varies across cases $i$...
>
> Implicitly, $\lambda$ is equal to an expected number of events, $\mu$, per unit of time or distance, $\tau$. This implies that $\lambda = \mu/\tau$, which lets us redefine the link:
>
> \begin{align*}
> y_i & \sim \operatorname{Poisson}(\lambda_i) \\
> \log \lambda_i & = \log \frac{\mu_i}{\tau_i} = \alpha + \beta x_i
> \end{align*}
>
> Since the logarithm of a ratio is the same as a difference of logarithms, we can also write:
>
> $$\log \lambda_i = \log \mu_i - \log \tau_i = \alpha + \beta x_i$$
>
> These $\tau$ values are the "exposures." So if different observations $i$ have different exposures, this implies that the expected value on row $i$ is given by:
>
> $$\log \mu_i = \log \tau_i + \alpha + \beta x_i$$
>
> When $\tau_i = 1$, then $\log \tau_i = 0$ and we're back where we started. But when the exposure varies across cases, then $\tau_i$ does the important work of correctly scaling the expected number of events for each case $i$. (pp. 312--313)
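In brms, the typical way to include that $\log \tau_i$ term is with an `offset()` in the model formula. Here's a minimal sketch with a hypothetical data frame `my_data` containing a count `y`, a predictor `x`, and an exposure column `exposure` (e.g., days of observation); it's illustrative only and left unevaluated.
```{r, eval = F}
# `my_data`, `y`, `x`, and `exposure` are all hypothetical
fit <-
  brm(data = my_data %>% mutate(log_exposure = log(exposure)),
      family = poisson,
      y ~ 1 + x + offset(log_exposure),
      prior = c(prior(normal(0, 100), class = Intercept),
                prior(normal(0, 1), class = b)),
      seed = 10)
```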
### Example: Oceanic tool complexity.
Load the `Kline` data [see @klinePopulationSizePredicts2010].
```{r, message = F, warning = F}
library(rethinking)
data(Kline)
d <- Kline
```
Switch from rethinking to brms.
```{r, message = F, warning = F}
detach(package:rethinking, unload = T)
library(brms)
rm(Kline)
d
```
Here are our new columns.
```{r}
d <-
d %>%
mutate(log_pop = log(population),
contact_high = ifelse(contact == "high", 1, 0))
```
Our statistical model will follow the form
\begin{align*}
\text{total_tools}_i & \sim \operatorname{Poisson}(\lambda_i) \\
\log (\lambda_i) & = \alpha + \beta_1 \text{log_pop}_i + \beta_2 \text{contact_high}_i + \beta_3 \text{contact_high}_i \times \text{log_pop}_i \\
\alpha & \sim \operatorname{Normal}(0, 100) \\
\beta_1 & \sim \operatorname{Normal}(0, 1) \\
\beta_2 & \sim \operatorname{Normal}(0, 1) \\
\beta_3 & \sim \operatorname{Normal}(0, 1).
\end{align*}
The only new thing in our model code is `family = poisson`. brms defaults to the `log()` link.
```{r b10.10}
b10.10 <-
brm(data = d,
family = poisson,
total_tools ~ 1 + log_pop + contact_high + contact_high:log_pop,
prior = c(prior(normal(0, 100), class = Intercept),
prior(normal(0, 1), class = b)),
iter = 3000, warmup = 1000, chains = 4, cores = 4,
seed = 10,
file = "fits/b10.10")
```
```{r}
print(b10.10)
```
We can use `vcov()` to extract the correlation matrix for the parameters.
```{r}
vcov(b10.10, cor = T) %>%
round(digits = 2)
```
And here's the coefficient plot via `bayesplot::mcmc_intervals()`.
```{r, fig.width = 7, fig.height = 1.25}
post <- as_draws_df(b10.10)
# we'll set a renewed color theme
color_scheme_set(wes_palette("Moonrise2")[c(2, 1, 4, 2, 1, 1)])
post %>%
rename(b_interaction = `b_log_pop:contact_high`) %>%
mcmc_intervals(pars = vars(starts_with("b_")),
prob = .5,
prob_outer = .95) +
theme(axis.text.y = element_text(hjust = 0),
axis.ticks.y = element_blank())
```
How plausible is it that a high-contact island will have more tools than a low-contact island?
```{r}
post <-
post %>%
mutate(lambda_high = exp(b_Intercept + b_contact_high + (b_log_pop + `b_log_pop:contact_high`) * 8),
lambda_low = exp(b_Intercept + b_log_pop * 8)) %>%
mutate(diff = lambda_high - lambda_low)
post %>%
summarise(sum = sum(diff > 0) / length(diff))
```
Quite, it turns out. Behold the corresponding Figure 10.8.a.
```{r, fig.width = 3, fig.height = 2.9}
post %>%
ggplot(aes(x = diff)) +