-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path14.Rmd
1216 lines (942 loc) · 55 KB
/
14.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
---
title: "Ch 14. Missing Data and Other Opportunities"
author: "A Solomon Kurz"
date: "`r format(Sys.Date())`"
output:
github_document
---
```{r set-options_14, echo = FALSE, cache = FALSE}
knitr::opts_chunk$set(fig.retina = 2.5)
options(width = 100)
```
# Missing Data and Other Opportunities
For the opening example, we're playing with the conditional probability
$$
\text{Pr(burnt down | burnt up)} = \frac{\text{Pr(burnt up, burnt down)}}{\text{Pr(burnt up)}}.
$$
It works out that
$$
\text{Pr(burnt down | burnt up)} = \frac{1/3}{1/2} = \frac{2}{3}.
$$
We might express the math in the middle of page 423 in tibble form like this.
```{r, warning = F, message = F}
library(tidyverse)
p_pancake <- 1/3
(
d <-
tibble(pancake = c("BB", "BU", "UU"),
p_burnt = c(1, .5, 0)) %>%
mutate(p_burnt_up = p_burnt * p_pancake)
)
d %>%
summarise(`p (burnt_down | burnt_up)` = p_pancake / sum(p_burnt_up))
```
I understood McElreath's simulation better after breaking it apart. The first part of `sim_pancake()` takes one random draw from the integers 1, 2, and 3. It just so happens that if we set `set.seed(1)`, the code returns a 1.
```{r}
set.seed(1)
sample(x = 1:3, size = 1)
```
So here's what it looks like if we use seeds `2:11`.
```{r}
take_sample <- function(seed) {
set.seed(seed)
sample(x = 1:3, size = 1)
}
tibble(seed = 2:11) %>%
mutate(value_returned = map_dbl(seed, take_sample))
```
Each of those `value_returned` values stands for one of the three pancakes: 1 = BB, 2 = BU, and 3 = UU. In the next line, McElreath made slick use of a matrix to specify that. Here's what the matrix looks like.
```{r}
matrix(c(1, 1, 1, 0, 0, 0), nrow = 2, ncol = 3)
```
See how the three columns are identified as `[,1]`, `[,2]`, and `[,3]`? If, say, we wanted to subset the values in the second column, we'd execute
```{r}
matrix(c(1, 1, 1, 0, 0, 0), nrow = 2, ncol = 3)[, 2]
```
which returns a numeric vector.
```{r}
matrix(c(1, 1, 1, 0, 0, 0), nrow = 2, ncol = 3)[, 2] %>% str()
```
And that `1 0` corresponds to the pancake with one burnt (i.e., 1) and one unburnt (i.e., 0) side. So when McElreath then executed `sample(sides)`, he randomly sampled from one of those two values. In the case of `pancake == 2`, he randomly sampled one the pancake with one burnt and one unburnt side. Had he sampled from `pancake == 1`, he would have sampled from the pancake with both sides burnt.
Going forward, let's amend McElreath's `sim_pancake()` function so it will take a `seed` argument, which will allow us to make the output reproducible.
```{r}
# simulate a `pancake` and return randomly ordered `sides`
sim_pancake <- function(seed) {
set.seed(seed)
pancake <- sample(x = 1:3, size = 1)
sides <- matrix(c(1, 1, 1, 0, 0, 0), nrow = 2, ncol = 3)[, pancake]
sample(sides)
}
```
Let's take this baby for a whirl.
```{r, echo = F}
# save(list = c("n_sim", "d"), file = "sims/14.sim_1.rda")
load("sims/14.sim_1.rda")
```
```{r, eval = F}
# how many simulations would you like?
n_sim <- 1e4
d <-
tibble(seed = 1:n_sim) %>%
mutate(burnt = map(seed, sim_pancake)) %>%
unnest(burnt) %>%
mutate(side = rep(c("up", "down"), times = n() / 2))
```
Take a look at what we've done.
```{r}
head(d, n = 10)
```
And now we'll `spread()` and `summarise()` to get the value we've been working for.
```{r}
d %>%
spread(key = side, value = burnt) %>%
summarise(`p (burnt_down | burnt_up)` = sum(up == 1 & down == 1) / (sum(up)))
```
The results are within rounding error of the ideal 2/3.
> Probability theory is not difficult mathematically. It's just counting. But it is hard to interpret and apply. Doing so often seems to require some cleverness, and authors have an incentive to solve problems in clever ways, just to show off. But we don’t need that cleverness, if we ruthlessly apply conditional probability...
>
> In this chapter, [we'll] meet two commonplace applications of this assume-and-deduce strategy. The first is the incorporation of measurement error into our models. The second is the estimation of missing data through Bayesian imputation...
>
> In neither application do [we] have to intuit the consequences of measurement errors nor the implications of missing values in order to design the models. All [we] have to do is state [the] information about the error or about the variables with missing values. Logic does the rest. (p. 424)
## Measurement error
Let's grab those `WaffleDivorce` data from back in [Chapter 5][Spurious associations].
```{r, message = F, warning = F}
library(rethinking)
data(WaffleDivorce)
d <- WaffleDivorce
rm(WaffleDivorce)
```
Switch out rethinking for brms.
```{r, message = F}
detach(package:rethinking, unload = T)
library(brms)
```
The brms package currently supports `theme_black()`, which changes the default ggplot2 theme to a black background with white lines, text, and so forth. The origins of this version of the code are from [this post](https://jonlefcheck.net/2013/03/11/black-theme-for-ggplot2-2/) on Jon Lefcheck's website.
Though I like the idea of brms including `theme_black()`, I'm not a fan of some of the default settings (e.g., it includes gridlines). Happily, data scientist [Tyler Rinker](https://github.com/trinker) has some nice alternative `theme_black()` code you can find [here](https://github.com/trinker/plotflow/blob/master/R/theme_black.R). The version of `theme_black()` used for this chapter is based on his version, with a few amendments of my own.
```{r}
theme_black <-
function(base_size=12, base_family="") {
theme_grey(base_size=base_size, base_family=base_family) %+replace%
theme(
# specify axis options
axis.line=element_blank(),
# all text colors used to be "grey55"
axis.text.x=element_text(size=base_size*0.8, color="grey85",
lineheight=0.9, vjust=1),
axis.text.y=element_text(size=base_size*0.8, color="grey85",
lineheight=0.9,hjust=1),
axis.ticks=element_line(color="grey55", size = 0.2),
axis.title.x=element_text(size=base_size, color="grey85", vjust=1,
margin=ggplot2::margin(.5, 0, 0, 0, "lines")),
axis.title.y=element_text(size=base_size, color="grey85", angle=90,
margin=ggplot2::margin(.5, 0, 0, 0, "lines"), vjust=0.5),
axis.ticks.length=grid::unit(0.3, "lines"),
# specify legend options
legend.background=element_rect(color=NA, fill="black"),
legend.key=element_rect(color="grey55", fill="black"),
legend.key.size=grid::unit(1.2, "lines"),
legend.key.height=NULL,
legend.key.width=NULL,
legend.text=element_text(size=base_size*0.8, color="grey85"),
legend.title=element_text(size=base_size*0.8, face="bold",hjust=0,
color="grey85"),
# legend.position="right",
legend.position = "none",
legend.text.align=NULL,
legend.title.align=NULL,
legend.direction="vertical",
legend.box=NULL,
# specify panel options
panel.background=element_rect(fill="black", color = NA),
panel.border=element_rect(fill=NA, color="grey55"),
panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),
panel.spacing=grid::unit(0.25,"lines"),
# specify facetting options
strip.background=element_rect(fill = "black", color="grey10"), # fill="grey30"
strip.text.x=element_text(size=base_size*0.8, color="grey85"),
strip.text.y=element_text(size=base_size*0.8, color="grey85",
angle=-90),
# specify plot options
plot.background=element_rect(color="black", fill="black"),
plot.title=element_text(size=base_size*1.2, color="grey85", hjust = 0), # added hjust = 0
plot.subtitle=element_text(size=base_size*.9, color="grey85", hjust = 0), # added line
# plot.margin=grid::unit(c(1, 1, 0.5, 0.5), "lines")
plot.margin=grid::unit(c(0.5, 0.5, 0.5, 0.5), "lines")
)
}
```
One way to use our `theme_black()` is to make it part of the code for an individual plot, such as `ggplot() + geom_point() + theme_back()`. Another way is to make `theme_black()` the default setting with `ggplot2::theme_set()`. That's the method we'll use.
```{r, message = F, warning = F}
theme_set(theme_black())
# to reset the default ggplot2 theme to its default parameters,
# execute `theme_set(theme_default())`
```
In the [brms reference manual](https://cran.r-project.org/package=brms/brms.pdf), Bürkner recommended complimenting `theme_black()` with color scheme "C" from the [viridis package](https://github.com/sjmgarnier/viridis), which provides a variety of [colorblind-safe color palettes](https://cran.r-project.org/package=viridis/vignettes/intro-to-viridis.html).
```{r, message = F, warning = F}
# install.packages("viridis")
library(viridis)
```
The `viridis_pal()` function gives a list of colors within a given palette. The colors in each palette fall on a spectrum. Within `viridis_pal()`, the `option` argument allows one to select a given spectrum, "C", in our case. The final parentheses, `()`, allows one to determine how many discrete colors one would like to break the spectrum up by. We'll choose 7.
```{r}
viridis_pal(option = "C")(7)
```
With a little data wrangling, we can put the colors of our palette in a tibble and display them in a plot.
```{r, fig.height = 2, fig.width = 4}
tibble(number = 1:7,
color_number = str_c(1:7, ". ", viridis_pal(option = "C")(7))) %>%
ggplot(aes(x = factor(0), y = reorder(color_number, number))) +
geom_tile(aes(fill = factor(number))) +
geom_text(aes(color = factor(number), label = color_number)) +
scale_color_manual(values = c(rep("black", times = 4),
rep("white", times = 3))) +
scale_fill_viridis(option = "C", discrete = T, direction = -1) +
scale_x_discrete(NULL, breaks = NULL) +
scale_y_discrete(NULL, breaks = NULL) +
ggtitle("Behold: viridis C!")
```
Now, let's make use of our custom theme and reproduce/reimagine Figure 14.1.a.
```{r}
color <- viridis_pal(option = "C")(7)[7]
p1 <-
d %>%
ggplot(aes(x = MedianAgeMarriage,
y = Divorce,
ymin = Divorce - Divorce.SE,
ymax = Divorce + Divorce.SE)) +
geom_pointrange(shape = 20, alpha = 2/3, color = color) +
labs(x = "Median age marriage" ,
y = "Divorce rate")
```
Notice how `viridis_pal(option = "C")(7)[7]` called the seventh color in the color scheme, `"#F0F921FF"`. For Figure 14.1.b, we'll select the sixth color in the palette by coding `viridis_pal(option = "C")(7)[6]`. We'll then combine the two subplots with patchwork.
```{r, fig.width = 7.25, fig.height = 3.5}
color <- viridis_pal(option = "C")(7)[6]
p2 <-
d %>%
ggplot(aes(x = log(Population),
y = Divorce,
ymin = Divorce - Divorce.SE,
ymax = Divorce + Divorce.SE)) +
geom_pointrange(shape = 20, alpha = 2/3, color = color) +
scale_y_continuous(NULL, breaks = NULL) +
xlab("log population")
library(patchwork)
p1 | p2
```
Just like in the text, our plot shows states with larger populations tend to have smaller measurement error. The relation between measurement error and `MedianAgeMarriage` is less apparent.
### Error on the outcome.
To get a better sense of what we're about to do, imagine for a moment that each state's divorce rate is normally distributed with a mean of `Divorce` and standard deviation `Divorce.SE`. Those distributions would be:
```{r}
d %>%
mutate(Divorce_distribution = str_c("Divorce ~ Normal(", Divorce, ", ", Divorce.SE, ")")) %>%
select(Loc, Divorce_distribution) %>%
head()
```
As in the text,
> in [the following] example we'll use a Gaussian distribution with mean equal to the observed value and standard deviation equal to the measurement's standard error. This is the logical choice, because if all we know about the error is its standard deviation, then the maximum entropy distribution for it will be Gaussian...
>
> Here's how to define the distribution for each divorce rate. For each observed value $D_{\text{OBS},i}$, there will be one parameter, $D_{\text{EST},i}$, defined by:
>
> $$D_{\text{OBS},i} \sim \text{Normal} (D_{\text{EST},i}, D_{\text{SE},i})$$
>
> All this does is define the measurement $D_{\text{OBS},i}$ as having the specified Gaussian distribution centered on the unknown parameter $D_{\text{EST},i}$. So the above defines a probability for each State $i$'s observed divorce rate, given a known measurement error. (pp. 426--427)
Now we're ready to fit some models. In brms, there are at least two ways to accommodate measurement error in the criterion. The first way uses the `se()` syntax, following the form `<response> | se(<se_response>, sigma = TRUE)`. With this syntax, `se` stands for standard error, the loose frequentist analogue to the Bayesian posterior $SD$. Unless you're [fitting a meta-analysis](https://mvuorre.github.io/post/2016/09/29/meta-analysis-is-a-special-case-of-bayesian-multilevel-modeling/) on summary information, which we'll be doing at the end of this chapter, make sure to specify `sigma = TRUE`. Without that you'll have no posterior for $\sigma$! For more information on the `se()` method, go to the [brms reference manual](https://cran.r-project.org/package=brms/brms.pdf) and find the *Additional response information* subsection of the `brmsformula` section.
The second way uses the `mi()` syntax, following the form `<response> | mi(<se_response>)`. This follows a missing data logic, resulting in Bayesian missing data imputation for the criterion values. The `mi()` syntax is based on the newer missing data capabilities for brms. We will cover that in more detail in the second half of this chapter.
We'll start off using both methods. Our first model, `b14.1_se`, will follow the `se()` syntax; the second model, `b14.1_mi`, will follow the `mi()` syntax.
```{r b14.1_se}
# put the data into a `list()`
dlist <- list(
div_obs = d$Divorce,
div_sd = d$Divorce.SE,
R = d$Marriage,
A = d$MedianAgeMarriage)
# here we specify the initial (i.e., starting) values
inits <- list(Yl = dlist$div_obs)
inits_list <- list(inits, inits)
# fit the models
b14.1_se <-
brm(data = dlist,
family = gaussian,
div_obs | se(div_sd, sigma = TRUE) ~ 0 + Intercept + R + A,
prior = c(prior(normal(0, 10), class = b),
prior(cauchy(0, 2.5), class = sigma)),
iter = 5000, warmup = 1000, cores = 2, chains = 2,
seed = 14,
control = list(adapt_delta = 0.99,
max_treedepth = 12),
inits = inits_list,
file = "fits/b14.01_se")
b14.1_mi <-
brm(data = dlist,
family = gaussian,
div_obs | mi(div_sd) ~ 0 + Intercept + R + A,
prior = c(prior(normal(0, 10), class = b),
prior(cauchy(0, 2.5), class = sigma)),
iter = 5000, warmup = 1000, cores = 2, chains = 2,
seed = 14,
control = list(adapt_delta = 0.99,
max_treedepth = 12),
save_mevars = TRUE, # note this line for the `mi()` model
inits = inits_list,
file = "fits/b14.01_mi")
```
Before we dive into the model summaries, notice how the starting values (i.e., `inits`) differ by model. Even though we coded `inits = inits_list` for both models, the differ by `fit@inits`.
```{r}
b14.1_se$fit@inits
b14.1_mi$fit@inits
```
As we explore further, it should become apparent why. Here are the primary model summaries.
```{r}
print(b14.1_se)
print(b14.1_mi)
```
Based on the `print()`/`summary()` information, the main parameters for the models are about the same. However, the plot deepens when we summarize the models with the `broom::tidy()` method.
```{r}
library(broom)
tidy(b14.1_se) %>%
mutate_if(is.numeric, round, digits = 2)
tidy(b14.1_mi) %>%
mutate_if(is.numeric, round, digits = 2)
# you can get similar output with `b14.1_mi$fit`
```
Again, from `b_Intercept` to `sigma`, the output is about the same. But model `b14.1_mi`, based on the `mi()` syntax, contained posterior summaries for all 50 of the criterion values. The `se()` method gave us similar model result, but no posterior summaries for the 50 criterion values. The rethinking package indexed those additional 50 as `div_est[i]`; with the `mi()` method, brms indexed them as `Yl[i]`--no big deal. So while both brms methods accommodated measurement error, the `mi()` method appears to be the brms analogue to what McElreath did with his model `m14.1` in the text. Thus, it's our `b14.1_mi` model that follows the form
\begin{align*}
\text{Divorce}_{\text{estimated}, i} & \sim \text{Normal} (\mu_i, \sigma) \\
\mu & = \alpha + \beta_1 \text A_i + \beta_2 \text R_i \\
\text{Divorce}_{\text{observed}, i} & \sim \text{Normal} (\text{Divorce}_{\text{estimated}, i}, \text{Divorce}_{\text{standard error}, i}) \\
\alpha & \sim \text{Normal} (0, 10) \\
\beta_1 & \sim \text{Normal} (0, 10) \\
\beta_2 & \sim \text{Normal} (0, 10) \\
\sigma & \sim \text{HalfCauchy} (0, 2.5).
\end{align*}
*Note*. The `normal(0, 10)` prior McElreath used was [quite informative and can lead to discrepancies between the rethinking and brms results](https://github.com/paul-buerkner/brms/issues/114) if you're not careful. A large issue is the default way brms handles intercept priors. From the hyperlink, Bürkner wrote:
> The formula for the original intercept is
`b_intercept = temp_intercept - dot_product(means_X, b)`, where `means_X` is the vector of means of the predictor variables and b is the vector of regression coefficients (fixed effects). That is, when transforming a prior on the intercept to an "equivalent" prior on the temporary intercept, you have to take the means of the predictors and well as the priors on the other coefficients into account.
If this seems confusing, you have an alternative. The `0 + intercept` part of the brm formula kept the intercept in the metric of the untransformed data, leading to similar results to those from rethinking. When your priors are vague, this might not be much of an issue. And since many of the models in *Statistical Rethinking* use only weakly-regularizing priors, this hasn't been much of an issue up to this point. But this model is quite sensitive to the intercept syntax. My general recommendation for applied data analysis is this: **If your predictors aren't mean centered, default to the** `0 + intercept` **syntax for the** `formula` **argument when using** `brms::brm()`. Otherwise, your priors might not be doing what you think they're doing.
Anyway, since our `mi()`-syntax `b14.1_mi` model appears to be the analogue to McElreath's `m14.1`, we'll use that one for our plots. Here's the code for our Figure 14.2.a.
```{r, fig.width = 4, fig.height = 3.5, warning = F}
data_error <-
fitted(b14.1_mi) %>%
as_tibble() %>%
bind_cols(d)
color <- viridis_pal(option = "C")(7)[5]
p1 <-
data_error %>%
ggplot(aes(x = Divorce.SE, y = Estimate - Divorce)) +
geom_hline(yintercept = 0, linetype = 2, color = "white") +
geom_point(alpha = 2/3, size = 2, color = color) +
labs(x = "Observed standard error for divorce",
y = "Divorce (estimate - observed)")
```
Before we make Figure 14.2.b, we need to fit a model that ignores measurement error.
```{r b14.1b}
b14.1b <-
brm(data = dlist,
family = gaussian,
div_obs ~ 0 + Intercept + R + A,
prior = c(prior(normal(0, 50), class = b, coef = Intercept),
prior(normal(0, 10), class = b),
prior(cauchy(0, 2.5), class = sigma)),
chains = 2, iter = 5000, warmup = 1000, cores = 2,
seed = 14,
control = list(adapt_delta = 0.95),
file = "fits/b14.01b")
```
```{r}
print(b14.1b)
```
With the ignore-measurement-error fit in hand, we're ready for Figure 14.2.b.
```{r, fig.width = 7.5, fig.height = 3.5}
nd <-
tibble(R = mean(d$Marriage),
A = seq(from = 22, to = 30.2, length.out = 30),
div_sd = mean(d$Divorce.SE))
# red line
f_error <-
fitted(b14.1_mi, newdata = nd) %>%
as_tibble() %>%
bind_cols(nd)
# yellow line
f_no_error <-
fitted(b14.1b, newdata = nd) %>%
as_tibble() %>%
bind_cols(nd)
# white dots
data_error <-
fitted(b14.1_mi) %>%
as_tibble() %>%
bind_cols(b14.1_mi$data)
color_y <- viridis_pal(option = "C")(7)[7]
color_r <- viridis_pal(option = "C")(7)[4]
# plot
p2 <-
f_no_error %>%
ggplot(aes(x = A, y = Estimate)) +
# `f_no_error`
geom_smooth(aes(ymin = Q2.5, ymax = Q97.5),
stat = "identity",
fill = color_y, color = color_y,
alpha = 1/4, size = 1/2, linetype = 2) +
# `f_error`
geom_smooth(data = f_error,
aes(ymin = Q2.5, ymax = Q97.5),
stat = "identity",
fill = color_r, color = color_r,
alpha = 1/3, size = 1/2, linetype = 1) +
geom_pointrange(data = data_error,
aes(ymin = Estimate - Est.Error,
ymax = Estimate + Est.Error),
color = "white", shape = 20, alpha = 1/2) +
scale_y_continuous(breaks = seq(from = 4, to = 14, by = 2)) +
labs(x = "Median age marriage" ,
y = "Divorce rate (posterior)") +
coord_cartesian(xlim = range(data_error$A),
ylim = c(4, 15))
p1 | p2
```
In our plot on the right, it's the reddish regression line that accounts for measurement error.
### Error on both outcome and predictor.
In brms, you can specify error on predictors with an `me()` statement in the form of `me(predictor, sd_predictor)` where `sd_predictor` is a vector in the data denoting the size of the measurement error, presumed to be in a standard-deviation metric.
```{r b14.2_se}
# the data
dlist <- list(
div_obs = d$Divorce,
div_sd = d$Divorce.SE,
mar_obs = d$Marriage,
mar_sd = d$Marriage.SE,
A = d$MedianAgeMarriage)
# the `inits`
inits <- list(Yl = dlist$div_obs)
inits_list <- list(inits, inits)
# the models
b14.2_se <-
brm(data = dlist,
family = gaussian,
div_obs | se(div_sd, sigma = TRUE) ~ 0 + Intercept + me(mar_obs, mar_sd) + A,
prior = c(prior(normal(0, 10), class = b),
prior(cauchy(0, 2.5), class = sigma)),
iter = 5000, warmup = 1000, chains = 3, cores = 3,
seed = 14,
control = list(adapt_delta = 0.95),
save_mevars = TRUE, # note the lack if `inits`
file = "fits/b14.02_se")
b14.2_mi <-
brm(data = dlist,
family = gaussian,
div_obs | mi(div_sd) ~ 0 + Intercept + me(mar_obs, mar_sd) + A,
prior = c(prior(normal(0, 10), class = b),
prior(cauchy(0, 2.5), class = sigma)),
iter = 5000, warmup = 1000, cores = 2, chains = 2,
seed = 14,
control = list(adapt_delta = 0.99,
max_treedepth = 12),
save_mevars = TRUE,
inits = inits_list,
file = "fits/b14.02_mi")
```
We already know including `inits` values for our `Yl[i]` estimates is a waste of time for our `se()` model. But note how we still defined our `inits` values as `inits <- list(Yl = dlist$div_obs)` for the `mi()` model. Although it's easy in brms to set the starting values for our `Yl[i]` estimates, much the way McElreath did, that is not the case when you have measurement error on the predictors. The brms package uses a non-centered parameterization for these, which requires users to have a deeper understanding of the underlying Stan code. This is where I get off the train, but if you want to go further, execute `stancode(b14.2_mi)`.
Here are the two versions of the model.
```{r}
print(b14.2_se)
print(b14.2_mi)
```
We'll use `broom::tidy()`, again, to get a sense of `depth=2` summaries.
```{r, results = 'hide'}
tidy(b14.2_se) %>%
mutate_if(is.numeric, round, digits = 2)
tidy(b14.2_mi) %>%
mutate_if(is.numeric, round, digits = 2)
```
Due to space concerns, I'm not going to show the results, here. You can do that on your own. Both methods yielded the posteriors for `Xme_memar_obs[1]`, but only the `b14.2_mi` model based on the `mi()` syntax yielded posteriors for the criterion, the `Yl[i]` summaries.
Note that you'll need to specify `save_mevars = TRUE` in the `brm()` function in order to save the posterior samples of error-adjusted variables obtained by using the `me()` argument. Without doing so, functions like `predict()` may give you trouble.
Here is the code for Figure 14.3.a.
```{r}
data_error <-
fitted(b14.2_mi) %>%
as_tibble() %>%
bind_cols(d)
color <- viridis_pal(option = "C")(7)[3]
p1 <-
data_error %>%
ggplot(aes(x = Divorce.SE, y = Estimate - Divorce)) +
geom_hline(yintercept = 0, linetype = 2, color = "white") +
geom_point(alpha = 2/3, size = 2, color = color) +
labs(x = "Observed standard error for marriage rate",
y = "Marriage rate (estimate - observed)")
```
To get the posterior samples for error-adjusted `Marriage` rate, we'll use `posterior_samples`. If you examine the object with `glimpse()`, you'll notice 50 `Xme_memar_obsmar_sd[i]` vectors, with $i$ ranging from 1 to 50, each corresponding to one of the 50 states. With a little data wrangling, you can get the mean of each to put in a plot. Once we have those summaries, we can make our version of Figure 14.4.b.
```{r, message = F, warning = F, fig.width = 7.5, fig.height = 3.5}
color_y <- viridis_pal(option = "C")(7)[7]
color_p <- viridis_pal(option = "C")(7)[2]
p2 <-
posterior_samples(b14.2_mi) %>%
select(starts_with("Xme")) %>%
gather() %>%
# this extracts the numerals from the otherwise cumbersome names in `key` and saves them as integers
mutate(key = str_extract(key, "\\d+") %>% as.integer()) %>%
group_by(key) %>%
summarise(mean = mean(value)) %>%
bind_cols(data_error) %>%
ggplot(aes(x = mean, y = Estimate)) +
geom_segment(aes(xend = Marriage, yend = Divorce),
color = "white", size = 1/4) +
geom_point(size = 2, alpha = 2/3, color = color_y) +
geom_point(aes(x = Marriage, y = Divorce),
size = 2, alpha = 2/3, color = color_p) +
scale_y_continuous(breaks = seq(from = 4, to = 14, by = 2)) +
labs(x = "Marriage rate (posterior)" ,
y = "Divorce rate (posterior)") +
coord_cartesian(ylim = c(4, 14.5))
p1 | p2
```
In the right panel, the yellow points are model-implied; the purple ones are of the original data. It turns out our brms model regularized more aggressively than McElreath's rethinking model. I'm unsure of why. If you understand the difference, [please share with the rest of the class](https://github.com/ASKurz/Statistical_Rethinking_with_brms_ggplot2_and_the_tidyverse/issues).
Anyway,
> the big take home point for this section is that when you have a distribution of values, don't reduce it down to a single value to use in a regression. Instead, use the entire distribution. Anytime we use an average value, discarding the uncertainty around that average, we risk overconfidence and spurious inference. This doesn’t only apply to measurement error, but also to cases which data are averaged before analysis.
>
> Do not average. Instead, model. (p. 431)
## Missing data
Starting with [version 2.2.0](https://cran.r-project.org/package=brms/news/news.html) brms now supports Bayesian missing data imputation using adaptations of the [multivariate syntax](https://cran.r-project.org/package=brms/vignettes/brms_multivariate.html). Bürkner's [*Handle Missing Values with brms* vignette](https://cran.r-project.org/package=brms/vignettes/brms_missings.html) is quite helpful.
### Imputing `neocortex`
Once again, here are the `milk` data.
```{r, message = F, warning = F}
library(rethinking)
data(milk)
d <- milk
d <-
d %>%
mutate(neocortex.prop = neocortex.perc / 100,
logmass = log(mass))
```
Now we'll switch out rethinking for brms and do a little data wrangling.
```{r, message = F, warning = F}
detach(package:rethinking, unload = T)
library(brms)
rm(milk)
# prep data
data_list <-
list(kcal = d$kcal.per.g,
neocortex = d$neocortex.prop,
logmass = d$logmass)
```
Here's the structure of our data list.
```{r}
data_list
```
Our statistical model follows the form
\begin{align*}
\text{kcal}_i & \sim \text{Normal} (\mu_i, \sigma) \\
\mu_i & = \alpha + \beta_1 \text{neocortex}_i + \beta_2 \text{logmass}_i \\
\text{neocortex}_i & \sim \text{Normal} (\nu, \sigma_\text{neocortex}) \\
\alpha & \sim \text{Normal} (0, 100) \\
\beta_1 & \sim \text{Normal} (0, 10) \\
\beta_2 & \sim \text{Normal} (0, 10) \\
\sigma & \sim \text{HalfCauchy} (0, 1) \\
\nu & \sim \text{Normal} (0.5, 1) \\
\sigma_\text{neocortex} & \sim \text{HalfCauchy} (0, 1).
\end{align*}
If you look closely, you'll discover the prior McElreath reported in the model equation for the intercept, $\alpha \sim \text{Normal} (0, 10)$, does not match up with the prior he used in R code 14.7, `a ~ dnorm(0,100)`. Here we use the latter.
When writing a multivariate model in brms, I find it easier to save the model code by itself and then insert it into the `brm()` function. Otherwise, things get cluttered in a hurry.
```{r}
b_model <-
# here's the primary `kcal` model
bf(kcal ~ 1 + mi(neocortex) + logmass) +
# here's the model for the missing `neocortex` data
bf(neocortex | mi() ~ 1) +
# here we set the residual correlations for the two models to zero
set_rescor(FALSE)
```
Note the `mi(neocortex)` syntax in the `kcal` model. This indicates that the predictor, `neocortex`, has missing values that are themselves being modeled.
To get a sense of how to specify the priors for such a model, use the `get_prior()` function.
```{r}
get_prior(data = data_list,
family = gaussian,
b_model)
```
With the one-step Bayesian imputation procedure in brms, you might need to use the `resp` argument when specifying non-defaut priors.
Anyway, here we fit the model.
```{r b14.3}
b14.3 <-
brm(data = data_list,
family = gaussian,
b_model, # here we insert the model
prior = c(prior(normal(0, 100), class = Intercept, resp = kcal),
prior(normal(0.5, 1), class = Intercept, resp = neocortex),
prior(normal(0, 10), class = b, resp = kcal),
prior(cauchy(0, 1), class = sigma, resp = kcal),
prior(cauchy(0, 1), class = sigma, resp = neocortex)),
iter = 1e4, chains = 2, cores = 2,
seed = 14,
file = "fits/b14.03")
```
The imputed `neocortex` values are indexed by occasion number from the original data.
```{r}
tidy(b14.3) %>%
mutate_if(is.numeric, round, digits = 2)
```
Here's the model that drops the cases with NAs on `neocortex`.
```{r b14.3cc}
b14.3cc <-
brm(data = data_list,
family = gaussian,
kcal ~ 1 + neocortex + logmass,
prior = c(prior(normal(0, 100), class = Intercept),
prior(normal(0, 10), class = b),
prior(cauchy(0, 1), class = sigma)),
iter = 1e4, chains = 2, cores = 2,
seed = 14,
file = "fits/b14.03cc")
```
The parameters:
```{r}
tidy(b14.3cc) %>%
mutate_if(is.numeric, round, digits = 2)
```
In order to make our versions of Figure 14.4, we'll need to do a little data wrangling with `fitted()`.
```{r, warning = F}
nd <-
tibble(neocortex = seq(from = .5, to = .85, length.out = 30),
logmass = median(data_list$logmass))
f_b14.3 <-
fitted(b14.3, newdata = nd) %>%
as_tibble() %>%
bind_cols(nd)
f_b14.3 %>%
glimpse()
```
To include the imputed `neocortex` values in the plot, we'll extract the information from `broom::tidy()`.
```{r}
f_b14.3_mi <-
tidy(b14.3) %>%
filter(str_detect(term, "Ymi")) %>%
bind_cols(data_list %>%
as_tibble() %>%
filter(is.na(neocortex)))
f_b14.3_mi %>% head()
```
Data wrangling done--here's our code for Figure 14.4.a.
```{r}
color <- viridis_pal(option = "D")(7)[4]
p1 <-
f_b14.3 %>%
ggplot(aes(x = neocortex)) +
geom_smooth(aes(y = Estimate.kcal, ymin = Q2.5.kcal, ymax = Q97.5.kcal),
stat = "identity",
fill = color, color = color, alpha = 1/3, size = 1/2) +
geom_point(data = data_list %>% as_tibble(),
aes(y = kcal),
color = "white") +
geom_point(data = f_b14.3_mi,
aes(x = estimate, y = kcal),
color = color, shape = 1) +
geom_segment(data = f_b14.3_mi,
aes(x = lower, xend = upper,
y = kcal, yend = kcal),
color = color, size = 1/4) +
labs(subtitle = "Note: For the regression line in this plot,\nlog(mass) has been set to its median, 1.244.",
x = "neocortex proportion",
y = "kcal per gram") +
coord_cartesian(xlim = c(.55, .8),
ylim = range(data_list$kcal, na.rm = T))
```
Here we make Figure 14.4.b, combine it with Figure 14.4.a, and plot.
```{r, fig.width = 7.5, fig.height = 3.75, warning = F}
color <- viridis_pal(option = "D")(7)[4]
p2 <-
data_list %>%
as_tibble() %>%
ggplot(aes(x = logmass, y = neocortex)) +
geom_point(color = "white") +
geom_pointrange(data = f_b14.3_mi,
aes(y = estimate, ymin = lower, ymax = upper),
color = color, size = 1/3, shape = 1) +
scale_x_continuous("log(mass)", breaks = -2:4) +
ylab("neocortex proportion") +
coord_cartesian(xlim = range(data_list$logmass, na.rm = T),
ylim = c(.55, .8))
p1 | p2
```
### Improving the imputation model
Like McElreath, we'll update the imputation line of our statistical model to:
\begin{align*}
\text{neocortex}_i & \sim \text{Normal} (\nu_i, \sigma_\text{neocortex}) \\
\nu_i & = \alpha_\text{neocortex} + \gamma_1 \text{logmass}_i,
\end{align*}
which includes the updated priors
\begin{align*}
\alpha_\text{neocortex} & \sim \text{Normal} (0.5, 1) \\
\gamma_1 & \sim \text{Normal} (0, 10).
\end{align*}
As far as the brms code goes, adding `logmass` as a predictor to the `neocortex` submodel is pretty simple.
```{r b14.4}
# define the model
b_model <-
bf(kcal ~ 1 + mi(neocortex) + logmass) +
bf(neocortex | mi() ~ 1 + logmass) + # here's the big difference
set_rescor(FALSE)
# fit the model
b14.4 <-
brm(data = data_list,
family = gaussian,
b_model,
prior = c(prior(normal(0, 100), class = Intercept, resp = kcal),
prior(normal(0.5, 1), class = Intercept, resp = neocortex),
prior(normal(0, 10), class = b, resp = kcal),
prior(normal(0, 10), class = b, resp = neocortex),
prior(cauchy(0, 1), class = sigma, resp = kcal),
prior(cauchy(0, 1), class = sigma, resp = neocortex)),
iter = 1e4, chains = 2, cores = 2,
seed = 14,
file = "fits/b14.04")
```
Behold the parameter estimates.
```{r}
tidy(b14.4) %>%
mutate_if(is.numeric, round, digits = 2)
```
Here's our pre-Figure 14.5 data wrangling.
```{r}
f_b14.4 <-
fitted(b14.4, newdata = nd) %>%
as_tibble() %>%
bind_cols(nd)
f_b14.4_mi <-
tidy(b14.4) %>%
filter(str_detect(term, "Ymi")) %>%
bind_cols(
data_list %>%
as_tibble() %>%
filter(is.na(neocortex))
)
f_b14.4 %>%
glimpse()
f_b14.4_mi %>%
glimpse()
```
For our final plots, let's play around with colors from `viridis_pal(option = "D")`. Here's the code for Figure 14.5.a.
```{r, fig.width = 4, fig.height = 3.75, warning = F}
color <- viridis_pal(option = "D")(7)[3]
p1 <-
f_b14.4 %>%
ggplot(aes(x = neocortex)) +
geom_smooth(aes(y = Estimate.kcal, ymin = Q2.5.kcal, ymax = Q97.5.kcal),
stat = "identity",
fill = color, color = color, alpha = 1/2, size = 1/2) +
geom_point(data = data_list %>% as_tibble(),
aes(y = kcal),
color = "white") +
geom_point(data = f_b14.4_mi,
aes(x = estimate, y = kcal),
color = color, shape = 1) +
geom_segment(data = f_b14.4_mi,
aes(x = lower, xend = upper,
y = kcal, yend = kcal),
color = color, size = 1/4) +
labs(subtitle = "Note: For the regression line in this plot,\nlog(mass) has been set to its median, 1.244.",
x = "neocortex proportion",
y = "kcal per gram") +
coord_cartesian(xlim = c(.55, .8),
ylim = range(data_list$kcal, na.rm = T))
```
Make the code for Figure 14.5.b, combine it with Figure 14.5.a, and plot.
```{r, fig.width = 7.5, fig.height = 3.75, warning = F}
color <- viridis_pal(option = "D")(7)[3]
p2 <-
data_list %>%
as_tibble() %>%
ggplot(aes(x = logmass, y = neocortex)) +
geom_point(color = "white") +
geom_pointrange(data = f_b14.4_mi,
aes(y = estimate, ymin = lower, ymax = upper),
color = color, size = 1/3, shape = 1) +
scale_x_continuous("log(mass)", breaks = -2:4) +
ylab("neocortex proportion") +
coord_cartesian(xlim = range(data_list$logmass, na.rm = T),
ylim = c(.55, .8))
p1 | p2
```
If modern missing data methods are new to you, you might also check out van Burren's great online text, [*Flexible Imputation of Missing Data. Second Edition*](https://stefvanbuuren.name/fimd/). I'm also a fan of Enders's [*Applied Missing Data Analysis*](http://www.appliedmissingdata.com), for which you can find a free sample chapter [here](http://www.appliedmissingdata.com/sample-chapter.pdf). I'll also quickly mention that [brms accommodates multiple imputation](https://cran.r-project.org/package=brms/vignettes/brms_missings.html), too.
## ~~Summary~~ Bonus: Meta-analysis
If your mind isn't fully blown by those measurement-error and missing-data models, let's keep building. As it turns out, meta-analyses are often just special kinds of multilevel measurement-error models. Thus, you can use `brms::brm()` to fit Bayesian meta-analyses, too.
Before we proceed, I should acknowledge that this section is heavily influenced by [Matti Vourre](https://mvuorre.github.io/#about)'s great blog post, [*Meta-analysis is a special case of Bayesian multilevel modeling*](https://mvuorre.github.io/post/2016/09/29/meta-analysis-is-a-special-case-of-bayesian-multilevel-modeling/). And since McElreath's text doesn’t directly address meta-analyses, we'll also have to borrow a bit from Gelman, Carlin, Stern, Dunson, Vehtari, and Rubin's [*Bayesian data analysis, Third edition*](https://stat.columbia.edu/~gelman/book/). We'll let Gelman and colleagues introduce the topic:
> Discussions of meta-analysis are sometimes imprecise about the estimands of interest in the analysis, especially when the primary focus is on testing the null hypothesis of no effect in any of the studies to be combined. Our focus is on estimating meaningful parameters, and for this objective there appear to be three possibilities, accepting the overarching assumption that the studies are comparable in some broad sense. The first possibility is that we view the studies as identical replications of each other, in the sense we regard the individuals in all the studies as independent samples from a common population, with the same outcome measures and so on. A second possibility is that the studies are so different that the results of any one study provide no information about the results of any of the others. A third, more general, possibility is that we regard the studies as exchangeable but not necessarily either identical or completely unrelated; in other words we allow differences from study to study, but such that the differences are not expected *a priori* to have predictable effects favoring one study over another.... this third possibility represents a continuum between the two extremes, and it is this exchangeable model (with unknown hyperparameters characterizing the population distribution) that forms the basis of our Bayesian analysis…
>
> The first potential estimand of a meta-analysis, or a hierarchically structured problem in general, is the mean of the distribution of effect sizes, since this represents the overall 'average' effect across all studies that could be regarded as exchangeable with the observed studies. Other possible estimands are the effect size in any of the observed studies and the effect size in another, comparable (exchangeable) unobserved study. (pp. 125—126, *emphasis* in the original)
The basic version of a Bayesian meta-analysis follows the form
$$y_i \sim \text{Normal}(\theta_i, \sigma_i),$$
where $y_i$ = the point estimate for the effect size of a single study, $i$, which is presumed to have been a draw from a Normal distribution centered on $\theta_i$. The data in meta-analyses are typically statistical summaries from individual studies. The one clear lesson from this chapter is that those estimates themselves come with error and those errors should be fully expressed in the meta-analytic model. Which we do. The standard error from study $i$ is specified $\sigma_i$, which is also a stand-in for the standard deviation of the Normal distribution from which the point estimate was drawn. Do note, we’re not estimating $\sigma_i$, here. Those values we take directly from the original studies.
Building on the model, we further presume that study $i$ is itself just one draw from a population of related studies, each of which have their own effect sizes. As such. we presume $\theta_i$ itself has a distribution following the form
$$\theta_i \sim \text{Normal} (\mu, \tau),$$
where $\mu$ is the meta-analytic effect (i.e., the population mean) and $\tau$ is the variation around that mean, what you might also think of as $\sigma_\tau$.
Since there's no example of a meta-analysis in the text, we'll have to get our data elsewhere. We'll focus on Gershoff and Grogan-Kaylor's (2016) paper, [*Spanking and child outcomes: Old controversies and new meta-analyses*](https://pdfs.semanticscholar.org/0d03/a2e9f085f0a268b4c0a52f5ac31c17a3e5f3.pdf). From their introduction, we read:
> Around the world, most children (80%) are spanked or otherwise physically punished by their parents ([UNICEF, 2014](https://www.unicef.org/publications/index_74865.html)). The question of whether parents should spank their children to correct misbehaviors sits at a nexus of arguments from ethical, religious, and human rights perspectives both in the U.S. and around the world ([Gershoff, 2013](https://onlinelibrary.wiley.com/doi/abs/10.1111/cdep.12038)). Several hundred studies have been conducted on the associations between parents' use of spanking or physical punishment and children's behavioral, emotional, cognitive, and physical outcomes, making spanking one of the most studied aspects of parenting. What has been learned from these hundreds of studies? (p. 453)
Our goal will be to learn Bayesian meta-analysis by answering part of that question. I've transcribed the values directly from Gershoff and Grogan-Kaylor’s paper and saved them as a file called `spank.xlsx`.
You can find the data in [this project's GitHub repository](https://github.com/ASKurz/Statistical_Rethinking_with_brms_ggplot2_and_the_tidyverse). Let's load them and `glimpse()`.
```{r}
spank <- readxl::read_excel("spank.xlsx")
glimpse(spank)
```
In this paper, the effect size of interest is a Cohen's $d$, derived from the formula
$$d = \frac{\mu_\text{treatment} - \mu_\text{comparison}}{\sigma_\text{pooled}},$$
where
$$\sigma_\text{pooled} = \sqrt{\frac{((n_1 - 1) \sigma_1^2) + ((n_2 - 1) \sigma_2^2)}{n_1 + n_2 -2}}.$$