-
Notifications
You must be signed in to change notification settings - Fork 12
/
06-statistical-testing.Rmd
869 lines (654 loc) · 52 KB
/
06-statistical-testing.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
# Statistical testing {#c06-statistical-testing}
\index{Statistical testing|(} \index{Hypothesis testing|see {Statistical testing}}
```{r}
#| label: stattest-styler
#| include: false
knitr::opts_chunk$set(tidy = 'styler')
```
::: {.prereqbox-header}
`r if (knitr:::is_html_output()) '### Prerequisites {- #prereq6}'`
:::
::: {.prereqbox data-latex="{Prerequisites}"}
For this chapter, load the following packages:
```{r}
#| label: stattest-setup
#| error: FALSE
#| warning: FALSE
#| message: FALSE
library(tidyverse)
library(survey)
library(srvyr)
library(srvyrexploR)
library(broom)
library(gt)
library(prettyunits)
```
We are using data from ANES and RECS described in Chapter \@ref(c04-getting-started). As a reminder, here is the code to create the design objects for each to use throughout this chapter. For ANES, we need to adjust the weight so it sums to the population instead of the sample (see the ANES documentation and Chapter \@ref(c04-getting-started) for more information).
```{r}
#| label: stattest-anes-des
#| eval: FALSE
targetpop <- 231592693
anes_adjwgt <- anes_2020 %>%
mutate(Weight = Weight / sum(Weight) * targetpop)
anes_des <- anes_adjwgt %>%
as_survey_design(
weights = Weight,
strata = Stratum,
ids = VarUnit,
nest = TRUE
)
```
For RECS, details are included in the RECS documentation and Chapters \@ref(c04-getting-started) and \@ref(c10-sample-designs-replicate-weights).
```{r}
#| label: stattest-recs-des
#| eval: FALSE
recs_des <- recs_2020 %>%
as_survey_rep(
weights = NWEIGHT,
repweights = NWEIGHT1:NWEIGHT60,
type = "JK1",
scale = 59/60,
mse = TRUE
)
```
:::
## Introduction
When analyzing survey results, the point estimates described in Chapter \@ref(c05-descriptive-analysis) help us understand the data at a high level. Still, we often want to make comparisons between different groups. These comparisons are calculated through statistical testing.
The general idea of statistical testing is the same for data obtained through surveys and data obtained through other methods, where we compare the point estimates and uncertainty estimates of each statistic to see if statistically significant differences exist. However, statistical testing for complex surveys involves additional considerations due to the need to account for the sampling design in order to obtain accurate uncertainty estimates.
Statistical testing, also called hypothesis testing, involves declaring a null and alternative hypothesis. A null hypothesis is denoted as $H_0$ and the alternative hypothesis is denoted as $H_A$. The null hypothesis is the default assumption in that there are no differences in the data, or that the data are operating under "standard" behaviors. On the other hand, the alternative hypothesis is the break from the "standard," and we are trying to determine if the data support this alternative hypothesis.
Let's review an example outside of survey data. If we are flipping a coin, a null hypothesis would be that the coin is fair and that each side has an equal chance of being flipped. In other words, the probability of the coin landing on each side is 1/2, whereas an alternative hypothesis could be that the coin is unfair and that one side has a higher probability of being flipped (e.g., a probability of 1/4 to get heads but a probability of 3/4 to get tails). We write this set of hypotheses as:
- $H_0: \rho_{heads} = \rho_{tails}$, where $\rho_{x}$ is the probability of flipping the coin and having it land on heads ($\rho_{heads}$) or tails ($\rho_{tails}$)
- $H_A: \rho_{heads} \neq \rho_{tails}$
\index{p-value|(}
When we conduct hypothesis testing, the statistical models calculate a p-value, which shows how likely we are to observe the data if the null hypothesis is true. If the p-value (a probability between 0 and 1) is small, we have strong evidence to reject the null hypothesis, as it is unlikely to see the data we observe if the null hypothesis is true. However, if the p-value is large, we say we do not have evidence to reject the null hypothesis. The size of the p-value for this cut-off is determined by Type 1 error known as $\alpha$. A common Type 1 error value for statistical testing is to use $\alpha = 0.05$^[For more information on statistical testing, we recommend reviewing introduction to statistics textbooks.]. Explanations of statistical testing often refer to confidence level. The confidence level is the inverse of the Type 1 error. Thus, if $\alpha = 0.05$, the confidence level would be 95%.
\index{p-value|)}
The functions in the {survey} package allow for the correct estimation of the uncertainty estimates (e.g., standard deviations and confidence intervals). This chapter covers the following statistical tests with survey data and the following functions from the {survey} package [@lumley2010complex]:
* Comparison of proportions (`svyttest()`)
* Comparison of means (`svyttest()`)
* Goodness-of-fit tests (`svygofchisq()`)
* Tests of independence (`svychisq()`)
* Tests of homogeneity (`svychisq()`)
## Dot notation {#dot-notation}
\index{Dot notation|(}
Up to this point, we have shown functions that use wrappers from the {srvyr} package. This means that the functions work with tidyverse syntax. However, the functions in this chapter do not have wrappers in the {srvyr} package and are instead used directly from the {survey} package. Therefore, the design object is not the first argument, and to use these functions with the magrittr pipe (`%>%`) and tidyverse syntax, we need to use dot (`.`) notation^[This could change in the future if another package is built or {srvyr} is expanded to work with {tidymodels} packages, but no such plans are known at this time.].
Functions that work with the magrittr pipe (`%>%`) have the dataset as the first argument. When we run a function with the pipe, it automatically places anything to the left of the pipe into the first argument of the function to the right of the pipe. For example, if we wanted to take the `towny` data from the {gt} package and filter to municipalities with the Census Subdivision Type of "city," we can write the code in at least four different ways:
1. `filter(towny, csd_type == "city")`
2. `towny %>% filter(csd_type == "city")`
3. `towny %>% filter(., csd_type == "city")`
4. `towny %>% filter(.data = ., csd_type == "city")`
Each of these lines of code produces the same output since the argument that takes the dataset is in the first spot in `filter()`. The first two are probably familiar to those who have worked with the tidyverse. The third option functions the same way as the second one but is explicit that `towny` goes into the first argument, and the fourth option indicates that `towny` is going into the named argument of `.data`. Here, we are telling R to take what is on the left side of the pipe (`towny`) and pipe it into the spot with the dot (`.`) --- the first argument.
In functions that are not part of the tidyverse, the data argument may not be in the first spot. For example, in \index{Functions in survey!svyttest|(}`svyttest()`, the data argument is in the second spot, which means we need to place the dot (`.`) in the second spot and not the first. For example: \index{svyttest|see {Functions in survey}}
```r
svydata_des %>%
svyttest(x ~ y, .)
```
By default, the pipe places the left-hand object in the first argument spot. Placing the dot (`.`) in the second argument spot indicates that the survey design object `svydata_des` should be used in the second argument and not the first.
Alternatively, named arguments could be used to place the dot first, as named arguments can appear at any location as in the following:
```r
svydata_des %>%
svyttest(design = ., x ~ y)
```
However, the following code does not work as the `svyttest()` function expects the formula as the first argument when arguments are not named:
```r
svydata_des %>%
svyttest(., x ~ y)
```
\index{Functions in survey!svyttest|)} \index{Dot notation|)}
## Comparison of proportions and means {#stattest-ttest}
\index{t-test|(}
We use t-tests to compare two proportions or means. T-tests allow us to determine if one proportion or mean is statistically different from another. \index{t-test!one-sample t-test|(}They are commonly used to determine if a single estimate differs from a known value (e.g., 0 or 50%) or to compare two group means (e.g., North versus South). Comparing a single estimate to a known value is called a one-sample t-test, and we can set up the hypothesis test as follows:
- $H_0: \mu = 0$ where $\mu$ is the mean outcome and $0$ is the value we are comparing it to
- $H_A: \mu \neq 0$
\index{t-test!one-sample t-test|)}
\index{t-test!two-sample t-test|(}
For comparing two estimates, this is called a two-sample t-test. We can set up the hypothesis test as follows:
- $H_0: \mu_1 = \mu_2$ where $\mu_i$ is the mean outcome for group $i$
- $H_A: \mu_1 \neq \mu_2$
\index{t-test!unpaired two-sample t-test|(} \index{t-test!paired two-sample t-test|(}
Two-sample t-tests can also be paired or unpaired. If the data come from two different populations (e.g., North versus South), the t-test run is an unpaired or independent samples t-test. Paired t-tests occur when the data come from the same population. This is commonly seen with data from the same population in two different time periods (e.g., before and after an intervention).
\index{t-test!two-sample t-test|)} \index{t-test!unpaired two-sample t-test|)} \index{t-test!paired two-sample t-test|)}
The difference between t-tests with non-survey data and survey data is based on the underlying variance estimation difference. Chapter \@ref(c10-sample-designs-replicate-weights) provides a detailed overview of the math behind the mean and sampling error calculations for various sample designs. The functions in the {survey} package account for these nuances, provided the design object is correctly defined.
### Syntax {#stattest-ttest-syntax}
\index{Functions in survey!svyttest|(}
When we do not have survey data, we can use the `t.test()` function from the {stats} package to run t-tests. This function does not allow for weights or the variance structure that need to be accounted for with survey data. Therefore, we need to use the `svyttest()` function from {survey} when using survey data. Many of the arguments are the same between the two functions, but there are a few key differences:
- We need to use the survey design object instead of the original data frame
- We can only use a formula and not separate x and y data
- The confidence level cannot be specified and is always set to 95%. However, we show examples of how the confidence level can be changed after running the `svyttest()` function by using the `confint()` function.
Here is the syntax for the `svyttest()` function:
```r
svyttest(formula,
design,
...)
```
The arguments are:
* `formula`: Formula, `outcome~group` for two-sample, `outcome~0` or `outcome~1` for one-sample. The group variable must be a factor or character with two levels, or be coded 0/1 or 1/2. We give more details on formula set-up below for different types of tests.
* `design`: survey design object
* `...`: This passes options on for one-sided tests only, and thus, we can specify `na.rm=TRUE`
\index{Dot notation|(}Notice that the first argument here is the `formula` and not the `design`. This means we must use the dot `(.)` if we pipe in the survey design object (as described in Section \@ref(dot-notation)).\index{Dot notation|)}
The `formula` argument can take several different forms depending on what we are measuring. Here are a few common scenarios: \index{Formula notation|(}
1. \index{t-test!one-sample t-test|(}One-sample t-test:
a. Comparison to 0: `var ~ 0`, where `var` is the measure of interest, and we compare it to the value `0`. For example, we could test if the population mean of household debt is different from `0` given the sample data collected.
b. Comparison to a different value: `var - value ~ 0`, where `var` is the measure of interest and `value` is what we are comparing to. For example, we could test if the proportion of the population that has blue eyes is different from `25%` by using `var - 0.25 ~ 0`. Note that specifying the formula as `var ~ 0.25` is not equivalent and results in a syntax error.\index{t-test!one-sample t-test|)}
2. \index{t-test!two-sample t-test|(}Two-sample t-test:
a. \index{t-test!unpaired two-sample t-test|(}Unpaired:
- 2 level grouping variable: `var ~ groupVar`, where `var` is the measure of interest and `groupVar` is a variable with two categories. For example, we could test if the average age of the population who voted for president in 2020 differed from the age of people who did not vote. In this case, age would be used for `var`, and a binary variable indicating voting activity would be the `groupVar`.
- 3+ level grouping variable: `var ~ groupVar == level`, where `var` is the measure of interest, `groupVar` is the categorical variable, and `level` is the category level to isolate. For example, we could test if the test scores in one classroom differed from all other classrooms where `groupVar` would be the variable holding the values for classroom IDs and `level` is the classroom ID we want to compare to the others.\index{t-test!unpaired two-sample t-test|)}
b. \index{t-test!paired two-sample t-test|(}Paired: `var_1 - var_2 ~ 0`, where `var_1` is the first variable of interest and `var_2` is the second variable of interest. For example, we could test if test scores on a subject differed between the start and the end of a course, so `var_1` would be the test score at the beginning of the course, and `var_2` would be the score at the end of the course.\index{t-test!two-sample t-test|)}\index{t-test!paired two-sample t-test|)}
\index{Formula notation|)}
The `na.rm` argument defaults to `FALSE`, which means if any data values are missing, the t-test does not compute. Throughout this chapter, we always set `na.rm = TRUE`, but before analyzing the survey data, review the notes provided in Chapter \@ref(c11-missing-data) to better understand how to handle missing data.
Let's walk through a few examples using the RECS data.
### Examples {#stattest-ttest-examples}
#### Example 1: One-sample t-test for mean {.unnumbered #stattest-ttest-ex1}
\index{Residential Energy Consumption Survey (RECS)|(} \index{t-test!one-sample t-test|(}
RECS asks respondents to indicate what temperature they set their house to during the summer at night^[Question text: "During the summer, what is your home’s typical indoor temperature inside your home at night?" [@recs-svy]]. In our data, we have called this variable `SummerTempNight`. If we want to see if the average U.S. household sets its temperature at a value different from 68$^\circ$F^[This is the temperature that Stephanie prefers at night during the summer, and she wanted to see if she was different from the population.], we could set up the hypothesis as follows:
- $H_0: \mu = 68$ where $\mu$ is the average temperature U.S. households set their thermostat to in the summer at night
- $H_A: \mu \neq 68$
\index{Formula notation|(}
To conduct this in R, we use `svyttest()` and subtract the temperature on the left-hand side of the formula:
```{r}
#| label: stattest-ttest-ex1
ttest_ex1 <- recs_des %>%
svyttest(
formula = SummerTempNight - 68 ~ 0,
design = .,
na.rm = TRUE
)
ttest_ex1
```
\index{Formula notation|)}
To pull out specific output, we can use R's built-in `$` operator. For instance, to obtain the estimate $\mu - 68$, we run `ttest_ex1$estimate`.
If we want the average, we take our t-test estimate and add it to 68:
```{r}
#| label: stattest-ttest-ex1-add
ttest_ex1$estimate + 68
```
Or, we can use the \index{Functions in srvyr!survey\_mean|(} `survey_mean()` function described in Chapter \@ref(c05-descriptive-analysis): \index{Functions in srvyr!summarize|(}
```{r}
#| label: stattest-ttest-ex1-svymean
recs_des %>%
summarize(mu = survey_mean(SummerTempNight, na.rm = TRUE))
```
\index{Functions in srvyr!summarize|)} \index{Functions in srvyr!survey\_mean|)}
The result is the same in both methods, so we see that the average temperature U.S. households set their thermostat to in the summer at night is `r signif(ttest_ex1$estimate + 68,3)`$^\circ$F. Looking at the output from `svyttest()`, the t-statistic is `r signif(ttest_ex1$statistic, 3)`, and \index{p-value|(} the p-value is `r pretty_p_value(ttest_ex1[["p.value"]])`, indicating that the average is statistically different from 68$^\circ$F at an $\alpha$ level of $0.05$. \index{p-value|)}
If we want an 80% confidence interval for the test statistic, we can use the function `confint()` to change the confidence level. Below, we print the default confidence interval (95%), the confidence interval explicitly specifying the level as 95%, and the 80% confidence interval. When the confidence level is 95% either by default or explicitly, R returns a vector with both row and column names. However, when we specify any other confidence level, an unnamed vector is returned, with the first element being the lower bound and the second element being the upper bound of the confidence interval.
```{r}
#| label: stattest-ttest-ex1-ci80
confint(ttest_ex1)
confint(ttest_ex1, level = 0.95)
confint(ttest_ex1, level = 0.8)
```
In this case, neither confidence interval contains 0, and we draw the same conclusion from either that the average temperature households set their thermostat in the summer at night is significantly higher than 68$^\circ$F.
#### Example 2: One-sample t-test for proportion {.unnumbered #stattest-ttest-ex2}
\index{Categorical data|(}
RECS asked respondents if they use air conditioning (A/C) in their home^[Question text: "Is any air conditioning equipment used in your home?" [@recs-svy]]. In our data, we call this variable `ACUsed`. Let's look at the proportion of U.S. households that use A/C in their homes using the `survey_prop()` function we learned in Chapter \@ref(c05-descriptive-analysis). \index{Functions in srvyr!survey\_prop|(} \index{Functions in srvyr!summarize|(}
```{r}
#| label: stattest-ttest-acused
acprop <- recs_des %>%
group_by(ACUsed) %>%
summarize(p = survey_prop())
acprop
```
\index{Functions in srvyr!summarize|)} \index{Functions in srvyr!survey\_prop|)}
Based on this, `r signif((acprop %>% filter(ACUsed==TRUE) %>% pull(p))*100,3)`% of U.S. households use A/C in their homes. If we wanted to know if this differs from 90%, we could set up our hypothesis as follows:
- $H_0: p = 0.90$ where $p$ is the proportion of U.S. households that use A/C in their homes
- $H_A: p \neq 0.90$
To conduct this in R, we use the `svyttest()` function as follows:
```{r}
#| label: stattest-ttest-ex2
ttest_ex2 <- recs_des %>%
svyttest(
formula = (ACUsed == TRUE) - 0.90 ~ 0,
design = .,
na.rm = TRUE
)
ttest_ex2
```
The output from the `svyttest()` function can be a bit hard to read. Using the `tidy()` function from the {broom} package, we can clean up the output into a tibble to more easily understand what the test tells us [@R-broom].
```{r}
#| label: stattest-ttest-ex2-broom
tidy(ttest_ex2)
```
\index{gt package|(} \index{p-value|(}
The 'tidied' output can also be piped into the {gt} package to create a table ready for publication (see Table \@ref(tab:stattest-ttest-ex2-gt-tab)). We go over the {gt} package in Chapter \@ref(c08-communicating-results). The function `pretty_p_value()` comes from the {prettyunits} package and converts numeric p-values to characters and, by default, prints four decimal places and displays any p-value less than 0.0001 as `"<0.0001"`, though another minimum display p-value can be specified [@R-prettyunits].
```{r}
#| label: stattest-ttest-ex2-gt
#| eval: FALSE
tidy(ttest_ex2) %>%
mutate(p.value = pretty_p_value(p.value)) %>%
gt() %>%
fmt_number()
```
(ref:stattest-ttest-ex2-gt-tab) One-sample t-test output for estimates of U.S. households use A/C in their homes differing from 90%, RECS 2020
```{r}
#| label: stattest-ttest-ex2-gt-tab
#| echo: FALSE
#| warning: FALSE
tidy(ttest_ex2) %>%
mutate(p.value = pretty_p_value(p.value)) %>%
gt() %>%
fmt_number() %>%
print_gt_book(knitr::opts_current$get()[["label"]])
```
\index{gt package|)} \index{p-value|)}
The estimate differs from Example 1 in that it does not display \(p - 0.90\) but rather \(p\), or the difference between the U.S. households that use A/C and our comparison proportion. We can see that there is a difference of --`r signif(ttest_ex2$estimate*100,3)` percentage points. Additionally, the t-statistic value in the `statistic` column is --`r signif(ttest_ex2$statistic,3)`, and the p-value is `r pretty_p_value(ttest_ex2$p.value)`. These results indicate that fewer than 90% of U.S. households use A/C in their homes.
\index{Categorical data|)} \index{t-test!one-sample t-test|)}
#### Example 3: Unpaired two-sample t-test {.unnumbered #stattest-ttest-ex3}
\index{t-test!two-sample t-test|(} \index{t-test!unpaired two-sample t-test|(}
In addition to `ACUsed`, another variable in the RECS data is a household's total electric cost in dollars (`DOLLAREL`).To see if U.S. households with A/C had higher electrical bills than those without, we can set up the hypothesis as follows:
- $H_0: \mu_{AC} = \mu_{noAC}$ where $\mu_{AC}$ is the electrical bill cost for U.S. households that used A/C, and $\mu_{noAC}$ is the electrical bill cost for U.S. households that did not use A/C
- $H_A: \mu_{AC} \neq \mu_{noAC}$
Let's take a quick look at the data to see how they are formatted: \index{Functions in srvyr!survey\_mean|(} \index{Functions in srvyr!summarize|(}
```{r}
#| label: stattest-ttest-ex3-desc
recs_des %>%
group_by(ACUsed) %>%
summarize(mean = survey_mean(DOLLAREL, na.rm = TRUE))
```
\index{Functions in srvyr!summarize|)} \index{Functions in srvyr!survey\_mean|)}
To conduct this in R, we use `svyttest()`:
```{r stattest-ttest-ex3}
#| label: stattest-ttest-ex3
ttest_ex3 <- recs_des %>%
svyttest(formula = DOLLAREL ~ ACUsed,
design = .,
na.rm = TRUE)
```
```{r}
#| label: stattest-ttest-ex3-gt
#| eval: FALSE
tidy(ttest_ex3) %>%
mutate(p.value = pretty_p_value(p.value)) %>%
gt() %>%
fmt_number()
```
(ref:stattest-ttest-ex3-gt-tab) Unpaired two-sample t-test output for estimates of U.S. households electrical bills by A/C use, RECS 2020
```{r}
#| label: stattest-ttest-ex3-gt-tab
#| echo: FALSE
#| warning: FALSE
tidy(ttest_ex3) %>%
mutate(p.value = pretty_p_value(p.value)) %>%
gt() %>%
fmt_number() %>%
print_gt_book(knitr::opts_current$get()[["label"]])
```
The results in Table \@ref(tab:stattest-ttest-ex3-gt-tab) indicate that the difference in electrical bills for those who used A/C and those who did not is, on average, \$`r round(ttest_ex3$estimate,2)`. The difference appears to be statistically significant as the t-statistic is `r signif(ttest_ex3$statistic, 3)` and the p-value is `r pretty_p_value(ttest_ex3[["p.value"]])`. Households that used A/C spent, on average, $`r round(ttest_ex3[["estimate"]], 2) %>% unname()` more in 2020 on electricity than households without A/C.
\index{t-test!unpaired two-sample t-test|)}
#### Example 4: Paired two-sample t-test {.unnumbered #stattest-ttest-ex4}
\index{t-test!paired two-sample t-test|(}
Let's say we want to test whether the temperature at which U.S. households set their thermostat at night differs depending on the season (comparing summer and winter^[Question text: "During the winter, what is your home’s typical indoor temperature inside your home at night?" [@recs-svy]] temperatures). We could set up the hypothesis as follows:
- $H_0: \mu_{summer} = \mu_{winter}$ where $\mu_{summer}$ is the temperature that U.S. households set their thermostat to during summer nights, and $\mu_{winter}$ is the temperature that U.S. households set their thermostat to during winter nights
- $H_A: \mu_{summer} \neq \mu_{winter}$
To conduct this in R, we use `svyttest()` by calculating the temperature difference on the left-hand side as follows:
```{r}
#| label: stattest-ttest-ex4
ttest_ex4 <- recs_des %>%
svyttest(
design = .,
formula = SummerTempNight - WinterTempNight ~ 0,
na.rm = TRUE
)
```
```{r}
#| label: stattest-ttest-ex4-gt
#| eval: FALSE
tidy(ttest_ex4) %>%
mutate(p.value = pretty_p_value(p.value)) %>%
gt() %>%
fmt_number()
```
(ref:stattest-ttest-ex4-gt-tab) Paired two-sample t-test output for estimates of U.S. households thermostat temperature by season, RECS 2020
```{r}
#| label: stattest-ttest-ex4-gt-tab
#| echo: FALSE
#| warning: FALSE
tidy(ttest_ex4) %>%
mutate(p.value = pretty_p_value(p.value)) %>%
gt() %>%
fmt_number() %>%
print_gt_book(knitr::opts_current$get()[["label"]])
```
\index{p-value|(}
The results displayed in Table \@ref(tab:stattest-ttest-ex4-gt-tab) indicate that U.S. households set their thermostat on average `r signif(ttest_ex4$estimate,2)`$^\circ$F warmer in summer nights than winter nights, which is statistically significant (t = `r signif(ttest_ex4$statistic, 3)`, p-value is `r pretty_p_value(ttest_ex4[["p.value"]])`). \index{Functions in survey!svyttest|)} \index{Residential Energy Consumption Survey (RECS)|(} \index{p-value|)} \index{t-test|)} \index{t-test!two-sample t-test|(} \index{t-test!paired two-sample t-test|(}
## Chi-squared tests {#stattest-chi}
\index{Chi-squared test|(} \index{Categorical data|(}
Chi-squared tests ($\chi^2$) allow us to examine multiple proportions using a goodness-of-fit test, a test of independence, or a test of homogeneity. These three tests have the same $\chi^2$ distributions but with slightly different underlying assumptions.
\index{Categorical data|)}
\index{Chi-squared test!Goodness-of-fit test|(}
First, goodness-of-fit tests are used when comparing observed data to expected data. For example, this could be used to determine if respondent demographics (the observed data in the sample) match known population information (the expected data). In this case, we can set up the hypothesis test as follows:
- $H_0: p_1 = \pi_1, ~ p_2 = \pi_2, ~ ..., ~ p_k = \pi_k$ where $p_i$ is the observed proportion for category $i$, $\pi_i$ is the expected proportion for category $i$, and $k$ is the number of categories
- $H_A:$ at least one level of $p_i$ does not match $\pi_i$
\index{Chi-squared test!Goodness-of-fit test|)}
\index{Chi-squared test!Test of independence|(}
Second, tests of independence are used when comparing two types of observed data to see if there is a relationship. For example, this could be used to determine if the proportion of respondents who voted for each political party in the presidential election matches the proportion of respondents who voted for each political party in a local election. In this case, we can set up the hypothesis test as follows:
- $H_0:$ The two variables/factors are independent
- $H_A:$ The two variables/factors are not independent
\index{Chi-squared test!Test of independence|)}
\index{Chi-squared test!Test of homogeneity|(}
Third, tests of homogeneity are used to compare two distributions to see if they match. For example, this could be used to determine if the highest education achieved is the same for both men and women. In this case, we can set up the hypothesis test as follows:
- $H_0: p_{1a} = p_{1b}, ~ p_{2a} = p_{2b}, ~ ..., ~ p_{ka} = p_{kb}$ where $p_{ia}$ is the observed proportion of category $i$ for subgroup $a$, $p_{ib}$ is the observed proportion of category $i$ for subgroup $a$, and $k$ is the number of categories
- $H_A:$ at least one category of $p_{ia}$ does not match $p_{ib}$
\index{Chi-squared test!Test of homogeneity|)}
As with t-tests, the difference between using $\chi^2$ tests with non-survey data and survey data is based on the underlying variance estimation. The functions in the {survey} package account for these nuances, provided the design object is correctly defined. For basic variance estimation formulas for different survey design types, refer to Chapter \@ref(c10-sample-designs-replicate-weights).
### Syntax {#stattest-chi-syntax}
When we do not have survey data, we may be able to use the `chisq.test()` function from the {stats} package in base R to run chi-squared tests [@R-base]. However, this function does not allow for weights or the variance structure to be accounted for with survey data. Therefore, when using survey data, we need to use one of two functions: \index{Functions in survey!svygofchisq|(} \index{Functions in survey!svychisq|(} \index{svygofchisq|see {Functions in survey}} \index{svychisq|see {Functions in survey}}
- \index{Chi-squared test!Goodness-of-fit test|(}`svygofchisq()`: For goodness-of-fit tests\index{Chi-squared test!Goodness-of-fit test|)}
- \index{Chi-squared test!Test of homogeneity|(}\index{Chi-squared test!Test of independence|(}`svychisq()`: For tests of independence and homogeneity\index{Chi-squared test!Test of homogeneity|)}\index{Chi-squared test!Test of independence|)}
The non-survey data function of `chisq.test()` requires either a single set of counts and given proportions (for goodness-of-fit tests) or two sets of counts for tests of independence and homogeneity. \index{Formula notation|(}The functions we use with survey data require respondent-level data and formulas instead of counts. This ensures that the variances are correctly calculated.\index{Formula notation|)}
\index{Chi-squared test!Goodness-of-fit test|(}
First, the function for the goodness-of-fit tests is `svygofchisq()`:
```r
svygofchisq(formula,
p,
design,
na.rm = TRUE,
...)
```
The arguments are:
* `formula`: Formula specifying a single factor variable
* `p`: Vector of probabilities for the categories of the factor in the correct order. If the probabilities do not sum to 1, they are rescaled to sum to 1.
* `design`: Survey design object
* ...: Other arguments to pass on, such as `na.rm`
\index{Dot notation|(} \index{Formula notation|(}
Based on the order of the arguments, we again must use the dot `(.)` notation if we pipe in the survey design object or explicitly name the arguments as described in Section \@ref(dot-notation).\index{Dot notation|)} For the goodness-of-fit tests, the formula is a single variable `formula = ~var` as we compare the observed data from this variable to the expected data. The expected probabilities are then entered in the `p` argument and need to be a vector of the same length as the number of categories in the variable. For example, if we want to know if the proportion of males and females matches a distribution of 30/70, then the sex variable (with two categories) would be used `formula = ~SEX`, and the proportions would be included as `p = c(.3, .7)`. \index{Factor|(}It is important to note that the variable entered into the formula should be formatted as either a factor or a character. The examples below provide more detail and tips on how to make sure the levels match up correctly. \index{Functions in srvyr!drop\_na|)} \index{Factor|)} \index{Chi-squared test!Goodness-of-fit test|)}\index{Formula notation|)}
\index{Chi-squared test!Test of homogeneity|(} \index{Chi-squared test!Test of independence|(}
For tests of homogeneity and independence, the `svychisq()` function should be used. The syntax is as follows:
```r
svychisq(
formula,
design,
statistic = c("F", "Chisq", "Wald", "adjWald",
"lincom", "saddlepoint"),
na.rm = TRUE
)
```
The arguments are:
* `formula`: Model formula specifying the table (shown in examples)
* `design`: Survey design object
* `statistic`: Type of test statistic to use in test (details below)
* `na.rm`: Remove missing values
\index{Cross-tabulation|(}
There are six statistics that are accepted in this formula. For tests of homogeneity (when comparing cross-tabulations), the `F` or `Chisq` statistics should be used^[These two statistics can also be used for goodness-of-fit tests if the `svygofchisq()` function is not used.]. The `F` statistic is the default and uses the Rao-Scott second-order correction. This correction is designed to assist with complicated sampling designs (i.e., those other than a simple random sample) [@Scott2007]. The `Chisq` statistic is an adjusted version of the Pearson $\chi^2$ statistic. The version of this statistic in the `svychisq()` function compares the design effect \index{Design effect} estimate from the provided survey data to what the $\chi^2$ distribution would have been if the data came from a simple random sampling.
\index{Cross-tabulation|)} \index{Chi-squared test!Test of homogeneity|)}
\index{Primary sampling unit|(}
For tests of independence, the `Wald` and `adjWald` are recommended as they provide a better adjustment for variable comparisons [@lumley2010complex]. \index{Degrees of freedom|(}If the data have a small number of primary sampling units (PSUs) compared to the degrees of freedom, then the `adjWald` statistic should be used to account for this.\index{Degrees of freedom|)} The `lincom` and `saddlepoint` statistics are available for more complicated data structures.
\index{Primary sampling unit|)} \index{Chi-squared test!Test of independence|)}
\index{Formula notation|(}
The formula argument is always one-sided, unlike the `svyttest()` function. The two variables of interest should be included with a plus sign: `formula = ~ var_1 + var_2`. As with the `svygofchisq()` function, the variables entered into the formula should be formatted as either a factor or a character.
\index{Formula notation|)}
Additionally, as with the t-test function, both `svygofchisq()` and `svychisq()` have the `na.rm` argument. If any data values are missing, the $\chi^2$ tests assume that `NA` is a category and include it in the calculation. Throughout this chapter, we always set `na.rm = TRUE`, but before analyzing the survey data, review the notes provided in Chapter \@ref(c11-missing-data) to better understand how to handle missing data. \index{Functions in survey!svychisq|)}
### Examples {#stattest-chi-examples}
\index{American National Election Studies (ANES)|(}
Let's walk through a few examples using the ANES data.
#### Example 1: Goodness-of-fit test {.unnumbered #stattest-chi-ex1}
\index{American Community Survey (ACS)|(} \index{Chi-squared test!Goodness-of-fit test|(}
ANES asked respondents about their highest education level^[Question text: "What is the highest level of school you have completed or the highest degree you have received?" [@anes-svy]]. Based on the data from the 2020 American Community Survey (ACS) 5-year estimates^[Data was pulled from data.census.gov using the S1501 Education Attainment 2020: ACS 5-Year Estimates Subject Tables.], the education distribution of those aged 18+ in the United States (among the 50 states and the District of Columbia) is as follows:
- 11% had less than a high school degree
- 27% had a high school degree
- 29% had some college or an associate's degree
- 33% had a bachelor's degree or higher
If we want to see if the weighted distribution from the ANES 2020 data matches this distribution, we could set up the hypothesis as follows:
- $H_0: p_1 = 0.11, ~ p_2 = 0.27, ~ p_3 = 0.29, ~ p_4 = 0.33$
- $H_A:$ at least one of the education levels does not match between the ANES and the ACS
To conduct this in R, let's first look at the education variable (`Education`) we have on the ANES data. Using the \index{Functions in srvyr!survey\_mean|(} `survey_mean()` function discussed in Chapter \@ref(c05-descriptive-analysis), we can see the education levels and estimated proportions. \index{Functions in srvyr!summarize|(} \index{Functions in srvyr!drop\_na|(}
```{r}
#| label: stattest-chi-ex1-educmean
anes_des %>%
drop_na(Education) %>%
group_by(Education) %>%
summarize(p = survey_mean())
```
\index{Functions in srvyr!summarize|)} \index{Functions in srvyr!survey\_mean|)}
\index{Formula notation|(}
Based on this output, we can see that we have different levels from the ACS data. Specifically, the education data from ANES include two levels for bachelor's degree or higher (bachelor's and graduate), so these two categories need to be collapsed into a single category to match the ACS data. For this, among other methods, we can use the {forcats} package from the tidyverse [@R-forcats]. The package's `fct_collapse()` function helps us create a new variable by collapsing categories into a single one. Then, we use the `svygofchisq()` function to compare the ANES data to the ACS data, where we specify the updated design object, the formula using the collapsed education variable, the ACS estimates for education levels as p, and removing `NA` values.
```{r}
#| label: stattest-chi-ex1
anes_des_educ <- anes_des %>%
mutate(Education2 =
fct_collapse(Education,
"Bachelor or Higher" = c("Bachelor's",
"Graduate")))
anes_des_educ %>%
drop_na(Education2) %>%
group_by(Education2) %>%
summarize(p = survey_mean())
chi_ex1 <- anes_des_educ %>%
svygofchisq(
formula = ~ Education2,
p = c(0.11, 0.27, 0.29, 0.33),
design = .,
na.rm = TRUE
)
chi_ex1
```
\index{Formula notation|)}
The output from the `svygofchisq()` indicates that at least one proportion from ANES does not match the ACS data ($\chi^2 =$ `r prettyNum(chi_ex1$statistic, big.mark=",")`; p-value is `r pretty_p_value(chi_ex1[["p.value"]])`). To get a better idea of the differences, we can use the `expected` output along with `survey_mean()` to create a comparison table: \index{Functions in srvyr!survey\_mean|(} \index{Functions in srvyr!summarize|(} \index{Functions in srvyr!drop\_na|(}
```{r}
#| label: stattest-chi-ex1-table
ex1_table <- anes_des_educ %>%
drop_na(Education2) %>%
group_by(Education2) %>%
summarize(Observed = survey_mean(vartype = "ci")) %>%
rename(Education = Education2) %>%
mutate(Expected=c(0.11, 0.27, 0.29, 0.33)) %>%
select(Education, Expected, everything())
ex1_table
```
\index{Functions in srvyr!drop\_na|)} \index{Functions in srvyr!summarize|)} \index{Functions in srvyr!survey\_mean|)}
This output includes our expected proportions from the ACS that we provided the `svygofchisq()` function along with the output of the observed proportions and their confidence intervals. This table shows that the "high school" and "post HS" categories have nearly identical proportions, but that the other two categories are slightly different. Looking at the confidence intervals, we can see that the ANES data skew to include fewer people in the "less than HS" category and more people in the "bachelor or higher" category. This may be easier to see if we plot this. The code below uses the tabular output to create Figure \@ref(fig:stattest-chi-ex1-graph).
```{r}
#| label: stattest-chi-ex1-graph
#| fig.cap: Expected and observed proportions of education with confidence intervals
#| fig.alt: Expected and observed proportions of education, showing the confidence intervals for the expected proportions and whether the observed proportions lie within them. The x-axis has labels 'Less than HS', 'High school', 'Post HS', and 'Bachelor or Higher'. The only ones where expected proportion is outside of the intervals is 'Less than HS' and 'Bachelor or Higher'.
ex1_table %>%
pivot_longer(
cols = c("Expected", "Observed"),
names_to = "Names",
values_to = "Proportion"
) %>%
mutate(
Observed_low = if_else(Names == "Observed", Observed_low, NA_real_),
Observed_upp = if_else(Names == "Observed", Observed_upp, NA_real_),
Names = if_else(Names == "Observed",
"ANES (observed)", "ACS (expected)")
) %>%
ggplot(aes(x = Education, y = Proportion, color = Names)) +
geom_point(alpha = 0.75, size = 2) +
geom_errorbar(aes(ymin = Observed_low, ymax = Observed_upp),
width = 0.25) +
theme_bw() +
scale_color_manual(name = "Type", values = book_colors[c(4, 1)]) +
theme(legend.position = "bottom", legend.title = element_blank())
```
\index{Functions in survey!svygofchisq|)} \index{American Community Survey (ACS)|)} \index{Chi-squared test!Goodness-of-fit test|)}
#### Example 2: Test of independence {.unnumbered #stattest-chi-ex2}
\index{Functions in survey!svychisq|(} \index{Chi-squared test!Test of independence|(}
ANES asked respondents two questions about trust:
- Question text: "How often can you trust the federal government to do what is right?" [@anes-svy]
- Question text: "How often can you trust other people?" [@anes-svy]
If we want to see if the distributions of these two questions are similar or not, we can conduct a test of independence. Here is how the hypothesis could be set up:
- $H_0:$ People's trust in the federal government and their trust in other people are independent (i.e., not related)
- $H_A:$ People's trust in the federal government and their trust in other people are not independent (i.e., they are related)
To conduct this in R, we use the `svychisq()` function to compare the two variables:
```{r}
#| label: stattest-chi-ex2
chi_ex2 <- anes_des %>%
svychisq(
formula = ~ TrustGovernment + TrustPeople,
design = .,
statistic = "Wald",
na.rm = TRUE
)
chi_ex2
```
\index{Cross-tabulation|(}
The output from `svychisq()` indicates that the distribution of people's trust in the federal government and their trust in other people are not independent, meaning that they are related. Let's output the distributions in a table to see the relationship. The `observed` output from the test provides a cross-tabulation of the counts for each category:
```{r}
#| label: stattest-chi-ex2-counts
chi_ex2$observed
```
\index{gt package|(}
However, we often want to know about the proportions, not just the respondent counts from the survey. There are a couple of different ways that we can do this. The first is using the counts from `chi_ex2$observed` to calculate the proportion. We can then pivot the table to create a cross-tabulation similar to the counts table above. Adding `group_by()` to the code means that we obtain the proportions within each variable level. In this case, we are looking at the distribution of `TrustGovernment` for each level of `TrustPeople`. The resulting table is shown in Table \@ref(tab:stattest-chi-ex2-prop1-tab).
```{r}
#| label: stattest-chi-ex2-prop1
#| warning: false
chi_ex2_table<-chi_ex2$observed %>%
as_tibble() %>%
group_by(TrustPeople) %>%
mutate(prop = round(n / sum(n), 3)) %>%
select(-n) %>%
pivot_wider(names_from = TrustPeople, values_from = prop) %>%
gt(rowname_col = "TrustGovernment") %>%
tab_stubhead(label = "Trust in Government") %>%
tab_spanner(label = "Trust in People",
columns = everything()) %>%
cols_label(`Most of the time` = md("Most of<br />the time"),
`About half the time` = md("About half<br />the time"),
`Some of the time` = md("Some of<br />the time"))
```
```{r}
#| label: stattest-chi-ex2-prop1-noeval
#| eval: false
chi_ex2_table
```
(ref:stattest-chi-ex2-prop1-tab) Proportion of adults in the U.S. by levels of trust in people and government, ANES 2020
```{r}
#| label: stattest-chi-ex2-prop1-tab
#| echo: FALSE
#| warning: FALSE
chi_ex2_table %>%
print_gt_book(knitr::opts_current$get()[["label"]])
```
In Table \@ref(tab:stattest-chi-ex2-prop1-tab), each column sums to 1. For example, we can say that it is estimated that of people who always trust in people, `r round(chi_ex2$observed[1,1]/sum(chi_ex2$observed[,1])*100, 1)`% also always trust in the government based on the top-left cell, but `r round(chi_ex2$observed[5,1]/sum(chi_ex2$observed[,1])*100, 1)`% never trust in the government. \index{Cross-tabulation|)}
The second option is to use the `group_by()` and `survey_mean()` functions to calculate the proportions from the ANES design object. Remember that with more than one variable listed in the `group_by()` statement, the proportions are within the first variable listed. As mentioned above, we are looking at the distribution of `TrustGovernment` for each level of `TrustPeople`. \index{Functions in srvyr!survey\_mean|(} \index{Functions in srvyr!summarize|(} \index{Functions in srvyr!drop\_na|(}
```{r}
#| label: stattest-chi-ex2-prop2
chi_ex2_obs <- anes_des %>%
drop_na(TrustPeople, TrustGovernment) %>%
group_by(TrustPeople, TrustGovernment) %>%
summarize(Observed = round(survey_mean(vartype = "ci"), 3),
.groups="drop")
chi_ex2_obs_table<-chi_ex2_obs %>%
mutate(prop = paste0(Observed, " (", Observed_low, ", ",
Observed_upp, ")")) %>%
select(TrustGovernment, TrustPeople, prop) %>%
pivot_wider(names_from = TrustPeople, values_from = prop) %>%
gt(rowname_col = "TrustGovernment") %>%
tab_stubhead(label = "Trust in Government") %>%
tab_spanner(label = "Trust in People",
columns = everything()) %>%
tab_options(page.orientation = "landscape")
```
\index{Functions in srvyr!summarize|)} \index{Functions in srvyr!survey\_mean|)}
```{r}
#| label: stattest-chi-ex2-prop2-noeval
#| eval: false
chi_ex2_obs_table
```
\index{Functions in srvyr!drop\_na|)}
(ref:stattest-chi-ex2-prop2-tab) Proportion of adults in the U.S. by levels of trust in people and government with confidence intervals, ANES 2020
```{r}
#| label: stattest-chi-ex2-prop2-tab
#| echo: FALSE
#| warning: FALSE
chi_ex2_obs_table %>%
print_gt_book(knitr::opts_current$get()[["label"]])
```
Both methods produce the same output as the `svychisq()` function. However, calculating the proportions directly from the design object allows us to obtain the variance information. In this case, the output in Table \@ref(tab:stattest-chi-ex2-prop2-tab) displays the survey estimate followed by the confidence intervals. Based on the output, we can see that of those who never trust people, `r round(chi_ex2$observed[5,5]/sum(chi_ex2$observed[,5])*100, 1)`% also never trust the government, while the proportions of never trusting the government are much lower for each of the other levels of trusting people. \index{gt package|)}
We may find it easier to look at these proportions graphically. We can use `ggplot()` and facets to provide an overview to create Figure \@ref(fig:stattest-chi-ex2-graph) below:
```{r}
#| label: stattest-chi-ex2-graph
#| fig.cap: Proportion of adults in the U.S. by levels of trust in people and government with confidence intervals, ANES 2020
#| fig.alt: Proportion of adults in the U.S. by levels of trust in people and government with confidence intervals, ANES 2020. This presents the same information as the previous table in graphical form.
chi_ex2_obs %>%
mutate(TrustPeople=
fct_reorder(str_c("Trust in People:\n", TrustPeople),
order(TrustPeople))) %>%
ggplot(
aes(x = TrustGovernment, y = Observed, color = TrustGovernment)) +
facet_wrap( ~ TrustPeople, ncol = 5) +
geom_point() +
geom_errorbar(aes(ymin = Observed_low, ymax = Observed_upp)) +
ylab("Proportion") +
xlab("") +
theme_bw() +
scale_color_manual(name="Trust in Government",
values=book_colors) +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
legend.position = "bottom") +
guides(col = guide_legend(nrow=2))
```
\index{Chi-squared test!Test of independence|)}
#### Example 3: Test of homogeneity {.unnumbered #stattest-chi-ex3}
\index{Chi-squared test!Test of homogeneity|(}
Researchers and politicians often look at specific demographics each election cycle to understand how each group is leaning or voting toward candidates. The ANES data are collected post-election, but we can still see if there are differences in how specific demographic groups voted.
If we want to see if there is a difference in how each age group voted for the 2020 candidates, this would be a test of homogeneity, and we can set up the hypothesis as follows:
\begin{align*}
H_0: p_{1_{Biden}} &= p_{1_{Trump}} = p_{1_{Other}},\\
p_{2_{Biden}} &= p_{2_{Trump}} = p_{2_{Other}},\\
p_{3_{Biden}} &= p_{3_{Trump}} = p_{3_{Other}},\\
p_{4_{Biden}} &= p_{4_{Trump}} = p_{4_{Other}},\\
p_{5_{Biden}} &= p_{5_{Trump}} = p_{5_{Other}},\\
p_{6_{Biden}} &= p_{6_{Trump}} = p_{6_{Other}}
\end{align*}
where $p_{i_{Biden}}$ is the observed proportion of each age group ($i$) that voted for Joseph Biden, $p_{i_{Trump}}$ is the observed proportion of each age group ($i$) that voted for Donald Trump, and $p_{i_{Other}}$ is the observed proportion of each age group ($i$) that voted for another candidate.
- $H_A:$ at least one category of $p_{i_{Biden}}$ does not match $p_{i_{Trump}}$ or $p_{i_{Other}}$
To conduct this in R, we use the `svychisq()` function to compare the two variables: \index{Functions in srvyr!drop\_na|(}
```{r}
#| label: stattest-chi-ex3
chi_ex3 <- anes_des %>%
drop_na(VotedPres2020_selection, AgeGroup) %>%
svychisq(
formula = ~ AgeGroup + VotedPres2020_selection,
design = .,
statistic = "Chisq",
na.rm = TRUE
)
chi_ex3
```
\index{Functions in srvyr!drop\_na|)}
The output from `svychisq()` indicates a difference in how each age group voted in the 2020 election. To get a better idea of the different distributions, let's output proportions to see the relationship. As we learned in Example 2 above, we can use `chi_ex3$observed`, or if we want to get the variance information (which is crucial with survey data), we can use `survey_mean()`. Remember, when we have two variables in `group_by()`, we obtain the proportions within each level of the variable listed. In this case, we are looking at the distribution of `AgeGroup` for each level of `VotedPres2020_selection`. \index{Functions in srvyr!survey\_mean|(} \index{Functions in srvyr!filter|(} \index{Functions in srvyr!summarize|(} \index{Functions in srvyr!drop\_na|(}
```{r}
#| label: stattest-chi-ex3-table
chi_ex3_obs <- anes_des %>%
filter(VotedPres2020 == "Yes") %>%
drop_na(VotedPres2020_selection, AgeGroup) %>%
group_by(VotedPres2020_selection, AgeGroup) %>%
summarize(Observed = round(survey_mean(vartype = "ci"), 3))
chi_ex3_obs_table<-chi_ex3_obs %>%
mutate(prop = paste0(Observed, " (", Observed_low, ", ",
Observed_upp, ")")) %>%
select(AgeGroup, VotedPres2020_selection, prop) %>%
pivot_wider(names_from = VotedPres2020_selection,
values_from = prop) %>%
gt(rowname_col = "AgeGroup") %>%
tab_stubhead(label = "Age Group")
```
\index{Functions in srvyr!drop\_na|)} \index{Functions in srvyr!filter|)} \index{Functions in srvyr!summarize|)} \index{Functions in srvyr!survey\_mean|)}
```{r}
#| label: stattest-chi-ex3-table-noeval
#| eval: false
chi_ex3_obs_table
```
(ref:stattest-chi-ex3-tab) Distribution of age group by presidential candidate selection with confidence intervals
```{r}
#| label: stattest-chi-ex3-tab
#| echo: FALSE
#| warning: FALSE
chi_ex3_obs_table %>%
print_gt_book(knitr::opts_current$get()[["label"]])
```
In Table \@ref(tab:stattest-chi-ex3-tab) we can see that the age group distribution that voted for Biden and other candidates was younger than those that voted for Trump. For example, of those who voted for Biden, 20.4% were in the 18--29 age group, compared to only 11.4% of those who voted for Trump were in that age group. Conversely, 23.4% of those who voted for Trump were in the 50--59 age group compared to only 15.4% of those who voted for Biden. \index{Functions in survey!svychisq|)} \index{Chi-squared test|)} \index{American National Election Studies (ANES)|)} \index{Statistical testing|)}
\index{Chi-squared test!Test of homogeneity|)}
## Exercises {#stattest-exercises}
The exercises use the design objects `anes_des` and `recs_des` as provided in the Prerequisites box at the [beginning of the chapter](#c06-statistical-testing). Here are some exercises for practicing conducting t-tests using `svyttest()`:
1. Using the RECS data, do more than 50% of U.S. households use A/C (`ACUsed`)?
2. Using the RECS data, does the average temperature at which U.S. households set their thermostats differ between the day and night in the winter (`WinterTempDay` and `WinterTempNight`)?
3. Using the ANES data, does the average age (`Age`) of those who voted for Joseph Biden in 2020 (`VotedPres2020_selection`) differ from those who voted for another candidate?
4. If we wanted to determine if the political party affiliation differed for males and females, what test would we use?
a. Goodness-of-fit test (`svygofchisq()`)
b. Test of independence (`svychisq()`)
c. Test of homogeneity (`svychisq()`)
5. In the RECS data, is there a relationship between the type of housing unit (`HousingUnitType`) and the year the house was built (`YearMade`)?
6. In the ANES data, is there a difference in the distribution of gender (`Gender`) across early voting status in 2020 (`EarlyVote2020`)?