---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
# srvyrexploR <img src="man/figures/srvyrExplore.png" alt="Hex sticker for srvyrexploR package - a fake world with hills made out of pie charts, fields that look like matrix plots, and buildings that have error bars. A dragon in the water." align="right" height="149" width="149"/>
```{r, include = FALSE}
#| label: setup
#| include: false
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "75%",
warning = FALSE,
message = FALSE,
fig.retina = 2,
fig.align = 'center',
tidy = 'styler'
)
library(tidyverse)
```
The **srvyrexploR** package provides the datasets used in the book [Exploring Complex Survey Data Analysis Using R: A Tidy Introduction with {srvyr} and {survey}](https://tidy-survey-r.github.io/tidy-survey-book/), so that readers can follow along with the examples and work through the exercises.
## Installation
To install the development version from [GitHub](https://github.com/), use:
```r
# install.packages("pak")
pak::pak("tidy-survey-r/srvyrexploR")
```
To load the package, use:
```{r}
#| label: load-package
library(srvyrexploR)
```
## About the data
This package includes data from three surveys: the American National Election Studies (ANES), the National Crime Victimization Survey (NCVS), and the Residential Energy Consumption Survey (RECS).
### ANES
The ANES data is based on the publicly available 2020 ANES data, with additional derived variables, and is subset to people who completed both the pre- and post-election interviews. The ANES Time Series Studies collect political polling data in the United States and have been conducted since 1948. For more information about the 2020 study, see the [American National Election Studies website](https://electionstudies.org/data-center/2020-time-series-study/). On the ANES website, you can learn more about the study, see codebooks and methodology reports, and download the data (after registering). We received permission to distribute this data for the purpose of the book. Once the package is loaded, you can use the data immediately as follows:
```{r}
#| label: show-anes
head(anes_2020)
```
See `?anes_2020` for more information about the data.
The package also includes a Stata version of the ANES data with a subset of the columns. Like `anes_2020`, it is subset to people who completed both the pre- and post-election interviews. To load this dataset, we recommend using the {haven} package as follows:
```{r}
#| label: anes-stata
anes_stata <- haven::read_dta(system.file("extdata", "anes_2020_stata_example.dta", package="srvyrexploR"))
```
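Stata files often carry value labels, which {haven} represents as labelled vectors. The sketch below shows one way to work with such a column. It uses a made-up toy vector (`toy_vote` is hypothetical and is not a column in the package data), so treat it as an illustration of the {haven} helpers rather than a description of the ANES file itself.

```r
library(haven)

# Hypothetical toy column standing in for a labelled Stata variable.
toy_vote <- labelled(
  c(1, 2, 1, -9),
  labels = c(Yes = 1, No = 2, Refused = -9),
  label = "Toy example: voted in the last election"
)

# Convert the numeric codes to their text labels.
toy_vote_fct <- as_factor(toy_vote)
as.character(toy_vote_fct)

# Or drop the value labels and keep the numeric codes.
toy_vote_num <- zap_labels(toy_vote)
toy_vote_num

# The variable label is stored as an attribute.
attr(toy_vote, "label")
```

The same helpers can be applied to any labelled column in `anes_stata`, for example `haven::as_factor(anes_stata)` to convert every labelled column at once.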
### NCVS
The NCVS data is based on publicly available data for the 2021 NCVS. The NCVS is a survey conducted by the Bureau of Justice Statistics that asks people age 12 and over about their crime victimizations. The study has been conducted continuously since 1992. This package includes three datasets: one for household-level data (`ncvs_2021_household`), one for person-level data (`ncvs_2021_person`), and one for incident-level data (`ncvs_2021_incident`). Each includes a subset of the columns in the full 2021 data available from [ICPSR](https://www.icpsr.umich.edu/web/NACJD/studies/38429). This data is reproduced here with permission from ICPSR.
```{r}
#| label: ncvs-show
head(ncvs_2021_household)
head(ncvs_2021_person)
head(ncvs_2021_incident)
```
### RECS
Three RECS datasets are included: the 2015 data with some derived variables created for the book (`recs_2015`), the 2020 data with some derived variables created for the book (`recs_2020`), and the 2020 data with the original variables (`recs_2020_raw`). RECS is a survey about energy consumption and expenditure among residential households in the United States and has been conducted since 1979 by the Energy Information Administration. More information about the original data is available at the [RECS website](https://www.eia.gov/consumption/residential/data/2020/).
```{r}
#| label: recs-show
head(recs_2015)
head(recs_2020)
head(recs_2020_raw)
```
## Examples
To analyze the survey data, we recommend using the {srvyr} package, which can be installed as follows:
```r
# install.packages("pak")
pak::pak("gergness/srvyr")
```
```{r}
#| label: srvyr-examp
library(srvyr)
recs_des <- recs_2020 %>%
  as_survey_rep(
    weights = NWEIGHT,
    repweights = NWEIGHT1:NWEIGHT60,
    type = "JK1",
    scale = 59 / 60,
    mse = TRUE,
    variables = c(ACUsed, Region)
  )

recs_des

recs_des %>%
  group_by(Region) %>%
  summarize(
    p = survey_mean(ACUsed, vartype = "ci", proportion = TRUE, prop_method = "logit")
  )
```
The above example estimates the proportion of residential households that use air-conditioning by region with a 95% confidence interval.
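For readers wondering what `type = "JK1"`, `scale = 59/60`, and `mse = TRUE` do together, the variance estimate behind the interval can be sketched as follows. The notation is ours and does not come from the package: $\hat{\theta}$ is the estimate computed with the main weight `NWEIGHT`, and $\hat{\theta}_r$ is the same estimate recomputed with the $r$-th replicate weight (`NWEIGHT1` through `NWEIGHT60`).

```math
\hat{V}(\hat{\theta}) = \frac{59}{60} \sum_{r=1}^{60} \left( \hat{\theta}_r - \hat{\theta} \right)^2
```

With `mse = TRUE`, the squared deviations are taken around the full-sample estimate $\hat{\theta}$ rather than around the average of the replicate estimates.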
## License
Data are available under the [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) license. Additionally, redistributing the ANES or NCVS datasets is subject to their policies.
## Additional data use information
Anyone interested in redistributing the NCVS data should refer to [ICPSR: Requests for Permission to Redistribute ICPSR Data](https://www.icpsr.umich.edu/web/pages/datamanagement/policies/redistribute.html).
Anyone interested in redistributing the ANES data should refer to the [ANES FAQ - disseminate](https://electionstudies.org/faq/).
## References
**Data citations:**
ANES:
+ American National Election Studies. 2021. ANES 2020 Time Series Study Full Release [dataset and documentation]. July 19, 2021 version. https://www.electionstudies.org
NCVS:
+ United States. Bureau of Justice Statistics. National Crime Victimization Survey, [United States], 2021. Inter-university Consortium for Political and Social Research [distributor], 2022-09-19. https://doi.org/10.3886/ICPSR38429.v1
RECS:
+ U.S. Energy Information Administration. 2024. Residential Energy Consumption 2020 Survey Data. [dataset and documentation]. January 2024 version. https://www.eia.gov/consumption/residential/data/2020/index.php?view=microdata
+ U.S. Energy Information Administration. 2018. Residential Energy Consumption 2015 Survey Data. [dataset and documentation]. December 2018 version. https://www.eia.gov/consumption/residential/data/2015/index.php?view=microdata