diff --git a/404.html b/404.html index 8dc149bf..0a9f124a 100644 --- a/404.html +++ b/404.html @@ -411,7 +411,8 @@
The ANES is a study that collects data from election surveys dating back to 1948. These surveys contain information on public opinion and voting behavior in U.S. presidential elections. They cover topics such as party affiliation, voting choice, and level of trust in the government. The 2020 survey, the data we use in the book, was fielded online, through live video interviews, or via computer-assisted telephone interviews (CATI).
When working with new survey data, analysts should review the survey documentation (see Chapter 3) to understand the data collection methods. The original ANES data contains variables starting with V20
(DeBell 2010), so to assist with our analysis throughout the book, we created descriptive variable names. For example, the respondent’s age is now in a variable called Age
, and gender is in a variable called Gender
. These descriptive variables are included in the {srvyrexploR} package, and Table 4.1 displays the list of these renamed variables. A complete overview of all variables can be found in Appendix A.
RECS is a study that measures energy consumption and expenditure in American households. Funded by the Energy Information Administration, the RECS data are collected through interviews with household members and energy suppliers. These interviews take place in person, over the phone, via mail, and on the web. The survey has been fielded 14 times between 1950 and 2020. It includes questions about appliances, electronics, heating, air conditioning (A/C), temperatures, water heating, lighting, energy bills, respondent demographics, and energy assistance.
As mentioned above, analysts should read the survey documentation (see Chapter 3) to understand how the data was collected and implemented. Table 4.2 displays the list of variables in the RECS data (not including the weights, which start with NWEIGHT
and will be described in more detail in Chapter 10). An overview of all variables can be found in Appendix B.
-We will use data from NCVS. Here is the code to read in the three datasets from the {srvyrexploR} package:
+We will use data from the United States National Crime Victimization Survey (NCVS). Here is the code to read in the three datasets from the {srvyrexploR} package:
-The United States National Crime Victimization Survey (NCVS) is a household survey sponsored by the Bureau of Justice Statistics (BJS), which collects data on criminal victimization, including characteristics of the crimes, offenders, and victims. Crime types include both household and personal crimes, as well as violent and non-violent crimes. The target population of this survey is all people in the United States age 12 and older living in housing units and noninstitutional group quarters.
+The NCVS is a household survey sponsored by the Bureau of Justice Statistics (BJS), which collects data on criminal victimization, including characteristics of the crimes, offenders, and victims. Crime types include both household and personal crimes, as well as violent and non-violent crimes. The target population of this survey is all people in the United States age 12 and older living in housing units and noninstitutional group quarters.
The NCVS has been ongoing since 1992. An earlier survey, the National Crime Survey, was run from 1972 to 1991 (Bureau of Justice Statistics 2017). The survey is administered using a rotating panel. When an address enters the sample, the residents of that address are interviewed every six months for a total of seven interviews. If the initial residents move away from the address during the period, the new residents are included in the survey, as people are not followed when they move.
NCVS data is publicly available and distributed by the Inter-university Consortium for Political and Social Research (ICPSR), with data going back to 1992. The vignette in this book will include data from 2021 (United States. Bureau of Justice Statistics 2022). The NCVS data structure is complicated, and the User’s Guide contains examples for analysis in SAS, SUDAAN, SPSS, and Stata, but not R (Shook-Sa, Bonnie, Couzens, G. Lance, and Berzofsky, Marcus 2015). This vignette will adapt those examples for R.
Each record on the incident file represents one victimization, which is not the same as one incident. Some victimizations have several instances that make it difficult for the victim to differentiate the details of these incidents, labeled as “series crimes”. Appendix A of the User’s Guide indicates how to calculate the series weight in other statistical languages.
-Here, we adapt that code for R. Essentially, if a victimization is a series crime, its series weight is top-coded at 10 based on the number of actual victimizations. If an incident is a series crime, but the number of occurrences is unknown, the series weight is set to 6. A description of the variables used to create indicators of series and the associated weights is included in Table 13.1.
+Here, we adapt that code for R. Essentially, if a victimization is a series crime, its series weight is top-coded at 10 based on the number of actual victimizations; that is, even if the crime occurred more than 10 times, it is counted as 10 to reduce the influence of extreme outliers. If an incident is a series crime, but the number of occurrences is unknown, the series weight is set to 6. A description of the variables used to create indicators of series and the associated weights is included in Table 13.1.
-We want to create four variables to indicate if an incident is a series crime. First, we create a variable called series using V4017
, V4018
, and V4019
. Next, we top code the number of incidents (V4016
). Finally, we create the series weight using our new top-coded variable and the existing weight.
+We want to create four variables to indicate if an incident is a series crime. First, we create a variable called series using V4017
, V4018
, and V4019
where an incident is considered a series crime if there are 6 or more incidents (V4017
), the incidents are similar in detail (V4018
), and there is not enough detail to distinguish the incidents (V4019
). Next, we top-code the number of incidents (V4016
) by creating a variable n10v4016
which is set to 10 if V4016 > 10
. Finally, we create the series weight using our new top-coded variable and the existing weight.
inc_series <- ncvs_2021_incident %>%
mutate(
series = case_when(V4017 %in% c(1, 8) ~ 1,
V4018 %in% c(2, 8) ~ 1,
V4019 %in% c(1, 8) ~ 1,
- TRUE ~ 2 # series
+ TRUE ~ 2
),
n10v4016 = case_when(V4016 %in% c(997, 998) ~ NA_real_,
V4016 > 10 ~ 10,
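The top-coding and series-weight rules described in this hunk can be sketched on a few made-up records. This is only an illustration under stated assumptions: the toy data, the `wgt` column, and the names `serieswgt` and `newwgt` are hypothetical and are not taken from the NCVS files; only the thresholds (cap at 10, unknown counts coded 997 or 998, fallback weight of 6) come from the text and code above.

```r
library(dplyr)

# Hypothetical toy records; all values are made up for illustration
toy_inc <- tibble(
  series = c(1, 2, 2, 2),      # 1 = not a series crime, 2 = series crime
  V4016 = c(3, 12, 997, 7),    # number of incidents; 997/998 treated as unknown
  wgt = c(100, 100, 100, 100)  # stand-in for the existing victimization weight
)

toy_wgt <- toy_inc %>%
  mutate(
    # Top-code the incident count at 10; unknown counts become NA
    n10v4016 = case_when(
      V4016 %in% c(997, 998) ~ NA_real_,
      V4016 > 10 ~ 10,
      TRUE ~ V4016
    ),
    # Series crimes use the top-coded count, or 6 when the count is unknown;
    # non-series records keep a multiplier of 1
    serieswgt = case_when(
      series == 2 & is.na(n10v4016) ~ 6,
      series == 2 ~ n10v4016,
      TRUE ~ 1
    ),
    # Assumed final step: scale the existing weight by the series weight
    newwgt = wgt * serieswgt
  )

toy_wgt
```

In this sketch, the record with 12 incidents is counted as 10, and the record with an unknown count is counted as 6.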
@@ -1967,21 +1968,21 @@ 13.6.1 Estimation 1: Victimizatio
## Violent_Vzn Violent_Vzn_se
## <dbl> <dbl>
## 1 4598306. 198115.
-The number of victimizations estimated using the incident file is equivalent to the person and household file method. There are 1.1682^{7} property incidents and 4.5983^{6} violent incidents in a six-month period.
+The number of victimizations estimated using the incident file is equivalent to the person and household file method. There are 11,682,056 property incidents and 4,598,306 violent incidents in a six-month period.
Victimization proportions are proportions describing features of a victimization. The key here is that these are questions among victimizations, not among the population. These types of estimates can only be calculated using the incident design object (inc_des
).
-For example, we could be interested in the percentage of property victimizations reported to the police:
+For example, we could be interested in the percentage of property victimizations reported to the police, as shown in the following code, which returns the estimate, its standard error, and a 95% confidence interval:
prop1 <- inc_des %>%
filter(Property) %>%
- summarize(Pct = survey_mean(ReportPolice, na.rm = TRUE) * 100)
+ summarize(Pct = survey_mean(ReportPolice, na.rm = TRUE, proportion=TRUE, vartype=c("se", "ci")) * 100)
prop1
-## # A tibble: 1 × 2
-## Pct Pct_se
-## <dbl> <dbl>
-## 1 30.8 0.798
+## # A tibble: 1 × 4
+## Pct Pct_se Pct_low Pct_upp
+## <dbl> <dbl> <dbl> <dbl>
+## 1 30.8 0.798 29.2 32.4
Or, the percentage of violent victimizations that are in urban areas:
prop2 <- inc_des %>%
filter(Violent) %>%
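As a rough sanity check on the interval reported for `prop1` above, the estimate and standard error can be combined with a normal approximation. This is only a sketch: the inputs are the rounded values from the printed output, and `survey_mean()` with `proportion = TRUE` computes its interval with a different method (logit by default) and the design degrees of freedom, so agreement is expected only approximately.

```r
# Rounded values copied from the printed prop1 output
est <- 30.8   # Pct
se <- 0.798   # Pct_se

# Normal-approximation (Wald) 95% interval; an assumption made for illustration,
# not the method survey_mean() uses when proportion = TRUE
z <- qnorm(0.975)
wald_ci <- est + c(-1, 1) * z * se

round(wald_ci, 1)
# 29.2 32.4
```

Because the estimate is far from 0% and 100% and the standard error is small, the simple approximation lands on the same rounded bounds as the printed `Pct_low` and `Pct_upp`.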
@@ -2032,7 +2033,7 @@ 13.6.3 Estimation 3: Victimizatio
## 1 0.249 0.0595 0.860 0.101 0.455
## # ℹ 3 more variables: AAST_Knife_se <dbl>, AAST_Other <dbl>,
## # AAST_Other_se <dbl>
-A common desire is to calculate victimization rates by several characteristics. For example, we may want to calculate the violent victimization rate and aggravated assault rate by sex, race/Hispanic origin, age group, marital status, and household income. This requires a group_by()
statement for each categorization separately. Thus, we make a function to do this and then use map_df()
from the {purrr} package (part of the tidyverse) to loop through the variables. Finally, the {gt} package is used to make a publishable table shown in Table 13.5.
+A common desire is to calculate victimization rates by several characteristics. For example, we may want to calculate the violent victimization rate and aggravated assault rate by sex, race/Hispanic origin, age group, marital status, and household income. This requires a group_by()
statement for each categorization separately. Thus, we make a function to do this and then use map_df()
from the {purrr} package (part of the tidyverse) to loop through the variables. This function takes a demographic variable as its input (byvar
) and calculates the violent and aggravated assault victimization rates for each level. It then creates some columns with the variable, the level of each variable, and a numeric version of the variable (LevelNum
) for sorting later. The function is run across multiple variables using map()
, and the results are then stacked into a single output using bind_rows()
.
pers_est_by <- function(byvar) {
pers_des %>%
rename(Level := {{byvar}}) %>%
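The pattern described above (one function per grouping variable, applied with `map()` and stacked with `bind_rows()`) can be sketched on the built-in `mtcars` data. This is a minimal illustration, not the book's function: `mean_by()` and its output columns are hypothetical, an unweighted mean stands in for the survey estimates, and `all_of()` is used for the rename instead of the embrace operator shown in the book's code.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: mean mpg for each level of one grouping variable
mean_by <- function(byvar) {
  mtcars %>%
    rename(Level = all_of(byvar)) %>%   # byvar is a column name given as a string
    group_by(Level) %>%
    summarize(mean_mpg = mean(mpg), .groups = "drop") %>%
    mutate(
      Variable = byvar,                 # record which variable was summarized
      LevelNum = as.numeric(Level),     # numeric copy kept for sorting later
      Level = as.character(Level)       # common type so the pieces stack cleanly
    ) %>%
    select(Variable, Level, LevelNum, mean_mpg)
}

# Run the helper once per variable, then stack the results
stacked <- c("cyl", "gear") %>%
  map(mean_by) %>%
  bind_rows()

stacked
```

Converting `Level` to character before stacking avoids type conflicts when the grouping variables have different types, which is one reason to keep a separate numeric column for ordering.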
@@ -2052,65 +2053,66 @@ 13.6.3 Estimation 3: Victimizatio
pers_est_df <-
c("Sex", "RaceHispOrigin", "AgeGroup", "MaritalStatus", "Income") %>%
- map_df(pers_est_by)
-
-vr_gt<-pers_est_df %>%
- mutate(
- Variable = case_when(
- Variable == "RaceHispOrigin" ~ "Race/Hispanic origin",
- Variable == "MaritalStatus" ~ "Marital status",
- Variable == "AgeGroup" ~ "Age",
- TRUE ~ Variable
- )
- ) %>%
- select(-LevelNum) %>%
- group_by(Variable) %>%
- gt(rowname_col = "Level") %>%
- tab_spanner(
- label = "Violent crime",
- id = "viol_span",
- columns = c("Violent", "Violent_se")
- ) %>%
- tab_spanner(label = "Aggravated assault",
- columns = c("AAST", "AAST_se")) %>%
- cols_label(
- Violent = "Rate",
- Violent_se = "SE",
- AAST = "Rate",
- AAST_se = "SE",
- ) %>%
- fmt_number(
- columns = c("Violent", "Violent_se", "AAST", "AAST_se"),
- decimals = 1
- ) %>%
- tab_footnote(
- footnote = "Includes rape or sexual assault, robbery,
- aggravated assault, and simple assault.",
- locations = cells_column_spanners(spanners = "viol_span")
- ) %>%
- tab_footnote(
- footnote = "Excludes persons of Hispanic origin",
- locations =
- cells_stub(rows = Level %in%
- c("White", "Black", "Asian", NHOPI, "Other"))) %>%
- tab_footnote(
- footnote = "Includes persons who identified as
- Native Hawaiian or Other Pacific Islander only.",
- locations = cells_stub(rows = Level == NHOPI)
- ) %>%
- tab_footnote(
- footnote = "Includes persons who identified as American Indian or
- Alaska Native only or as two or more races.",
- locations = cells_stub(rows = Level == "Other")
- ) %>%
- tab_source_note(
- source_note = "Note: Rates per 1,000 persons age 12 or older.") %>%
- tab_source_note(source_note = "Source: Bureau of Justice Statistics,
- National Crime Victimization Survey, 2021.") %>%
- tab_stubhead(label = "Victim demographic") %>%
- tab_caption("Rate and standard error of violent victimization,
- by type of crime and demographic characteristics, 2021")
+The output from all the estimates is cleaned to create better labels, such as changing “RaceHispOrigin” to “Race/Hispanic origin”. Finally, the {gt} package is used to make a publishable table (Table 13.5). Using the functions from the {gt} package, column labels and footnotes are added, and estimates are presented to the first decimal place.
+vr_gt<-pers_est_df %>%
+ mutate(
+ Variable = case_when(
+ Variable == "RaceHispOrigin" ~ "Race/Hispanic origin",
+ Variable == "MaritalStatus" ~ "Marital status",
+ Variable == "AgeGroup" ~ "Age",
+ TRUE ~ Variable
+ )
+ ) %>%
+ select(-LevelNum) %>%
+ group_by(Variable) %>%
+ gt(rowname_col = "Level") %>%
+ tab_spanner(
+ label = "Violent crime",
+ id = "viol_span",
+ columns = c("Violent", "Violent_se")
+ ) %>%
+ tab_spanner(label = "Aggravated assault",
+ columns = c("AAST", "AAST_se")) %>%
+ cols_label(
+ Violent = "Rate",
+ Violent_se = "SE",
+ AAST = "Rate",
+ AAST_se = "SE",
+ ) %>%
+ fmt_number(
+ columns = c("Violent", "Violent_se", "AAST", "AAST_se"),
+ decimals = 1
+ ) %>%
+ tab_footnote(
+ footnote = "Includes rape or sexual assault, robbery,
+ aggravated assault, and simple assault.",
+ locations = cells_column_spanners(spanners = "viol_span")
+ ) %>%
+ tab_footnote(
+ footnote = "Excludes persons of Hispanic origin",
+ locations =
+ cells_stub(rows = Level %in%
+ c("White", "Black", "Asian", NHOPI, "Other"))) %>%
+ tab_footnote(
+ footnote = "Includes persons who identified as
+ Native Hawaiian or Other Pacific Islander only.",
+ locations = cells_stub(rows = Level == NHOPI)
+ ) %>%
+ tab_footnote(
+ footnote = "Includes persons who identified as American Indian or
+ Alaska Native only or as two or more races.",
+ locations = cells_stub(rows = Level == "Other")
+ ) %>%
+ tab_source_note(
+ source_note = "Note: Rates per 1,000 persons age 12 or older.") %>%
+ tab_source_note(source_note = "Source: Bureau of Justice Statistics,
+ National Crime Victimization Survey, 2021.") %>%
+ tab_stubhead(label = "Victim demographic") %>%
+ tab_caption("Rate and standard error of violent victimization,
+ by type of crime and demographic characteristics, 2021")