From 52480e2c1c47a2a02707c91a23fcd792e50fdf2e Mon Sep 17 00:00:00 2001
From: szimmer
1.1 What to expect: An overview of surveys and the process of designing surveys. This is only an overview, and we include many references for more in-depth knowledge.
The documentation suggests that the population should equal around 231 million, but this is a very imprecise count. Upon further investigation of the available resources, we can find the methodology report titled “Methodology Report for the ANES 2020 Time Series Study” (DeBell et al. 2022). This report states that we can use the population total from the Current Population Survey (CPS), a monthly survey sponsored by the U.S. Census Bureau and the U.S. Bureau of Labor Statistics. The CPS provides a more accurate population estimate for a specific month. Therefore, we can use the CPS to get the total population number for March 2020, when the ANES was conducted. Chapter 4 gives detailed instructions on how to calculate and adjust this value in the data.
This chapter provides an overview of the packages, data, and design objects we use frequently throughout this book. As mentioned in Chapter 2, understanding how a survey was conducted helps us make sense of the results and interpret findings. Therefore, we provide background on the datasets used in examples and exercises. Next, we walk through how to create the survey design objects necessary to begin analysis. Finally, we provide an overview of the {srvyr} package and the steps needed for analysis. If you have questions or face issues while going through the book, please report them in the book’s GitHub repository: https://github.com/tidy-survey-r/tidy-survey-book.
The Setup section provides details on the required packages and data, as well as the steps for preparing survey design objects. For a streamlined learning experience, we recommend taking the time to walk through the code provided and making sure everything is properly set up.
We use several packages throughout the book, but let’s install and load specific ones for this chapter. Many functions in the examples and exercises are from three packages: {tidyverse}, {survey}, and {srvyr}. If they are not already installed, use the code below. The {tidyverse} and {survey} packages can both be installed from the Comprehensive R Archive Network (CRAN). We use the GitHub development version of {srvyr} because of its additional functionality compared to the one on CRAN. Install the first two from CRAN, and install {srvyr} directly from GitHub using the {remotes} package:
install.packages(c("tidyverse", "survey"))
remotes::install_github("https://github.com/gergness/srvyr")
We bundled the datasets used in the book in an R package, {srvyrexploR}. Install it directly from GitHub using the {remotes} package:
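The install command itself is not shown above; a sketch using the {remotes} package (the repository path is an assumption based on the book's GitHub organization):

```r
# Install {srvyrexploR} from GitHub; the repository path is assumed
remotes::install_github("tidy-survey-r/srvyrexploR")
```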
After installing these packages, load them using the library() function:
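A loading chunk matching the description above would be:

```r
library(tidyverse) # data manipulation and visualization
library(survey)    # complex survey analysis
library(srvyr)     # tidyverse-style wrappers around {survey}
```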
The packages {broom}, {gt}, and {gtsummary} play a role in displaying output and creating formatted tables. Install them with the code below:
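A minimal install chunk for these three display packages, assuming the CRAN releases are intended:

```r
# Install the table and output formatting packages from CRAN
install.packages(c("broom", "gt", "gtsummary"))
```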
After installing these packages, load them using the library() function:
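A loading chunk for the three packages just installed:

```r
library(broom)     # tidying model output
library(gt)        # formatted tables
library(gtsummary) # summary tables
```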
Install and load the {censusapi} package to access the Current Population Survey (CPS), which we use to ensure accurate weighting of a key dataset in the book. Run the code below to install {censusapi}:
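A sketch of the install step, assuming the CRAN release of {censusapi} is intended:

```r
# Install {censusapi} from CRAN
install.packages("censusapi")
```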
After installing this package, load it using the library() function:
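The corresponding loading chunk:

```r
library(censusapi) # access Census Bureau APIs, including the CPS
```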
Note that the {censusapi} package requires a Census API key, available for free from the U.S. Census Bureau website (refer to the package documentation for more information). We recommend storing the Census API key in our R environment instead of directly in the code. After obtaining the API key, save it in your R environment by running Sys.setenv():
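A sketch with a placeholder key (replace "YOUR_API_KEY_HERE" with your own key; the CENSUS_KEY variable name is what {censusapi} looks for):

```r
# Store the personal Census API key in the R environment;
# "YOUR_API_KEY_HERE" is a placeholder, not a real key
Sys.setenv(CENSUS_KEY = "YOUR_API_KEY_HERE")
```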
Then, restart the R session. Once the Census API key is stored, we can retrieve it in our R code with Sys.getenv("CENSUS_KEY").
There are a few other packages that we use less frequently in the book. We list them in the Prerequisite boxes at the beginning of each chapter. As we work through the book, make sure to check the Prerequisite box and install any missing packages before proceeding.
As mentioned above, the {srvyrexploR} package contains the datasets used in the book. Once installed and loaded, explore the documentation using the help() function. Read the descriptions of the datasets to understand what they contain:
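A sketch of the help call:

```r
# Browse the documentation for all datasets bundled in {srvyrexploR}
help(package = "srvyrexploR")
```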
This book uses two main datasets: the American National Election Studies (ANES; DeBell 2010) and the Residential Energy Consumption Survey (RECS; U.S. Energy Information Administration 2023a). We can load these datasets individually with the data() function by specifying the dataset name as an argument. In the code below, we load the anes_2020 and recs_2020 datasets into objects with their respective names:
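A sketch of the loading chunk (the package argument assumes both datasets ship in {srvyrexploR}, as described above):

```r
# Load the two main datasets into the global environment
data(anes_2020, package = "srvyrexploR")
data(recs_2020, package = "srvyrexploR")
```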
The ANES is a study that collects data from election surveys dating back to 1948. These surveys contain information on public opinion and voting behavior in U.S. presidential elections. They cover topics such as party affiliation, voting choice, and level of trust in the government. The 2020 survey, the data we use in the book, was fielded online, through live video interviews, or via computer-assisted telephone interviews (CATI).
When working with new survey data, analysts should review the survey documentation (see Chapter 3) to understand the data collection methods. The original ANES data contains variables starting with V20 (DeBell 2010), so to assist with our analysis throughout the book, we created descriptive variable names. For example, the respondent’s age is now in a variable called Age, and gender is in a variable called Gender. These descriptive variables are included in the {srvyrexploR} package, and Table 4.1 displays the list of these renamed variables. A complete overview of all variables can be found in Appendix A.
We want to create four variables to indicate if an incident is a series crime. First, we create a variable called series using V4017, V4018, and V4019, where an incident is considered a series crime if there are 6 or more incidents (V4017), the incidents are similar in detail (V4018), or there is not enough detail to distinguish the incidents (V4019). Next, we top-code the number of incidents (V4016) by creating a variable n10v4016, which is set to 10 if V4016 > 10. Finally, we create the series weight using our new top-coded variable and the existing weight.
inc_series <- ncvs_2021_incident %>%
  mutate(
    series = case_when(V4017 %in% c(1, 8) ~ 1,
                       V4018 %in% c(2, 8) ~ 1,
                       V4019 %in% c(1, 8) ~ 1,
                       TRUE ~ 2
                       ),
    n10v4016 = case_when(V4016 %in% c(997, 998) ~ NA_real_,
                         V4016 > 10 ~ 10,
                         TRUE ~ V4016),
    serieswgt = case_when(series == 2 & is.na(n10v4016) ~ 6,
                          series == 2 ~ n10v4016,
                          TRUE ~ 1),
    NEWWGT = WGTVICCY * serieswgt
  )
The next step in preparing the files for estimation is to create indicators on the victimization file for characteristics of interest. Almost all BJS publications limit the analysis to records where the victimization occurred in the United States (where V4022 is not equal to 1), and we do this for all estimates as well. A brief codebook of variables for this task is located in Table 13.2.
hh_vsum_der <- hh_vsum %>%
  mutate(
    Tenure = factor(case_when(V2015 == 1 ~ "Owned",
                              !is.na(V2015) ~ "Rented"),
                    levels = c("Owned", "Rented")),
    Urbanicity = factor(case_when(V2143 == 1 ~ "Urban",
                                  V2143 == 2 ~ "Suburban",
                                  V2143 == 3 ~ "Rural"),
                        levels = c("Urban", "Suburban", "Rural")),
    SC214A_num = as.numeric(as.character(SC214A)),
    Income = case_when(SC214A_num <= 8 ~ "Less than $25,000",
                       SC214A_num <= 12 ~ "$25,000-49,999",
                       SC214A_num <= 15 ~ "$50,000-99,999",
                       SC214A_num <= 17 ~ "$100,000-199,999",
                       SC214A_num <= 18 ~ "$200,000 or more"),
    Income = fct_reorder(Income, SC214A_num, .na_rm = FALSE),
    PlaceSize = case_match(as.numeric(as.character(V2126B)),
                           0 ~ "Not in a place",
                           13 ~ "Under 10,000",
                           16 ~ "10,000-49,999",
                           17 ~ "50,000-99,999",
                           18 ~ "100,000-249,999",
                           19 ~ "250,000-499,999",
                           20 ~ "500,000-999,999",
                           c(21, 22, 23) ~ "1,000,000 or more"),
    PlaceSize = fct_reorder(PlaceSize, as.numeric(V2126B)),
    Region = case_match(as.numeric(V2127B),
                        1 ~ "Northeast",
                        2 ~ "Midwest",
                        3 ~ "South",
                        4 ~ "West"),
    Region = fct_reorder(Region, as.numeric(V2127B))
  )
As before, we want to check to make sure the recoded variables we create match the existing data as expected.
## # A tibble: 4 × 3
## Tenure V2015 n
## <fct> <fct> <int>
## # A tibble: 3 × 3
## Urbanicity V2143 n
## <fct> <fct> <int>
## 1 Urban 1 26878
## 2 Suburban 2 173491
## 3 Rural 3 56091
## # A tibble: 18 × 3
## Income SC214A n
## <fct> <fct> <int>
## # A tibble: 10 × 3
## PlaceSize V2126B n
## <fct> <fct> <int>
## # A tibble: 4 × 3
## Region V2127B n
## <fct> <fct> <int>
# Set label for usage later
NHOPI <- "Native Hawaiian or Other Pacific Islander"

pers_vsum_der <- pers_vsum %>%
  mutate(
    Sex = factor(case_when(V3018 == 1 ~ "Male",
                           V3018 == 2 ~ "Female")),
    RaceHispOrigin = factor(case_when(V3024 == 1 ~ "Hispanic",
                                      V3023A == 1 ~ "White",
                                      V3023A == 2 ~ "Black",
                                      V3023A == 4 ~ "Asian",
                                      V3023A == 5 ~ NHOPI,
                                      TRUE ~ "Other"),
                            levels = c("White", "Black", "Hispanic",
                                       "Asian", NHOPI, "Other")),
    V3014_num = as.numeric(as.character(V3014)),
    AgeGroup = case_when(V3014_num <= 17 ~ "12-17",
                         V3014_num <= 24 ~ "18-24",
                         V3014_num <= 34 ~ "25-34",
                         V3014_num <= 49 ~ "35-49",
                         V3014_num <= 64 ~ "50-64",
                         V3014_num <= 90 ~ "65 or older"),
    AgeGroup = fct_reorder(AgeGroup, V3014_num),
    MaritalStatus = factor(case_when(V3015 == 1 ~ "Married",
                                     V3015 == 2 ~ "Widowed",
                                     V3015 == 3 ~ "Divorced",
                                     V3015 == 4 ~ "Separated",
                                     V3015 == 5 ~ "Never married"),
                           levels = c("Never married", "Married",
                                      "Widowed", "Divorced",
                                      "Separated"))
  ) %>%
  left_join(hh_vsum_der %>% select(YEARQ, IDHH,
                                   V2117, V2118, Tenure:Region),
            by = c("YEARQ", "IDHH"))
As before, we want to check to make sure the recoded variables we create match the existing data as expected.
## # A tibble: 2 × 3
## Sex V3018 n
## <fct> <fct> <int>
## 1 Female 2 150956
## 2 Male 1 140922
## # A tibble: 11 × 3
## RaceHispOrigin V3024 n
## <fct> <fct> <int>
pers_vsum_der %>%
  filter(RaceHispOrigin != "Hispanic" |
           is.na(RaceHispOrigin)) %>%
  count(RaceHispOrigin, V3023A)
## # A tibble: 20 × 3
## RaceHispOrigin V3023A n
## <fct> <fct> <int>
pers_vsum_der %>%
  group_by(AgeGroup) %>%
  summarize(minAge = min(V3014),
            maxAge = max(V3014),
            .groups = "drop")
## # A tibble: 6 × 3
## AgeGroup minAge maxAge
## <fct> <dbl> <dbl>
## # A tibble: 6 × 3
## MaritalStatus V3015 n
## <fct> <fct> <int>
We then create tibbles that contain only the variables we need, which makes the analyses easier.
hh_vsum_slim <- hh_vsum_der %>%
  select(YEARQ:V2118,
         WGTVICCY:ADJINC_WT,
         Tenure,
         Urbanicity,
         Income,
         PlaceSize,
         Region)

pers_vsum_slim <- pers_vsum_der %>%
  select(YEARQ:WGTPERCY, WGTVICCY:ADJINC_WT, Sex:Region)
To calculate estimates about types of crime, such as what percentage of violent crimes are reported to the police, we must use the incident file. The incident file is not guaranteed to have every pseudostratum and half-sample code, so dummy records are created to append before estimation. Finally, we merge demographic variables onto the incident tibble.
dummy_records <- hh_vsum_slim %>%
  distinct(V2117, V2118) %>%
  mutate(Dummy = 1,
         WGTVICCY = 1,
         NEWWGT = 1)

inc_analysis <- inc_ind %>%
  mutate(Dummy = 0) %>%
  left_join(select(pers_vsum_slim, YEARQ, IDHH, IDPER, Sex:Region),
            by = c("YEARQ", "IDHH", "IDPER")) %>%
  bind_rows(dummy_records) %>%
  select(YEARQ:IDPER,
         WGTVICCY,
         NEWWGT,
         V4529,
         WeapCat,
         ReportPolice,
         Property:Region)
The tibbles hh_vsum_slim, pers_vsum_slim, and inc_analysis can now be used to create design objects and calculate crime rate estimates.
All the data prep above is necessary to prepare the data for survey analysis. At this point, we can create the design objects and finally begin analysis. We create three design objects, one for each type of analysis, as the appropriate object depends on which type of estimate we are creating. For the incident data, the weight for analysis is NEWWGT, which we constructed previously. The household and person-level data use WGTHHCY and WGTPERCY, respectively. For all analyses, V2117 is the strata variable, and V2118 is the cluster/PSU variable.
inc_des <- inc_analysis %>%
  as_survey(
    weight = NEWWGT,
    strata = V2117,
    ids = V2118,
    nest = TRUE
  )

hh_des <- hh_vsum_slim %>%
  as_survey(
    weight = WGTHHCY,
    strata = V2117,
    ids = V2118,
    nest = TRUE
  )

pers_des <- pers_vsum_slim %>%
  as_survey(
    weight = WGTPERCY,
    strata = V2117,
    ids = V2118,
    nest = TRUE
  )
There are two ways to calculate victimization totals. Using the incident design object (inc_des) is the most straightforward method, but the person (pers_des) and household (hh_des) design objects can be used as well if the adjustment factor (ADJINC_WT) is incorporated. In the example below, the total number of property and violent victimizations is first calculated using the incident file and then using the household and person design objects. The incident file is smaller, and thus, estimation is faster using that file, but the estimates are the same, as illustrated below:
vt1 <- inc_des %>%
  summarize(Property_Vzn = survey_total(Property, na.rm = TRUE),
            Violent_Vzn = survey_total(Violent, na.rm = TRUE))

vt2a <- hh_des %>%
  summarize(Property_Vzn = survey_total(Property * ADJINC_WT,
                                        na.rm = TRUE))

vt2b <- pers_des %>%
  summarize(Violent_Vzn = survey_total(Violent * ADJINC_WT,
                                       na.rm = TRUE))

vt1
## # A tibble: 1 × 4
## Property_Vzn Property_Vzn_se Violent_Vzn Violent_Vzn_se
## <dbl> <dbl> <dbl> <dbl>
## 1 11682056. 263844. 4598306. 198115.
vt2a
## # A tibble: 1 × 2
## Property_Vzn Property_Vzn_se
## <dbl> <dbl>
## 1 11682056. 263844.
vt2b
## # A tibble: 1 × 2
## Violent_Vzn Violent_Vzn_se
## <dbl> <dbl>
13.6.2 Estimation 2: Victimization Proportions
Victimization proportions are proportions describing features of a victimization. The key here is that these are questions among victimizations, not among the population. These types of estimates can only be calculated using the incident design object (inc_des).
For example, we could be interested in the percentage of property victimizations reported to the police as shown in the following code with an estimate, the standard error, and 95% confidence interval:
prop1 <- inc_des %>%
  filter(Property) %>%
  summarize(Pct = survey_mean(ReportPolice,
                              na.rm = TRUE,
                              proportion = TRUE,
                              vartype = c("se", "ci")) * 100)

prop1
## # A tibble: 1 × 4
## Pct Pct_se Pct_low Pct_upp
## <dbl> <dbl> <dbl> <dbl>
## 1 30.8 0.798 29.2 32.4
Or, the percentage of violent victimizations that are in urban areas:
prop2 <- inc_des %>%
  filter(Violent) %>%
  summarize(Pct = survey_mean(Urbanicity == "Urban", na.rm = TRUE) * 100)

prop2
## # A tibble: 1 × 2
## Pct Pct_se
## <dbl> <dbl>
13.6.3 Estimation 3: Victimization Rates
Victimization rates measure the number of victimizations per population. They are not an estimate of the proportion of households or persons who are victimized, which is a prevalence rate described in Section 13.6.4. Victimization rates are estimated using the household (hh_des) or person (pers_des) design objects depending on the type of crime, and the adjustment factor (ADJINC_WT) must be incorporated. We return to the example of property and violent victimizations used in the example for victimization totals (Section 13.6.1). In the following example, the property victimization totals are calculated as above, as well as the property victimization rate (using survey_mean()) and the population size (using survey_total()).
As mentioned in the introduction, victimization rates use the incident weight in the numerator and the person or household weight in the denominator. This is accomplished by calculating the rates with the weight adjustment (ADJINC_WT) multiplied by the estimate of interest. Let’s look at an example of property victimization.
vr_prop <- hh_des %>%
  summarize(
    Property_Vzn = survey_total(Property * ADJINC_WT,
                                na.rm = TRUE),
    Property_Rate = survey_mean(Property * ADJINC_WT * 1000,
                                na.rm = TRUE),
    PopSize = survey_total(1, vartype = NULL)
  )

vr_prop
## # A tibble: 1 × 5
## Property_Vzn Property_Vzn_se Property_Rate Property_Rate_se PopSize
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 11682056. 263844. 90.3 1.95 129319232.
In the output above, we see that the estimated property victimization rate in 2021 was 90.3 per 1,000 households, which is consistent with calculating it directly as the number of victimizations per 1,000 households, as demonstrated in the next chunk:
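A sketch of such a check, assuming the vr_prop object created above (the column name Property_Rate_manual matches the output that follows):

```r
# Recompute the rate by hand: victimizations per 1,000 households
vr_prop %>%
  select(Property_Vzn, Property_Rate, PopSize) %>%
  mutate(Property_Rate_manual = Property_Vzn / PopSize * 1000)
```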
## # A tibble: 1 × 4
## Property_Vzn Property_Rate PopSize Property_Rate_manual
## <dbl> <dbl> <dbl> <dbl>
## 1 11682056. 90.3 129319232. 90.3
Victimization rates can also be calculated for particular characteristics of the victimization. In the following example, we calculate the rate of aggravated assault with no weapon, with a firearm, with a knife, and with another weapon.
pers_des %>%
  summarize(across(
    starts_with("AAST_"),
    ~ survey_mean(. * ADJINC_WT * 1000, na.rm = TRUE)
  ))
## # A tibble: 1 × 8
## AAST_NoWeap AAST_NoWeap_se AAST_Firearm AAST_Firearm_se AAST_Knife
## <dbl> <dbl> <dbl> <dbl> <dbl>
## # ℹ 3 more variables: AAST_Knife_se <dbl>, AAST_Other <dbl>,
## # AAST_Other_se <dbl>
A common desire is to calculate victimization rates by several characteristics. For example, we may want to calculate the violent victimization rate and aggravated assault rate by sex, race/Hispanic origin, age group, marital status, and household income. This requires a group_by() statement for each categorization separately. Thus, we make a function to do this and then use map() from the {purrr} package (part of the tidyverse) to loop through the variables. This function takes a demographic variable as its input (byvar) and calculates the violent and aggravated assault victimization rate for each level. It then creates some columns with the variable, the level of each variable, and a numeric version of the variable (LevelNum) for sorting later. The function is run across multiple variables using map(), and the results are then stacked into a single output using bind_rows().
pers_est_by <- function(byvar) {
  pers_des %>%
    rename(Level := {{byvar}}) %>%
    filter(!is.na(Level)) %>%
    group_by(Level) %>%
    summarize(
      Violent = survey_mean(Violent * ADJINC_WT * 1000, na.rm = TRUE),
      AAST = survey_mean(AAST * ADJINC_WT * 1000, na.rm = TRUE)
    ) %>%
    mutate(
      Variable = byvar,
      LevelNum = as.numeric(Level),
      Level = as.character(Level)
    ) %>%
    select(Variable, Level, LevelNum, everything())
}

pers_est_df <-
  c("Sex", "RaceHispOrigin", "AgeGroup", "MaritalStatus", "Income") %>%
  map(pers_est_by) %>%
  bind_rows()
The output from all the estimates is cleaned to create better labels, such as going from “RaceHispOrigin” to “Race/Hispanic Origin”. Finally, the {gt} package is used to make a publishable table (Table 13.5). Using the functions from the {gt} package, column labels and footnotes are added, and estimates are presented to the first decimal place.
vr_gt <- pers_est_df %>%
  mutate(
    Variable = case_when(
      Variable == "RaceHispOrigin" ~ "Race/Hispanic origin",
      Variable == "MaritalStatus" ~ "Marital status",
      Variable == "AgeGroup" ~ "Age",
      TRUE ~ Variable
    )
  ) %>%
  select(-LevelNum) %>%
  group_by(Variable) %>%
  gt(rowname_col = "Level") %>%
  tab_spanner(
    label = "Violent crime",
    id = "viol_span",
    columns = c("Violent", "Violent_se")
  ) %>%
  tab_spanner(label = "Aggravated assault",
              columns = c("AAST", "AAST_se")) %>%
  cols_label(
    Violent = "Rate",
    Violent_se = "SE",
    AAST = "Rate",
    AAST_se = "SE",
  ) %>%
  fmt_number(
    columns = c("Violent", "Violent_se", "AAST", "AAST_se"),
    decimals = 1
  ) %>%
  tab_footnote(
    footnote = "Includes rape or sexual assault, robbery,
    aggravated assault, and simple assault.",
    locations = cells_column_spanners(spanners = "viol_span")
  ) %>%
  tab_footnote(
    footnote = "Excludes persons of Hispanic origin.",
    locations =
      cells_stub(rows = Level %in%
                   c("White", "Black", "Asian", NHOPI, "Other"))) %>%
  tab_footnote(
    footnote = "Includes persons who identified as
    Native Hawaiian or Other Pacific Islander only.",
    locations = cells_stub(rows = Level == NHOPI)
  ) %>%
  tab_footnote(
    footnote = "Includes persons who identified as American Indian or
    Alaska Native only or as two or more races.",
    locations = cells_stub(rows = Level == "Other")
  ) %>%
  tab_source_note(
    source_note = "Note: Rates per 1,000 persons age 12 or older.") %>%
  tab_source_note(source_note = "Source: Bureau of Justice Statistics,
    National Crime Victimization Survey, 2021.") %>%
  tab_stubhead(label = "Victim demographic") %>%
  tab_caption("Rate and standard error of violent victimization,
    by type of crime and demographic characteristics, 2021")