From 596a5c855bae00f6e46c649a49bd413728dbf43c Mon Sep 17 00:00:00 2001
From: rpowell22
1.8 Colophon
bookdown using RStudio. The complete source is available on GitHub.
This version of the book was built with R version 4.4.0 (2024-04-24) and with the packages listed in Table 1.1.
-chi_ex2_obs_table
chi_ex3_obs_table
trust_gov_gt %>%
tab_caption("Example of {gt} table with trust in government estimate")
We use data from the United States National Crime Victimization Survey (NCVS.) These data are available in the {srvyrexploR} package as ncvs_2021_incident
, ncvs_2021_household
, and ncvs_2021_person
.
We use data from the United States National Crime Victimization Survey (NCVS). These data are available in the {srvyrexploR} package as ncvs_2021_incident
, ncvs_2021_household
, and ncvs_2021_person
.
The National Crime Victimization Survey (NCVS) is a household survey sponsored by the Bureau of Justice Statistics (BJS), which collects data on criminal victimization, including characteristics of the crimes, offenders, and victims. Crime types include both household and personal crimes, as well as violent and non-violent crimes. The population of interest of this survey is all people in the United States age 12 and older living in housing units and noninstitutional group quarters.
-The NCVS has been ongoing since 1992. An earlier survey, the National Crime Survey, was run from 1972 to 1991 (U. S. Bureau of Justice Statistics 2017). The survey is administered using a rotating panel. When an address enters the sample, the residents of that address are interviewed every six months for a total of seven interviews. If the initial residents move away from the address during the period and new residents move in, the new residents are included in the survey, as people are not followed when they move.
-NCVS data are publicly available and distributed by Inter-university Consortium for Political and Social Research (ICPSR), with data going back to 1992. The vignette in this book includes data from 2021 (U.S. Bureau of Justice Statistics 2022). The NCVS data structure is complicated, and the User’s Guide contains examples for analysis in SAS, SUDAAN, SPSS, and Stata, but not R (Shook-Sa, Bonnie, Couzens, G. Lance, and Berzofsky, Marcus 2015). This vignette adapts those examples for R.
+The National Crime Victimization Survey (NCVS) is a household survey sponsored by the Bureau of Justice Statistics (BJS), which collects data on criminal victimization, including characteristics of the crimes, offenders, and victims. Crime types include both household and personal crimes, as well as violent and non-violent crimes. The population of interest of this survey is all people in the United States age 12 and older living in housing units and non-institutional group quarters.
+The NCVS has been ongoing since 1992. An earlier survey, the National Crime Survey, was run from 1972 to 1991 (U. S. Bureau of Justice Statistics 2017). The survey is administered using a rotating panel. When an address enters the sample, the residents of that address are interviewed every 6 months for a total of 7 interviews. If the initial residents move away from the address during the period and new residents move in, the new residents are included in the survey, as people are not followed when they move.
+NCVS data are publicly available and distributed by Inter-university Consortium for Political and Social Research (ICPSR), with data going back to 1992. The vignette in this book includes data from 2021 (U.S. Bureau of Justice Statistics 2022). The NCVS data structure is complicated, and the User’s Guide contains examples for analysis in SAS, SUDAAN, SPSS, and Stata, but not R (Shook-Sa, Couzens, and Berzofsky 2015). This vignette adapts those examples for R.
The NCVS User Guide (Shook-Sa, Bonnie, Couzens, G. Lance, and Berzofsky, Marcus 2015) uses the following notation:
+The NCVS User Guide (Shook-Sa, Couzens, and Berzofsky 2015) uses the following notation:
IDHH
.IDPER
.YEARQ
) for household \(i\) and individual respondent \(j\).
\[\hat{VR}_{C,D}= \frac{\sum_{ijkl \in C,D} v_{ijkl}}{\sum_{ijk \in D} w_{ijk}}\times 1000\]
-where \(w_{ijk}\) is the person weight (WGTPERCY
) for personal crimes or household weight (WGTHHCY
) for household crimes. The numerator is the number of incidents in a domain, and the denominator is the number of persons or households in a domain. Notice that the weights in the numerator and denominator are different - this is important, and in the syntax and examples below, we discuss how to make an estimate that involves two weights.
WGTPERCY
) for personal crimes or household weight (WGTHHCY
) for household crimes. The numerator is the number of incidents in a domain, and the denominator is the number of persons or households in a domain. Notice that the weights in the numerator and denominator are different; this is important, and in the syntax and examples below, we discuss how to make an estimate that involves two weights.
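As a rough illustration of the two-weight structure in the formula above, the following base R sketch computes a rate per 1,000 from toy data. The data frames, column names (`id`, `w`, `v`), and values are hypothetical stand-ins and not NCVS estimates; it ignores variance estimation and domain filtering entirely.

```r
# Toy data (hypothetical values, not NCVS estimates).
# One row per person; w plays the role of the person weight.
persons <- data.frame(
  id = 1:4,
  w  = c(1000, 1500, 1200, 1300)
)

# One row per victimization; v plays the role of the victimization weight.
incidents <- data.frame(
  id = c(1, 1, 3),
  v  = c(900, 900, 1100)
)

# Numerator: sum of victimization weights.
# Denominator: sum of person weights, including non-victims.
rate_per_1000 <- sum(incidents$v) / sum(persons$w) * 1000
rate_per_1000
```

The denominator uses every person in the domain, whether or not they appear in the incident data, which is why the two sums draw on different files and different weights.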
-Some work is necessary to prepare the files before analysis. The design variables indicating pseudostratum (V2117
) and half-sample code (V2118
) are only included on the household file, so they must be added to the person and incident files for any analysis.
+Some work is necessary to prepare the files before analysis. The design variables indicating pseudo-stratum (V2117
) and half-sample code (V2118
) are only included on the household file, so they must be added to the person and incident files for any analysis.
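A minimal sketch of carrying the design variables from the household file to the person file, using base R `merge()`. The miniature data frames are hypothetical, and joining on `IDHH` and `YEARQ` is an assumption made for illustration; the real files have many more columns.

```r
# Hypothetical miniature household file with the design variables.
household <- data.frame(
  IDHH  = c("H1", "H2"),
  YEARQ = c("2021.1", "2021.1"),
  V2117 = c(10, 11),  # pseudo-stratum
  V2118 = c(1, 2)     # half-sample code
)

# Hypothetical miniature person file, which lacks V2117 and V2118.
person <- data.frame(
  IDHH  = c("H1", "H1", "H2"),
  YEARQ = c("2021.1", "2021.1", "2021.1"),
  IDPER = c("P1", "P2", "P3")
)

# Keep only the keys and design variables, then left-join onto persons.
design_vars <- household[, c("IDHH", "YEARQ", "V2117", "V2118")]
person_design <- merge(
  person, design_vars,
  by = c("IDHH", "YEARQ"),  # assumed join keys
  all.x = TRUE              # keep every person record
)
person_design
```

The same pattern would apply to the incident file, since each incident record also belongs to a household.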
For victimization rates, we need to know the victimization status for both victims and non-victims. Therefore, the incident file must be summarized and merged onto the household or person files for household-level and person-level crimes, respectively. We begin this vignette by discussing how to create these incident summary files. This is following Section 2.2 of the NCVS User’s Guide (Shook-Sa, Bonnie, Couzens, G. Lance, and Berzofsky, Marcus 2015).
+For victimization rates, we need to know the victimization status for both victims and non-victims. Therefore, the incident file must be summarized and merged onto the household or person files for household-level and person-level crimes, respectively. We begin this vignette by discussing how to create these incident summary files. This is following Section 2.2 of the NCVS User’s Guide (Shook-Sa, Couzens, and Berzofsky 2015).
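The summarize-then-merge idea can be sketched in base R as follows. This is only an illustration under simplifying assumptions: the toy data frames and the column names `inc_wgt` and `vict_wgt` are hypothetical, and a single key (`IDPER`) stands in for whatever keys the real files require.

```r
# Hypothetical person file: one row per person, victims and non-victims.
person <- data.frame(
  IDPER    = c("P1", "P2", "P3"),
  WGTPERCY = c(1000, 1500, 1200)
)

# Hypothetical incident file: one row per victimization.
incident <- data.frame(
  IDPER   = c("P1", "P1", "P3"),
  inc_wgt = c(900, 900, 1100)  # hypothetical victimization weight column
)

# Step 1: collapse the incident file to one row per person.
inc_summary <- aggregate(inc_wgt ~ IDPER, data = incident, FUN = sum)
names(inc_summary)[2] <- "vict_wgt"

# Step 2: merge onto the person file, keeping non-victims.
person_vict <- merge(person, inc_summary, by = "IDPER", all.x = TRUE)

# Step 3: non-victims have no incident rows, so their total is zero.
person_vict$vict_wgt[is.na(person_vict$vict_wgt)] <- 0
person_vict
```

Filling the missing totals with zero is what lets non-victims contribute to the denominator of a rate without contributing to the numerator.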
Each record on the incident file represents one victimization, which is not the same as one incident. Some victimizations have several instances that make it difficult for the victim to differentiate the details of these incidents, labeled as “series crimes.” Appendix A of the User’s Guide indicates how to calculate the series weight in other statistical languages.
Here, we adapt that code for R. If a victimization is a series crime, its series weight is the number of actual victimizations, top-coded at 10; that is, even if the crime occurred more than 10 times, it is counted as 10 times to reduce the influence of extreme outliers. If an incident is a series crime but the number of occurrences is unknown, the series weight is set to 6. A description of the variables used to create indicators of series and the associated weights is included in Table 13.1.
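The top-coding rule described above can be sketched as a small base R helper. The function name `series_weight()` and its arguments are hypothetical, unknown counts are represented here as `NA`, and the weight of 1 for non-series victimizations is an assumption of this sketch; the survey's actual codes for unknown values are not modeled.

```r
# Hypothetical helper: series weight from a series flag and an incident count.
#   is_series: TRUE if the victimization is a series crime
#   n_times:   number of occurrences, NA if unknown
series_weight <- function(is_series, n_times) {
  topcoded <- pmin(n_times, 10)  # cap at 10; NA stays NA
  ifelse(
    !is_series, 1,               # assumed: non-series counts once
    ifelse(is.na(topcoded), 6, topcoded)
  )
}

# Non-series, series with 8, series with 25 (capped), series with unknown count
series_weight(c(FALSE, TRUE, TRUE, TRUE), c(2, 8, 25, NA))
# 1 8 10 6
```

Multiplying this series weight by the existing victimization weight then gives an adjusted weight in which a series record counts as several victimizations.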
V4016 | -How many times incident occur last 6 mos | -1-996 | +How many times incident occur last 6 months | +1–996 | Number of times |
V4017 | How many incidents | 1 | -1-5 incidents (not a “series”) | +1–5 incidents (not a “series”) | |
@@ -690,7 +690,7 @@ |
We want to create four variables to indicate if an incident is a series crime. First, we create a variable called series
using V4017
, V4018
, and V4019
where an incident is considered a series crime if there are 6 or more incidents (V4107
), the incidents are similar in detail (V4018
), or there is not enough detail to distinguish the incidents (V4019
.) Second, we top-code the number of incidents (V4016
) by creating a variable n10v4016
, which is set to 10 if V4016 > 10
. Third, we create the serieswgt
using the two new variables series
and n10v4019
to classify the max series based on missing data and number of incidents. Finally, we create the new weight using our new serieswgt
variable and the existing weight (WGTVICCY
.)
We want to create four variables to indicate if an incident is a series crime. First, we create a variable called series
using V4017
, V4018
, and V4019
where an incident is considered a series crime if there are 6 or more incidents (V4017
), the incidents are similar in detail (V4018
), or there is not enough detail to distinguish the incidents (V4019
). Second, we top-code the number of incidents (V4016
) by creating a variable n10v4016
, which is set to 10 if V4016 > 10
. Third, we create the serieswgt
using the two new variables series
and n10v4016
to classify the max series based on missing data and number of incidents. Finally, we create the new weight using our new serieswgt
variable and the existing weight (WGTVICCY
).
inc_series <- ncvs_2021_incident %>%
mutate(
series = case_when(
@@ -713,7 +713,13 @@ 13.4.1 Preparing files for estima
)
The next step in preparing the files for estimation is to create indicators on the victimization file for characteristics of interest. Almost all BJS publications limit the analysis to records where the victimization occurred in the United States (where V4022
is not equal to 1). We do this for all estimates as well. A brief codebook of variables for this task is located in Table 13.2.