diff --git a/01-introduction.Rmd b/01-introduction.Rmd index 7f73429d..c522ecd0 100644 --- a/01-introduction.Rmd +++ b/01-introduction.Rmd @@ -38,44 +38,18 @@ In most chapters, you'll find code that you can follow. Each of these chapters s ## Datasets used in this book {#book-datasets} -We work with two key datasets throughout the book: the Residential Energy Consumption Survey [RECS -- @recs-2020-tech] and the American National Election Studies [ANES -- @debell]. To ensure that all readers can follow the examples, we have provided analytic datasets available on OSF^[https://osf.io/gzbkn/?view_only=8ca80573293b4e12b7f934a0f742b957]. - -If a chapter contains data that is not part of existing packages, we have created a helper function, `read_osf()`, for you to load it easily. We recommend saving the script below in a folder called "helper-fun" and calling the file `helper-function.R` if you would like to follow along with the prerequisites listed in the chapters that contain code. +We work with two key datasets throughout the book: the Residential Energy Consumption Survey [RECS -- @recs-2020-tech] and the American National Election Studies [ANES -- @debell]. To ensure that all readers can follow the examples, we have provided analytic datasets in an R package, {srvyr.data}. Install the package from GitHub using the {remotes} package. 
```r -read_osf <- function(filename){ - #' Downloads file from OSF project - #' Reads in file - #' Deletes file from computer - - osf_dl_del_later <- !dir.exists("osf_dl") - - if (osf_dl_del_later) { - osf_dl_del_later <- TRUE - dir.create("osf_dl") - } - - dat_det <- - osf_retrieve_node("https://osf.io/gzbkn/?view_only=8ca80573293b4e12b7f934a0f742b957") %>% - osf_ls_files() %>% - dplyr::filter(name == filename) %>% - osf_download(conflicts = "overwrite", path = "osf_dl") - - out <- dat_det %>% - dplyr::pull(local_path) %>% - readr::read_rds() - - if (osf_dl_del_later) { - unlink("osf_dl", recursive = TRUE) - } else{ - unlink(dplyr::pull(dat_det, local_path)) - } - - return(out) -} +remotes::install_github("https://github.com/tidy-survey-r/srvyr.data") ``` -Here's how to use the function to read in the RECS and ANES datasets: +To explore the provided datasets in the package, access the documentation using the `help()` command. + +```r +help(package="srvyr.data") +``` +To load the RECS and ANES datasets, start by running `library(srvyr.data)` to load the package. Then, use the `data()` command to load the datasets into the environment. ```{r} #| label: intro-setup @@ -85,8 +59,7 @@ Here's how to use the function to read in the RECS and ANES datasets: library(tidyverse) library(survey) library(srvyr) -library(osfr) -source("helper-fun/helper-function.R") +library(srvyr.data) ``` ```{r} @@ -95,26 +68,26 @@ source("helper-fun/helper-function.R") #| warning: FALSE #| message: FALSE #| cache: TRUE -recs_in <- read_osf("recs_2020.rds") -anes_in <- read_osf("anes_2020.rds") +data(recs_2020) +data(anes_2020) ``` -RECS is a study that provides energy consumption and expenditures data in American households. The Energy Information Administration funds RECS and has been fielded 15 times between 1950 and 2020. The survey has two components - the household survey and the energy supplier survey. 
In 2020, the household survey was collected by web and paper questionnaires and included questions about appliances, electronics, heating, air conditioning (A/C), temperatures, water heating, lighting, respondent demographics, and energy assistance. The energy supplier survey consists of components relating to energy consumption and energy expenditure. Below is an overview of the `recs_in` data: +RECS is a study that provides energy consumption and expenditures data in American households. The Energy Information Administration funds RECS, which has been fielded 15 times between 1950 and 2020. The survey has two components - the household survey and the energy supplier survey. In 2020, the household survey was collected by web and paper questionnaires and included questions about appliances, electronics, heating, air conditioning (A/C), temperatures, water heating, lighting, respondent demographics, and energy assistance. The energy supplier survey consists of components relating to energy consumption and energy expenditure. Below is an overview of the `recs_2020` data: ```{r} #| label: intro-recs -recs_in %>% select(-starts_with("NWEIGHT")) -recs_in %>% select(starts_with("NWEIGHT")) +recs_2020 %>% select(-starts_with("NWEIGHT")) +recs_2020 %>% select(starts_with("NWEIGHT")) ``` -From this output, we can see that there are `r nrow(recs_in) %>% formatC(big.mark = ",")` rows and `r ncol(recs_in) %>% formatC(big.mark = ",")` variables. We can see that there are variables containing an ID (`DOEID`), geographic information (e.g., `Region`, `state_postal`, `Urbanicity`), along with information about the house, including the type of house (`HousingUnitType`) and when the house was built (`YearMade`). Additionally, there is a long list of weighting variables that we will use in the analysis (e.g., `NWEIGHT`, `NWEIGHT1`, ..., `NWEIGHT60`). We will discuss using these weighting variables in Chapter \@ref(c03-specifying-sample-designs). 
For a more detailed codebook, see Appendix \@ref(recs-cb). +From this output, we can see that there are `r nrow(recs_2020) %>% formatC(big.mark = ",")` rows and `r ncol(recs_2020) %>% formatC(big.mark = ",")` variables. We can see that there are variables containing an ID (`DOEID`), geographic information (e.g., `Region`, `state_postal`, `Urbanicity`), along with information about the house, including the type of house (`HousingUnitType`) and when the house was built (`YearMade`). Additionally, there is a long list of weighting variables that we will use in the analysis (e.g., `NWEIGHT`, `NWEIGHT1`, ..., `NWEIGHT60`). We will discuss using these weighting variables in Chapter \@ref(c03-specifying-sample-designs). For a more detailed codebook, see Appendix \@ref(recs-cb). -The ANES is a series study that has collected data from election surveys since 1948. These surveys contain data on public opinion and voting behavior in U.S. presidential elections. The 2020 survey (the data we will be using) was fielded to individuals over the web, through live video interviewing, or over with computer-assisted telephone interviewing (CATI). The survey includes questions on party affiliation, voting choice, and level of trust with the government. Here is an overview of the `anes_in` data. First, we show the variables starting with "V" followed by a number; these are the original variables. Then, we show you the remaining variables that we created based on the original data: +The ANES is a series study that has collected data from election surveys since 1948. These surveys contain data on public opinion and voting behavior in U.S. presidential elections. The 2020 survey (the data we will be using) was fielded to individuals over the web, through live video interviewing, or with computer-assisted telephone interviewing (CATI). The survey includes questions on party affiliation, voting choice, and level of trust in the government. Here is an overview of the `anes_2020` data. 
First, we show the variables starting with "V" followed by a number; these are the original variables. Then, we show you the remaining variables that we created based on the original data: ```{r} #| label: intro-anes -anes_in %>% select(matches("^V\\d")) -anes_in %>% select(-matches("^V\\d")) +anes_2020 %>% select(matches("^V\\d")) +anes_2020 %>% select(-matches("^V\\d")) ``` -From this output we can see that there are `r nrow(anes_in) %>% formatC(big.mark = ",")` rows and `r ncol(anes_in) %>% formatC(big.mark = ",")` variables. Most of the variables start with V20, so referencing the documentation for survey will be crucial to not get lost (see Chapter \@ref(c04-understanding-survey-data-documentation)). We have created some more descriptive variables for you to use throughout this book, such as the age (`Age`) and gender (`Gender`) of the respondent, along with variables that represent their party affiliation (`PartyID`). Additionally, we need the variables `Weight` and `Stratum` to analyze this data accurately. We will discuss how to use these weighting variables in Chapters \@ref(c03-specifying-sample-designs) and \@ref(c04-understanding-survey-data-documentation). For a more detailed codebook, see Appendix \@ref(anes-cb). +From this output, we can see that there are `r nrow(anes_2020) %>% formatC(big.mark = ",")` rows and `r ncol(anes_2020) %>% formatC(big.mark = ",")` variables. Most of the variables start with V20, so referencing the survey documentation will be crucial to avoid getting lost (see Chapter \@ref(c04-understanding-survey-data-documentation)). We have created some more descriptive variables for you to use throughout this book, such as the age (`Age`) and gender (`Gender`) of the respondent, along with variables that represent their party affiliation (`PartyID`). Additionally, we need the variables `Weight` and `Stratum` to analyze this data accurately. 
We will discuss how to use these weighting variables in Chapters \@ref(c03-specifying-sample-designs) and \@ref(c04-understanding-survey-data-documentation). For a more detailed codebook, see Appendix \@ref(anes-cb). diff --git a/03-specifying-sample-designs.Rmd b/03-specifying-sample-designs.Rmd index b30a9761..7a1738a3 100644 --- a/03-specifying-sample-designs.Rmd +++ b/03-specifying-sample-designs.Rmd @@ -14,8 +14,7 @@ For this chapter, load the following packages and the helper function: library(tidyverse) library(survey) library(srvyr) -library(osfr) -source("helper-fun/helper-function.R") +library(srvyr.data) ``` To help explain the different types of sample designs, this chapter will use the `api` and `scd` data that comes in the {survey} package: @@ -25,12 +24,13 @@ data(api) data(scd) ``` -Additionally, we have created multiple analytic datasets for use in this book on a directory on OSF^[https://osf.io/gzbkn/?view_only=8ca80573293b4e12b7f934a0f742b957]. To load any data used in the book that is not included in existing packages, we have created a helper function `read_osf()`. This chapter uses data from the Residential Energy Consumption Survey (RECS) - both 2015 and 2020, so we will use the following code to load the RECS data to use later in this chapter: +Additionally, we have made multiple analytic datasets available in the {srvyr.data} package, as described in Section \@ref(book-datasets). This chapter uses data from the Residential Energy Consumption Survey (RECS) - both 2015 and 2020, so we will use the following code to load the RECS data to use later in this chapter: + ```{r} #| label: samp-setup-recs #| eval: FALSE -recs_2015_in <- read_osf("recs_2015.rds") -recs_in <- read_osf("recs_2020.rds") +data(recs_2015) +data(recs_2020) ``` ::: @@ -573,7 +573,7 @@ fay_des <- dat %>% #### Example {-} -The 2015 RECS [@recs-2015-micro] uses Fay's BRR weights with the final weight as NWEIGHT and replicate weights as BRRWT1 - BRRWT96 with $\rho=0.5$. 
On the file, DOEID is a unique identifier for each respondent, TOTALDOL is the total cost of energy, TOTSQFT_EN is the total square footage of the residence, and REGOINC is the Census region. We have already read in the RECS data and created a dataset called `recs_2015_in` above in the prerequisites. +The 2015 RECS [@recs-2015-micro] uses Fay's BRR weights with the final weight as NWEIGHT and replicate weights as BRRWT1 - BRRWT96 with $\rho=0.5$. On the file, DOEID is a unique identifier for each respondent, TOTALDOL is the total cost of energy, TOTSQFT_EN is the total square footage of the residence, and REGIONC is the Census region. We have already read in the RECS data and created a dataset called `recs_2015` above in the prerequisites. To specify this design, use the following syntax: @@ -583,14 +583,14 @@ To specify this design, use the following syntax: #| warning: FALSE #| message: FALSE #| cache: TRUE -recs_2015_in <- read_osf("recs_2015.rds") +data(recs_2015) ``` ```{r} #| label: samp-des-recs-des #| eval: TRUE -recs_2015_des <- recs_2015_in %>% +recs_2015_des <- recs_2015 %>% as_survey_rep(weights = NWEIGHT, repweights = BRRWT1:BRRWT96, type = "Fay", @@ -649,12 +649,13 @@ jkn_des <- dat %>% #### Example {-} -The 2020 RECS [@recs-2020-micro] uses jackknife weights with the final weight as NWEIGHT and replicate weights as NWEIGHT1 - NWEIGHT60 with a scale of $(R-1)/R=59/60$. On the file, DOEID is a unique identifier for each respondent, TOTALDOL is the total cost of energy, TOTSQFT_EN is the total square footage of the residence, and REGOINC is the Census region. We have already read in the RECS data and created a dataset called `recs_in` above in the prerequisites. +The 2020 RECS [@recs-2020-micro] uses jackknife weights with the final weight as NWEIGHT and replicate weights as NWEIGHT1 - NWEIGHT60 with a scale of $(R-1)/R=59/60$. 
On the file, DOEID is a unique identifier for each respondent, TOTALDOL is the total cost of energy, TOTSQFT_EN is the total square footage of the residence, and REGIONC is the Census region. We have already read in the RECS data and created a dataset called `recs_2020` above in the prerequisites. To specify this design, use the following syntax: ```{r} -recs_des <- recs_in %>% +#| label: samp-des-recs2020-des +recs_des <- recs_2020 %>% as_survey_rep( weights = NWEIGHT, repweights = NWEIGHT1:NWEIGHT60, @@ -673,7 +674,7 @@ summary(recs_des) #| label: samp-des-recs-des-full #| echo: FALSE # This is just for later use in book -recs_des <- recs_in %>% +recs_des <- recs_2020 %>% as_survey_rep( weights = NWEIGHT, repweights = NWEIGHT1:NWEIGHT60, diff --git a/04-understanding-survey-data-documentation.Rmd b/04-understanding-survey-data-documentation.Rmd index 67fc09dd..b2821541 100644 --- a/04-understanding-survey-data-documentation.Rmd +++ b/04-understanding-survey-data-documentation.Rmd @@ -14,8 +14,7 @@ For this chapter, load the following packages and the helper function: library(tidyverse) library(survey) library(srvyr) -library(osfr) -source("helper-fun/helper-function.R") +library(srvyr.data) library(censusapi) ``` @@ -23,7 +22,7 @@ We will be using data from ANES. Here is the code to read in the data. ```{r} #| label: understand-anes-c04 #| eval: FALSE -anes_in <- read_osf("anes_2020.rds") +data(anes_2020) ``` ::: @@ -250,7 +249,7 @@ The target population in 2020 is `r scales::comma(targetpop)`. 
This information ```{r} #| label: understand-read-anes -anes_adjwgt <- anes_in %>% +anes_adjwgt <- anes_2020 %>% mutate(Weight = V200010b / sum(V200010b) * targetpop) ``` diff --git a/05-descriptive-analysis.Rmd b/05-descriptive-analysis.Rmd index 3696a7e1..0c862ae9 100644 --- a/05-descriptive-analysis.Rmd +++ b/05-descriptive-analysis.Rmd @@ -14,8 +14,7 @@ For this chapter, load the following packages and the helper function: library(tidyverse) library(survey) library(srvyr) -library(osfr) -source("helper-fun/helper-function.R") +library(srvyr.data) library(broom) ``` @@ -40,10 +39,10 @@ We will be using data from ANES and RECS. Here is the code to create the design ```{r} #| label: desc-anes-des #| eval: FALSE -anes_in <- read_osf("anes_2020.rds") targetpop <- 231592693 +data(anes_2020) -anes_adjwgt <- anes_in %>% +anes_adjwgt <- anes_2020 %>% mutate(Weight = Weight / sum(Weight) * targetpop) anes_des <- anes_adjwgt %>% @@ -60,9 +59,9 @@ For RECS, details are included in the RECS documentation and Chapter \@ref(c03-s ```{r} #| label: desc-recs-des #| eval: FALSE -recs_in <- read_osf("recs_2020.rds") +data(recs_2020) -recs_des <- recs_in %>% +recs_des <- recs_2020 %>% as_survey_rep( weights = NWEIGHT, repweights = NWEIGHT1:NWEIGHT60, @@ -978,7 +977,7 @@ It is estimated that American residential households spent an average of `r .elb Briefly, we mentioned using `filter()` to subset a survey object for analysis. This operation should be done after creating the design object. In rare circumstances, subsetting data before creating the object can lead to incorrect variability estimates. This can occur if subsetting removes an entire PSU. -Suppose we wanted estimates of the average amount spent on natural gas among housing units that use natural gas using the variable `BTUNG`^[`BTUNG` is derived from the supplier side component of the survey where `BTUNG` represents the natural gas consumption in British thermal units (BTUs) in a year]. 
This could be obtained by first filtering records to only include records where `BTUNG > 0` and then finding the average amount of money spent. +Suppose we wanted estimates of the average amount spent on natural gas among housing units that use natural gas using the variable `BTUNG`^[`BTUNG` is derived from the supplier side component of the survey where `BTUNG` represents the natural gas consumption in British thermal units (Btus) in a year]. This could be obtained by first filtering records to only include records where `BTUNG > 0` and then finding the average amount of money spent. ```{r} #| label: desc-subpop diff --git a/06-statistical-testing.Rmd b/06-statistical-testing.Rmd index 90e75478..2f49d5e5 100644 --- a/06-statistical-testing.Rmd +++ b/06-statistical-testing.Rmd @@ -14,8 +14,7 @@ For this chapter, load the following packages and the helper function: library(tidyverse) library(survey) library(srvyr) -library(osfr) -source("helper-fun/helper-function.R") +library(srvyr.data) library(broom) library(gt) ``` @@ -24,10 +23,10 @@ We will be using data from ANES and RECS. 
Here is the code to create the design ```{r} #| label: stattest-anes-des #| eval: FALSE -anes_in <- read_osf("anes_2020.rds") targetpop <- 231592693 +data(anes_2020) -anes_adjwgt <- anes_in %>% +anes_adjwgt <- anes_2020 %>% mutate(Weight = Weight / sum(Weight) * targetpop) anes_des <- anes_adjwgt %>% @@ -44,9 +43,9 @@ For RECS, details are included in the RECS documentation and Chapter \@ref(c03-s ```{r} #| label: stattest-recs-des #| eval: FALSE -recs_in <- read_osf("recs_2020.rds") +data(recs_2020) -recs_des <- recs_in %>% +recs_des <- recs_2020 %>% as_survey_rep( weights = NWEIGHT, repweights = NWEIGHT1:NWEIGHT60, diff --git a/07-modeling.Rmd b/07-modeling.Rmd index 04dabc1c..a1e19a18 100644 --- a/07-modeling.Rmd +++ b/07-modeling.Rmd @@ -14,8 +14,7 @@ For this chapter, load the following packages and the helper function: library(tidyverse) library(survey) library(srvyr) -library(osfr) -source("helper-fun/helper-function.R") +library(srvyr.data) library(broom) ``` @@ -23,10 +22,10 @@ We will be using data from ANES and RECS. Here is the code to create the design ```{r} #| label: model-anes-des #| eval: FALSE -anes_in <- read_osf("anes_2020.rds") targetpop <- 231592693 +data(anes_2020) -anes_adjwgt <- anes_in %>% +anes_adjwgt <- anes_2020 %>% mutate(Weight = Weight / sum(Weight) * targetpop) anes_des <- anes_adjwgt %>% @@ -41,9 +40,7 @@ For RECS, details are included in the RECS documentation and Chapter \@ref(c03-s ```{r} #| label: model-recs-des #| eval: FALSE -recs_in <- read_osf("recs_2020.rds") - -recs_des <- recs_in %>% +recs_des <- recs_2020 %>% as_survey_rep( weights = NWEIGHT, repweights = NWEIGHT1:NWEIGHT60, @@ -215,7 +212,7 @@ On RECS, we can obtain information on the square footage of homes and the electr #| fig.alt: Hex chart where each hexagon represents a number of housing units at a point. x-axis is 'Total square footage' ranging from 0 to 7,500 and y-axis is 'Amount spent on electricity' ranging from $0 to 8,000. 
The trend is relatively linear and positve. A high concentration of points have square footage between 0 and 2,500 square feet as well as between electricity expenditure between $0 and 2,000 #| echo: FALSE #| warning: FALSE -recs_in %>% +recs_2020 %>% ggplot(aes( x = TOTSQFT_EN, y = DOLLAREL, @@ -311,7 +308,7 @@ Additionally, `augment()` can be used to predict outcomes for data not used in m ```{r} #| label: model-predict-new-dat add_data <- - recs_in %>% select(DOEID, + recs_2020 %>% select(DOEID, Region, Urbanicity, TOTSQFT_EN, @@ -649,7 +646,7 @@ tidy(earlyvote_mod) %>% arrange(p.value) ```{r} #| label: model-ex-logistic-2 -add_vote_dat <- anes_in %>% +add_vote_dat <- anes_2020 %>% select(EarlyVote2020, Age, Education, PartyID) %>% rbind(tibble( EarlyVote2020 = NA, diff --git a/08-communicating-results.Rmd b/08-communicating-results.Rmd index d43ec848..546cddb5 100644 --- a/08-communicating-results.Rmd +++ b/08-communicating-results.Rmd @@ -14,8 +14,7 @@ For this chapter, load the following packages and the helper function: library(tidyverse) library(survey) library(srvyr) -library(osfr) -source("helper-fun/helper-function.R") +library(srvyr.data) library(gt) library(gtsummary) ``` @@ -25,10 +24,10 @@ We will be using data from ANES. Here is the code to create the ANES design obje ```{r} #| label: results-anes-des #| eval: FALSE -anes_in <- read_osf("anes_2020.rds") targetpop <- 231592693 +data(anes_2020) -anes_adjwgt <- anes_in %>% +anes_adjwgt <- anes_2020 %>% mutate(Weight = Weight / sum(Weight) * targetpop) anes_des <- anes_adjwgt %>% diff --git a/09-ncvs-vignette.Rmd b/09-ncvs-vignette.Rmd index c1801d13..bb558a0b 100644 --- a/09-ncvs-vignette.Rmd +++ b/09-ncvs-vignette.Rmd @@ -13,8 +13,7 @@ For this chapter, load the following packages and the helper function: #| message: FALSE library(tidyverse) library(srvyr) -library(osfr) -source("helper-fun/helper-function.R") +library(srvyr.data) library(gt) ``` @@ -22,9 +21,9 @@ We will be using data from NCVS. 
Here is the code to read in the three datasets ```{r} #| label: ncvs-data #| cache: TRUE -inc_in <- read_osf("ncvs_2021_incident.rds") -hh_in <- read_osf("ncvs_2021_household.rds") -pers_in <- read_osf("ncvs_2021_person.rds") +data(ncvs_2021_incident) +data(ncvs_2021_household) +data(ncvs_2021_person) ``` ::: @@ -119,7 +118,7 @@ We want to create four variables to indicate if an incident is a series crime. #| label: ncvs-vign-incfile #| message: false #| cache: TRUE -inc_series <- inc_in %>% +inc_series <- ncvs_2021_incident %>% mutate( series = case_when(V4017 %in% c(1, 8) ~ 1, V4018 %in% c(2, 8) ~ 1, @@ -314,12 +313,12 @@ hh_z_list <- rep(0, ncol(inc_hh_sums) - 3) %>% as.list() %>% pers_z_list <- rep(0, ncol(inc_pers_sums) - 4) %>% as.list() %>% setNames(names(inc_pers_sums)[-(1:4)]) -hh_vsum <- hh_in %>% +hh_vsum <- ncvs_2021_household %>% full_join(inc_hh_sums, by = c("YEARQ", "IDHH")) %>% replace_na(hh_z_list) %>% mutate(ADJINC_WT = if_else(is.na(WGTVICCY), 0, WGTVICCY / WGTHHCY)) -pers_vsum <- pers_in %>% +pers_vsum <- ncvs_2021_person %>% full_join(inc_pers_sums, by = c("YEARQ", "IDHH", "IDPER")) %>% replace_na(pers_z_list) %>% mutate(ADJINC_WT = if_else(is.na(WGTVICCY), 0, WGTVICCY / WGTPERCY)) diff --git a/90-AppendixA.Rmd b/90-AppendixA.Rmd index 1c627c1d..9b2c74c7 100644 --- a/90-AppendixA.Rmd +++ b/90-AppendixA.Rmd @@ -15,7 +15,7 @@ library(janitor) library(kableExtra) library(knitr) -anes_2020 <- anes_in +data(anes_2020) attrlist <- map(anes_2020, attributes) @@ -27,7 +27,6 @@ NULL_to_NA <- function(x){ } } - anes_var_info <- tibble( Vars=names(attrlist), Section=map_chr(attrlist, "Section") %>% unname(), @@ -45,8 +44,6 @@ anes_var_info <- tibble( ) %>% ungroup() - - cb_count <- function(dat, var){ t <- dat %>% count(.data[[var]]) %>% diff --git a/91-AppendixB.Rmd b/91-AppendixB.Rmd index dc22defd..35a6c2f5 100644 --- a/91-AppendixB.Rmd +++ b/91-AppendixB.Rmd @@ -11,7 +11,7 @@ library(janitor) library(kableExtra) library(knitr) -recs <- recs_in +recs 
<- recs_2020 ``` The full codebook with the original variables is available at [https://www.eia.gov/consumption/residential/data/2020/index.php?view=microdata](https://www.eia.gov/consumption/residential/data/2020/index.php?view=microdata) - "Variable and response codebook". This codebook includes the variables on the dataset included for download along with this book. diff --git a/DataCleaningScripts/00_Run.R b/DataCleaningScripts/00_Run.R index 5c11556d..1ec96c83 100644 --- a/DataCleaningScripts/00_Run.R +++ b/DataCleaningScripts/00_Run.R @@ -1,24 +1,4 @@ -rmarkdown::render( - input=here::here("DataCleaningScripts", "RECS_2015_DataPrep.Rmd"), - envir=new.env() -) - -rmarkdown::render( - input=here::here("DataCleaningScripts", "RECS_2020_DataPrep.Rmd"), - envir=new.env() -) - -rmarkdown::render( - input=here::here("DataCleaningScripts", "ANES_2020_DataPrep.Rmd"), - envir=new.env() -) - rmarkdown::render( input=here::here("DataCleaningScripts", "LAPOP_2021_DataPrep.Rmd"), envir=new.env() ) - -rmarkdown::render( - input=here::here("DataCleaningScripts", "NCVS_2021_DataPrep.Rmd"), - envir=new.env() -) diff --git a/DataCleaningScripts/ANES_2020_DataPrep.Rmd b/DataCleaningScripts/ANES_2020_DataPrep.Rmd index ed2a2471..e69de29b 100644 --- a/DataCleaningScripts/ANES_2020_DataPrep.Rmd +++ b/DataCleaningScripts/ANES_2020_DataPrep.Rmd @@ -1,395 +0,0 @@ ---- -title: "American National Election Studies (ANES) 2020 Time Series Study Data Prep" -output: - github_document: - html_preview: false ---- - -```{r setup, include=FALSE} -knitr::opts_chunk$set(echo = TRUE) -``` - -## Data information - -All data and resources were downloaded from https://electionstudies.org/data-center/2020-time-series-study/ on February 28, 2022. - -American National Election Studies. 2021. ANES 2020 Time Series Study Full Release [dataset and documentation]. 
www.electionstudies.org -```{r loadpackageh, message=FALSE} -library(here) # easy relative paths -``` - - - -```{r loadpackages} -library(tidyverse) # data manipulation -library(haven) # data import -library(tidylog) # informative logging messages -library(osfr) -``` -## Import data and create derived variables - -```{r derivedata} -anes_file_osf_det <- osf_retrieve_node("https://osf.io/z5c3m/") %>% - osf_ls_files(path="ANES_2020", pattern="sav") %>% - osf_download(conflicts="overwrite", path=here("osf_dl")) - -anes_in_2020 <- read_sav(pull(anes_file_osf_det, local_path)) - -unlink(pull(anes_file_osf_det, local_path)) - -# weight validity for post-election survey -anes_in_2020 %>% - select(V200004, V200010a, V200010b) %>% - group_by(V200004) %>% #type of respondent - summarise( - n=n(), - nvalidwt_pre=sum(!is.na(V200010a) & V200010a>0), - nvalidwt_post=sum(!is.na(V200010b) & V200010b>0) - ) - -# Are all PSU/Stratum represented in post-weight? If so, we can drop pre-only cases later - -anes_in_2020 %>% - count(V200010d, V200010c, V200004) %>% - group_by(V200010d, V200010c) %>% - mutate( - Pct=n/sum(n) - ) %>% - filter(V200004==3) %>% - arrange(Pct) - - -anes_2020 <- anes_in_2020 %>% - filter(V200004==3) %>% - select( - V200001, - V200001, - V200002, # MODE OF INTERVIEW: PRE-ELECTION INTERVIEW - V200010b, # FULL SAMPLE POST-ELECTION WEIGHT - V200010d, # FULL SAMPLE VARIANCE STRATUM - V200010c, # FULL SAMPLE VARIANCE UNIT - V201006, # PRE: HOW INTERESTED IN FOLLOWING CAMPAIGNS - V201102, # PRE: DID R VOTE FOR PRESIDENT IN 2016 - V201101, # PRE: DID R VOTE FOR PRESIDENT IN 2016 [REVISED] - V201103, # PRE: RECALL OF LAST (2016) PRESIDENTIAL VOTE CHOICE) - V201025x, # PRE: SUMMARY: REGISTRATION AND EARLY VOTE STATUS - V201228, - V201229, - V201230, - V201231x, # PRE: SUMMARY: PARTY ID - V201233, # PRE: HOW OFTEN TRUST GOVERNMENT IN WASHINGTON TO DO WHAT IS RIGHT [REVISED] - V201237, # PRE: HOW OFTEN CAN PEOPLE BE TRUSTED - V201507x, # PRE: SUMMARY: RESPONDENT AGE - 
V201510, # PRE: HIGHEST LEVEL OF EDUCATION - V201546, - starts_with("V201547"), - V201549x, # PRE: SUMMARY: R SELF-IDENTIFIED RACE/ETHNICITY - V201600, # PRE: WHAT IS YOUR (R) SEX? [REVISED] - V201607, - V201610, - V201611, - V201613, - V201615, - V201616, - V201617x, # PRE: SUMMARY: TOTAL (FAMILY) INCOME - V202066, # POST: DID R VOTE IN NOVEMBER 2020 ELECTION - V201024, - V202066, - V202051, - V202109x, # PRE-POST: SUMMARY: VOTER TURNOUT IN 2020 - V202072, # POST: DID R VOTE FOR PRESIDENT - V201029, - V202073, # POST: FOR WHOM DID R VOTE FOR PRESIDENT - V202110x # PRE-POST: SUMMARY: 2020 PRESIDENTIAL VOTE - ) %>% - mutate( - CaseID=V200001, - InterviewMode = fct_recode(as.character(V200002), Video = "1", Telephone = "2", Web = "3"), - Weight = V200010b, - Stratum = as.factor(V200010d), - VarUnit = as.factor(V200010c), - Age = if_else(V201507x > 0, as.numeric(V201507x), NA_real_), - AgeGroup = cut(Age, c(17, 29, 39, 49, 59, 69, 200), - labels = c("18-29", "30-39", "40-49", "50-59", "60-69", "70 or older") - ), - Gender = factor( - case_when( - V201600 == 1 ~ "Male", - V201600 == 2 ~ "Female", - TRUE ~ NA_character_ - ), - levels = c("Male", "Female") - ), - RaceEth = factor( - case_when( - V201549x == 1 ~ "White", - V201549x == 2 ~ "Black", - V201549x == 3 ~ "Hispanic", - V201549x == 4 ~ "Asian, NH/PI", - V201549x == 5 ~ "AI/AN", - V201549x == 6 ~ "Other/multiple race", - TRUE ~ NA_character_ - ), - levels = c("White", "Black", "Hispanic", "Asian, NH/PI", "AI/AN", "Other/multiple race", NA_character_) - ), - PartyID = factor( - case_when( - V201231x == 1 ~ "Strong democrat", - V201231x == 2 ~ "Not very strong democrat", - V201231x == 3 ~ "Independent-democrat", - V201231x == 4 ~ "Independent", - V201231x == 5 ~ "Independent-republican", - V201231x == 6 ~ "Not very strong republican", - V201231x == 7 ~ "Strong republican", - TRUE ~ NA_character_ - ), - levels = c("Strong democrat", "Not very strong democrat", "Independent-democrat", "Independent", 
"Independent-republican", "Not very strong republican", "Strong republican") - ), - Education = factor( - case_when( - V201510 <= 0 ~ NA_character_, - V201510 == 1 ~ "Less than HS", - V201510 == 2 ~ "High school", - V201510 <= 5 ~ "Post HS", - V201510 == 6 ~ "Bachelor's", - V201510 <= 8 ~ "Graduate", - TRUE ~ NA_character_ - ), - levels = c("Less than HS", "High school", "Post HS", "Bachelor's", "Graduate") - ), - Income = cut(V201617x, c(-5, 1:22), - labels = c( - "Under $9,999", - "$10,000-14,999", - "$15,000-19,999", - "$20,000-24,999", - "$25,000-29,999", - "$30,000-34,999", - "$35,000-39,999", - "$40,000-44,999", - "$45,000-49,999", - "$50,000-59,999", - "$60,000-64,999", - "$65,000-69,999", - "$70,000-74,999", - "$75,000-79,999", - "$80,000-89,999", - "$90,000-99,999", - "$100,000-109,999", - "$110,000-124,999", - "$125,000-149,999", - "$150,000-174,999", - "$175,000-249,999", - "$250,000 or more" - ) - ), - Income7 = fct_collapse( - Income, - "Under $20k" = c("Under $9,999", "$10,000-14,999", "$15,000-19,999"), - "$20-40k" = c("$20,000-24,999", "$25,000-29,999", "$30,000-34,999", "$35,000-39,999"), - "$40-60k" = c("$40,000-44,999", "$45,000-49,999", "$50,000-59,999"), - "$60-80k" = c("$60,000-64,999", "$65,000-69,999", "$70,000-74,999", "$75,000-79,999"), - "$80-100k" = c("$80,000-89,999", "$90,000-99,999"), - "$100-125k" = c("$100,000-109,999", "$110,000-124,999"), - "$125k or more" = c("$125,000-149,999", "$150,000-174,999", "$175,000-249,999", "$250,000 or more") - ), - CampaignInterest = factor( - case_when( - V201006 == 1 ~ "Very much interested", - V201006 == 2 ~ "Somewhat interested", - V201006 == 3 ~ "Not much interested", - TRUE ~ NA_character_ - ), - levels = c("Very much interested", "Somewhat interested", "Not much interested") - ), - TrustGovernment = factor( - case_when( - V201233 == 1 ~ "Always", - V201233 == 2 ~ "Most of the time", - V201233 == 3 ~ "About half the time", - V201233 == 4 ~ "Some of the time", - V201233 == 5 ~ "Never", - TRUE ~ 
NA_character_ - ), - levels = c("Always", "Most of the time", "About half the time", "Some of the time", "Never") - ), - TrustPeople = factor( - case_when( - V201237 == 1 ~ "Always", - V201237 == 2 ~ "Most of the time", - V201237 == 3 ~ "About half the time", - V201237 == 4 ~ "Some of the time", - V201237 == 5 ~ "Never", - TRUE ~ NA_character_ - ), - levels = c("Always", "Most of the time", "About half the time", "Some of the time", "Never") - ), - VotedPres2016 = factor( - case_when( - V201101 == 1 | V201102 == 1 ~ "Yes", - V201101 == 2 | V201102 == 2 ~ "No", - TRUE ~ NA_character_ - ), - levels = c("Yes", "No") - ), - VotedPres2016_selection = factor( - case_when( - V201103 == 1 ~ "Clinton", - V201103 == 2 ~ "Trump", - V201103 == 5 ~ "Other", - TRUE ~ NA_character_ - ), - levels = c("Clinton", "Trump", "Other") - ), - VotedPres2020 = factor( - case_when( - V202072 == 1 ~ "Yes", - V202072 == 2 ~ "No", - TRUE ~ NA_character_ - ), - levels = c("Yes", "No") - ), - VotedPres2020_selection = factor( - case_when( - V202110x == 1 ~ "Biden", - V202110x == 2 ~ "Trump", - V202110x >= 3 & V202110x <= 5~ "Other", - TRUE ~ NA_character_ - ), - levels = c("Biden", "Trump", "Other") - ), - EarlyVote2020 = factor( - case_when( - V201025x < 0 ~ NA_character_, - V201025x == 4 ~ "Yes", - VotedPres2020 == "Yes" ~ "No", - TRUE ~ NA_character_), - levels = c("Yes", "No") - ) - ) - -summary(anes_2020) -``` - -## Check derived variables for correct coding - -```{r checkvars} - -anes_2020 %>% count(InterviewMode, V200002) - -anes_2020 %>% - group_by(AgeGroup) %>% - summarise( - minAge = min(Age), - maxAge = max(Age), - minV = min(V201507x), - maxV = max(V201507x) - ) - -anes_2020 %>% count(Gender, V201600) - -anes_2020 %>% count(RaceEth, V201549x) - -anes_2020 %>% count(PartyID, V201231x) - -anes_2020 %>% count(Education, V201510) - -anes_2020 %>% - count(Income, Income7, V201617x) %>% - print(n = 30) - -anes_2020 %>% count(CampaignInterest, V201006) - -anes_2020 %>% 
count(TrustGovernment, V201233) - -anes_2020 %>% count(TrustPeople, V201237) - -anes_2020 %>% count(VotedPres2016, V201101, V201102) - -anes_2020 %>% count(VotedPres2016_selection, V201103) - -anes_2020 %>% count(VotedPres2020, V202072) - -anes_2020 %>% count(VotedPres2020_selection, V202110x) - -anes_2020 %>% count(EarlyVote2020, V201025x, VotedPres2020) - -anes_2020 %>% - summarise(WtSum = sum(Weight, na.rm = TRUE)) %>% - pull(WtSum) -``` - - -## Label and order data - -```{r} -#label: label-ord - -cb_in <- readxl::read_xlsx(here::here("DataCleaningScripts", "ANES Codebook Metadata.xlsx")) - -cb_ord <- cb_in %>% - mutate( - Type=1, - SectNum=case_match( - Section, - "ADMIN"~1, - "WEIGHTS"~2, - "PRE-ELECTION SURVEY QUESTIONNAIRE"~3, - "POST-ELECTION SURVEY QUESTIONNAIRE"~4 - )) %>% - arrange(SectNum, Variable) %>% - mutate( - Order=row_number() - ) - - - - -cb_slim <- cb_ord %>% - select(Variable=BookDerived, `Description and Labels`, Question, Section, SectNum, Order) %>% - filter(!is.na(Variable)) %>% - separate_longer_delim(Variable, delim="; ") %>% - add_case(Variable="VotedPres2016", `Description and Labels`="PRE: Did R vote for President in 2016", Question="Derived from V201102, V201101", Section="PRE-ELECTION SURVEY QUESTIONNAIRE", SectNum=3, Order=11) %>% - add_case(Variable="EarlyVote2020", `Description and Labels`="PRE-POST: Voted early for president", Question="Derived from V201025x, VotedPres2020", Section="POST-ELECTION SURVEY QUESTIONNAIRE", SectNum=4, Order=44) %>% - mutate(Type=2) %>% - bind_rows(select(cb_ord, -BookDerived)) %>% - arrange(SectNum, Order, Type) - -names(anes_2020)[!(names(anes_2020) %in% pull(cb_slim, Variable))] - -cb_vars <- cb_slim %>% - filter(Variable %in% names(anes_2020)) - - -anes_ord <- anes_2020 %>% - select(all_of(pull(cb_vars, Variable))) - -options("tidylog.display" = list()) - -for (var in pull(cb_vars, Variable)) { - vi <- cb_vars %>% filter(Variable==var) - attr(anes_ord[[deparse(as.name(var))]], "format.spss") <- 
NULL - attr(anes_ord[[deparse(as.name(var))]], "display_width") <- NULL - attr(anes_ord[[deparse(as.name(var))]], "label") <- pull(vi, `Description and Labels`) - attr(anes_ord[[deparse(as.name(var))]], "Section") <- pull(vi, Section) %>% as.character() - if (!is.na(pull(vi, Question))) attr(anes_ord[[deparse(as.name(var))]], "Question") <- pull(vi, Question) -} - -options("tidylog.display" = NULL) - -``` - - - -## Save data - -```{r savedat} -summary(anes_ord) - -anes_der_tmp_loc <- here("osf_dl", "anes_2020.rds") -write_rds(anes_ord, anes_der_tmp_loc) -target_dir <- osf_retrieve_node("https://osf.io/gzbkn/?view_only=8ca80573293b4e12b7f934a0f742b957") -osf_upload(target_dir, path=anes_der_tmp_loc, conflicts="overwrite") -unlink(anes_der_tmp_loc) - -``` diff --git a/DataCleaningScripts/ANES_2020_DataPrep.md b/DataCleaningScripts/ANES_2020_DataPrep.md index d673ba6d..e69de29b 100644 --- a/DataCleaningScripts/ANES_2020_DataPrep.md +++ b/DataCleaningScripts/ANES_2020_DataPrep.md @@ -1,881 +0,0 @@ -American National Election Studies (ANES) 2020 Time Series Study Data -Prep -================ - -## Data information - -All data and resources were downloaded from - on -February 28, 2022. - -American National Election Studies. 2021. ANES 2020 Time Series Study -Full Release \[dataset and documentation\]. 
www.electionstudies.org - -``` r -library(here) # easy relative paths -``` - -``` r -library(tidyverse) # data manipulation -library(haven) # data import -library(tidylog) # informative logging messages -``` - - ## - ## Attaching package: 'tidylog' - - ## The following objects are masked from 'package:srvyr': - ## - ## anti_join, drop_na, filter, filter_all, filter_at, filter_if, group_by, group_by_all, group_by_at, group_by_if, mutate, mutate_all, mutate_at, mutate_if, rename, - ## rename_all, rename_at, rename_if, rename_with, select, select_all, select_at, select_if, semi_join, summarise, summarise_all, summarise_at, summarise_if, summarize, - ## summarize_all, summarize_at, summarize_if, transmute, ungroup - - ## The following objects are masked from 'package:dplyr': - ## - ## add_count, add_tally, anti_join, count, distinct, distinct_all, distinct_at, distinct_if, filter, filter_all, filter_at, filter_if, full_join, group_by, group_by_all, - ## group_by_at, group_by_if, inner_join, left_join, mutate, mutate_all, mutate_at, mutate_if, relocate, rename, rename_all, rename_at, rename_if, rename_with, right_join, - ## sample_frac, sample_n, select, select_all, select_at, select_if, semi_join, slice, slice_head, slice_max, slice_min, slice_sample, slice_tail, summarise, summarise_all, - ## summarise_at, summarise_if, summarize, summarize_all, summarize_at, summarize_if, tally, top_frac, top_n, transmute, transmute_all, transmute_at, transmute_if, ungroup - - ## The following objects are masked from 'package:tidyr': - ## - ## drop_na, fill, gather, pivot_longer, pivot_wider, replace_na, spread, uncount - - ## The following object is masked from 'package:stats': - ## - ## filter - -``` r -library(osfr) -``` - -## Import data and create derived variables - -``` r -anes_file_osf_det <- osf_retrieve_node("https://osf.io/z5c3m/") %>% - osf_ls_files(path="ANES_2020", pattern="sav") %>% - osf_download(conflicts="overwrite", path=here("osf_dl")) - -anes_in_2020 <- 
read_sav(pull(anes_file_osf_det, local_path)) - -unlink(pull(anes_file_osf_det, local_path)) - -# weight validity for post-election survey -anes_in_2020 %>% - select(V200004, V200010a, V200010b) %>% - group_by(V200004) %>% #type of respondent - summarise( - n=n(), - nvalidwt_pre=sum(!is.na(V200010a) & V200010a>0), - nvalidwt_post=sum(!is.na(V200010b) & V200010b>0) - ) -``` - - ## select: dropped 1,768 variables (version, V200001, V160001_orig, V200002, V200003, …) - - ## group_by: one grouping variable (V200004) - - ## summarise: now 2 rows and 4 columns, ungrouped - - ## # A tibble: 2 × 4 - ## V200004 n nvalidwt_pre nvalidwt_post - ## - ## 1 1 [1. pre-election interview (only) complete] 827 827 0 - ## 2 3 [3. pre and post-election interviews (both) complete] 7453 7453 7453 - -``` r -# Are all PSU/Stratum represented in post-weight? If so, we can drop pre-only cases later - -anes_in_2020 %>% - count(V200010d, V200010c, V200004) %>% - group_by(V200010d, V200010c) %>% - mutate( - Pct=n/sum(n) - ) %>% - filter(V200004==3) %>% - arrange(Pct) -``` - - ## count: now 202 rows and 4 columns, ungrouped - - ## group_by: 2 grouping variables (V200010d, V200010c) - - ## mutate (grouped): new variable 'Pct' (double) with 168 unique values and 0% NA - - ## filter (grouped): removed 101 rows (50%), 101 rows remaining - - ## # A tibble: 101 × 5 - ## # Groups: V200010d, V200010c [101] - ## V200010d V200010c V200004 n Pct - ## - ## 1 32 1 3 [3. pre and post-election interviews (both) complete] 63 0.797 - ## 2 33 2 3 [3. pre and post-election interviews (both) complete] 67 0.798 - ## 3 45 1 3 [3. pre and post-election interviews (both) complete] 60 0.8 - ## 4 8 1 3 [3. pre and post-election interviews (both) complete] 72 0.828 - ## 5 38 2 3 [3. pre and post-election interviews (both) complete] 71 0.835 - ## 6 49 2 3 [3. pre and post-election interviews (both) complete] 67 0.838 - ## 7 36 1 3 [3. pre and post-election interviews (both) complete] 68 0.85 - ## 8 47 2 3 [3. 
pre and post-election interviews (both) complete] 69 0.852 - ## 9 50 2 3 [3. pre and post-election interviews (both) complete] 69 0.852 - ## 10 4 1 3 [3. pre and post-election interviews (both) complete] 76 0.854 - ## # ℹ 91 more rows - -``` r -anes_2020 <- anes_in_2020 %>% - filter(V200004==3) %>% - select( - V200001, - V200001, - V200002, # MODE OF INTERVIEW: PRE-ELECTION INTERVIEW - V200010b, # FULL SAMPLE POST-ELECTION WEIGHT - V200010d, # FULL SAMPLE VARIANCE STRATUM - V200010c, # FULL SAMPLE VARIANCE UNIT - V201006, # PRE: HOW INTERESTED IN FOLLOWING CAMPAIGNS - V201102, # PRE: DID R VOTE FOR PRESIDENT IN 2016 - V201101, # PRE: DID R VOTE FOR PRESIDENT IN 2016 [REVISED] - V201103, # PRE: RECALL OF LAST (2016) PRESIDENTIAL VOTE CHOICE) - V201025x, # PRE: SUMMARY: REGISTRATION AND EARLY VOTE STATUS - V201228, - V201229, - V201230, - V201231x, # PRE: SUMMARY: PARTY ID - V201233, # PRE: HOW OFTEN TRUST GOVERNMENT IN WASHINGTON TO DO WHAT IS RIGHT [REVISED] - V201237, # PRE: HOW OFTEN CAN PEOPLE BE TRUSTED - V201507x, # PRE: SUMMARY: RESPONDENT AGE - V201510, # PRE: HIGHEST LEVEL OF EDUCATION - V201546, - starts_with("V201547"), - V201549x, # PRE: SUMMARY: R SELF-IDENTIFIED RACE/ETHNICITY - V201600, # PRE: WHAT IS YOUR (R) SEX? 
[REVISED] - V201607, - V201610, - V201611, - V201613, - V201615, - V201616, - V201617x, # PRE: SUMMARY: TOTAL (FAMILY) INCOME - V202066, # POST: DID R VOTE IN NOVEMBER 2020 ELECTION - V201024, - V202066, - V202051, - V202109x, # PRE-POST: SUMMARY: VOTER TURNOUT IN 2020 - V202072, # POST: DID R VOTE FOR PRESIDENT - V201029, - V202073, # POST: FOR WHOM DID R VOTE FOR PRESIDENT - V202110x # PRE-POST: SUMMARY: 2020 PRESIDENTIAL VOTE - ) %>% - mutate( - CaseID=V200001, - InterviewMode = fct_recode(as.character(V200002), Video = "1", Telephone = "2", Web = "3"), - Weight = V200010b, - Stratum = as.factor(V200010d), - VarUnit = as.factor(V200010c), - Age = if_else(V201507x > 0, as.numeric(V201507x), NA_real_), - AgeGroup = cut(Age, c(17, 29, 39, 49, 59, 69, 200), - labels = c("18-29", "30-39", "40-49", "50-59", "60-69", "70 or older") - ), - Gender = factor( - case_when( - V201600 == 1 ~ "Male", - V201600 == 2 ~ "Female", - TRUE ~ NA_character_ - ), - levels = c("Male", "Female") - ), - RaceEth = factor( - case_when( - V201549x == 1 ~ "White", - V201549x == 2 ~ "Black", - V201549x == 3 ~ "Hispanic", - V201549x == 4 ~ "Asian, NH/PI", - V201549x == 5 ~ "AI/AN", - V201549x == 6 ~ "Other/multiple race", - TRUE ~ NA_character_ - ), - levels = c("White", "Black", "Hispanic", "Asian, NH/PI", "AI/AN", "Other/multiple race", NA_character_) - ), - PartyID = factor( - case_when( - V201231x == 1 ~ "Strong democrat", - V201231x == 2 ~ "Not very strong democrat", - V201231x == 3 ~ "Independent-democrat", - V201231x == 4 ~ "Independent", - V201231x == 5 ~ "Independent-republican", - V201231x == 6 ~ "Not very strong republican", - V201231x == 7 ~ "Strong republican", - TRUE ~ NA_character_ - ), - levels = c("Strong democrat", "Not very strong democrat", "Independent-democrat", "Independent", "Independent-republican", "Not very strong republican", "Strong republican") - ), - Education = factor( - case_when( - V201510 <= 0 ~ NA_character_, - V201510 == 1 ~ "Less than HS", - V201510 == 2 ~ 
"High school", - V201510 <= 5 ~ "Post HS", - V201510 == 6 ~ "Bachelor's", - V201510 <= 8 ~ "Graduate", - TRUE ~ NA_character_ - ), - levels = c("Less than HS", "High school", "Post HS", "Bachelor's", "Graduate") - ), - Income = cut(V201617x, c(-5, 1:22), - labels = c( - "Under $9,999", - "$10,000-14,999", - "$15,000-19,999", - "$20,000-24,999", - "$25,000-29,999", - "$30,000-34,999", - "$35,000-39,999", - "$40,000-44,999", - "$45,000-49,999", - "$50,000-59,999", - "$60,000-64,999", - "$65,000-69,999", - "$70,000-74,999", - "$75,000-79,999", - "$80,000-89,999", - "$90,000-99,999", - "$100,000-109,999", - "$110,000-124,999", - "$125,000-149,999", - "$150,000-174,999", - "$175,000-249,999", - "$250,000 or more" - ) - ), - Income7 = fct_collapse( - Income, - "Under $20k" = c("Under $9,999", "$10,000-14,999", "$15,000-19,999"), - "$20-40k" = c("$20,000-24,999", "$25,000-29,999", "$30,000-34,999", "$35,000-39,999"), - "$40-60k" = c("$40,000-44,999", "$45,000-49,999", "$50,000-59,999"), - "$60-80k" = c("$60,000-64,999", "$65,000-69,999", "$70,000-74,999", "$75,000-79,999"), - "$80-100k" = c("$80,000-89,999", "$90,000-99,999"), - "$100-125k" = c("$100,000-109,999", "$110,000-124,999"), - "$125k or more" = c("$125,000-149,999", "$150,000-174,999", "$175,000-249,999", "$250,000 or more") - ), - CampaignInterest = factor( - case_when( - V201006 == 1 ~ "Very much interested", - V201006 == 2 ~ "Somewhat interested", - V201006 == 3 ~ "Not much interested", - TRUE ~ NA_character_ - ), - levels = c("Very much interested", "Somewhat interested", "Not much interested") - ), - TrustGovernment = factor( - case_when( - V201233 == 1 ~ "Always", - V201233 == 2 ~ "Most of the time", - V201233 == 3 ~ "About half the time", - V201233 == 4 ~ "Some of the time", - V201233 == 5 ~ "Never", - TRUE ~ NA_character_ - ), - levels = c("Always", "Most of the time", "About half the time", "Some of the time", "Never") - ), - TrustPeople = factor( - case_when( - V201237 == 1 ~ "Always", - V201237 == 2 ~ 
"Most of the time", - V201237 == 3 ~ "About half the time", - V201237 == 4 ~ "Some of the time", - V201237 == 5 ~ "Never", - TRUE ~ NA_character_ - ), - levels = c("Always", "Most of the time", "About half the time", "Some of the time", "Never") - ), - VotedPres2016 = factor( - case_when( - V201101 == 1 | V201102 == 1 ~ "Yes", - V201101 == 2 | V201102 == 2 ~ "No", - TRUE ~ NA_character_ - ), - levels = c("Yes", "No") - ), - VotedPres2016_selection = factor( - case_when( - V201103 == 1 ~ "Clinton", - V201103 == 2 ~ "Trump", - V201103 == 5 ~ "Other", - TRUE ~ NA_character_ - ), - levels = c("Clinton", "Trump", "Other") - ), - VotedPres2020 = factor( - case_when( - V202072 == 1 ~ "Yes", - V202072 == 2 ~ "No", - TRUE ~ NA_character_ - ), - levels = c("Yes", "No") - ), - VotedPres2020_selection = factor( - case_when( - V202110x == 1 ~ "Biden", - V202110x == 2 ~ "Trump", - V202110x >= 3 & V202110x <= 5~ "Other", - TRUE ~ NA_character_ - ), - levels = c("Biden", "Trump", "Other") - ), - EarlyVote2020 = factor( - case_when( - V201025x < 0 ~ NA_character_, - V201025x == 4 ~ "Yes", - VotedPres2020 == "Yes" ~ "No", - TRUE ~ NA_character_), - levels = c("Yes", "No") - ) - ) -``` - - ## filter: removed 827 rows (10%), 7,453 rows remaining - - ## select: dropped 1,729 variables (version, V160001_orig, V200003, V200004, V200005, …) - - ## mutate: new variable 'CaseID' (double) with 7,453 unique values and 0% NA - - ## new variable 'InterviewMode' (factor) with 3 unique values and 0% NA - - ## new variable 'Weight' (double) with 7,195 unique values and 0% NA - - ## new variable 'Stratum' (factor) with 50 unique values and 0% NA - - ## new variable 'VarUnit' (factor) with 3 unique values and 0% NA - - ## new variable 'Age' (double) with 64 unique values and 4% NA - - ## new variable 'AgeGroup' (factor) with 7 unique values and 4% NA - - ## new variable 'Gender' (factor) with 3 unique values and 1% NA - - ## new variable 'RaceEth' (factor) with 7 unique values and 1% NA - - ## new 
variable 'PartyID' (factor) with 8 unique values and <1% NA - - ## new variable 'Education' (factor) with 6 unique values and 2% NA - - ## new variable 'Income' (factor) with 23 unique values and 7% NA - - ## new variable 'Income7' (factor) with 8 unique values and 7% NA - - ## new variable 'CampaignInterest' (factor) with 4 unique values and <1% NA - - ## new variable 'TrustGovernment' (factor) with 6 unique values and <1% NA - - ## new variable 'TrustPeople' (factor) with 6 unique values and <1% NA - - ## new variable 'VotedPres2016' (factor) with 3 unique values and <1% NA - - ## new variable 'VotedPres2016_selection' (factor) with 4 unique values and 23% NA - - ## new variable 'VotedPres2020' (factor) with 3 unique values and 19% NA - - ## new variable 'VotedPres2020_selection' (factor) with 4 unique values and 16% NA - - ## new variable 'EarlyVote2020' (factor) with 3 unique values and 15% NA - -``` r -summary(anes_2020) -``` - - ## V200001 V200002 V200010b V200010d V200010c V201006 V201102 V201101 V201103 V201025x V201228 - ## Min. :200015 Min. :1.000 Min. :0.008262 Min. : 1.00 Min. :1.000 Min. :-9.000 Min. :-9.0000 Min. :-9.00000 Min. :-9.000 Min. :-4.000 Min. :-9.00 - ## 1st Qu.:225427 1st Qu.:3.000 1st Qu.:0.386263 1st Qu.:12.00 1st Qu.:1.000 1st Qu.: 1.000 1st Qu.:-1.0000 1st Qu.:-1.00000 1st Qu.: 1.000 1st Qu.: 3.000 1st Qu.: 1.00 - ## Median :335416 Median :3.000 Median :0.686301 Median :24.00 Median :2.000 Median : 1.000 Median : 1.0000 Median :-1.00000 Median : 1.000 Median : 3.000 Median : 2.00 - ## Mean :336416 Mean :2.911 Mean :1.000000 Mean :24.63 Mean :1.507 Mean : 1.596 Mean : 0.1048 Mean : 0.08493 Mean : 1.042 Mean : 2.919 Mean : 1.99 - ## 3rd Qu.:427865 3rd Qu.:3.000 3rd Qu.:1.211032 3rd Qu.:37.00 3rd Qu.:2.000 3rd Qu.: 2.000 3rd Qu.: 1.0000 3rd Qu.: 1.00000 3rd Qu.: 2.000 3rd Qu.: 3.000 3rd Qu.: 3.00 - ## Max. :535469 Max. :3.000 Max. :6.650665 Max. :50.00 Max. :3.000 Max. : 3.000 Max. : 2.0000 Max. : 2.00000 Max. : 5.000 Max. : 4.000 Max. 
: 5.00 - ## - ## V201229 V201230 V201231x V201233 V201237 V201507x V201510 V201546 V201547a V201547b V201547c V201547d - ## Min. :-9.0000 Min. :-9.00000 Min. :-9.000 Min. :-9.000 Min. :-9.00 Min. :-9.00 Min. :-9.000 Min. :-9.000 Min. :-3 Min. :-3 Min. :-3 Min. :-3 - ## 1st Qu.:-1.0000 1st Qu.:-1.00000 1st Qu.: 2.000 1st Qu.: 3.000 1st Qu.: 2.00 1st Qu.:35.00 1st Qu.: 3.000 1st Qu.: 2.000 1st Qu.:-3 1st Qu.:-3 1st Qu.:-3 1st Qu.:-3 - ## Median : 1.0000 Median :-1.00000 Median : 4.000 Median : 4.000 Median : 3.00 Median :51.00 Median : 5.000 Median : 2.000 Median :-3 Median :-3 Median :-3 Median :-3 - ## Mean : 0.5154 Mean : 0.01302 Mean : 3.834 Mean : 3.429 Mean : 2.78 Mean :49.43 Mean : 5.621 Mean : 1.841 Mean :-3 Mean :-3 Mean :-3 Mean :-3 - ## 3rd Qu.: 1.0000 3rd Qu.: 1.00000 3rd Qu.: 6.000 3rd Qu.: 4.000 3rd Qu.: 3.00 3rd Qu.:66.00 3rd Qu.: 6.000 3rd Qu.: 2.000 3rd Qu.:-3 3rd Qu.:-3 3rd Qu.:-3 3rd Qu.:-3 - ## Max. : 2.0000 Max. : 3.00000 Max. : 7.000 Max. : 5.000 Max. : 5.00 Max. :80.00 Max. :95.000 Max. : 2.000 Max. :-3 Max. :-3 Max. :-3 Max. :-3 - ## - ## V201547e V201547z V201549x V201600 V201607 V201610 V201611 V201613 V201615 V201616 V201617x V202066 V201024 - ## Min. :-3 Min. :-3 Min. :-9.000 Min. :-9.000 Min. :-3 Min. :-3 Min. :-3 Min. :-3 Min. :-3 Min. :-3 Min. :-9.00 Min. :-9.000 Min. :-9.0000 - ## 1st Qu.:-3 1st Qu.:-3 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.:-3 1st Qu.:-3 1st Qu.:-3 1st Qu.:-3 1st Qu.:-3 1st Qu.:-3 1st Qu.: 4.00 1st Qu.: 4.000 1st Qu.:-1.0000 - ## Median :-3 Median :-3 Median : 1.000 Median : 2.000 Median :-3 Median :-3 Median :-3 Median :-3 Median :-3 Median :-3 Median :11.00 Median : 4.000 Median :-1.0000 - ## Mean :-3 Mean :-3 Mean : 1.499 Mean : 1.472 Mean :-3 Mean :-3 Mean :-3 Mean :-3 Mean :-3 Mean :-3 Mean :10.36 Mean : 3.402 Mean :-0.8595 - ## 3rd Qu.:-3 3rd Qu.:-3 3rd Qu.: 2.000 3rd Qu.: 2.000 3rd Qu.:-3 3rd Qu.:-3 3rd Qu.:-3 3rd Qu.:-3 3rd Qu.:-3 3rd Qu.:-3 3rd Qu.:17.00 3rd Qu.: 4.000 3rd Qu.:-1.0000 - ## Max. :-3 Max. 
:-3 Max. : 6.000 Max. : 2.000 Max. :-3 Max. :-3 Max. :-3 Max. :-3 Max. :-3 Max. :-3 Max. :22.00 Max. : 4.000 Max. : 4.0000 - ## - ## V202051 V202109x V202072 V201029 V202073 V202110x CaseID InterviewMode Weight Stratum VarUnit - ## Min. :-9.0000 Min. :-2.0000 Min. :-9.0000 Min. :-9.0000 Min. :-9.0000 Min. :-9.0000 Min. :200015 Video : 274 Min. :0.008262 12 : 179 1:3689 - ## 1st Qu.:-1.0000 1st Qu.: 1.0000 1st Qu.: 1.0000 1st Qu.:-1.0000 1st Qu.: 1.0000 1st Qu.: 1.0000 1st Qu.:225427 Telephone: 115 1st Qu.:0.386263 6 : 172 2:3750 - ## Median :-1.0000 Median : 1.0000 Median : 1.0000 Median :-1.0000 Median : 1.0000 Median : 1.0000 Median :335416 Web :7064 Median :0.686301 27 : 172 3: 14 - ## Mean :-0.7259 Mean : 0.8578 Mean : 0.6234 Mean :-0.8967 Mean : 0.9415 Mean : 0.9902 Mean :336416 Mean :1.000000 21 : 170 - ## 3rd Qu.:-1.0000 3rd Qu.: 1.0000 3rd Qu.: 1.0000 3rd Qu.:-1.0000 3rd Qu.: 2.0000 3rd Qu.: 2.0000 3rd Qu.:427865 3rd Qu.:1.211032 25 : 169 - ## Max. : 3.0000 Max. : 1.0000 Max. : 2.0000 Max. :12.0000 Max. :12.0000 Max. : 5.0000 Max. :535469 Max. :6.650665 1 : 167 - ## (Other):6424 - ## Age AgeGroup Gender RaceEth PartyID Education Income Income7 - ## Min. :18.00 18-29 : 871 Male :3375 White :5420 Strong democrat :1796 Less than HS: 312 Under $9,999 : 647 $125k or more:1468 - ## 1st Qu.:37.00 30-39 :1241 Female:4027 Black : 650 Strong republican :1545 High school :1160 $50,000-59,999 : 485 Under $20k :1076 - ## Median :53.00 40-49 :1081 NA's : 51 Hispanic : 662 Independent-democrat : 881 Post HS :2514 $100,000-109,999: 451 $20-40k :1051 - ## Mean :51.83 50-59 :1200 Asian, NH/PI : 248 Independent : 876 Bachelor's :1877 $250,000 or more: 405 $40-60k : 984 - ## 3rd Qu.:66.00 60-69 :1436 AI/AN : 155 Not very strong democrat: 790 Graduate :1474 $80,000-89,999 : 383 $60-80k : 920 - ## Max. 
:80.00 70 or older:1330 Other/multiple race: 237 (Other) :1540 NA's : 116 (Other) :4565 (Other) :1437 - ## NA's :294 NA's : 294 NA's : 81 NA's : 25 NA's : 517 NA's : 517 - ## CampaignInterest TrustGovernment TrustPeople VotedPres2016 VotedPres2016_selection VotedPres2020 VotedPres2020_selection EarlyVote2020 - ## Very much interested:3940 Always : 80 Always : 48 Yes :5810 Clinton:2911 Yes :5952 Biden:3509 Yes : 371 - ## Somewhat interested :2569 Most of the time :1016 Most of the time :3511 No :1622 Trump :2466 No : 77 Trump:2567 No :5949 - ## Not much interested : 943 About half the time:2313 About half the time:2020 NA's: 21 Other : 390 NA's:1424 Other: 158 NA's:1133 - ## NA's : 1 Some of the time :3313 Some of the time :1597 NA's :1686 NA's :1219 - ## Never : 702 Never : 264 - ## NA's : 29 NA's : 13 - ## - -## Check derived variables for correct coding - -``` r -anes_2020 %>% count(InterviewMode, V200002) -``` - - ## count: now 3 rows and 3 columns, ungrouped - - ## # A tibble: 3 × 3 - ## InterviewMode V200002 n - ## - ## 1 Video 1 [1. Video] 274 - ## 2 Telephone 2 [2. Telephone] 115 - ## 3 Web 3 [3. Web] 7064 - -``` r -anes_2020 %>% - group_by(AgeGroup) %>% - summarise( - minAge = min(Age), - maxAge = max(Age), - minV = min(V201507x), - maxV = max(V201507x) - ) -``` - - ## group_by: one grouping variable (AgeGroup) - - ## summarise: now 7 rows and 5 columns, ungrouped - - ## # A tibble: 7 × 5 - ## AgeGroup minAge maxAge minV maxV - ## - ## 1 18-29 18 29 18 29 - ## 2 30-39 30 39 30 39 - ## 3 40-49 40 49 40 49 - ## 4 50-59 50 59 50 59 - ## 5 60-69 60 69 60 69 - ## 6 70 or older 70 80 70 80 [80. Age 80 or older] - ## 7 NA NA -9 [-9. Refused] -9 [-9. Refused] - -``` r -anes_2020 %>% count(Gender, V201600) -``` - - ## count: now 3 rows and 3 columns, ungrouped - - ## # A tibble: 3 × 3 - ## Gender V201600 n - ## - ## 1 Male 1 [1. Male] 3375 - ## 2 Female 2 [2. Female] 4027 - ## 3 -9 [-9. 
Refused] 51 - -``` r -anes_2020 %>% count(RaceEth, V201549x) -``` - - ## count: now 8 rows and 3 columns, ungrouped - - ## # A tibble: 8 × 3 - ## RaceEth V201549x n - ## - ## 1 White 1 [1. White, non-Hispanic] 5420 - ## 2 Black 2 [2. Black, non-Hispanic] 650 - ## 3 Hispanic 3 [3. Hispanic] 662 - ## 4 Asian, NH/PI 4 [4. Asian or Native Hawaiian/other Pacific Islander, non-Hispanic alone] 248 - ## 5 AI/AN 5 [5. Native American/Alaska Native or other race, non-Hispanic alone] 155 - ## 6 Other/multiple race 6 [6. Multiple races, non-Hispanic] 237 - ## 7 -9 [-9. Refused] 75 - ## 8 -8 [-8. Don't know] 6 - -``` r -anes_2020 %>% count(PartyID, V201231x) -``` - - ## count: now 9 rows and 3 columns, ungrouped - - ## # A tibble: 9 × 3 - ## PartyID V201231x n - ## - ## 1 Strong democrat 1 [1. Strong Democrat] 1796 - ## 2 Not very strong democrat 2 [2. Not very strong Democrat] 790 - ## 3 Independent-democrat 3 [3. Independent-Democrat] 881 - ## 4 Independent 4 [4. Independent] 876 - ## 5 Independent-republican 5 [5. Independent-Republican] 782 - ## 6 Not very strong republican 6 [6. Not very strong Republican] 758 - ## 7 Strong republican 7 [7. Strong Republican] 1545 - ## 8 -9 [-9. Refused] 23 - ## 9 -8 [-8. Don't know] 2 - -``` r -anes_2020 %>% count(Education, V201510) -``` - - ## count: now 11 rows and 3 columns, ungrouped - - ## # A tibble: 11 × 3 - ## Education V201510 n - ## - ## 1 Less than HS 1 [1. Less than high school credential] 312 - ## 2 High school 2 [2. High school graduate - High school diploma or equivalent (e.g. GED)] 1160 - ## 3 Post HS 3 [3. Some college but no degree] 1519 - ## 4 Post HS 4 [4. Associate degree in college - occupational/vocational] 550 - ## 5 Post HS 5 [5. Associate degree in college - academic] 445 - ## 6 Bachelor's 6 [6. Bachelor's degree (e.g. BA, AB, BS)] 1877 - ## 7 Graduate 7 [7. Master's degree (e.g. MA, MS, MEng, MEd, MSW, MBA)] 1092 - ## 8 Graduate 8 [8. Professional school degree (e.g. MD, DDS, DVM, LLB, JD)/Doctoral degree (e.g. 
PHD, EDD)] 382 - ## 9 -9 [-9. Refused] 25 - ## 10 -8 [-8. Don't know] 1 - ## 11 95 [95. Other {SPECIFY}] 90 - -``` r -anes_2020 %>% - count(Income, Income7, V201617x) %>% - print(n = 30) -``` - - ## count: now 24 rows and 4 columns, ungrouped - - ## # A tibble: 24 × 4 - ## Income Income7 V201617x n - ## - ## 1 Under $9,999 Under $20k 1 [1. Under $9,999] 647 - ## 2 $10,000-14,999 Under $20k 2 [2. $10,000-14,999] 244 - ## 3 $15,000-19,999 Under $20k 3 [3. $15,000-19,999] 185 - ## 4 $20,000-24,999 $20-40k 4 [4. $20,000-24,999] 301 - ## 5 $25,000-29,999 $20-40k 5 [5. $25,000-29,999] 228 - ## 6 $30,000-34,999 $20-40k 6 [6. $30,000-34,999] 296 - ## 7 $35,000-39,999 $20-40k 7 [7. $35,000-39,999] 226 - ## 8 $40,000-44,999 $40-60k 8 [8. $40,000-44,999] 286 - ## 9 $45,000-49,999 $40-60k 9 [9. $45,000-49,999] 213 - ## 10 $50,000-59,999 $40-60k 10 [10. $50,000-59,999] 485 - ## 11 $60,000-64,999 $60-80k 11 [11. $60,000-64,999] 294 - ## 12 $65,000-69,999 $60-80k 12 [12. $65,000-69,999] 168 - ## 13 $70,000-74,999 $60-80k 13 [13. $70,000-74,999] 243 - ## 14 $75,000-79,999 $60-80k 14 [14. $75,000-79,999] 215 - ## 15 $80,000-89,999 $80-100k 15 [15. $80,000-89,999] 383 - ## 16 $90,000-99,999 $80-100k 16 [16. $90,000-99,999] 291 - ## 17 $100,000-109,999 $100-125k 17 [17. $100,000-109,999] 451 - ## 18 $110,000-124,999 $100-125k 18 [18. $110,000-124,999] 312 - ## 19 $125,000-149,999 $125k or more 19 [19. $125,000-149,999] 323 - ## 20 $150,000-174,999 $125k or more 20 [20. $150,000-174,999] 366 - ## 21 $175,000-249,999 $125k or more 21 [21. $175,000-249,999] 374 - ## 22 $250,000 or more $125k or more 22 [22. $250,000 or more] 405 - ## 23 -9 [-9. Refused] 502 - ## 24 -5 [-5. Interview breakoff (sufficient partial IW)] 15 - -``` r -anes_2020 %>% count(CampaignInterest, V201006) -``` - - ## count: now 4 rows and 3 columns, ungrouped - - ## # A tibble: 4 × 3 - ## CampaignInterest V201006 n - ## - ## 1 Very much interested 1 [1. Very much interested] 3940 - ## 2 Somewhat interested 2 [2. 
Somewhat interested] 2569 - ## 3 Not much interested 3 [3. Not much interested] 943 - ## 4 -9 [-9. Refused] 1 - -``` r -anes_2020 %>% count(TrustGovernment, V201233) -``` - - ## count: now 7 rows and 3 columns, ungrouped - - ## # A tibble: 7 × 3 - ## TrustGovernment V201233 n - ## - ## 1 Always 1 [1. Always] 80 - ## 2 Most of the time 2 [2. Most of the time] 1016 - ## 3 About half the time 3 [3. About half the time] 2313 - ## 4 Some of the time 4 [4. Some of the time] 3313 - ## 5 Never 5 [5. Never] 702 - ## 6 -9 [-9. Refused] 26 - ## 7 -8 [-8. Don't know] 3 - -``` r -anes_2020 %>% count(TrustPeople, V201237) -``` - - ## count: now 7 rows and 3 columns, ungrouped - - ## # A tibble: 7 × 3 - ## TrustPeople V201237 n - ## - ## 1 Always 1 [1. Always] 48 - ## 2 Most of the time 2 [2. Most of the time] 3511 - ## 3 About half the time 3 [3. About half the time] 2020 - ## 4 Some of the time 4 [4. Some of the time] 1597 - ## 5 Never 5 [5. Never] 264 - ## 6 -9 [-9. Refused] 12 - ## 7 -8 [-8. Don't know] 1 - -``` r -anes_2020 %>% count(VotedPres2016, V201101, V201102) -``` - - ## count: now 8 rows and 4 columns, ungrouped - - ## # A tibble: 8 × 4 - ## VotedPres2016 V201101 V201102 n - ## - ## 1 Yes -1 [-1. Inapplicable] 1 [1. Yes, voted] 3030 - ## 2 Yes 1 [1. Yes, voted] -1 [-1. Inapplicable] 2780 - ## 3 No -1 [-1. Inapplicable] 2 [2. No, didn't vote] 743 - ## 4 No 2 [2. No, didn't vote] -1 [-1. Inapplicable] 879 - ## 5 -9 [-9. Refused] -1 [-1. Inapplicable] 13 - ## 6 -8 [-8. Don't know] -1 [-1. Inapplicable] 1 - ## 7 -1 [-1. Inapplicable] -9 [-9. Refused] 6 - ## 8 -1 [-1. Inapplicable] -8 [-8. Don't know] 1 - -``` r -anes_2020 %>% count(VotedPres2016_selection, V201103) -``` - - ## count: now 6 rows and 3 columns, ungrouped - - ## # A tibble: 6 × 3 - ## VotedPres2016_selection V201103 n - ## - ## 1 Clinton 1 [1. Hillary Clinton] 2911 - ## 2 Trump 2 [2. Donald Trump] 2466 - ## 3 Other 5 [5. Other {SPECIFY}] 390 - ## 4 -9 [-9. Refused] 41 - ## 5 -8 [-8. 
Don't know] 2 - ## 6 -1 [-1. Inapplicable] 1643 - -``` r -anes_2020 %>% count(VotedPres2020, V202072) -``` - - ## count: now 5 rows and 3 columns, ungrouped - - ## # A tibble: 5 × 3 - ## VotedPres2020 V202072 n - ## - ## 1 Yes 1 [1. Yes, voted for President] 5952 - ## 2 No 2 [2. No, didn't vote for President] 77 - ## 3 -9 [-9. Refused] 2 - ## 4 -6 [-6. No post-election interview] 4 - ## 5 -1 [-1. Inapplicable] 1418 - -``` r -anes_2020 %>% count(VotedPres2020_selection, V202110x) -``` - - ## count: now 8 rows and 3 columns, ungrouped - - ## # A tibble: 8 × 3 - ## VotedPres2020_selection V202110x n - ## - ## 1 Biden 1 [1. Joe Biden] 3509 - ## 2 Trump 2 [2. Donald Trump] 2567 - ## 3 Other 3 [3. Jo Jorgensen] 74 - ## 4 Other 4 [4. Howie Hawkins] 24 - ## 5 Other 5 [5. Other candidate {SPECIFY}] 60 - ## 6 -9 [-9. Refused] 81 - ## 7 -8 [-8. Don't know] 2 - ## 8 -1 [-1. Inapplicable] 1136 - -``` r -anes_2020 %>% count(EarlyVote2020, V201025x, VotedPres2020) -``` - - ## count: now 12 rows and 4 columns, ungrouped - - ## # A tibble: 12 × 4 - ## EarlyVote2020 V201025x VotedPres2020 n - ## - ## 1 Yes 4 [4. Registered and voted early] Yes 2 - ## 2 Yes 4 [4. Registered and voted early] 369 - ## 3 No 1 [1. Not registered (or DK/RF), does not intend to register (or DK/RF intent)] Yes 32 - ## 4 No 2 [2. Not registered (or DK/RF), intends to register] Yes 105 - ## 5 No 3 [3. Registered but did not vote early (or DK/RF)] Yes 5812 - ## 6 -4 [-4. Technical error] Yes 1 - ## 7 1 [1. Not registered (or DK/RF), does not intend to register (or DK/RF intent)] No 2 - ## 8 1 [1. Not registered (or DK/RF), does not intend to register (or DK/RF intent)] 305 - ## 9 2 [2. Not registered (or DK/RF), intends to register] No 1 - ## 10 2 [2. Not registered (or DK/RF), intends to register] 184 - ## 11 3 [3. Registered but did not vote early (or DK/RF)] No 74 - ## 12 3 [3. 
Registered but did not vote early (or DK/RF)] 566 - -``` r -anes_2020 %>% - summarise(WtSum = sum(Weight, na.rm = TRUE)) %>% - pull(WtSum) -``` - - ## summarise: now one row and one column, ungrouped - - ## [1] 7453 - -## Label and order data - -``` r -#label: label-ord - -cb_in <- readxl::read_xlsx(here::here("DataCleaningScripts", "ANES Codebook Metadata.xlsx")) - -cb_ord <- cb_in %>% - mutate( - Type=1, - SectNum=case_match( - Section, - "ADMIN"~1, - "WEIGHTS"~2, - "PRE-ELECTION SURVEY QUESTIONNAIRE"~3, - "POST-ELECTION SURVEY QUESTIONNAIRE"~4 - )) %>% - arrange(SectNum, Variable) %>% - mutate( - Order=row_number() - ) -``` - - ## mutate: new variable 'Type' (double) with one unique value and 0% NA - - ## new variable 'SectNum' (double) with 4 unique values and 0% NA - - ## mutate: new variable 'Order' (integer) with 42 unique values and 0% NA - -``` r -cb_slim <- cb_ord %>% - select(Variable=BookDerived, `Description and Labels`, Question, Section, SectNum, Order) %>% - filter(!is.na(Variable)) %>% - separate_longer_delim(Variable, delim="; ") %>% - add_case(Variable="VotedPres2016", `Description and Labels`="PRE: Did R vote for President in 2016", Question="Derived from V201102, V201101", Section="PRE-ELECTION SURVEY QUESTIONNAIRE", SectNum=3, Order=11) %>% - add_case(Variable="EarlyVote2020", `Description and Labels`="PRE-POST: Voted early for president", Question="Derived from V201025x, VotedPres2020", Section="POST-ELECTION SURVEY QUESTIONNAIRE", SectNum=4, Order=44) %>% - mutate(Type=2) %>% - bind_rows(select(cb_ord, -BookDerived)) %>% - arrange(SectNum, Order, Type) -``` - - ## select: dropped 2 variables (BookDerived, Type) - - ## filter: removed 25 rows (60%), 17 rows remaining - - ## mutate: new variable 'Type' (double) with one unique value and 0% NA - - ## select: dropped one variable (BookDerived) - -``` r -names(anes_2020)[!(names(anes_2020) %in% pull(cb_slim, Variable))] -``` - - ## character(0) - -``` r -cb_vars <- cb_slim %>% - filter(Variable 
%in% names(anes_2020)) -``` - - ## filter: no rows removed - -``` r -anes_ord <- anes_2020 %>% - select(all_of(pull(cb_vars, Variable))) -``` - - ## select: columns reordered (V200001, CaseID, V200002, InterviewMode, V200010b, …) - -``` r -options("tidylog.display" = list()) - -for (var in pull(cb_vars, Variable)) { - vi <- cb_vars %>% filter(Variable==var) - attr(anes_ord[[deparse(as.name(var))]], "format.spss") <- NULL - attr(anes_ord[[deparse(as.name(var))]], "display_width") <- NULL - attr(anes_ord[[deparse(as.name(var))]], "label") <- pull(vi, `Description and Labels`) - attr(anes_ord[[deparse(as.name(var))]], "Section") <- pull(vi, Section) %>% as.character() - if (!is.na(pull(vi, Question))) attr(anes_ord[[deparse(as.name(var))]], "Question") <- pull(vi, Question) -} - -options("tidylog.display" = NULL) -``` - -## Save data - -``` r -summary(anes_ord) -``` - - ## V200001 CaseID V200002 InterviewMode V200010b Weight V200010c VarUnit V200010d Stratum V201006 - ## Min. :200015 Min. :200015 Min. :1.000 Video : 274 Min. :0.008262 Min. :0.008262 Min. :1.000 1:3689 Min. : 1.00 12 : 179 Min. :-9.000 - ## 1st Qu.:225427 1st Qu.:225427 1st Qu.:3.000 Telephone: 115 1st Qu.:0.386263 1st Qu.:0.386263 1st Qu.:1.000 2:3750 1st Qu.:12.00 6 : 172 1st Qu.: 1.000 - ## Median :335416 Median :335416 Median :3.000 Web :7064 Median :0.686301 Median :0.686301 Median :2.000 3: 14 Median :24.00 27 : 172 Median : 1.000 - ## Mean :336416 Mean :336416 Mean :2.911 Mean :1.000000 Mean :1.000000 Mean :1.507 Mean :24.63 21 : 170 Mean : 1.596 - ## 3rd Qu.:427865 3rd Qu.:427865 3rd Qu.:3.000 3rd Qu.:1.211032 3rd Qu.:1.211032 3rd Qu.:2.000 3rd Qu.:37.00 25 : 169 3rd Qu.: 2.000 - ## Max. :535469 Max. :535469 Max. :3.000 Max. :6.650665 Max. :6.650665 Max. :3.000 Max. :50.00 1 : 167 Max. : 3.000 - ## (Other):6424 - ## CampaignInterest V201024 V201025x V201029 V201101 V201102 VotedPres2016 V201103 VotedPres2016_selection V201228 - ## Very much interested:3940 Min. :-9.0000 Min. :-4.000 Min. 
:-9.0000 Min. :-9.00000 Min. :-9.0000 Yes :5810 Min. :-9.000 Clinton:2911 Min. :-9.00 - ## Somewhat interested :2569 1st Qu.:-1.0000 1st Qu.: 3.000 1st Qu.:-1.0000 1st Qu.:-1.00000 1st Qu.:-1.0000 No :1622 1st Qu.: 1.000 Trump :2466 1st Qu.: 1.00 - ## Not much interested : 943 Median :-1.0000 Median : 3.000 Median :-1.0000 Median :-1.00000 Median : 1.0000 NA's: 21 Median : 1.000 Other : 390 Median : 2.00 - ## NA's : 1 Mean :-0.8595 Mean : 2.919 Mean :-0.8967 Mean : 0.08493 Mean : 0.1048 Mean : 1.042 NA's :1686 Mean : 1.99 - ## 3rd Qu.:-1.0000 3rd Qu.: 3.000 3rd Qu.:-1.0000 3rd Qu.: 1.00000 3rd Qu.: 1.0000 3rd Qu.: 2.000 3rd Qu.: 3.00 - ## Max. : 4.0000 Max. : 4.000 Max. :12.0000 Max. : 2.00000 Max. : 2.0000 Max. : 5.000 Max. : 5.00 - ## - ## V201229 V201230 V201231x PartyID V201233 TrustGovernment V201237 TrustPeople V201507x - ## Min. :-9.0000 Min. :-9.00000 Min. :-9.000 Strong democrat :1796 Min. :-9.000 Always : 80 Min. :-9.00 Always : 48 Min. :-9.00 - ## 1st Qu.:-1.0000 1st Qu.:-1.00000 1st Qu.: 2.000 Strong republican :1545 1st Qu.: 3.000 Most of the time :1016 1st Qu.: 2.00 Most of the time :3511 1st Qu.:35.00 - ## Median : 1.0000 Median :-1.00000 Median : 4.000 Independent-democrat : 881 Median : 4.000 About half the time:2313 Median : 3.00 About half the time:2020 Median :51.00 - ## Mean : 0.5154 Mean : 0.01302 Mean : 3.834 Independent : 876 Mean : 3.429 Some of the time :3313 Mean : 2.78 Some of the time :1597 Mean :49.43 - ## 3rd Qu.: 1.0000 3rd Qu.: 1.00000 3rd Qu.: 6.000 Not very strong democrat: 790 3rd Qu.: 4.000 Never : 702 3rd Qu.: 3.00 Never : 264 3rd Qu.:66.00 - ## Max. : 2.0000 Max. : 3.00000 Max. : 7.000 (Other) :1540 Max. : 5.000 NA's : 29 Max. : 5.00 NA's : 13 Max. :80.00 - ## NA's : 25 - ## Age AgeGroup V201510 Education V201546 V201547a V201547b V201547c V201547d V201547e V201547z V201549x - ## Min. :18.00 18-29 : 871 Min. :-9.000 Less than HS: 312 Min. :-9.000 Min. :-3 Min. :-3 Min. :-3 Min. :-3 Min. :-3 Min. :-3 Min. 
:-9.000 - ## 1st Qu.:37.00 30-39 :1241 1st Qu.: 3.000 High school :1160 1st Qu.: 2.000 1st Qu.:-3 1st Qu.:-3 1st Qu.:-3 1st Qu.:-3 1st Qu.:-3 1st Qu.:-3 1st Qu.: 1.000 - ## Median :53.00 40-49 :1081 Median : 5.000 Post HS :2514 Median : 2.000 Median :-3 Median :-3 Median :-3 Median :-3 Median :-3 Median :-3 Median : 1.000 - ## Mean :51.83 50-59 :1200 Mean : 5.621 Bachelor's :1877 Mean : 1.841 Mean :-3 Mean :-3 Mean :-3 Mean :-3 Mean :-3 Mean :-3 Mean : 1.499 - ## 3rd Qu.:66.00 60-69 :1436 3rd Qu.: 6.000 Graduate :1474 3rd Qu.: 2.000 3rd Qu.:-3 3rd Qu.:-3 3rd Qu.:-3 3rd Qu.:-3 3rd Qu.:-3 3rd Qu.:-3 3rd Qu.: 2.000 - ## Max. :80.00 70 or older:1330 Max. :95.000 NA's : 116 Max. : 2.000 Max. :-3 Max. :-3 Max. :-3 Max. :-3 Max. :-3 Max. :-3 Max. : 6.000 - ## NA's :294 NA's : 294 - ## RaceEth V201600 Gender V201607 V201610 V201611 V201613 V201615 V201616 V201617x Income - ## White :5420 Min. :-9.000 Male :3375 Min. :-3 Min. :-3 Min. :-3 Min. :-3 Min. :-3 Min. :-3 Min. :-9.00 Under $9,999 : 647 - ## Black : 650 1st Qu.: 1.000 Female:4027 1st Qu.:-3 1st Qu.:-3 1st Qu.:-3 1st Qu.:-3 1st Qu.:-3 1st Qu.:-3 1st Qu.: 4.00 $50,000-59,999 : 485 - ## Hispanic : 662 Median : 2.000 NA's : 51 Median :-3 Median :-3 Median :-3 Median :-3 Median :-3 Median :-3 Median :11.00 $100,000-109,999: 451 - ## Asian, NH/PI : 248 Mean : 1.472 Mean :-3 Mean :-3 Mean :-3 Mean :-3 Mean :-3 Mean :-3 Mean :10.36 $250,000 or more: 405 - ## AI/AN : 155 3rd Qu.: 2.000 3rd Qu.:-3 3rd Qu.:-3 3rd Qu.:-3 3rd Qu.:-3 3rd Qu.:-3 3rd Qu.:-3 3rd Qu.:17.00 $80,000-89,999 : 383 - ## Other/multiple race: 237 Max. : 2.000 Max. :-3 Max. :-3 Max. :-3 Max. :-3 Max. :-3 Max. :-3 Max. :22.00 (Other) :4565 - ## NA's : 81 NA's : 517 - ## Income7 V202051 V202066 V202072 VotedPres2020 V202073 V202109x V202110x VotedPres2020_selection EarlyVote2020 - ## $125k or more:1468 Min. :-9.0000 Min. :-9.000 Min. :-9.0000 Yes :5952 Min. :-9.0000 Min. :-2.0000 Min. 
:-9.0000 Biden:3509 Yes : 371 - ## Under $20k :1076 1st Qu.:-1.0000 1st Qu.: 4.000 1st Qu.: 1.0000 No : 77 1st Qu.: 1.0000 1st Qu.: 1.0000 1st Qu.: 1.0000 Trump:2567 No :5949 - ## $20-40k :1051 Median :-1.0000 Median : 4.000 Median : 1.0000 NA's:1424 Median : 1.0000 Median : 1.0000 Median : 1.0000 Other: 158 NA's:1133 - ## $40-60k : 984 Mean :-0.7259 Mean : 3.402 Mean : 0.6234 Mean : 0.9415 Mean : 0.8578 Mean : 0.9902 NA's :1219 - ## $60-80k : 920 3rd Qu.:-1.0000 3rd Qu.: 4.000 3rd Qu.: 1.0000 3rd Qu.: 2.0000 3rd Qu.: 1.0000 3rd Qu.: 2.0000 - ## (Other) :1437 Max. : 3.0000 Max. : 4.000 Max. : 2.0000 Max. :12.0000 Max. : 1.0000 Max. : 5.0000 - ## NA's : 517 - -``` r -anes_der_tmp_loc <- here("osf_dl", "anes_2020.rds") -write_rds(anes_ord, anes_der_tmp_loc) -target_dir <- osf_retrieve_node("https://osf.io/gzbkn/?view_only=8ca80573293b4e12b7f934a0f742b957") -osf_upload(target_dir, path=anes_der_tmp_loc, conflicts="overwrite") -``` - - ## # A tibble: 1 × 3 - ## name id meta - ## - ## 1 anes_2020.rds 647d2affa8dbe909c6cb5482 - -``` r -unlink(anes_der_tmp_loc) -``` diff --git a/DataCleaningScripts/LAPOP_2021_DataPrep.Rmd b/DataCleaningScripts/LAPOP_2021_DataPrep.Rmd index 609720b0..8457d6ab 100644 --- a/DataCleaningScripts/LAPOP_2021_DataPrep.Rmd +++ b/DataCleaningScripts/LAPOP_2021_DataPrep.Rmd @@ -13,13 +13,6 @@ knitr::opts_chunk$set(echo = TRUE) All data and resources were downloaded from http://datasets.americasbarometer.org/database/ on May 7, 2023. 
-```{r} -#| label: loadpackageh -#| message: FALSE - -library(here) #easy relative paths -``` - ```{r} #| label: loadpackages @@ -39,7 +32,7 @@ stata_files <- osf_retrieve_node("https://osf.io/z5c3m/") %>% read_stata_unlabeled <- function(osf_tbl_i){ filedet <- osf_tbl_i %>% - osf_download(conflicts="overwrite", path=here("osf_dl")) + osf_download(conflicts="overwrite", path=here::here("osf_dl")) tibin <- filedet %>% pull(local_path) %>% @@ -77,9 +70,9 @@ lapop <- lapop_in %>% summary(lapop) -dir.create(here("osf_dl", "LAPOP_2021")) +dir.create(here::here("osf_dl", "LAPOP_2021")) -lapop_temp_loc <- here("osf_dl", "LAPOP_2021", "lapop_2021.rds") +lapop_temp_loc <- here::here("osf_dl", "LAPOP_2021", "lapop_2021.rds") write_rds(lapop, lapop_temp_loc) @@ -87,7 +80,7 @@ write_rds(lapop, lapop_temp_loc) target_dir <- osf_retrieve_node("https://osf.io/z5c3m/") -osf_upload(target_dir, path=here("osf_dl", "LAPOP_2021"), conflicts="overwrite") +osf_upload(target_dir, path=here::here("osf_dl", "LAPOP_2021"), conflicts="overwrite") unlink(lapop_temp_loc) ``` diff --git a/DataCleaningScripts/LAPOP_2021_DataPrep.md b/DataCleaningScripts/LAPOP_2021_DataPrep.md index 5bc6db9f..41347bba 100644 --- a/DataCleaningScripts/LAPOP_2021_DataPrep.md +++ b/DataCleaningScripts/LAPOP_2021_DataPrep.md @@ -6,10 +6,6 @@ AmericasBarometer 2021 All data and resources were downloaded from <http://datasets.americasbarometer.org/database/> on May 7, 2023. -``` r -library(here) #easy relative paths -``` - ``` r library(tidyverse) #data manipulation library(haven) #data import @@ -25,7 +21,7 @@ stata_files <- osf_retrieve_node("https://osf.io/z5c3m/") %>% read_stata_unlabeled <- function(osf_tbl_i){ filedet <- osf_tbl_i %>% - osf_download(conflicts="overwrite", path=here("osf_dl")) + osf_download(conflicts="overwrite", path=here::here("osf_dl")) tibin <- filedet %>% pull(local_path) %>% @@ -61,47 +57,64 @@ lapop <- lapop_in %>% summary(lapop) ``` - ## pais strata upm weight1500 core_a_core_b q2 q1tb covid2at - ## Min. : 1.00 Min. :1.000e+08 Min.
:1.001e+07 Min. :0.004136 Length:64352 Min. : 16.00 Min. :1.000 Min. :1.000 - ## 1st Qu.: 6.00 1st Qu.:6.000e+08 1st Qu.:6.153e+07 1st Qu.:0.251556 Class :character 1st Qu.: 27.00 1st Qu.:1.000 1st Qu.:1.000 - ## Median :11.00 Median :1.100e+09 Median :1.202e+08 Median :0.417251 Mode :character Median : 36.00 Median :2.000 Median :2.000 - ## Mean :13.03 Mean :1.303e+09 Mean :1.666e+08 Mean :0.512805 Mean : 38.86 Mean :1.521 Mean :2.076 - ## 3rd Qu.:17.00 3rd Qu.:1.700e+09 3rd Qu.:2.105e+08 3rd Qu.:0.674477 3rd Qu.: 49.00 3rd Qu.:2.000 3rd Qu.:3.000 - ## Max. :41.00 Max. :4.100e+09 Max. :1.135e+09 Max. :7.024495 Max. :121.00 Max. :3.000 Max. :4.000 - ## NA's :90 NA's :90 NA's :6686 - ## a4 idio2 idio2cov it1 jc13 m1 mil10a mil10e ccch1 - ## Min. : 1.00 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.00 Min. :1.00 Min. :1.00 Min. :1.00 Min. :1.00 - ## 1st Qu.: 3.00 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:1.00 1st Qu.:2.00 1st Qu.:2.00 1st Qu.:2.00 1st Qu.:1.00 - ## Median : 22.00 Median :3.000 Median :1.000 Median :2.000 Median :2.00 Median :3.00 Median :3.00 Median :2.00 Median :1.00 - ## Mean : 36.73 Mean :2.439 Mean :1.242 Mean :2.275 Mean :1.62 Mean :2.98 Mean :2.72 Mean :2.39 Mean :1.78 - ## 3rd Qu.: 71.00 3rd Qu.:3.000 3rd Qu.:1.000 3rd Qu.:3.000 3rd Qu.:2.00 3rd Qu.:4.00 3rd Qu.:3.00 3rd Qu.:3.00 3rd Qu.:2.00 - ## Max. :865.00 Max. :3.000 Max. :2.000 Max. :4.000 Max. :2.00 Max. :5.00 Max. :4.00 Max. :4.00 Max. :4.00 - ## NA's :4965 NA's :2766 NA's :31580 NA's :3631 NA's :50827 NA's :33238 NA's :49939 NA's :44021 NA's :50535 - ## ccch3 ccus1 ccus3 edr ocup4a q14 q11n q12c q12bn - ## Min. :1.00 Min. :1.00 Min. :1.00 Min. :0.000 Min. :1.000 Min. :1.0 Min. :1.000 Min. : 1.000 Min. 
: 0.000 - ## 1st Qu.:1.00 1st Qu.:1.00 1st Qu.:1.00 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.0 1st Qu.:1.000 1st Qu.: 3.000 1st Qu.: 0.000 - ## Median :2.00 Median :1.00 Median :2.00 Median :2.000 Median :1.000 Median :2.0 Median :2.000 Median : 4.000 Median : 1.000 - ## Mean :1.82 Mean :1.58 Mean :1.76 Mean :2.192 Mean :2.627 Mean :1.6 Mean :2.214 Mean : 4.036 Mean : 1.001 - ## 3rd Qu.:2.00 3rd Qu.:2.00 3rd Qu.:2.00 3rd Qu.:3.000 3rd Qu.:4.000 3rd Qu.:2.0 3rd Qu.:3.000 3rd Qu.: 5.000 3rd Qu.: 2.000 - ## Max. :3.00 Max. :4.00 Max. :3.00 Max. :3.000 Max. :7.000 Max. :2.0 Max. :7.000 Max. :20.000 Max. :16.000 - ## NA's :51961 NA's :50028 NA's :51226 NA's :4114 NA's :29505 NA's :44130 NA's :31198 NA's :29144 NA's :29449 - ## covidedu1_1 covidedu1_2 covidedu1_3 covidedu1_4 covidedu1_5 gi0n r15 r18n r18 - ## Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00 Min. :1.000 Min. :0.000 Min. :0.000 Min. :0.000 - ## 1st Qu.:0.00 1st Qu.:0.00 1st Qu.:0.00 1st Qu.:0.00 1st Qu.:0.00 1st Qu.:1.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:1.000 - ## Median :0.00 Median :0.00 Median :1.00 Median :0.00 Median :0.00 Median :1.000 Median :1.000 Median :1.000 Median :1.000 - ## Mean :0.17 Mean :0.07 Mean :0.62 Mean :0.12 Mean :0.08 Mean :1.646 Mean :0.513 Mean :0.537 Mean :0.815 - ## 3rd Qu.:0.00 3rd Qu.:0.00 3rd Qu.:1.00 3rd Qu.:0.00 3rd Qu.:0.00 3rd Qu.:2.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 - ## Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00 Max. :5.000 Max. :1.000 Max. :1.000 Max. :1.000 - ## NA's :51297 NA's :51297 NA's :51297 NA's :51297 NA's :51297 NA's :1240 NA's :4118 NA's :4386 NA's :4249 + ## pais strata upm weight1500 core_a_core_b + ## Min. : 1.00 Min. :1.000e+08 Min. :1.001e+07 Min. 
:0.004136 Length:64352 + ## 1st Qu.: 6.00 1st Qu.:6.000e+08 1st Qu.:6.153e+07 1st Qu.:0.251556 Class :character + ## Median :11.00 Median :1.100e+09 Median :1.202e+08 Median :0.417251 Mode :character + ## Mean :13.03 Mean :1.303e+09 Mean :1.666e+08 Mean :0.512805 + ## 3rd Qu.:17.00 3rd Qu.:1.700e+09 3rd Qu.:2.105e+08 3rd Qu.:0.674477 + ## Max. :41.00 Max. :4.100e+09 Max. :1.135e+09 Max. :7.024495 + ## + ## q2 q1tb covid2at a4 idio2 idio2cov + ## Min. : 16.00 Min. :1.000 Min. :1.000 Min. : 1.00 Min. :1.000 Min. :1.000 + ## 1st Qu.: 27.00 1st Qu.:1.000 1st Qu.:1.000 1st Qu.: 3.00 1st Qu.:2.000 1st Qu.:1.000 + ## Median : 36.00 Median :2.000 Median :2.000 Median : 22.00 Median :3.000 Median :1.000 + ## Mean : 38.86 Mean :1.521 Mean :2.076 Mean : 36.73 Mean :2.439 Mean :1.242 + ## 3rd Qu.: 49.00 3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.: 71.00 3rd Qu.:3.000 3rd Qu.:1.000 + ## Max. :121.00 Max. :3.000 Max. :4.000 Max. :865.00 Max. :3.000 Max. :2.000 + ## NA's :90 NA's :90 NA's :6686 NA's :4965 NA's :2766 NA's :31580 + ## it1 jc13 m1 mil10a mil10e ccch1 + ## Min. :1.000 Min. :1.00 Min. :1.00 Min. :1.00 Min. :1.00 Min. :1.00 + ## 1st Qu.:2.000 1st Qu.:1.00 1st Qu.:2.00 1st Qu.:2.00 1st Qu.:2.00 1st Qu.:1.00 + ## Median :2.000 Median :2.00 Median :3.00 Median :3.00 Median :2.00 Median :1.00 + ## Mean :2.275 Mean :1.62 Mean :2.98 Mean :2.72 Mean :2.39 Mean :1.78 + ## 3rd Qu.:3.000 3rd Qu.:2.00 3rd Qu.:4.00 3rd Qu.:3.00 3rd Qu.:3.00 3rd Qu.:2.00 + ## Max. :4.000 Max. :2.00 Max. :5.00 Max. :4.00 Max. :4.00 Max. :4.00 + ## NA's :3631 NA's :50827 NA's :33238 NA's :49939 NA's :44021 NA's :50535 + ## ccch3 ccus1 ccus3 edr ocup4a q14 + ## Min. :1.00 Min. :1.00 Min. :1.00 Min. :0.000 Min. :1.000 Min. 
:1.0 + ## 1st Qu.:1.00 1st Qu.:1.00 1st Qu.:1.00 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.0 + ## Median :2.00 Median :1.00 Median :2.00 Median :2.000 Median :1.000 Median :2.0 + ## Mean :1.82 Mean :1.58 Mean :1.76 Mean :2.192 Mean :2.627 Mean :1.6 + ## 3rd Qu.:2.00 3rd Qu.:2.00 3rd Qu.:2.00 3rd Qu.:3.000 3rd Qu.:4.000 3rd Qu.:2.0 + ## Max. :3.00 Max. :4.00 Max. :3.00 Max. :3.000 Max. :7.000 Max. :2.0 + ## NA's :51961 NA's :50028 NA's :51226 NA's :4114 NA's :29505 NA's :44130 + ## q11n q12c q12bn covidedu1_1 covidedu1_2 covidedu1_3 + ## Min. :1.000 Min. : 1.000 Min. : 0.000 Min. :0.00 Min. :0.00 Min. :0.00 + ## 1st Qu.:1.000 1st Qu.: 3.000 1st Qu.: 0.000 1st Qu.:0.00 1st Qu.:0.00 1st Qu.:0.00 + ## Median :2.000 Median : 4.000 Median : 1.000 Median :0.00 Median :0.00 Median :1.00 + ## Mean :2.214 Mean : 4.036 Mean : 1.001 Mean :0.17 Mean :0.07 Mean :0.62 + ## 3rd Qu.:3.000 3rd Qu.: 5.000 3rd Qu.: 2.000 3rd Qu.:0.00 3rd Qu.:0.00 3rd Qu.:1.00 + ## Max. :7.000 Max. :20.000 Max. :16.000 Max. :1.00 Max. :1.00 Max. :1.00 + ## NA's :31198 NA's :29144 NA's :29449 NA's :51297 NA's :51297 NA's :51297 + ## covidedu1_4 covidedu1_5 gi0n r15 r18n r18 + ## Min. :0.00 Min. :0.00 Min. :1.000 Min. :0.000 Min. :0.000 Min. :0.000 + ## 1st Qu.:0.00 1st Qu.:0.00 1st Qu.:1.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:1.000 + ## Median :0.00 Median :0.00 Median :1.000 Median :1.000 Median :1.000 Median :1.000 + ## Mean :0.12 Mean :0.08 Mean :1.646 Mean :0.513 Mean :0.537 Mean :0.815 + ## 3rd Qu.:0.00 3rd Qu.:0.00 3rd Qu.:2.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 + ## Max. :1.00 Max. :1.00 Max. :5.000 Max. :1.000 Max. :1.000 Max. 
:1.000 + ## NA's :51297 NA's :51297 NA's :1240 NA's :4118 NA's :4386 NA's :4249 ``` r -dir.create(here("osf_dl", "LAPOP_2021")) +dir.create(here::here("osf_dl", "LAPOP_2021")) ``` - ## Warning in dir.create(here("osf_dl", "LAPOP_2021")): 'C:\Users\steph\Documents\GitHub\tidy-survey-book\osf_dl\LAPOP_2021' already exists + ## Warning in dir.create(here::here("osf_dl", "LAPOP_2021")): + ## 'C:\Users\steph\Documents\GitHub\tidy-survey-book\osf_dl\LAPOP_2021' already exists ``` r -lapop_temp_loc <- here("osf_dl", "LAPOP_2021", "lapop_2021.rds") +lapop_temp_loc <- here::here("osf_dl", "LAPOP_2021", "lapop_2021.rds") write_rds(lapop, lapop_temp_loc) @@ -109,7 +122,7 @@ write_rds(lapop, lapop_temp_loc) target_dir <- osf_retrieve_node("https://osf.io/z5c3m/") -osf_upload(target_dir, path=here("osf_dl", "LAPOP_2021"), conflicts="overwrite") +osf_upload(target_dir, path=here::here("osf_dl", "LAPOP_2021"), conflicts="overwrite") ``` ## Searching for conflicting files on OSF diff --git a/DataCleaningScripts/NCVS_2021_DataPrep.Rmd b/DataCleaningScripts/NCVS_2021_DataPrep.Rmd deleted file mode 100644 index c1cbe78d..00000000 --- a/DataCleaningScripts/NCVS_2021_DataPrep.Rmd +++ /dev/null @@ -1,158 +0,0 @@ ---- -title: "National Crime Victimization Survey (NCVS) 2021 Data Prep" -output: - github_document: - html_preview: false -bibliography: ../book.bib ---- - -```{r setup, include=FALSE} -knitr::opts_chunk$set(echo = TRUE) -``` - -## Data information - -Complete data is not stored on this repository but can be obtained on [ICPSR](https://www.icpsr.umich.edu/web/ICPSR/studies/38429) by downloading the R version of data files (@ncvs_data_2021). The files used here are from Version 1 and were downloaded on March 11, 2023. - -This script selects a subset of columns of several files and only retains those on this repository. 
- -```{r} -#| label: loadpackageh -#| message: FALSE -library(here) #easy relative paths -``` - -```{r} -#| label: loadpackages -library(tidyverse) #data manipulation -library(tidylog) #informative logging messages -library(osfr) -``` - -## Incident data file - -```{r} -#| label: incfile - -inc_file_osf_det <- osf_retrieve_node("https://osf.io/z5c3m/") %>% - osf_ls_files(path="NCVS_2021/DS0004") %>% - osf_download(conflicts="overwrite", path=here("osf_dl")) - -incfiles <- load(pull(inc_file_osf_det, local_path), verbose=TRUE) - -inc_in <- get(incfiles) %>% - as_tibble() - -unlink(pull(inc_file_osf_det, local_path)) - -make_num_fact <- function(x){ - xchar <- sub("^\\(0*([0-9]+)\\).+$", "\\1", x) - xnum <- as.numeric(xchar) - fct_reorder(xchar, xnum, .na_rm = TRUE) -} - -inc_slim <- inc_in %>% - select( - YEARQ, IDHH, IDPER, V4012, WGTVICCY, # identifiers and weight - num_range("V", 4016:4019), # series crime information - V4021B, V4022, V4024, # time of incident, location of incident (macro and micro) - num_range("V", 4049:4058), #weapon type - V4234, V4235, num_range("V", 4241:4245), V4248, num_range("V", 4256:4278), starts_with("V4277"), # victim-offender relationship - V4399, # report to police - V4529 # type of crime - ) %>% - mutate( - IDHH=as.character(IDHH), - IDPER=as.character(IDPER), - across(where(is.factor), make_num_fact) - ) - -summary(inc_slim) - -inc_temp_loc <- here("osf_dl", "ncvs_2021_incident.rds") -write_rds(inc_slim, inc_temp_loc) -target_dir <- osf_retrieve_node("https://osf.io/gzbkn/?view_only=8ca80573293b4e12b7f934a0f742b957") -osf_upload(target_dir, path=inc_temp_loc, conflicts="overwrite") -unlink(inc_temp_loc) -``` - - -## Person data file - -```{r} -#| label: persfile - -pers_file_osf_det <- osf_retrieve_node("https://osf.io/z5c3m/") %>% - osf_ls_files(path="NCVS_2021/DS0003") %>% - osf_download(conflicts="overwrite", path=here("osf_dl")) - -persfiles <- load(pull(pers_file_osf_det, local_path), verbose=TRUE) - -pers_in <- 
get(persfiles) %>% - as_tibble() - -unlink(pull(pers_file_osf_det, local_path)) - -pers_slim <- pers_in %>% - select( - YEARQ, IDHH, IDPER, WGTPERCY, # identifiers and weight - V3014, V3015, V3018, V3023A, V3024, V3084, V3086 - # age, marital status, sex, race, hispanic origin, gender, sexual orientation - ) %>% - mutate( - IDHH=as.character(IDHH), - IDPER=as.character(IDPER), - across(where(is.factor), make_num_fact) - ) - -summary(pers_slim) - -pers_temp_loc <- here("osf_dl", "ncvs_2021_person.rds") -write_rds(pers_slim, pers_temp_loc) -target_dir <- osf_retrieve_node("https://osf.io/gzbkn/?view_only=8ca80573293b4e12b7f934a0f742b957") -osf_upload(target_dir, path=pers_temp_loc, conflicts="overwrite") -unlink(pers_temp_loc) - -``` - -## Household data file - - -```{r} -#| label: hhfile - -hh_file_osf_det <- osf_retrieve_node("https://osf.io/z5c3m/") %>% - osf_ls_files(path="NCVS_2021/DS0002") %>% - osf_download(conflicts="overwrite", path=here("osf_dl")) - -hhfiles <- load(pull(hh_file_osf_det, local_path), verbose=TRUE) - -hh_in <- get(hhfiles) %>% - as_tibble() - -unlink(pull(hh_file_osf_det, local_path)) - -hh_slim <- hh_in %>% - select( - YEARQ, IDHH, WGTHHCY, V2117, V2118, # identifiers, weight, design - V2015, V2143, SC214A, V2122, V2126B, V2127B, V2129 - # tenure, urbanicity, income, family structure, place size, region, msa status - ) %>% - mutate( - IDHH=as.character(IDHH), - across(where(is.factor), make_num_fact) - ) - -summary(hh_slim) - -hh_temp_loc <- here("osf_dl", "ncvs_2021_household.rds") -write_rds(hh_slim, hh_temp_loc) -target_dir <- osf_retrieve_node("https://osf.io/gzbkn/?view_only=8ca80573293b4e12b7f934a0f742b957") -osf_upload(target_dir, path=hh_temp_loc, conflicts="overwrite") -unlink(hh_temp_loc) -``` - -## Resources - -- [USER’S GUIDE TO NATIONAL CRIME VICTIMIZATION SURVEY (NCVS) DIRECT VARIANCE ESTIMATION](https://bjs.ojp.gov/sites/g/files/xyckuh236/files/media/document/ncvs_variance_user_guide_11.06.14.pdf) --[Appendix C: Examples in 
SAS](https://bjs.ojp.gov/sites/g/files/xyckuh236/files/media/document/variance_guide_appendix_c_sas.pdf) \ No newline at end of file diff --git a/DataCleaningScripts/NCVS_2021_DataPrep.md b/DataCleaningScripts/NCVS_2021_DataPrep.md deleted file mode 100644 index 42c839d1..00000000 --- a/DataCleaningScripts/NCVS_2021_DataPrep.md +++ /dev/null @@ -1,456 +0,0 @@ -National Crime Victimization Survey (NCVS) 2021 Data Prep -================ - -## Data information - -Complete data is not stored on this repository but can be obtained on -[ICPSR](https://www.icpsr.umich.edu/web/ICPSR/studies/38429) by -downloading the R version of data files (United States. Bureau of -Justice Statistics (2022)). The files used here are from Version 1 and -were downloaded on March 11, 2023. - -This script selects a subset of columns of several files and only -retains those on this repository. - -``` r -library(here) #easy relative paths -``` - -``` r -library(tidyverse) #data manipulation -library(tidylog) #informative logging messages -library(osfr) -``` - -## Incident data file - -``` r -inc_file_osf_det <- osf_retrieve_node("https://osf.io/z5c3m/") %>% - osf_ls_files(path="NCVS_2021/DS0004") %>% - osf_download(conflicts="overwrite", path=here("osf_dl")) - -incfiles <- load(pull(inc_file_osf_det, local_path), verbose=TRUE) -``` - - ## Loading objects: - ## da38429.0004 - -``` r -inc_in <- get(incfiles) %>% - as_tibble() - -unlink(pull(inc_file_osf_det, local_path)) - -make_num_fact <- function(x){ - xchar <- sub("^\\(0*([0-9]+)\\).+$", "\\1", x) - xnum <- as.numeric(xchar) - fct_reorder(xchar, xnum, .na_rm = TRUE) -} - -inc_slim <- inc_in %>% - select( - YEARQ, IDHH, IDPER, V4012, WGTVICCY, # identifiers and weight - num_range("V", 4016:4019), # series crime information - V4021B, V4022, V4024, # time of incident, location of incident (macro and micro) - num_range("V", 4049:4058), #weapon type - V4234, V4235, num_range("V", 4241:4245), V4248, num_range("V", 4256:4278), starts_with("V4277"), 
# victim-offender relationship - V4399, # report to police - V4529 # type of crime - ) %>% - mutate( - IDHH=as.character(IDHH), - IDPER=as.character(IDPER), - across(where(is.factor), make_num_fact) - ) -``` - - ## select: dropped 1,201 variables (V4001, V4002, V4003, V4004, V4005, …) - - ## mutate: converted 'IDHH' from factor to character (0 new NA) - - ## converted 'IDPER' from factor to character (0 new NA) - - ## changed 8,982 values (100%) of 'V4017' (0 new NA) - - ## changed 157 values (2%) of 'V4018' (0 new NA) - - ## changed 153 values (2%) of 'V4019' (0 new NA) - - ## changed 8,982 values (100%) of 'V4021B' (0 new NA) - - ## changed 8,982 values (100%) of 'V4022' (0 new NA) - - ## changed 8,982 values (100%) of 'V4024' (0 new NA) - - ## changed 2,737 values (30%) of 'V4049' (0 new NA) - - ## changed 409 values (5%) of 'V4050' (0 new NA) - - ## changed 409 values (5%) of 'V4051' (0 new NA) - - ## changed 409 values (5%) of 'V4052' (0 new NA) - - ## changed 409 values (5%) of 'V4053' (0 new NA) - - ## changed 409 values (5%) of 'V4054' (0 new NA) - - ## changed 409 values (5%) of 'V4055' (0 new NA) - - ## changed 409 values (5%) of 'V4056' (0 new NA) - - ## changed 409 values (5%) of 'V4057' (0 new NA) - - ## changed 409 values (5%) of 'V4058' (0 new NA) - - ## changed 2,827 values (31%) of 'V4234' (0 new NA) - - ## changed 398 values (4%) of 'V4235' (0 new NA) - - ## changed 2,096 values (23%) of 'V4241' (0 new NA) - - ## changed 920 values (10%) of 'V4242' (0 new NA) - - ## changed 1,246 values (14%) of 'V4243' (0 new NA) - - ## changed 831 values (9%) of 'V4244' (0 new NA) - - ## changed 1,075 values (12%) of 'V4245' (0 new NA) - - ## changed 353 values (4%) of 'V4256' (0 new NA) - - ## changed 231 values (3%) of 'V4257' (0 new NA) - - ## changed 139 values (2%) of 'V4258' (0 new NA) - - ## changed 139 values (2%) of 'V4259' (0 new NA) - - ## changed 139 values (2%) of 'V4260' (0 new NA) - - ## changed 139 values (2%) of 'V4261' (0 new NA) - - ## changed 
139 values (2%) of 'V4262' (0 new NA) - - ## changed 181 values (2%) of 'V4263' (0 new NA) - - ## changed 104 values (1%) of 'V4264' (0 new NA) - - ## changed 104 values (1%) of 'V4265' (0 new NA) - - ## changed 104 values (1%) of 'V4266' (0 new NA) - - ## changed 104 values (1%) of 'V4267' (0 new NA) - - ## changed 104 values (1%) of 'V4268' (0 new NA) - - ## changed 104 values (1%) of 'V4269' (0 new NA) - - ## changed 104 values (1%) of 'V4270' (0 new NA) - - ## changed 104 values (1%) of 'V4271' (0 new NA) - - ## changed 104 values (1%) of 'V4272' (0 new NA) - - ## changed 104 values (1%) of 'V4273' (0 new NA) - - ## changed 104 values (1%) of 'V4274' (0 new NA) - - ## changed 104 values (1%) of 'V4275' (0 new NA) - - ## changed 104 values (1%) of 'V4276' (0 new NA) - - ## changed 104 values (1%) of 'V4277' (0 new NA) - - ## changed 104 values (1%) of 'V4278' (0 new NA) - - ## changed 104 values (1%) of 'V4277A' (0 new NA) - - ## changed 104 values (1%) of 'V4277B' (0 new NA) - - ## changed 104 values (1%) of 'V4277C' (0 new NA) - - ## changed 104 values (1%) of 'V4277D' (0 new NA) - - ## changed 104 values (1%) of 'V4277E' (0 new NA) - - ## changed 8,982 values (100%) of 'V4399' (0 new NA) - - ## changed 8,982 values (100%) of 'V4529' (0 new NA) - -``` r -summary(inc_slim) -``` - - ## YEARQ IDHH IDPER V4012 WGTVICCY V4016 - ## Min. :2021 Length:8982 Length:8982 Min. :1.000 Min. : 221.6 Min. : 1.000 - ## 1st Qu.:2021 Class :character Class :character 1st Qu.:1.000 1st Qu.: 867.3 1st Qu.: 1.000 - ## Median :2021 Mode :character Mode :character Median :1.000 Median : 1352.3 Median : 1.000 - ## Mean :2021 Mean :1.179 Mean : 1674.9 Mean : 4.324 - ## 3rd Qu.:2021 3rd Qu.:1.000 3rd Qu.: 2217.4 3rd Qu.: 1.000 - ## Max. :2021 Max. :7.000 Max. :10106.2 Max. 
:998.000 - ## - ## V4017 V4018 V4019 V4021B V4022 V4024 V4049 V4050 V4051 - ## 1:8825 1 : 127 1 : 10 7 :1855 1: 34 5 :3210 1 : 409 1 : 380 0 : 278 - ## 2: 131 2 : 4 2 : 117 9 :1217 2: 65 1 :1481 2 :1803 3 : 26 1 : 131 - ## 8: 26 8 : 26 8 : 26 2 :1145 3:7697 7 : 727 3 : 525 7 : 3 NA's:8573 - ## NA's:8825 NA's:8829 3 : 940 4:1143 21 : 453 NA's:6245 NA's:8573 - ## 8 : 856 5: 39 16 : 449 - ## 4 : 833 8: 4 6 : 429 - ## (Other):2136 (Other):2233 - ## V4052 V4053 V4054 V4055 V4056 V4057 V4058 V4234 V4235 - ## 0 : 390 0 : 334 0 : 394 0 : 302 0 : 360 0 : 406 0 : 380 1 :2076 1 : 20 - ## 1 : 19 1 : 75 1 : 15 1 : 107 1 : 49 1 : 3 8 : 29 2 : 353 2 : 291 - ## NA's:8573 NA's:8573 NA's:8573 NA's:8573 NA's:8573 NA's:8573 NA's:8573 3 : 311 8 : 87 - ## 8 : 87 NA's:8584 - ## NA's:6155 - ## - ## - ## V4241 V4242 V4243 V4244 V4245 V4248 V4256 V4257 - ## 1 :1176 1 : 326 1 : 171 1 : 307 7 : 149 Min. : 2.000 1 : 83 1 : 65 - ## 2 : 793 2 : 240 2 : 292 2 : 424 8 : 139 1st Qu.: 2.000 2 : 37 2 : 63 - ## 3 : 57 3 : 271 3 : 701 3 : 4 11 : 137 Median : 2.000 3 : 194 3 : 85 - ## 8 : 70 8 : 83 6 : 1 8 : 96 13 : 114 Mean : 7.992 4 : 20 8 : 18 - ## NA's:6886 NA's:8062 8 : 81 NA's:8151 98 : 85 3rd Qu.: 3.000 6 : 2 NA's:8751 - ## NA's:7736 (Other): 451 Max. 
:98.000 8 : 17 - ## NA's :7907 NA's :8629 NA's:8629 - ## V4258 V4259 V4260 V4261 V4262 V4263 V4264 V4265 V4266 - ## 1 : 122 0 : 85 0 : 76 0 : 77 0 : 122 1 : 65 1 : 87 0 : 87 0 : 83 - ## 8 : 17 1 : 37 1 : 46 1 : 45 8 : 17 2 : 98 8 : 17 8 : 17 1 : 4 - ## NA's:8843 8 : 17 8 : 17 8 : 17 NA's:8843 8 : 18 NA's:8878 NA's:8878 8 : 17 - ## NA's:8843 NA's:8843 NA's:8843 NA's:8801 NA's:8878 - ## - ## - ## - ## V4267 V4268 V4269 V4270 V4271 V4272 V4273 V4274 V4275 - ## 0 : 82 0 : 84 0 : 84 0 : 83 0 : 84 0 : 66 0 : 81 0 : 84 0 : 64 - ## 1 : 5 1 : 3 1 : 3 1 : 4 1 : 3 1 : 21 1 : 6 1 : 3 1 : 23 - ## 8 : 17 8 : 17 8 : 17 8 : 17 8 : 17 8 : 17 8 : 17 8 : 17 8 : 17 - ## NA's:8878 NA's:8878 NA's:8878 NA's:8878 NA's:8878 NA's:8878 NA's:8878 NA's:8878 NA's:8878 - ## - ## - ## - ## V4276 V4277 V4278 V4277A V4277B V4277C V4277D V4277E V4399 - ## 0 : 85 0 : 62 0 : 85 0 : 87 0 : 87 0 : 87 0 : 84 0 : 87 1:3175 - ## 1 : 2 1 : 25 8 : 19 8 : 17 8 : 17 8 : 17 1 : 3 8 : 17 2:5692 - ## 8 : 17 8 : 17 NA's:8878 NA's:8878 NA's:8878 NA's:8878 8 : 17 NA's:8878 3: 103 - ## NA's:8878 NA's:8878 NA's:8878 8: 12 - ## - ## - ## - ## V4529 - ## 56 :1689 - ## 57 :1431 - ## 55 :1011 - ## 58 : 799 - ## 32 : 637 - ## 20 : 609 - ## (Other):2806 - -``` r -inc_temp_loc <- here("osf_dl", "ncvs_2021_incident.rds") -write_rds(inc_slim, inc_temp_loc) -target_dir <- osf_retrieve_node("https://osf.io/gzbkn/?view_only=8ca80573293b4e12b7f934a0f742b957") -osf_upload(target_dir, path=inc_temp_loc, conflicts="overwrite") -``` - - ## # A tibble: 1 × 3 - ## name id meta - ## - ## 1 ncvs_2021_incident.rds 647cfbcd85df4808fa7753f2 - -``` r -unlink(inc_temp_loc) -``` - -## Person data file - -``` r -pers_file_osf_det <- osf_retrieve_node("https://osf.io/z5c3m/") %>% - osf_ls_files(path="NCVS_2021/DS0003") %>% - osf_download(conflicts="overwrite", path=here("osf_dl")) - -persfiles <- load(pull(pers_file_osf_det, local_path), verbose=TRUE) -``` - - ## Loading objects: - ## da38429.0003 - -``` r -pers_in <- get(persfiles) %>% - 
as_tibble() - -unlink(pull(pers_file_osf_det, local_path)) - -pers_slim <- pers_in %>% - select( - YEARQ, IDHH, IDPER, WGTPERCY, # identifiers and weight - V3014, V3015, V3018, V3023A, V3024, V3084, V3086 - # age, marital status, sex, race, hispanic origin, gender, sexual orientation - ) %>% - mutate( - IDHH=as.character(IDHH), - IDPER=as.character(IDPER), - across(where(is.factor), make_num_fact) - ) -``` - - ## select: dropped 418 variables (V3001, V3002, V3003, V3004, V3005, …) - - ## mutate: converted 'IDHH' from factor to character (0 new NA) - - ## converted 'IDPER' from factor to character (0 new NA) - - ## changed 291,878 values (100%) of 'V3015' (0 new NA) - - ## changed 291,878 values (100%) of 'V3018' (0 new NA) - - ## changed 291,878 values (100%) of 'V3023A' (0 new NA) - - ## changed 291,878 values (100%) of 'V3024' (0 new NA) - - ## changed 216,287 values (74%) of 'V3084' (0 new NA) - - ## changed 216,287 values (74%) of 'V3086' (0 new NA) - -``` r -summary(pers_slim) -``` - - ## YEARQ IDHH IDPER WGTPERCY V3014 V3015 V3018 - ## Min. :2021 Length:291878 Length:291878 Min. : 0.0 Min. :12.00 1:148131 1:140922 - ## 1st Qu.:2021 Class :character Class :character 1st Qu.: 432.2 1st Qu.:31.00 2: 17668 2:150956 - ## Median :2021 Mode :character Mode :character Median : 791.5 Median :48.00 3: 28596 - ## Mean :2021 Mean : 956.5 Mean :47.57 4: 4524 - ## 3rd Qu.:2021 3rd Qu.: 1397.4 3rd Qu.:64.00 5: 90425 - ## Max. :2021 Max. :10691.5 Max. 
:90.00 8: 2534 - ## - ## V3023A V3024 V3084 V3086 - ## 1 :236785 1: 41450 8 :151725 1 : 29733 - ## 2 : 30972 2:249306 2 : 61108 2 : 34489 - ## 4 : 16337 8: 1122 6 : 1477 3 : 56 - ## 3 : 1776 1 : 924 4 : 115 - ## 6 : 1590 3 : 611 8 :151894 - ## 7 : 1465 (Other): 442 NA's: 75591 - ## (Other): 2953 NA's : 75591 - -``` r -pers_temp_loc <- here("osf_dl", "ncvs_2021_person.rds") -write_rds(pers_slim, pers_temp_loc) -target_dir <- osf_retrieve_node("https://osf.io/gzbkn/?view_only=8ca80573293b4e12b7f934a0f742b957") -osf_upload(target_dir, path=pers_temp_loc, conflicts="overwrite") -``` - - ## # A tibble: 1 × 3 - ## name id meta - ## - ## 1 ncvs_2021_person.rds 647cfe9ba8dbe909bacb51bf - -``` r -unlink(pers_temp_loc) -``` - -## Household data file - -``` r -hh_file_osf_det <- osf_retrieve_node("https://osf.io/z5c3m/") %>% - osf_ls_files(path="NCVS_2021/DS0002") %>% - osf_download(conflicts="overwrite", path=here("osf_dl")) - -hhfiles <- load(pull(hh_file_osf_det, local_path), verbose=TRUE) -``` - - ## Loading objects: - ## da38429.0002 - -``` r -hh_in <- get(hhfiles) %>% - as_tibble() - -unlink(pull(hh_file_osf_det, local_path)) - -hh_slim <- hh_in %>% - select( - YEARQ, IDHH, WGTHHCY, V2117, V2118, # identifiers, weight, design - V2015, V2143, SC214A, V2122, V2126B, V2127B, V2129 - # tenure, urbanicity, income, family structure, place size, region, msa status - ) %>% - mutate( - IDHH=as.character(IDHH), - across(where(is.factor), make_num_fact) - ) -``` - - ## select: dropped 440 variables (V2001, V2002, V2003, V2004, V2005, …) - - ## mutate: converted 'IDHH' from factor to character (0 new NA) - - ## changed 150,138 values (59%) of 'V2015' (0 new NA) - - ## changed 256,460 values (100%) of 'V2143' (0 new NA) - - ## changed 253,779 values (99%) of 'SC214A' (0 new NA) - - ## changed 256,460 values (100%) of 'V2122' (0 new NA) - - ## changed 256,460 values (100%) of 'V2126B' (0 new NA) - - ## changed 256,460 values (100%) of 'V2127B' (0 new NA) - - ## changed 256,460 values 
(100%) of 'V2129' (0 new NA) - -``` r -summary(hh_slim) -``` - - ## YEARQ IDHH WGTHHCY V2117 V2118 V2015 V2143 - ## Min. :2021 Length:256460 Min. : 0.0 Min. : 1.00 Min. :1.000 1 :101944 1: 26878 - ## 1st Qu.:2021 Class :character 1st Qu.: 0.0 1st Qu.: 24.00 1st Qu.:1.000 2 : 46269 2:173491 - ## Median :2021 Mode :character Median : 399.4 Median : 48.00 Median :2.000 3 : 1925 3: 56091 - ## Mean :2021 Mean : 504.2 Mean : 58.85 Mean :1.526 NA's:106322 - ## 3rd Qu.:2021 3rd Qu.: 829.1 3rd Qu.: 88.00 3rd Qu.:2.000 - ## Max. :2021 Max. :4515.8 Max. :160.00 Max. :3.000 - ## - ## SC214A V2122 V2126B V2127B V2129 - ## 13 : 44601 33 :106322 0 :69484 1:41585 1: 80895 - ## 16 : 34287 32 : 28306 16 :53002 2:74666 2:135438 - ## 15 : 33353 16 : 23617 13 :39873 3:87783 3: 40127 - ## 12 : 23282 8 : 21383 17 :27205 4:52426 - ## 18 : 16892 24 : 15629 18 :24461 - ## (Other):101364 4 : 10477 20 :15194 - ## NA's : 2681 (Other): 50726 (Other):27241 - -``` r -hh_temp_loc <- here("osf_dl", "ncvs_2021_household.rds") -write_rds(hh_slim, hh_temp_loc) -target_dir <- osf_retrieve_node("https://osf.io/gzbkn/?view_only=8ca80573293b4e12b7f934a0f742b957") -osf_upload(target_dir, path=hh_temp_loc, conflicts="overwrite") -``` - - ## # A tibble: 1 × 3 - ## name id meta - ## - ## 1 ncvs_2021_household.rds 647cfe323c3a380880a046d8 - -``` r -unlink(hh_temp_loc) -``` - -## Resources - -- [USER’S GUIDE TO NATIONAL CRIME VICTIMIZATION SURVEY (NCVS) DIRECT - VARIANCE - ESTIMATION](https://bjs.ojp.gov/sites/g/files/xyckuh236/files/media/document/ncvs_variance_user_guide_11.06.14.pdf) - -[Appendix C: Examples in - SAS](https://bjs.ojp.gov/sites/g/files/xyckuh236/files/media/document/variance_guide_appendix_c_sas.pdf) - -
- -
- -United States. Bureau of Justice Statistics. 2022. “National Crime -Victimization Survey, \[United States\], 2021.” Inter-university -Consortium for Political; Social Research \[distributor\]. -. - -
- -
diff --git a/DataCleaningScripts/RECS_2015_DataPrep.Rmd b/DataCleaningScripts/RECS_2015_DataPrep.Rmd deleted file mode 100644 index 046fe65a..00000000 --- a/DataCleaningScripts/RECS_2015_DataPrep.Rmd +++ /dev/null @@ -1,162 +0,0 @@ ---- -title: "Residential Energy Consumption Survey (RECS) 2015 Data Prep" -output: - github_document: - html_preview: false ---- - -```{r setup, include=FALSE} -knitr::opts_chunk$set(echo = TRUE) -``` - -## Data information - -All data and resources were downloaded from https://www.eia.gov/consumption/residential/data/2015/index.php?view=microdata on March 3, 2021. - -```{r loadpackageh, message=FALSE} -library(here) #easy relative paths -``` - -```{r loadpackages} -library(tidyverse) #data manipulation -library(haven) #data import -library(tidylog) #informative logging messages -library(osfr) -``` -## Import data and create derived variables - -```{r derivedata} -recs_file_osf_det <- osf_retrieve_node("https://osf.io/z5c3m/") %>% - osf_ls_files(path="RECS_2015", pattern="csv") %>% - osf_download(conflicts="overwrite", path=here("osf_dl")) - -recs_in <- read_csv(pull(recs_file_osf_det, local_path)) - -unlink(pull(recs_file_osf_det, local_path)) - -recs <- recs_in %>% - select(DOEID, REGIONC, DIVISION, METROMICRO, UATYP10, TYPEHUQ, YEARMADERANGE, HEATHOME, EQUIPMUSE, TEMPHOME, TEMPGONE, TEMPNITE, AIRCOND, USECENAC, TEMPHOMEAC, TEMPGONEAC, TEMPNITEAC, TOTCSQFT, TOTHSQFT, TOTSQFT_EN, TOTUCSQFT, TOTUSQFT, NWEIGHT, starts_with("BRRWT"), CDD30YR, CDD65, CDD80, CLIMATE_REGION_PUB, IECC_CLIMATE_PUB, HDD30YR, HDD65, HDD50, GNDHDD65, BTUEL, DOLLAREL, BTUNG, DOLLARNG, BTULP, DOLLARLP, BTUFO, DOLLARFO, TOTALBTU, TOTALDOL, BTUWOOD=WOODBTU, BTUPELLET=PELLETBTU ) %>% - mutate( - Region=parse_factor( - case_when( - REGIONC==1~"Northeast", - REGIONC==2~"Midwest", - REGIONC==3~"South", - REGIONC==4~"West", - ), levels=c("Northeast", "Midwest", "South", "West")), - Division=parse_factor( - case_when( - DIVISION==1~"New England", - DIVISION==2~"Middle 
Atlantic", - DIVISION==3~"East North Central", - DIVISION==4~"West North Central", - DIVISION==5~"South Atlantic", - DIVISION==6~"East South Central", - DIVISION==7~"West South Central", - DIVISION==8~"Mountain North", - DIVISION==9~"Mountain South", - DIVISION==10~"Pacific", - ), levels=c("New England", "Middle Atlantic", "East North Central", "West North Central", "South Atlantic", "East South Central", "West South Central", "Mountain North", "Mountain South", "Pacific")), - MSAStatus=fct_recode(METROMICRO, "Metropolitan Statistical Area"="METRO", "Micropolitan Statistical Area"="MICRO", "None"="NONE"), - Urbanicity=parse_factor( - case_when( - UATYP10=="U"~"Urban Area", - UATYP10=="C"~"Urban Cluster", - UATYP10=="R"~"Rural" - ), - levels=c("Urban Area", "Urban Cluster", "Rural") - ), - HousingUnitType=parse_factor( - case_when( - TYPEHUQ==1~"Mobile home", - TYPEHUQ==2~"Single-family detached", - TYPEHUQ==3~"Single-family attached", - TYPEHUQ==4~"Apartment: 2-4 Units", - TYPEHUQ==5~"Apartment: 5 or more units", - ), levels=c("Mobile home", "Single-family detached", "Single-family attached", "Apartment: 2-4 Units", "Apartment: 5 or more units")), - YearMade=parse_factor( - case_when( - YEARMADERANGE==1~"Before 1950", - YEARMADERANGE==2~"1950-1959", - YEARMADERANGE==3~"1960-1969", - YEARMADERANGE==4~"1970-1979", - YEARMADERANGE==5~"1980-1989", - YEARMADERANGE==6~"1990-1999", - YEARMADERANGE==7~"2000-2009", - YEARMADERANGE==8~"2010-2015", - ), - levels=c("Before 1950", "1950-1959", "1960-1969", "1970-1979", "1980-1989", "1990-1999", "2000-2009", "2010-2015"), - ordered = TRUE - ), - SpaceHeatingUsed=as.logical(HEATHOME), - HeatingBehavior=parse_factor( - case_when( - EQUIPMUSE==1~"Set one temp and leave it", - EQUIPMUSE==2~"Manually adjust at night/no one home", - EQUIPMUSE==3~"Program thermostat to change at certain times", - EQUIPMUSE==4~"Turn on or off as needed", - EQUIPMUSE==5~"No control", - EQUIPMUSE==9~"Other", - EQUIPMUSE==-9~NA_character_), - levels=c("Set 
one temp and leave it", "Manually adjust at night/no one home", "Program thermostat to change at certain times", "Turn on or off as needed", "No control", "Other") - ), - WinterTempDay=if_else(TEMPHOME>0, TEMPHOME, NA_real_), - WinterTempAway=if_else(TEMPGONE>0, TEMPGONE, NA_real_), - WinterTempNight=if_else(TEMPNITE>0, TEMPNITE, NA_real_), - ACUsed=as.logical(AIRCOND), - ACBehavior=parse_factor( - case_when( - USECENAC==1~"Set one temp and leave it", - USECENAC==2~"Manually adjust at night/no one home", - USECENAC==3~"Program thermostat to change at certain times", - USECENAC==4~"Turn on or off as needed", - USECENAC==5~"No control", - USECENAC==-9~NA_character_), - levels=c("Set one temp and leave it", "Manually adjust at night/no one home", "Program thermostat to change at certain times", "Turn on or off as needed", "No control") - ), - SummerTempDay=if_else(TEMPHOMEAC>0, TEMPHOMEAC, NA_real_), - SummerTempAway=if_else(TEMPGONEAC>0, TEMPGONEAC, NA_real_), - SummerTempNight=if_else(TEMPNITEAC>0, TEMPNITEAC, NA_real_), - ClimateRegion_BA=parse_factor(CLIMATE_REGION_PUB), - ClimateRegion_IECC=factor(IECC_CLIMATE_PUB) - - ) - -``` - - -## Check derived variables for correct coding - -```{r checkvars} -recs %>% count(Region, REGIONC) -recs %>% count(Division, DIVISION) -recs %>% count(MSAStatus, METROMICRO) -recs %>% count(Urbanicity, UATYP10) -recs %>% count(HousingUnitType, TYPEHUQ) -recs %>% count(YearMade, YEARMADERANGE) -recs %>% count(SpaceHeatingUsed, HEATHOME) -recs %>% count(HeatingBehavior, EQUIPMUSE) -recs %>% count(ACUsed, AIRCOND) -recs %>% count(ACBehavior, USECENAC) -recs %>% count(ClimateRegion_BA, CLIMATE_REGION_PUB) -recs %>% count(ClimateRegion_IECC, IECC_CLIMATE_PUB) - -``` - -## Save data - -```{r savedat} -recs_out <- recs %>% - select(DOEID, REGIONC, Region, Division, MSAStatus, Urbanicity, HousingUnitType, YearMade, SpaceHeatingUsed, HeatingBehavior, WinterTempDay, WinterTempAway, WinterTempNight, ACUsed, ACBehavior, SummerTempDay, 
SummerTempAway, SummerTempNight, TOTCSQFT, TOTHSQFT, TOTSQFT_EN, TOTUCSQFT, TOTUSQFT, NWEIGHT, starts_with("BRRWT"), CDD30YR, CDD65, CDD80, ClimateRegion_BA, ClimateRegion_IECC, HDD30YR, HDD65, HDD50, GNDHDD65, BTUEL, DOLLAREL, BTUNG, DOLLARNG, BTULP, DOLLARLP, BTUFO, DOLLARFO, TOTALBTU, TOTALDOL, BTUWOOD, BTUPELLET) - -summary(recs_out) - - -recs_der_tmp_loc <- here("osf_dl", "recs_2015.rds") -write_rds(recs_out, recs_der_tmp_loc) -target_dir <- osf_retrieve_node("https://osf.io/gzbkn/?view_only=8ca80573293b4e12b7f934a0f742b957") -osf_upload(target_dir, path=recs_der_tmp_loc, conflicts="overwrite") -unlink(recs_der_tmp_loc) - -``` - diff --git a/DataCleaningScripts/RECS_2015_DataPrep.md b/DataCleaningScripts/RECS_2015_DataPrep.md deleted file mode 100644 index 863a88b8..00000000 --- a/DataCleaningScripts/RECS_2015_DataPrep.md +++ /dev/null @@ -1,617 +0,0 @@ -Residential Energy Consumption Survey (RECS) 2015 Data Prep -================ - -## Data information - -All data and resources were downloaded from - -on March 3, 2021. 
- -``` r -library(here) #easy relative paths -``` - -``` r -library(tidyverse) #data manipulation -library(haven) #data import -library(tidylog) #informative logging messages -``` - - ## - ## Attaching package: 'tidylog' - - ## The following objects are masked from 'package:srvyr': - ## - ## anti_join, drop_na, filter, filter_all, filter_at, filter_if, group_by, group_by_all, - ## group_by_at, group_by_if, mutate, mutate_all, mutate_at, mutate_if, rename, rename_all, - ## rename_at, rename_if, rename_with, select, select_all, select_at, select_if, semi_join, - ## summarise, summarise_all, summarise_at, summarise_if, summarize, summarize_all, summarize_at, - ## summarize_if, transmute, ungroup - - ## The following objects are masked from 'package:dplyr': - ## - ## add_count, add_tally, anti_join, count, distinct, distinct_all, distinct_at, distinct_if, - ## filter, filter_all, filter_at, filter_if, full_join, group_by, group_by_all, group_by_at, - ## group_by_if, inner_join, left_join, mutate, mutate_all, mutate_at, mutate_if, relocate, - ## rename, rename_all, rename_at, rename_if, rename_with, right_join, sample_frac, sample_n, - ## select, select_all, select_at, select_if, semi_join, slice, slice_head, slice_max, slice_min, - ## slice_sample, slice_tail, summarise, summarise_all, summarise_at, summarise_if, summarize, - ## summarize_all, summarize_at, summarize_if, tally, top_frac, top_n, transmute, transmute_all, - ## transmute_at, transmute_if, ungroup - - ## The following objects are masked from 'package:tidyr': - ## - ## drop_na, fill, gather, pivot_longer, pivot_wider, replace_na, spread, uncount - - ## The following object is masked from 'package:stats': - ## - ## filter - -``` r -library(osfr) -``` - -## Import data and create derived variables - -``` r -recs_file_osf_det <- osf_retrieve_node("https://osf.io/z5c3m/") %>% - osf_ls_files(path="RECS_2015", pattern="csv") %>% - osf_download(conflicts="overwrite", path=here("osf_dl")) - -recs_in <- 
read_csv(pull(recs_file_osf_det, local_path)) -``` - - ## Rows: 5686 Columns: 759 - ## ── Column specification ─────────────────────────────────────────────────────────────────────────────────────── - ## Delimiter: "," - ## chr (4): METROMICRO, UATYP10, CLIMATE_REGION_PUB, IECC_CLIMATE_PUB - ## dbl (755): DOEID, REGIONC, DIVISION, TYPEHUQ, ZTYPEHUQ, CELLAR, ZCELLAR, BASEFIN, ZBASEFIN, ATTIC, ZATTIC, ... - ## - ## ℹ Use `spec()` to retrieve the full column specification for this data. - ## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. - -``` r -unlink(pull(recs_file_osf_det, local_path)) - -recs <- recs_in %>% - select(DOEID, REGIONC, DIVISION, METROMICRO, UATYP10, TYPEHUQ, YEARMADERANGE, HEATHOME, EQUIPMUSE, TEMPHOME, TEMPGONE, TEMPNITE, AIRCOND, USECENAC, TEMPHOMEAC, TEMPGONEAC, TEMPNITEAC, TOTCSQFT, TOTHSQFT, TOTSQFT_EN, TOTUCSQFT, TOTUSQFT, NWEIGHT, starts_with("BRRWT"), CDD30YR, CDD65, CDD80, CLIMATE_REGION_PUB, IECC_CLIMATE_PUB, HDD30YR, HDD65, HDD50, GNDHDD65, BTUEL, DOLLAREL, BTUNG, DOLLARNG, BTULP, DOLLARLP, BTUFO, DOLLARFO, TOTALBTU, TOTALDOL, BTUWOOD=WOODBTU, BTUPELLET=PELLETBTU ) %>% - mutate( - Region=parse_factor( - case_when( - REGIONC==1~"Northeast", - REGIONC==2~"Midwest", - REGIONC==3~"South", - REGIONC==4~"West", - ), levels=c("Northeast", "Midwest", "South", "West")), - Division=parse_factor( - case_when( - DIVISION==1~"New England", - DIVISION==2~"Middle Atlantic", - DIVISION==3~"East North Central", - DIVISION==4~"West North Central", - DIVISION==5~"South Atlantic", - DIVISION==6~"East South Central", - DIVISION==7~"West South Central", - DIVISION==8~"Mountain North", - DIVISION==9~"Mountain South", - DIVISION==10~"Pacific", - ), levels=c("New England", "Middle Atlantic", "East North Central", "West North Central", "South Atlantic", "East South Central", "West South Central", "Mountain North", "Mountain South", "Pacific")), - MSAStatus=fct_recode(METROMICRO, "Metropolitan Statistical Area"="METRO", 
"Micropolitan Statistical Area"="MICRO", "None"="NONE"), - Urbanicity=parse_factor( - case_when( - UATYP10=="U"~"Urban Area", - UATYP10=="C"~"Urban Cluster", - UATYP10=="R"~"Rural" - ), - levels=c("Urban Area", "Urban Cluster", "Rural") - ), - HousingUnitType=parse_factor( - case_when( - TYPEHUQ==1~"Mobile home", - TYPEHUQ==2~"Single-family detached", - TYPEHUQ==3~"Single-family attached", - TYPEHUQ==4~"Apartment: 2-4 Units", - TYPEHUQ==5~"Apartment: 5 or more units", - ), levels=c("Mobile home", "Single-family detached", "Single-family attached", "Apartment: 2-4 Units", "Apartment: 5 or more units")), - YearMade=parse_factor( - case_when( - YEARMADERANGE==1~"Before 1950", - YEARMADERANGE==2~"1950-1959", - YEARMADERANGE==3~"1960-1969", - YEARMADERANGE==4~"1970-1979", - YEARMADERANGE==5~"1980-1989", - YEARMADERANGE==6~"1990-1999", - YEARMADERANGE==7~"2000-2009", - YEARMADERANGE==8~"2010-2015", - ), - levels=c("Before 1950", "1950-1959", "1960-1969", "1970-1979", "1980-1989", "1990-1999", "2000-2009", "2010-2015"), - ordered = TRUE - ), - SpaceHeatingUsed=as.logical(HEATHOME), - HeatingBehavior=parse_factor( - case_when( - EQUIPMUSE==1~"Set one temp and leave it", - EQUIPMUSE==2~"Manually adjust at night/no one home", - EQUIPMUSE==3~"Program thermostat to change at certain times", - EQUIPMUSE==4~"Turn on or off as needed", - EQUIPMUSE==5~"No control", - EQUIPMUSE==9~"Other", - EQUIPMUSE==-9~NA_character_), - levels=c("Set one temp and leave it", "Manually adjust at night/no one home", "Program thermostat to change at certain times", "Turn on or off as needed", "No control", "Other") - ), - WinterTempDay=if_else(TEMPHOME>0, TEMPHOME, NA_real_), - WinterTempAway=if_else(TEMPGONE>0, TEMPGONE, NA_real_), - WinterTempNight=if_else(TEMPNITE>0, TEMPNITE, NA_real_), - ACUsed=as.logical(AIRCOND), - ACBehavior=parse_factor( - case_when( - USECENAC==1~"Set one temp and leave it", - USECENAC==2~"Manually adjust at night/no one home", - USECENAC==3~"Program thermostat to change 
at certain times", - USECENAC==4~"Turn on or off as needed", - USECENAC==5~"No control", - USECENAC==-9~NA_character_), - levels=c("Set one temp and leave it", "Manually adjust at night/no one home", "Program thermostat to change at certain times", "Turn on or off as needed", "No control") - ), - SummerTempDay=if_else(TEMPHOMEAC>0, TEMPHOMEAC, NA_real_), - SummerTempAway=if_else(TEMPGONEAC>0, TEMPGONEAC, NA_real_), - SummerTempNight=if_else(TEMPNITEAC>0, TEMPNITEAC, NA_real_), - ClimateRegion_BA=parse_factor(CLIMATE_REGION_PUB), - ClimateRegion_IECC=factor(IECC_CLIMATE_PUB) - - ) -``` - - ## select: renamed 2 variables (BTUWOOD, BTUPELLET) and dropped 619 variables - ## mutate: new variable 'Region' (factor) with 4 unique values and 0% NA - ## new variable 'Division' (factor) with 10 unique values and 0% NA - ## new variable 'MSAStatus' (factor) with 3 unique values and 0% NA - ## new variable 'Urbanicity' (factor) with 3 unique values and 0% NA - ## new variable 'HousingUnitType' (factor) with 5 unique values and 0% NA - ## new variable 'YearMade' (ordered factor) with 8 unique values and 0% NA - ## new variable 'SpaceHeatingUsed' (logical) with 2 unique values and 0% NA - ## new variable 'HeatingBehavior' (factor) with 7 unique values and 0% NA - ## new variable 'WinterTempDay' (double) with 35 unique values and 5% NA - ## new variable 'WinterTempAway' (double) with 37 unique values and 5% NA - ## new variable 'WinterTempNight' (double) with 38 unique values and 5% NA - ## new variable 'ACUsed' (logical) with 2 unique values and 0% NA - ## new variable 'ACBehavior' (factor) with 6 unique values and 0% NA - ## new variable 'SummerTempDay' (double) with 38 unique values and 13% NA - ## new variable 'SummerTempAway' (double) with 35 unique values and 13% NA - ## new variable 'SummerTempNight' (double) with 36 unique values and 13% NA - ## new variable 'ClimateRegion_BA' (factor) with 5 unique values and 0% NA - ## new variable 'ClimateRegion_IECC' (factor) with 11 
unique values and 0% NA - -## Check derived variables for correct coding - -``` r -recs %>% count(Region, REGIONC) -``` - - ## count: now 4 rows and 3 columns, ungrouped - - ## # A tibble: 4 × 3 - ## Region REGIONC n - ## - ## 1 Northeast 1 794 - ## 2 Midwest 2 1327 - ## 3 South 3 2010 - ## 4 West 4 1555 - -``` r -recs %>% count(Division, DIVISION) -``` - - ## count: now 10 rows and 3 columns, ungrouped - - ## # A tibble: 10 × 3 - ## Division DIVISION n - ## - ## 1 New England 1 253 - ## 2 Middle Atlantic 2 541 - ## 3 East North Central 3 836 - ## 4 West North Central 4 491 - ## 5 South Atlantic 5 1058 - ## 6 East South Central 6 372 - ## 7 West South Central 7 580 - ## 8 Mountain North 8 228 - ## 9 Mountain South 9 242 - ## 10 Pacific 10 1085 - -``` r -recs %>% count(MSAStatus, METROMICRO) -``` - - ## count: now 3 rows and 3 columns, ungrouped - - ## # A tibble: 3 × 3 - ## MSAStatus METROMICRO n - ## - ## 1 Metropolitan Statistical Area METRO 4745 - ## 2 Micropolitan Statistical Area MICRO 584 - ## 3 None NONE 357 - -``` r -recs %>% count(Urbanicity, UATYP10) -``` - - ## count: now 3 rows and 3 columns, ungrouped - - ## # A tibble: 3 × 3 - ## Urbanicity UATYP10 n - ## - ## 1 Urban Area U 3928 - ## 2 Urban Cluster C 598 - ## 3 Rural R 1160 - -``` r -recs %>% count(HousingUnitType, TYPEHUQ) -``` - - ## count: now 5 rows and 3 columns, ungrouped - - ## # A tibble: 5 × 3 - ## HousingUnitType TYPEHUQ n - ## - ## 1 Mobile home 1 286 - ## 2 Single-family detached 2 3752 - ## 3 Single-family attached 3 479 - ## 4 Apartment: 2-4 Units 4 311 - ## 5 Apartment: 5 or more units 5 858 - -``` r -recs %>% count(YearMade, YEARMADERANGE) -``` - - ## count: now 8 rows and 3 columns, ungrouped - - ## # A tibble: 8 × 3 - ## YearMade YEARMADERANGE n - ## - ## 1 Before 1950 1 858 - ## 2 1950-1959 2 544 - ## 3 1960-1969 3 565 - ## 4 1970-1979 4 928 - ## 5 1980-1989 5 874 - ## 6 1990-1999 6 786 - ## 7 2000-2009 7 901 - ## 8 2010-2015 8 230 - -``` r -recs %>% count(SpaceHeatingUsed, 
HEATHOME) -``` - - ## count: now 2 rows and 3 columns, ungrouped - - ## # A tibble: 2 × 3 - ## SpaceHeatingUsed HEATHOME n - ## - ## 1 FALSE 0 258 - ## 2 TRUE 1 5428 - -``` r -recs %>% count(HeatingBehavior, EQUIPMUSE) -``` - - ## count: now 7 rows and 3 columns, ungrouped - - ## # A tibble: 7 × 3 - ## HeatingBehavior EQUIPMUSE n - ## - ## 1 Set one temp and leave it 1 2156 - ## 2 Manually adjust at night/no one home 2 1414 - ## 3 Program thermostat to change at certain times 3 972 - ## 4 Turn on or off as needed 4 761 - ## 5 No control 5 114 - ## 6 Other 9 11 - ## 7 -2 258 - -``` r -recs %>% count(ACUsed, AIRCOND) -``` - - ## count: now 2 rows and 3 columns, ungrouped - - ## # A tibble: 2 × 3 - ## ACUsed AIRCOND n - ## - ## 1 FALSE 0 737 - ## 2 TRUE 1 4949 - -``` r -recs %>% count(ACBehavior, USECENAC) -``` - - ## count: now 6 rows and 3 columns, ungrouped - - ## # A tibble: 6 × 3 - ## ACBehavior USECENAC n - ## - ## 1 Set one temp and leave it 1 1661 - ## 2 Manually adjust at night/no one home 2 984 - ## 3 Program thermostat to change at certain times 3 727 - ## 4 Turn on or off as needed 4 438 - ## 5 No control 5 2 - ## 6 -2 1874 - -``` r -recs %>% count(ClimateRegion_BA, CLIMATE_REGION_PUB) -``` - - ## count: now 5 rows and 3 columns, ungrouped - - ## # A tibble: 5 × 3 - ## ClimateRegion_BA CLIMATE_REGION_PUB n - ## - ## 1 Hot-Dry/Mixed-Dry Hot-Dry/Mixed-Dry 750 - ## 2 Hot-Humid Hot-Humid 1036 - ## 3 Mixed-Humid Mixed-Humid 1468 - ## 4 Cold/Very Cold Cold/Very Cold 2008 - ## 5 Marine Marine 424 - -``` r -recs %>% count(ClimateRegion_IECC, IECC_CLIMATE_PUB) -``` - - ## count: now 11 rows and 3 columns, ungrouped - - ## # A tibble: 11 × 3 - ## ClimateRegion_IECC IECC_CLIMATE_PUB n - ## - ## 1 1A-2A 1A-2A 846 - ## 2 2B 2B 106 - ## 3 3A 3A 637 - ## 4 3B-4B 3B-4B 644 - ## 5 3C 3C 209 - ## 6 4A 4A 1021 - ## 7 4C 4C 215 - ## 8 5A 5A 1240 - ## 9 5B-5C 5B-5C 332 - ## 10 6A-6B 6A-6B 376 - ## 11 7A-7B-7AK-8AK 7A-7B-7AK-8AK 60 - -## Save data - -``` r -recs_out <- recs %>% 
- select(DOEID, REGIONC, Region, Division, MSAStatus, Urbanicity, HousingUnitType, YearMade, SpaceHeatingUsed, HeatingBehavior, WinterTempDay, WinterTempAway, WinterTempNight, ACUsed, ACBehavior, SummerTempDay, SummerTempAway, SummerTempNight, TOTCSQFT, TOTHSQFT, TOTSQFT_EN, TOTUCSQFT, TOTUSQFT, NWEIGHT, starts_with("BRRWT"), CDD30YR, CDD65, CDD80, ClimateRegion_BA, ClimateRegion_IECC, HDD30YR, HDD65, HDD50, GNDHDD65, BTUEL, DOLLAREL, BTUNG, DOLLARNG, BTULP, DOLLARLP, BTUFO, DOLLARFO, TOTALBTU, TOTALDOL, BTUWOOD, BTUPELLET) -``` - - ## select: dropped 17 variables (DIVISION, METROMICRO, UATYP10, TYPEHUQ, YEARMADERANGE, …) - -``` r -summary(recs_out) -``` - - ## DOEID REGIONC Region Division - ## Min. :10001 Min. :1.000 Northeast: 794 Pacific :1085 - ## 1st Qu.:11422 1st Qu.:2.000 Midwest :1327 South Atlantic :1058 - ## Median :12844 Median :3.000 South :2010 East North Central: 836 - ## Mean :12844 Mean :2.761 West :1555 West South Central: 580 - ## 3rd Qu.:14265 3rd Qu.:4.000 Middle Atlantic : 541 - ## Max. :15686 Max. :4.000 West North Central: 491 - ## (Other) :1095 - ## MSAStatus Urbanicity HousingUnitType YearMade - ## Metropolitan Statistical Area:4745 Urban Area :3928 Mobile home : 286 1970-1979 :928 - ## Micropolitan Statistical Area: 584 Urban Cluster: 598 Single-family detached :3752 2000-2009 :901 - ## None : 357 Rural :1160 Single-family attached : 479 1980-1989 :874 - ## Apartment: 2-4 Units : 311 Before 1950:858 - ## Apartment: 5 or more units: 858 1990-1999 :786 - ## 1960-1969 :565 - ## (Other) :774 - ## SpaceHeatingUsed HeatingBehavior WinterTempDay WinterTempAway - ## Mode :logical Set one temp and leave it :2156 Min. :50.00 Min. :50.00 - ## FALSE:258 Manually adjust at night/no one home :1414 1st Qu.:68.00 1st Qu.:65.00 - ## TRUE :5428 Program thermostat to change at certain times: 972 Median :70.00 Median :68.00 - ## Turn on or off as needed : 761 Mean :70.06 Mean :67.12 - ## No control : 114 3rd Qu.:72.00 3rd Qu.:70.00 - ## Other : 11 Max. 
:90.00 Max. :90.00 - ## NA : 258 NA's :258 NA's :258 - ## WinterTempNight ACUsed ACBehavior SummerTempDay - ## Min. :50.00 Mode :logical Set one temp and leave it :1661 Min. :50.00 - ## 1st Qu.:65.00 FALSE:737 Manually adjust at night/no one home : 984 1st Qu.:70.00 - ## Median :68.00 TRUE :4949 Program thermostat to change at certain times: 727 Median :72.00 - ## Mean :68.06 Turn on or off as needed : 438 Mean :72.66 - ## 3rd Qu.:70.00 No control : 2 3rd Qu.:76.00 - ## Max. :90.00 NA :1874 Max. :90.00 - ## NA's :258 NA's :737 - ## SummerTempAway SummerTempNight TOTCSQFT TOTHSQFT TOTSQFT_EN TOTUCSQFT - ## Min. :50.00 Min. :50.00 Min. : 0.0 Min. : 0 Min. : 221 Min. : 0.0 - ## 1st Qu.:71.00 1st Qu.:70.00 1st Qu.: 466.2 1st Qu.:1008 1st Qu.:1100 1st Qu.: 0.0 - ## Median :75.00 Median :72.00 Median :1218.5 Median :1559 Median :1774 Median : 400.0 - ## Mean :74.63 Mean :71.82 Mean :1454.5 Mean :1816 Mean :2081 Mean : 793.9 - ## 3rd Qu.:78.00 3rd Qu.:75.00 3rd Qu.:2094.0 3rd Qu.:2400 3rd Qu.:2766 3rd Qu.:1150.0 - ## Max. :90.00 Max. :90.00 Max. :8066.0 Max. :8066 Max. :8501 Max. :7986.0 - ## NA's :737 NA's :737 - ## TOTUSQFT NWEIGHT BRRWT1 BRRWT2 BRRWT3 BRRWT4 - ## Min. : 0.0 Min. : 1236 Min. : 1836 Min. : 685.9 Min. : 543.9 Min. : 699.7 - ## 1st Qu.: 0.0 1st Qu.: 13874 1st Qu.: 9859 1st Qu.: 9733.0 1st Qu.: 9575.3 1st Qu.: 9518.5 - ## Median : 250.0 Median : 18510 Median : 16942 Median : 16993.7 Median : 16698.7 Median : 17034.2 - ## Mean : 432.6 Mean : 20789 Mean : 20789 Mean : 20789.3 Mean : 20789.3 Mean : 20789.3 - ## 3rd Qu.: 569.8 3rd Qu.: 24840 3rd Qu.: 27219 3rd Qu.: 27825.1 3rd Qu.: 27941.8 3rd Qu.: 27931.5 - ## Max. :6660.0 Max. :139307 Max. :203902 Max. :189788.1 Max. :180155.3 Max. :159902.6 - ## - ## BRRWT5 BRRWT6 BRRWT7 BRRWT8 BRRWT9 - ## Min. : 649.3 Min. : 638.7 Min. : 564.1 Min. : 591 Min. 
: 545.2 - ## 1st Qu.: 9598.5 1st Qu.: 9501.7 1st Qu.: 9534.4 1st Qu.: 9653 1st Qu.: 9595.0 - ## Median : 16487.5 Median : 16150.6 Median : 16332.5 Median : 16802 Median : 17352.7 - ## Mean : 20789.3 Mean : 20789.3 Mean : 20789.3 Mean : 20789 Mean : 20789.3 - ## 3rd Qu.: 27856.7 3rd Qu.: 28092.8 3rd Qu.: 27992.5 3rd Qu.: 27926 3rd Qu.: 27753.7 - ## Max. :141796.4 Max. :189031.8 Max. :192311.7 Max. :195071 Max. :117167.3 - ## - ## BRRWT10 BRRWT11 BRRWT12 BRRWT13 BRRWT14 - ## Min. : 732.5 Min. : 586.1 Min. : 549.8 Min. : 668 Min. : 544.5 - ## 1st Qu.: 9077.6 1st Qu.: 9448.5 1st Qu.: 9388.2 1st Qu.: 9757 1st Qu.: 9491.8 - ## Median : 16601.9 Median : 16172.3 Median : 16167.4 Median : 16584 Median : 17028.9 - ## Mean : 20789.3 Mean : 20789.3 Mean : 20789.3 Mean : 20789 Mean : 20789.3 - ## 3rd Qu.: 28089.9 3rd Qu.: 28022.1 3rd Qu.: 28075.4 3rd Qu.: 27455 3rd Qu.: 27975.3 - ## Max. :183073.4 Max. :195408.4 Max. :197373.3 Max. :182228 Max. :173341.2 - ## - ## BRRWT15 BRRWT16 BRRWT17 BRRWT18 BRRWT19 - ## Min. : 671.4 Min. : 603.4 Min. : 563.3 Min. : 517.2 Min. : 657 - ## 1st Qu.: 9341.8 1st Qu.: 9804.6 1st Qu.: 9593.2 1st Qu.: 9839.6 1st Qu.: 9776 - ## Median : 15996.8 Median : 16562.6 Median : 16750.8 Median : 16560.5 Median : 16779 - ## Mean : 20789.3 Mean : 20789.3 Mean : 20789.3 Mean : 20789.3 Mean : 20789 - ## 3rd Qu.: 28117.5 3rd Qu.: 27322.1 3rd Qu.: 27458.0 3rd Qu.: 27636.2 3rd Qu.: 27986 - ## Max. :179152.7 Max. :210507.2 Max. :195346.9 Max. :158094.9 Max. :197236 - ## - ## BRRWT20 BRRWT21 BRRWT22 BRRWT23 BRRWT24 - ## Min. : 682.2 Min. : 689.4 Min. : 581.3 Min. : 658.4 Min. : 698.7 - ## 1st Qu.: 9569.2 1st Qu.: 9663.9 1st Qu.: 9805.3 1st Qu.: 9597.1 1st Qu.: 9387.9 - ## Median : 16881.2 Median : 16503.8 Median : 16711.4 Median : 16205.0 Median : 16398.2 - ## Mean : 20789.3 Mean : 20789.3 Mean : 20789.3 Mean : 20789.3 Mean : 20789.3 - ## 3rd Qu.: 27467.7 3rd Qu.: 27863.0 3rd Qu.: 27503.4 3rd Qu.: 27855.2 3rd Qu.: 27791.0 - ## Max. :146347.4 Max. :181583.8 Max. 
:173557.2 Max. :182366.0 Max. :170970.0 - ## - ## BRRWT25 BRRWT26 BRRWT27 BRRWT28 BRRWT29 BRRWT30 - ## Min. : 541.3 Min. : 832.9 Min. : 1372 Min. : 764.7 Min. : 854 Min. : 680.6 - ## 1st Qu.: 9502.9 1st Qu.: 9593.2 1st Qu.: 9333 1st Qu.: 9358.0 1st Qu.: 9596 1st Qu.: 9689.3 - ## Median : 17120.6 Median : 16642.2 Median : 16671 Median : 16663.4 Median : 16336 Median : 16683.8 - ## Mean : 20789.3 Mean : 20789.3 Mean : 20789 Mean : 20789.3 Mean : 20789 Mean : 20789.3 - ## 3rd Qu.: 28108.8 3rd Qu.: 28018.5 3rd Qu.: 27832 3rd Qu.: 28065.9 3rd Qu.: 27506 3rd Qu.: 27613.1 - ## Max. :128220.6 Max. :176770.0 Max. :176453 Max. :210413.6 Max. :194434 Max. :118557.6 - ## - ## BRRWT31 BRRWT32 BRRWT33 BRRWT34 BRRWT35 - ## Min. : 868.4 Min. : 645.1 Min. : 714.2 Min. : 1880 Min. : 629.3 - ## 1st Qu.: 9493.1 1st Qu.: 9370.6 1st Qu.: 9530.8 1st Qu.: 9703 1st Qu.: 9842.0 - ## Median : 16876.0 Median : 16594.5 Median : 16839.7 Median : 16380 Median : 17204.4 - ## Mean : 20789.3 Mean : 20789.3 Mean : 20789.3 Mean : 20789 Mean : 20789.3 - ## 3rd Qu.: 27807.8 3rd Qu.: 28250.9 3rd Qu.: 27610.2 3rd Qu.: 27846 3rd Qu.: 27533.4 - ## Max. :197960.8 Max. :182658.3 Max. :183414.8 Max. :130246 Max. :125674.9 - ## - ## BRRWT36 BRRWT37 BRRWT38 BRRWT39 BRRWT40 BRRWT41 - ## Min. : 980.2 Min. : 634.6 Min. : 738.1 Min. : 684.5 Min. : 1531 Min. : 1406 - ## 1st Qu.: 9439.6 1st Qu.: 9276.7 1st Qu.: 9737.9 1st Qu.: 9389.5 1st Qu.: 9624 1st Qu.: 9776 - ## Median : 16440.6 Median : 16620.9 Median : 16862.8 Median : 16797.7 Median : 16644 Median : 16910 - ## Mean : 20789.3 Mean : 20789.3 Mean : 20789.3 Mean : 20789.3 Mean : 20789 Mean : 20789 - ## 3rd Qu.: 28354.2 3rd Qu.: 27754.3 3rd Qu.: 27710.0 3rd Qu.: 27850.3 3rd Qu.: 27858 3rd Qu.: 27616 - ## Max. :171375.9 Max. :209103.9 Max. :187208.7 Max. :136106.4 Max. :165612 Max. :145467 - ## - ## BRRWT42 BRRWT43 BRRWT44 BRRWT45 BRRWT46 BRRWT47 - ## Min. : 943.8 Min. : 683.3 Min. : 866.4 Min. : 1105 Min. : 750.7 Min. 
: 1230 - ## 1st Qu.: 9446.7 1st Qu.: 9563.6 1st Qu.: 9595.5 1st Qu.: 9563 1st Qu.: 9616.2 1st Qu.: 9362 - ## Median : 16177.2 Median : 16999.1 Median : 17034.6 Median : 16629 Median : 16821.6 Median : 16243 - ## Mean : 20789.3 Mean : 20789.3 Mean : 20789.3 Mean : 20789 Mean : 20789.3 Mean : 20789 - ## 3rd Qu.: 28089.3 3rd Qu.: 27724.1 3rd Qu.: 27593.8 3rd Qu.: 27773 3rd Qu.: 27563.3 3rd Qu.: 27547 - ## Max. :189726.6 Max. :192302.9 Max. :190671.5 Max. :160108 Max. :183963.8 Max. :196001 - ## - ## BRRWT48 BRRWT49 BRRWT50 BRRWT51 BRRWT52 - ## Min. : 684.4 Min. : 627.1 Min. : 1638 Min. : 922.9 Min. : 749.9 - ## 1st Qu.: 9383.9 1st Qu.: 9489.0 1st Qu.: 9601 1st Qu.: 9704.7 1st Qu.: 9496.9 - ## Median : 16720.3 Median : 17068.6 Median : 16788 Median : 16706.2 Median : 16442.9 - ## Mean : 20789.3 Mean : 20789.3 Mean : 20789 Mean : 20789.3 Mean : 20789.3 - ## 3rd Qu.: 27965.8 3rd Qu.: 27829.1 3rd Qu.: 27667 3rd Qu.: 27755.8 3rd Qu.: 27621.2 - ## Max. :199079.7 Max. :203407.7 Max. :223546 Max. :161561.8 Max. :146056.0 - ## - ## BRRWT53 BRRWT54 BRRWT55 BRRWT56 BRRWT57 - ## Min. : 871.8 Min. : 687.9 Min. : 2056 Min. : 623.7 Min. : 713.4 - ## 1st Qu.: 9489.1 1st Qu.: 9623.3 1st Qu.: 9595 1st Qu.: 9798.4 1st Qu.: 9393.8 - ## Median : 16494.9 Median : 16662.9 Median : 16589 Median : 16624.8 Median : 17198.4 - ## Mean : 20789.3 Mean : 20789.3 Mean : 20789 Mean : 20789.3 Mean : 20789.3 - ## 3rd Qu.: 28075.0 3rd Qu.: 27612.8 3rd Qu.: 27857 3rd Qu.: 27650.0 3rd Qu.: 27964.1 - ## Max. :143796.6 Max. :174657.5 Max. :206797 Max. :226169.8 Max. :162193.6 - ## - ## BRRWT58 BRRWT59 BRRWT60 BRRWT61 BRRWT62 - ## Min. : 905.5 Min. : 630.7 Min. : 1275 Min. : 546.4 Min. 
: 739.7 - ## 1st Qu.: 9559.2 1st Qu.: 9623.7 1st Qu.: 9577 1st Qu.: 9387.4 1st Qu.: 9643.5 - ## Median : 16540.0 Median : 16656.6 Median : 16197 Median : 16376.3 Median : 17067.2 - ## Mean : 20789.3 Mean : 20789.3 Mean : 20789 Mean : 20789.3 Mean : 20789.3 - ## 3rd Qu.: 27780.9 3rd Qu.: 27577.8 3rd Qu.: 27781 3rd Qu.: 28016.5 3rd Qu.: 27540.6 - ## Max. :211170.6 Max. :206702.7 Max. :169387 Max. :122260.9 Max. :158200.9 - ## - ## BRRWT63 BRRWT64 BRRWT65 BRRWT66 BRRWT67 BRRWT68 - ## Min. : 671.5 Min. : 926.4 Min. : 1144 Min. : 1264 Min. : 684.8 Min. : 1053 - ## 1st Qu.: 9455.3 1st Qu.: 9400.5 1st Qu.: 9597 1st Qu.: 9758 1st Qu.: 9588.0 1st Qu.: 9245 - ## Median : 16632.1 Median : 16508.1 Median : 16442 Median : 16565 Median : 16560.8 Median : 16464 - ## Mean : 20789.3 Mean : 20789.3 Mean : 20789 Mean : 20789 Mean : 20789.3 Mean : 20789 - ## 3rd Qu.: 28020.8 3rd Qu.: 27693.9 3rd Qu.: 27348 3rd Qu.: 27884 3rd Qu.: 27838.7 3rd Qu.: 28108 - ## Max. :196933.9 Max. :217490.7 Max. :239712 Max. :157193 Max. :179204.9 Max. :183266 - ## - ## BRRWT69 BRRWT70 BRRWT71 BRRWT72 BRRWT73 BRRWT74 - ## Min. : 1676 Min. : 758.4 Min. : 892.2 Min. : 695.5 Min. : 875 Min. : 541.6 - ## 1st Qu.: 9371 1st Qu.: 9622.5 1st Qu.: 9451.9 1st Qu.: 9516.0 1st Qu.: 9734 1st Qu.: 9503.9 - ## Median : 16682 Median : 16676.4 Median : 16482.8 Median : 16717.8 Median : 16930 Median : 16128.6 - ## Mean : 20789 Mean : 20789.3 Mean : 20789.3 Mean : 20789.3 Mean : 20789 Mean : 20789.3 - ## 3rd Qu.: 27957 3rd Qu.: 27897.7 3rd Qu.: 27882.7 3rd Qu.: 27611.7 3rd Qu.: 27756 3rd Qu.: 27849.9 - ## Max. :193274 Max. :146583.8 Max. :126528.3 Max. :196704.6 Max. :184412 Max. :125833.8 - ## - ## BRRWT75 BRRWT76 BRRWT77 BRRWT78 BRRWT79 - ## Min. : 669.7 Min. : 617 Min. : 560.5 Min. : 526.7 Min. 
: 651.1 - ## 1st Qu.: 9835.9 1st Qu.: 9385 1st Qu.: 9673.8 1st Qu.: 9744.1 1st Qu.: 9549.7 - ## Median : 16921.5 Median : 17000 Median : 16713.6 Median : 17098.9 Median : 16676.0 - ## Mean : 20789.3 Mean : 20789 Mean : 20789.3 Mean : 20789.3 Mean : 20789.3 - ## 3rd Qu.: 27352.3 3rd Qu.: 27558 3rd Qu.: 27712.8 3rd Qu.: 27459.8 3rd Qu.: 27857.9 - ## Max. :194829.8 Max. :212262 Max. :234971.4 Max. :152055.4 Max. :180157.0 - ## - ## BRRWT80 BRRWT81 BRRWT82 BRRWT83 BRRWT84 - ## Min. : 675.7 Min. : 681.2 Min. : 563.6 Min. : 656.9 Min. : 652.7 - ## 1st Qu.: 9554.4 1st Qu.: 9489.0 1st Qu.: 9216.4 1st Qu.: 9634.4 1st Qu.: 9432.5 - ## Median : 16707.8 Median : 16769.3 Median : 16121.6 Median : 16516.9 Median : 16454.8 - ## Mean : 20789.3 Mean : 20789.3 Mean : 20789.3 Mean : 20789.3 Mean : 20789.3 - ## 3rd Qu.: 27688.3 3rd Qu.: 27901.5 3rd Qu.: 28253.1 3rd Qu.: 27725.8 3rd Qu.: 28006.4 - ## Max. :165661.6 Max. :191740.1 Max. :171004.8 Max. :184719.0 Max. :191550.3 - ## - ## BRRWT85 BRRWT86 BRRWT87 BRRWT88 BRRWT89 - ## Min. : 675.4 Min. : 680.3 Min. : 551.7 Min. : 704.2 Min. : 644.9 - ## 1st Qu.: 9551.2 1st Qu.: 9619.8 1st Qu.: 9436.6 1st Qu.: 9393.1 1st Qu.: 9643.2 - ## Median : 16902.2 Median : 16772.0 Median : 16799.0 Median : 16778.6 Median : 16586.1 - ## Mean : 20789.3 Mean : 20789.3 Mean : 20789.3 Mean : 20789.3 Mean : 20789.3 - ## 3rd Qu.: 27325.4 3rd Qu.: 27638.1 3rd Qu.: 28046.3 3rd Qu.: 27789.9 3rd Qu.: 28075.4 - ## Max. :198238.4 Max. :232065.5 Max. :179835.0 Max. :166866.1 Max. :144299.3 - ## - ## BRRWT90 BRRWT91 BRRWT92 BRRWT93 BRRWT94 - ## Min. : 649.2 Min. : 568.2 Min. : 591.9 Min. : 545.3 Min. : 716.2 - ## 1st Qu.: 9467.7 1st Qu.: 9506.3 1st Qu.: 9610.6 1st Qu.: 9688.4 1st Qu.: 9561.6 - ## Median : 16212.0 Median : 16781.5 Median : 16524.1 Median : 16258.4 Median : 17099.7 - ## Mean : 20789.3 Mean : 20789.3 Mean : 20789.3 Mean : 20789.3 Mean : 20789.3 - ## 3rd Qu.: 28020.8 3rd Qu.: 27876.1 3rd Qu.: 27915.1 3rd Qu.: 27728.8 3rd Qu.: 27853.9 - ## Max. 
:175279.5 Max. :205917.4 Max. :225638.4 Max. :117260.5 Max. :207264.3 - ## - ## BRRWT95 BRRWT96 CDD30YR CDD65 CDD80 - ## Min. : 566.4 Min. : 551.1 Min. : 0 Min. : 0 Min. : 0.0 - ## 1st Qu.: 9530.2 1st Qu.: 9533.2 1st Qu.: 712 1st Qu.: 793 1st Qu.: 10.0 - ## Median : 16577.2 Median : 16358.9 Median :1150 Median :1378 Median : 60.0 - ## Mean : 20789.3 Mean : 20789.3 Mean :1451 Mean :1719 Mean : 174.7 - ## 3rd Qu.: 27441.4 3rd Qu.: 27823.1 3rd Qu.:1880 3rd Qu.:2231 3rd Qu.: 208.0 - ## Max. :205015.8 Max. :171550.8 Max. :5792 Max. :6607 Max. :2297.0 - ## - ## ClimateRegion_BA ClimateRegion_IECC HDD30YR HDD65 HDD50 GNDHDD65 - ## Hot-Dry/Mixed-Dry: 750 5A :1240 Min. : 0 Min. : 0 Min. : 0 Min. : 0 - ## Hot-Humid :1036 4A :1021 1st Qu.: 2102 1st Qu.:1881 1st Qu.: 260 1st Qu.: 1337 - ## Mixed-Humid :1468 1A-2A : 846 Median : 4353 Median :3878 Median :1260 Median : 3704 - ## Cold/Very Cold :2008 3B-4B : 644 Mean : 4087 Mean :3708 Mean :1486 Mean : 3578 - ## Marine : 424 3A : 637 3rd Qu.: 5967 3rd Qu.:5467 3rd Qu.:2499 3rd Qu.: 5630 - ## 6A-6B : 376 Max. :12184 Max. :9843 Max. :4956 Max. :11851 - ## (Other): 922 - ## BTUEL DOLLAREL BTUNG DOLLARNG BTULP DOLLARLP - ## Min. : 201.6 Min. : 18.72 Min. : 0 Min. : 0.0 Min. : 0 Min. : 0.00 - ## 1st Qu.: 20221.3 1st Qu.: 815.12 1st Qu.: 0 1st Qu.: 0.0 1st Qu.: 0 1st Qu.: 0.00 - ## Median : 32582.4 Median :1253.02 Median : 17961 Median : 231.8 Median : 0 Median : 0.00 - ## Mean : 37630.7 Mean :1403.78 Mean : 33331 Mean : 346.8 Mean : 3192 Mean : 67.72 - ## 3rd Qu.: 49670.6 3rd Qu.:1830.83 3rd Qu.: 57126 3rd Qu.: 605.1 3rd Qu.: 0 3rd Qu.: 0.00 - ## Max. :215695.7 Max. :8121.56 Max. :306594 Max. :2789.8 Max. :220435 Max. :5121.27 - ## - ## BTUFO DOLLARFO TOTALBTU TOTALDOL BTUWOOD BTUPELLET - ## Min. : 0 Min. : 0.00 Min. : 201.6 Min. : 60.46 Min. : 0 Min. 
: 0.0 - ## 1st Qu.: 0 1st Qu.: 0.00 1st Qu.: 42655.8 1st Qu.: 1175.49 1st Qu.: 0 1st Qu.: 0.0 - ## Median : 0 Median : 0.00 Median : 68663.3 Median : 1724.60 Median : 0 Median : 0.0 - ## Mean : 3569 Mean : 64.08 Mean : 77722.9 Mean : 1882.34 Mean : 4140 Mean : 197.4 - ## 3rd Qu.: 0 3rd Qu.: 0.00 3rd Qu.:103832.9 3rd Qu.: 2385.84 3rd Qu.: 0 3rd Qu.: 0.0 - ## Max. :273608 Max. :4700.03 Max. :490187.4 Max. :10135.99 Max. :295476 Max. :115500.0 - ## - -``` r -recs_der_tmp_loc <- here("osf_dl", "recs_2015.rds") -write_rds(recs_out, recs_der_tmp_loc) -target_dir <- osf_retrieve_node("https://osf.io/gzbkn/?view_only=8ca80573293b4e12b7f934a0f742b957") -osf_upload(target_dir, path=recs_der_tmp_loc, conflicts="overwrite") -``` - - ## # A tibble: 1 × 3 - ## name id meta - ## - ## 1 recs_2015.rds 647d2c0e85df48090e7754b2 - -``` r -unlink(recs_der_tmp_loc) -``` diff --git a/DataCleaningScripts/RECS_2020_DataPrep.Rmd b/DataCleaningScripts/RECS_2020_DataPrep.Rmd index bcf9994a..e69de29b 100644 --- a/DataCleaningScripts/RECS_2020_DataPrep.Rmd +++ b/DataCleaningScripts/RECS_2020_DataPrep.Rmd @@ -1,230 +0,0 @@ ---- -title: "Residential Energy Consumption Survey (RECS) 2020 Data Prep" -output: - github_document: - html_preview: false ---- - -```{r setup, include=FALSE} -knitr::opts_chunk$set(echo = TRUE) -``` - -## Data information - -All data and resources were downloaded from https://www.eia.gov/consumption/residential/data/2020/index.php?view=microdata on September 17, 2023. 
- -```{r} -#| label: loadpackages - -library(tidyverse) #data manipulation -library(haven) #data import -library(tidylog) #informative logging messages -library(osfr) -``` - -## Import data and create derived variables - -```{r} -#| label: derivedata - -recs_file_osf_det <- osf_retrieve_node("https://osf.io/z5c3m/") %>% - osf_ls_files(path="RECS_2020", pattern="sas7bdat") %>% - filter(str_detect(name, "v5")) %>% - osf_download(conflicts="overwrite", path=here::here("osf_dl")) - -recs_in <- haven::read_sas(pull(recs_file_osf_det, local_path)) - -unlink(pull(recs_file_osf_det, local_path)) - - -# 2015 to 2020 differences -# Added states! -# Variables gone: METROMICRO, TOTUCSQFT (uncooled sq ft), TOTUSQFT (unheated sq ft), CDD80, HDD50, GNDHDD65, PELLETBTU -# HEATCNTL replaces EQUIPMUSE -# COOLCNTL replaces USECENAC -# CDD30YR_PUB replaces CDD30YR -# BA_climate replaces CLIMATE_REGION_PUB -# IECC_climate_code replaces IECC_CLIMATE_PUB -# HDD30YR_PUB replaces HDD30YR -# BTUWD replaces WOODBTU -# BRR weights are NWEIGHT - -recs <- recs_in %>% - select(DOEID, REGIONC, DIVISION, STATE_FIPS, state_postal, state_name, UATYP10, TYPEHUQ, YEARMADERANGE, HEATHOME, HEATCNTL, TEMPHOME, TEMPGONE, TEMPNITE, AIRCOND, COOLCNTL, TEMPHOMEAC, TEMPGONEAC, TEMPNITEAC, TOTCSQFT, TOTHSQFT, TOTSQFT_EN, NWEIGHT, starts_with("NWEIGHT"), CDD30YR=CDD30YR_PUB, CDD65, BA_climate, IECC_climate_code, HDD30YR=HDD30YR_PUB, HDD65, BTUEL, DOLLAREL, BTUNG, DOLLARNG, BTULP, DOLLARLP, BTUFO, DOLLARFO, TOTALBTU, TOTALDOL, BTUWOOD=BTUWD) %>% - mutate( - Region=parse_factor( - str_to_title(REGIONC), - levels=c("Northeast", "Midwest", "South", "West")), - Division=parse_factor( - DIVISION, levels=c("New England", "Middle Atlantic", "East North Central", "West North Central", "South Atlantic", "East South Central", "West South Central", "Mountain North", "Mountain South", "Pacific")), - Urbanicity=parse_factor( - case_when( - UATYP10=="U"~"Urban Area", - UATYP10=="C"~"Urban Cluster", - UATYP10=="R"~"Rural" - ), 
- levels=c("Urban Area", "Urban Cluster", "Rural") - ), - HousingUnitType=parse_factor( - case_when( - TYPEHUQ==1~"Mobile home", - TYPEHUQ==2~"Single-family detached", - TYPEHUQ==3~"Single-family attached", - TYPEHUQ==4~"Apartment: 2-4 Units", - TYPEHUQ==5~"Apartment: 5 or more units", - ), levels=c("Mobile home", "Single-family detached", "Single-family attached", "Apartment: 2-4 Units", "Apartment: 5 or more units")), - YearMade=parse_factor( - case_when( - YEARMADERANGE==1~"Before 1950", - YEARMADERANGE==2~"1950-1959", - YEARMADERANGE==3~"1960-1969", - YEARMADERANGE==4~"1970-1979", - YEARMADERANGE==5~"1980-1989", - YEARMADERANGE==6~"1990-1999", - YEARMADERANGE==7~"2000-2009", - YEARMADERANGE==8~"2010-2015", - YEARMADERANGE==9~"2016-2020" - ), - levels=c("Before 1950", "1950-1959", "1960-1969", "1970-1979", "1980-1989", "1990-1999", "2000-2009", "2010-2015", "2016-2020"), - ordered = TRUE - ), - SpaceHeatingUsed=as.logical(HEATHOME), - HeatingBehavior=parse_factor( - case_when( - HEATCNTL==1~"Set one temp and leave it", - HEATCNTL==2~"Manually adjust at night/no one home", - HEATCNTL==3~"Programmable or smart thermostat automatically adjusts the temperature", - HEATCNTL==4~"Turn on or off as needed", - HEATCNTL==5~"No control", - HEATCNTL==99~"Other", - HEATCNTL==-2~NA_character_), - levels=c("Set one temp and leave it", "Manually adjust at night/no one home", "Programmable or smart thermostat automatically adjusts the temperature", "Turn on or off as needed", "No control", "Other") - ), - WinterTempDay=if_else(TEMPHOME>0, TEMPHOME, NA_real_), - WinterTempAway=if_else(TEMPGONE>0, TEMPGONE, NA_real_), - WinterTempNight=if_else(TEMPNITE>0, TEMPNITE, NA_real_), - ACUsed=as.logical(AIRCOND), - ACBehavior=parse_factor( - case_when( - COOLCNTL==1~"Set one temp and leave it", - COOLCNTL==2~"Manually adjust at night/no one home", - COOLCNTL==3~"Programmable or smart thermostat automatically adjusts the temperature", - COOLCNTL==4~"Turn on or off as needed", - 
COOLCNTL==5~"No control", - COOLCNTL==99~"Other", - COOLCNTL==-2~NA_character_), - levels=c("Set one temp and leave it", "Manually adjust at night/no one home", "Programmable or smart thermostat automatically adjusts the temperature", "Turn on or off as needed", "No control", "Other") - ), - SummerTempDay=if_else(TEMPHOMEAC>0, TEMPHOMEAC, NA_real_), - SummerTempAway=if_else(TEMPGONEAC>0, TEMPGONEAC, NA_real_), - SummerTempNight=if_else(TEMPNITEAC>0, TEMPNITEAC, NA_real_), - ClimateRegion_BA=parse_factor(BA_climate), - state_name=factor(state_name), - state_postal=fct_reorder(state_postal, as.numeric(state_name)) - ) - -``` - -## Check derived variables for correct coding - -```{r} -#| label: checkvars - - -recs %>% count(Region, REGIONC) -recs %>% count(Division, DIVISION) -recs %>% count(Urbanicity, UATYP10) -recs %>% count(HousingUnitType, TYPEHUQ) -recs %>% count(YearMade, YEARMADERANGE) -recs %>% count(SpaceHeatingUsed, HEATHOME) -recs %>% count(HeatingBehavior, HEATCNTL) -recs %>% count(ACUsed, AIRCOND) -recs %>% count(ACBehavior, COOLCNTL) -recs %>% count(ClimateRegion_BA, BA_climate) -recs %>% count(state_postal, state_name, STATE_FIPS) %>% print(n=51) -``` - - -## Save data - -```{r compare-2015} -recs_out <- recs %>% - select(DOEID, starts_with("NWEIGHT"), - REGIONC, Region, Division, starts_with("state"), Urbanicity, - HousingUnitType, YearMade, SpaceHeatingUsed, HeatingBehavior, - WinterTempDay, WinterTempAway, WinterTempNight, ACUsed, - ACBehavior, SummerTempDay, SummerTempAway, SummerTempNight, - TOTCSQFT, TOTHSQFT, TOTSQFT_EN, - CDD30YR, CDD65, ClimateRegion_BA, - HDD30YR, HDD65, BTUEL, - DOLLAREL, BTUNG, DOLLARNG, BTULP, DOLLARLP, BTUFO, DOLLARFO, - TOTALBTU, TOTALDOL, BTUWOOD) - - - - -source(here::here("helper-fun", "helper-function.R")) - -recs_2015 <- read_osf("recs_2015.rds") - -setdiff(names(recs_out), names(recs_2015)) #variables in 2020 and not 2015 -setdiff(names(recs_2015), names(recs_out)) #variables in 2015 and not 2020 - -``` - - -```{r} 
-#| label: add-question-text -for (var in colnames(recs_out)) { - attr(recs_out[[deparse(as.name(var))]], "format.sas") <- NULL -} - -cb_in <- readxl::read_xlsx(here::here("DataCleaningScripts", "RECS 2020 Codebook Questions.xlsx"), skip=1) - -cb_ord <- cb_in %>% - mutate( - Order=row_number(), - Section=if_else(Section=="End-use Model", "CONSUMPTION AND EXPENDITURE", Section), - Section=fct_reorder(Section, Order, min)) - -cb_slim <- cb_ord %>% select(Variable=BookDerived, `Description and Labels`, Question, Section, Order) %>% - filter(!is.na(Variable)) %>% - bind_rows(select(cb_ord, Variable, `Description and Labels`, Question, Section, Order)) %>% - arrange(Section, Order) - -names(recs_out)[!(names(recs_out) %in% pull(cb_slim, Variable))] - -cb_vars <- cb_slim %>% - filter(Variable %in% c(names(recs_out))) - -nrow(cb_vars) -ncol(recs_out) - -recs_ord <- recs_out %>% select(all_of(pull(cb_vars, Variable))) - -for (var in pull(cb_vars, Variable)) { - vi <- cb_vars %>% filter(Variable==var) - attr(recs_ord[[deparse(as.name(var))]], "label") <- pull(vi, `Description and Labels`) - attr(recs_ord[[deparse(as.name(var))]], "Section") <- pull(vi, Section) %>% as.character() - if (!is.na(pull(vi, Question))) attr(recs_ord[[deparse(as.name(var))]], "Question") <- pull(vi, Question) -} - - -``` - - - - - -```{r savedat} -summary(recs_ord) -str(recs_ord) - -recs_der_tmp_loc <- here::here("osf_dl", "recs_2020.rds") -write_rds(recs_ord, recs_der_tmp_loc) -target_dir <- osf_retrieve_node("https://osf.io/gzbkn/?view_only=8ca80573293b4e12b7f934a0f742b957") -osf_upload(target_dir, path=recs_der_tmp_loc, conflicts="overwrite") -unlink(recs_der_tmp_loc) - -``` - diff --git a/DataCleaningScripts/RECS_2020_DataPrep.md b/DataCleaningScripts/RECS_2020_DataPrep.md index 45229eef..e69de29b 100644 --- a/DataCleaningScripts/RECS_2020_DataPrep.md +++ b/DataCleaningScripts/RECS_2020_DataPrep.md @@ -1,856 +0,0 @@ -Residential Energy Consumption Survey (RECS) 2020 Data Prep -================ 
- -## Data information - -All data and resources were downloaded from - -on September 17, 2023. - -``` r -library(tidyverse) #data manipulation -library(haven) #data import -library(tidylog) #informative logging messages -library(osfr) -``` - -## Import data and create derived variables - -``` r -recs_file_osf_det <- osf_retrieve_node("https://osf.io/z5c3m/") %>% - osf_ls_files(path="RECS_2020", pattern="sas7bdat") %>% - filter(str_detect(name, "v5")) %>% - osf_download(conflicts="overwrite", path=here::here("osf_dl")) - -recs_in <- haven::read_sas(pull(recs_file_osf_det, local_path)) - -unlink(pull(recs_file_osf_det, local_path)) - - -# 2015 to 2020 differences -# Added states! -# Variables gone: METROMICRO, TOTUCSQFT (uncooled sq ft), TOTUSQFT (unheated sq ft), CDD80, HDD50, GNDHDD65, PELLETBTU -# HEATCNTL replaces EQUIPMUSE -# COOLCNTL replaces USECENAC -# CDD30YR_PUB replaces CDD30YR -# BA_climate replaces CLIMATE_REGION_PUB -# IECC_climate_code replaces IECC_CLIMATE_PUB -# HDD30YR_PUB replaces HDD30YR -# BTUWD replaces WOODBTU -# BRR weights are NWEIGHT - -recs <- recs_in %>% - select(DOEID, REGIONC, DIVISION, STATE_FIPS, state_postal, state_name, UATYP10, TYPEHUQ, YEARMADERANGE, HEATHOME, HEATCNTL, TEMPHOME, TEMPGONE, TEMPNITE, AIRCOND, COOLCNTL, TEMPHOMEAC, TEMPGONEAC, TEMPNITEAC, TOTCSQFT, TOTHSQFT, TOTSQFT_EN, NWEIGHT, starts_with("NWEIGHT"), CDD30YR=CDD30YR_PUB, CDD65, BA_climate, IECC_climate_code, HDD30YR=HDD30YR_PUB, HDD65, BTUEL, DOLLAREL, BTUNG, DOLLARNG, BTULP, DOLLARLP, BTUFO, DOLLARFO, TOTALBTU, TOTALDOL, BTUWOOD=BTUWD) %>% - mutate( - Region=parse_factor( - str_to_title(REGIONC), - levels=c("Northeast", "Midwest", "South", "West")), - Division=parse_factor( - DIVISION, levels=c("New England", "Middle Atlantic", "East North Central", "West North Central", "South Atlantic", "East South Central", "West South Central", "Mountain North", "Mountain South", "Pacific")), - Urbanicity=parse_factor( - case_when( - UATYP10=="U"~"Urban Area", - 
UATYP10=="C"~"Urban Cluster", - UATYP10=="R"~"Rural" - ), - levels=c("Urban Area", "Urban Cluster", "Rural") - ), - HousingUnitType=parse_factor( - case_when( - TYPEHUQ==1~"Mobile home", - TYPEHUQ==2~"Single-family detached", - TYPEHUQ==3~"Single-family attached", - TYPEHUQ==4~"Apartment: 2-4 Units", - TYPEHUQ==5~"Apartment: 5 or more units", - ), levels=c("Mobile home", "Single-family detached", "Single-family attached", "Apartment: 2-4 Units", "Apartment: 5 or more units")), - YearMade=parse_factor( - case_when( - YEARMADERANGE==1~"Before 1950", - YEARMADERANGE==2~"1950-1959", - YEARMADERANGE==3~"1960-1969", - YEARMADERANGE==4~"1970-1979", - YEARMADERANGE==5~"1980-1989", - YEARMADERANGE==6~"1990-1999", - YEARMADERANGE==7~"2000-2009", - YEARMADERANGE==8~"2010-2015", - YEARMADERANGE==9~"2016-2020" - ), - levels=c("Before 1950", "1950-1959", "1960-1969", "1970-1979", "1980-1989", "1990-1999", "2000-2009", "2010-2015", "2016-2020"), - ordered = TRUE - ), - SpaceHeatingUsed=as.logical(HEATHOME), - HeatingBehavior=parse_factor( - case_when( - HEATCNTL==1~"Set one temp and leave it", - HEATCNTL==2~"Manually adjust at night/no one home", - HEATCNTL==3~"Programmable or smart thermostat automatically adjusts the temperature", - HEATCNTL==4~"Turn on or off as needed", - HEATCNTL==5~"No control", - HEATCNTL==99~"Other", - HEATCNTL==-2~NA_character_), - levels=c("Set one temp and leave it", "Manually adjust at night/no one home", "Programmable or smart thermostat automatically adjusts the temperature", "Turn on or off as needed", "No control", "Other") - ), - WinterTempDay=if_else(TEMPHOME>0, TEMPHOME, NA_real_), - WinterTempAway=if_else(TEMPGONE>0, TEMPGONE, NA_real_), - WinterTempNight=if_else(TEMPNITE>0, TEMPNITE, NA_real_), - ACUsed=as.logical(AIRCOND), - ACBehavior=parse_factor( - case_when( - COOLCNTL==1~"Set one temp and leave it", - COOLCNTL==2~"Manually adjust at night/no one home", - COOLCNTL==3~"Programmable or smart thermostat automatically adjusts the 
temperature", - COOLCNTL==4~"Turn on or off as needed", - COOLCNTL==5~"No control", - COOLCNTL==99~"Other", - COOLCNTL==-2~NA_character_), - levels=c("Set one temp and leave it", "Manually adjust at night/no one home", "Programmable or smart thermostat automatically adjusts the temperature", "Turn on or off as needed", "No control", "Other") - ), - SummerTempDay=if_else(TEMPHOMEAC>0, TEMPHOMEAC, NA_real_), - SummerTempAway=if_else(TEMPGONEAC>0, TEMPGONEAC, NA_real_), - SummerTempNight=if_else(TEMPNITEAC>0, TEMPNITEAC, NA_real_), - ClimateRegion_BA=parse_factor(BA_climate), - state_name=factor(state_name), - state_postal=fct_reorder(state_postal, as.numeric(state_name)) - ) -``` - -## Check derived variables for correct coding - -``` r -recs %>% count(Region, REGIONC) -``` - - ## # A tibble: 4 × 3 - ## Region REGIONC n - ## - ## 1 Northeast NORTHEAST 3657 - ## 2 Midwest MIDWEST 3832 - ## 3 South SOUTH 6426 - ## 4 West WEST 4581 - -``` r -recs %>% count(Division, DIVISION) -``` - - ## # A tibble: 10 × 3 - ## Division DIVISION n - ## - ## 1 New England New England 1680 - ## 2 Middle Atlantic Middle Atlantic 1977 - ## 3 East North Central East North Central 2014 - ## 4 West North Central West North Central 1818 - ## 5 South Atlantic South Atlantic 3256 - ## 6 East South Central East South Central 1343 - ## 7 West South Central West South Central 1827 - ## 8 Mountain North Mountain North 1180 - ## 9 Mountain South Mountain South 904 - ## 10 Pacific Pacific 2497 - -``` r -recs %>% count(Urbanicity, UATYP10) -``` - - ## # A tibble: 3 × 3 - ## Urbanicity UATYP10 n - ## - ## 1 Urban Area U 12395 - ## 2 Urban Cluster C 2020 - ## 3 Rural R 4081 - -``` r -recs %>% count(HousingUnitType, TYPEHUQ) -``` - - ## # A tibble: 5 × 3 - ## HousingUnitType TYPEHUQ n - ## - ## 1 Mobile home 1 974 - ## 2 Single-family detached 2 12319 - ## 3 Single-family attached 3 1751 - ## 4 Apartment: 2-4 Units 4 1013 - ## 5 Apartment: 5 or more units 5 2439 - -``` r -recs %>% count(YearMade, 
YEARMADERANGE) -``` - - ## # A tibble: 9 × 3 - ## YearMade YEARMADERANGE n - ## - ## 1 Before 1950 1 2721 - ## 2 1950-1959 2 1685 - ## 3 1960-1969 3 1867 - ## 4 1970-1979 4 2817 - ## 5 1980-1989 5 2435 - ## 6 1990-1999 6 2451 - ## 7 2000-2009 7 2748 - ## 8 2010-2015 8 989 - ## 9 2016-2020 9 783 - -``` r -recs %>% count(SpaceHeatingUsed, HEATHOME) -``` - - ## # A tibble: 2 × 3 - ## SpaceHeatingUsed HEATHOME n - ## - ## 1 FALSE 0 751 - ## 2 TRUE 1 17745 - -``` r -recs %>% count(HeatingBehavior, HEATCNTL) -``` - - ## # A tibble: 7 × 3 - ## HeatingBehavior HEATCNTL n - ## - ## 1 Set one temp and leave it 1 7806 - ## 2 Manually adjust at night/no one home 2 4654 - ## 3 Programmable or smart thermostat automatically adjusts the temperature 3 3310 - ## 4 Turn on or off as needed 4 1491 - ## 5 No control 5 438 - ## 6 Other 99 46 - ## 7 -2 751 - -``` r -recs %>% count(ACUsed, AIRCOND) -``` - - ## # A tibble: 2 × 3 - ## ACUsed AIRCOND n - ## - ## 1 FALSE 0 2325 - ## 2 TRUE 1 16171 - -``` r -recs %>% count(ACBehavior, COOLCNTL) -``` - - ## # A tibble: 7 × 3 - ## ACBehavior COOLCNTL n - ## - ## 1 Set one temp and leave it 1 6738 - ## 2 Manually adjust at night/no one home 2 3637 - ## 3 Programmable or smart thermostat automatically adjusts the temperature 3 2638 - ## 4 Turn on or off as needed 4 2746 - ## 5 No control 5 409 - ## 6 Other 99 3 - ## 7 -2 2325 - -``` r -recs %>% count(ClimateRegion_BA, BA_climate) -``` - - ## # A tibble: 8 × 3 - ## ClimateRegion_BA BA_climate n - ## - ## 1 Mixed-Dry Mixed-Dry 142 - ## 2 Mixed-Humid Mixed-Humid 5579 - ## 3 Hot-Humid Hot-Humid 2545 - ## 4 Hot-Dry Hot-Dry 1577 - ## 5 Very-Cold Very-Cold 572 - ## 6 Cold Cold 7116 - ## 7 Marine Marine 911 - ## 8 Subarctic Subarctic 54 - -``` r -recs %>% count(state_postal, state_name, STATE_FIPS) %>% print(n=51) -``` - - ## # A tibble: 51 × 4 - ## state_postal state_name STATE_FIPS n - ## - ## 1 AL Alabama 01 242 - ## 2 AK Alaska 02 311 - ## 3 AZ Arizona 04 495 - ## 4 AR Arkansas 05 268 - ## 5 CA 
California 06 1152 - ## 6 CO Colorado 08 360 - ## 7 CT Connecticut 09 294 - ## 8 DE Delaware 10 143 - ## 9 DC District of Columbia 11 221 - ## 10 FL Florida 12 655 - ## 11 GA Georgia 13 417 - ## 12 HI Hawaii 15 282 - ## 13 ID Idaho 16 270 - ## 14 IL Illinois 17 530 - ## 15 IN Indiana 18 400 - ## 16 IA Iowa 19 286 - ## 17 KS Kansas 20 208 - ## 18 KY Kentucky 21 428 - ## 19 LA Louisiana 22 311 - ## 20 ME Maine 23 223 - ## 21 MD Maryland 24 359 - ## 22 MA Massachusetts 25 552 - ## 23 MI Michigan 26 388 - ## 24 MN Minnesota 27 325 - ## 25 MS Mississippi 28 168 - ## 26 MO Missouri 29 296 - ## 27 MT Montana 30 172 - ## 28 NE Nebraska 31 189 - ## 29 NV Nevada 32 231 - ## 30 NH New Hampshire 33 175 - ## 31 NJ New Jersey 34 456 - ## 32 NM New Mexico 35 178 - ## 33 NY New York 36 904 - ## 34 NC North Carolina 37 479 - ## 35 ND North Dakota 38 331 - ## 36 OH Ohio 39 339 - ## 37 OK Oklahoma 40 232 - ## 38 OR Oregon 41 313 - ## 39 PA Pennsylvania 42 617 - ## 40 RI Rhode Island 44 191 - ## 41 SC South Carolina 45 334 - ## 42 SD South Dakota 46 183 - ## 43 TN Tennessee 47 505 - ## 44 TX Texas 48 1016 - ## 45 UT Utah 49 188 - ## 46 VT Vermont 50 245 - ## 47 VA Virginia 51 451 - ## 48 WA Washington 53 439 - ## 49 WV West Virginia 54 197 - ## 50 WI Wisconsin 55 357 - ## 51 WY Wyoming 56 190 - -## Save data - -``` r -recs_out <- recs %>% - select(DOEID, starts_with("NWEIGHT"), - REGIONC, Region, Division, starts_with("state"), Urbanicity, - HousingUnitType, YearMade, SpaceHeatingUsed, HeatingBehavior, - WinterTempDay, WinterTempAway, WinterTempNight, ACUsed, - ACBehavior, SummerTempDay, SummerTempAway, SummerTempNight, - TOTCSQFT, TOTHSQFT, TOTSQFT_EN, - CDD30YR, CDD65, ClimateRegion_BA, - HDD30YR, HDD65, BTUEL, - DOLLAREL, BTUNG, DOLLARNG, BTULP, DOLLARLP, BTUFO, DOLLARFO, - TOTALBTU, TOTALDOL, BTUWOOD) - - - - -source(here::here("helper-fun", "helper-function.R")) - -recs_2015 <- read_osf("recs_2015.rds") - -setdiff(names(recs_out), names(recs_2015)) #variables in 2020 and not 2015 
-``` - - ## [1] "NWEIGHT1" "NWEIGHT2" "NWEIGHT3" "NWEIGHT4" "NWEIGHT5" "NWEIGHT6" "NWEIGHT7" "NWEIGHT8" "NWEIGHT9" "NWEIGHT10" - ## [11] "NWEIGHT11" "NWEIGHT12" "NWEIGHT13" "NWEIGHT14" "NWEIGHT15" "NWEIGHT16" "NWEIGHT17" "NWEIGHT18" "NWEIGHT19" "NWEIGHT20" - ## [21] "NWEIGHT21" "NWEIGHT22" "NWEIGHT23" "NWEIGHT24" "NWEIGHT25" "NWEIGHT26" "NWEIGHT27" "NWEIGHT28" "NWEIGHT29" "NWEIGHT30" - ## [31] "NWEIGHT31" "NWEIGHT32" "NWEIGHT33" "NWEIGHT34" "NWEIGHT35" "NWEIGHT36" "NWEIGHT37" "NWEIGHT38" "NWEIGHT39" "NWEIGHT40" - ## [41] "NWEIGHT41" "NWEIGHT42" "NWEIGHT43" "NWEIGHT44" "NWEIGHT45" "NWEIGHT46" "NWEIGHT47" "NWEIGHT48" "NWEIGHT49" "NWEIGHT50" - ## [51] "NWEIGHT51" "NWEIGHT52" "NWEIGHT53" "NWEIGHT54" "NWEIGHT55" "NWEIGHT56" "NWEIGHT57" "NWEIGHT58" "NWEIGHT59" "NWEIGHT60" - ## [61] "STATE_FIPS" "state_postal" "state_name" - -``` r -setdiff(names(recs_2015), names(recs_out)) #variables in 2015 and not 2020 -``` - - ## [1] "MSAStatus" "TOTUCSQFT" "TOTUSQFT" "BRRWT1" "BRRWT2" "BRRWT3" "BRRWT4" - ## [8] "BRRWT5" "BRRWT6" "BRRWT7" "BRRWT8" "BRRWT9" "BRRWT10" "BRRWT11" - ## [15] "BRRWT12" "BRRWT13" "BRRWT14" "BRRWT15" "BRRWT16" "BRRWT17" "BRRWT18" - ## [22] "BRRWT19" "BRRWT20" "BRRWT21" "BRRWT22" "BRRWT23" "BRRWT24" "BRRWT25" - ## [29] "BRRWT26" "BRRWT27" "BRRWT28" "BRRWT29" "BRRWT30" "BRRWT31" "BRRWT32" - ## [36] "BRRWT33" "BRRWT34" "BRRWT35" "BRRWT36" "BRRWT37" "BRRWT38" "BRRWT39" - ## [43] "BRRWT40" "BRRWT41" "BRRWT42" "BRRWT43" "BRRWT44" "BRRWT45" "BRRWT46" - ## [50] "BRRWT47" "BRRWT48" "BRRWT49" "BRRWT50" "BRRWT51" "BRRWT52" "BRRWT53" - ## [57] "BRRWT54" "BRRWT55" "BRRWT56" "BRRWT57" "BRRWT58" "BRRWT59" "BRRWT60" - ## [64] "BRRWT61" "BRRWT62" "BRRWT63" "BRRWT64" "BRRWT65" "BRRWT66" "BRRWT67" - ## [71] "BRRWT68" "BRRWT69" "BRRWT70" "BRRWT71" "BRRWT72" "BRRWT73" "BRRWT74" - ## [78] "BRRWT75" "BRRWT76" "BRRWT77" "BRRWT78" "BRRWT79" "BRRWT80" "BRRWT81" - ## [85] "BRRWT82" "BRRWT83" "BRRWT84" "BRRWT85" "BRRWT86" "BRRWT87" "BRRWT88" - ## [92] "BRRWT89" "BRRWT90" "BRRWT91" 
"BRRWT92" "BRRWT93" "BRRWT94" "BRRWT95" - ## [99] "BRRWT96" "CDD80" "ClimateRegion_IECC" "HDD50" "GNDHDD65" "BTUPELLET" - -``` r -for (var in colnames(recs_out)) { - attr(recs_out[[deparse(as.name(var))]], "format.sas") <- NULL -} - -cb_in <- readxl::read_xlsx(here::here("DataCleaningScripts", "RECS 2020 Codebook Questions.xlsx"), skip=1) - -cb_ord <- cb_in %>% - mutate( - Order=row_number(), - Section=if_else(Section=="End-use Model", "CONSUMPTION AND EXPENDITURE", Section), - Section=fct_reorder(Section, Order, min)) - -cb_slim <- cb_ord %>% select(Variable=BookDerived, `Description and Labels`, Question, Section, Order) %>% - filter(!is.na(Variable)) %>% - bind_rows(select(cb_ord, Variable, `Description and Labels`, Question, Section, Order)) %>% - arrange(Section, Order) - -names(recs_out)[!(names(recs_out) %in% pull(cb_slim, Variable))] -``` - - ## character(0) - -``` r -cb_vars <- cb_slim %>% - filter(Variable %in% c(names(recs_out))) - -nrow(cb_vars) -``` - - ## [1] 100 - -``` r -ncol(recs_out) -``` - - ## [1] 100 - -``` r -recs_ord <- recs_out %>% select(all_of(pull(cb_vars, Variable))) - -for (var in pull(cb_vars, Variable)) { - vi <- cb_vars %>% filter(Variable==var) - attr(recs_ord[[deparse(as.name(var))]], "label") <- pull(vi, `Description and Labels`) - attr(recs_ord[[deparse(as.name(var))]], "Section") <- pull(vi, Section) %>% as.character() - if (!is.na(pull(vi, Question))) attr(recs_ord[[deparse(as.name(var))]], "Question") <- pull(vi, Question) -} -``` - -``` r -summary(recs_ord) -``` - - ## DOEID ClimateRegion_BA Urbanicity Region REGIONC Division STATE_FIPS - ## Min. 
:100001 Cold :7116 Urban Area :12395 Northeast:3657 Length:18496 South Atlantic :3256 Length:18496 - ## 1st Qu.:104625 Mixed-Humid:5579 Urban Cluster: 2020 Midwest :3832 Class :character Pacific :2497 Class :character - ## Median :109249 Hot-Humid :2545 Rural : 4081 South :6426 Mode :character East North Central:2014 Mode :character - ## Mean :109249 Hot-Dry :1577 West :4581 Middle Atlantic :1977 - ## 3rd Qu.:113872 Marine : 911 West South Central:1827 - ## Max. :118496 Very-Cold : 572 West North Central:1818 - ## (Other) : 196 (Other) :5107 - ## state_postal state_name HDD65 CDD65 HDD30YR CDD30YR HousingUnitType YearMade - ## CA : 1152 California : 1152 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Mobile home : 974 1970-1979 :2817 - ## TX : 1016 Texas : 1016 1st Qu.: 2434 1st Qu.: 814 1st Qu.: 2898 1st Qu.: 601 Single-family detached :12319 2000-2009 :2748 - ## NY : 904 New York : 904 Median : 4396 Median :1179 Median : 4825 Median :1020 Single-family attached : 1751 Before 1950:2721 - ## FL : 655 Florida : 655 Mean : 4272 Mean :1526 Mean : 4679 Mean :1310 Apartment: 2-4 Units : 1013 1990-1999 :2451 - ## PA : 617 Pennsylvania : 617 3rd Qu.: 5810 3rd Qu.:1805 3rd Qu.: 6290 3rd Qu.:1703 Apartment: 5 or more units: 2439 1980-1989 :2435 - ## MA : 552 Massachusetts: 552 Max. :17383 Max. :5534 Max. :16071 Max. :4905 1960-1969 :1867 - ## (Other):13600 (Other) :13600 (Other) :3457 - ## TOTSQFT_EN TOTHSQFT TOTCSQFT SpaceHeatingUsed ACUsed - ## Min. : 200 Min. : 0 Min. : 0 Mode :logical Mode :logical - ## 1st Qu.: 1100 1st Qu.: 1000 1st Qu.: 460 FALSE:751 FALSE:2325 - ## Median : 1700 Median : 1520 Median : 1200 TRUE :17745 TRUE :16171 - ## Mean : 1960 Mean : 1744 Mean : 1394 - ## 3rd Qu.: 2510 3rd Qu.: 2300 3rd Qu.: 2000 - ## Max. :15000 Max. :15000 Max. :14600 - ## - ## HeatingBehavior WinterTempDay WinterTempAway WinterTempNight - ## Set one temp and leave it :7806 Min. :50.00 Min. :50.00 Min. 
:50.00 - ## Manually adjust at night/no one home :4654 1st Qu.:68.00 1st Qu.:65.00 1st Qu.:65.00 - ## Programmable or smart thermostat automatically adjusts the temperature:3310 Median :70.00 Median :68.00 Median :68.00 - ## Turn on or off as needed :1491 Mean :69.77 Mean :67.45 Mean :68.01 - ## No control : 438 3rd Qu.:72.00 3rd Qu.:70.00 3rd Qu.:70.00 - ## Other : 46 Max. :90.00 Max. :90.00 Max. :90.00 - ## NA : 751 NA's :751 NA's :751 NA's :751 - ## ACBehavior SummerTempDay SummerTempAway SummerTempNight NWEIGHT - ## Set one temp and leave it :6738 Min. :50.00 Min. :50.00 Min. :50.00 Min. : 437.9 - ## Manually adjust at night/no one home :3637 1st Qu.:70.00 1st Qu.:70.00 1st Qu.:68.00 1st Qu.: 4018.7 - ## Programmable or smart thermostat automatically adjusts the temperature:2638 Median :72.00 Median :74.00 Median :72.00 Median : 6119.4 - ## Turn on or off as needed :2746 Mean :72.01 Mean :73.45 Mean :71.22 Mean : 6678.7 - ## No control : 409 3rd Qu.:75.00 3rd Qu.:78.00 3rd Qu.:74.00 3rd Qu.: 8890.0 - ## Other : 3 Max. :90.00 Max. :90.00 Max. :90.00 Max. :29279.1 - ## NA :2325 NA's :2325 NA's :2325 NA's :2325 - ## NWEIGHT1 NWEIGHT2 NWEIGHT3 NWEIGHT4 NWEIGHT5 NWEIGHT6 NWEIGHT7 NWEIGHT8 NWEIGHT9 - ## Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 - ## 1st Qu.: 3950 1st Qu.: 3951 1st Qu.: 3954 1st Qu.: 3953 1st Qu.: 3957 1st Qu.: 3966 1st Qu.: 3944 1st Qu.: 3956 1st Qu.: 3947 - ## Median : 6136 Median : 6151 Median : 6151 Median : 6153 Median : 6134 Median : 6147 Median : 6135 Median : 6151 Median : 6139 - ## Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 - ## 3rd Qu.: 8976 3rd Qu.: 8979 3rd Qu.: 8994 3rd Qu.: 8998 3rd Qu.: 8987 3rd Qu.: 8984 3rd Qu.: 8998 3rd Qu.: 8988 3rd Qu.: 8974 - ## Max. :30015 Max. :29422 Max. :29431 Max. :29494 Max. :30039 Max. :29419 Max. :29586 Max. :29499 Max. 
:29845 - ## - ## NWEIGHT10 NWEIGHT11 NWEIGHT12 NWEIGHT13 NWEIGHT14 NWEIGHT15 NWEIGHT16 NWEIGHT17 NWEIGHT18 - ## Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 - ## 1st Qu.: 3961 1st Qu.: 3950 1st Qu.: 3947 1st Qu.: 3967 1st Qu.: 3962 1st Qu.: 3958 1st Qu.: 3958 1st Qu.: 3958 1st Qu.: 3937 - ## Median : 6163 Median : 6140 Median : 6160 Median : 6142 Median : 6154 Median : 6145 Median : 6133 Median : 6126 Median : 6155 - ## Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 - ## 3rd Qu.: 8994 3rd Qu.: 8991 3rd Qu.: 8988 3rd Qu.: 8977 3rd Qu.: 8981 3rd Qu.: 8997 3rd Qu.: 8979 3rd Qu.: 8977 3rd Qu.: 8993 - ## Max. :29635 Max. :29681 Max. :29849 Max. :29843 Max. :30184 Max. :29970 Max. :29825 Max. :30606 Max. :29689 - ## - ## NWEIGHT19 NWEIGHT20 NWEIGHT21 NWEIGHT22 NWEIGHT23 NWEIGHT24 NWEIGHT25 NWEIGHT26 NWEIGHT27 - ## Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 - ## 1st Qu.: 3947 1st Qu.: 3943 1st Qu.: 3960 1st Qu.: 3964 1st Qu.: 3943 1st Qu.: 3946 1st Qu.: 3952 1st Qu.: 3966 1st Qu.: 3942 - ## Median : 6153 Median : 6139 Median : 6135 Median : 6149 Median : 6148 Median : 6136 Median : 6150 Median : 6136 Median : 6125 - ## Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 - ## 3rd Qu.: 8979 3rd Qu.: 8992 3rd Qu.: 8956 3rd Qu.: 8988 3rd Qu.: 8980 3rd Qu.: 8978 3rd Qu.: 8972 3rd Qu.: 8980 3rd Qu.: 8996 - ## Max. :29336 Max. :30274 Max. :29766 Max. :29791 Max. :30126 Max. :29946 Max. :30445 Max. :29893 Max. :30030 - ## - ## NWEIGHT28 NWEIGHT29 NWEIGHT30 NWEIGHT31 NWEIGHT32 NWEIGHT33 NWEIGHT34 NWEIGHT35 NWEIGHT36 - ## Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. 
: 0 - ## 1st Qu.: 3956 1st Qu.: 3970 1st Qu.: 3956 1st Qu.: 3944 1st Qu.: 3954 1st Qu.: 3964 1st Qu.: 3950 1st Qu.: 3967 1st Qu.: 3948 - ## Median : 6149 Median : 6146 Median : 6149 Median : 6144 Median : 6159 Median : 6148 Median : 6139 Median : 6141 Median : 6149 - ## Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 - ## 3rd Qu.: 8989 3rd Qu.: 8979 3rd Qu.: 8991 3rd Qu.: 8994 3rd Qu.: 8982 3rd Qu.: 8993 3rd Qu.: 8985 3rd Qu.: 8990 3rd Qu.: 8979 - ## Max. :29599 Max. :30136 Max. :29895 Max. :29604 Max. :29310 Max. :29408 Max. :29564 Max. :30437 Max. :27896 - ## - ## NWEIGHT37 NWEIGHT38 NWEIGHT39 NWEIGHT40 NWEIGHT41 NWEIGHT42 NWEIGHT43 NWEIGHT44 NWEIGHT45 - ## Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 - ## 1st Qu.: 3955 1st Qu.: 3954 1st Qu.: 3940 1st Qu.: 3959 1st Qu.: 3975 1st Qu.: 3949 1st Qu.: 3947 1st Qu.: 3956 1st Qu.: 3952 - ## Median : 6133 Median : 6139 Median : 6147 Median : 6144 Median : 6153 Median : 6137 Median : 6157 Median : 6148 Median : 6149 - ## Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 - ## 3rd Qu.: 8975 3rd Qu.: 8974 3rd Qu.: 8991 3rd Qu.: 8980 3rd Qu.: 8982 3rd Qu.: 8988 3rd Qu.: 9005 3rd Qu.: 8986 3rd Qu.: 8992 - ## Max. :30596 Max. :30130 Max. :29262 Max. :30344 Max. :29594 Max. :29938 Max. :29878 Max. :29896 Max. :29729 - ## - ## NWEIGHT46 NWEIGHT47 NWEIGHT48 NWEIGHT49 NWEIGHT50 NWEIGHT51 NWEIGHT52 NWEIGHT53 NWEIGHT54 - ## Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. 
: 0 - ## 1st Qu.: 3966 1st Qu.: 3938 1st Qu.: 3953 1st Qu.: 3947 1st Qu.: 3948 1st Qu.: 3958 1st Qu.: 3938 1st Qu.: 3959 1st Qu.: 3954 - ## Median : 6152 Median : 6150 Median : 6139 Median : 6146 Median : 6159 Median : 6150 Median : 6154 Median : 6156 Median : 6151 - ## Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 - ## 3rd Qu.: 8959 3rd Qu.: 8991 3rd Qu.: 8991 3rd Qu.: 8990 3rd Qu.: 8995 3rd Qu.: 8992 3rd Qu.: 9012 3rd Qu.: 8979 3rd Qu.: 8973 - ## Max. :29103 Max. :30070 Max. :29343 Max. :29590 Max. :30027 Max. :29247 Max. :29445 Max. :30131 Max. :29439 - ## - ## NWEIGHT55 NWEIGHT56 NWEIGHT57 NWEIGHT58 NWEIGHT59 NWEIGHT60 BTUEL DOLLAREL BTUNG - ## Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 143.3 Min. : -889.5 Min. : 0 - ## 1st Qu.: 3945 1st Qu.: 3957 1st Qu.: 3942 1st Qu.: 3962 1st Qu.: 3965 1st Qu.: 3953 1st Qu.: 20205.8 1st Qu.: 836.5 1st Qu.: 0 - ## Median : 6143 Median : 6153 Median : 6138 Median : 6137 Median : 6144 Median : 6140 Median : 31890.0 Median : 1257.9 Median : 22012 - ## Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 6679 Mean : 37016.2 Mean : 1424.8 Mean : 36961 - ## 3rd Qu.: 8977 3rd Qu.: 8995 3rd Qu.: 9004 3rd Qu.: 8986 3rd Qu.: 8977 3rd Qu.: 8983 3rd Qu.: 48298.0 3rd Qu.: 1819.0 3rd Qu.: 62714 - ## Max. :29216 Max. :29203 Max. :29819 Max. :29818 Max. :29606 Max. :29818 Max. :628155.5 Max. :15680.2 Max. :1134709 - ## - ## DOLLARNG BTULP DOLLARLP BTUFO DOLLARFO BTUWOOD TOTALBTU TOTALDOL - ## Min. : 0.0 Min. : 0 Min. : 0.00 Min. : 0 Min. : 0.00 Min. : 0 Min. : 1182 Min. 
: -150.5 - ## 1st Qu.: 0.0 1st Qu.: 0 1st Qu.: 0.00 1st Qu.: 0 1st Qu.: 0.00 1st Qu.: 0 1st Qu.: 45565 1st Qu.: 1258.3 - ## Median : 313.9 Median : 0 Median : 0.00 Median : 0 Median : 0.00 Median : 0 Median : 74180 Median : 1793.2 - ## Mean : 396.0 Mean : 3917 Mean : 80.89 Mean : 5109 Mean : 88.43 Mean : 3596 Mean : 83002 Mean : 1990.2 - ## 3rd Qu.: 644.9 3rd Qu.: 0 3rd Qu.: 0.00 3rd Qu.: 0 3rd Qu.: 0.00 3rd Qu.: 0 3rd Qu.: 108535 3rd Qu.: 2472.0 - ## Max. :8155.0 Max. :364215 Max. :6621.44 Max. :426269 Max. :7003.69 Max. :500000 Max. :1367548 Max. :20043.4 - ## - -``` r -str(recs_ord) -``` - - ## tibble [18,496 × 100] (S3: tbl_df/tbl/data.frame) - ## $ DOEID : num [1:18496] 1e+05 1e+05 1e+05 1e+05 1e+05 ... - ## ..- attr(*, "label")= chr "Unique identifier for each respondent" - ## ..- attr(*, "Section")= chr "ADMIN" - ## $ ClimateRegion_BA: Factor w/ 8 levels "Mixed-Dry","Mixed-Humid",..: 1 2 1 2 2 3 2 2 2 4 ... - ## ..- attr(*, "label")= chr "Building America Climate Zone" - ## ..- attr(*, "Section")= chr "ADMIN" - ## $ Urbanicity : Factor w/ 3 levels "Urban Area","Urban Cluster",..: 1 1 1 1 1 1 1 2 1 1 ... - ## ..- attr(*, "label")= chr "2010 Census Urban Type Code" - ## ..- attr(*, "Section")= chr "ADMIN" - ## $ Region : Factor w/ 4 levels "Northeast","Midwest",..: 4 3 4 3 1 3 3 3 3 4 ... - ## ..- attr(*, "label")= chr "Census Region" - ## ..- attr(*, "Section")= chr "GEOGRAPHY" - ## $ REGIONC : chr [1:18496] "WEST" "SOUTH" "WEST" "SOUTH" ... - ## ..- attr(*, "label")= chr "Census Region" - ## ..- attr(*, "Section")= chr "GEOGRAPHY" - ## $ Division : Factor w/ 10 levels "New England",..: 9 7 9 5 2 7 7 6 5 9 ... - ## ..- attr(*, "label")= chr "Census Division, Mountain Division is divided into North and South for RECS purposes" - ## ..- attr(*, "Section")= chr "GEOGRAPHY" - ## $ STATE_FIPS : chr [1:18496] "35" "05" "35" "45" ... 
- ## ..- attr(*, "label")= chr "State Federal Information Processing System Code" - ## ..- attr(*, "Section")= chr "GEOGRAPHY" - ## $ state_postal : Factor w/ 51 levels "AL","AK","AZ",..: 32 4 32 41 31 44 37 25 9 3 ... - ## ..- attr(*, "label")= chr "State Postal Code" - ## ..- attr(*, "Section")= chr "GEOGRAPHY" - ## $ state_name : Factor w/ 51 levels "Alabama","Alaska",..: 32 4 32 41 31 44 37 25 9 3 ... - ## ..- attr(*, "label")= chr "State Name" - ## ..- attr(*, "Section")= chr "GEOGRAPHY" - ## $ HDD65 : num [1:18496] 3844 3766 3819 2614 4219 ... - ## ..- attr(*, "label")= chr "Heating degree days in 2020, base temperature 65F; Derived from the weighted temperatures of nearby weather stations" - ## ..- attr(*, "Section")= chr "WEATHER" - ## $ CDD65 : num [1:18496] 1679 1458 1696 1718 1363 ... - ## ..- attr(*, "label")= chr "Cooling degree days in 2020, base temperature 65F; Derived from the weighted temperatures of nearby weather stations" - ## ..- attr(*, "Section")= chr "WEATHER" - ## $ HDD30YR : num [1:18496] 4451 4429 4500 3229 4896 ... - ## ..- attr(*, "label")= chr "Heating degree days, 30-year average 1981-2010, base temperature 65F; Taken from nearest weather station, inocu"| __truncated__ - ## ..- attr(*, "Section")= chr "WEATHER" - ## $ CDD30YR : num [1:18496] 1027 1305 1010 1653 1059 ... - ## ..- attr(*, "label")= chr "Cooling degree days, 30-year average 1981-2010, base temperature 65F; Taken from nearest weather station, inocu"| __truncated__ - ## ..- attr(*, "Section")= chr "WEATHER" - ## $ HousingUnitType : Factor w/ 5 levels "Mobile home",..: 2 5 5 2 5 2 2 5 5 5 ... - ## ..- attr(*, "label")= chr "Type of housing unit" - ## ..- attr(*, "Section")= chr "YOUR HOME" - ## ..- attr(*, "Question")= chr "Which best describes your home?" - ## $ YearMade : Ord.factor w/ 9 levels "Before 1950"<..: 4 5 3 5 3 6 2 7 7 5 ... 
- ## ..- attr(*, "label")= chr "Range when housing unit was built" - ## ..- attr(*, "Section")= chr "YOUR HOME" - ## ..- attr(*, "Question")= chr "Derived from: In what year was your home built? AND Although you do not know the exact year your home was built"| __truncated__ - ## $ TOTSQFT_EN : num [1:18496] 2100 590 900 2100 800 4520 2100 900 750 760 ... - ## ..- attr(*, "label")= chr "Total energy-consuming area (square footage) of the housing unit. Includes all main living areas; all basements"| __truncated__ - ## ..- attr(*, "Section")= chr "YOUR HOME" - ## $ TOTHSQFT : num [1:18496] 2100 590 900 2100 800 3010 1200 900 750 760 ... - ## ..- attr(*, "label")= chr "Square footage of the housing unit that is heated by space heating equipment. A derived variable rounded to the nearest 10" - ## ..- attr(*, "Section")= chr "YOUR HOME" - ## $ TOTCSQFT : num [1:18496] 2100 590 900 2100 800 3010 1200 0 500 760 ... - ## ..- attr(*, "label")= chr "Square footage of the housing unit that is cooled by air-conditioning equipment or evaporative cooler, a derive"| __truncated__ - ## ..- attr(*, "Section")= chr "YOUR HOME" - ## $ SpaceHeatingUsed: logi [1:18496] TRUE TRUE TRUE TRUE TRUE TRUE ... - ## ..- attr(*, "label")= chr "Space heating equipment used" - ## ..- attr(*, "Section")= chr "SPACE HEATING" - ## ..- attr(*, "Question")= chr "Is your home heated during the winter?" - ## $ ACUsed : logi [1:18496] TRUE TRUE TRUE TRUE TRUE TRUE ... - ## ..- attr(*, "label")= chr "Air conditioning equipment used" - ## ..- attr(*, "Section")= chr "AIR CONDITIONING" - ## ..- attr(*, "Question")= chr "Is any air conditioning equipment used in your home?" - ## $ HeatingBehavior : Factor w/ 7 levels "Set one temp and leave it",..: 1 4 1 1 1 3 2 2 1 2 ... 
- ## ..- attr(*, "label")= chr "Winter temperature control method" - ## ..- attr(*, "Section")= chr "THERMOSTAT" - ## ..- attr(*, "Question")= chr "Which of the following best describes how your household controls the indoor temperature during the winter?" - ## $ WinterTempDay : num [1:18496] 70 70 69 68 68 76 74 70 68 70 ... - ## ..- attr(*, "label")= chr "Winter thermostat setting or temperature in home when someone is home during the day" - ## ..- attr(*, "Section")= chr "THERMOSTAT" - ## ..- attr(*, "Question")= chr "During the winter, what is your home’s typical indoor temperature when someone is home during the day?" - ## $ WinterTempAway : num [1:18496] 70 65 68 68 68 76 65 70 60 70 ... - ## ..- attr(*, "label")= chr "Winter thermostat setting or temperature in home when no one is home during the day" - ## ..- attr(*, "Section")= chr "THERMOSTAT" - ## ..- attr(*, "Question")= chr "During the winter, what is your home’s typical indoor temperature when no one is inside your home during the day?" - ## $ WinterTempNight : num [1:18496] 68 65 67 68 68 68 74 68 62 68 ... - ## ..- attr(*, "label")= chr "Winter thermostat setting or temperature in home at night" - ## ..- attr(*, "Section")= chr "THERMOSTAT" - ## ..- attr(*, "Question")= chr "During the winter, what is your home’s typical indoor temperature inside your home at night?" - ## $ ACBehavior : Factor w/ 7 levels "Set one temp and leave it",..: 1 4 1 1 2 3 2 7 2 2 ... - ## ..- attr(*, "label")= chr "Summer temperature control method" - ## ..- attr(*, "Section")= chr "THERMOSTAT" - ## ..- attr(*, "Question")= chr "Which of the following best describes how your household controls the indoor temperature during the summer?" - ## $ SummerTempDay : num [1:18496] 71 68 70 72 72 69 68 NA 72 74 ... 
- ## ..- attr(*, "label")= chr "Summer thermostat setting or temperature in home when someone is home during the day" - ## ..- attr(*, "Section")= chr "THERMOSTAT" - ## ..- attr(*, "Question")= chr "During the summer, what is your home’s typical indoor temperature when someone is home during the day?" - ## $ SummerTempAway : num [1:18496] 71 68 68 72 72 74 70 NA 76 74 ... - ## ..- attr(*, "label")= chr "Summer thermostat setting or temperature in home when no one is home during the day" - ## ..- attr(*, "Section")= chr "THERMOSTAT" - ## ..- attr(*, "Question")= chr "During the summer, what is your home’s typical indoor temperature when no one is inside your home during the day?" - ## $ SummerTempNight : num [1:18496] 71 68 68 72 72 68 70 NA 68 72 ... - ## ..- attr(*, "label")= chr "Summer thermostat setting or temperature in home at night" - ## ..- attr(*, "Section")= chr "THERMOSTAT" - ## ..- attr(*, "Question")= chr "During the summer, what is your home’s typical indoor temperature inside your home at night?" - ## $ NWEIGHT : num [1:18496] 3284 9007 5669 5294 9935 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT1 : num [1:18496] 3273 9020 5793 5361 10048 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 1" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT2 : num [1:18496] 3349 9081 5914 5362 10262 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 2" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT3 : num [1:18496] 3345 9020 5763 5371 10037 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 3" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT4 : num [1:18496] 3437 9213 5870 5393 9961 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 4" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT5 : num [1:18496] 3416 9117 5721 5328 10108 ... 
- ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 5" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT6 : num [1:18496] 3355 9179 5663 5354 10298 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 6" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT7 : num [1:18496] 3372 9096 5700 5325 10065 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 7" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT8 : num [1:18496] 3364 8920 5704 5376 10097 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 8" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT9 : num [1:18496] 3362 9189 5668 5391 10321 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 9" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT10 : num [1:18496] 3302 9060 5793 5501 9944 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 10" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT11 : num [1:18496] 3211 9127 5806 5427 10267 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 11" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT12 : num [1:18496] 3500 9264 5650 5384 10127 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 12" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT13 : num [1:18496] 3314 9222 5648 5302 10241 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 13" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT14 : num [1:18496] 3359 9199 5829 5362 9872 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 14" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT15 : num [1:18496] 3424 9143 5642 5383 10275 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 15" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT16 : num [1:18496] 3384 9042 5718 5381 9921 ... 
- ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 16" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT17 : num [1:18496] 3312 9417 5969 5418 10312 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 17" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT18 : num [1:18496] 3324 9163 5828 5356 10004 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 18" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT19 : num [1:18496] 3367 9192 5814 5343 10437 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 19" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT20 : num [1:18496] 3327 9092 5697 5360 10101 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 20" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT21 : num [1:18496] 3340 0 5687 5336 9982 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 21" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT22 : num [1:18496] 3292 9098 5739 5390 10000 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 22" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT23 : num [1:18496] 3278 9320 5945 5397 10180 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 23" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT24 : num [1:18496] 3340 9081 5820 5448 9826 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 24" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT25 : num [1:18496] 3386 9406 5823 5382 10149 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 25" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT26 : num [1:18496] 3301 9256 5650 5387 0 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 26" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT27 : num [1:18496] 3312 9318 5862 5351 10141 ... 
- ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 27" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT28 : num [1:18496] 3348 9154 5707 5371 9948 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 28" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT29 : num [1:18496] 3356 9372 5619 5362 10065 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 29" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT30 : num [1:18496] 3322 9137 5796 5381 10083 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 30" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT31 : num [1:18496] 3256 9233 5995 5320 10133 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 31" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT32 : num [1:18496] 3318 9115 0 5339 9978 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 32" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT33 : num [1:18496] 3402 9177 5638 0 10213 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 33" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT34 : num [1:18496] 3364 9191 5619 5380 9964 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 34" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT35 : num [1:18496] 3304 9100 5652 5363 10071 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 35" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT36 : num [1:18496] 3333 9072 5834 5477 9988 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 36" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT37 : num [1:18496] 3390 9263 5712 5386 10120 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 37" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT38 : num [1:18496] 3382 9078 5765 5326 10024 ... 
- ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 38" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT39 : num [1:18496] 3329 9011 5887 5421 10024 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 39" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT40 : num [1:18496] 3293 9166 5650 5370 10185 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 40" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT41 : num [1:18496] 3295 9091 5958 5339 10069 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 41" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT42 : num [1:18496] 3414 9194 5593 5329 9959 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 42" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT43 : num [1:18496] 3264 9215 6035 5409 10352 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 43" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT44 : num [1:18496] 3342 9048 5732 5416 10092 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 44" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT45 : num [1:18496] 3275 9259 5877 5453 10228 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 45" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT46 : num [1:18496] 3364 9171 5654 5449 10069 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 46" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT47 : num [1:18496] 3336 9260 5763 5376 9996 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 47" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT48 : num [1:18496] 3329 9105 5929 5408 10198 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 48" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT49 : num [1:18496] 3348 9117 5772 5400 10094 ... 
- ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 49" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT50 : num [1:18496] 3357 9261 5785 5359 10196 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 50" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT51 : num [1:18496] 3335 8955 5636 5448 10017 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 51" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT52 : num [1:18496] 3240 9000 5944 5344 9954 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 52" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT53 : num [1:18496] 3430 9290 5684 5438 10051 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 53" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT54 : num [1:18496] 3294 9199 5736 5378 10019 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 54" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT55 : num [1:18496] 3398 8959 5675 5357 10310 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 55" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT56 : num [1:18496] 3293 9233 5661 5421 10143 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 56" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT57 : num [1:18496] 0 9140 5917 5365 10177 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 57" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT58 : num [1:18496] 3370 9307 5571 5402 10043 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 58" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT59 : num [1:18496] 3358 9062 5887 5403 10248 ... - ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 59" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ NWEIGHT60 : num [1:18496] 3404 8958 5838 5351 10110 ... 
- ## ..- attr(*, "label")= chr "Final Analysis Weight for replicate 60" - ## ..- attr(*, "Section")= chr "WEIGHTS" - ## $ BTUEL : num [1:18496] 42723 17889 8147 31647 20027 ... - ## ..- attr(*, "label")= chr "Total electricity use, in thousand Btu, 2020, including self-generation of solar power" - ## ..- attr(*, "Section")= chr "CONSUMPTION AND EXPENDITURE" - ## $ DOLLAREL : num [1:18496] 1955 713 335 1425 1087 ... - ## ..- attr(*, "label")= chr "Total electricity cost, in dollars, 2020" - ## ..- attr(*, "Section")= chr "CONSUMPTION AND EXPENDITURE" - ## $ BTUNG : num [1:18496] 101924 10145 22603 55119 39100 ... - ## ..- attr(*, "label")= chr "Total natural gas use, in thousand Btu, 2020" - ## ..- attr(*, "Section")= chr "CONSUMPTION AND EXPENDITURE" - ## $ DOLLARNG : num [1:18496] 702 262 188 637 376 ... - ## ..- attr(*, "label")= chr "Total natural gas cost, in dollars, 2020" - ## ..- attr(*, "Section")= chr "CONSUMPTION AND EXPENDITURE" - ## $ BTULP : num [1:18496] 0 0 0 0 0 0 0 0 0 0 ... - ## ..- attr(*, "label")= chr "Total propane use, in thousand Btu, 2020" - ## ..- attr(*, "Section")= chr "CONSUMPTION AND EXPENDITURE" - ## $ DOLLARLP : num [1:18496] 0 0 0 0 0 0 0 0 0 0 ... - ## ..- attr(*, "label")= chr "Total propane cost, in dollars, 2020" - ## ..- attr(*, "Section")= chr "CONSUMPTION AND EXPENDITURE" - ## $ BTUFO : num [1:18496] 0 0 0 0 0 0 0 0 0 0 ... - ## ..- attr(*, "label")= chr "Total fuel oil/kerosene use, in thousand Btu, 2020" - ## ..- attr(*, "Section")= chr "CONSUMPTION AND EXPENDITURE" - ## $ DOLLARFO : num [1:18496] 0 0 0 0 0 0 0 0 0 0 ... - ## ..- attr(*, "label")= chr "Total fuel oil/kerosene cost, in dollars, 2020" - ## ..- attr(*, "Section")= chr "CONSUMPTION AND EXPENDITURE" - ## $ BTUWOOD : num [1:18496] 0 0 0 0 0 3000 0 0 0 0 ... - ## ..- attr(*, "label")= chr "Total wood use, in thousand Btu, 2020" - ## ..- attr(*, "Section")= chr "CONSUMPTION AND EXPENDITURE" - ## $ TOTALBTU : num [1:18496] 144648 28035 30750 86765 59127 ... 
- ## ..- attr(*, "label")= chr "Total usage including electricity, natural gas, propane, and fuel oil, in thousand Btu, 2020" - ## ..- attr(*, "Section")= chr "CONSUMPTION AND EXPENDITURE" - ## [list output truncated] - -``` r -recs_der_tmp_loc <- here::here("osf_dl", "recs_2020.rds") -write_rds(recs_ord, recs_der_tmp_loc) -target_dir <- osf_retrieve_node("https://osf.io/gzbkn/?view_only=8ca80573293b4e12b7f934a0f742b957") -osf_upload(target_dir, path=recs_der_tmp_loc, conflicts="overwrite") -``` - - ## # A tibble: 1 × 3 - ## name id meta - ## - ## 1 recs_2020.rds 647d2d6bbf3d0f09b6d87a32 - -``` r -unlink(recs_der_tmp_loc) -``` diff --git a/helper-fun/helper-function.R b/helper-fun/helper-function.R deleted file mode 100644 index 1bf2f2d6..00000000 --- a/helper-fun/helper-function.R +++ /dev/null @@ -1,30 +0,0 @@ -read_osf <- function(filename){ - #' Downloads file from OSF project - #' Reads in file - #' Deletes file from computer - - osf_dl_del_later <- !dir.exists("osf_dl") - - if (osf_dl_del_later) { - osf_dl_del_later <- TRUE - dir.create("osf_dl") - } - - dat_det <- - osf_retrieve_node("https://osf.io/gzbkn/?view_only=8ca80573293b4e12b7f934a0f742b957") %>% - osf_ls_files() %>% - dplyr::filter(name == filename) %>% - osf_download(conflicts = "overwrite", path = "osf_dl") - - out <- dat_det %>% - dplyr::pull(local_path) %>% - readr::read_rds() - - if (osf_dl_del_later) { - unlink("osf_dl", recursive = TRUE) - } else{ - unlink(dplyr::pull(dat_det, local_path)) - } - - return(out) -} \ No newline at end of file diff --git a/renv.lock b/renv.lock index dc399ea1..4feb4a43 100644 --- a/renv.lock +++ b/renv.lock @@ -1844,6 +1844,21 @@ ], "Hash": "c77ebba142d814788bab0092bf102f6d" }, + "srvyr.data": { + "Package": "srvyr.data", + "Version": "0.1.0", + "Source": "GitHub", + "RemoteType": "github", + "RemoteHost": "api.github.com", + "RemoteUsername": "tidy-survey-r", + "RemoteRepo": "srvyr.data", + "RemoteRef": "main", + "RemoteSha": 
"1f84dfdd630dde7fecc1a26b44543cd45674d08d", + "Requirements": [ + "R" + ], + "Hash": "5a90f75ff1373bd3a1906d6712576c14" + }, "stringi": { "Package": "stringi", "Version": "1.8.3", @@ -1984,33 +1999,6 @@ ], "Hash": "a84e2cc86d07289b3b6f5069df7a004c" }, - "tidycensus": { - "Package": "tidycensus", - "Version": "1.4.1", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "crayon", - "dplyr", - "httr", - "jsonlite", - "purrr", - "rappdirs", - "readr", - "rlang", - "rvest", - "sf", - "stringr", - "tidyr", - "tidyselect", - "tigris", - "units", - "utils", - "xml2" - ], - "Hash": "dabd8f284f9b186872cce03640ef829a" - }, "tidylog": { "Package": "tidylog", "Version": "1.0.2", @@ -2103,25 +2091,6 @@ ], "Hash": "c328568cd14ea89a83bd4ca7f54ae07e" }, - "tigris": { - "Package": "tigris", - "Version": "2.0.3", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "dplyr", - "httr", - "magrittr", - "methods", - "rappdirs", - "sf", - "stringr", - "utils", - "uuid" - ], - "Hash": "6dd14cb88733b84d2b9af9fb8f64dbc5" - }, "timechange": { "Package": "timechange", "Version": "0.2.0",