One voice (#128)

* Making data plural
* Standardize A/C format
* Standardize cross-tab format
* change final section header in chapter 9 to not be "summary" to match all other chapters.
* Removing "you" language.
* Adjusting tense "we will.." to just "we..."
* Remove markdown comments
* Changing from target population to population of interest.
* Updates to ch1 from one voice review
* Edits to ch02 from one-voice
* Ch03 edits from one-voice review
* Ch04 updates from one-voice
* Fix broken reference link in ch04.
* Ch05 edits from one-voice
* Ch06 edits from one-voice
* Ch07 edits from one-voice
* Ch08 edits from one-voice
* Ch09 edits from one-voice
* Ch10 edits from one-voice
* Ch11 edits from one-voice
* Ch12 edits from one-voice
* Ch13 edits from one-voice
* Ch14 edits from one-voice
* Appendix A edits from one-voice
* Adding blank line, to add a comment.
* Fixing reference type for Scott2007 to have author show up in bibliography.
* Adding spaces at ends of lines to add comment.
* Fixing typo in formula in ch7.
* Adding space to end of line to add a comment.
* Adding space at end of line to add comment.
* Fix ref to C10
* SZ full book review (#129)
* Change interaction example (#130)
* IV one voice review

---------

Co-authored-by: Stephanie Zimmer <[email protected]>
Co-authored-by: Isabella Velasquez <[email protected]>
tidy-survey-r · Apr 24, 2024 · 13ceea6
1 parent f7dcc4c
commit 13ceea6
Showing 20 changed files with 970 additions and 755 deletions.
diff --git a/01-introduction.Rmd b/01-introduction.Rmd
diff --git a/02-overview-surveys.Rmd b/02-overview-surveys.Rmd
diff --git a/03-survey-data-documentation.Rmd b/03-survey-data-documentation.Rmd
diff --git a/04-set-up.Rmd b/04-set-up.Rmd
diff --git a/05-descriptive-analysis.Rmd b/05-descriptive-analysis.Rmd
diff --git a/06-statistical-testing.Rmd b/06-statistical-testing.Rmd
diff --git a/07-modeling.Rmd b/07-modeling.Rmd
diff --git a/08-communicating-results.Rmd b/08-communicating-results.Rmd
diff --git a/09-reproducible-data.Rmd b/09-reproducible-data.Rmd
diff --git a/10-sample-designs-replicate-weights.Rmd b/10-sample-designs-replicate-weights.Rmd
diff --git a/11-missing-data.Rmd b/11-missing-data.Rmd
diff --git a/12-successful-survey-data-analysis.Rmd b/12-successful-survey-data-analysis.Rmd
diff --git a/13-ncvs-vignette.Rmd b/13-ncvs-vignette.Rmd
diff --git a/14-ambarom-vignette.Rmd b/14-ambarom-vignette.Rmd
diff --git a/89-Appendix-DataImport.Rmd b/89-Appendix-DataImport.Rmd
diff --git a/93-AppendixD.Rmd b/93-AppendixD.Rmd
@@ -6,7 +6,7 @@
 knitr::opts_chunk$set(tidy = 'styler')
 ```
 
-The chapter exercises use the survey design objects and packages provided in the Prerequisites box in the beginning of the chapter. Please ensure they are loaded in your environment before running the exercise solutions. Code chunks to load these are also included below.
+The chapter exercises use the survey design objects and packages provided in the Prerequisites box in the beginning of the chapter. Please ensure they are loaded in the environment before running the exercise solutions. Code chunks to load these are also included below.
 
 
 ```r
@@ -243,7 +243,7 @@ pers_des <- pers_vsum_slim %>%
     nest = TRUE
   )
 ```
-
+The chapter exercises use the survey design objects and packages provided in the Prerequisites box in the beginning of the chapter. Please ensure they are loaded in the environment before running the exercise solutions.
 
 ## 5 - Descriptive analysis {-}
 
@@ -420,7 +420,7 @@ quant_baenergyexp %>%
 
 ## 6 - Statistical testing {-}
 
-1. Using the RECS data, do more than 50% of U.S. households use AC (`ACUsed`)?
+1. Using the RECS data, do more than 50% of U.S. households use A/C (`ACUsed`)?
 
 ```{r}
 #| label: stattest-ex-solution1
@@ -472,9 +472,7 @@ ttest_solution3
 
 On average, those who voted for Joseph Biden in 2020 were `r ttest_solution3$estimate %>% round(1)` years younger than voters for other candidates and this is significantly different (p `r ttest_solution3$p.value %>% pretty_p_value()`).
 
-
-
-4. If you wanted to determine if the political party affiliation differed for males and females, what test would you use?
+4. If we wanted to determine if the political party affiliation differed for males and females, what test would we use?
 
   a. Goodness of fit test (`svygofchisq()`)
   b. Test of independence (`svychisq()`)
@@ -546,7 +544,7 @@ tidy(exp_unit_out)
 Answer: The reference level should be `r expense_by_hut %>% slice(1) %>% pull(HousingUnitType) %>% as.character()`. All p-values are very small indicating there is a significant relationship between housing unit type and total energy expenditure.
 
 
-2.  Does temperature play a role in electricity expenditure (`DOLLAREL`)? Cooling degree days are a measure of how hot a place is. CDD65 for a given day indicates the number of degrees Fahrenheit warmer than 65°F (18.3°C) it is in a location. On a day that averages 65°F and below, CDD65=0. While a day that averages 85°F (29.4°C) would have CDD65=20 because it is 20 degrees Fahrenheit warmer. For each day in the year, this is summed to give an indicator of how hot the place is throughout the year. Similarly, HDD65 indicates the days colder than 65°F^[<https://www.eia.gov/energyexplained/units-and-calculators/degree-days.php>]. Can energy expenditure be predicted using these temperature indicators along with square footage? Is there a significant relationship? Include main effects and two-way interactions.
+2.  Does temperature play a role in electricity expenditure? Cooling degree days are a measure of how hot a place is. CDD65 for a given day indicates the number of degrees Fahrenheit warmer than 65°F (18.3°C) it is in a location. On a day that averages 65°F and below, CDD65=0. While a day that averages 85°F (29.4°C) would have CDD65=20 because it is 20 degrees Fahrenheit warmer [@eia-cdd]. For each day in the year, this is summed to give an indicator of how hot the place is throughout the year. Similarly, HDD65 indicates the days colder than 65°F. Can energy expenditure be predicted using these temperature indicators along with square footage? Is there a significant relationship? Include main effects and two-way interactions.
 
 ```{r}
 #| label: model-ex-solution2
@@ -601,7 +599,7 @@ temps_sqft_exp_fit %>%
   theme_minimal()
 ```
 
-4.  Early voting expanded in 2020^[<https://www.npr.org/2020/10/26/927803214/62-million-and-counting-americans-are-breaking-early-voting-records>]. Build a logistic model predicting early voting in 2020 (`EarlyVote2020`) using age (`Age`), education (`Education`), and party identification (`PartyID`). Include two-way interactions.
+4.  Early voting expanded in 2020 [@npr-voting-trend]. Build a logistic model predicting early voting in 2020 (`EarlyVote2020`) using age (`Age`), education (`Education`), and party identification (`PartyID`). Include two-way interactions.
 
 Answer: 
 ```{r}
@@ -644,7 +642,8 @@ Answer: We predict that the 28 year old with a graduate degree who identifies as
 
 ## 10 - Specifying sample designs and replicate weights in {srvyr}  {-}
 
-1. The National Health Interview Survey (NHIS) is an annual household survey conducted by the National Center for Health Statistics (NCHS). The NHIS includes a wide variety of health topics for adults including health status and conditions, functioning and disability, health care access and health service utilization, health-related behaviors, health promotion, mental health, barriers to care, and community engagement. Like many national in-person surveys, the sampling design is a stratified clustered design with details included in the Survey Description [@nhis-svy-des]. The Survey Description provides information on setting up syntax in SUDAAN, Stata, SPSS, SAS, and R ({survey} package implementation). You have imported the data and the variable containing the data is: `nhis_adult_data`. How would you specify the design using {srvyr} using either `as_survey_design()` or `as_survey_rep()`?
+
+1. The National Health Interview Survey (NHIS) is an annual household survey conducted by the National Center for Health Statistics (NCHS). The NHIS includes a wide variety of health topics for adults including health status and conditions, functioning and disability, health care access and health service utilization, health-related behaviors, health promotion, mental health, barriers to care, and community engagement. Like many national in-person surveys, the sampling design is a stratified clustered design with details included in the Survey Description [@nhis-svy-des]. The Survey Description provides information on setting up syntax in SUDAAN, Stata, SPSS, SAS, and R ({survey} package implementation). We have imported the data and the variable containing the data as: `nhis_adult_data`. How would we specify the design using either `as_survey_design()` or `as_survey_rep()`?
 
 Answer: 
 
@@ -660,7 +659,7 @@ nhis_adult_des <- nhis_adult_data %>%
   )
 ```
 
-2. The General Social Survey is a survey that has been administered since 1972 on social, behavioral, and attitudinal topics. The 2016-2020 GSS Panel codebook provides examples of setting up syntax in SAS and Stata but not R [@gss-codebook]. You have imported the data and the variable containing the data is: `gss_data`. How would you specify the design in R using either `as_survey_design()` or `as_survey_rep()`?
+2. The General Social Survey is a survey that has been administered since 1972 on social, behavioral, and attitudinal topics. The 2016-2020 GSS Panel codebook provides examples of setting up syntax in SAS and Stata but not R [@gss-codebook]. We have imported the data and the variable containing the data as: `gss_data`. How would we specify the design in R using either `as_survey_design()` or `as_survey_rep()`?
 
 Answer: 
 
@@ -675,7 +674,7 @@ gss_des <- gss_data %>%
 
 ## 13 - National Crime Victimization Survey Vignette {-}
 
-1. What proportion of completed motor vehicle thefts are **not** reported to the police? Hint: Use the codebook to look at the definition of Type of Crime (V4529).
+1. What proportion of completed motor vehicle thefts are **not** reported to the police? Hint: Use the codebook to look at the definition of Type of Crime (V4529).
 
 ```{r}
 #| label: ncvs-vign-ex-solution1
@@ -750,8 +749,7 @@ Answer: The difference between male and female victimization rate is estimated a
 
 ## 14 - AmericasBarometer Vignette {-}
 
-1. Calculate the percentage of households with broadband internet in  and those with any internet at home, including from a phone or tablet in Latin America and the Caribbean. Hint: if you come across countries with 0% internet usage, you may want to filter by something first.
-
+1. Calculate the percentage of households with broadband internet and those with any internet at home, including from a phone or tablet in Latin America and the Caribbean. Hint: if there are countries with 0% internet usage, try filtering by something first.
 Answer: 
 
 ```{r}

diff --git a/99-references.Rmd b/99-references.Rmd
@@ -43,6 +43,10 @@ our_write_bib <- function (x = .packages(), file = "", tweak = TRUE, width = NUL
                         "\\1", cite$title)
       cite$title = gsub(pkg, paste0("{", pkg, "}"), cite$title)
       cite$title = gsub("\\b(R)\\b", "{R}", cite$title)
+      cite$title = gsub("\\b(ggplot2)\\b", "{ggplot2}", cite$title)
+      cite$title = gsub("\\b(dplyr)\\b", "{dplyr}", cite$title)
+      cite$title = gsub("\\b(tidyverse)\\b", "{tidyverse}", cite$title)
+      cite$title = gsub("\\b(sf)\\b", "{sf}", cite$title)
       cite$title = gsub(" & ", " \\\\& ", cite$title)
     }
     entry = toBibtex(cite)
@@ -58,8 +62,8 @@ our_write_bib <- function (x = .packages(), file = "", tweak = TRUE, width = NUL
     bib = lapply(bib, function(b) {
       b["author"] = sub("Duncan Temple Lang", "Duncan {Temple Lang}", 
                         b["author"])
-      b["title"] = gsub("(^|\\W)'([^']+)'(\\W|$)", "\\1\\2\\3", 
-                        b["title"])
+      # b["title"] = gsub("(^|\\W)'([^']+)'(\\W|$)", "\\1\\2\\3", 
+      #                   b["title"])
       if (!is.na(b["note"])) 
         b["note"] = gsub("(^.*?https?://.*?),\\s+https?://.*?(},\\s*)$", 
                          "\\1\\2", b["note"])

diff --git a/book.bib b/book.bib
@@ -296,7 +296,7 @@ @misc{recs-2020-meth
 }
 @misc{anes-2020-tech,
 	title        = {{Methodology Report for the ANES 2020 Time Series Study}},
-	author       = {{DeBell, Matthew and Amsbary, Michelle and Brader, Ted and Brock, Shelley and Good, Cindy and Kamens, Justin and Maisel, Natalya and Pinto, Sarah}},
+	author       = {DeBell, Matthew and Amsbary, Michelle and Brader, Ted and Brock, Shelley and Good, Cindy and Kamens, Justin and Maisel, Natalya and Pinto, Sarah},
 	year         = 2022,
 	howpublished = {\url{https://electionstudies.org/wp-content/uploads/2022/08/anes_timeseries_2020_methodology_report.pdf}}
 }
@@ -494,4 +494,56 @@ @misc{gss-codebook
 	editor       = {NORC, Chicago},
 	year         = 2021,
 	howpublished = {\url{https://gss.norc.org/Documents/codebook/2016-2020%20GSS%20Panel%20Codebook%20-%20R1a.pdf}}
-}
+}
+
+@Book{ggplot2wickham,
+  author = {Hadley Wickham},
+  title = {{ggplot2}: Elegant Graphics for Data Analysis},
+  publisher = {Springer-Verlag New York},
+  year = {2016},
+  isbn = {978-3-319-24277-4},
+  url = {https://ggplot2.tidyverse.org},
+}
+
+@Article{gtsummarysjo,
+  author = {Daniel D. Sjoberg and Karissa Whiting and Michael Curry and Jessica A. Lavery and Joseph Larmarange},
+  title = {Reproducible Summary Tables with the {gtsummary} Package},
+  journal = {{The R Journal}},
+  year = {2021},
+  url = {https://doi.org/10.32614/RJ-2021-053},
+  doi = {10.32614/RJ-2021-053},
+  volume = {13},
+  issue = {1},
+  pages = {570-580},
+}
+
+@Article{targetslandau,
+  title = {The {targets} {R} package: a dynamic {Make}-like function-oriented pipeline toolkit for reproducibility and high-performance computing},
+  author = {William Michael Landau},
+  journal = {Journal of Open Source Software},
+  year = {2021},
+  volume = {6},
+  number = {57},
+  pages = {2959},
+  url = {https://doi.org/10.21105/joss.02959},
+}
+
+@Article{jsonliteooms,
+  title = {The {jsonlite} Package: A Practical and Consistent Mapping Between JSON Data and {R} Objects},
+  author = {Jeroen Ooms},
+  journal = {arXiv:1403.2805 [stat.CO]},
+  year = {2014},
+  url = {https://arxiv.org/abs/1403.2805},
+}
+
+@Article{visdattierney,
+  title = {{visdat}: Visualising Whole Data Frames},
+  author = {Nicholas Tierney},
+  doi = {10.21105/joss.00355},
+  url = {http://dx.doi.org/10.21105/joss.00355},
+  year = {2017},
+  journal = {Journal of Open Source Software},
+  volume = {2},
+  number = {16},
+  pages = {355}
+}
diff --git a/index.Rmd b/index.Rmd
@@ -16,12 +16,9 @@ github-repo: tidy-survey-r/tidy-survey-book
 graphics: yes
 #cover-image: images/cover.jpg
 header-includes:
-   - \usepackage{draftwatermark}
    - \usepackage[titles]{tocloft}
 ---
 
-\SetWatermarkText{DRAFT}
-
 
 ```{r setup}
 #| include: false

diff --git a/renv.lock b/renv.lock
@@ -2025,18 +2025,18 @@
     },
     "srvyrexploR": {
       "Package": "srvyrexploR",
-      "Version": "0.0.0.9000",
+      "Version": "1.0.0",
       "Source": "GitHub",
       "RemoteType": "github",
-      "RemoteHost": "api.github.com",
-      "RemoteRepo": "srvyrexploR",
       "RemoteUsername": "tidy-survey-r",
+      "RemoteRepo": "srvyrexploR",
       "RemoteRef": "HEAD",
-      "RemoteSha": "914fc0fd0b7812d7d7260e15da882561602b21d2",
+      "RemoteSha": "e03f36c51c34f7d0f1a036246a15d3ed67806b4f",
+      "RemoteHost": "api.github.com",
       "Requirements": [
         "R"
       ],
-      "Hash": "3586abacf9e95b432824b9e9e60037d0"
+      "Hash": "30a1302b8eabd8d1a72228c799794665"
     },
     "stringi": {
       "Package": "stringi",