diff --git a/08-communicating-results.Rmd b/08-communicating-results.Rmd index ca0e967..3ca0cb5 100644 --- a/08-communicating-results.Rmd +++ b/08-communicating-results.Rmd @@ -27,7 +27,7 @@ library(gt) library(gtsummary) ``` -We are using data from ANES as described in Chapter \@ref(c04-getting-started). As a reminder, here is the code to create the design objects for each to use throughout this chapter. For ANES, we need to adjust the weight so it sums to the population instead of the sample (see the ANES documentation and Chapter \@ref(c04-getting-started) for more information.) +We are using data from ANES as described in Chapter \@ref(c04-getting-started). As a reminder, here is the code to create the design objects for each to use throughout this chapter. For ANES, we need to adjust the weight so it sums to the population instead of the sample (see the ANES documentation and Chapter \@ref(c04-getting-started) for more information). ```{r} #| label: results-anes-des @@ -53,8 +53,8 @@ After finishing the analysis and modeling, we proceed to the task of communicati Before beginning any dissemination of results, consider questions such as: - - How are we presenting results? Examples include a website, print, or other media. Based on the media type, we might limit or enhance the use of graphical representation. - - What is the audience's familiarity with the study and/or data? Audiences can range from the general public to data experts. If we anticipate limited knowledge about the study, we should provide detailed descriptions (we discuss recommendations later in the chapter.) + - How are we presenting results? Examples include a website, print, or other media. Based on the medium, we might limit or enhance the use of graphical representation. + - What is the audience's familiarity with the study and/or data? Audiences can range from the general public to data experts. If we anticipate limited knowledge about the study, we should provide detailed descriptions (we discuss recommendations later in the chapter). - What are we trying to communicate? It could be summary statistics, trends, patterns, or other insights. Tables may suit summary statistics, while plots are better at conveying trends and patterns. - Is the audience accustomed to interpreting plots? If not, include explanatory text to guide them on how to interpret the plots effectively. - What is the audience's statistical knowledge? If the audience does not have a strong statistics background, provide text on standard errors, confidence intervals, and other estimate types to enhance understanding. @@ -65,20 +65,20 @@ As analysts, we often emphasize the data, and communicating results can sometime ### Methodology -If we are using existing data, methodologically-sound surveys provide documentation about how the survey was fielded, the questionnaires, and other necessary information for analyses. \index{Population of interest|(}For example, the survey's methodology reports should include the population of interest, sampling procedures, response rates, questionnaire documentation, weighting, and a general overview of disclosure statements.\index{Population of interest|)} Many American organizations follow the American Association for Public Opinion Research's (AAPOR) [Transparency Initiative.](https://aapor.org/standards-and-ethics/transparency-initiative) The AAPOR Transparency Initiative requires organizations to include specific details in their methodology, making it clear how we can and should analyze and interpret the results. Being transparent about these methods is vital for the scientific rigor of the field. +If we are using existing data, methodologically sound surveys provide documentation about how the survey was fielded, the questionnaires, and other necessary information for analyses. \index{Population of interest|(}For example, the survey's methodology reports should include the population of interest, sampling procedures, response rates, questionnaire documentation, weighting, and a general overview of disclosure statements.\index{Population of interest|)} Many American organizations follow the American Association for Public Opinion Research's (AAPOR) [Transparency Initiative](https://aapor.org/standards-and-ethics/transparency-initiative). The AAPOR Transparency Initiative requires organizations to include specific details in their methodology, making it clear how we can and should analyze and interpret the results. Being transparent about these methods is vital for the scientific rigor of the field. -The details provided in Chapter \@ref(c02-overview-surveys) about the survey process should be shared with the audience when presenting the results. When using publicly-available data, like the examples in this book, we can often link to the methodology report in our final output. We should also provide high-level information for the audience to quickly grasp the context around the findings. For example, we can mention when and where the study was conducted, the population's age range, or other contextual details. This information helps the audience understand how generalizable the results are. +The details provided in Chapter \@ref(c02-overview-surveys) about the survey process should be shared with the audience when presenting the results. When using publicly available data, like the examples in this book, we can often link to the methodology report in our final output. We should also provide high-level information for the audience to quickly grasp the context around the findings. For example, we can mention when and where the study was conducted, the population's age range, or other contextual details. This information helps the audience understand how generalizable the results are. Providing this material is especially important when no methodology report is available for the analyzed data. For example, if we conducted a new survey for a specific purpose, we should document and present all the pertinent information during the analysis and reporting process. Adhering to the AAPOR Transparency Initiative guidelines is a reliable method to guarantee that all essential information is communicated to the audience. ### Analysis \index{American Community Survey (ACS)|(} -Along with the survey methodology and weight calculations, we should also share our approach to preparing, cleaning, and analyzing the data. For example, in Chapter \@ref(c06-statistical-testing), we compared education distributions from the ANES survey to the American Community Survey (ACS.) To make the comparison, we had to collapse the education categories provided in the ANES data to match the ACS. The process for this particular example may seem straightforward (like combining Bachelor's and Graduate Degrees into a single category), but there are multiple ways to deal with the data. Our choice is just one of many. We should document both the original ANES question and response options and the steps we took to match them with ACS data. This transparency helps clarify our analysis to our audience. +Along with the survey methodology and weight calculations, we should also share our approach to preparing, cleaning, and analyzing the data. For example, in Chapter \@ref(c06-statistical-testing), we compared education distributions from the ANES survey to the American Community Survey (ACS). To make the comparison, we had to collapse the education categories provided in the ANES data to match the ACS. The process for this particular example may seem straightforward (like combining bachelor's and graduate degrees into a single category), but there are multiple ways to deal with the data. Our choice is just one of many. We should document both the original ANES question and response options and the steps we took to match them with ACS data. This transparency helps clarify our analysis to our audience. \index{American Community Survey (ACS)|)} \index{Missing data|(} -Missing data is another instance where we want to be unambiguous and upfront with our audience. In this book, numerous examples and exercises remove missing data, as this is often the easiest way to handle them. However, there are circumstances where missing data holds substantive importance, and excluding them could introduce bias (see Chapter \@ref(c11-missing-data).) Being transparent about our handling of missing data is important to maintaining the integrity of our analysis and ensuring a comprehensive understanding of the results. +Missing data is another instance where we want to be unambiguous and upfront with our audience. In this book, numerous examples and exercises remove missing data, as this is often the easiest way to handle them. However, there are circumstances where missing data holds substantive importance, and excluding them could introduce bias (see Chapter \@ref(c11-missing-data)). Being transparent about our handling of missing data is important to maintaining the integrity of our analysis and ensuring a comprehensive understanding of the results. \index{Missing data|)} ### Results @@ -91,16 +91,16 @@ First, we can highlight important data elements in a sentence using plain langua This sentence provides key pieces of information in a straightforward way: - 1. **[DATE]**: Given that polling data are time-specific, providing the date of reference lets the audience know when these data were valid. - 2. **Registered U.S. voters**: This tells the audience who we surveyed, letting them know the population of interest. - 3. **XX%**: This part provides the estimated percentage of people voting for a specific candidate for a specific office. - 4. **[YEAR] general election**: As with the bullet above, adding this gives more context about the election type and year. The estimate would take on a different meaning if we changed it to a *primary* election instead of a *general* election. + 1. [DATE\]: Given that polling data are time-specific, providing the date of reference lets the audience know when these data were valid. + 2. Registered U.S. voters: This tells the audience who we surveyed, letting them know the population of interest. + 3. XX%: This part provides the estimated percentage of people voting for a specific candidate for a specific office. + 4. [YEAR] general election: Adding this gives more context about the election type and year. The estimate would take on a different meaning if we changed it to a primary election instead of a general election. We also included the word "estimated." When presenting aggregate survey results, we have errors around each estimate. We want to convey this uncertainty rather than talk in absolutes. Words like "estimated," "on average," or "around" can help communicate this uncertainty to the audience. Instead of saying "XX%," we can also say "XX% (+/- Y%)" to show the margin of error. Confidence intervals can also be incorporated into the text to assist readers. -Second, providing context and discussing the *meaning* behind a point estimate can help the audience glean some insight into why the data are important. For example, when comparing two values, it can be helpful to highlight if there are statistically significant differences and explain the impact and relevance of this information. This is where we should do our best to be mindful of biases and present the facts logically. +Second, providing context and discussing the meaning behind a point estimate can help the audience glean some insight into why the data are important. For example, when comparing two values, it can be helpful to highlight if there are statistically significant differences and explain the impact and relevance of this information. This is where we should do our best to be mindful of biases and present the facts logically. -Keep in mind how we discuss these findings can greatly influence how the audience interprets them. If we include speculation, phrases like "the authors speculate" or "these findings may indicate" relays the uncertainty around the notion while still lending a plausible solution. Additionally, we can present alternative viewpoints or competing discussion points to explain the uncertainty in the results. +Keep in mind how we discuss these findings can greatly influence how the audience interprets them. If we include speculation, phrases like "the authors speculate" or "these findings may indicate", it relays the uncertainty around the notion while still lending a plausible solution. Additionally, we can present alternative viewpoints or competing discussion points to explain the uncertainty in the results. ## Visualizing data @@ -110,7 +110,7 @@ Although discussing key findings in the text is important, presenting large amou \index{Publication-ready tables|see {gtsummary}} \index{Publication-ready tables|see {gt tables}} -Tables are a great way to provide a large amount of data when individual data points need to be examined. However, it is important to present tables in a reader-friendly format. Numbers should align, rows and columns should be easy to follow, and the table size should not compromise readability. Using key visualization techniques, we can create tables that are informative and nice to look at. \index{gt package|(}Many packages create easy-to-read tables (e.g., {kable} \+ {kableExtra}, {gt}, {gtsummary}, {DT}, {formattable}, {flextable}, {reactable}.) We appreciate the flexibility, ability to use pipes (e.g., `%>%`), and numerous extensions of the {gt} package. While we focus on {gt} here, we encourage learning about others as they may have additional helpful features.\index{gt package|)} \index{Replicate weights|(}\index{gtsummary|(} Please note, at this time, {gtsummary} needs additional features to be widely used for survey analysis, particularly due to its lack of ability to work with replicate designs.\index{Replicate weights|)} We provide one example using {gtsummary} and hope it evolves into a more comprehensive tool over time.\index{gtsummary|)} +Tables are a great way to provide a large amount of data when individual data points need to be examined. However, it is important to present tables in a reader-friendly format. Numbers should align, rows and columns should be easy to follow, and the table size should not compromise readability. Using key visualization techniques, we can create tables that are informative and nice to look at. \index{gt package|(}Many packages create easy-to-read tables (e.g., {kable} \+ {kableExtra}, {gt}, {gtsummary}, {DT}, {formattable}, {flextable}, {reactable}). We appreciate the flexibility, ability to use pipes (e.g., `%>%`), and numerous extensions of the {gt} package. While we focus on {gt} here, we encourage learning about others, as they may have additional helpful features.\index{gt package|)} \index{Replicate weights|(}\index{gtsummary|(} Please note, at this time, {gtsummary} needs additional features to be widely used for survey analysis, particularly due to its lack of ability to work with replicate designs.\index{Replicate weights|)} We provide one example using {gtsummary} and hope it evolves into a more comprehensive tool over time.\index{gtsummary|)} #### Transitioning {srvyr} output to a {gt} table {-} @@ -132,7 +132,7 @@ The default output generated by R may work for initial viewing inside our IDE or Looking at the output from `trust_gov`, a couple of improvements stand out: (1) switching to percentages instead of proportions and (2) removing the variable names as column headers. The {gt} package is a good tool for implementing better labeling and creating publishable tables. Let's walk through some code as we make a few changes to improve the table's usefulness. -First, we initiate the formatted table with the `gt()` function on the `trust_gov` tibble previously created. Next, we use the argument `rowname_col()` to designate the `TrustGovernment` column as the label for each row (called the table "stub".) We apply the `cols_label()` function to create informative column labels instead of variable names and then the `tab_spanner()` function to add a label across multiple columns. In this case, we label all columns except the stub with "Trust in Government, 2020". We then format the proportions into percentages with the `fmt_percent()` function and reduce the number of decimals shown to one with `decimals = 1`. Finally, the `tab_caption()` function adds a table title for the HTML version of the book. We can use the caption for cross-referencing in R Markdown, Quarto, and bookdown, as well as adding it to the list of tables in the book. These changes are all seen in Table \@ref(tab:results-table-gt1-tab). +First, we initiate the formatted table with the `gt()` function on the `trust_gov` tibble previously created. Next, we use the argument `rowname_col()` to designate the `TrustGovernment` column as the label for each row (called the table "stub"). We apply the `cols_label()` function to create informative column labels instead of variable names and then the `tab_spanner()` function to add a label across multiple columns. In this case, we label all columns except the stub with "Trust in Government, 2020." We then format the proportions into percentages with the `fmt_percent()` function and reduce the number of decimals shown to one with `decimals = 1`. Finally, the `tab_caption()` function adds a table title for the HTML version of the book. We can use the caption for cross-referencing in R Markdown, Quarto, and bookdown, as well as adding it to the list of tables in the book. These changes are all seen in Table \@ref(tab:results-table-gt1-tab). ```{r} #| label: results-table-gt1 @@ -311,9 +311,8 @@ anes_des_gtsum3 <- anes_des %>% in the federal government, 2020") %>% tab_source_note("American National Election Studies, 2020") %>% tab_footnote( - "Question text: How often can you trust - the federal government in Washington - to do what is right?" + "Question text: How often can you trust the federal government + in Washington to do what is right?" ) ``` @@ -383,7 +382,7 @@ anes_des_gtsum4 %>% print_gt_book(knitr::opts_current$get()[["label"]]) ``` -With {gtsummary}, we can also calculate statistics by different groups. Let's modify the previous example (displayed in Table \@ref(tab:results-gts-ex-4-tab)) (displayed in Table \@ref(tab:results-gts-ex-4-tab) to analyze data on whether a respondent voted for president in 2020. We update the `by` argument and refine the header. The resulting table is displayed in Table \@ref(tab:results-gts-ex-5-tab). The resulting table is displayed in Table \@ref(tab:results-gts-ex-5-tab). \index{Functions in srvyr!drop\_na} +With {gtsummary}, we can also calculate statistics by different groups. Let's modify the previous example (displayed in Table \@ref(tab:results-gts-ex-4-tab)) to analyze data on whether a respondent voted for president in 2020. We update the `by` argument and refine the header. The resulting table is displayed in Table \@ref(tab:results-gts-ex-5-tab). \index{Functions in srvyr!drop\_na} ```{r} #| label: results-gts-ex-5 @@ -412,7 +411,7 @@ anes_des_gtsum5 <- anes_des %>% tab_source_note("American National Election Studies, 2020") %>% tab_footnote( "Question text: How often can you trust the federal government - in Washington to do what is right?" + in Washington to do what is right?" ) ``` @@ -442,7 +441,7 @@ Survey analysis can yield an abundance of printed summary statistics and models. R has numerous packages for creating compelling and insightful charts. In this section, we focus on {ggplot2}, a member of the {tidyverse} collection of packages. Known for its power and flexibility, {ggplot2} is an invaluable tool for creating a wide range of data visualizations [@ggplot2wickham]. -The {ggplot2} package follows the "grammar of graphics," a framework that incrementally adds layers of chart components. This approach allows us to customize visual elements such as scales, colors, labels, and annotations to enhance the clarity of our results. After creating the survey design object, we can modify it to include additional outcomes and calculate estimates for our desired data points. Below, we create a binary variable `TrustGovernmentUsually`, which is `TRUE` when `TrustGovernment` is "Always" or "Most of the time" and `FALSE` otherwise. Then, we calculate the percentage of people who usually trust the government based on their vote in the 2020 presidential election (`VotedPres2020_selection`.) We remove the cases where people did not vote or did not indicate their choice. \index{Functions in srvyr!survey\_mean|(} \index{Functions in srvyr!summarize|(} \index{Functions in srvyr!drop\_na} +The {ggplot2} package follows the "grammar of graphics," a framework that incrementally adds layers of chart components. This approach allows us to customize visual elements such as scales, colors, labels, and annotations to enhance the clarity of our results. After creating the survey design object, we can modify it to include additional outcomes and calculate estimates for our desired data points. Below, we create a binary variable `TrustGovernmentUsually`, which is `TRUE` when `TrustGovernment` is "Always" or "Most of the time" and `FALSE` otherwise. Then, we calculate the percentage of people who usually trust the government based on their vote in the 2020 presidential election (`VotedPres2020_selection`). We remove the cases where people did not vote or did not indicate their choice. \index{Functions in srvyr!survey\_mean|(} \index{Functions in srvyr!summarize|(} \index{Functions in srvyr!drop\_na} ```{r} #| label: results-anes-prep @@ -467,11 +466,11 @@ anes_des_der ``` \index{Functions in srvyr!summarize|)} \index{Functions in srvyr!survey\_mean|)} -Now, we can begin creating our chart with {ggplot2}. First, we set up our plot with `ggplot()`. Next, we define the data points to be displayed using aesthetics, or `aes`. Aesthetics represent the visual properties of the objects in the plot. In the following example, we create a bar chart of the percentage of people who usually trust the government by who they voted for in the 2020 election. To do this, we want to have who they voted for on the x-axis (`VotedPres2020_selection`) and the percent they usually trust the government on the y-axis (`pct_trust`.) We specify these variables in `ggplot()` and then indicate we want a bar chart with `geom_bar()`. The resulting plot is displayed in Figure \@ref(fig:results-plot1). +Now, we can begin creating our chart with {ggplot2}. First, we set up our plot with `ggplot()`. Next, we define the data points to be displayed using aesthetics, or `aes`. Aesthetics represent the visual properties of the objects in the plot. In the following example, we create a bar chart of the percentage of people who usually trust the government by who they voted for in the 2020 election. To do this, we want to have who they voted for on the x-axis (`VotedPres2020_selection`) and the percent they usually trust the government on the y-axis (`pct_trust`). We specify these variables in `ggplot()` and then indicate we want a bar chart with `geom_bar()`. The resulting plot is displayed in Figure \@ref(fig:results-plot1). ```{r} #| label: results-plot1 -#| fig.cap: "Bar chart of trust in government by chosen 2020 presidential candidate" +#| fig.cap: "Bar chart of trust in government, by chosen 2020 presidential candidate" #| fig.alt: "Bar chart with x-axis of 'VotedPres2020_selection' with labels Biden, Trump and Other. It has y-axis 'pct_trust' with labels 0.00, 0.05, 0.10 and 0.15. The chart is a bar chart with 3 vertical bars. Bar 1 (Biden) has a height of 0.12. Bar 2 (Trump) has a height of 0.17. Bar 3 (Other) has a height of 0.06." p <- anes_des_der %>% ggplot(aes(x = VotedPres2020_selection, @@ -481,11 +480,11 @@ p <- anes_des_der %>% p ``` -This is a great starting point: it appears that a higher percentage of people state they usually trust the government among those who voted for Trump compared to those who voted for Biden or other candidates. Now, what if we want to introduce color to better differentiate the three groups? We can add `fill` under `aesthetics`, indicating that we want to use distinct colors for each value of `VotedPres2020_selection`. In this instance, Biden and Trump are displayed in different colors (shades in the print version of this book) in Figure \@ref(fig:results-plot2). +This is a great starting point: it appears that a higher percentage of people state they usually trust the government among those who voted for Trump compared to those who voted for Biden or other candidates. Now, what if we want to introduce color to better differentiate the three groups? We can add `fill` under `aesthetics`, indicating that we want to use distinct colors for each value of `VotedPres2020_selection`. In this instance, Biden and Trump are displayed in different colors in Figure \@ref(fig:results-plot2). ```{r} #| label: results-plot2 -#| fig.cap: "Bar chart of trust in government by chosen 2020 presidential candidate with colors" +#| fig.cap: "Bar chart of trust in government by chosen 2020 presidential candidate, with colors" #| fig.alt: "Bar chart with x-axis of 'VotedPres2020_selection' with labels Biden, Trump and Other. It has y-axis 'pct_trust' with labels 0.00, 0.05, 0.10 and 0.15. The chart is a bar chart with 3 vertical bars. Bar 1 (Biden) has a height of 0.12 and a color of strong reddish orange. Bar 2 (Trump) has a height of 0.17 and a color of vivid yellowish green. Bar 3 (Other) has a height of 0.06 and color of brilliant blue." pcolor <- anes_des_der %>% ggplot(aes(x = VotedPres2020_selection, @@ -500,7 +499,7 @@ Let's say we wanted to follow proper statistical analysis practice and incorpora ```{r} #| label: results-plot3 -#| fig.cap: "Bar chart of trust in government by chosen 2020 presidential candidate with colors and error bars" +#| fig.cap: "Bar chart of trust in government by chosen 2020 presidential candidate, with colors and error bars" #| fig.alt: "Bar chart with x-axis of 'VotedPres2020_selection' with labels Biden, Trump and Other. It has y-axis 'pct_trust' with labels 0.00, 0.05, 0.10 and 0.15. The chart is a bar chart with 3 vertical bars. Bar 1 (Biden) has a height of 0.12 and a color of strong reddish orange. Bar 2 (Trump) has a height of 0.17 and a color of vivid yellowish green. Bar 3 (Other) has a height of 0.06 and color of brilliant blue. Error bars are added with the Bar 1 (Biden) error ranging from 0.11 to 0.14, Bar 2 (Trump) error ranging from 0.16 to 0.19, and the Bar 3 (Other) error ranging from 0.02 to 0.14." pcol_error <- anes_des_der %>% ggplot(aes(x = VotedPres2020_selection, @@ -514,7 +513,7 @@ pcol_error <- anes_des_der %>% pcol_error ``` -We can continue adding to our plot until we achieve our desired look. For example, we can eliminate the color legend as it doesn't contribute meaningful information with `guides(fill = "none")`. We can also specify colors for `fill` using `scale_fill_manual()`. Inside this function, we provide a vector of values corresponding to the colors in our plot. These values are hexadecimal (hex) color codes, denoted by a leading pound sign `#` followed by six letters or numbers. The hex code `#0b3954` used below is dark blue. There are many tools online that help pick hex codes, such as [htmlcolorcodes.com](https://htmlcolorcodes.com/). Additionally, Figure \@ref(fig:results-plot4) incorporates better labels for the x and y axes (`xlab()`, `ylab()`), a title (`labs(title=)`), and a footnote with the data source (`labs(caption=)`.) +We can continue adding to our plot until we achieve our desired look. For example, since the color legend does not contribute meaningful information, we can eliminate it with `guides(fill = "none")`. We can also specify colors for `fill` using `scale_fill_manual()`. Inside this function, we provide a vector of values corresponding to the colors in our plot. These values are hexadecimal (hex) color codes, denoted by a leading pound sign `#` followed by six letters or numbers. The hex code `#0b3954` used below is dark blue. There are many tools online that help pick hex codes, such as htmlcolorcodes.com. Additionally, Figure \@ref(fig:results-plot4) incorporates better labels for the x and y axes (`xlab()`, `ylab()`), a title (`labs(title=)`), and a footnote with the data source (`labs(caption=)`). ```{r} #| label: results-plot4 @@ -541,5 +540,5 @@ pfull <- pfull ``` -What we've explored in this section are just the foundational aspects of {ggplot2}, and the capabilities of this package extend far beyond what we've covered. Advanced features such as annotation, faceting, and theming allow for more sophisticated and customized visualizations. The ggplot2 book by @ggplot2wickham is a comprehensive guide to learning more about this powerful tool. +What we have explored in this section are just the foundational aspects of {ggplot2}, and the capabilities of this package extend far beyond what we have covered. Advanced features such as annotation, faceting, and theming allow for more sophisticated and customized visualizations. The {ggplot2} book by @ggplot2wickham is a comprehensive guide to learning more about this powerful tool. \index{American National Election Studies (ANES)|)} \index{Plots|)} \ No newline at end of file