From c6ed1df698f952b298f2ef20505e1cd968265357 Mon Sep 17 00:00:00 2001
From: nrosed
Date: Fri, 15 Mar 2024 11:40:59 -0600
Subject: [PATCH] reordered sections to match elife, small changes in supp tables with new calculation of name enrichment, supplemental figure renaming to match elife

---
 content/{04.results.md => 03.results.md}   | 12 ++---
 .../{05.discussion.md => 04.discussion.md} |  0
 content/{03.methods.md => 05.methods.md}   |  0
 content/07.supplement.md                   | 52 +++++++++----------
 4 files changed, 32 insertions(+), 32 deletions(-)
 rename content/{04.results.md => 03.results.md} (98%)
 rename content/{05.discussion.md => 04.discussion.md} (100%)
 rename content/{03.methods.md => 05.methods.md} (100%)

diff --git a/content/04.results.md b/content/03.results.md
similarity index 98%
rename from content/04.results.md
rename to content/03.results.md
index fa1892d..c642cb9 100644
--- a/content/04.results.md
+++ b/content/03.results.md
@@ -87,7 +87,7 @@ Figure @fig:fig2 shows an overview of the process and example input data for thi
 These analyses relied upon accurate gender prediction of both authors and speakers.
 To predict the gender of the speaker or author, we used the package genderizeR [@doi:10.32614/rj-2016-002], an R package wrapper to access the genderize.io API [@https://genderize.io] to get binary gender predictions for each identified first name.
 We unfortunately cannot identify non-binary gender expression with the tools we used.
-Performance of binary prediction was evaluated on a benchmark data set of thirty randomly selected news articles, ten from each of the following years: 2005, 2010, 2015 (Figure {@fig:suppfig1}a).
+Performance of binary prediction was evaluated on a benchmark data set of thirty randomly selected news articles, ten from each of the following years: 2005, 2010, 2015 (Figure {@fig:suppfig1}).
 In addition, genderize.io has been found by independent researchers to have an error rate comparable to other published gender prediction methods, with a error-rate on predicted names below 6% [@doi:10.7717/peerj-cs.156;@doi:10.5195/jmla.2021.1252].
 However, it should be noted that the error rate varies by name origin with the largest decrease in performance on names with an Asian origin [@doi:10.7717/peerj-cs.156;@doi:10.5195/jmla.2021.1252].
 In our analysis, we did not observe a large difference in names predicted to come from a man or woman between predicted East Asian and other name origins (Table{@tbl:tableGenderNameOrigin}).
@@ -167,9 +167,9 @@ Minimum and median per data type over all years: _Nature_ papers, (568, 684); _S

 In comparing the citation rate of first and last author name origins in news articles, we decided to additionally analyze scientist-written articles.
 Though fewer in number, scientist-written news articles have many citations, making the set sufficient for analysis and providing an opportunity to measure differences in citation patterns between journalists and scientists.
-In both journalist- and scientist-written articles, we found that most cited name origins were predicted Celtic/English or European, both with a bootstrapped estimated citation rate between 19.2-42.8% (Figure {@fig:suppfig3}b,c).
-East Asian predicted name origins are the third highest proportion of cited names, with a bootstrapped estimated citation rate between 6.4-28.8%.
-All other predicted name origins individually account for less than 7.9% of total cited authors.
+In both journalist- and scientist-written articles, we found that most cited name origins were predicted Celtic/English or European, both with a bootstrapped estimated citation rate between 17.9-39.6% (Figure {@fig:suppfig3}b,c).
+East Asian predicted name origins are the third highest proportion of cited names, with a bootstrapped estimated citation rate between 7.3-28.1%.
+All other predicted name origins individually account for less than 8.1% of total cited authors.

 We analyzed how these distributions compare to the composition of the first and last authors in _Nature_ (Figure {@fig:supfig_nameorigin_bg}), by examining the top three most frequent predicted name origins (Figure {@fig:fig3}b,c, Table {@tbl:tableFCNature}).
 When considering only first authors, we found a slight over-representation for predicted Celtic/English name origins and a small under-representation for predicted East Asian name origins in scientist-written and journalist-written news articles when compared to the composition of first authors in _Nature_ (Figure {@fig:fig3}b, c).
@@ -183,8 +183,8 @@ Additionally, we see a difference in predicted Arabic/Turkish/Persian name origi

 We then sought to determine whether or not the quoted speaker demographic replicated the cited authors’ over- and under-representation patterns.
 We found a much stronger Celtic/English over-representation in comparison to citation patterns, with quotes from those with Celtic/English name origins at a much higher frequency than quotes from those with European name origins (Figure {@fig:suppfig3}d, Table {@tbl:tableFCNature}).
-We also found a much stronger reduction of quotes from people with predicted East Asian name origins (Figure {@fig:suppfig3}b), with never more than 7.7% of quotes (Figure {@fig:fig3}d, Table {@tbl:tableFCNature}).
-This reveals a large disparity when considering that people with a predicted East Asian name origin constitute between 14.3-33.6% of last authors cited in either journalist- or scientist-written news articles (Figure {@fig:fig3}b,c, Table {@tbl:tableFCNature}).
+We also found a much stronger reduction of quotes from people with predicted East Asian name origins (Figure {@fig:suppfig3}b), with never more than 8.2% of quotes (Figure {@fig:fig3}d, Table {@tbl:tableFCNature}).
+This reveals a large disparity when considering that people with a predicted East Asian name origin constitute between 7.3-24.6% of last authors cited in either journalist- or scientist-written news articles (Figure {@fig:fig3}b,c, Table {@tbl:tableFCNature}).
 When we compare against first and last authorship in _Nature_ across all predicted name origins, we find that for all other name origins except for East Asian and Celtic/English, the quote rates closely matches the predicted name origin rate of first and last authors in _Nature_ (Figure {@fig:suppfig4}c, dark grey and light blue lines compare to the purple lines).

 To further understand the source of Celtic/English over-representation and East Asian under-representation, we selected a subset of quotes from people whose works were also cited in the news article.
diff --git a/content/05.discussion.md b/content/04.discussion.md
similarity index 100%
rename from content/05.discussion.md
rename to content/04.discussion.md
diff --git a/content/03.methods.md b/content/05.methods.md
similarity index 100%
rename from content/03.methods.md
rename to content/05.methods.md
diff --git a/content/07.supplement.md b/content/07.supplement.md
index 97fa49a..8bf9691 100644
--- a/content/07.supplement.md
+++ b/content/07.supplement.md
@@ -3,7 +3,7 @@
 ![
 **Benchmark Data **
 The performance of gender prediction for pipeline-identified quoted speakers.
-](https://github.com/nrosed/nature_news_disparities/raw/main/figure_notebooks/manuscript_figs/supp_fig1_tmp/supp_fig1.png "Supplementary Figure 1"){#fig:suppfig1 tag="Supplemental 1" width=6in}
+](https://github.com/nrosed/nature_news_disparities/raw/main/figure_notebooks/manuscript_figs/supp_fig1_tmp/supp_fig1.png "Figure 1 – figure supplemental 1"){#fig:suppfig1 tag="1 – figure supplemental 1" width=6in}

 ![
 **Speakers predicted to be men are overrepresented in news quotes regardless of predicted journalist gender**
@@ -11,7 +11,7 @@ Panel a depicts two trend lines: Yellow: Proportion of _Nature_ news articles wr
 We observe a moderate gender difference in the number of articles written by men and women journalists.
 Panel b depicts two trend lines: Yellow: Proportion of quotes predicted to be from men in an article written by a journalist predicted to be a woman; Blue: Proportion of quotes predicted to be from men in an article written by a journalist predicted to be a man.
 In all plots, the colored bands represent the 5th and 95th bootstrap quantiles and the point is the mean calculated from 1,000 bootstrap samples.
-](https://github.com/nrosed/nature_news_disparities/raw/main/figure_notebooks/manuscript_figs/supp_journalist_contingency_tab_tmp/supp_fig.png "Supplementary Figure 2"){#fig:suppfig_j_gender tag="Supplemental 2" width=6in}
+](https://github.com/nrosed/nature_news_disparities/raw/main/figure_notebooks/manuscript_figs/supp_journalist_contingency_tab_tmp/supp_fig.png "Figure 2 - figure supplemental 1"){#fig:suppfig_j_gender tag="2 - figure supplemental 1" width=6in}


 ![
@@ -20,14 +20,14 @@ Panel a depicts three trend lines: Purple: Proportion of _Nature_ quotes for a s
 We observe a larger gender difference between first and last authors in _Springer Nature_ articles, however the proportion of speakers estimated to be men is less than observed in _Nature_ research articles.
 Panel b depicts the proportion of quotes from predicted men broken down by article type.
 In all plots the colored bands represent the 5th and 95th bootstrap quantiles and the point is the mean calculated from 1,000 bootstrap samples.
-](https://github.com/nrosed/nature_news_disparities/raw/main/figure_notebooks/manuscript_figs/fig2_tmp/fig2_supp.png "Supplementary Figure 3"){#fig:suppfig2 tag="Supplemental 3" width=6in}
+](https://github.com/nrosed/nature_news_disparities/raw/main/figure_notebooks/manuscript_figs/fig2_tmp/fig2_supp.png "Figure 2 - figure supplemental 2"){#fig:suppfig2 tag="2 - figure supplemental 2" width=6in}

 ![
 **Predicted Celtic/English, and European name origins are the highest cited, quoted, and mentioned**
 Panel a, depicts the number of quotes, mentions, citations, or research articles considered in the name origin analysis.
 Panels b-g depicts the proportion of a name origin in a given dataset, citations in articles written by journalists or writers, quoted speakers or mentions.
 In all plots the colored bands represent the 5th and 95th bootstrap quantiles and the point is the mean calculated from 1,000 bootstrap samples.
-](https://github.com/nrosed/nature_news_disparities/raw/main/figure_notebooks/manuscript_figs/fig3_tmp/fig3_supp.png "Supplementary Figure 4"){#fig:suppfig3 tag="Supplemental 4" width=6in}
+](https://github.com/nrosed/nature_news_disparities/raw/main/figure_notebooks/manuscript_figs/fig3_tmp/fig3_supp.png "Figure 3 – figure supplemental 1"){#fig:suppfig3 tag="3 – figure supplemental 1" width=6in}


 ![
@@ -35,7 +35,7 @@ In all plots the colored bands represent the 5th and 95th bootstrap quantiles an
 Panels a-d depicts the predicted name origins of first and last authors in our background sets.
 Panel a and b show the predicted name origins of _Nature_ first and last authors, respectively.
 Panel c and d show the predicted name origins of _Springer Nature_ first and last authors, respectively.
-](https://github.com/nrosed/nature_news_disparities/raw/main/figure_notebooks/manuscript_figs/fig3_tmp/fig3_supp3.png "Supplementary Figure 5"){#fig:supfig_nameorigin_bg tag="Supplemental 5" width=6in}
+](https://github.com/nrosed/nature_news_disparities/raw/main/figure_notebooks/manuscript_figs/fig3_tmp/fig3_supp3.png "Figure 3 – figure supplemental 2"){#fig:supfig_nameorigin_bg tag="3 – figure supplemental 2" width=6in}


 ![
@@ -45,7 +45,7 @@ Panel a, c, and e compare the citation (a), quote (c), or mention (e) rate again
 Panel b, d, and f compare the citation (a), quote (c), or mention (e) rate against _Springer Nature_ first and last author name origins.
 Panels a and b additionally partition the citation rates by journalist-written articles and scientist-written articles, each further divided into first or last author position.
 For c-f, only journalist written articles are considered.
-](https://github.com/nrosed/nature_news_disparities/raw/main/figure_notebooks/manuscript_figs/fig3_tmp/fig3_supp2.png "Supplementary Figure 6"){#fig:suppfig4 tag="Supplemental 6" width=6in}
+](https://github.com/nrosed/nature_news_disparities/raw/main/figure_notebooks/manuscript_figs/fig3_tmp/fig3_supp2.png "Figure 3 – figure supplemental 3"){#fig:suppfig4 tag="3 – figure supplemental 3" width=6in}


 ![
@@ -54,7 +54,7 @@ Panels a-d depicts twelve plots, each for a possible name origin comparison agai
 Panels a and b compare name origin proportions of quotes from people that were also cited in the same article.
 Panels c and d compare name origin proportions from mentions of people that were also cited in the same article.
 In all plots the colored bands represent the 5th and 95th bootstrap quantiles and the point is the mean calculated from 1,000 bootstrap samples.
-](https://github.com/nrosed/nature_news_disparities/raw/main/figure_notebooks/manuscript_figs/supp_country_specific_analysis_tmp/supp_fig.png "Supplementary Figure 7"){#fig:suppfig_quote_cite tag="Supplemental 7" width=6in}
+](https://github.com/nrosed/nature_news_disparities/raw/main/figure_notebooks/manuscript_figs/supp_country_specific_analysis_tmp/supp_fig.png "Figure 3 - figure supplemental 4"){#fig:suppfig_quote_cite tag="3 - figure supplemental 4" width=6in}