# Rescale a column to [0, 1] (min-max normalization). The original multiplied
# x by 100 throughout, but those factors cancel, so the simpler form is equivalent.
unit.scale <- function(x) (x - min(x)) / (max(x) - min(x))
evaluations_table <- evals_pub %>%
  select(paper_abbrev, eval_name, cat_1, source_main, overall, adv_knowledge, methods, logic_comms, journal_predict) %>%
  arrange(desc(paper_abbrev))
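A quick sanity check of unit.scale on a toy vector (values made up for illustration): the smallest value maps to 0 and the largest to 1.

unit.scale(c(60, 75, 90))
#> [1] 0.0 0.5 1.0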
@@ -1313,7 +1344,7 @@
Notes on sources and approaches
-
-
+
(Consult, e.g., repliCATS/Hanea and others' work; meta-science and meta-analysis approaches)
aggrecat package
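For intuition about what such aggregation involves, here is a minimal R sketch — not the aggrecat package's actual API; the evaluator scores and 1-5 confidence weights below are hypothetical. It compares a simple unweighted mean with a confidence-weighted linear pool:

# Hypothetical inputs: three evaluators' overall scores and their 1-5 confidence levels
ratings <- c(eval_1 = 90, eval_2 = 75, eval_3 = 65)
conf    <- c(eval_1 = 4,  eval_2 = 3,  eval_3 = 5)

agg_mean     <- mean(ratings)                    # unweighted mean
agg_weighted <- sum(ratings * conf) / sum(conf)  # weight each score by stated confidence

c(mean = agg_mean, weighted = agg_weighted)

Weighting by self-reported confidence is just one simple rule; the repliCATS/Hanea line of work studies, among other things, performance-based weights and more structured aggregation schemes.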
@@ -1336,7 +1367,7 @@
-
-
+
link
… we show how experts can be ranked based on their knowledge and their level of (un)certainty. By letting experts specify their knowledge in the form of a probability distribution, we can assess how accurately they can predict new data, and how appropriate their level of (un)certainty is. The expert’s specified probability distribution can be seen as a prior in a Bayesian statistical setting. We evaluate these priors by extending an existing prior-data (dis)agreement measure, the Data Agreement Criterion, and compare this approach to using Bayes factors to assess prior specification. We compare experts with each other and the data to evaluate their appropriateness. Using this method, new research questions can be asked and answered, for instance: Which expert predicts the new data best? Is there agreement between my experts and the data? Which experts’ representation is more valid or useful? Can we reach convergence between expert judgement and data? We provided an empirical example ranking (regional) directors of a large financial institution based on their predictions of turnover.
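To make the ranking idea concrete, here is an illustrative R sketch (our own simplification, not the paper's exact Data Agreement Criterion or Bayes-factor approach): each expert states a Normal prior over a quantity, and we score experts by the log prior-predictive density of newly observed data, which rewards both accuracy and an appropriate level of (un)certainty. All numbers are invented.

# Hypothetical experts' priors over a mean: Normal(mu, sd)
experts <- data.frame(
  name = c("A", "B", "C"),
  mu   = c(10, 12, 8),   # stated prior means
  sd   = c(1, 3, 0.5)    # stated prior sds (smaller = more certain)
)
y     <- c(10.4, 9.8, 10.9)  # newly observed data
sigma <- 1                   # sampling sd, assumed known

# Prior-predictive for each expert: y_i ~ Normal(mu, sqrt(sd^2 + sigma^2));
# sum the log densities of the new data under each expert's predictive
score <- sapply(seq_len(nrow(experts)), function(i) {
  sum(dnorm(y, mean = experts$mu[i],
            sd = sqrt(experts$sd[i]^2 + sigma^2), log = TRUE))
})
experts[order(-score), ]  # best prior-data agreement first

Here expert A (accurate and reasonably certain) ranks first, and expert C is penalized most: an overconfident prior far from the data scores worse than a vague one.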
@@ -1354,7 +1385,7 @@
-
-
+
See Gsheet HERE, generated from an Elicit.org inquiry.
@@ -1570,705 +1601,736 @@
diff --git a/docs/chapters/evaluation_data_files/figure-html/unnamed-chunk-16-1.png b/docs/chapters/evaluation_data_files/figure-html/unnamed-chunk-16-1.png
index 4690280..f11e507 100644
Binary files a/docs/chapters/evaluation_data_files/figure-html/unnamed-chunk-16-1.png and b/docs/chapters/evaluation_data_files/figure-html/unnamed-chunk-16-1.png differ
diff --git a/docs/chapters/evaluation_data_files/figure-html/unnamed-chunk-20-1.png b/docs/chapters/evaluation_data_files/figure-html/unnamed-chunk-20-1.png
index 89cd41b..29df110 100644
Binary files a/docs/chapters/evaluation_data_files/figure-html/unnamed-chunk-20-1.png and b/docs/chapters/evaluation_data_files/figure-html/unnamed-chunk-20-1.png differ
diff --git a/docs/search.json b/docs/search.json
index 55ba045..5e538d8 100644
--- a/docs/search.json
+++ b/docs/search.json
@@ -18,7 +18,7 @@
"href": "chapters/evaluation_data.html#what-sorts-of-papersprojects-are-we-considering-and-evaluating",
"title": "\n1 Evaluation data: description, exploration, checks\n",
"section": "\n2.1 What sorts of papers/projects are we considering and evaluating?",
- "text": "2.1 What sorts of papers/projects are we considering and evaluating?\nIn this section, we give some simple data summaries and visualizations, for a broad description of The Unjournal’s coverage.\nIn the interactive tables below we give some key attributes of the papers and the evaluators, and a preview of the evaluations.\n\n\nCode(\n all_evals_dt <- evals_pub %>%\n arrange(paper_abbrev, eval_name) %>%\n dplyr::select(paper_abbrev, crucial_rsx, eval_name, cat_1, cat_2, source_main_wrapped, author_agreement) %>%\n dplyr::select(-matches(\"ub_|lb_|conf\")) %>% \n #rename_all(~ gsub(\"_\", \" \", .)) %>% \n rename(\"Research _____________________\" = \"crucial_rsx\" \n ) %>%\n DT::datatable(\n caption = \"Evaluations (confidence bounds not shown)\", \n filter = 'top',\n rownames= FALSE,\n options = list(pageLength = 7)\n )\n)\n\n\n\n\n\n\n\nNext, the ‘middle ratings and predictions’.\n\nData datable (all shareable relevant data)(\n all_evals_dt <- evals_pub %>%\n arrange(paper_abbrev, eval_name, overall) %>%\n dplyr::select(paper_abbrev, eval_name, all_of(rating_cats)) %>%\n DT::datatable(\n caption = \"Evaluations and predictions (confidence bounds not shown)\", \n filter = 'top',\n rownames= FALSE,\n options = list(pageLength = 7)\n )\n)\n\n\n\n\n\n\n\n\n\nCode(\n all_evals_dt_ci <- evals_pub %>%\n arrange(paper_abbrev, eval_name) %>%\n dplyr::select(paper_abbrev, eval_name, conf_overall, rating_cats, matches(\"ub_imp|lb_imp\")) %>%\n DT::datatable(\n caption = \"Evaluations and (imputed*) confidence bounds)\", \n filter = 'top',\n rownames= FALSE,\n options = list(pageLength = 7)\n )\n)\n\n\n\n\n\n\n\n\n\nNext consider…\n\n\n\n\n\n\nComposition of research evaluated\n\nBy field (economics, psychology, etc.)\nBy subfield of economics\nBy topic/cause area (Global health, economic development, impact of technology, global catastrophic risks, etc. 
)\nBy source (submitted, identified with author permission, direct evaluation)\n\n\nTiming of intake and evaluation2\n\n\n\n\n\nThe funnel plot below starts with the paper we prioritized for likely Unjournal evaluation, marking these as ‘considering’.\n\nCode#Add in the 3 different evaluation input sources\n#update to be automated rather than hard-coded - to look at David's work here\n\npapers_considered <- all_pub_records %>% nrow()\n\npapers_deprio <- all_pub_records %>% filter(`stage of process/todo` == \"de-prioritized\") %>% nrow()\n\npapers_evaluated <- all_pub_records %>% filter(`stage of process/todo` %in% c(\"published\",\n \"contacting/awaiting_authors_response_to_evaluation\",\n \"awaiting_publication_ME_comments\",\n \"awaiting_evaluations\")) %>% nrow()\n\npapers_complete <- all_pub_records %>% filter(`stage of process/todo` == \"published\") %>% \nnrow()\n\npapers_in_progress <- papers_evaluated-papers_complete\n\npapers_still_in_consideration <- all_pub_records %>% filter(`stage of process/todo` == \"considering\") %>% nrow()\n\n\nfig <- plot_ly(\n type = \"sankey\",\n orientation = \"h\",\n\n node = list(\n label = c(\"Prioritized\", \"Eval uated\", \"Complete\", \"In progress\", \"Still in consideration\", \"De-prioritized\"),\n color = c(\"orange\", \"green\", \"green\", \"orange\", \"orange\", \"red\"),\n pad = 15,\n thickness = 20,\n line = list(\n color = \"black\",\n width = 0.5\n )\n ),\n\n link = list(\n source = c(0,1,1,0,0),\n target = c(1,2,3,4,5),\n value = c(\n papers_evaluated,\n papers_complete,\n papers_in_progress,\n papers_still_in_consideration,\n papers_deprio\n ))\n )\nfig <- fig %>% layout(\n title = \"Unjournal paper funnel\",\n font = list(\n size = 10\n )\n)\n\nfig \n\n\n\n\n\n\nCodesummary_df <- evals_pub %>%\n distinct(crucial_rsx, .keep_all = T) %>% \n group_by(cat_1) %>%\n summarise(count = n()) \n\nsummary_df$cat_1[is.na(summary_df$cat_1)] <- \"Unknown\"\n\nsummary_df <- summary_df %>%\n arrange(-desc(count)) %>%\n mutate(cat_1 = factor(cat_1, levels = unique(cat_1)))\n\n# Create stacked bar chart\nggplot(summary_df, aes(x = cat_1, y = count)) +\n geom_bar(stat = \"identity\") + \n theme_minimal() +\n labs(x = \"Paper category\", y = \"Count\", \n title = \"Count of evaluated papers by primary category\") \n\n\n\n\n\nCode# Bar plot\nggplot(evals_pub, aes(x = source_main_wrapped)) + \n geom_bar(position = \"stack\", stat = \"count\") +\n labs(x = \"Source\", y = \"Count\") +\n theme_light() +\n theme_minimal() +\n ggtitle(\"Evaluations by source of the paper\") + # add title\n theme(\n panel.grid.major = element_blank(),\n panel.grid.minor = element_blank(),\n text = element_text(size = 16), # changing all text size to 16\n axis.text.y = element_text(size = 10),\n axis.text.x = element_text(size = 14)\n )\n\n\n\n\n\nCodeall_pub_records$is_evaluated = all_pub_records$`stage of process/todo` %in% c(\"published\",\n \"contacting/awaiting_authors_response_to_evaluation\",\n \"awaiting_publication_ME_comments\",\n \"awaiting_evaluations\")\n\nall_pub_records$source_main[all_pub_records$source_main == \"NA\"] <- \"Not applicable\" \nall_pub_records$source_main[all_pub_records$source_main == \"internal-from-syllabus-agenda-policy-database\"] <- \"Internal: syllabus, agenda, etc.\" \nall_pub_records$source_main = tidyr::replace_na(all_pub_records$source_main, \"Unknown\")\n\n\n\nggplot(all_pub_records, aes(x = fct_infreq(source_main), fill = is_evaluated)) + \n geom_bar(position = \"stack\", stat = \"count\") +\n labs(x = \"Source\", y = \"Count\", fill 
= \"Selected for\\nevaluation?\") +\n coord_flip() + # flipping the coordinates to have categories on y-axis (on the left)\n theme_light() +\n theme_minimal() +\n ggtitle(\"Evaluations by source of the paper\") +# add title\n theme(\n panel.grid.major = element_blank(),\n panel.grid.minor = element_blank(),\n text = element_text(size = 16), # changing all text size to 16\n axis.text.y = element_text(size = 12),\n axis.text.x = element_text(size = 14)\n )\n\n\n\n\nThe distribution of ratings and predictions\nNext, we present the ratings and predictions along with ‘uncertainty measures’.3 Where evaluators gave only a 1-5 confidence level4, we use the imputations discussed and coded above.\n\nFor each category and prediction (overall and by paper)\n\n\n\nCodewrap_text <- function(text, width) {\n sapply(strwrap(text, width = width, simplify = FALSE), paste, collapse = \"\\n\")\n}\n\nevals_pub$wrapped_pub_names <- wrap_text(evals_pub$paper_abbrev, width = 15)\n\n\n\n# Dot plot\nggplot(evals_pub, aes(x = paper_abbrev, y = overall)) +\n geom_point(stat = \"identity\", size = 4, shape = 1, colour = \"lightblue\", stroke = 3) +\n geom_text_repel(aes(label = eval_name), \n size = 3, \n box.padding = unit(0.35, \"lines\"),\n point.padding = unit(0.3, \"lines\")) +\n coord_flip() + # flipping the coordinates to have categories on y-axis (on the left)\n theme_light() +\n xlab(\"Paper\") + # remove x-axis label\n ylab(\"Overall score\") + # name y-axis\n ggtitle(\"Overall scores of evaluated papers\") +# add title\n theme(\n panel.grid.major = element_blank(),\n panel.grid.minor = element_blank(),\n text = element_text(size = 14), # changing all text size to 16\n axis.text.y = element_text(size = 8),\n axis.text.x = element_text(size = 12)\n )\n\n\n\n\n\n\n\nCodeunit.scale = function(x) (x*100 - min(x*100)) / (max(x*100) - min(x*100))\nevaluations_table <- evals_pub %>%\n select(paper_abbrev, eval_name, cat_1, source_main, overall, adv_knowledge, methods, logic_comms, journal_predict) %>%\n arrange(desc(paper_abbrev))\n\nout = formattable(\n evaluations_table,\n list(\n #area(col = 5:8) ~ function(x) percent(x / 100, digits = 0),\n area(col = 5:8) ~ color_tile(\"#FA614B66\",\"#3E7DCC\"),\n `journal_predict` = proportion_bar(\"#DeF7E9\", unit.scale)\n )\n)\nout\n\n\n\n\n\npaper_abbrev\n\n\neval_name\n\n\ncat_1\n\n\nsource_main\n\n\noverall\n\n\nadv_knowledge\n\n\nmethods\n\n\nlogic_comms\n\n\njournal_predict\n\n\n\n\n\nWell-being: Cash vs. psychotherapy\n\n\nAnonymous_13\n\n\nGH&D\n\n\ninternal-NBER\n\n\n90\n\n\n90\n\n\n90\n\n\n80\n\n\n4.0\n\n\n\n\nWell-being: Cash vs. psychotherapy\n\n\nHannah Metzler\n\n\nGH&D\n\n\ninternal-NBER\n\n\n75\n\n\n70\n\n\n90\n\n\n75\n\n\n3.0\n\n\n\n\nNonprofit Govc.: Randomized healthcare DRC\n\n\nWayne Aaron Sandholtz\n\n\nGH&D\n\n\ninternal-NBER\n\n\n65\n\n\n70\n\n\n60\n\n\n55\n\n\n3.6\n\n\n\n\nLT CEA: Resilient foods vs. AGI safety\n\n\nScott Janzwood\n\n\nlong-term-relevant\n\n\nsubmitted\n\n\n65\n\n\nNA\n\n\nNA\n\n\nNA\n\n\nNA\n\n\n\n\nLT CEA: Resilient foods vs. AGI safety\n\n\nAnca Hanea\n\n\nlong-term-relevant\n\n\nsubmitted\n\n\n80\n\n\n80\n\n\n70\n\n\n85\n\n\n3.5\n\n\n\n\nLT CEA: Resilient foods vs. AGI safety\n\n\nAlex Bates\n\n\nlong-term-relevant\n\n\nsubmitted\n\n\n40\n\n\n30\n\n\n50\n\n\n60\n\n\n2.0\n\n\n\n\nEnv. fx of prod.: ecological obs\n\n\nElias Cisneros\n\n\nNA\n\n\ninternal-NBER\n\n\n88\n\n\n90\n\n\n75\n\n\n80\n\n\n4.0\n\n\n\n\nEnv. 
fx of prod.: ecological obs\n\n\nAnonymous_12\n\n\nNA\n\n\ninternal-NBER\n\n\n70\n\n\n70\n\n\n70\n\n\n75\n\n\n4.0\n\n\n\n\nCBT Human K, Ghana\n\n\nAnonymous_11\n\n\nNA\n\n\ninternal-NBER\n\n\n75\n\n\n60\n\n\n90\n\n\n70\n\n\n4.0\n\n\n\n\nCBT Human K, Ghana\n\n\nAnonymous_16\n\n\nNA\n\n\ninternal-NBER\n\n\n75\n\n\n65\n\n\n60\n\n\n75\n\n\nNA\n\n\n\n\nBanning wildlife trade can boost demand\n\n\nAnonymous_3\n\n\nconservation\n\n\nsubmitted\n\n\n75\n\n\n70\n\n\n80\n\n\n70\n\n\n3.0\n\n\n\n\nBanning wildlife trade can boost demand\n\n\nLiew Jia Huan\n\n\nconservation\n\n\nsubmitted\n\n\n75\n\n\n80\n\n\n50\n\n\n70\n\n\n2.5\n\n\n\n\nAdvance market commit. (vaccines)\n\n\nDavid Manheim\n\n\npolicy\n\n\ninternal-from-syllabus-agenda-policy-database\n\n\n80\n\n\n25\n\n\n95\n\n\n75\n\n\n3.0\n\n\n\n\nAdvance market commit. (vaccines)\n\n\nJoel Tan\n\n\npolicy\n\n\ninternal-from-syllabus-agenda-policy-database\n\n\n79\n\n\n90\n\n\n70\n\n\n70\n\n\n5.0\n\n\n\n\nAdvance market commit. (vaccines)\n\n\nDan Tortorice\n\n\npolicy\n\n\ninternal-from-syllabus-agenda-policy-database\n\n\n80\n\n\n90\n\n\n80\n\n\n80\n\n\n4.0\n\n\n\n\nAI and econ. growth\n\n\nSeth Benzell\n\n\nmacroeconomics\n\n\ninternal-from-syllabus-agenda-policy-database\n\n\n80\n\n\n75\n\n\n80\n\n\n70\n\n\nNA\n\n\n\n\nAI and econ. growth\n\n\nPhil Trammel\n\n\nmacroeconomics\n\n\ninternal-from-syllabus-agenda-policy-database\n\n\n92\n\n\n97\n\n\n70\n\n\n45\n\n\n3.5\n\n\n\n\n\n\n\nNext, look for systematic variation\n\nBy field and topic area of paper\nBy submission/selection route\nBy evaluation manager\n\n… perhaps building a model of this. We are looking for systematic ‘biases and trends’, loosely speaking, to help us better understand how our evaluation system is working.\n\nRelationship among the ratings (and predictions)\n\nCorrelation matrix\nANOVA\nPCA (Principle components)\nWith other ‘control’ factors?\n\nHow do the specific measures predict the aggregate ones (overall rating, merited publication)\n\nCF ‘our suggested weighting’"
+ "text": "2.1 What sorts of papers/projects are we considering and evaluating?\nIn this section, we give some simple data summaries and visualizations, for a broad description of The Unjournal’s coverage.\nIn the interactive tables below we give some key attributes of the papers and the evaluators, and a preview of the evaluations.\n\n\nCode(\n all_evals_dt <- evals_pub %>%\n arrange(paper_abbrev, eval_name) %>%\n dplyr::select(paper_abbrev, crucial_rsx, eval_name, cat_1, cat_2, source_main_wrapped, author_agreement) %>%\n dplyr::select(-matches(\"ub_|lb_|conf\")) %>% \n #rename_all(~ gsub(\"_\", \" \", .)) %>% \n rename(\"Research _____________________\" = \"crucial_rsx\" \n ) %>%\n DT::datatable(\n caption = \"Evaluations (confidence bounds not shown)\", \n filter = 'top',\n rownames= FALSE,\n options = list(pageLength = 7)\n )\n)\n\n\n\n\n\n\n\nNext, the ‘middle ratings and predictions’.\n\nData datable (all shareable relevant data)(\n all_evals_dt <- evals_pub %>%\n arrange(paper_abbrev, eval_name, overall) %>%\n dplyr::select(paper_abbrev, eval_name, all_of(rating_cats)) %>%\n DT::datatable(\n caption = \"Evaluations and predictions (confidence bounds not shown)\", \n filter = 'top',\n rownames= FALSE,\n options = list(pageLength = 7)\n )\n)\n\n\n\n\n\n\n\n\n\nCode(\n all_evals_dt_ci <- evals_pub %>%\n arrange(paper_abbrev, eval_name) %>%\n dplyr::select(paper_abbrev, eval_name, conf_overall, rating_cats, matches(\"ub_imp|lb_imp\")) %>%\n DT::datatable(\n caption = \"Evaluations and (imputed*) confidence bounds)\", \n filter = 'top',\n rownames= FALSE,\n options = list(pageLength = 7)\n )\n)\n\n\n\n\n\n\n\n\n\nNext consider…\n\n\n\n\n\n\nComposition of research evaluated\n\nBy field (economics, psychology, etc.)\nBy subfield of economics\nBy topic/cause area (Global health, economic development, impact of technology, global catastrophic risks, etc. 
)\nBy source (submitted, identified with author permission, direct evaluation)\n\n\nTiming of intake and evaluation2\n\n\n\n\n\nThe funnel plot below starts with the paper we prioritized for likely Unjournal evaluation, marking these as ‘considering’.\n\nCode#Add in the 3 different evaluation input sources\n#update to be automated rather than hard-coded - to look at David's work here\n\npapers_considered <- all_pub_records %>% nrow()\n\npapers_deprio <- all_pub_records %>% filter(`stage of process/todo` == \"de-prioritized\") %>% nrow()\n\npapers_evaluated <- all_pub_records %>% filter(`stage of process/todo` %in% c(\"published\",\n \"contacting/awaiting_authors_response_to_evaluation\",\n \"awaiting_publication_ME_comments\",\n \"awaiting_evaluations\")) %>% nrow()\n\npapers_complete <- all_pub_records %>% filter(`stage of process/todo` == \"published\") %>% \nnrow()\n\npapers_in_progress <- papers_evaluated - papers_complete\n\npapers_still_in_consideration <- all_pub_records %>% filter(`stage of process/todo` == \"considering\") %>% nrow()\n\n\n#todo: adjust wording of hover notes ('source, target...etc')\n\nfig <- plot_ly(\n type = \"sankey\",\n orientation = \"h\",\n\n node = list(\n label = c(\"Prioritized\", \"Evaluating(ed)\", \"Complete\", \"In progress\", \"Still in consideration\", \"De-prioritized\"),\n color = c(\"orange\", \"green\", \"green\", \"orange\", \"orange\", \"red\"),\n#Todo: adjust 'location' to group these left to right\n pad = 15,\n thickness = 20,\n line = list(\n color = \"black\",\n width = 0.5\n )\n ),\n\n link = list(\n source = c(0,1,1,0,0),\n target = c(1,2,3,4,5),\n value = c(\n papers_evaluated,\n papers_complete,\n papers_in_progress,\n papers_still_in_consideration,\n papers_deprio\n ))\n )\nfig <- fig %>% layout(\n title = \"Unjournal paper funnel\",\n font = list(\n size = 10\n )\n)\n\nfig \n\n\n\n\n\n(In future, will make interactive/dashboards of the elements below)\n\nCodesummary_df <- evals_pub %>%\n distinct(crucial_rsx, .keep_all = T) %>% \n group_by(cat_1) %>%\n summarise(count = n()) \n\nsummary_df$cat_1[is.na(summary_df$cat_1)] <- \"Unknown\"\n\nsummary_df <- summary_df %>%\n arrange(-desc(count)) %>%\n mutate(cat_1 = factor(cat_1, levels = unique(cat_1)))\n\n# Create stacked bar chart\nggplot(summary_df, aes(x = cat_1, y = count)) +\n geom_bar(stat = \"identity\") + \n theme_minimal() +\n labs(x = \"Paper category\", y = \"Count\", \n title = \"Count of evaluated papers by primary category\") \n\n\n\n\n\nCode# Bar plot\nggplot(evals_pub, aes(x = source_main_wrapped)) + \n geom_bar(position = \"stack\", stat = \"count\") +\n labs(x = \"Source\", y = \"Count\") +\n theme_light() +\n theme_minimal() +\n ggtitle(\"Pool of research/evaluations by paper source\") + # add title\n theme(\n panel.grid.major = element_blank(),\n panel.grid.minor = element_blank(),\n text = element_text(size = 16), # changing all text size to 16\n axis.text.y = element_text(size = 10),\n axis.text.x = element_text(size = 14)\n )\n\n\n\n\n\nCodeall_pub_records$is_evaluated = all_pub_records$`stage of process/todo` %in% c(\"published\",\n \"contacting/awaiting_authors_response_to_evaluation\",\n \"awaiting_publication_ME_comments\",\n \"awaiting_evaluations\")\n\nall_pub_records$source_main[all_pub_records$source_main == \"NA\"] <- \"Not applicable\" \nall_pub_records$source_main[all_pub_records$source_main == \"internal-from-syllabus-agenda-policy-database\"] <- \"Internal: syllabus, agenda, etc.\" \nall_pub_records$source_main = 
tidyr::replace_na(all_pub_records$source_main, \"Unknown\")\n\n\n\nggplot(all_pub_records, aes(x = fct_infreq(source_main), fill = is_evaluated)) + \n geom_bar(position = \"stack\", stat = \"count\") +\n labs(x = \"Source\", y = \"Count\", fill = \"Selected for\\nevaluation?\") +\n coord_flip() + # flipping the coordinates to have categories on y-axis (on the left)\n theme_light() +\n theme_minimal() +\n ggtitle(\"Evaluations by source of the paper\") +# add title\n theme(\n panel.grid.major = element_blank(),\n panel.grid.minor = element_blank(),\n text = element_text(size = 16), # changing all text size to 16\n axis.text.y = element_text(size = 12),\n axis.text.x = element_text(size = 14)\n )\n\n\n\n\nThe distribution of ratings and predictions\nNext, we present the ratings and predictions along with ‘uncertainty measures’.3 Where evaluators gave only a 1-5 confidence level4, we use the imputations discussed and coded above.\n\nFor each category and prediction (overall and by paper)\n\n\n\nCodewrap_text <- function(text, width) {\n sapply(strwrap(text, width = width, simplify = FALSE), paste, collapse = \"\\n\")\n}\n\nevals_pub$wrapped_pub_names <- wrap_text(evals_pub$paper_abbrev, width = 15)\n\n\n#todo -- sort by average overall, use color and vertical spacing more\n#todo: introduce a carriage return into the paper names (workaround) to wrap these and save horizontal space\n\n\n# Dot plot\nggplot(evals_pub, aes(x = paper_abbrev, y = overall)) +\n geom_point(stat = \"identity\", size = 3, shape = 1, colour = \"lightblue\", stroke = 2) +\n geom_text_repel(aes(label = eval_name), \n size = 3, \n box.padding = unit(0.35, \"lines\"),\n point.padding = unit(0.3, \"lines\")) +\n coord_flip() + # flipping the coordinates to have categories on y-axis (on the left)\n theme_light() +\n xlab(\"Paper\") + # remove x-axis label\n ylab(\"Overall score\") + # name y-axis\n ggtitle(\"Overall scores of evaluated papers\") +# add title\n theme(\n panel.grid.major = element_blank(),\n panel.grid.minor = element_blank(),\n text = element_text(size = 14), # changing all text size to 16\n axis.text.y = element_text(size = 8),\n axis.text.x = element_text(size = 12)\n )\n\n\n\nCode#todo -- add more vertical space between papers\n\n\n\nIn future (todo), we aim to build a dashboard allowing people to use the complete set of ratings and predictions, and choose their own weightings. 
(Also incorporating the evaluator uncertainty in reasonable ways.)\nThe below should be fixed – the column widths below are misleading\n\n\n\n\n\n\nFuture vis\n\n\n\n\n\nSpider or radial chart\nEach rating is a dimension or attribute (potentially normalized) potentially superimpose a ‘circle’ for the suggested weighting or overall.\nEach paper gets its own spider, with all others (or the average) in faded color behind it as a comparator.\nIdeally user can switch on/off\nBeware – people infer things from the shape’s size\n\n\n\n\n\nCodeunit.scale = function(x) (x*100 - min(x*100)) / (max(x*100) - min(x*100))\nevaluations_table <- evals_pub %>%\n select(paper_abbrev, eval_name, cat_1, source_main, overall, adv_knowledge, methods, logic_comms, journal_predict) %>%\n arrange(desc(paper_abbrev))\n\nout = formattable(\n evaluations_table,\n list(\n #area(col = 5:8) ~ function(x) percent(x / 100, digits = 0),\n area(col = 5:8) ~ color_tile(\"#FA614B66\",\"#3E7DCC\"),\n `journal_predict` = proportion_bar(\"#DeF7E9\", unit.scale)\n )\n)\nout\n\n\n\n\n\npaper_abbrev\n\n\neval_name\n\n\ncat_1\n\n\nsource_main\n\n\noverall\n\n\nadv_knowledge\n\n\nmethods\n\n\nlogic_comms\n\n\njournal_predict\n\n\n\n\n\nWell-being: Cash vs. psychotherapy\n\n\nAnonymous_13\n\n\nGH&D\n\n\ninternal-NBER\n\n\n90\n\n\n90\n\n\n90\n\n\n80\n\n\n4.0\n\n\n\n\nWell-being: Cash vs. psychotherapy\n\n\nHannah Metzler\n\n\nGH&D\n\n\ninternal-NBER\n\n\n75\n\n\n70\n\n\n90\n\n\n75\n\n\n3.0\n\n\n\n\nNonprofit Govc.: Randomized healthcare DRC\n\n\nWayne Aaron Sandholtz\n\n\nGH&D\n\n\ninternal-NBER\n\n\n65\n\n\n70\n\n\n60\n\n\n55\n\n\n3.6\n\n\n\n\nLT CEA: Resilient foods vs. AGI safety\n\n\nScott Janzwood\n\n\nlong-term-relevant\n\n\nsubmitted\n\n\n65\n\n\nNA\n\n\nNA\n\n\nNA\n\n\nNA\n\n\n\n\nLT CEA: Resilient foods vs. AGI safety\n\n\nAnca Hanea\n\n\nlong-term-relevant\n\n\nsubmitted\n\n\n80\n\n\n80\n\n\n70\n\n\n85\n\n\n3.5\n\n\n\n\nLT CEA: Resilient foods vs. AGI safety\n\n\nAlex Bates\n\n\nlong-term-relevant\n\n\nsubmitted\n\n\n40\n\n\n30\n\n\n50\n\n\n60\n\n\n2.0\n\n\n\n\nEnv. fx of prod.: ecological obs\n\n\nElias Cisneros\n\n\nNA\n\n\ninternal-NBER\n\n\n88\n\n\n90\n\n\n75\n\n\n80\n\n\n4.0\n\n\n\n\nEnv. fx of prod.: ecological obs\n\n\nAnonymous_12\n\n\nNA\n\n\ninternal-NBER\n\n\n70\n\n\n70\n\n\n70\n\n\n75\n\n\n4.0\n\n\n\n\nCBT Human K, Ghana\n\n\nAnonymous_11\n\n\nNA\n\n\ninternal-NBER\n\n\n75\n\n\n60\n\n\n90\n\n\n70\n\n\n4.0\n\n\n\n\nCBT Human K, Ghana\n\n\nAnonymous_16\n\n\nNA\n\n\ninternal-NBER\n\n\n75\n\n\n65\n\n\n60\n\n\n75\n\n\nNA\n\n\n\n\nBanning wildlife trade can boost demand\n\n\nAnonymous_3\n\n\nconservation\n\n\nsubmitted\n\n\n75\n\n\n70\n\n\n80\n\n\n70\n\n\n3.0\n\n\n\n\nBanning wildlife trade can boost demand\n\n\nLiew Jia Huan\n\n\nconservation\n\n\nsubmitted\n\n\n75\n\n\n80\n\n\n50\n\n\n70\n\n\n2.5\n\n\n\n\nAdvance market commit. (vaccines)\n\n\nDavid Manheim\n\n\npolicy\n\n\ninternal-from-syllabus-agenda-policy-database\n\n\n80\n\n\n25\n\n\n95\n\n\n75\n\n\n3.0\n\n\n\n\nAdvance market commit. (vaccines)\n\n\nJoel Tan\n\n\npolicy\n\n\ninternal-from-syllabus-agenda-policy-database\n\n\n79\n\n\n90\n\n\n70\n\n\n70\n\n\n5.0\n\n\n\n\nAdvance market commit. (vaccines)\n\n\nDan Tortorice\n\n\npolicy\n\n\ninternal-from-syllabus-agenda-policy-database\n\n\n80\n\n\n90\n\n\n80\n\n\n80\n\n\n4.0\n\n\n\n\nAI and econ. growth\n\n\nSeth Benzell\n\n\nmacroeconomics\n\n\ninternal-from-syllabus-agenda-policy-database\n\n\n80\n\n\n75\n\n\n80\n\n\n70\n\n\nNA\n\n\n\n\nAI and econ. 
growth\n\n\nPhil Trammel\n\n\nmacroeconomics\n\n\ninternal-from-syllabus-agenda-policy-database\n\n\n92\n\n\n97\n\n\n70\n\n\n45\n\n\n3.5\n\n\n\n\n\n\n\nNext, look for systematic variation\n\nBy field and topic area of paper\nBy submission/selection route\nBy evaluation manager\n\n… perhaps building a model of this. We are looking for systematic ‘biases and trends’, loosely speaking, to help us better understand how our evaluation system is working.\n\nRelationship among the ratings (and predictions)\n\nCorrelation matrix\nANOVA\nPCA (Principle components)\nWith other ‘control’ factors?\n\nHow do the specific measures predict the aggregate ones (overall rating, merited publication)\n\nCF ‘our suggested weighting’"
},
{
"objectID": "chapters/evaluation_data.html#aggregation-of-expert-opinion-modeling",