diff --git a/_freeze/chapters/aggregation/execute-results/html.json b/_freeze/chapters/aggregation/execute-results/html.json index 8381503..673406f 100644 --- a/_freeze/chapters/aggregation/execute-results/html.json +++ b/_freeze/chapters/aggregation/execute-results/html.json @@ -1,7 +1,7 @@ { - "hash": "68440e6d131ccee0354bc095732ca0e2", + "hash": "2726ad363da560877f7c0b4345be5343", "result": { - "markdown": "# Aggregation of evaluators judgments (modeling)\n\n\n\n\n\n\n## Notes on sources and approaches\n\n\n::: {.callout-note collapse=\"true\"}\n\n## Hanea et al {-}\n(Consult, e.g., repliCATS/Hanea and others work; meta-science and meta-analysis approaches)\n\n`aggrecat` package\n\n> Although the accuracy, calibration, and informativeness of the majority of methods are very similar, a couple of the aggregation methods consistently distinguish themselves as among the best or worst. Moreover, the majority of methods outperform the usual benchmarks provided by the simple average or the median of estimates.\n\n[Hanea et al, 2021](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0256919#sec007)\n\n However, these are in a different context. Most of those measures are designed to deal with probablistic forecasts for binary outcomes, where the predictor also gives a 'lower bound' and 'upper bound' for that probability. We could roughly compare that to our continuous metrics with 90% CI's (or imputations for these).\n\nFurthermore, many (all their successful measures?) use 'performance-based weights', accessing metrics from prior prediction performance of the same forecasters We do not have these, nor do we have a sensible proxy for this. \n:::\n\n\n::: {.callout-note collapse=\"true\"}\n## D Veen et al (2017)\n\n[link](https://www.researchgate.net/profile/Duco-Veen/publication/319662351_Using_the_Data_Agreement_Criterion_to_Rank_Experts'_Beliefs/links/5b73e2dc299bf14c6da6c663/Using-the-Data-Agreement-Criterion-to-Rank-Experts-Beliefs.pdf)\n\n... we show how experts can be ranked based on their knowledge and their level of (un)certainty. By letting experts specify their knowledge in the form of a probability distribution, we can assess how accurately they can predict new data, and how appropriate their level of (un)certainty is. The expert’s specified probability distribution can be seen as a prior in a Bayesian statistical setting. We evaluate these priors by extending an existing prior-data (dis)agreement measure, the Data Agreement Criterion, and compare this approach to using Bayes factors to assess prior specification. We compare experts with each other and the data to evaluate their appropriateness. Using this method, new research questions can be asked and answered, for instance: Which expert predicts the new data best? Is there agreement between my experts and the data? Which experts’ representation is more valid or useful? Can we reach convergence between expert judgement and data? We provided an empirical example ranking (regional) directors of a large financial institution based on their predictions of turnover. 
\n\nBe sure to consult the [correction made here](https://www.semanticscholar.org/paper/Correction%3A-Veen%2C-D.%3B-Stoel%2C-D.%3B-Schalken%2C-N.%3B-K.%3B-Veen-Stoel/a2882e0e8606ef876133f25a901771259e7033b1)\n\n::: \n\n\n::: {.callout-note collapse=\"true\"}\n## Also seems relevant:\n\nSee [Gsheet HERE](https://docs.google.com/spreadsheets/d/14japw6eLGpGjEWy1MjHNJXU1skZY_GAIc2uC2HIUalM/edit#gid=0), generated from an Elicit.org inquiry.\n\n\n::: \n\n\n\nIn spite of the caveats in the fold above, we construct some measures of aggregate beliefs using the `aggrecat` package. We will make (and explain) some ad-hoc choices here. We present these:\n\n1. For each paper\n2. For categories of papers and cross-paper categories of evaluations\n3. For the overall set of papers and evaluations\n\nWe can also hold onto these aggregated metrics for later use in modeling.\n\n\n- Simple averaging\n\n- Bayesian approaches \n\n- Best-performing approaches from elsewhere \n\n- Assumptions over unit-level random terms \n\n\n### Simple rating aggregation {-}\n\nBelow, we are preparing the data for the aggreCATS package.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# JB: This section is a work in progress, please do not edit\n\n# paper_ratings: one row per rating category and 'type' (score, upper, lower bound.)\nevals_pub %>% \n select(id, eval_name, paper_abbrev, \n overall, overall_lb_imp, overall_ub_imp,\n adv_knowledge, adv_knowledge_lb_imp, adv_knowledge_ub_imp,\n methods, methods_lb_imp, methods_ub_imp,\n logic_comms, logic_comms_lb_imp, logic_comms_ub_imp,\n real_world, real_world_lb_imp, real_world_ub_imp,\n gp_relevance, gp_relevance_lb_imp, gp_relevance_ub_imp,\n open_sci, open_sci_lb_imp, open_sci_ub_imp) %>% \n rename_with(function(x) paste0(x,\"_score\"), all_of(rating_cats)) %>%\n pivot_longer(cols = c(-id, -eval_name, -paper_abbrev),\n names_pattern = \"(.+)_(score|[ul]b_imp)\",\n names_to = c(\"criterion\",\"element\"),\n values_to = \"value\") -> paper_ratings\n\n# renaming to conform with aggreCATS expectations\npaper_ratings <- paper_ratings %>% \n rename(paper_id = paper_abbrev,\n user_name = eval_name) %>% \n mutate(round = \"round_1\",\n element = case_when(element == \"lb_imp\" ~ \"three_point_lower\",\n element == \"ub_imp\" ~ \"three_point_upper\",\n element == \"score\" ~ \"three_point_best\"))\n\n# filter only overall for now\npaper_ratings %>% \n filter(criterion == \"overall\") %>% \n group_by(user_name, paper_id) %>% \n filter(sum(is.na(value))==0) %>% \n ungroup() -> temp\n \n\nAverageWAgg(expert_judgements = temp, round_2_filter = FALSE, type = \"ArMean\")\n\nIntervalWAgg(expert_judgements = temp, round_2_filter = FALSE, type = \"IntWAgg\")\n\naggreCAT::DistributionWAgg(expert_judgements = temp, round_2_filter = FALSE, type = \"DistribArMean\", percent_toggle = T)\n\n# EXAMPLE CODE ===============================\n# data(data_ratings)\n# set.seed(1234)\n# \n# participant_subset <- data_ratings %>%\n# distinct(user_name) %>%\n# sample_n(5) %>%\n# mutate(participant_name = paste(\"participant\", rep(1:n())))\n# \n# single_claim <- data_ratings %>%\n# filter(paper_id == \"28\") %>%\n# right_join(participant_subset, by = \"user_name\") %>%\n# filter(grepl(x = element, pattern = \"three_.+\")) %>%\n# select(-group, -participant_name, -question)\n# \n# DistributionWAgg(expert_judgements = single_claim,\n# type = \"DistribArMean\", percent_toggle = T)\n# \n```\n:::\n\n\n\n\n\n### Explicit modeling of 'research quality' (for use in prizes, etc.) 
{-}\n\n- Use the above aggregation as the outcome of interest, or weight towards categories of greater interest?\n\n- Model with controls -- look for greatest positive residual? \n\n\n## Inter-rater reliability\n\n\n::: {.cell}\n::: {.cell-output-display}\n![](aggregation_files/figure-html/unnamed-chunk-1-1.png){width=672}\n:::\n:::\n\n\n\n\n## Decomposing variation, dimension reduction, simple linear models\n\n\n## Later possiblities\n\n- Relation to evaluation text content (NLP?)\n\n- Relation/prediction of later outcomes (traditional publication, citations, replication)\n", + "markdown": "# Aggregation of evaluators judgments (modeling)\n\n\n\n\n\n\n## Notes on sources and approaches\n\n\n::: {.callout-note collapse=\"true\"}\n\n## Hanea et al {-}\n(Consult, e.g., repliCATS/Hanea and others work; meta-science and meta-analysis approaches)\n\n`aggrecat` package\n\n> Although the accuracy, calibration, and informativeness of the majority of methods are very similar, a couple of the aggregation methods consistently distinguish themselves as among the best or worst. Moreover, the majority of methods outperform the usual benchmarks provided by the simple average or the median of estimates.\n\n[Hanea et al, 2021](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0256919#sec007)\n\n However, these are in a different context. Most of those measures are designed to deal with probablistic forecasts for binary outcomes, where the predictor also gives a 'lower bound' and 'upper bound' for that probability. We could roughly compare that to our continuous metrics with 90% CI's (or imputations for these).\n\nFurthermore, many (all their successful measures?) use 'performance-based weights', accessing metrics from prior prediction performance of the same forecasters We do not have these, nor do we have a sensible proxy for this. \n:::\n\n\n::: {.callout-note collapse=\"true\"}\n## D Veen et al (2017)\n\n[link](https://www.researchgate.net/profile/Duco-Veen/publication/319662351_Using_the_Data_Agreement_Criterion_to_Rank_Experts'_Beliefs/links/5b73e2dc299bf14c6da6c663/Using-the-Data-Agreement-Criterion-to-Rank-Experts-Beliefs.pdf)\n\n... we show how experts can be ranked based on their knowledge and their level of (un)certainty. By letting experts specify their knowledge in the form of a probability distribution, we can assess how accurately they can predict new data, and how appropriate their level of (un)certainty is. The expert’s specified probability distribution can be seen as a prior in a Bayesian statistical setting. We evaluate these priors by extending an existing prior-data (dis)agreement measure, the Data Agreement Criterion, and compare this approach to using Bayes factors to assess prior specification. We compare experts with each other and the data to evaluate their appropriateness. Using this method, new research questions can be asked and answered, for instance: Which expert predicts the new data best? Is there agreement between my experts and the data? Which experts’ representation is more valid or useful? Can we reach convergence between expert judgement and data? We provided an empirical example ranking (regional) directors of a large financial institution based on their predictions of turnover. 
\n\nBe sure to consult the [correction made here](https://www.semanticscholar.org/paper/Correction%3A-Veen%2C-D.%3B-Stoel%2C-D.%3B-Schalken%2C-N.%3B-K.%3B-Veen-Stoel/a2882e0e8606ef876133f25a901771259e7033b1)\n\n::: \n\n\n::: {.callout-note collapse=\"true\"}\n## Also seems relevant:\n\nSee [Gsheet HERE](https://docs.google.com/spreadsheets/d/14japw6eLGpGjEWy1MjHNJXU1skZY_GAIc2uC2HIUalM/edit#gid=0), generated from an Elicit.org inquiry.\n\n\n::: \n\n\n\nIn spite of the caveats in the fold above, we construct some measures of aggregate beliefs using the `aggrecat` package. We will make (and explain) some ad-hoc choices here. We present these:\n\n1. For each paper\n2. For categories of papers and cross-paper categories of evaluations\n3. For the overall set of papers and evaluations\n\nWe can also hold onto these aggregated metrics for later use in modeling.\n\n\n- Simple averaging\n\n- Bayesian approaches \n\n- Best-performing approaches from elsewhere \n\n- Assumptions over unit-level random terms \n\n\n### Simple rating aggregation {-}\n\nBelow, we are preparing the data for the aggreCATS package.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# JB: This section is a work in progress, please do not edit\n\n# paper_ratings: one row per rating category and 'type' (score, upper, lower bound.)\nevals_pub %>% \n select(id, eval_name, paper_abbrev, \n overall, overall_lb_imp, overall_ub_imp,\n adv_knowledge, adv_knowledge_lb_imp, adv_knowledge_ub_imp,\n methods, methods_lb_imp, methods_ub_imp,\n logic_comms, logic_comms_lb_imp, logic_comms_ub_imp,\n real_world, real_world_lb_imp, real_world_ub_imp,\n gp_relevance, gp_relevance_lb_imp, gp_relevance_ub_imp,\n open_sci, open_sci_lb_imp, open_sci_ub_imp) %>% \n rename_with(function(x) paste0(x,\"_score\"), all_of(rating_cats)) %>%\n pivot_longer(cols = c(-id, -eval_name, -paper_abbrev),\n names_pattern = \"(.+)_(score|[ul]b_imp)\",\n names_to = c(\"criterion\",\"element\"),\n values_to = \"value\") -> paper_ratings\n\n# renaming to conform with aggreCATS expectations\npaper_ratings <- paper_ratings %>% \n rename(paper_id = paper_abbrev,\n user_name = eval_name) %>% \n mutate(round = \"round_1\",\n element = case_when(element == \"lb_imp\" ~ \"three_point_lower\",\n element == \"ub_imp\" ~ \"three_point_upper\",\n element == \"score\" ~ \"three_point_best\"))\n\n# filter only overall for now\npaper_ratings %>% \n filter(criterion == \"overall\") %>% \n group_by(user_name, paper_id) %>% \n filter(sum(is.na(value))==0) %>% \n ungroup() -> temp\n \n\nAverageWAgg(expert_judgements = temp, round_2_filter = FALSE, type = \"ArMean\")\n\nIntervalWAgg(expert_judgements = temp, round_2_filter = FALSE, type = \"IntWAgg\")\n\naggreCAT::DistributionWAgg(expert_judgements = temp, round_2_filter = FALSE, type = \"DistribArMean\", percent_toggle = T)\n\n# EXAMPLE CODE ===============================\n# data(data_ratings)\n# set.seed(1234)\n# \n# participant_subset <- data_ratings %>%\n# distinct(user_name) %>%\n# sample_n(5) %>%\n# mutate(participant_name = paste(\"participant\", rep(1:n())))\n# \n# single_claim <- data_ratings %>%\n# filter(paper_id == \"28\") %>%\n# right_join(participant_subset, by = \"user_name\") %>%\n# filter(grepl(x = element, pattern = \"three_.+\")) %>%\n# select(-group, -participant_name, -question)\n# \n# DistributionWAgg(expert_judgements = single_claim,\n# type = \"DistribArMean\", percent_toggle = T)\n# \n```\n:::\n\n\n\n\n\n### Explicit modeling of 'research quality' (for use in prizes, etc.) 
{-}\n\n- Use the above aggregation as the outcome of interest, or weight towards categories of greater interest?\n\n- Model with controls -- look for greatest positive residual? \n\n\n## Inter-rater reliability\n\n\n::: {.cell}\n::: {.cell-output-display}\n![](aggregation_files/figure-html/unnamed-chunk-1-1.png){width=672}\n:::\n:::\n\n\n\n\n\n\n## Decomposing variation, dimension reduction, simple linear models\n\n\n## Later possiblities\n\n- Relation to evaluation text content (NLP?)\n\n- Relation/prediction of later outcomes (traditional publication, citations, replication)\n", "supporting": [ "aggregation_files" ], diff --git a/_freeze/chapters/evaluation_data_analysis/execute-results/html.json b/_freeze/chapters/evaluation_data_analysis/execute-results/html.json index 8ab47bb..367f547 100644 --- a/_freeze/chapters/evaluation_data_analysis/execute-results/html.json +++ b/_freeze/chapters/evaluation_data_analysis/execute-results/html.json @@ -1,7 +1,7 @@ { - "hash": "248c42795c0e7b2fbe512822cb705803", + "hash": "2782bd0646e3cbf4457779179f0ba134", "result": { - "markdown": "# Evaluation data: description, exploration, checks\n\n## Data input, cleaning, feature construction and imputation \n\n\n::: {.cell}\n\n```{.r .cell-code code-summary=\"load packages\"}\nlibrary(tidyverse) \n\n# markdown et al. ----\nlibrary(knitr)\nlibrary(bookdown)\nlibrary(rmarkdown)\nlibrary(shiny)\nlibrary(quarto)\nlibrary(formattable) # Create 'Formattable' Data Structures\nlibrary(DT) # R interface to DataTables library (JavaScript)\n\n# dataviz ----\nlibrary(ggrepel)\nlibrary(plotly) # Create Interactive Web Graphics via 'plotly.js'\n\n# others ----\nlibrary(here) # A Simpler Way to Find Your Files\n# renv::install(packages = \"metamelb-repliCATS/aggreCAT\")\n#library(aggreCAT)\n\n# Make sure select is always the dplyr version\nselect <- dplyr::select \n\n# options\noptions(knitr.duplicate.label = \"allow\")\noptions(mc.cores = parallel::detectCores())\n```\n:::\n\n\n\n::: {.callout-note collapse=\"true\"}\n## Note on data input (10-Aug-23)\n\nBelow, the evaluation data is input from an Airtable, which itself was largely hand-input from evaluators' reports. As PubPub builds (target: end of Sept. 2023), this will allow us to include the ratings and predictions as structured data objects. We then plan to access and input this data *directly* from the PubPub (API?) into the present analysis. This will improve automation and limit the potential for data entry errors.\n\n::: \n\n\n::: {.cell}\n\n```{.r .cell-code code-summary=\"Input evaluation data\"}\nevals_pub <- readRDS(file = here(\"data\", \"evals.Rdata\"))\nall_papers_p <- readRDS(file = here(\"data\", \"all_papers_p.Rdata\"))\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code code-summary=\"Define lists of columns to use later\"}\n# Lists of categories\nrating_cats <- c(\"overall\", \"adv_knowledge\", \"methods\", \"logic_comms\", \"real_world\", \"gp_relevance\", \"open_sci\")\n\n#... 'predictions' are currently 1-5 (0-5?)\npred_cats <- c(\"journal_predict\", \"merits_journal\")\n```\n:::\n\n\n\n \n## Basic presentation\n\n### What sorts of papers/projects are we considering and evaluating? {-}\n\nIn this section, we give some simple data summaries and visualizations, for a broad description of The Unjournal's coverage. 
\n\nIn the interactive table below we give some key attributes of the papers and the evaluators.\n\n\n::: column-body-outset \n\n\n::: {.cell}\n\n```{.r .cell-code}\nevals_pub_df_overview <- evals_pub %>%\n arrange(paper_abbrev, eval_name) %>%\n dplyr::select(paper_abbrev, crucial_rsx, eval_name, cat_1, cat_2, source_main, author_agreement) %>%\n dplyr::select(-matches(\"ub_|lb_|conf\")) \n\nevals_pub_df_overview %>% \n rename(\n \"Paper Abbreviation\" = paper_abbrev,\n \"Paper name\" = crucial_rsx,\n \"Evaluator Name\" = eval_name,\n \"Main category\" = cat_1,\n \"Category 2\" = cat_2,\n \"Main source\" = source_main,\n \"Author contact\" = author_agreement,\n ) %>% \n DT::datatable(\n caption = \"Evaluations (confidence bounds not shown)\", \n filter = 'top',\n rownames= FALSE,\n options = list(pageLength = 5,\n columnDefs = list(list(width = '150px', targets = 1)))) %>% \n formatStyle(columns = 2:ncol(evals_pub_df_overview), \n textAlign = 'center') %>% \nformatStyle(\n \"Paper name\",\n fontSize = '10px'\n )\n```\n\n::: {.cell-output-display}\n```{=html}\n
\n\n```\n:::\n\n```{.r .cell-code}\nrm(evals_pub_df_overview)\n```\n:::\n\n\n:::\n\n\n\n#### Evaluation metrics (ratings) {-}\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrename_dtstuff <- function(df){\n df %>% \n rename(\n \"Paper Abbreviation\" = paper_abbrev,\n \"Evaluator Name\" = eval_name,\n \"Advancing knowledge\" = adv_knowledge,\n \"Methods\" = methods,\n \"Logic & comm.\" = logic_comms,\n \"Real world engagement\" = real_world,\n \"Global priorities relevance\" = gp_relevance,\n \"Open Science\" = open_sci\n )\n}\n```\n:::\n\n\n\n::: column-body-outset \n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Need to find a way to control column width but it seems to be a problem with DT\n# https://github.com/rstudio/DT/issues/29\n\n# we didn't seem to be using all_evals_dt so I removed it to increase readability\n\n\n\nevals_pub_df <- evals_pub %>%\n # Arrange data\n arrange(paper_abbrev, eval_name, overall) %>%\n \n # Select and rename columns\n dplyr::select(paper_abbrev, eval_name, all_of(rating_cats)) %>%\n rename_dtstuff \n\n\n(\n evals_pub_dt <- evals_pub_df %>% \n # Convert to a datatable and apply styling\n datatable(\n caption = \"Evaluations and predictions (confidence bounds not shown)\", \n filter = 'top',\n rownames = FALSE,\n options = list(pageLength = 5, \n columnDefs = list(list(width = '150px', targets = 0)))) %>% \n formatStyle(columns = 2:ncol(evals_pub_df), \n textAlign = 'center')\n)\n```\n\n::: {.cell-output-display}\n```{=html}\n
\n\n```\n:::\n:::\n\n:::\n\n\\\n\nNext, a preview of the evaluations, focusing on the 'middle ratings and predictions':\n\n::: column-body-outset \n\n\n::: {.cell}\n\n```{.r .cell-code code-summary=\"Data datable (all shareable relevant data)\"}\n# we didn't seem to be using all_evals_dt so I removed it to increase readability\n\n\nevals_pub %>%\n arrange(paper_abbrev, eval_name, overall) %>%\n dplyr::select(paper_abbrev, eval_name, all_of(rating_cats)) %>%\n rename_dtstuff %>% \n DT::datatable(\n caption = \"Evaluations and predictions (confidence bounds not shown)\", \n filter = 'top',\n rownames= FALSE,\n options = list(pageLength = 5,\n columnDefs = list(list(width = '150px', targets = 0))) \n\n )\n```\n\n::: {.cell-output-display}\n```{=html}\n
\n\n```\n:::\n:::\n\n:::\n\n\\ \n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# we did not seem to be using all_evals_dt_ci so I removed it to improve readability\nevals_pub %>%\n arrange(paper_abbrev, eval_name) %>%\n dplyr::select(paper_abbrev, eval_name, conf_overall, all_of(rating_cats), matches(\"ub_imp|lb_imp\")) %>%\n rename_dtstuff %>% \n DT::datatable(\n caption = \"Evaluations and (imputed*) confidence bounds)\", \n filter = 'top',\n rownames= FALSE,\n options = list(pageLength = 5)\n )\n```\n:::\n\n:::\n\n\n\n\n\n::: {.callout-note collapse=\"true\"}\n##### Next consider...\n\n- Composition of research evaluated\n - By field (economics, psychology, etc.)\n - By subfield of economics \n - By topic/cause area (Global health, economic development, impact of technology, global catastrophic risks, etc. )\n - By source (submitted, identified with author permission, direct evaluation)\n \n- Timing of intake and evaluation^[Consider: timing might be its own section or chapter; this is a major thing journals track, and we want to keep track of ourselves]\n\n:::\n\n#### Paper selection {-}\n\nThe Sankey diagram below starts with the papers we prioritized for likely *Unjournal* evaluation:^[Those marked as 'considering' in the Airtable].\n\n\n::: {.cell}\n\n```{.r .cell-code}\n#Add in the 3 different evaluation input sources\n#update to be automated rather than hard-coded - to look at David's work here\n\npapers_considered <- all_papers_p %>% \n nrow()\n\npapers_deprio <- all_papers_p %>% \n filter(`stage of process/todo` == \"de-prioritized\") %>% \n nrow()\n\npapers_evaluated <- all_papers_p %>% \n filter(`stage of process/todo` %in% c(\"published\",\n \"contacting/awaiting_authors_response_to_evaluation\",\n \"awaiting_publication_ME_comments\",\"awaiting_evaluations\")) %>% \n nrow()\n\npapers_complete <- all_papers_p %>% \n filter(`stage of process/todo` == \"published\") %>%\n nrow()\n\npapers_in_progress <- papers_evaluated - papers_complete\n\npapers_still_in_consideration <- all_papers_p %>% filter(`stage of process/todo` == \"considering\") %>% nrow()\n\n\n#todo: adjust wording of hover notes ('source, target...etc')\n\nfig <- plot_ly(\n type = \"sankey\",\n orientation = \"h\",\n \n node = list(\n label = c(\"Prioritized\", \"Evaluating\", \"Complete\", \"In progress\", \"Still in consideration\", \"De-prioritized\"),\n color = c(\"orange\", \"green\", \"green\", \"orange\", \"orange\", \"red\"),\n #Todo: adjust 'location' to group these left to right\n pad = 15,\n thickness = 20,\n line = list(\n color = \"black\",\n width = 0.5\n )\n ),\n \n link = list(\n source = c(0,1,1,0,0),\n target = c(1,2,3,4,5),\n value = c(\n papers_evaluated,\n papers_complete,\n papers_in_progress,\n papers_still_in_consideration,\n papers_deprio\n ))\n)\nfig <- fig %>% layout(\n title = \"Unjournal paper funnel\",\n font = list(\n size = 10\n )\n)\n\nfig \n```\n\n::: {.cell-output-display}\n```{=html}\n
\n\n```\n:::\n:::\n\n\n\nTodo: ^[Make interactive/dashboards of the elements below]\n\n#### Paper categories {-}\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nevals_pub %>% \n select(paper_abbrev, starts_with(\"cat_\")) %>%\n distinct() %>% \n pivot_longer(cols = starts_with(\"cat_\"), names_to = \"CatNum\", values_to = \"Category\") %>% \n group_by(CatNum, Category) %>% \n count() %>% \n filter(!is.na(Category)) %>% \n mutate(Category = str_to_title(Category),\n CatNum = ordered(CatNum, \n levels = c(\"cat_1\", \"cat_2\", \"cat_3\"),\n labels = c(\"Primary\", \"Secondary\", \"Tertiary\"))) %>%\n ggplot(aes(x = reorder(Category, -n), y = n)) +\n geom_bar(aes(fill = CatNum), stat = \"identity\", color = \"grey30\") + \n labs(x = \"Paper category\", y = \"Count\", fill = \"Cat Level\",\n title = \"Paper categories represented in pilot data\") +\n theme_bw() +\n facet_grid(~CatNum, scales=\"free_x\", space=\"free_x\") +\n theme(axis.text.x=element_text(angle=45,hjust=1)) +\n theme(legend.position = \"none\")\n```\n\n::: {.cell-output-display}\n![](evaluation_data_analysis_files/figure-html/all_categories-1.png){width=672}\n:::\n:::\n\n\n#### Paper source {-}\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Bar plot\nevals_pub %>% \n rowwise() %>% \n mutate(source_main = str_replace_all(string = source_main, \n pattern = \"-\", \n replace = \" \") %>% str_to_title()) %>%\n select(paper_abbrev, source_main) %>% \n distinct() %>%\n ggplot(aes(x = source_main)) + \n geom_bar(position = \"stack\", stat = \"count\", color = \"grey30\", fill = \"grey80\") +\n labs(x = \"Source\", y = \"Count\") +\n labs(title = \"Pool of research/evaluations by paper source\") +\n theme_bw() +\n theme(text = element_text(size = 15)) +\n scale_x_discrete(labels = function(x) str_wrap(x, width = 20))\n```\n\n::: {.cell-output-display}\n![](evaluation_data_analysis_files/figure-html/paper_source-1.png){width=672}\n:::\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\n# JB: Most of these should probably be cleaned in data storage\n\nlibrary(RColorBrewer) # for color palettes\n\n# paper statuses that are considered \"being evaluated\"\neval_true = c(\"published\", \n \"contacting/awaiting_authors_response_to_evaluation\",\n \"awaiting_publication_ME_comments\",\n \"awaiting_evaluations\")\n\n# Is the paper being evaluated? 
\nall_papers_p <- all_papers_p %>% \n mutate(is_evaluated = if_else(`stage of process/todo` %in% eval_true, TRUE, FALSE))\n\n# main source clean\nall_papers_p <- all_papers_p %>% \n mutate(source_main = case_when(source_main == \"NA\" ~ \"Not applicable\",\n source_main == \"internal-from-syllabus-agenda-policy-database\" ~ \"Internal: syllabus, agenda, etc.\",\n is.na(source_main) ~ \"Unknown\",\n TRUE ~ source_main))\n\nall_papers_p %>% \nggplot(aes(x = fct_infreq(source_main), fill = is_evaluated)) + \n geom_bar(position = \"stack\", stat = \"count\") +\n labs(x = \"Source\", y = \"Count\", fill = \"Selected for\\nevaluation?\") +\n coord_flip() + # flipping the coordinates to have categories on y-axis (on the left)\n labs(title = \"Evaluations by source of the paper\") +\n theme_bw() +\n theme(text = element_text(size = 15)) +\n scale_fill_brewer(palette = \"Set1\") +\n scale_x_discrete(labels = function(x) str_wrap(x, width = 20))\n```\n\n::: {.cell-output-display}\n![](evaluation_data_analysis_files/figure-html/data clean-1.png){width=672}\n:::\n:::\n\n\n\n## The distribution of ratings and predictions {-}\n\nNext, we present the ratings and predictions along with 'uncertainty measures'.^[We use \"ub imp\" (and \"lb imp\") to denote the upper and lower bounds given by evaluators.] Where evaluators gave only a 1-5 confidence level^[More or less, the ones who report a level for 'conf overall', although some people did this for some but not others], we use the imputations discussed and coded above. \n\n\n- For each category and prediction (overall and by paper)\n\n::: {.cell}\n\n```{.r .cell-code}\n# evals_pub %>% \n# select(matches(\"overall\")) %>% \n# view()\n```\n:::\n\n\n\n::: column-body-outset\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Generate a color palette with more colors\ncolor_count <- length(unique(evals_pub$paper_abbrev))\ncolor_palette <- colorRampPalette(brewer.pal(8, \"Set1\"))(color_count)\n\n# set one \"set\" of dodge width values across layers\npd = position_dodge(width = 0.8)\n\n# Dot plot\ng1 <- evals_pub %>% \n ggplot(aes(x = paper_abbrev, y = overall, \n text = paste0('Evaluator: ', eval_name, # tooltip data\n '
Rating [CI]: ', overall, \" [\", overall_lb_imp, \", \", overall_ub_imp, \"]\"))) +\n geom_point(aes(color = paper_abbrev), \n stat = \"identity\", size = 2, shape = 18, stroke = 1, \n position = pd) +\n geom_linerange(aes(ymin = overall_lb_imp, ymax = overall_ub_imp, color = paper_abbrev), \n position = pd) +\n geom_text(data = subset(evals_pub, str_detect(eval_name, \"Anonymous\")),\n aes(label = \"anon.\"), size=3) +\n coord_flip() + # flipping the coordinates to have categories on y-axis (on the left)\n labs(x = \"Paper\", y = \"Overall score\",\n title = \"Overall scores of evaluated papers\") +\n theme_bw() +\n theme(text = element_text(size = 15)) +\n theme(legend.position = \"none\") +\n scale_x_discrete(labels = function(x) str_wrap(x, width = 20)) + \n scale_color_manual(values = color_palette)\n\n\nggplotly(g1, tooltip = c(\"text\"))\n```\n\n::: {.cell-output-display}\n```{=html}\n
\n\n```\n:::\n:::\n\n:::\n\nIn future, we aim to build a dashboard allowing people to use the complete set of ratings and predictions, and choose their own weightings. (Also incorporating the evaluator uncertainty in reasonable ways.)\n\n### Shiny dashboard {-}\n\n::: column-body-outset\n\n\n```{=html}\n\n\n\n```\n\n\n:::\n\n::: {.callout-note collapse=\"true\"}\n## Future vis\n\nSpider or radial chart \n\nEach rating is a dimension or attribute (potentially normalized)\npotentially superimpose a 'circle' for the suggested weighting or overall. \n\nEach paper gets its own spider, with all others (or the average) in faded color behind it as a comparator. \n\nIdeally user can switch on/off \n\nBeware -- people infer things from the shape's size\n\n::: \n\n\n::: column-body-outset\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# JB: what is the purpose of this table? It's very large and I'm not totally\n# sure what it's doing so I'm just turning it off for now\nunit.scale = function(x) (x*100 - min(x*100)) / (max(x*100) - min(x*100))\n\nevaluations_table <- evals_pub %>%\n select(paper_abbrev, eval_name, cat_1, \n source_main, overall, adv_knowledge,\n methods, logic_comms, journal_predict) %>%\n arrange(desc(paper_abbrev))\n\nformattable(\n evaluations_table,\n list(\n #area(col = 5:8) ~ function(x) percent(x / 100, digits = 0),\n area(col = 5:8) ~ color_tile(\"#FA614B66\",\"#3E7DCC\"),\n `journal_predict` = proportion_bar(\"#DeF7E9\", unit.scale)\n )\n)\n```\n:::\n\n:::\n\n\n### Sources of variation {-}\n\nNext, look for systematic variation in the ratings \n\n- By field and topic area of paper\n\n- By submission/selection route\n\n- By evaluation manager (or their seniority, or whether they are US/Commonwealth/Other)^[DR: My theory is that people in commonwealth countries target a 70+ as 'strong' (because of their marking system) and that may drive a bias.]\n\n... perhaps building a model of this. We are looking for systematic 'biases and trends', loosely speaking, to help us better understand how our evaluation system is working.\n\n\\\n\n\n### Relationship among the ratings (and predictions) {-} \n\n::: {.callout-note collapse=\"true\"}\n## Next steps (suggested analyses)\n\n- Correlation matrix\n\n- ANOVA\n\n- PCA (Principle components)\n\n- With other 'control' factors?\n\n- How do the specific measures predict the aggregate ones (overall rating, merited publication)\n - CF 'our suggested weighting'\n\n::: \n\n\nNext chapter (analysis): *aggregation of evaluator judgment*\n\n\n::: {.callout-note collapse=\"true\"}\n## Scoping our future coverage\n\nWe have funding to evaluate roughly 50-70 papers/projects per year, given our proposed incentives.\n\nConsider:\n\n- How many relevant NBER papers come out per year?\n\n- How much relevant work in other prestige archives?\n\n- What quotas do we want (by cause, etc.) and how feasible are these?\n\n::: \n\n", + "markdown": "# Evaluation data: description, exploration, checks\n\n## Data input, cleaning, feature construction and imputation \n\n\n::: {.cell}\n\n```{.r .cell-code code-summary=\"load packages\"}\nlibrary(tidyverse) \n\n# markdown et al. 
----\nlibrary(knitr)\nlibrary(bookdown)\nlibrary(rmarkdown)\nlibrary(shiny)\nlibrary(quarto)\nlibrary(formattable) # Create 'Formattable' Data Structures\nlibrary(DT) # R interface to DataTables library (JavaScript)\n\n# dataviz ----\nlibrary(ggrepel)\nlibrary(plotly) # Create Interactive Web Graphics via 'plotly.js'\n\n# others ----\nlibrary(here) # A Simpler Way to Find Your Files\n# renv::install(packages = \"metamelb-repliCATS/aggreCAT\")\n#library(aggreCAT)\n\n# Make sure select is always the dplyr version\nselect <- dplyr::select \n\n# options\noptions(knitr.duplicate.label = \"allow\")\noptions(mc.cores = parallel::detectCores())\n```\n:::\n\n\n\n::: {.callout-note collapse=\"true\"}\n## Note on data input (10-Aug-23)\n\nBelow, the evaluation data is input from an Airtable, which itself was largely hand-input from evaluators' reports. As PubPub builds (target: end of Sept. 2023), this will allow us to include the ratings and predictions as structured data objects. We then plan to access and input this data *directly* from the PubPub (API?) into the present analysis. This will improve automation and limit the potential for data entry errors.\n\n::: \n\n\n::: {.cell}\n\n```{.r .cell-code code-summary=\"Input evaluation data\"}\nevals_pub <- readRDS(file = here(\"data\", \"evals.Rdata\"))\nall_papers_p <- readRDS(file = here(\"data\", \"all_papers_p.Rdata\"))\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code code-summary=\"Define lists of columns to use later\"}\n# Lists of categories\nrating_cats <- c(\"overall\", \"adv_knowledge\", \"methods\", \"logic_comms\", \"real_world\", \"gp_relevance\", \"open_sci\")\n\n#... 'predictions' are currently 1-5 (0-5?)\npred_cats <- c(\"journal_predict\", \"merits_journal\")\n```\n:::\n\n\n\n \n## Basic presentation\n\n### What sorts of papers/projects are we considering and evaluating? {-}\n\nIn this section, we give some simple data summaries and visualizations, for a broad description of The Unjournal's coverage. \n\nIn the interactive table below we give some key attributes of the papers and the evaluators.\n\n\n::: column-body-outset \n\n\n::: {.cell}\n\n```{.r .cell-code}\nevals_pub_df_overview <- evals_pub %>%\n arrange(paper_abbrev, eval_name) %>%\n dplyr::select(paper_abbrev, crucial_rsx, eval_name, cat_1, cat_2, source_main, author_agreement) %>%\n dplyr::select(-matches(\"ub_|lb_|conf\")) \n\nevals_pub_df_overview %>% \n rename(\n \"Paper Abbreviation\" = paper_abbrev,\n \"Paper name\" = crucial_rsx,\n \"Evaluator Name\" = eval_name,\n \"Main category\" = cat_1,\n \"Category 2\" = cat_2,\n \"Main source\" = source_main,\n \"Author contact\" = author_agreement,\n ) %>% \n DT::datatable(\n caption = \"Evaluations (confidence bounds not shown)\", \n filter = 'top',\n rownames= FALSE,\n options = list(pageLength = 5,\n columnDefs = list(list(width = '150px', targets = 1)))) %>% \n formatStyle(columns = 2:ncol(evals_pub_df_overview), \n textAlign = 'center') %>% \nformatStyle(\n \"Paper name\",\n fontSize = '10px'\n )\n```\n\n::: {.cell-output-display}\n```{=html}\n
\n\n```\n:::\n\n```{.r .cell-code}\nrm(evals_pub_df_overview)\n```\n:::\n\n\n:::\n\n\n\n#### Evaluation metrics (ratings) {-}\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrename_dtstuff <- function(df){\n df %>% \n rename(\n \"Paper Abbreviation\" = paper_abbrev,\n \"Evaluator Name\" = eval_name,\n \"Advancing knowledge\" = adv_knowledge,\n \"Methods\" = methods,\n \"Logic & comm.\" = logic_comms,\n \"Real world engagement\" = real_world,\n \"Global priorities relevance\" = gp_relevance,\n \"Open Science\" = open_sci\n )\n}\n```\n:::\n\n\n\n::: column-body-outset \n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Need to find a way to control column width but it seems to be a problem with DT\n# https://github.com/rstudio/DT/issues/29\n\n\nevals_pub_df <- evals_pub %>%\n # Arrange data\n arrange(paper_abbrev, eval_name, overall) %>%\n \n # Select and rename columns\n dplyr::select(paper_abbrev, eval_name, all_of(rating_cats)) %>%\n rename_dtstuff \n\n\n(\n evals_pub_dt <- evals_pub_df %>% \n # Convert to a datatable and apply styling\n datatable(\n caption = \"Evaluations and predictions (confidence bounds not shown)\", \n filter = 'top',\n rownames = FALSE,\n options = list(pageLength = 5, \n columnDefs = list(list(width = '150px', targets = 0)))) %>% \n formatStyle(columns = 2:ncol(evals_pub_df), \n textAlign = 'center')\n)\n```\n\n::: {.cell-output-display}\n```{=html}\n
\n\n```\n:::\n:::\n\n:::\n\n\\\n\nNext, a preview of the evaluations, focusing on the 'middle ratings and predictions':\n\n::: column-body-outset \n\n\n::: {.cell}\n\n```{.r .cell-code code-summary=\"Data datable (all shareable relevant data)\"}\n# we didn't seem to be using all_evals_dt so I removed it to increase readability\n\n\nevals_pub %>%\n arrange(paper_abbrev, eval_name, overall) %>%\n dplyr::select(paper_abbrev, eval_name, all_of(rating_cats)) %>%\n rename_dtstuff %>% \n DT::datatable(\n caption = \"Evaluations and predictions (confidence bounds not shown)\", \n filter = 'top',\n rownames= FALSE,\n options = list(pageLength = 5,\n columnDefs = list(list(width = '150px', targets = 0))) \n\n )\n```\n\n::: {.cell-output-display}\n```{=html}\n
\n\n```\n:::\n:::\n\n:::\n\n\\ \n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# we did not seem to be using all_evals_dt_ci so I removed it to improve readability\nevals_pub %>%\n arrange(paper_abbrev, eval_name) %>%\n dplyr::select(paper_abbrev, eval_name, conf_overall, all_of(rating_cats), matches(\"ub_imp|lb_imp\")) %>%\n rename_dtstuff %>% \n DT::datatable(\n caption = \"Evaluations and (imputed*) confidence bounds)\", \n filter = 'top',\n rownames= FALSE,\n options = list(pageLength = 5)\n )\n```\n:::\n\n:::\n\n\n\n\n\n::: {.callout-note collapse=\"true\"}\n##### Next consider...\n\n- Composition of research evaluated\n - By field (economics, psychology, etc.)\n - By subfield of economics \n - By topic/cause area (Global health, economic development, impact of technology, global catastrophic risks, etc. )\n - By source (submitted, identified with author permission, direct evaluation)\n \n- Timing of intake and evaluation^[Consider: timing might be its own section or chapter; this is a major thing journals track, and we want to keep track of ourselves]\n\n:::\n\n#### Paper selection {-}\n\nThe Sankey diagram below starts with the papers we prioritized for likely *Unjournal* evaluation:^[Those marked as 'considering' in the Airtable].\n\n\n::: {.cell}\n\n```{.r .cell-code}\n#Add in the 3 different evaluation input sources\n#update to be automated rather than hard-coded - to look at David's work here\n\npapers_considered <- all_papers_p %>% \n nrow()\n\npapers_deprio <- all_papers_p %>% \n filter(`stage of process/todo` == \"de-prioritized\") %>% \n nrow()\n\npapers_evaluated <- all_papers_p %>% \n filter(`stage of process/todo` %in% c(\"published\",\n \"contacting/awaiting_authors_response_to_evaluation\",\n \"awaiting_publication_ME_comments\",\"awaiting_evaluations\")) %>% \n nrow()\n\npapers_complete <- all_papers_p %>% \n filter(`stage of process/todo` == \"published\") %>%\n nrow()\n\npapers_in_progress <- papers_evaluated - papers_complete\n\npapers_still_in_consideration <- all_papers_p %>% filter(`stage of process/todo` == \"considering\") %>% nrow()\n\n\n#todo: adjust wording of hover notes ('source, target...etc')\n\nfig <- plot_ly(\n type = \"sankey\",\n orientation = \"h\",\n \n node = list(\n label = c(\"Prioritized\", \"Evaluating\", \"Complete\", \"In progress\", \"Still in consideration\", \"De-prioritized\"),\n color = c(\"orange\", \"green\", \"green\", \"orange\", \"orange\", \"red\"),\n #Todo: adjust 'location' to group these left to right\n pad = 15,\n thickness = 20,\n line = list(\n color = \"black\",\n width = 0.5\n )\n ),\n \n link = list(\n source = c(0,1,1,0,0),\n target = c(1,2,3,4,5),\n value = c(\n papers_evaluated,\n papers_complete,\n papers_in_progress,\n papers_still_in_consideration,\n papers_deprio\n ))\n)\nfig <- fig %>% layout(\n title = \"Unjournal paper funnel\",\n font = list(\n size = 10\n )\n)\n\nfig \n```\n\n::: {.cell-output-display}\n```{=html}\n
\n\n```\n:::\n:::\n\n\n\nTodo: ^[Make interactive/dashboards of the elements below]\n\n#### Paper categories {-}\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nevals_pub %>% \n select(paper_abbrev, starts_with(\"cat_\")) %>%\n distinct() %>% \n pivot_longer(cols = starts_with(\"cat_\"), names_to = \"CatNum\", values_to = \"Category\") %>% \n group_by(CatNum, Category) %>% \n count() %>% \n filter(!is.na(Category)) %>% \n mutate(Category = str_to_title(Category),\n CatNum = ordered(CatNum, \n levels = c(\"cat_1\", \"cat_2\", \"cat_3\"),\n labels = c(\"Primary\", \"Secondary\", \"Tertiary\"))) %>%\n ggplot(aes(x = reorder(Category, -n), y = n)) +\n geom_bar(aes(fill = CatNum), stat = \"identity\", color = \"grey30\") + \n labs(x = \"Paper category\", y = \"Count\", fill = \"Cat Level\",\n title = \"Paper categories represented in pilot data\") +\n theme_bw() +\n facet_grid(~CatNum, scales=\"free_x\", space=\"free_x\") +\n theme(axis.text.x=element_text(angle=45,hjust=1)) +\n theme(legend.position = \"none\")\n```\n\n::: {.cell-output-display}\n![](evaluation_data_analysis_files/figure-html/all_categories-1.png){width=672}\n:::\n:::\n\n\n#### Paper source {-}\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Bar plot\nevals_pub %>% \n rowwise() %>% \n mutate(source_main = str_replace_all(string = source_main, \n pattern = \"-\", \n replace = \" \") %>% str_to_title()) %>%\n select(paper_abbrev, source_main) %>% \n distinct() %>%\n ggplot(aes(x = source_main)) + \n geom_bar(position = \"stack\", stat = \"count\", color = \"grey30\", fill = \"grey80\") +\n labs(x = \"Source\", y = \"Count\") +\n labs(title = \"Pool of research/evaluations by paper source\") +\n theme_bw() +\n theme(text = element_text(size = 15)) +\n scale_x_discrete(labels = function(x) str_wrap(x, width = 20))\n```\n\n::: {.cell-output-display}\n![](evaluation_data_analysis_files/figure-html/paper_source-1.png){width=672}\n:::\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\n# JB: Most of these should probably be cleaned in data storage\n\nlibrary(RColorBrewer) # for color palettes\n\n# paper statuses that are considered \"being evaluated\"\neval_true = c(\"published\", \n \"contacting/awaiting_authors_response_to_evaluation\",\n \"awaiting_publication_ME_comments\",\n \"awaiting_evaluations\")\n\n# Is the paper being evaluated? 
\nall_papers_p <- all_papers_p %>% \n mutate(is_evaluated = if_else(`stage of process/todo` %in% eval_true, TRUE, FALSE))\n\n# main source clean\nall_papers_p <- all_papers_p %>% \n mutate(source_main = case_when(source_main == \"NA\" ~ \"Not applicable\",\n source_main == \"internal-from-syllabus-agenda-policy-database\" ~ \"Internal: syllabus, agenda, etc.\",\n is.na(source_main) ~ \"Unknown\",\n TRUE ~ source_main))\n\nall_papers_p %>% \nggplot(aes(x = fct_infreq(source_main), fill = is_evaluated)) + \n geom_bar(position = \"stack\", stat = \"count\") +\n labs(x = \"Source\", y = \"Count\", fill = \"Selected for\\nevaluation?\") +\n coord_flip() + # flipping the coordinates to have categories on y-axis (on the left)\n labs(title = \"Evaluations by source of the paper\") +\n theme_bw() +\n theme(text = element_text(size = 15)) +\n scale_fill_brewer(palette = \"Set1\") +\n scale_x_discrete(labels = function(x) str_wrap(x, width = 20))\n```\n\n::: {.cell-output-display}\n![](evaluation_data_analysis_files/figure-html/data clean-1.png){width=672}\n:::\n:::\n\n\n\n## The distribution of ratings and predictions {-}\n\nNext, we present the ratings and predictions along with 'uncertainty measures'.^[We use \"ub imp\" (and \"lb imp\") to denote the upper and lower bounds given by evaluators.] Where evaluators gave only a 1-5 confidence level^[More or less, the ones who report a level for 'conf overall', although some people did this for some but not others], we use the imputations discussed and coded above. \n\n\n- For each category and prediction (overall and by paper)\n\n::: {.cell}\n\n```{.r .cell-code}\n# evals_pub %>% \n# select(matches(\"overall\")) %>% \n# view()\n```\n:::\n\n\n\n::: column-body-outset\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Generate a color palette with more colors\ncolor_count <- length(unique(evals_pub$paper_abbrev))\ncolor_palette <- colorRampPalette(brewer.pal(8, \"Set1\"))(color_count)\n\n# set one \"set\" of dodge width values across layers\npd = position_dodge(width = 0.8)\n\n# Dot plot\ng1 <- evals_pub %>% \n ggplot(aes(x = paper_abbrev, y = overall, \n text = paste0('Evaluator: ', eval_name, # tooltip data\n '
Rating [CI]: ', overall, \" [\", overall_lb_imp, \", \", overall_ub_imp, \"]\"))) +\n geom_point(aes(color = paper_abbrev), \n stat = \"identity\", size = 2, shape = 18, stroke = 1, \n position = pd) +\n geom_linerange(aes(ymin = overall_lb_imp, ymax = overall_ub_imp, color = paper_abbrev), \n position = pd) +\n geom_text(data = subset(evals_pub, str_detect(eval_name, \"Anonymous\")),\n aes(label = \"anon.\"), size=3) +\n coord_flip() + # flipping the coordinates to have categories on y-axis (on the left)\n labs(x = \"Paper\", y = \"Overall score\",\n title = \"Overall scores of evaluated papers\") +\n theme_bw() +\n theme(text = element_text(size = 15)) +\n theme(legend.position = \"none\") +\n scale_x_discrete(labels = function(x) str_wrap(x, width = 20)) + \n scale_color_manual(values = color_palette)\n\n\nggplotly(g1, tooltip = c(\"text\"))\n```\n\n::: {.cell-output-display}\n```{=html}\n
\n\n```\n:::\n:::\n\n:::\n\nIn future, we aim to build a dashboard allowing people to use the complete set of ratings and predictions, and choose their own weightings. (Also incorporating the evaluator uncertainty in reasonable ways.)\n\n### Shiny dashboard {-}\n\n::: column-body-outset\n\n\n```{=html}\n\n\n\n```\n\n\n:::\n\n::: {.callout-note collapse=\"true\"}\n## Future vis\n\nSpider or radial chart \n\nEach rating is a dimension or attribute (potentially normalized)\npotentially superimpose a 'circle' for the suggested weighting or overall. \n\nEach paper gets its own spider, with all others (or the average) in faded color behind it as a comparator. \n\nIdeally user can switch on/off \n\nBeware -- people infer things from the shape's size\n\n::: \n\n\n::: column-body-outset\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# JB: what is the purpose of this table? It's very large and I'm not totally\n# sure what it's doing so I'm just turning it off for now\nunit.scale = function(x) (x*100 - min(x*100)) / (max(x*100) - min(x*100))\n\nevaluations_table <- evals_pub %>%\n select(paper_abbrev, eval_name, cat_1, \n source_main, overall, adv_knowledge,\n methods, logic_comms, journal_predict) %>%\n arrange(desc(paper_abbrev))\n\nformattable(\n evaluations_table,\n list(\n #area(col = 5:8) ~ function(x) percent(x / 100, digits = 0),\n area(col = 5:8) ~ color_tile(\"#FA614B66\",\"#3E7DCC\"),\n `journal_predict` = proportion_bar(\"#DeF7E9\", unit.scale)\n )\n)\n```\n:::\n\n:::\n\n\n### Sources of variation {-}\n\nNext, look for systematic variation in the ratings \n\n- By field and topic area of paper\n\n- By submission/selection route\n\n- By evaluation manager (or their seniority, or whether they are US/Commonwealth/Other)^[DR: My theory is that people in commonwealth countries target a 70+ as 'strong' (because of their marking system) and that may drive a bias.]\n\n... perhaps building a model of this. We are looking for systematic 'biases and trends', loosely speaking, to help us better understand how our evaluation system is working.\n\n\\\n\n\n### Relationship among the ratings (and predictions) {-} \n\n::: {.callout-note collapse=\"true\"}\n## Next steps (suggested analyses)\n\n- Correlation matrix\n\n- ANOVA\n\n- PCA (Principle components)\n\n- With other 'control' factors?\n\n- How do the specific measures predict the aggregate ones (overall rating, merited publication)\n - CF 'our suggested weighting'\n\n::: \n\n\nNext chapter (analysis): *aggregation of evaluator judgment*\n\n\n::: {.callout-note collapse=\"true\"}\n## Scoping our future coverage\n\nWe have funding to evaluate roughly 50-70 papers/projects per year, given our proposed incentives.\n\nConsider:\n\n- How many relevant NBER papers come out per year?\n\n- How much relevant work in other prestige archives?\n\n- What quotas do we want (by cause, etc.) 
and how feasible are these?\n\n::: \n\n", "supporting": [ "evaluation_data_analysis_files" ], diff --git a/chapters/aggregation.qmd b/chapters/aggregation.qmd index 5b68852..8eae612 100644 --- a/chapters/aggregation.qmd +++ b/chapters/aggregation.qmd @@ -6,7 +6,7 @@ #| include: false library(tidyverse) -library(aggreCAT) +#library(aggreCAT) library(here) library(irr) @@ -204,6 +204,14 @@ evals_pub %>% ``` + + ## Decomposing variation, dimension reduction, simple linear models diff --git a/data/all_papers_p.Rdata b/data/all_papers_p.Rdata index 345e63b..2665dcc 100644 Binary files a/data/all_papers_p.Rdata and b/data/all_papers_p.Rdata differ diff --git a/data/all_papers_p.csv b/data/all_papers_p.csv index 4ece07b..c13e3e4 100644 --- a/data/all_papers_p.csv +++ b/data/all_papers_p.csv @@ -20,11 +20,11 @@ rec8CVePFXLK7bxWn,,0.53,NA,0.8,,"""Innovation, meta-science, and research""",NA, rec8MONL7xFGq2BaM,,0.7,0.58,0.9,,Economic development & governance (LMICs),"""Other: Economics, growth, policy, global markets and population ""","Development, Development and Growth, Regional Economics, Regional and Urban Economics, Migration",Emmanuel Orkoh,Unpublished working paper,Follow-up email sent,seeking_(more)_evaluators,internal-NBER,not needed (Unjournal Direct),NA,2022-11-05T15:57:00.000Z rec99VmJ2A7naPxuc,,0.95,NA,0.85,,"""Global health; """"Health & well-being in low-income countries""""""",NA,Health,Ryan Briggs ,NA,NA,seeking_(more)_evaluators,suggested - externally - NGO,not needed (Unjournal Direct),NA,2023-06-21T13:57:12.000Z rec9sED8IjDgjICxH,,0.65,NA,0.65,,Empirical methods,"""Catastrophic and existential risks, the long-term future, forecasting""",Econometrics,NA,NA,NA,NA,internal-from-syllabus-agenda-policy-database,NA,NA,2022-04-26T20:23:52.000Z -recAIY1CCAN8PB417,,NA,NA,NA,,NA,NA,NA,NA,NA,NA,Not a paper/project,NA,NA,NA,2023-07-31T16:18:12.000Z recBdPi9jTrdn6xhs,,0.65,NA,0.55,,"""Global health; """"Health & well-being in low-income countries""""""",NA,NA,NA,NA,Emailed,considering,suggested - internally,NA,NA,2022-09-24T15:18:34.000Z recC20N4daHXFNmJJ,,0.6,NA,NA,,"""Communicable diseases, bio-security and pandemic preparedness, biological risks""",NA,"""Health, Education, and Welfare"", Health","David Reinstein, Sam Abbott",Unpublished working paper,NA,considering,internal-NBER,NA,NA,2022-11-05T22:40:56.000Z recCblJLRgWmhYBcO,,0.52,NA,NA,,Economic development & governance (LMICs),NA,NA,NA,NA,NA,NA,internal-NBER,NA,NA,2023-06-08T23:29:52.000Z recDKf292flMuBf7b,,0.57,NA,NA,,"""Catastrophic and existential risks, the long-term future, forecasting""",Economic development & governance (LMICs),NA,NA,NA,NA,NA,internal-NBER,NA,NA,2023-06-08T23:34:05.000Z +recDx0VLZQq5nckAO,,0.6,NA,NA,,"""Catastrophic and existential risks, the long-term future, forecasting""",Empirical methods,meta-analysis,NA,Unpublished,NA,NA,suggested - externally,NA,NA,2023-09-11T19:50:14.000Z recEZssfl3wF37J1T,,NA,NA,NA,,NA,NA,NA,NA,NA,NA,NA,suggested - externally - NGO,NA,NA,2023-07-31T16:18:12.000Z recEiYGtyDewEDl9T,,0.55,NA,0.8,,Emerging technologies: social and economic impacts (focus: AI),NA,"Development and Growth, Innovation and R&D, Economic Systems, Industrial Organization",Kris Gulati,Unpublished working paper,NA,NA,internal-NBER,NA,NA,2022-11-05T22:16:13.000Z recFauZqPDBMVK28J,,NA,NA,NA,,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,2023-07-31T16:18:12.000Z @@ -119,7 +119,6 @@ recsiQaf3ZTSkkXLV,,0.63,NA,NA,,"""Global health; """"Health & well-being in low- recsii9l3QRQFerkU,,1,0.8,1,,"""Catastrophic and existential risks, the 
long-term future, forecasting""",Emerging technologies: social and economic impacts (focus: AI),NA,NA,"Published, ? journal",Agreed,published,submitted,not needed (submitted by authors),NA,2022-05-08T03:58:56.000Z recsxlSRIz4Y1RHd3,,NA,NA,NA,,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,2023-07-31T16:18:12.000Z rect8c6gbgVnvz6Zt,,NA,NA,NA,,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,2023-08-14T19:32:14.000Z -rectZXnEtaDibizPe,,NA,NA,NA,,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,2023-08-22T22:03:51.000Z rectfSMcCGKrVVtuw,,0.63,NA,NA,,"""Global health; """"Health & well-being in low-income countries""""""",Economic development & governance (LMICs),"Public Economics, ""Health, Education, and Welfare"", Poverty and Wellbeing, Labor Economics, Demography and Aging, Labor Supply and Demand, Development and Growth, Development","Hansika Kapoor, Anirudh Tagat",NA,Emailed,published,internal-NBER,NA,NA,2022-11-23T01:51:58.000Z rectim9KLJ6yQ1Goa,,0.56,NA,0.56,,"""Catastrophic and existential risks, the long-term future, forecasting""",NA,NA,NA,"Published, ~top journal",NA,NA,internal-from-syllabus-agenda-policy-database,NA,NA,2022-04-15T15:57:47.000Z rectraxzNjDb0cDbU,,NA,NA,NA,,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,2023-08-22T20:29:40.000Z diff --git a/data/evals.Rdata b/data/evals.Rdata index c20ac1a..af8b687 100644 Binary files a/data/evals.Rdata and b/data/evals.Rdata differ diff --git a/data/evals.csv b/data/evals.csv index e07883a..3cf0a56 100644 --- a/data/evals.csv +++ b/data/evals.csv @@ -11,7 +11,7 @@ recMAQaYQFRL0DQJB,"""The Environmental Effects of Economic Production: Evidence recNMQY75RCZdIcyG,"Kremer, M., Levin, J. and Snyder, C.M., 2020, May. Advance Market Commitments: Insights from Theory and Experience. In AEA Papers and Proceedings (Vol. 110, pp. 269-73).",Advance market commit. (vaccines),Dan Tortorice,policy,economics,biorisk,internal-from-syllabus-agenda-policy-database,Agreed,80,90,80,80,NA,95,4,90,4,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,4,5,4,4,NA,5,5,3,5,72,86,72,72,NA,91,3.8,75,3.8,88,94,88,88,NA,99,4.2,100,4.2 recOV0UgplEXwJf81,"Aghion, P., Jones, B.F., and Jones, C.I., 2017. Artificial Intelligence and Economic Growth",AI and econ. 
growth,Seth Benzell,macroeconomics,Artificial intelligence,prominent,internal-from-syllabus-agenda-policy-database,Agreed,80,75,80,70,NA,90,NA,95,4,90,85,85,80,NA,100,NA,100,5,70,65,75,60,NA,85,NA,90,3.5,NA,NA,NA,NA,NA,NA,NA,NA,NA,70,65,75,60,NA,85,NA,90,3.5,90,85,85,80,NA,100,NA,100,5
 recXaqPPgA911nuoY,Banning wildlife trade can boost demand for unregulated threatened species,Banning wildlife trade can boost demand,Liew Jia Huan,conservation,biodiversity,NA,submitted,Agreed,75,80,50,70,90,65,2.5,50,3,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,4,4,2,5,4,3,5,5,5,67,72,25,66,82,50,2.3,46,2.8,83,88,75,74,98,80,2.7,54,3.2
-recYb2JcJlGrHoI2H,The Governance Of Non-Profits And Their Social Impact: Evidence From A Randomized Program In Healthcare In DRC,Nonprofit Govc.: Randomized healthcare DRC,Wayne Aaron Sandholtz,GH&D,NA,NA,internal-NBER,Emailed,65,70,60,55,55,80,3.6,45,3.8,74,75,70,65,75,90,4,60,4.1,55,55,55,50,45,70,2.8,30,3,NA,NA,NA,NA,NA,NA,NA,NA,NA,55,55,55,50,45,70,2.8,30,3,74,75,70,65,75,90,4,60,4.1
+recYb2JcJlGrHoI2H,The Governance Of Non-Profits And Their Social Impact: Evidence From A Randomized Program In Healthcare In DRC,Nonprofit Govc.: Randomized healthcare DRC,Anonymous_12,GH&D,NA,NA,internal-NBER,Emailed,65,70,60,55,55,80,3.6,45,3.8,74,75,70,65,75,90,4,60,4.1,55,55,55,50,45,70,2.8,30,3,NA,NA,NA,NA,NA,NA,NA,NA,NA,55,55,55,50,45,70,2.8,30,3,74,75,70,65,75,90,4,60,4.1
 recbXm55IKEWH4DAM,"Aghion, P., Jones, B.F., and Jones, C.I., 2017. Artificial Intelligence and Economic Growth ",AI and econ. growth,Phil Trammel,macroeconomics,Artificial intelligence,prominent,internal-from-syllabus-agenda-policy-database,Agreed,92,97,70,45,NA,92,3.5,80,5,100,100,90,70,NA,100,NA,NA,NA,80,80,40,30,NA,80,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,4,80,80,40,30,NA,80,NA,42.5,4.6,100,100,90,70,NA,100,NA,100,5
 recf9O8DFGO98TPWk,"Mental Health Therapy as a Core Strategy for Increasing Human Capital: Evidence from Ghana (renamed ""Cognitive Behavioral Therapy among Ghana's Rural Poor Is Effective Regardless of Baseline Mental Distress"")","CBT Human K, Ghana",Anonymous_14,GH&D,NA,NA,internal-NBER,Emailed,75,60,90,70,50,50,4,90,4,84,65,94,82,52,60,NA,95,NA,70,55,82,62,48,40,NA,80,NA,NA,NA,NA,NA,NA,NA,4,NA,4,70,55,82,62,48,40,3.6,80,3.6,84,65,94,82,52,60,4.4,95,4.4
 rechZ3KqBeCecnHOU,"When Celebrities Speak: A Nationwide Twitter Experiment Promoting Vaccination In Indonesia (Alatas et al, 2019/2021)","Celeb. Twitter promo, Indonesia vacc.",Anirugh Tagat,GH&D,biorisk,NA,internal-NBER,Emailed,85,90,80,85,100,100,4,80,5,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,4,4,3,4,4,5,5,3,5,77,82,65,77,92,96,3.8,65,4.8,93,98,95,93,100,100,4.2,95,5
diff --git a/data/evals_long.csv b/data/evals_long.csv
index b3d71a0..ae5a937 100644
--- a/data/evals_long.csv
+++ b/data/evals_long.csv
@@ -107,15 +107,15 @@ recXaqPPgA911nuoY,Banning wildlife trade can boost demand for unregulated threat
 recXaqPPgA911nuoY,Banning wildlife trade can boost demand for unregulated threatened species,Banning wildlife trade can boost demand,Liew Jia Huan,conservation,biodiversity,NA,submitted,Agreed,journal_predict,2.5,NA,NA,5,2.3,2.7
 recXaqPPgA911nuoY,Banning wildlife trade can boost demand for unregulated threatened species,Banning wildlife trade can boost demand,Liew Jia Huan,conservation,biodiversity,NA,submitted,Agreed,open_sci,50,NA,NA,5,46,54
 recXaqPPgA911nuoY,Banning wildlife trade can boost demand for unregulated threatened species,Banning wildlife trade can boost demand,Liew Jia Huan,conservation,biodiversity,NA,submitted,Agreed,merits_journal,3,NA,NA,5,2.8,3.2
-recYb2JcJlGrHoI2H,The Governance Of Non-Profits And Their Social Impact: Evidence From A Randomized Program In Healthcare In DRC,Nonprofit Govc.: Randomized healthcare DRC,Wayne Aaron Sandholtz,GH&D,NA,NA,internal-NBER,Emailed,overall,65,55,74,NA,55,74
-recYb2JcJlGrHoI2H,The Governance Of Non-Profits And Their Social Impact: Evidence From A Randomized Program In Healthcare In DRC,Nonprofit Govc.: Randomized healthcare DRC,Wayne Aaron Sandholtz,GH&D,NA,NA,internal-NBER,Emailed,adv_knowledge,70,55,75,NA,55,75
-recYb2JcJlGrHoI2H,The Governance Of Non-Profits And Their Social Impact: Evidence From A Randomized Program In Healthcare In DRC,Nonprofit Govc.: Randomized healthcare DRC,Wayne Aaron Sandholtz,GH&D,NA,NA,internal-NBER,Emailed,methods,60,55,70,NA,55,70
-recYb2JcJlGrHoI2H,The Governance Of Non-Profits And Their Social Impact: Evidence From A Randomized Program In Healthcare In DRC,Nonprofit Govc.: Randomized healthcare DRC,Wayne Aaron Sandholtz,GH&D,NA,NA,internal-NBER,Emailed,logic_comms,55,50,65,NA,50,65
-recYb2JcJlGrHoI2H,The Governance Of Non-Profits And Their Social Impact: Evidence From A Randomized Program In Healthcare In DRC,Nonprofit Govc.: Randomized healthcare DRC,Wayne Aaron Sandholtz,GH&D,NA,NA,internal-NBER,Emailed,real_world,55,45,75,NA,45,75
-recYb2JcJlGrHoI2H,The Governance Of Non-Profits And Their Social Impact: Evidence From A Randomized Program In Healthcare In DRC,Nonprofit Govc.: Randomized healthcare DRC,Wayne Aaron Sandholtz,GH&D,NA,NA,internal-NBER,Emailed,gp_relevance,80,70,90,NA,70,90
-recYb2JcJlGrHoI2H,The Governance Of Non-Profits And Their Social Impact: Evidence From A Randomized Program In Healthcare In DRC,Nonprofit Govc.: Randomized healthcare DRC,Wayne Aaron Sandholtz,GH&D,NA,NA,internal-NBER,Emailed,journal_predict,3.6,2.8,4,NA,2.8,4
-recYb2JcJlGrHoI2H,The Governance Of Non-Profits And Their Social Impact: Evidence From A Randomized Program In Healthcare In DRC,Nonprofit Govc.: Randomized healthcare DRC,Wayne Aaron Sandholtz,GH&D,NA,NA,internal-NBER,Emailed,open_sci,45,30,60,NA,30,60
-recYb2JcJlGrHoI2H,The Governance Of Non-Profits And Their Social Impact: Evidence From A Randomized Program In Healthcare In DRC,Nonprofit Govc.: Randomized healthcare DRC,Wayne Aaron Sandholtz,GH&D,NA,NA,internal-NBER,Emailed,merits_journal,3.8,3,4.1,NA,3,4.1
+recYb2JcJlGrHoI2H,The Governance Of Non-Profits And Their Social Impact: Evidence From A Randomized Program In Healthcare In DRC,Nonprofit Govc.: Randomized healthcare DRC,Anonymous_12,GH&D,NA,NA,internal-NBER,Emailed,overall,65,55,74,NA,55,74
+recYb2JcJlGrHoI2H,The Governance Of Non-Profits And Their Social Impact: Evidence From A Randomized Program In Healthcare In DRC,Nonprofit Govc.: Randomized healthcare DRC,Anonymous_12,GH&D,NA,NA,internal-NBER,Emailed,adv_knowledge,70,55,75,NA,55,75
+recYb2JcJlGrHoI2H,The Governance Of Non-Profits And Their Social Impact: Evidence From A Randomized Program In Healthcare In DRC,Nonprofit Govc.: Randomized healthcare DRC,Anonymous_12,GH&D,NA,NA,internal-NBER,Emailed,methods,60,55,70,NA,55,70
+recYb2JcJlGrHoI2H,The Governance Of Non-Profits And Their Social Impact: Evidence From A Randomized Program In Healthcare In DRC,Nonprofit Govc.: Randomized healthcare DRC,Anonymous_12,GH&D,NA,NA,internal-NBER,Emailed,logic_comms,55,50,65,NA,50,65
+recYb2JcJlGrHoI2H,The Governance Of Non-Profits And Their Social Impact: Evidence From A Randomized Program In Healthcare In DRC,Nonprofit Govc.: Randomized healthcare DRC,Anonymous_12,GH&D,NA,NA,internal-NBER,Emailed,real_world,55,45,75,NA,45,75
+recYb2JcJlGrHoI2H,The Governance Of Non-Profits And Their Social Impact: Evidence From A Randomized Program In Healthcare In DRC,Nonprofit Govc.: Randomized healthcare DRC,Anonymous_12,GH&D,NA,NA,internal-NBER,Emailed,gp_relevance,80,70,90,NA,70,90
+recYb2JcJlGrHoI2H,The Governance Of Non-Profits And Their Social Impact: Evidence From A Randomized Program In Healthcare In DRC,Nonprofit Govc.: Randomized healthcare DRC,Anonymous_12,GH&D,NA,NA,internal-NBER,Emailed,journal_predict,3.6,2.8,4,NA,2.8,4
+recYb2JcJlGrHoI2H,The Governance Of Non-Profits And Their Social Impact: Evidence From A Randomized Program In Healthcare In DRC,Nonprofit Govc.: Randomized healthcare DRC,Anonymous_12,GH&D,NA,NA,internal-NBER,Emailed,open_sci,45,30,60,NA,30,60
+recYb2JcJlGrHoI2H,The Governance Of Non-Profits And Their Social Impact: Evidence From A Randomized Program In Healthcare In DRC,Nonprofit Govc.: Randomized healthcare DRC,Anonymous_12,GH&D,NA,NA,internal-NBER,Emailed,merits_journal,3.8,3,4.1,NA,3,4.1
 recbXm55IKEWH4DAM,"Aghion, P., Jones, B.F., and Jones, C.I., 2017. Artificial Intelligence and Economic Growth ",AI and econ. growth,Phil Trammel,macroeconomics,Artificial intelligence,prominent,internal-from-syllabus-agenda-policy-database,Agreed,overall,92,80,100,NA,80,100
 recbXm55IKEWH4DAM,"Aghion, P., Jones, B.F., and Jones, C.I., 2017. Artificial Intelligence and Economic Growth ",AI and econ. growth,Phil Trammel,macroeconomics,Artificial intelligence,prominent,internal-from-syllabus-agenda-policy-database,Agreed,adv_knowledge,97,80,100,NA,80,100
 recbXm55IKEWH4DAM,"Aghion, P., Jones, B.F., and Jones, C.I., 2017. Artificial Intelligence and Economic Growth ",AI and econ. growth,Phil Trammel,macroeconomics,Artificial intelligence,prominent,internal-from-syllabus-agenda-policy-database,Agreed,methods,70,40,90,NA,40,90
diff --git a/data/evals_long.rds b/data/evals_long.rds
index f76e724..1014472 100644
Binary files a/data/evals_long.rds and b/data/evals_long.rds differ
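The only substantive change in the data exports above is re-labelling one evaluator as `Anonymous_12`, in both the wide per-evaluation export and `data/evals_long.csv` (with `data/evals_long.rds` re-saved to match). Below is a minimal sketch of how such an anonymization could be applied consistently across the exports; the wide-file path `data/evals.csv` and the column name `eval_name` are assumptions based on the analysis code elsewhere in this repository, and this is not the project's actual anonymization script.

```r
# Hypothetical sketch (assumed paths and column name, not the repo's actual step):
# replace one evaluator's name with an anonymous label in every export so the
# wide CSV, the long CSV, and the .rds stay consistent.
library(dplyr)
library(readr)

anonymize_evaluator <- function(df, real_name, alias) {
  # assumes the evaluator column is called `eval_name`
  df %>%
    mutate(eval_name = if_else(eval_name == real_name, alias, eval_name))
}

evals      <- read_csv("data/evals.csv")       # assumed path for the wide export
evals_long <- read_csv("data/evals_long.csv")

evals      <- anonymize_evaluator(evals, "Wayne Aaron Sandholtz", "Anonymous_12")
evals_long <- anonymize_evaluator(evals_long, "Wayne Aaron Sandholtz", "Anonymous_12")

write_csv(evals, "data/evals.csv")
write_csv(evals_long, "data/evals_long.csv")
saveRDS(evals_long, "data/evals_long.rds")     # keep the binary export in sync
```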
diff --git a/docs/chapters/aggregation.html b/docs/chapters/aggregation.html
index eeed737..160c3a7 100644
--- a/docs/chapters/aggregation.html
+++ b/docs/chapters/aggregation.html
[Re-rendered Quarto HTML for chapter 3. Recoverable context from the hunks: the page title "The Unjournal evaluations: data and analysis - 3  Aggregation of evaluators judgments (modeling)"; navigation and heading markup around "3  Aggregation of evaluators judgments (modeling)", "3.1 Notes on sources and approaches", and "3.3 Decomposing variation, dimension reduction, simple linear models"; and Quarto's injected JavaScript, where the code-annotation helpers (isCodeAnnotation, selectCodeLines, unselectCodeLines, findCites) and the Bootstrap "Copied!" tooltip logic are removed, and the ClipboardJS code-copy handler is simplified from a text() callback that cloned and filtered the code element to a target() callback returning trigger.previousElementSibling.]
diff --git a/docs/chapters/evaluation_data_analysis.html b/docs/chapters/evaluation_data_analysis.html
index 4f5cab7..4f21ca3 100644
--- a/docs/chapters/evaluation_data_analysis.html
+++ b/docs/chapters/evaluation_data_analysis.html
[Re-rendered Quarto HTML for chapter 2 (page title "The Unjournal evaluations: data and analysis - 2  Evaluation data: description, exploration, checks"). The hunks touch the page head and the generated markup around the heading "2  Evaluation data: description, exploration, checks", the setup chunk containing options(mc.cores = parallel::detectCores()), a callout and the "Todo: 3" note, the ratings plot built with scale_fill_brewer(palette = "Set1") and scale_x_discrete(labels = function(x) str_wrap(x, width = 20)), the interactive ggplotly(g1, tooltip = c("text")) figure under "The distribution of ratings and predictions", the section "Relationship among the ratings (and predictions)", and the closing link "Next chapter (analysis): aggregation of evaluator judgment".]