diff --git a/_freeze/chapters/aggregation/execute-results/html.json b/_freeze/chapters/aggregation/execute-results/html.json index 673406f..b1b8814 100644 --- a/_freeze/chapters/aggregation/execute-results/html.json +++ b/_freeze/chapters/aggregation/execute-results/html.json @@ -1,7 +1,7 @@ { - "hash": "2726ad363da560877f7c0b4345be5343", + "hash": "2fac02861d4c188680b8a73586f7e412", "result": { - "markdown": "# Aggregation of evaluators judgments (modeling)\n\n\n\n\n\n\n## Notes on sources and approaches\n\n\n::: {.callout-note collapse=\"true\"}\n\n## Hanea et al {-}\n(Consult, e.g., repliCATS/Hanea and others work; meta-science and meta-analysis approaches)\n\n`aggrecat` package\n\n> Although the accuracy, calibration, and informativeness of the majority of methods are very similar, a couple of the aggregation methods consistently distinguish themselves as among the best or worst. Moreover, the majority of methods outperform the usual benchmarks provided by the simple average or the median of estimates.\n\n[Hanea et al, 2021](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0256919#sec007)\n\n However, these are in a different context. Most of those measures are designed to deal with probablistic forecasts for binary outcomes, where the predictor also gives a 'lower bound' and 'upper bound' for that probability. We could roughly compare that to our continuous metrics with 90% CI's (or imputations for these).\n\nFurthermore, many (all their successful measures?) use 'performance-based weights', accessing metrics from prior prediction performance of the same forecasters We do not have these, nor do we have a sensible proxy for this. \n:::\n\n\n::: {.callout-note collapse=\"true\"}\n## D Veen et al (2017)\n\n[link](https://www.researchgate.net/profile/Duco-Veen/publication/319662351_Using_the_Data_Agreement_Criterion_to_Rank_Experts'_Beliefs/links/5b73e2dc299bf14c6da6c663/Using-the-Data-Agreement-Criterion-to-Rank-Experts-Beliefs.pdf)\n\n... we show how experts can be ranked based on their knowledge and their level of (un)certainty. By letting experts specify their knowledge in the form of a probability distribution, we can assess how accurately they can predict new data, and how appropriate their level of (un)certainty is. The expert’s specified probability distribution can be seen as a prior in a Bayesian statistical setting. We evaluate these priors by extending an existing prior-data (dis)agreement measure, the Data Agreement Criterion, and compare this approach to using Bayes factors to assess prior specification. We compare experts with each other and the data to evaluate their appropriateness. Using this method, new research questions can be asked and answered, for instance: Which expert predicts the new data best? Is there agreement between my experts and the data? Which experts’ representation is more valid or useful? Can we reach convergence between expert judgement and data? We provided an empirical example ranking (regional) directors of a large financial institution based on their predictions of turnover. 
\n\nBe sure to consult the [correction made here](https://www.semanticscholar.org/paper/Correction%3A-Veen%2C-D.%3B-Stoel%2C-D.%3B-Schalken%2C-N.%3B-K.%3B-Veen-Stoel/a2882e0e8606ef876133f25a901771259e7033b1)\n\n::: \n\n\n::: {.callout-note collapse=\"true\"}\n## Also seems relevant:\n\nSee [Gsheet HERE](https://docs.google.com/spreadsheets/d/14japw6eLGpGjEWy1MjHNJXU1skZY_GAIc2uC2HIUalM/edit#gid=0), generated from an Elicit.org inquiry.\n\n\n::: \n\n\n\nIn spite of the caveats in the fold above, we construct some measures of aggregate beliefs using the `aggrecat` package. We will make (and explain) some ad-hoc choices here. We present these:\n\n1. For each paper\n2. For categories of papers and cross-paper categories of evaluations\n3. For the overall set of papers and evaluations\n\nWe can also hold onto these aggregated metrics for later use in modeling.\n\n\n- Simple averaging\n\n- Bayesian approaches \n\n- Best-performing approaches from elsewhere \n\n- Assumptions over unit-level random terms \n\n\n### Simple rating aggregation {-}\n\nBelow, we are preparing the data for the aggreCATS package.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# JB: This section is a work in progress, please do not edit\n\n# paper_ratings: one row per rating category and 'type' (score, upper, lower bound.)\nevals_pub %>% \n select(id, eval_name, paper_abbrev, \n overall, overall_lb_imp, overall_ub_imp,\n adv_knowledge, adv_knowledge_lb_imp, adv_knowledge_ub_imp,\n methods, methods_lb_imp, methods_ub_imp,\n logic_comms, logic_comms_lb_imp, logic_comms_ub_imp,\n real_world, real_world_lb_imp, real_world_ub_imp,\n gp_relevance, gp_relevance_lb_imp, gp_relevance_ub_imp,\n open_sci, open_sci_lb_imp, open_sci_ub_imp) %>% \n rename_with(function(x) paste0(x,\"_score\"), all_of(rating_cats)) %>%\n pivot_longer(cols = c(-id, -eval_name, -paper_abbrev),\n names_pattern = \"(.+)_(score|[ul]b_imp)\",\n names_to = c(\"criterion\",\"element\"),\n values_to = \"value\") -> paper_ratings\n\n# renaming to conform with aggreCATS expectations\npaper_ratings <- paper_ratings %>% \n rename(paper_id = paper_abbrev,\n user_name = eval_name) %>% \n mutate(round = \"round_1\",\n element = case_when(element == \"lb_imp\" ~ \"three_point_lower\",\n element == \"ub_imp\" ~ \"three_point_upper\",\n element == \"score\" ~ \"three_point_best\"))\n\n# filter only overall for now\npaper_ratings %>% \n filter(criterion == \"overall\") %>% \n group_by(user_name, paper_id) %>% \n filter(sum(is.na(value))==0) %>% \n ungroup() -> temp\n \n\nAverageWAgg(expert_judgements = temp, round_2_filter = FALSE, type = \"ArMean\")\n\nIntervalWAgg(expert_judgements = temp, round_2_filter = FALSE, type = \"IntWAgg\")\n\naggreCAT::DistributionWAgg(expert_judgements = temp, round_2_filter = FALSE, type = \"DistribArMean\", percent_toggle = T)\n\n# EXAMPLE CODE ===============================\n# data(data_ratings)\n# set.seed(1234)\n# \n# participant_subset <- data_ratings %>%\n# distinct(user_name) %>%\n# sample_n(5) %>%\n# mutate(participant_name = paste(\"participant\", rep(1:n())))\n# \n# single_claim <- data_ratings %>%\n# filter(paper_id == \"28\") %>%\n# right_join(participant_subset, by = \"user_name\") %>%\n# filter(grepl(x = element, pattern = \"three_.+\")) %>%\n# select(-group, -participant_name, -question)\n# \n# DistributionWAgg(expert_judgements = single_claim,\n# type = \"DistribArMean\", percent_toggle = T)\n# \n```\n:::\n\n\n\n\n\n### Explicit modeling of 'research quality' (for use in prizes, etc.) 
{-}\n\n- Use the above aggregation as the outcome of interest, or weight towards categories of greater interest?\n\n- Model with controls -- look for greatest positive residual? \n\n\n## Inter-rater reliability\n\n\n::: {.cell}\n::: {.cell-output-display}\n![](aggregation_files/figure-html/unnamed-chunk-1-1.png){width=672}\n:::\n:::\n\n\n\n\n\n\n## Decomposing variation, dimension reduction, simple linear models\n\n\n## Later possiblities\n\n- Relation to evaluation text content (NLP?)\n\n- Relation/prediction of later outcomes (traditional publication, citations, replication)\n", + "markdown": "# Aggregation of evaluators judgments (modeling)\n\n\n\n\n\n\n## Notes on sources and approaches\n\n\n::: {.callout-note collapse=\"true\"}\n\n## Hanea et al {-}\n(Consult, e.g., repliCATS/Hanea and others work; meta-science and meta-analysis approaches)\n\n`aggrecat` package\n\n> Although the accuracy, calibration, and informativeness of the majority of methods are very similar, a couple of the aggregation methods consistently distinguish themselves as among the best or worst. Moreover, the majority of methods outperform the usual benchmarks provided by the simple average or the median of estimates.\n\n[Hanea et al, 2021](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0256919#sec007)\n\n However, these are in a different context. Most of those measures are designed to deal with probablistic forecasts for binary outcomes, where the predictor also gives a 'lower bound' and 'upper bound' for that probability. We could roughly compare that to our continuous metrics with 90% CI's (or imputations for these).\n\nFurthermore, many (all their successful measures?) use 'performance-based weights', accessing metrics from prior prediction performance of the same forecasters We do not have these, nor do we have a sensible proxy for this. \n:::\n\n\n::: {.callout-note collapse=\"true\"}\n## D Veen et al (2017)\n\n[link](https://www.researchgate.net/profile/Duco-Veen/publication/319662351_Using_the_Data_Agreement_Criterion_to_Rank_Experts'_Beliefs/links/5b73e2dc299bf14c6da6c663/Using-the-Data-Agreement-Criterion-to-Rank-Experts-Beliefs.pdf)\n\n... we show how experts can be ranked based on their knowledge and their level of (un)certainty. By letting experts specify their knowledge in the form of a probability distribution, we can assess how accurately they can predict new data, and how appropriate their level of (un)certainty is. The expert’s specified probability distribution can be seen as a prior in a Bayesian statistical setting. We evaluate these priors by extending an existing prior-data (dis)agreement measure, the Data Agreement Criterion, and compare this approach to using Bayes factors to assess prior specification. We compare experts with each other and the data to evaluate their appropriateness. Using this method, new research questions can be asked and answered, for instance: Which expert predicts the new data best? Is there agreement between my experts and the data? Which experts’ representation is more valid or useful? Can we reach convergence between expert judgement and data? We provided an empirical example ranking (regional) directors of a large financial institution based on their predictions of turnover. 
\n\nBe sure to consult the [correction made here](https://www.semanticscholar.org/paper/Correction%3A-Veen%2C-D.%3B-Stoel%2C-D.%3B-Schalken%2C-N.%3B-K.%3B-Veen-Stoel/a2882e0e8606ef876133f25a901771259e7033b1)\n\n::: \n\n\n::: {.callout-note collapse=\"true\"}\n## Also seems relevant:\n\nSee [Gsheet HERE](https://docs.google.com/spreadsheets/d/14japw6eLGpGjEWy1MjHNJXU1skZY_GAIc2uC2HIUalM/edit#gid=0), generated from an Elicit.org inquiry.\n\n\n::: \n\n\n\nIn spite of the caveats in the fold above, we construct some measures of aggregate beliefs using the `aggrecat` package. We will make (and explain) some ad-hoc choices here. We present these:\n\n1. For each paper\n2. For categories of papers and cross-paper categories of evaluations\n3. For the overall set of papers and evaluations\n\nWe can also hold onto these aggregated metrics for later use in modeling.\n\n\n- Simple averaging\n\n- Bayesian approaches \n\n- Best-performing approaches from elsewhere \n\n- Assumptions over unit-level random terms \n\n\n### Simple rating aggregation {-}\n\nBelow, we are preparing the data for the aggreCATS package.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# JB: This section is a work in progress, please do not edit\n\n# paper_ratings: one row per rating category and 'type' (score, upper, lower bound.)\nevals_pub %>% \n select(id, eval_name, paper_abbrev, \n overall, overall_lb_imp, overall_ub_imp,\n adv_knowledge, adv_knowledge_lb_imp, adv_knowledge_ub_imp,\n methods, methods_lb_imp, methods_ub_imp,\n logic_comms, logic_comms_lb_imp, logic_comms_ub_imp,\n real_world, real_world_lb_imp, real_world_ub_imp,\n gp_relevance, gp_relevance_lb_imp, gp_relevance_ub_imp,\n open_sci, open_sci_lb_imp, open_sci_ub_imp) %>% \n rename_with(function(x) paste0(x,\"_score\"), all_of(rating_cats)) %>%\n pivot_longer(cols = c(-id, -eval_name, -paper_abbrev),\n names_pattern = \"(.+)_(score|[ul]b_imp)\",\n names_to = c(\"criterion\",\"element\"),\n values_to = \"value\") -> paper_ratings\n\n# renaming to conform with aggreCATS expectations\npaper_ratings <- paper_ratings %>% \n rename(paper_id = paper_abbrev,\n user_name = eval_name) %>% \n mutate(round = \"round_1\",\n element = case_when(element == \"lb_imp\" ~ \"three_point_lower\",\n element == \"ub_imp\" ~ \"three_point_upper\",\n element == \"score\" ~ \"three_point_best\"))\n\n# filter only overall for now\npaper_ratings %>% \n filter(criterion == \"overall\") %>% \n group_by(user_name, paper_id) %>% \n filter(sum(is.na(value))==0) %>% \n ungroup() -> temp\n \n\nAverageWAgg(expert_judgements = temp, round_2_filter = FALSE, type = \"ArMean\")\n\nIntervalWAgg(expert_judgements = temp, round_2_filter = FALSE, type = \"IntWAgg\")\n\naggreCAT::DistributionWAgg(expert_judgements = temp, round_2_filter = FALSE, type = \"DistribArMean\", percent_toggle = T)\n\n# EXAMPLE CODE ===============================\n# data(data_ratings)\n# set.seed(1234)\n# \n# participant_subset <- data_ratings %>%\n# distinct(user_name) %>%\n# sample_n(5) %>%\n# mutate(participant_name = paste(\"participant\", rep(1:n())))\n# \n# single_claim <- data_ratings %>%\n# filter(paper_id == \"28\") %>%\n# right_join(participant_subset, by = \"user_name\") %>%\n# filter(grepl(x = element, pattern = \"three_.+\")) %>%\n# select(-group, -participant_name, -question)\n# \n# DistributionWAgg(expert_judgements = single_claim,\n# type = \"DistribArMean\", percent_toggle = T)\n# \n```\n:::\n\n\n\n\n\n### Explicit modeling of 'research quality' (for use in prizes, etc.) 
{-}\n\n- Use the above aggregation as the outcome of interest, or weight towards categories of greater interest?\n\n- Model with controls -- look for greatest positive residual? \n\n\n## Inter-rater reliability\n\nInter-rater reliability is a measure of the degree to which two or more independent raters (in our case, paper evaluators) agree. Here, the ratings are the 7 aspects of each paper that evaluators were asked to rate. For each paper, we can obtain one value that summarizes the agreement between the two or three evaluators.\n\nWe use Krippendorff's alpha as a measure of interrater agreement. Krippendorff's alpha is a more flexible measure of agreement and can be used with different levels of data (categorical, ordinal, interval, and ratio) as well as different numbers of raters. The calculation of alpha in the function `kripp.alpha` was implemented by Jim Lemon in the package `irr` and is based on Krippendorff, K. (1980). Content analysis: An introduction to its methodology. Beverly Hills, CA: Sage.\n\nKrippendorff's alpha can range from -1 to +1, and it can be interpreted similarly to a correlation: values closer to +1 indicate excellent agreement between evaluators; values closer to 0 indicate there is no agreement between evaluators; and negative values indicate that there is systematic disagreement between evaluators, such that ratings are reversed -- where a given evaluator tends to rate something as high, the other(s) tend to rate it as low, and vice versa.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![](aggregation_files/figure-html/unnamed-chunk-1-1.png){width=672}\n:::\n:::\n\n\n\n\n\n\n## Decomposing variation, dimension reduction, simple linear models\n\n\n## Later possiblities\n\n- Relation to evaluation text content (NLP?)\n\n- Relation/prediction of later outcomes (traditional publication, citations, replication)\n", "supporting": [ "aggregation_files" ], diff --git a/_freeze/chapters/evaluation_data_input/execute-results/html.json b/_freeze/chapters/evaluation_data_input/execute-results/html.json index 3696a88..a5be508 100644 --- a/_freeze/chapters/evaluation_data_input/execute-results/html.json +++ b/_freeze/chapters/evaluation_data_input/execute-results/html.json @@ -1,7 +1,7 @@ { - "hash": "c76fbe826325ec2c8e8ad614a78c1b21", + "hash": "afecc092d396b60c9af7b5d0310e2892", "result": { - "markdown": "# Evaluation data: input/features\n\n\n::: {.cell}\n\n```{.r .cell-code code-summary=\"load packages\"}\nlibrary(tidyverse) \n\n# data acquisition ----\n#devtools::install_github(\"bergant/airtabler\")\nlibrary(airtabler)\n\n# data cleaning & shaping ----\n\n# data analysis ----\n# library(lme4)\n# library(lmtest) # Testing Linear Regression Models\n\n# markdown et al. ----\nlibrary(knitr)\nlibrary(bookdown)\nlibrary(quarto)\nlibrary(formattable) # Create 'Formattable' Data Structures\n\n# others ----\nlibrary(here) # A Simpler Way to Find Your Files\n#devtools::install_github(\"metamelb-repliCATS/aggreCAT\")\n#library(aggrecat)\n\n# Make sure select is always the dplyr version\nselect <- dplyr::select \n\n# options\noptions(knitr.duplicate.label = \"allow\")\n```\n:::\n\n\n\n::: {.callout-note collapse=\"true\"}\n## Note on data input (10-Aug-23)\n\nBelow, the evaluation data is input from an Airtable, which itself was largely hand-input from evaluators' reports. As PubPub builds (target: end of Sept. 2023), this will allow us to include the ratings and predictions as structured data objects. We then plan to access and input this data *directly* from the PubPub (API?) 
into the present analysis. This will improve automation and limit the potential for data entry errors.\n\n::: \n\n\n::: {.cell}\n\n```{.r .cell-code code-summary=\"input from airtable\"}\nbase_id <- \"appbPYEw9nURln7Qg\"\n\n\n# Set your Airtable API key \nSys.setenv(AIRTABLE_API_KEY = Sys.getenv(\"AIRTABLE_API_KEY\"))\n#this should be set in my .Renviron file\n\n# Read data from a specific view\n\nevals <- air_get(base = base_id, \"output_eval\") \n\nall_pub_records <- data.frame()\npub_records <- air_select(base = base_id, table = \"crucial_research\")\n\n# Append the records to the list\nall_pub_records <- bind_rows(all_pub_records, pub_records)\n\n# While the length of the records list is 100 (the maximum), fetch more records\nwhile(nrow(pub_records) == 100) {\n # Get the ID of the last record in the list\n offset <- get_offset(pub_records)\n \n # Fetch the next 100 records, starting after the last ID\n pub_records <- air_select(base = base_id, table = \"crucial_research\", offset = offset)\n \n # Append the records to the df\n all_pub_records <- bind_rows(all_pub_records, pub_records)\n}\n\n# housekeeping\nrm(pub_records)\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code code-summary=\"just the useful and publish-able data, clean a bit\"}\n# clean evals names to snakecase\ncolnames(evals) <- snakecase::to_snake_case(colnames(evals))\n\nevals_pub <- evals %>% \n dplyr::rename(stage_of_process = stage_of_process_todo_from_crucial_research_2) %>% \n mutate(stage_of_process = unlist(stage_of_process)) %>% \n dplyr::filter(stage_of_process == \"published\") %>% \n select(id, \n crucial_research, \n paper_abbrev, \n evaluator_name, \n category, \n source_main, \n author_agreement, \n overall, \n lb_overall, \n ub_overall, \n conf_index_overall, \n advancing_knowledge_and_practice, \n lb_advancing_knowledge_and_practice, \n ub_advancing_knowledge_and_practice, \n conf_index_advancing_knowledge_and_practice,\n methods_justification_reasonableness_validity_robustness,\n lb_methods_justification_reasonableness_validity_robustness,\n ub_methods_justification_reasonableness_validity_robustness,\n conf_index_methods_justification_reasonableness_validity_robustness, \n logic_communication, lb_logic_communication, ub_logic_communication, \n conf_index_logic_communication,\n engaging_with_real_world_impact_quantification_practice_realism_and_relevance,\n lb_engaging_with_real_world_impact_quantification_practice_realism_and_relevance,\n ub_engaging_with_real_world_impact_quantification_practice_realism_and_relevance,\n conf_index_engaging_with_real_world_impact_quantification_practice_realism_and_relevance,\n relevance_to_global_priorities, \n lb_relevance_to_global_priorities, \n ub_relevance_to_global_priorities, \n conf_index_relevance_to_global_priorities, \n journal_quality_predict, \n lb_journal_quality_predict, \n ub_journal_quality_predict,\n conf_index_journal_quality_predict, \n open_collaborative_replicable, \n conf_index_open_collaborative_replicable, \n lb_open_collaborative_replicable, \n ub_open_collaborative_replicable, \n merits_journal, \n lb_merits_journal, \n ub_merits_journal, \n conf_index_merits_journal)\n\n# shorten names (before you expand into columns)\nnew_names <- c(\n \"eval_name\" = \"evaluator_name\",\n \"cat\" = \"category\",\n \"crucial_rsx\" = \"crucial_research\",\n \"conf_overall\" = \"conf_index_overall\",\n \"adv_knowledge\" = \"advancing_knowledge_and_practice\",\n \"lb_adv_knowledge\" = \"lb_advancing_knowledge_and_practice\",\n \"ub_adv_knowledge\" = 
\"ub_advancing_knowledge_and_practice\",\n \"conf_adv_knowledge\" = \"conf_index_advancing_knowledge_and_practice\",\n \"methods\" = \"methods_justification_reasonableness_validity_robustness\",\n \"lb_methods\" = \"lb_methods_justification_reasonableness_validity_robustness\",\n \"ub_methods\" = \"ub_methods_justification_reasonableness_validity_robustness\",\n \"conf_methods\" = \"conf_index_methods_justification_reasonableness_validity_robustness\",\n \"logic_comms\" = \"logic_communication\",\n \"lb_logic_comms\" = \"lb_logic_communication\",\n \"ub_logic_comms\" = \"ub_logic_communication\",\n \"conf_logic_comms\" = \"conf_index_logic_communication\",\n \"real_world\" = \"engaging_with_real_world_impact_quantification_practice_realism_and_relevance\",\n \"lb_real_world\" = \"lb_engaging_with_real_world_impact_quantification_practice_realism_and_relevance\",\n \"ub_real_world\" = \"ub_engaging_with_real_world_impact_quantification_practice_realism_and_relevance\",\n \"conf_real_world\" = \"conf_index_engaging_with_real_world_impact_quantification_practice_realism_and_relevance\",\n \"gp_relevance\" = \"relevance_to_global_priorities\",\n \"lb_gp_relevance\" = \"lb_relevance_to_global_priorities\",\n \"ub_gp_relevance\" = \"ub_relevance_to_global_priorities\",\n \"conf_gp_relevance\" = \"conf_index_relevance_to_global_priorities\",\n \"journal_predict\" = \"journal_quality_predict\",\n \"lb_journal_predict\" = \"lb_journal_quality_predict\",\n \"ub_journal_predict\" = \"ub_journal_quality_predict\",\n \"conf_journal_predict\" = \"conf_index_journal_quality_predict\",\n \"open_sci\" = \"open_collaborative_replicable\",\n \"conf_open_sci\" = \"conf_index_open_collaborative_replicable\",\n \"lb_open_sci\" = \"lb_open_collaborative_replicable\",\n \"ub_open_sci\" = \"ub_open_collaborative_replicable\",\n \"conf_merits_journal\" = \"conf_index_merits_journal\"\n)\n\nevals_pub <- evals_pub %>%\n rename(!!!new_names)\n\n# Create a list of labels with the old, longer names\nlabels <- str_replace_all(new_names, \"_\", \" \") %>% str_to_title()\n\n# Assign labels to the dataframe / tibble\n# (maybe this can be done as an attribute, not currently working)\n# for(i in seq_along(labels)) {\n# col_name <- new_names[names(new_names)[i]]\n# label <- labels[i]\n# attr(evals_pub[[col_name]], \"label\") <- label\n# }\n\n\n# expand categories into columns, unlist everything\nevals_pub %<>%\n tidyr::unnest_wider(cat, names_sep = \"_\") %>% # give each of these its own col\n mutate(across(everything(), unlist)) # maybe check why some of these are lists in the first place\n \n\n# clean the Anonymous names\nevals_pub$eval_name <- ifelse(\n grepl(\"^\\\\b\\\\w+\\\\b$|\\\\bAnonymous\\\\b\", evals_pub$eval_name),\n paste0(\"Anonymous_\", seq_along(evals_pub$eval_name)),\n evals_pub$eval_name\n)\n\n#housekeeping\nrm(evals)\n\n#Todo -- check the unlist is not propagating the entry\n#Note: category, topic_subfield, and source have multiple meaningful categories. 
These will need care \n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nevals_pub_long <- evals_pub %>% \n pivot_longer(cols = -c(id, crucial_rsx, paper_abbrev, eval_name, \n cat_1,cat_2, cat_3,source_main,author_agreement),\n names_pattern = \"(lb_|ub_|conf_)?(.+)\",\n names_to = c(\"value_type\", \"rating_type\")) %>% # one line per rating type\n mutate(value_type = if_else(value_type == \"\", \"est_\", value_type)) %>% #add main rating id\n pivot_wider(names_from = value_type, \n values_from = value)\n```\n:::\n\n\n\n\n\n### Reconcile uncertainty ratings and CIs {-}\n\nWhere people gave only confidence level 'dots', we impute CIs (confidence/credible intervals). We follow the correspondence described [here](https://effective-giving-marketing.gitbook.io/unjournal-x-ea-and-global-priorities-research/policies-projects-evaluation-workflow/evaluation/guidelines-for-evaluators#1-5-dots-explanation-and-relation-to-cis). (Otherwise, where they gave actual CIs, we use these.)^[Note this is only a first-pass; a more sophisticated approach may be warranted in future.]\n\n::: {.callout-note collapse=\"true\"}\n## Dots to interval choices\n\n> 5 = Extremely confident, i.e., 90% confidence interval spans +/- 4 points or less)\n\nFor 0-100 ratings, code the LB as $max(R - 4,0)$ and the UB as $min(R + 4,100)$, where R is the stated (middle) rating. This 'scales' the CI, as interpreted, to be proportional to the rating, with a maximum 'interval' of about 8, with the rating is about 96.\n\n> 4 = Very*confident: 90% confidence interval +/- 8 points or less\n\nFor 0-100 ratings, code the LB as $max(R - 8,0)$ and the UB as $min(R + 8,100)$, where R is the stated (middle) rating. \n\n> 3 = Somewhat** confident: 90% confidence interval +/- 15 points or less \n\n> 2 = Not very** confident: 90% confidence interval, +/- 25 points or less\n\nComparable scaling for the 2-3 ratings as for the 4 and 5 rating.\n\n> 1 = Not** confident: (90% confidence interval +/- more than 25 points)\n \nCode LB as $max(R - 37.5,0)$ and the UB as $min(R + 37.5,100)$. \n \nThis is just a first-pass. There might be a more information-theoretic way of doing this. On the other hand, we might be switching the evaluations to use a different tool soon, perhaps getting rid of the 1-5 confidence ratings altogether.\n\n::: \n\n\n\n::: {.cell}\n\n```{.r .cell-code code-summary=\"reconcile explicit bounds and stated confidence level\"}\n# Define the baseline widths for each confidence rating\n# JB: it would be good to have some more backing of whether this\n# is a valid way to translate these confidence levels into CIs\n\n# baseline_widths <- c(4, 8, 15, 25, 37.5)\n\n# Lists of categories\nrating_cats <- c(\"overall\", \"adv_knowledge\", \"methods\", \"logic_comms\", \"real_world\", \"gp_relevance\", \"open_sci\")\n\n#... 'predictions' are currently 1-5 (0-5?)\npred_cats <- c(\"journal_predict\", \"merits_journal\")\n\n#DR: Note that I use these objects in future chapters, but they are not connected to the data frame. Either one saves and reinputs the whole environment (messy, replicability issue), or you save this somewhere and re-input it or connect it to the data frame that gets saved (not sure how to do it), or you hard-code reinput it in the next chapter. \n\n#I do the latter for now, but I'm not happy about it, because the idea is 'input definitions in a single place to use later'\n\n#JB: I don't really understand why not just hard code it in the next chapter. These are very short strings. Do you expect them to change often? 
If so, we can derive them from a dataframe somewhere in the future or save as a separate object. \n\n\n# JB: Rewritten functions for adding imputed ub and lb\n\n# calculate the lower and upper bounds, \n# rating: given a rating, \n# conf: a confidence (1-5) score,\n# type: a bound type (lower, upper),\n# scale: a scale (100 is 0-100, 5 is 1-5)\n# This function is not vectorized\ncalc_bounds <- function(rating, conf, type, scale) {\n \n if(scale == 5){ #for the 'journal tier prediction case'\n baseline_width = case_match(conf, \n 5 ~ .2, #4*5/100\n 4 ~ .4, #8*5/100\n 3 ~ .75, #15*5/100\n 2 ~ 1.25, #25*5/100\n 1 ~ 1.875, #37.5*5/100\n .default = NA_real_) \n\n upper = min(rating + baseline_width, 5)\n lower = max(rating - baseline_width, 1)\n }#/if(scale == 5)\n \n if(scale == 100){ #for the 'ratings case'\n \n baseline_width = case_match(conf, \n 5 ~ 4, \n 4 ~ 8, \n 3 ~ 15, \n 2 ~ 25, \n 1 ~ 37.5,\n .default = NA_real_)\n \n upper = min(rating + baseline_width, 100)\n lower = max(rating - baseline_width, 0)\n } #/if(scale == 100)\n \n if(type == \"lower\") return(lower)\n if(type == \"upper\") return(upper)\n}\n\n\n# calculate or find correct lower or upper bound\n# based on rating type, and lb, ub, and conf values\nimpute_bounds <- function(var_name, est, lb, ub, conf, bound_type) {\n \n # get scale of each variable\n scale = if_else(var_name %in% c(\"journal_predict\", \"merits_journal\"), # if variable is a prediction\n 5, 100) #scale is 5, else scale is 100\n # if calculating lower bound variable\n if(bound_type == \"lower\") { #we are calculating a lower bound imputation\n # \n calculated_bound = map_dbl(.x = est, .f = calc_bounds, conf = conf, type = bound_type, scale = scale)\n \n imp_bound = if_else(is.na(lb), calculated_bound, lb)\n }\n \n # if calculating upper bound variable\n if(bound_type == \"upper\") { #we are calculating an upper bound imputation\n # \n calculated_bound = map_dbl(.x = est, .f = calc_bounds, conf = conf, type = bound_type, scale = scale)\n imp_bound = if_else(is.na(ub), calculated_bound, ub)\n }\n \n return(imp_bound)\n}\n\n# apply functions to evals_pub_long\n# where each row is one type of rating\n# so each evaluation is 9 rows long\nevals_pub_long <- evals_pub_long %>% \n rowwise() %>% # apply function to each row\n mutate(lb_imp_ = impute_bounds(var_name = rating_type,\n est = est_,\n lb = lb_, ub = ub_, conf = conf_,\n bound_type = \"lower\")) %>% \n mutate(ub_imp_ = impute_bounds(var_name = rating_type,\n est = est_,\n lb = lb_, ub = ub_, conf = conf_,\n bound_type = \"upper\"))\n\n# Reshape evals_pub_long into evals_pub to add imputed bounds\nevals_pub <- evals_pub_long %>% \n pivot_wider(names_from = rating_type, # take the dataframe back to old format\n values_from = c(est_, ub_, lb_, conf_, lb_imp_, ub_imp_),\n names_sep = \"\") %>% \n dplyr::rename_with(.cols = matches(\"^[ul]b_imp\"),\n .fn = gsub,\n pattern = \"(ub_imp|lb_imp)_(.+)\", \n replacement = \"\\\\2_\\\\1\") %>% \n dplyr::rename_with(.cols = starts_with(\"est_\"),\n .fn = gsub,\n pattern = \"est_(.+)\",\n replacement = \"\\\\1\")\n\n# Clean evals_pub_long names (remove _ at end)\nevals_pub_long <- evals_pub_long %>% \n rename_with(.cols = ends_with(\"_\"),\n .fn = str_remove,\n pattern = \"_$\")\n```\n:::\n\n\n\nWe cannot publicly share the 'papers under consideration', but we can share some of the statistics on these papers. 
Let's generate an ID (or later, salted hash) for each such paper, and keep only the shareable features of interest\n\n\n::: {.cell}\n\n```{.r .cell-code}\nall_papers_p <- all_pub_records %>% \n dplyr::select(\n id,\n category,\n cfdc_DR,\n 'confidence -- user entered',\n cfdc_assessor,\n avg_cfdc,\n category,\n cause_cat_1_text,\n cause_cat_2_text,\n topic_subfield_text,\n eval_manager_text,\n 'publication status',\n 'Contacted author?',\n 'stage of process/todo',\n 'source_main', \n 'author permission?',\n'Direct Kotahi Prize Submission?',\n 'createdTime' \n )\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\n# Create a dataset to be used as the input to a shiny app\n\n#| include: false\n#| \nevals_pub_long %>% \n write_rds(file = here(\"shinyapp/DataExplorer\", \"shiny_explorer.rds\"))\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code code-summary=\"save data for others' use\"}\nall_papers_p %>% saveRDS(file = here(\"data\", \"all_papers_p.Rdata\"))\nall_papers_p %>% write_csv(file = here(\"data\", \"all_papers_p.csv\"))\n\nevals_pub %>% saveRDS(file = here(\"data\", \"evals.Rdata\"))\nevals_pub %>% write_csv(file = here(\"data\", \"evals.csv\"))\n\nevals_pub_long %>% write_rds(file = here(\"data\", \"evals_long.rds\"))\nevals_pub_long %>% write_csv(file = here(\"data\", \"evals_long.csv\"))\n\n#evals_pub %>% readRDS(file = here(\"data\", \"evals.Rdata\"))\n```\n:::\n", + "markdown": "# Evaluation data: input/features\n\n\n::: {.cell}\n\n```{.r .cell-code code-summary=\"load packages\"}\nlibrary(tidyverse) \n\n# data acquisition ----\n#devtools::install_github(\"bergant/airtabler\")\nlibrary(airtabler)\n\n# data cleaning & shaping ----\n\n# data analysis ----\n# library(lme4)\n# library(lmtest) # Testing Linear Regression Models\n\n# markdown et al. ----\nlibrary(knitr)\nlibrary(bookdown)\nlibrary(quarto)\nlibrary(formattable) # Create 'Formattable' Data Structures\n\n# others ----\nlibrary(here) # A Simpler Way to Find Your Files\n#devtools::install_github(\"metamelb-repliCATS/aggreCAT\")\n#library(aggrecat)\n\n# Make sure select is always the dplyr version\nselect <- dplyr::select \n\n# options\noptions(knitr.duplicate.label = \"allow\")\n```\n:::\n\n\n\n::: {.callout-note collapse=\"true\"}\n## Note on data input (10-Aug-23)\n\nBelow, the evaluation data is input from an Airtable, which itself was largely hand-input from evaluators' reports. As PubPub builds (target: end of Sept. 2023), this will allow us to include the ratings and predictions as structured data objects. We then plan to access and input this data *directly* from the PubPub (API?) into the present analysis. 
This will improve automation and limit the potential for data entry errors.\n\n::: \n\n\n::: {.cell}\n\n```{.r .cell-code code-summary=\"input from airtable\"}\nbase_id <- \"appbPYEw9nURln7Qg\"\n\n\n# Set your Airtable API key \nSys.setenv(AIRTABLE_API_KEY = Sys.getenv(\"AIRTABLE_API_KEY\"))\n#this should be set in my .Renviron file\n\n# Read data from a specific view\n\nevals <- air_get(base = base_id, \"output_eval\") \n\nall_pub_records <- data.frame()\npub_records <- air_select(base = base_id, table = \"crucial_research\")\n\n# Append the records to the list\nall_pub_records <- bind_rows(all_pub_records, pub_records)\n\n# While the length of the records list is 100 (the maximum), fetch more records\nwhile(nrow(pub_records) == 100) {\n # Get the ID of the last record in the list\n offset <- get_offset(pub_records)\n \n # Fetch the next 100 records, starting after the last ID\n pub_records <- air_select(base = base_id, table = \"crucial_research\", offset = offset)\n \n # Append the records to the df\n all_pub_records <- bind_rows(all_pub_records, pub_records)\n}\n\n# housekeeping\nrm(pub_records)\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code code-summary=\"just the useful and publish-able data, clean a bit\"}\n# clean evals names to snakecase\ncolnames(evals) <- snakecase::to_snake_case(colnames(evals))\n\nevals_pub <- evals %>% \n dplyr::rename(stage_of_process = stage_of_process_todo_from_crucial_research_2) %>% \n mutate(stage_of_process = unlist(stage_of_process)) %>% \n dplyr::filter(stage_of_process == \"published\") %>% \n select(id, \n crucial_research, \n paper_abbrev, \n evaluator_name, \n category, \n source_main, \n author_agreement, \n overall, \n lb_overall, \n ub_overall, \n conf_index_overall, \n advancing_knowledge_and_practice, \n lb_advancing_knowledge_and_practice, \n ub_advancing_knowledge_and_practice, \n conf_index_advancing_knowledge_and_practice,\n methods_justification_reasonableness_validity_robustness,\n lb_methods_justification_reasonableness_validity_robustness,\n ub_methods_justification_reasonableness_validity_robustness,\n conf_index_methods_justification_reasonableness_validity_robustness, \n logic_communication, lb_logic_communication, ub_logic_communication, \n conf_index_logic_communication,\n engaging_with_real_world_impact_quantification_practice_realism_and_relevance,\n lb_engaging_with_real_world_impact_quantification_practice_realism_and_relevance,\n ub_engaging_with_real_world_impact_quantification_practice_realism_and_relevance,\n conf_index_engaging_with_real_world_impact_quantification_practice_realism_and_relevance,\n relevance_to_global_priorities, \n lb_relevance_to_global_priorities, \n ub_relevance_to_global_priorities, \n conf_index_relevance_to_global_priorities, \n journal_quality_predict, \n lb_journal_quality_predict, \n ub_journal_quality_predict,\n conf_index_journal_quality_predict, \n open_collaborative_replicable, \n conf_index_open_collaborative_replicable, \n lb_open_collaborative_replicable, \n ub_open_collaborative_replicable, \n merits_journal, \n lb_merits_journal, \n ub_merits_journal, \n conf_index_merits_journal)\n\n# shorten names (before you expand into columns)\nnew_names <- c(\n \"eval_name\" = \"evaluator_name\",\n \"cat\" = \"category\",\n \"crucial_rsx\" = \"crucial_research\",\n \"conf_overall\" = \"conf_index_overall\",\n \"adv_knowledge\" = \"advancing_knowledge_and_practice\",\n \"lb_adv_knowledge\" = \"lb_advancing_knowledge_and_practice\",\n \"ub_adv_knowledge\" = 
\"ub_advancing_knowledge_and_practice\",\n \"conf_adv_knowledge\" = \"conf_index_advancing_knowledge_and_practice\",\n \"methods\" = \"methods_justification_reasonableness_validity_robustness\",\n \"lb_methods\" = \"lb_methods_justification_reasonableness_validity_robustness\",\n \"ub_methods\" = \"ub_methods_justification_reasonableness_validity_robustness\",\n \"conf_methods\" = \"conf_index_methods_justification_reasonableness_validity_robustness\",\n \"logic_comms\" = \"logic_communication\",\n \"lb_logic_comms\" = \"lb_logic_communication\",\n \"ub_logic_comms\" = \"ub_logic_communication\",\n \"conf_logic_comms\" = \"conf_index_logic_communication\",\n \"real_world\" = \"engaging_with_real_world_impact_quantification_practice_realism_and_relevance\",\n \"lb_real_world\" = \"lb_engaging_with_real_world_impact_quantification_practice_realism_and_relevance\",\n \"ub_real_world\" = \"ub_engaging_with_real_world_impact_quantification_practice_realism_and_relevance\",\n \"conf_real_world\" = \"conf_index_engaging_with_real_world_impact_quantification_practice_realism_and_relevance\",\n \"gp_relevance\" = \"relevance_to_global_priorities\",\n \"lb_gp_relevance\" = \"lb_relevance_to_global_priorities\",\n \"ub_gp_relevance\" = \"ub_relevance_to_global_priorities\",\n \"conf_gp_relevance\" = \"conf_index_relevance_to_global_priorities\",\n \"journal_predict\" = \"journal_quality_predict\",\n \"lb_journal_predict\" = \"lb_journal_quality_predict\",\n \"ub_journal_predict\" = \"ub_journal_quality_predict\",\n \"conf_journal_predict\" = \"conf_index_journal_quality_predict\",\n \"open_sci\" = \"open_collaborative_replicable\",\n \"conf_open_sci\" = \"conf_index_open_collaborative_replicable\",\n \"lb_open_sci\" = \"lb_open_collaborative_replicable\",\n \"ub_open_sci\" = \"ub_open_collaborative_replicable\",\n \"conf_merits_journal\" = \"conf_index_merits_journal\"\n)\n\nevals_pub <- evals_pub %>%\n rename(!!!new_names)\n\n# Create a list of labels with the old, longer names\nlabels <- str_replace_all(new_names, \"_\", \" \") %>% str_to_title()\n\n# Assign labels to the dataframe / tibble\n# (maybe this can be done as an attribute, not currently working)\n# for(i in seq_along(labels)) {\n# col_name <- new_names[names(new_names)[i]]\n# label <- labels[i]\n# attr(evals_pub[[col_name]], \"label\") <- label\n# }\n\n\n# expand categories into columns, unlist everything\nevals_pub %<>%\n tidyr::unnest_wider(cat, names_sep = \"_\") %>% # give each of these its own col\n mutate(across(everything(), unlist)) # maybe check why some of these are lists in the first place\n \n\n# clean the Anonymous names\nevals_pub$eval_name <- ifelse(\n grepl(\"^\\\\b\\\\w+\\\\b$|\\\\bAnonymous\\\\b\", evals_pub$eval_name),\n paste0(\"Anonymous_\", seq_along(evals_pub$eval_name)),\n evals_pub$eval_name\n)\n\n#housekeeping\nrm(evals)\n\n#Todo -- check the unlist is not propagating the entry\n#Note: category, topic_subfield, and source have multiple meaningful categories. 
These will need care \n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nevals_pub_long <- evals_pub %>% \n pivot_longer(cols = -c(id, crucial_rsx, paper_abbrev, eval_name, \n cat_1,cat_2, cat_3,source_main,author_agreement),\n names_pattern = \"(lb_|ub_|conf_)?(.+)\",\n names_to = c(\"value_type\", \"rating_type\")) %>% # one line per rating type\n mutate(value_type = if_else(value_type == \"\", \"est_\", value_type)) %>% #add main rating id\n pivot_wider(names_from = value_type, \n values_from = value)\n```\n:::\n\n\n\n\n\n### Reconcile uncertainty ratings and CIs {-}\n\nWhere people gave only confidence level 'dots', we impute CIs (confidence/credible intervals). We follow the correspondence described [here](https://effective-giving-marketing.gitbook.io/unjournal-x-ea-and-global-priorities-research/policies-projects-evaluation-workflow/evaluation/guidelines-for-evaluators#1-5-dots-explanation-and-relation-to-cis). (Otherwise, where they gave actual CIs, we use these.)^[Note this is only a first-pass; a more sophisticated approach may be warranted in future.]\n\n::: {.callout-note collapse=\"true\"}\n## Dots to interval choices\n\n> 5 = Extremely confident, i.e., 90% confidence interval spans +/- 4 points or less)\n\nFor 0-100 ratings, code the LB as $max(R - 4,0)$ and the UB as $min(R + 4,100)$, where R is the stated (middle) rating. This 'scales' the CI, as interpreted, to be proportional to the rating, with a maximum 'interval' of about 8, with the rating is about 96.\n\n> 4 = Very*confident: 90% confidence interval +/- 8 points or less\n\nFor 0-100 ratings, code the LB as $max(R - 8,0)$ and the UB as $min(R + 8,100)$, where R is the stated (middle) rating. \n\n> 3 = Somewhat** confident: 90% confidence interval +/- 15 points or less \n\n> 2 = Not very** confident: 90% confidence interval, +/- 25 points or less\n\nComparable scaling for the 2-3 ratings as for the 4 and 5 rating.\n\n> 1 = Not** confident: (90% confidence interval +/- more than 25 points)\n \nCode LB as $max(R - 37.5,0)$ and the UB as $min(R + 37.5,100)$. \n \nThis is just a first-pass. There might be a more information-theoretic way of doing this. On the other hand, we might be switching the evaluations to use a different tool soon, perhaps getting rid of the 1-5 confidence ratings altogether.\n\n::: \n\n\n\n::: {.cell}\n\n```{.r .cell-code code-summary=\"reconcile explicit bounds and stated confidence level\"}\n# Define the baseline widths for each confidence rating\n# JB: it would be good to have some more backing of whether this\n# is a valid way to translate these confidence levels into CIs\n\n# baseline_widths <- c(4, 8, 15, 25, 37.5)\n\n# Lists of categories\nrating_cats <- c(\"overall\", \"adv_knowledge\", \"methods\", \"logic_comms\", \"real_world\", \"gp_relevance\", \"open_sci\")\n\n#... 'predictions' are currently 1-5 (0-5?)\npred_cats <- c(\"journal_predict\", \"merits_journal\")\n\n#DR: Note that I use these objects in future chapters, but they are not connected to the data frame. Either one saves and reinputs the whole environment (messy, replicability issue), or you save this somewhere and re-input it or connect it to the data frame that gets saved (not sure how to do it), or you hard-code reinput it in the next chapter. \n\n#I do the latter for now, but I'm not happy about it, because the idea is 'input definitions in a single place to use later'\n\n#JB: I don't really understand why not just hard code it in the next chapter. These are very short strings. Do you expect them to change often? 
If so, we can derive them from a dataframe somewhere in the future or save as a separate object. \n\n\n# JB: Rewritten functions for adding imputed ub and lb\n\n# calculate the lower and upper bounds, \n# rating: given a rating, \n# conf: a confidence (1-5) score,\n# type: a bound type (lower, upper),\n# scale: a scale (100 is 0-100, 5 is 1-5)\n# This function is not vectorized\ncalc_bounds <- function(rating, conf, type, scale) {\n \n if(scale == 5){ #for the 'journal tier prediction case'\n baseline_width = case_match(conf, \n 5 ~ .2, #4*5/100\n 4 ~ .4, #8*5/100\n 3 ~ .75, #15*5/100\n 2 ~ 1.25, #25*5/100\n 1 ~ 1.875, #37.5*5/100\n .default = NA_real_) \n\n upper = min(rating + baseline_width, 5)\n lower = max(rating - baseline_width, 1)\n }#/if(scale == 5)\n \n if(scale == 100){ #for the 'ratings case'\n \n baseline_width = case_match(conf, \n 5 ~ 4, \n 4 ~ 8, \n 3 ~ 15, \n 2 ~ 25, \n 1 ~ 37.5,\n .default = NA_real_)\n \n upper = min(rating + baseline_width, 100)\n lower = max(rating - baseline_width, 0)\n } #/if(scale == 100)\n \n if(type == \"lower\") return(lower)\n if(type == \"upper\") return(upper)\n}\n\n\n# calculate or find correct lower or upper bound\n# based on rating type, and lb, ub, and conf values\nimpute_bounds <- function(var_name, est, lb, ub, conf, bound_type) {\n \n # get scale of each variable\n scale = if_else(var_name %in% c(\"journal_predict\", \"merits_journal\"), # if variable is a prediction\n 5, 100) #scale is 5, else scale is 100\n # if calculating lower bound variable\n if(bound_type == \"lower\") { #we are calculating a lower bound imputation\n # \n calculated_bound = map_dbl(.x = est, .f = calc_bounds, conf = conf, type = bound_type, scale = scale)\n \n imp_bound = if_else(is.na(lb), calculated_bound, lb)\n }\n \n # if calculating upper bound variable\n if(bound_type == \"upper\") { #we are calculating an upper bound imputation\n # \n calculated_bound = map_dbl(.x = est, .f = calc_bounds, conf = conf, type = bound_type, scale = scale)\n imp_bound = if_else(is.na(ub), calculated_bound, ub)\n }\n \n return(imp_bound)\n}\n\n# apply functions to evals_pub_long\n# where each row is one type of rating\n# so each evaluation is 9 rows long\nevals_pub_long <- evals_pub_long %>% \n rowwise() %>% # apply function to each row\n mutate(lb_imp_ = impute_bounds(var_name = rating_type,\n est = est_,\n lb = lb_, ub = ub_, conf = conf_,\n bound_type = \"lower\")) %>% \n mutate(ub_imp_ = impute_bounds(var_name = rating_type,\n est = est_,\n lb = lb_, ub = ub_, conf = conf_,\n bound_type = \"upper\"))\n\n# Reshape evals_pub_long into evals_pub to add imputed bounds\nevals_pub <- evals_pub_long %>% \n pivot_wider(names_from = rating_type, # take the dataframe back to old format\n values_from = c(est_, ub_, lb_, conf_, lb_imp_, ub_imp_),\n names_sep = \"\") %>% \n dplyr::rename_with(.cols = matches(\"^[ul]b_imp\"),\n .fn = gsub,\n pattern = \"(ub_imp|lb_imp)_(.+)\", \n replacement = \"\\\\2_\\\\1\") %>% \n dplyr::rename_with(.cols = starts_with(\"est_\"),\n .fn = gsub,\n pattern = \"est_(.+)\",\n replacement = \"\\\\1\")\n\n# Clean evals_pub_long names (remove _ at end)\nevals_pub_long <- evals_pub_long %>% \n rename_with(.cols = ends_with(\"_\"),\n .fn = str_remove,\n pattern = \"_$\")\n```\n:::\n\n\n\nWe cannot publicly share the 'papers under consideration', but we can share some of the statistics on these papers. 
Let's generate an ID (or later, salted hash) for each such paper, and keep only the shareable features of interest\n\n\n::: {.cell}\n\n```{.r .cell-code}\nall_papers_p <- all_pub_records %>% \n dplyr::select(\n id,\n category,\n cfdc_DR,\n 'confidence -- user entered',\n cfdc_assessor,\n avg_cfdc,\n category,\n cause_cat_1_text,\n cause_cat_2_text,\n topic_subfield_text,\n eval_manager_text,\n 'publication status',\n 'Contacted author?',\n 'stage of process/todo',\n 'source_main', \n 'author permission?',\n'Direct Kotahi Prize Submission?',\n 'createdTime' \n )\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\n# Create a dataset to be used as the input to a shiny app\n\n#| include: false\n#| \nevals_pub_long %>% \n mutate(rating_type = factor(rating_type, \n levels = c(rating_cats, pred_cats),\n labels = c(\"Overall assessment\",\n \"Advances our knowledge & practice\", \n \"Methods: justification, reasonableness, validity, robustness\", \n \"Logic and communication\", \n \"Engages with real-world, impact quantification\",\n \"Relevance to global priorities\",\n \"Open, collaborative, replicable science and methods\",\n \"Predicted Journal\", \"Merits Journal\"))) %>% \n write_rds(file = here(\"shinyapp/DataExplorer\", \"shiny_explorer.rds\"))\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code code-summary=\"save data for others' use\"}\nall_papers_p %>% saveRDS(file = here(\"data\", \"all_papers_p.Rdata\"))\nall_papers_p %>% write_csv(file = here(\"data\", \"all_papers_p.csv\"))\n\nevals_pub %>% saveRDS(file = here(\"data\", \"evals.Rdata\"))\nevals_pub %>% write_csv(file = here(\"data\", \"evals.csv\"))\n\nevals_pub_long %>% write_rds(file = here(\"data\", \"evals_long.rds\"))\nevals_pub_long %>% write_csv(file = here(\"data\", \"evals_long.csv\"))\n\n#evals_pub %>% readRDS(file = here(\"data\", \"evals.Rdata\"))\n```\n:::\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/chapters/aggregation.qmd b/chapters/aggregation.qmd index 8eae612..5b59025 100644 --- a/chapters/aggregation.qmd +++ b/chapters/aggregation.qmd @@ -159,6 +159,12 @@ aggreCAT::DistributionWAgg(expert_judgements = temp, round_2_filter = FALSE, typ ## Inter-rater reliability +Inter-rater reliability is a measure of the degree to which two or more independent raters (in our case, paper evaluators) agree. Here, the ratings are the 7 aspects of each paper that evaluators were asked to rate. For each paper, we can obtain one value that summarizes the agreement between the two or three evaluators. + +We use Krippendorff's alpha as a measure of interrater agreement. Krippendorff's alpha is a more flexible measure of agreement and can be used with different levels of data (categorical, ordinal, interval, and ratio) as well as different numbers of raters. The calculation of alpha in the function `kripp.alpha` was implemented by Jim Lemon in the package `irr` and is based on Krippendorff, K. (1980). Content analysis: An introduction to its methodology. Beverly Hills, CA: Sage. + +Krippendorff's alpha can range from -1 to +1, and it can be interpreted similarly to a correlation: values closer to +1 indicate excellent agreement between evaluators; values closer to 0 indicate there is no agreement between evaluators; and negative values indicate that there is systematic disagreement between evaluators, such that ratings are reversed -- where a given evaluator tends to rate something as high, the other(s) tend to rate it as low, and vice versa. 
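As a hedged illustration only (the chunk that actually produces the figure below is hidden with `echo: false`, so this is not that code), Krippendorff's alpha for a single paper could be computed with `irr::kripp.alpha()` along the following lines. The ratings matrix here is invented for the example: evaluators as rows, the seven rating categories as columns, treated as interval-scaled data.

```{r}
# Hedged sketch with made-up data -- not the hidden chunk that draws the figure.
# One alpha per paper: evaluators are the raters (rows), the 7 rating
# categories are the units being rated (columns), on the 0-100 interval scale.
library(irr)

example_ratings <- rbind(
  eval_1 = c(80, 75, 70, 85, 60, 90, 65),
  eval_2 = c(70, 72, 68, 80, 55, 85, 60),
  eval_3 = c(90, 80, 75, 88, 70, 95, 72)
)
colnames(example_ratings) <- c("overall", "adv_knowledge", "methods",
                               "logic_comms", "real_world",
                               "gp_relevance", "open_sci")

# interval metric, since ratings are continuous 0-100 scores
kripp.alpha(example_ratings, method = "interval")$value
```

In the real data, a matrix like `example_ratings` could be built per paper by splitting `evals_pub_long` on `paper_abbrev` and reshaping each piece to evaluators-by-criteria before calling `kripp.alpha()`.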
+ ```{r} #| echo: false #| fig-height: 8 diff --git a/chapters/evaluation_data_input.qmd b/chapters/evaluation_data_input.qmd index 2de334a..f464f9d 100644 --- a/chapters/evaluation_data_input.qmd +++ b/chapters/evaluation_data_input.qmd @@ -412,6 +412,16 @@ all_papers_p <- all_pub_records %>% #| include: false #| evals_pub_long %>% + mutate(rating_type = factor(rating_type, + levels = c(rating_cats, pred_cats), + labels = c("Overall assessment", + "Advances our knowledge & practice", + "Methods: justification, reasonableness, validity, robustness", + "Logic and communication", + "Engages with real-world, impact quantification", + "Relevance to global priorities", + "Open, collaborative, replicable science and methods", + "Predicted Journal", "Merits Journal"))) %>% write_rds(file = here("shinyapp/DataExplorer", "shiny_explorer.rds")) ``` diff --git a/data/all_papers_p.Rdata b/data/all_papers_p.Rdata index 2665dcc..6bd90e7 100644 Binary files a/data/all_papers_p.Rdata and b/data/all_papers_p.Rdata differ diff --git a/data/all_papers_p.csv b/data/all_papers_p.csv index c13e3e4..e636d83 100644 --- a/data/all_papers_p.csv +++ b/data/all_papers_p.csv @@ -41,6 +41,7 @@ recIlJDSYJtfwrUVE,,NA,NA,NA,,NA,NA,"Development and Growth, Growth and productiv recJNtBg4lyBwy2Yh,,0.45,NA,0.6,,"""Attitudes and behaviors (Altruism, moral circles, animal consumption, effectiveness, political attitudes, etc.)""",Possible category: more 'economics' treatment of preferences for giving and 'ethical/donation-linked' choices; implications of these preferences; donor coordination etc.,"""Health, Education, and Welfare"", Poverty and Wellbeing","David Reinstein, Jonathan Berman",Unpublished working paper,NA,NA,internal-NBER,NA,NA,2022-10-26T14:25:41.000Z recJpc6KiWth0JTQP,,NA,NA,NA,,Markets for products with large externalities (focus: animal agriculture),NA,NA,NA,NA,NA,non-academic stream possibility,suggested - externally - NGO,NA,NA,2023-07-31T16:18:12.000Z recKL2IpTca2x2MID,,0.5700000000000001,NA,NA,,"""Health and well-being (global, including wealthy countries)""","""Innovation, meta-science, and research""","""Health, Education, and Welfare"", Health",NA,NA,NA,NA,internal-NBER,NA,NA,2022-11-23T01:00:41.000Z +recKgrthW975DCFti,,1,NA,1,,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,2023-09-15T16:10:50.000Z recLC36l549oXU64R,,0.6,NA,NA,,"""Communicable diseases, bio-security and pandemic preparedness, biological risks""",NA,Health,NA,"Published, ~top journal",NA,NA,submitted,NA,TRUE,2022-12-04T18:44:20.000Z recLGKU3MIfrIxXjH,,NA,NA,0.85,,"""Global health; """"Health & well-being in low-income countries""""""",NA,"""Health, Education, and Welfare""",NA,Unpublished working paper,NA,considering,suggested - externally,needed/seeking,NA,2023-07-30T11:51:10.000Z recLlY6U68evI1tei,,0.52,NA,NA,,"""Global health; """"Health & well-being in low-income countries""""""",Economic development & governance (LMICs),"Econometrics, Experimental Design, Development and Growth","Daniela Cialfi, Ryan Briggs ",Unpublished working paper,NA,NA,internal-NBER,NA,NA,2022-10-26T14:51:25.000Z @@ -80,7 +81,7 @@ recb2OhsLTzsTFitP,,0.63,NA,0.63,,"""Attitudes and behaviors (Altruism, moral cir recbSIx4vL7A2g8L8,,0.55,NA,0.55,,Economic development & governance (LMICs),NA,NA,NA,Unpublished working paper,Authors say 'no' (but maybe revisit?),NA,suggested - internally,NA,NA,2022-06-06T18:27:46.000Z reccAvoay5ECDSGgO,,0.6,NA,0.6,,Emerging technologies: social and economic impacts (focus: AI),Economic development & governance (LMICs),"International Economics, 
Globalization and International Relations, Development and Growth, Innovation and R&D",NA,NA,NA,considering,internal-NBER,NA,NA,2022-10-26T14:48:38.000Z reccgH2KoqMzix1f6,,0.62,NA,0.62,,"""Innovation, meta-science, and research""",NA,"Development and Growth, Innovation and R&D",NA,"published, decent journal ",NA,considering,internal-NBER,NA,NA,2022-11-05T14:56:50.000Z -reccpREogKO0HCyXu,,0.72,NA,NA,,"""Global health; """"Health & well-being in low-income countries""""""",Economic development & governance (LMICs),"Econometrics, Experimental Design, Microeconomics, Behavioral Economics, Development and Growth, Development","Anirudh Tagat, Hansika Kapoor",NA,Acknowledged,published,internal-NBER,NA,NA,2022-11-23T01:42:35.000Z +reccpREogKO0HCyXu,,0.72,NA,NA,,"""Global health; """"Health & well-being in low-income countries""""""",Economic development & governance (LMICs),"Econometrics, Experimental Design, Microeconomics, Behavioral Economics, Development and Growth, Development","x, Hansika Kapoor",NA,Acknowledged,published,internal-NBER,NA,NA,2022-11-23T01:42:35.000Z recd26i1Bs8QsR3P3,,0.64,NA,0.64,,"""Other: Economics, growth, policy, global markets and population """,NA,NA,NA,NA,Author suggested but need to confirm with co-authors,considering,suggested - externally,NA,NA,2022-07-17T10:53:38.000Z recdaNJkDpIpcbBBU,,0.58,NA,0.58,,"""Attitudes and behaviors (Altruism, moral circles, animal consumption, effectiveness, political attitudes, etc.)""",NA,NA,NA,NA,NA,NA,suggested - externally - NGO,NA,NA,2022-10-26T14:23:01.000Z receXGQBDjKhWzD6X,,0.54,NA,NA,,"""International cooperation and conflict, behavior of large governments; authoritarianism """,Economic development & governance (LMICs),NA,NA,NA,NA,NA,internal-NBER,NA,NA,2023-06-08T23:24:10.000Z @@ -106,7 +107,7 @@ recmJNyeaL60b0UvU,,NA,NA,0.65,,"""Global health; """"Health & well-being in low- recmXqAzzj4MlJbwC,,0.6,NA,NA,,Economic development & governance (LMICs),"""Global health; """"Health & well-being in low-income countries""""""",NA,NA,NA,NA,NA,internal-NBER,NA,NA,2023-06-08T23:29:58.000Z recmatMogeg5trJZR,,0.58,NA,NA,,"""Global health; """"Health & well-being in low-income countries""""""","""The Environment: Harm to human health, biodiversity, risks and climate change""",NA,NA,NA,NA,NA,suggested - externally,NA,NA,2023-06-09T22:12:51.000Z recmk7X0k0H5kCYvN,,0.57,NA,NA,,"""Other: Economics, growth, policy, global markets and population """,Economic development & governance (LMICs),Migration,NA,NA,NA,NA,internal-NBER,NA,NA,2023-06-08T12:49:19.000Z -recokbZR7Lm4IU815,,0.58,NA,0.45,,"""Innovation, meta-science, and research""",NA,"Development and Growth, Public Economics, ""Health, Education, and Welfare"", Innovation and R&D, Education","Gavin Taylor , Alexander Herwix",NA,NA,NA,internal-NBER,NA,NA,2022-11-05T15:30:16.000Z +recokbZR7Lm4IU815,,0.58,NA,0.45,,"""Innovation, meta-science, and research""",NA,"Development and Growth, Public Economics, ""Health, Education, and Welfare"", Innovation and R&D, Education","x, Alexander Herwix",NA,NA,NA,internal-NBER,NA,NA,2022-11-05T15:30:16.000Z recomNDX1PTvP5gDM,,0.58,NA,NA,,"""Global health; """"Health & well-being in low-income countries""""""",NA,"Development and Growth, Development, Environmental and Resource Economics, Environment",NA,"published, decent journal ",NA,NA,internal-NBER,NA,NA,2022-10-26T14:42:33.000Z recpBOskwOo7MB7Ze,,NA,NA,NA,,NA,NA,NA,NA,NA,NA,non-academic stream possibility,suggested - externally - NGO,NA,NA,2023-07-31T16:18:12.000Z recpgeTZaxfK6t3H0,,0.54,NA,NA,,Markets 
for products with large externalities (focus: animal agriculture),Empirical methods,NA,NA,NA,NA,NA,suggested - externally - NGO,NA,NA,2022-11-22T16:03:53.000Z @@ -119,7 +120,7 @@ recsiQaf3ZTSkkXLV,,0.63,NA,NA,,"""Global health; """"Health & well-being in low- recsii9l3QRQFerkU,,1,0.8,1,,"""Catastrophic and existential risks, the long-term future, forecasting""",Emerging technologies: social and economic impacts (focus: AI),NA,NA,"Published, ? journal",Agreed,published,submitted,not needed (submitted by authors),NA,2022-05-08T03:58:56.000Z recsxlSRIz4Y1RHd3,,NA,NA,NA,,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,2023-07-31T16:18:12.000Z rect8c6gbgVnvz6Zt,,NA,NA,NA,,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,2023-08-14T19:32:14.000Z -rectfSMcCGKrVVtuw,,0.63,NA,NA,,"""Global health; """"Health & well-being in low-income countries""""""",Economic development & governance (LMICs),"Public Economics, ""Health, Education, and Welfare"", Poverty and Wellbeing, Labor Economics, Demography and Aging, Labor Supply and Demand, Development and Growth, Development","Hansika Kapoor, Anirudh Tagat",NA,Emailed,published,internal-NBER,NA,NA,2022-11-23T01:51:58.000Z +rectfSMcCGKrVVtuw,,0.63,NA,NA,,"""Global health; """"Health & well-being in low-income countries""""""",Economic development & governance (LMICs),"Public Economics, ""Health, Education, and Welfare"", Poverty and Wellbeing, Labor Economics, Demography and Aging, Labor Supply and Demand, Development and Growth, Development","Hansika Kapoor, x",NA,Emailed,published,internal-NBER,NA,NA,2022-11-23T01:51:58.000Z rectim9KLJ6yQ1Goa,,0.56,NA,0.56,,"""Catastrophic and existential risks, the long-term future, forecasting""",NA,NA,NA,"Published, ~top journal",NA,NA,internal-from-syllabus-agenda-policy-database,NA,NA,2022-04-15T15:57:47.000Z rectraxzNjDb0cDbU,,NA,NA,NA,,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,2023-08-22T20:29:40.000Z recuWvH8xC8jK5Rtk,,NA,NA,1,,"""Global health; """"Health & well-being in low-income countries""""""",Economic development & governance (LMICs),Health,Ryan Briggs ,NA,Acknowledged,de-prioritized,suggested - externally - NGO,not needed (Unjournal Direct),NA,2023-06-08T12:48:58.000Z diff --git a/docs/chapters/aggregation.html b/docs/chapters/aggregation.html index 160c3a7..59bcdbe 100644 --- a/docs/chapters/aggregation.html +++ b/docs/chapters/aggregation.html @@ -2,21 +2,22 @@ - + The Unjournal evaluations: data and analysis - 3  Aggregation of evaluators judgments (modeling) @@ -121,22 +92,25 @@
-
-
+ +
-

+

3  Aggregation of evaluators judgments (modeling)

@@ -201,18 +181,19 @@

+

3.1 Notes on sources and approaches

-
+
-
+
-
+

3.2 Inter-rater reliability

+

Inter-rater reliability is a measure of the degree to which two or more independent raters (in our case, paper evaluators) agree. Here, the ratings cover the seven aspects of each paper that evaluators were asked to rate. For each paper, we can compute a single value that summarizes the agreement between its two or three evaluators.

+

We use Krippendorff’s alpha as our measure of inter-rater agreement. Krippendorff’s alpha is a flexible measure of agreement: it accommodates different levels of measurement (categorical, ordinal, interval, and ratio) as well as any number of raters. The calculation of alpha in the function kripp.alpha, implemented by Jim Lemon in the irr package, is based on Krippendorff, K. (1980), Content Analysis: An Introduction to Its Methodology. Beverly Hills, CA: Sage.
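For intuition, here is a minimal sketch (not the exact pipeline used in this chapter) of how kripp.alpha can be called for a single paper: the ratings are arranged as a matrix with one row per evaluator and one column per rating criterion. The numbers are invented for illustration, and treating the ratings as interval-level data is our assumption.

```r
# Minimal sketch: Krippendorff's alpha for one hypothetical paper.
# Rows are evaluators; columns are the seven rating criteria (values invented).
library(irr)

ratings_one_paper <- rbind(
  evaluator_1 = c(80, 75, 70, 85, 60, 90, 65),
  evaluator_2 = c(78, 70, 72, 80, 55, 88, 70),
  evaluator_3 = c(90, 82, 75, 88, 70, 92, 60)
)

# method = "interval" is an assumption about the measurement level of the ratings
kripp.alpha(ratings_one_paper, method = "interval")
```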

+

Krippendorff’s alpha ranges from -1 to +1 and can be interpreted much like a correlation: values closer to +1 indicate excellent agreement between evaluators; values near 0 indicate agreement no better than chance; and negative values indicate systematic disagreement, such that ratings are reversed: where a given evaluator tends to rate something as high, the other(s) tend to rate it as low, and vice versa.
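To get one such value per paper from the evaluation data, something like the following sketch could be used. It assumes, as elsewhere in this analysis, that evals_pub holds one row per evaluator-paper pair and that rating_cats names the seven rating columns; the interval measurement level is again our assumption.

```r
# Sketch: per-paper Krippendorff's alpha across the seven rating criteria.
# Assumes `evals_pub` (one row per evaluator x paper) and `rating_cats`
# (the names of the seven rating columns), as used elsewhere in this analysis.
library(dplyr)
library(purrr)
library(tibble)
library(irr)

alpha_by_paper <- evals_pub %>%
  select(paper_abbrev, eval_name, all_of(rating_cats)) %>%
  group_split(paper_abbrev) %>%
  map_dfr(function(d) {
    m <- as.matrix(d[, rating_cats])  # rows = evaluators, columns = criteria
    tibble(
      paper_abbrev = d$paper_abbrev[1],
      n_evaluators = nrow(m),
      kripp_alpha  = kripp.alpha(m, method = "interval")$value
    )
  })

alpha_by_paper
```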

@@ -398,9 +382,23 @@

icon: icon }; anchorJS.add('.anchored'); + const isCodeAnnotation = (el) => { + for (const clz of el.classList) { + if (clz.startsWith('code-annotation-')) { + return true; + } + } + return false; + } const clipboard = new window.ClipboardJS('.code-copy-button', { - target: function(trigger) { - return trigger.previousElementSibling; + text: function(trigger) { + const codeEl = trigger.previousElementSibling.cloneNode(true); + for (const childEl of codeEl.children) { + if (isCodeAnnotation(childEl)) { + childEl.remove(); + } + } + return codeEl.innerText; } }); clipboard.on('success', function(e) { @@ -412,7 +410,24 @@

button.classList.add('code-copy-button-checked'); var currentTitle = button.getAttribute("title"); button.setAttribute("title", "Copied!"); + let tooltip; + if (window.bootstrap) { + button.setAttribute("data-bs-toggle", "tooltip"); + button.setAttribute("data-bs-placement", "left"); + button.setAttribute("data-bs-title", "Copied!"); + tooltip = new bootstrap.Tooltip(button, + { trigger: "manual", + customClass: "code-copy-button-tooltip", + offset: [0, -8]}); + tooltip.show(); + } setTimeout(function() { + if (tooltip) { + tooltip.hide(); + button.removeAttribute("data-bs-title"); + button.removeAttribute("data-bs-toggle"); + button.removeAttribute("data-bs-placement"); + } button.setAttribute("title", currentTitle); button.classList.remove('code-copy-button-checked'); }, 1000); @@ -499,24 +514,128 @@

return note.innerHTML; }); } + let selectedAnnoteEl; + const selectorForAnnotation = ( cell, annotation) => { + let cellAttr = 'data-code-cell="' + cell + '"'; + let lineAttr = 'data-code-annotation="' + annotation + '"'; + const selector = 'span[' + cellAttr + '][' + lineAttr + ']'; + return selector; + } + const selectCodeLines = (annoteEl) => { + const doc = window.document; + const targetCell = annoteEl.getAttribute("data-target-cell"); + const targetAnnotation = annoteEl.getAttribute("data-target-annotation"); + const annoteSpan = window.document.querySelector(selectorForAnnotation(targetCell, targetAnnotation)); + const lines = annoteSpan.getAttribute("data-code-lines").split(","); + const lineIds = lines.map((line) => { + return targetCell + "-" + line; + }) + let top = null; + let height = null; + let parent = null; + if (lineIds.length > 0) { + //compute the position of the single el (top and bottom and make a div) + const el = window.document.getElementById(lineIds[0]); + top = el.offsetTop; + height = el.offsetHeight; + parent = el.parentElement.parentElement; + if (lineIds.length > 1) { + const lastEl = window.document.getElementById(lineIds[lineIds.length - 1]); + const bottom = lastEl.offsetTop + lastEl.offsetHeight; + height = bottom - top; + } + if (top !== null && height !== null && parent !== null) { + // cook up a div (if necessary) and position it + let div = window.document.getElementById("code-annotation-line-highlight"); + if (div === null) { + div = window.document.createElement("div"); + div.setAttribute("id", "code-annotation-line-highlight"); + div.style.position = 'absolute'; + parent.appendChild(div); + } + div.style.top = top - 2 + "px"; + div.style.height = height + 4 + "px"; + let gutterDiv = window.document.getElementById("code-annotation-line-highlight-gutter"); + if (gutterDiv === null) { + gutterDiv = window.document.createElement("div"); + gutterDiv.setAttribute("id", "code-annotation-line-highlight-gutter"); + gutterDiv.style.position = 'absolute'; + const codeCell = window.document.getElementById(targetCell); + const gutter = codeCell.querySelector('.code-annotation-gutter'); + gutter.appendChild(gutterDiv); + } + gutterDiv.style.top = top - 2 + "px"; + gutterDiv.style.height = height + 4 + "px"; + } + selectedAnnoteEl = annoteEl; + } + }; + const unselectCodeLines = () => { + const elementsIds = ["code-annotation-line-highlight", "code-annotation-line-highlight-gutter"]; + elementsIds.forEach((elId) => { + const div = window.document.getElementById(elId); + if (div) { + div.remove(); + } + }); + selectedAnnoteEl = undefined; + }; + // Attach click handler to the DT + const annoteDls = window.document.querySelectorAll('dt[data-target-cell]'); + for (const annoteDlNode of annoteDls) { + annoteDlNode.addEventListener('click', (event) => { + const clickedEl = event.target; + if (clickedEl !== selectedAnnoteEl) { + unselectCodeLines(); + const activeEl = window.document.querySelector('dt[data-target-cell].code-annotation-active'); + if (activeEl) { + activeEl.classList.remove('code-annotation-active'); + } + selectCodeLines(clickedEl); + clickedEl.classList.add('code-annotation-active'); + } else { + // Unselect the line + unselectCodeLines(); + clickedEl.classList.remove('code-annotation-active'); + } + }); + } + const findCites = (el) => { + const parentEl = el.parentElement; + if (parentEl) { + const cites = parentEl.dataset.cites; + if (cites) { + return { + el, + cites: cites.split(' ') + }; + } else { + return findCites(el.parentElement) + } + } 
else { + return undefined; + } + }; var bibliorefs = window.document.querySelectorAll('a[role="doc-biblioref"]'); for (var i=0; i

diff --git a/docs/chapters/evaluation_data_analysis.html b/docs/chapters/evaluation_data_analysis.html index cad86cd..96fbbab 100644 --- a/docs/chapters/evaluation_data_analysis.html +++ b/docs/chapters/evaluation_data_analysis.html @@ -223,7 +223,7 @@ #library(aggreCAT) # Make sure select is always the dplyr version -select <- dplyr::select +select <- dplyr::select # options options(knitr.duplicate.label = "allow") @@ -247,8 +247,8 @@
-
Input evaluation data
evals_pub <- readRDS(file = here("data", "evals.Rdata"))
-all_papers_p <- readRDS(file = here("data", "all_papers_p.Rdata"))
+
Input evaluation data
evals_pub <- readRDS(file = here("data", "evals.Rdata"))
+all_papers_p <- readRDS(file = here("data", "all_papers_p.Rdata"))
@@ -266,13 +266,13 @@

In the interactive table below we give some key attributes of the papers and the evaluators.

-
Code
evals_pub_df_overview <- evals_pub %>%
-  arrange(paper_abbrev, eval_name) %>%
-  dplyr::select(paper_abbrev, crucial_rsx, eval_name, cat_1, cat_2, source_main, author_agreement) %>%
-  dplyr::select(-matches("ub_|lb_|conf")) 
+
Code
evals_pub_df_overview <- evals_pub %>%
+  arrange(paper_abbrev, eval_name) %>%
+  dplyr::select(paper_abbrev, crucial_rsx, eval_name, cat_1, cat_2, source_main, author_agreement) %>%
+  dplyr::select(-matches("ub_|lb_|conf")) 
 
-evals_pub_df_overview %>%   
-   rename(
+evals_pub_df_overview %>%   
+   rename(
     "Paper Abbreviation" = paper_abbrev,
     "Paper name" = crucial_rsx,
     "Evaluator Name" = eval_name,
@@ -280,15 +280,15 @@
     "Category 2" = cat_2,
     "Main source" = source_main,
     "Author contact" = author_agreement,
-  ) %>% 
+  ) %>% 
   DT::datatable(
     caption = "Evaluations (confidence bounds not shown)", 
     filter = 'top',
     rownames= FALSE,
     options = list(pageLength = 5,
-      columnDefs = list(list(width = '150px', targets = 1)))) %>% 
+      columnDefs = list(list(width = '150px', targets = 1)))) %>% 
   formatStyle(columns = 2:ncol(evals_pub_df_overview), 
-              textAlign = 'center') %>% 
+              textAlign = 'center') %>% 
 formatStyle(
     "Paper name",
     fontSize = '10px'
@@ -306,8 +306,8 @@
 

Evaluation metrics (ratings)

Code
rename_dtstuff <- function(df){
-  df %>%  
-  rename(
+  df %>%  
+  rename(
     "Paper Abbreviation" = paper_abbrev,
     "Evaluator Name" = eval_name,
     "Advancing knowledge" = adv_knowledge,
@@ -326,24 +326,24 @@
 # https://github.com/rstudio/DT/issues/29
 
 
-evals_pub_df <- evals_pub %>%
+evals_pub_df <- evals_pub %>%
   # Arrange data
-  arrange(paper_abbrev, eval_name, overall) %>%
+  arrange(paper_abbrev, eval_name, overall) %>%
   
   # Select and rename columns
-  dplyr::select(paper_abbrev, eval_name, all_of(rating_cats)) %>%
+  dplyr::select(paper_abbrev, eval_name, all_of(rating_cats)) %>%
  rename_dtstuff 
 
 
 (
- evals_pub_dt <- evals_pub_df %>%  
+ evals_pub_dt <- evals_pub_df %>%  
   # Convert to a datatable and apply styling
   datatable(
     caption = "Evaluations and predictions (confidence bounds not shown)", 
     filter = 'top',
     rownames = FALSE,
     options = list(pageLength = 5, 
-            columnDefs = list(list(width = '150px', targets = 0)))) %>% 
+            columnDefs = list(list(width = '150px', targets = 0)))) %>% 
   formatStyle(columns = 2:ncol(evals_pub_df), 
               textAlign = 'center')
 )
@@ -361,10 +361,10 @@
Data datable (all shareable relevant data)
# we didn't seem to be using all_evals_dt so I removed it to increase readability
 
 
-evals_pub %>%
-  arrange(paper_abbrev, eval_name, overall) %>%
-  dplyr::select(paper_abbrev, eval_name, all_of(rating_cats))  %>%
-  rename_dtstuff %>%  
+evals_pub %>%
+  arrange(paper_abbrev, eval_name, overall) %>%
+  dplyr::select(paper_abbrev, eval_name, all_of(rating_cats))  %>%
+  rename_dtstuff %>%  
   DT::datatable(
     caption = "Evaluations and predictions (confidence bounds not shown)", 
     filter = 'top',
@@ -384,10 +384,10 @@
 
 
Code
# we did not seem to be using all_evals_dt_ci so I removed it to improve readability
-evals_pub %>%
-  arrange(paper_abbrev, eval_name) %>%
-  dplyr::select(paper_abbrev, eval_name, conf_overall, all_of(rating_cats), matches("ub_imp|lb_imp")) %>%
-  rename_dtstuff %>% 
+evals_pub %>%
+  arrange(paper_abbrev, eval_name) %>%
+  dplyr::select(paper_abbrev, eval_name, conf_overall, all_of(rating_cats), matches("ub_imp|lb_imp")) %>%
+  rename_dtstuff %>% 
   DT::datatable(
     caption = "Evaluations and (imputed*) confidence bounds)", 
     filter = 'top',
@@ -430,26 +430,26 @@
 
Code
#Add in the 3 different evaluation input sources
 #update to be automated rather than hard-coded - to look at David's work here
 
-papers_considered <- all_papers_p %>% 
+papers_considered <- all_papers_p %>% 
   nrow()
 
-papers_deprio <- all_papers_p %>% 
-  filter(`stage of process/todo` ==  "de-prioritized") %>% 
+papers_deprio <- all_papers_p %>% 
+  filter(`stage of process/todo` ==  "de-prioritized") %>% 
   nrow()
 
-papers_evaluated <- all_papers_p %>% 
-  filter(`stage of process/todo` %in% c("published",
+papers_evaluated <- all_papers_p %>% 
+  filter(`stage of process/todo` %in% c("published",
                                         "contacting/awaiting_authors_response_to_evaluation",
-                                        "awaiting_publication_ME_comments","awaiting_evaluations")) %>% 
+                                        "awaiting_publication_ME_comments","awaiting_evaluations")) %>% 
   nrow()
 
-papers_complete <- all_papers_p %>% 
-  filter(`stage of process/todo` ==  "published") %>%
+papers_complete <- all_papers_p %>% 
+  filter(`stage of process/todo` ==  "published") %>%
   nrow()
 
 papers_in_progress <-  papers_evaluated - papers_complete
 
-papers_still_in_consideration <-  all_papers_p %>% filter(`stage of process/todo` ==  "considering") %>% nrow()
+papers_still_in_consideration <-  all_papers_p %>% filter(`stage of process/todo` ==  "considering") %>% nrow()
 
 
 #todo: adjust wording of hover notes ('source, target...etc')
@@ -481,7 +481,7 @@
       papers_deprio
     ))
 )
-fig <- fig %>% layout(
+fig <- fig %>% layout(
   title = "Unjournal paper funnel",
   font = list(
     size = 10
@@ -497,25 +497,25 @@
 

Todo: 3

Paper categories

-
Code
evals_pub %>% 
-  select(paper_abbrev, starts_with("cat_")) %>%
-  distinct() %>% 
-  pivot_longer(cols = starts_with("cat_"), names_to = "CatNum", values_to = "Category") %>% 
-  group_by(CatNum, Category) %>% 
-  count() %>% 
-  filter(!is.na(Category)) %>% 
-  mutate(Category = str_to_title(Category),
+
Code
evals_pub %>% 
+  select(paper_abbrev, starts_with("cat_")) %>%
+  distinct() %>% 
+  pivot_longer(cols = starts_with("cat_"), names_to = "CatNum", values_to = "Category") %>% 
+  group_by(CatNum, Category) %>% 
+  count() %>% 
+  filter(!is.na(Category)) %>% 
+  mutate(Category = str_to_title(Category),
          CatNum = ordered(CatNum, 
                           levels = c("cat_1", "cat_2", "cat_3"),
-                          labels = c("Primary", "Secondary", "Tertiary"))) %>%
-  ggplot(aes(x = reorder(Category, -n), y = n)) +
-  geom_bar(aes(fill = CatNum), stat = "identity", color = "grey30") + 
-  labs(x = "Paper category", y = "Count", fill = "Cat Level",
+                          labels = c("Primary", "Secondary", "Tertiary"))) %>%
+  ggplot(aes(x = reorder(Category, -n), y = n)) +
+  geom_bar(aes(fill = CatNum), stat = "identity", color = "grey30") + 
+  labs(x = "Paper category", y = "Count", fill = "Cat Level",
        title = "Paper categories represented in pilot data") +
-  theme_bw() +
-  facet_grid(~CatNum, scales="free_x", space="free_x") +
-  theme(axis.text.x=element_text(angle=45,hjust=1)) +
-  theme(legend.position = "none")
+ theme_bw() + + facet_grid(~CatNum, scales="free_x", space="free_x") + + theme(axis.text.x=element_text(angle=45,hjust=1)) + + theme(legend.position = "none")

@@ -523,20 +523,20 @@

Paper source

Code
# Bar plot
-evals_pub %>% 
-  rowwise() %>% 
-  mutate(source_main = str_replace_all(string = source_main, 
+evals_pub %>% 
+  rowwise() %>% 
+  mutate(source_main = str_replace_all(string = source_main, 
                                        pattern = "-", 
-                                       replace = " ") %>% str_to_title()) %>%
-  select(paper_abbrev, source_main) %>% 
-  distinct() %>%
-  ggplot(aes(x = source_main)) + 
-  geom_bar(position = "stack", stat = "count", color = "grey30", fill = "grey80") +
-  labs(x = "Source", y = "Count") +
-  labs(title = "Pool of research/evaluations by paper source") +
-  theme_bw() +
-  theme(text = element_text(size = 15)) +
-  scale_x_discrete(labels = function(x) str_wrap(x, width = 20))
+                                       replace = " ") %>% str_to_title()) %>%
+  select(paper_abbrev, source_main) %>% 
+  distinct() %>%
+  ggplot(aes(x = source_main)) + 
+  geom_bar(position = "stack", stat = "count", color = "grey30", fill = "grey80") +
+  labs(x = "Source", y = "Count") +
+  labs(title = "Pool of research/evaluations by paper source") +
+  theme_bw() +
+  theme(text = element_text(size = 15)) +
+  scale_x_discrete(labels = function(x) str_wrap(x, width = 20))

@@ -553,26 +553,26 @@ "awaiting_evaluations") # Is the paper being evaluated? -all_papers_p <- all_papers_p %>% - mutate(is_evaluated = if_else(`stage of process/todo` %in% eval_true, TRUE, FALSE)) +all_papers_p <- all_papers_p %>% + mutate(is_evaluated = if_else(`stage of process/todo` %in% eval_true, TRUE, FALSE)) # main source clean -all_papers_p <- all_papers_p %>% - mutate(source_main = case_when(source_main == "NA" ~ "Not applicable", +all_papers_p <- all_papers_p %>% + mutate(source_main = case_when(source_main == "NA" ~ "Not applicable", source_main == "internal-from-syllabus-agenda-policy-database" ~ "Internal: syllabus, agenda, etc.", is.na(source_main) ~ "Unknown", TRUE ~ source_main)) -all_papers_p %>% -ggplot(aes(x = fct_infreq(source_main), fill = is_evaluated)) + - geom_bar(position = "stack", stat = "count") + - labs(x = "Source", y = "Count", fill = "Selected for\nevaluation?") + - coord_flip() + # flipping the coordinates to have categories on y-axis (on the left) - labs(title = "Evaluations by source of the paper") + - theme_bw() + - theme(text = element_text(size = 15)) + - scale_fill_brewer(palette = "Set1") + - scale_x_discrete(labels = function(x) str_wrap(x, width = 20))
+all_papers_p %>% 
+ggplot(aes(x = fct_infreq(source_main), fill = is_evaluated)) +
+  geom_bar(position = "stack", stat = "count") +
+  labs(x = "Source", y = "Count", fill = "Selected for\nevaluation?") +
+  coord_flip() + # flipping the coordinates to have categories on y-axis (on the left)
+  labs(title = "Evaluations by source of the paper") +
+  theme_bw() +
+  theme(text = element_text(size = 15)) +
+  scale_fill_brewer(palette = "Set1") +
+  scale_x_discrete(labels = function(x) str_wrap(x, width = 20))

@@ -595,28 +595,28 @@ color_palette <- colorRampPalette(brewer.pal(8, "Set1"))(color_count) # set one "set" of dodge width values across layers -pd = position_dodge(width = 0.8) +pd = position_dodge(width = 0.8) # Dot plot -g1 <- evals_pub %>% - ggplot(aes(x = paper_abbrev, y = overall, +g1 <- evals_pub %>% + ggplot(aes(x = paper_abbrev, y = overall, text = paste0('Evaluator: ', eval_name, # tooltip data '<br>Rating [CI]: ', overall, " [", overall_lb_imp, ", ", overall_ub_imp, "]"))) + - geom_point(aes(color = paper_abbrev), + geom_point(aes(color = paper_abbrev), stat = "identity", size = 2, shape = 18, stroke = 1, position = pd) + - geom_linerange(aes(ymin = overall_lb_imp, ymax = overall_ub_imp, color = paper_abbrev), + geom_linerange(aes(ymin = overall_lb_imp, ymax = overall_ub_imp, color = paper_abbrev), position = pd) + - geom_text(data = subset(evals_pub, str_detect(eval_name, "Anonymous")), - aes(label = "anon."), size=3) + - coord_flip() + # flipping the coordinates to have categories on y-axis (on the left) - labs(x = "Paper", y = "Overall score", + geom_text(data = subset(evals_pub, str_detect(eval_name, "Anonymous")), + aes(label = "anon."), size=3) + + coord_flip() + # flipping the coordinates to have categories on y-axis (on the left) + labs(x = "Paper", y = "Overall score", title = "Overall scores of evaluated papers") + - theme_bw() + - theme(text = element_text(size = 15)) + - theme(legend.position = "none") + - scale_x_discrete(labels = function(x) str_wrap(x, width = 20)) + - scale_color_manual(values = color_palette) + theme_bw() + + theme(text = element_text(size = 15)) + + theme(legend.position = "none") + + scale_x_discrete(labels = function(x) str_wrap(x, width = 20)) + + scale_color_manual(values = color_palette) ggplotly(g1, tooltip = c("text"))
@@ -659,18 +659,18 @@ # sure what it's doing so I'm just turning it off for now unit.scale = function(x) (x*100 - min(x*100)) / (max(x*100) - min(x*100)) -evaluations_table <- evals_pub %>% - select(paper_abbrev, eval_name, cat_1, +evaluations_table <- evals_pub %>% + select(paper_abbrev, eval_name, cat_1, source_main, overall, adv_knowledge, - methods, logic_comms, journal_predict) %>% - arrange(desc(paper_abbrev)) + methods, logic_comms, journal_predict) %>% + arrange(desc(paper_abbrev)) -formattable( +formattable( evaluations_table, list( #area(col = 5:8) ~ function(x) percent(x / 100, digits = 0), - area(col = 5:8) ~ color_tile("#FA614B66","#3E7DCC"), - `journal_predict` = proportion_bar("#DeF7E9", unit.scale) + area(col = 5:8) ~ color_tile("#FA614B66","#3E7DCC"), + `journal_predict` = proportion_bar("#DeF7E9", unit.scale) ) )
diff --git a/docs/chapters/evaluation_data_input.html b/docs/chapters/evaluation_data_input.html index 0ebb3ed..8017b28 100644 --- a/docs/chapters/evaluation_data_input.html +++ b/docs/chapters/evaluation_data_input.html @@ -2,21 +2,22 @@ - + The Unjournal evaluations: data and analysis - 1  Evaluation data: input/features @@ -117,27 +88,30 @@ "search-detached-cancel-button-title": "Cancel", "search-submit-button-title": "Submit" } -} +}
-
-
+ +
-

+

1  Evaluation data: input/features

@@ -193,6 +173,7 @@

+
@@ -228,12 +209,12 @@

options(knitr.duplicate.label = "allow")

-
+
diff --git a/docs/index.html b/docs/index.html index 9fe6a84..09b57d9 100644 --- a/docs/index.html +++ b/docs/index.html @@ -2,23 +2,24 @@ - + - + The Unjournal evaluations: data and analysis @@ -123,20 +94,25 @@
-
-
+ +
-

The Unjournal evaluations: data and analysis

+

The Unjournal evaluations: data and analysis

@@ -204,9 +186,10 @@

The Unjournal evaluations: data and analy
Published
-

September 12, 2023

+

September 19, 2023

+

@@ -275,9 +258,23 @@

The Unjournal evaluations: data and analy icon: icon }; anchorJS.add('.anchored'); + const isCodeAnnotation = (el) => { + for (const clz of el.classList) { + if (clz.startsWith('code-annotation-')) { + return true; + } + } + return false; + } const clipboard = new window.ClipboardJS('.code-copy-button', { - target: function(trigger) { - return trigger.previousElementSibling; + text: function(trigger) { + const codeEl = trigger.previousElementSibling.cloneNode(true); + for (const childEl of codeEl.children) { + if (isCodeAnnotation(childEl)) { + childEl.remove(); + } + } + return codeEl.innerText; } }); clipboard.on('success', function(e) { @@ -289,7 +286,24 @@

The Unjournal evaluations: data and analy button.classList.add('code-copy-button-checked'); var currentTitle = button.getAttribute("title"); button.setAttribute("title", "Copied!"); + let tooltip; + if (window.bootstrap) { + button.setAttribute("data-bs-toggle", "tooltip"); + button.setAttribute("data-bs-placement", "left"); + button.setAttribute("data-bs-title", "Copied!"); + tooltip = new bootstrap.Tooltip(button, + { trigger: "manual", + customClass: "code-copy-button-tooltip", + offset: [0, -8]}); + tooltip.show(); + } setTimeout(function() { + if (tooltip) { + tooltip.hide(); + button.removeAttribute("data-bs-title"); + button.removeAttribute("data-bs-toggle"); + button.removeAttribute("data-bs-placement"); + } button.setAttribute("title", currentTitle); button.classList.remove('code-copy-button-checked'); }, 1000); @@ -376,24 +390,128 @@

The Unjournal evaluations: data and analy return note.innerHTML; }); } + let selectedAnnoteEl; + const selectorForAnnotation = ( cell, annotation) => { + let cellAttr = 'data-code-cell="' + cell + '"'; + let lineAttr = 'data-code-annotation="' + annotation + '"'; + const selector = 'span[' + cellAttr + '][' + lineAttr + ']'; + return selector; + } + const selectCodeLines = (annoteEl) => { + const doc = window.document; + const targetCell = annoteEl.getAttribute("data-target-cell"); + const targetAnnotation = annoteEl.getAttribute("data-target-annotation"); + const annoteSpan = window.document.querySelector(selectorForAnnotation(targetCell, targetAnnotation)); + const lines = annoteSpan.getAttribute("data-code-lines").split(","); + const lineIds = lines.map((line) => { + return targetCell + "-" + line; + }) + let top = null; + let height = null; + let parent = null; + if (lineIds.length > 0) { + //compute the position of the single el (top and bottom and make a div) + const el = window.document.getElementById(lineIds[0]); + top = el.offsetTop; + height = el.offsetHeight; + parent = el.parentElement.parentElement; + if (lineIds.length > 1) { + const lastEl = window.document.getElementById(lineIds[lineIds.length - 1]); + const bottom = lastEl.offsetTop + lastEl.offsetHeight; + height = bottom - top; + } + if (top !== null && height !== null && parent !== null) { + // cook up a div (if necessary) and position it + let div = window.document.getElementById("code-annotation-line-highlight"); + if (div === null) { + div = window.document.createElement("div"); + div.setAttribute("id", "code-annotation-line-highlight"); + div.style.position = 'absolute'; + parent.appendChild(div); + } + div.style.top = top - 2 + "px"; + div.style.height = height + 4 + "px"; + let gutterDiv = window.document.getElementById("code-annotation-line-highlight-gutter"); + if (gutterDiv === null) { + gutterDiv = window.document.createElement("div"); + gutterDiv.setAttribute("id", "code-annotation-line-highlight-gutter"); + gutterDiv.style.position = 'absolute'; + const codeCell = window.document.getElementById(targetCell); + const gutter = codeCell.querySelector('.code-annotation-gutter'); + gutter.appendChild(gutterDiv); + } + gutterDiv.style.top = top - 2 + "px"; + gutterDiv.style.height = height + 4 + "px"; + } + selectedAnnoteEl = annoteEl; + } + }; + const unselectCodeLines = () => { + const elementsIds = ["code-annotation-line-highlight", "code-annotation-line-highlight-gutter"]; + elementsIds.forEach((elId) => { + const div = window.document.getElementById(elId); + if (div) { + div.remove(); + } + }); + selectedAnnoteEl = undefined; + }; + // Attach click handler to the DT + const annoteDls = window.document.querySelectorAll('dt[data-target-cell]'); + for (const annoteDlNode of annoteDls) { + annoteDlNode.addEventListener('click', (event) => { + const clickedEl = event.target; + if (clickedEl !== selectedAnnoteEl) { + unselectCodeLines(); + const activeEl = window.document.querySelector('dt[data-target-cell].code-annotation-active'); + if (activeEl) { + activeEl.classList.remove('code-annotation-active'); + } + selectCodeLines(clickedEl); + clickedEl.classList.add('code-annotation-active'); + } else { + // Unselect the line + unselectCodeLines(); + clickedEl.classList.remove('code-annotation-active'); + } + }); + } + const findCites = (el) => { + const parentEl = el.parentElement; + if (parentEl) { + const cites = parentEl.dataset.cites; + if (cites) { + return { + el, + cites: cites.split(' ') + }; + } else { + 
return findCites(el.parentElement) + } + } else { + return undefined; + } + }; var bibliorefs = window.document.querySelectorAll('a[role="doc-biblioref"]'); for (var i=0; i