QUESTIONS.md

  • How often is data from repositories used in the published literature? What is the distribution of use across datasets and time?

    • rates of reuse by repository (histograms, ANOVA)

    • PCA of histogram bin values: which repositories have the most similar citation frequency distributions?

    • cumulative citations over time

      • needed: citation dates for each dataset (from Web of Science or Scopus)
      • regression: IV=time, DV=cumulative citations
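The cumulative-citation regression above could be sketched as follows. This is a minimal illustration, assuming citation dates have already been reduced to publication years for a single dataset; the function names and the example years are hypothetical, not taken from the project, and a plain least-squares line is only one possible model.

```python
from collections import Counter
from itertools import accumulate


def cumulative_citations(citation_years):
    """Return (years, cumulative counts) for one dataset's citation years.

    Years with no citations are included so the series has no gaps.
    """
    counts = Counter(citation_years)
    years = list(range(min(counts), max(counts) + 1))
    cumulative = list(accumulate(counts.get(y, 0) for y in years))
    return years, cumulative


def ols_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical citation years for one dataset (illustrative, not real data).
example_years = [2008, 2009, 2009, 2011, 2011, 2011]
years, cumulative = cumulative_citations(example_years)
slope, intercept = ols_fit(years, cumulative)  # IV = time, DV = cumulative citations
```

The slope is the estimated number of citations gained per year, which could then be compared across datasets or repositories.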
  • AUTHORS: Who reuses data? Are investigators who reuse repository datasets similar to investigators who deposit data?

    • things to compare:

      • author department (cat)
      • author country (cat)
      • author institution (cat)
    • chi-squared test: IV=author role (data deposit vs. reuse), DV=(department, country, institution)
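One way the chi-squared comparison could look, as a sketch: each author is recorded as a (role, category) pair, where the role is deposit or reuse and the category is one of the variables listed above. The function name and the counts are hypothetical. Only the test statistic and degrees of freedom are computed here; a p-value would need a chi-squared distribution from a statistics library.

```python
from collections import Counter


def chi_squared(pairs):
    """Pearson chi-squared statistic and degrees of freedom.

    `pairs` is a list of (role, category) tuples, one per author.
    """
    observed = Counter(pairs)
    row_totals = Counter(role for role, _ in pairs)
    col_totals = Counter(category for _, category in pairs)
    n = len(pairs)

    stat = 0.0
    for role, row_total in row_totals.items():
        for category, col_total in col_totals.items():
            # Expected count under independence of role and category.
            expected = row_total * col_total / n
            stat += (observed[(role, category)] - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical author records (illustrative counts, not real data).
authors = (
    [("deposit", "US")] * 30
    + [("deposit", "UK")] * 10
    + [("reuse", "US")] * 20
    + [("reuse", "UK")] * 20
)
stat, dof = chi_squared(authors)
```

The same function could be run once per categorical variable (department, country, institution), keeping in mind that sparse categories may need to be pooled before the test is meaningful.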

  • STUDIES: What is data reused for? How similar are studies that reuse data to studies that deposit data?

    • things to compare:

      • keywords
      • number of authors
      • author institution
      • author country
    • will require keyword data

    • how similar (in multivariate space, by keywords) are citing papers to the papers they cite?

    • figure 2 from proposal: topic co-occurrence network
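As a rough sketch of the keyword comparison, the snippet below scores the overlap between a citing paper and the paper it cites, and counts keyword co-occurrences as edge weights for a topic network. Jaccard similarity is used here as a simple stand-in for the multivariate comparison, which the notes do not specify; the function names and keywords are hypothetical.

```python
from collections import Counter
from itertools import combinations


def jaccard(keywords_a, keywords_b):
    """Jaccard similarity between two keyword lists (0 = disjoint, 1 = identical)."""
    a, b = set(keywords_a), set(keywords_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cooccurrence(keyword_lists):
    """Count how often each pair of keywords appears on the same paper."""
    edges = Counter()
    for keywords in keyword_lists:
        # Sort so each pair has one canonical (alphabetical) orientation.
        for pair in combinations(sorted(set(keywords)), 2):
            edges[pair] += 1
    return edges


# Hypothetical keyword lists (illustrative, not real data).
depositing_paper = ["phylogenetics", "birds", "morphology"]
citing_paper = ["phylogenetics", "birds", "biogeography"]

similarity = jaccard(depositing_paper, citing_paper)
edges = cooccurrence([depositing_paper, citing_paper])
```

Each key in `edges` is a pair of keywords and each value is an edge weight, so the result could be passed to a graph library to draw the network.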