-
How often is data from repositories used in the published literature? What is the distribution of use across datasets and time?
-
rates of reuse by repository (histograms, ANOVA)
-
PCA of histogram bin values: which repositories have the most similar citation frequency distributions?
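
A minimal sketch of the histogram and ANOVA steps, using only the standard library. The repository names, citation counts, and bin edges are made up for illustration. The PCA step is not shown; it would take the per-repository bin fractions (`bin_rows`) as its input rows.

```python
from statistics import mean

# Hypothetical citation counts per dataset, grouped by repository.
citations = {
    "repo_a": [0, 1, 1, 2, 5],
    "repo_b": [3, 4, 4, 6, 8],
    "repo_c": [0, 0, 1, 1, 3],
}


def histogram(counts, edges):
    """Fraction of datasets falling in each half-open bin [edges[i], edges[i+1])."""
    bins = [0] * (len(edges) - 1)
    for c in counts:
        for i in range(len(bins)):
            if edges[i] <= c < edges[i + 1]:
                bins[i] += 1
                break
    return [b / len(counts) for b in bins]


def anova_f(groups):
    """One-way ANOVA: F statistic plus between- and within-group degrees of freedom."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


edges = [0, 1, 2, 5, 10]  # assumed bin edges, not taken from the proposal
bin_rows = {repo: histogram(c, edges) for repo, c in citations.items()}
f_stat, df1, df2 = anova_f(list(citations.values()))

for repo, row in bin_rows.items():
    print(repo, row)
print(f"F({df1}, {df2}) = {f_stat:.3f}")
```

Citation counts are usually heavily skewed, so a log transform or a rank-based alternative to ANOVA may be worth considering before trusting the F statistic.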
-
cumulative citations over time
- needed: citation dates for each dataset (from Web of Science or Scopus)
- regression: IV=time, DV=cumulative citations
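
A rough sketch of the regression, assuming citation dates have already been reduced to per-year counts for one dataset. The years and counts are invented, and a straight-line fit is only one possible model.

```python
from itertools import accumulate

# Hypothetical citations per year for a single dataset.
years = [2006, 2007, 2008, 2009, 2010]
per_year = [1, 2, 2, 4, 6]

cumulative = list(accumulate(per_year))   # DV: running total of citations
elapsed = [y - years[0] for y in years]   # IV: years since the first year


def least_squares(x, y):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = least_squares(elapsed, cumulative)
print(f"citations gained per year: {slope:.2f}, intercept: {intercept:.2f}")
```

Cumulative counts are serially correlated, so standard errors from an ordinary fit should be read with caution.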
-
-
AUTHORS: Who reuses data? Are investigators who reuse repository datasets similar to investigators who deposit data?
-
things to compare:
- author department (categorical)
- author country (categorical)
- author institution (categorical)
-
chi-squared test: IV=deposit vs. reuse, DV=(department, country, institution)
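
A minimal sketch of the chi-squared test of independence for one categorical variable (country), with invented counts. It computes only the statistic and degrees of freedom; a p-value needs a chi-squared distribution function, which a library routine such as `scipy.stats.chi2_contingency` provides.

```python
# Hypothetical author counts by country, split by role (deposit vs. reuse).
table = {
    "deposit": {"US": 30, "UK": 10, "other": 10},
    "reuse": {"US": 20, "UK": 20, "other": 10},
}


def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a two-way table."""
    rows = list(table)
    cols = list(next(iter(table.values())))
    row_total = {r: sum(table[r][c] for c in cols) for r in rows}
    col_total = {c: sum(table[r][c] for r in rows) for c in cols}
    total = sum(row_total.values())
    stat = 0.0
    for r in rows:
        for c in cols:
            # Expected count if role and category were independent.
            expected = row_total[r] * col_total[c] / total
            stat += (table[r][c] - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(cols) - 1)
    return stat, dof


stat, dof = chi_squared(table)
print(f"chi-squared = {stat:.3f}, dof = {dof}")
```

Departments and institutions will have many sparse categories, so small expected counts may require pooling rare levels before the test is valid.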
-
-
STUDIES: What is data reused for? How similar are studies that reuse data to studies that deposit data?
-
things to compare:
- keywords
- number of authors
- author institution
- author country
-
will require keyword data
-
how similar (in multivariate space, by keywords) are citing papers to the papers they cite?
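
One simple way to score this, sketched with Jaccard similarity on keyword sets. Jaccard is a stand-in chosen for illustration, not necessarily the multivariate measure intended here, and the keyword lists are invented.

```python
def jaccard(a, b):
    """Shared keywords divided by all distinct keywords across both papers."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty keyword lists
    return len(a & b) / len(a | b)


# Hypothetical (citing paper keywords, cited paper keywords) pairs.
pairs = [
    (["phylogenetics", "birds", "migration"],
     ["phylogenetics", "birds", "morphology"]),
    (["climate", "modeling"],
     ["birds", "migration"]),
]

scores = [jaccard(citing, cited) for citing, cited in pairs]
print(scores)
```

Comparing these scores against the similarity of randomly paired papers would show whether citing papers sit closer to the papers they cite than chance would predict.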
-
figure 2 from proposal: topic co-occurrence network
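
A minimal sketch of how edge weights for a co-occurrence network could be counted, assuming each paper comes with a keyword list. The keywords are invented, and the actual construction behind figure 2 may differ.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword lists, one per paper.
papers = [
    ["ecology", "genetics", "birds"],
    ["genetics", "birds"],
    ["ecology", "climate"],
]

# Edge weight = number of papers in which both keywords appear.
edges = Counter()
for keywords in papers:
    # Sort so each unordered pair is always stored under the same key.
    for a, b in combinations(sorted(set(keywords)), 2):
        edges[(a, b)] += 1

for (a, b), weight in edges.most_common():
    print(a, b, weight)
```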
-