Data publishing for data from third-parties that don't allow data sharing #33

Open
tkuhn opened this issue Feb 11, 2017 · 1 comment

Comments

@tkuhn
Member

tkuhn commented Feb 11, 2017

How should we deal with studies that use data from third parties, such as Twitter, whose terms don't allow data sharing? According to the PLOS guidelines (http://journals.plos.org/plosone/s/data-availability), which we are following for now, it seems that such studies couldn't be published (though there are recent PLOS One articles on Twitter studies...). Publishing aggregated, post-processed data (e.g. the data points in a plot) should always be possible, though. So it seems we have the following options:

  1. Allow exceptions to data availability requirements for cases of third-party data with terms of service that don't allow for data sharing
  2. Interpret data availability requirements in a loose way that is compliant with just releasing aggregated post-processed data (e.g. data points in a plot)
  3. Don't allow for exceptions and apply data requirements in a strict way: We won't be able to publish Twitter studies in this case
  4. Are there other options?

Which one should we follow? I am undecided...

@micheldumontier

I believe that we need to accommodate the fact that not all data can be shared publicly. For instance, we use anonymized patient data that we are prohibited from sharing, and there are strict restrictions on its availability beyond the approved users, who are in some cases exclusively members of the medical center. In such cases reproducibility is only possible through collaboration, but this cannot be guaranteed, owing to the burden that it places on individual investigators. These are real problems that cannot be ignored when it comes to reproducibility. Should we exclude such studies? I don't think so. When we drafted the FAIR principles [1], we specifically recognized that the essential point is that the mechanism by which data can be accessed must be made explicit. Therefore, the right solution to this complex real-world problem is to require sufficient documentation describing the proper access mechanism, if any.
However, I would argue that if the reviewers raise serious doubts about the validity of the results and the data cannot be made available to the reviewers, then these are grounds for rejection, where agreed by both the managing editor and the assigned editor-in-chief.

[1] http://www.nature.com/articles/sdata201618


2 participants