Data sharing & licensing data #12

Open
strasser opened this issue Dec 20, 2016 · 7 comments

@strasser
Contributor

Discussion and hacking on using open data, and on sharing data for others to use.

@dhimmel
Contributor

dhimmel commented Dec 20, 2016

I'm interested and can discuss the importance of openly licensing data.

@lederman
Member

lederman commented Dec 20, 2016

I suggest we add to this discussion the reasons why people don't share code and data, and identify which of those problems we already have solutions for and which reflect legitimate concerns.

We can also talk about the "dark pool" of failed experiments and failed processing methods. If we are going beyond the paper into the raw work that led to it, these are as important as the successful methods, but they are even more difficult to document and share. I have published some examples of cases where one of my algorithms fails, but that material is the first thing I have to cut when I condense a technical report into a paper. I have seen a few reviews of such failures (in one area, probably the most important paper I've seen), but these are written later, usually cover other people's work, and are rare. I wonder whether people have found good ways to share these "failed" experiments and pipelines.
To put this in a larger context, it relates to how other disciplines use "cases" and "failures" to learn, and to some of the dangers of open benchmarks.

In the context of algorithms (and the data associated with them), I mentioned CodeOcean in the previous meeting; I've been participating in their closed beta. As a user (reader) of a paper or algorithm, you can see all the data and code online, change it, and run it without installing anything locally. I asked them for invitations for the participants. Would there be a good way to distribute those invitations to anyone who is interested? (This part is also related to topics #11 and #6.)

@dhimmel
Contributor

dhimmel commented Dec 20, 2016

@lederman I read a bit about code ocean; it seems more focused on code than on data. I'm also interested in the reproducibility of computation, but I'm not too keen on spending conference time learning how to use a service that isn't currently publicly available.

@lederman
Member

@dhimmel Regarding conference time and codeocean: it will be public soon, but I agree. That's why I haven't asked to present a demo, only to share the invitation with anyone who wants to try it before it goes public.

Regarding data vs. code vs. reproducibility: I agree that codeocean is not a generic data repository or data exploration tool. Beyond reproducibility, I think it offers an easy way to start working with published data, because you can play with the entire pipeline used to generate the data. That is why I mention it in this context.
I wouldn't mind classifying these issues under other categories.

@lederman
Member

@dhimmel it turns out that there are two codeoceans. I'm talking about a different one, out of Cornell Tech, and information about it isn't public yet. Sorry, I didn't notice the link you sent, so I only just realized this.

@strasser
Contributor Author

Turning this into a licensing discussion, with Daniel leading.

  • We are trying to find ways to share data without knowing about its content.
  • Trying to get open, reproducible data. What if no one wants it? How do we make sure it's discoverable?
  • If you are using someone else's data, you need to know about its licensing.
  • What's the sweet spot between sharing and still getting credit?

@strasser strasser changed the title Data sharing & using open data Data sharing & licensing data Jan 20, 2017
@strasser
Contributor Author

Another angle: Ethics of Data Science (and data sharing)

3 participants