Data sharing & licensing data #12
Comments
I'm interested and can discuss the importance of openly licensing data. |
I suggest we add to this discussion the reasons why people don't share code and data, and see which problems we have solutions for and which are legitimate concerns. We can also talk about the "dark pool" of failed experiments and failed processing methods. If we are going beyond the paper into the raw work that led to it, these things are as important as the successful methods, but they are even more difficult to document and share. I have published some examples of cases where I get one of my algorithms to fail, but that is the first thing I have to cut when I condense the technical report into a paper. I have seen some reviews of such things (probably the most important paper I've seen in one area), but these are done later, usually refer to other people's work, and are rare. I wonder if people have found good ways to share these "failed" experiments/pipelines. In the context of algorithms (and data associated with the algorithms), I mentioned CodeOcean in the previous meeting; I've been participating in their closed beta. As a user (reader) of a paper/algorithm, you can see all the data and code online, change it, and run it without installing anything locally. I asked them for invitations for the participants. Would there be a good way to distribute those invitations to anyone who is interested? (This part is also related to topics #11 and #6.) |
@lederman I read a bit about code ocean -- it seems more focused on code and not data. I'm also interested in the reproducibility of computation. But, I'm not too keen on spending conference time to learn how to use a service that's not currently publicly available. |
@dhimmel Regarding conference time and codeocean: it will be public soon, but I agree. That's why I haven't asked to present a demo, just to share the invitation with anyone who wants to try it before it is public. Regarding data vs. code vs. reproducibility: I agree that codeocean is not a generic data repository or data exploration tool. In addition to reproducibility, I think it gives you an easy way to start working with published data, because you can easily play with the entire pipeline used to generate the data, which is why I mention it in this context. |
@dhimmel it turns out that there are two codeoceans.... I'm talking about a different ocean out of Cornell Tech, the information about them isn't public yet. Sorry, I didn't notice the link you sent so I just realized this now. |
Turning this into licensing and having Daniel lead
|
Another angle: Ethics of Data Science (and data sharing) |
Discussion/hacking on using open data and sharing data for others to use