What is the post-creation life cycle of software produced for your research? #6

Open
faokryn opened this issue Sep 27, 2016 · 5 comments

Comments

@faokryn
Contributor

faokryn commented Sep 27, 2016

Please describe the typical life cycle of software produced for your research, after creation and use. Is availability to and reusability by other researchers considered? How was the software made available to others, i.e. how was it hosted? If such software has been used by other researchers, what are some of the ways they've cited it, if at all?

More Discussion Questions

@aurelg

aurelg commented Sep 28, 2016

From my own experience in bioinformatics, here are the most common situations:

  • If the software is very specific to the published research work, it is abandoned. Parts of it could certainly be reused, but often aren't, partly because of poor design/implementation practices and mostly because of lack of interest.
  • If the software is still scientifically useful:
    • If it gives its author(s) a competitive advantage in terms of potential publications and funding, it is not distributed.
    • If its author(s) want other scientists to be able to use it (very important for academic recognition) but not to own it, it is made available behind a website/webservice.
    • If it is open source, it is usually made available for download on the lab's web page.

Imagine you are a PhD student and write a successful program, mostly alone, and you publish the associated scientific results. After publication, you may either keep your program to yourself (or to whoever holds the IP), or be free to distribute it.

  • If you keep it to yourself, your software won't have any impact on the scientific community outside its original lab, which is unsatisfying both for you and for your career.
  • If you distribute it, you have to publish its source code too, so that others can trust it and tweak it. This can be psychologically difficult: scientists are not developers, and they usually know that the way their source code is written may reflect badly on them.
  • If you have funding, you can afford to spend time creating a website/webservice and hope you'll attract users and good reviews. Otherwise, it might be the end of your career.

Anyway, if you make it available, you'll soon have to answer requests for feature A or B, and reports of bug X or Y. The former will slow down your own work, and the latter might reveal flaws in your research and hurt your career. The more successful you are, the more exposed, and the more vulnerable, you are. This is a kind of curse, especially if you're not promoted quickly.

Eventually, you are paid to work on another project. If your code is not open source, with an active community able to take it over, you'll have to choose: keep maintaining it, do whatever you are paid for, or do both (and burn out). This is just not sustainable, and I've seen so many renowned programs that are abandoned and/or riddled with bugs no one knows how to fix.

IMHO, this situation comes from how research is organized. Group leaders are promoted for their scientific abilities. This focus on scientific results leads most PhD students, and even postdocs, to think that quick & dirty methods are good enough. Consequently, there is no incentive to build a culture of sustainable software, which explains both the lack of support for scientific software development and why so much scientific software is broken.

Fortunately, as far as personal/lab branding and citations are concerned, the best strategy seems to be to publish the scientific results first; a methodological paper then comes for free, followed by another article advertising the website/webservice once it's available. If the website is widely used, you can then write an article every other year. In the long run, this is by far the most rewarding strategy in terms of publications, and it can even provide sustainable funding for the software.

If this strategy can't be implemented, or in combination with it anyway, the safest option is certainly to release your software under a permissive free/open-source license and hope someone will take over when you leave for other horizons.

Well, that was the quick (!) answer. More to come if you're interested :-)

@Ourobor
Contributor

Ourobor commented Oct 7, 2016

@aurelg I had no idea academia was so cutthroat! I am still just an undergrad, and I guess my picture of graduate work is a bit flawed. I do understand the fear of releasing flawed code, though, as it plagued me for quite a while in my first few years of undergrad.

One of the things I was thinking about while reading the paper we used as inspiration for the project is the idea that citing software created during research allows it to be included in the peer review process. Would a focus on getting research code peer-reviewed and "legitimatized" be a good way to encourage code to be published?

Alternatively, the goal of Software Citation Tools is to cite software. We're really looking to collect metadata for the software being used and make it available. Perhaps an option to cite the software with its metadata (authors, purpose, etc.) without linking to the source code would be useful?
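For illustration only, the option suggested above could be sketched as a metadata record in which the source URL is optional, so a citation can still be produced when the code is not public. The `SoftwareMetadata` class, the `format_citation` helper, the field names and the citation layout are all hypothetical assumptions, not part of Software Citation Tools:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SoftwareMetadata:
    """Hypothetical metadata record for a piece of research software."""
    title: str
    authors: List[str]
    version: str
    year: int
    purpose: str
    # Optional: the software can still be cited when its code is not public.
    source_url: Optional[str] = None


def format_citation(meta: SoftwareMetadata) -> str:
    """Build a plain-text citation; the URL is appended only if one is known."""
    authors = ", ".join(meta.authors)
    citation = (
        f"{authors} ({meta.year}). {meta.title} "
        f"(version {meta.version}) [Computer software]."
    )
    if meta.source_url is not None:
        citation += f" {meta.source_url}"
    return citation


if __name__ == "__main__":
    # Hypothetical example record with no public source code.
    tool = SoftwareMetadata(
        title="ExampleAligner",
        authors=["A. Author", "B. Author"],
        version="1.2",
        year=2016,
        purpose="sequence alignment",
    )
    print(format_citation(tool))
```

Here the `purpose` field is stored alongside the citation data but not printed; a real tool would decide which metadata fields appear in the rendered citation.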

@aurelg

aurelg commented Dec 17, 2016

I think it's a good idea to promote software citations. Most citations, however, refer not to the source code itself (which may not be available) but to a reference paper describing the original algorithm. There are usually some discrepancies:

  • between the reference algorithm and the one actually implemented (which was usually too complex to describe in detail)
  • between the real algorithm and its implementation (which may be buggy and full of hacks and optimizations that are likely to have side effects)
  • between the algorithm at a given time and the code base, which evolves over time (new scientific features, bug fixes, optimizations)
  • between different binaries generated by different compilers (I remember a bug in the CHARMM software, not covered by any test, which was completely silent yet caused different numerical results between the x86 and amd64 binaries)

I guess a nice way to promote software citation would be to create a scientific software repository with ready-to-use Docker images. A bit like this initiative. Then, citing software would be easy and safe. However, I don't think it can be achieved, as the evolution of research software is usually driven by the scientific problems it solves far more than by the need to publish it and/or achieve reproducibility.

@Ourobor I'm not talking about academia as a whole, only my experience. YMMV (and I hope so).


4 participants
@faokryn @Ourobor @aurelg and others