\section{Discussion}
\begin{figure}
\centering
\includegraphics[width=.5\linewidth]{../licenses}
\caption{\label{licenses} Distribution of open source licenses used in cataloged software packages.}
\end{figure}
\begin{figure*}
\centering
\begin{subfigure}[t]{.4\linewidth}
\centering
\includegraphics[width=\linewidth]{../develop}
\caption{\label{develop}}
\end{subfigure}
\hfill
\begin{subfigure}[t]{.4\linewidth}
\centering
\includegraphics[width=\linewidth]{../usage}
\caption{\label{usage}}
\end{subfigure}
\caption{\label{pies} Activity distributions of cataloged software packages.
(\subref{develop}) Distribution of development activity. (\subref{usage}) Distribution of user activity.
}
\end{figure*}
\begin{figure}
\centering
\includegraphics[width=.5\linewidth]{../citedist}
\caption{\label{cites} Distribution of the average number of citations per year, as reported by Google Scholar, for software packages with citeable publications.}
\end{figure}
We have cataloged over 200 open-source packages for molecular modeling that provide a wide range of capabilities. As shown in Figure~\ref{licenses}, the most popular license (55\%) is some variant of the copyleft GNU General Public License (GPL), which ensures that derivative works remain open source. Interestingly, 78\% of the cataloged packages have a corresponding citeable publication, which suggests that much of this software originates in academia. The distribution of the average number of citations per year (as reported by Google Scholar) for these publications is shown in Figure~\ref{cites}. A large majority (84\%) of the publications receive at least one citation per year on average, 29\% receive at least 10, and 8\% receive more than 100.
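
The threshold percentages above reduce to simple counting once an average yearly citation rate is available for each publication. The following is a minimal sketch under stated assumptions: the rate is taken to be a publication's total Google Scholar citation count divided by the number of calendar years since publication (counting the publication year itself), and the function names \texttt{citations\_per\_year} and \texttt{threshold\_fractions} are hypothetical. It illustrates the arithmetic only and should not be read as the exact procedure behind Figure~\ref{cites}.

\begin{verbatim}
def citations_per_year(total_citations: int, pub_year: int,
                       current_year: int) -> float:
    """Average citations per year since publication (assumed definition)."""
    # Count the publication year itself so that a paper published in the
    # current year is divided by 1 rather than 0.
    years = max(current_year - pub_year + 1, 1)
    return total_citations / years


def threshold_fractions(rates: list[float]) -> dict[str, float]:
    """Fraction of publications meeting each citation-rate threshold."""
    if not rates:
        raise ValueError("need at least one citation rate")
    n = len(rates)
    return {
        ">=1": sum(r >= 1 for r in rates) / n,    # at least one per year
        ">=10": sum(r >= 10 for r in rates) / n,  # at least 10 per year
        ">100": sum(r > 100 for r in rates) / n,  # more than 100 per year
    }
\end{verbatim}

For example, with the toy rates \texttt{[8.33, 1.5, 171.4, 0.0]} (illustrative values, not survey data), \texttt{threshold\_fractions} returns 0.75, 0.25, and 0.25 for the three thresholds.
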
A substantial portion of the cataloged packages are under active development and see significant usage, as shown in Figure~\ref{pies}. We rated 56\% of the packages as having `A' level development, meaning that major features or releases were made within the last 18 months, and 51\% as seeing substantial usage (rank 1).
A number of projects (30\%) appear to have ceased development (no changes within the last 18 months). Because our methodology for identifying packages ignores cases where the software is no longer available, this figure is an underestimate.
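
The 18-month window used for both the `A' development rating and the ceased-development category can be expressed as a simple date rule. The following is a minimal sketch under stated assumptions: only the `A' level and the no-changes case are defined in this section, so all other packages fall into a placeholder \texttt{intermediate} category; ``within the last 18 months'' is taken to mean that fewer than 18 full calendar months have elapsed; and the names \texttt{months\_between} and \texttt{activity\_level} are hypothetical.

\begin{verbatim}
from datetime import date

WINDOW_MONTHS = 18  # activity window used in the text


def months_between(earlier: date, later: date) -> int:
    """Number of full calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the last month has not yet been completed
    return months


def activity_level(last_major_release: date, last_change: date,
                   today: date) -> str:
    """Hypothetical, simplified development rating (18-month window)."""
    if months_between(last_major_release, today) < WINDOW_MONTHS:
        return "A"  # major feature or release within the window
    if months_between(last_change, today) >= WINDOW_MONTHS:
        return "inactive"  # no changes of any kind within the window
    return "intermediate"  # placeholder for levels not defined here
\end{verbatim}

For instance, with \texttt{today = date(2015, 6, 1)}, a package whose last major release was on 2014-12-01 is rated \texttt{A}, while a package with no major release since 2010 and no change of any kind since 2012-03-01 is rated \texttt{inactive}.
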
However, although we did find instances where an open source package was referenced in a paper but was no longer available, we did not find this to be common. Most packages, even those that have remained unchanged for a decade, see some usage. In fact, a number of packages (23) still see significant usage despite having received no development in the past 18 months. This highlights the importance of releasing source code through a third-party hosting site such as SourceForge or GitHub, as doing so helps ensure the continued availability of a project.
A major advantage of open source is that when a popular project is no longer actively developed (e.g., AutoDock Vina \cite{Trott_2009}), new projects can fork the source code and continue development (e.g., smina \cite{Koes_2013}). However, a potential weakness of open source development is the lack of central coordination and efficient pooling of resources. For example, there are several forks of AutoDock Vina that improve its performance on computing clusters, and there is an array of tools in several categories that effectively perform the same task. This underscores the importance of efforts such as Blue Obelisk \cite{Guha_2006,blueobelisk} and Open Chemistry (\url{http://www.openchemistry.org}), which foster collaboration among open source cheminformatics projects.
It is clear that open source software plays an important role in the scientific community and forms a vibrant sub-community of its own, with a wide assortment of projects under development and in widespread use. The open source software packages cataloged here provide starting points for the development of new tools that enable further scientific discovery.