The CVSAnalY tool extracts information from source code repository logs and stores it in a database.
- Get pip:
sudo easy_install pip
- Use pip:
pip install "https://github.com/SoftwareIntrospectionLab/cvsanaly/tarball/master#egg=master"
Note for upgraders: CVSAnalY now uses setuptools for installation. Depending on your PYTHONPATH, the old CVSAnalY might not be removed (or worse, override this release). Please check for and remove old installations before installing this version.
CVSAnalY has the following dependencies:
- Python 2.5 or higher
- RepositoryHandler (this needs to be placed in your PYTHONPATH)
  git clone https://github.com/SoftwareIntrospectionLab/repositoryhandler.git
- Guilty (optional. Required for the Blame or HunkBlame extensions; it also needs to be discoverable in the PYTHONPATH)
  git clone http://github.com/SoftwareIntrospectionLab/guilty.git
- CVS (optional. Required for CVS support. Make sure to read the "SCM Support" section.)
- Subversion (optional. Required for SVN support. Make sure to read the "SCM Support" section.)
- Git (optional. Required for Git support. Must be >= 1.7.4 for the HunkBlame extension to work)
- Python MySQLdb (optional, but of course required if you wish to actually use MySQL as your database engine!)
- python-progressbar (http://code.google.com/p/python-progressbar/)
- Pygments (optional. Required for the HunkBlame extension with the option --hb-ignore-comments. This needs to be placed in your PYTHONPATH)
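The dependency list above can be checked from Python before a first run. The following is a minimal sketch and not part of CVSAnalY: the import names (repositoryhandler, guilty, MySQLdb, progressbar, pygments) are assumptions about how each package is exposed to Python, and the helper names are hypothetical.

```python
import re

# Assumed import names for the packages listed above; adjust if they differ.
REQUIRED_MODULES = ["repositoryhandler", "progressbar"]
OPTIONAL_MODULES = ["guilty", "MySQLdb", "pygments"]

# Minimum Git version stated above for the HunkBlame extension.
MIN_GIT_FOR_HUNKBLAME = (1, 7, 4)


def missing_modules(names):
    """Return the subset of names that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


def parse_git_version(output):
    """Extract (major, minor, patch) from the text printed by `git --version`."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError("no version number found in %r" % (output,))
    # A missing patch component (e.g. "1.8") is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def git_supports_hunkblame(output):
    """True if the reported Git version is at least 1.7.4."""
    return parse_git_version(output) >= MIN_GIT_FOR_HUNKBLAME


print("missing required modules: %s" % missing_modules(REQUIRED_MODULES))
print("missing optional modules: %s" % missing_modules(OPTIONAL_MODULES))
```

To check Git, pass the text printed by git --version to git_supports_hunkblame. A module reported as missing may simply not be on your PYTHONPATH yet.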
You don't need to do anything if you are happy using CVSAnalY from the path you downloaded it to. This is the easiest option if you intend to stay up to date with our releases from our Git repositories. You can also move the directory to wherever you wish.
If you want to install it to a system location, you can do this by running the setup.py script:
python setup.py install
If you do this, you'll need to remember to run this every time you get a new release.
If you don't have root privileges, you can just add CVSAnalY to your $PATH (cvsanalydir is the directory where CVSAnalY is installed):
export PATH=$PATH:cvsanalydir
CVSAnalY needs RepositoryHandler. If it is not installed in the usual path for Python packages, PYTHONPATH should include the directory where it is installed (repohandlerdir is the path where RepositoryHandler is installed):
export PYTHONPATH=$PYTHONPATH:repohandlerdir
You are now ready to use CVSAnalY!
If cvsanaly2 is on your PATH, just check out your repository (from Git/SVN/CVS) to obtain a local copy, and then run cvsanaly2 from inside it.
Here's an example using Voldemort:
$ git clone git://github.com/voldemort/voldemort.git ~/Downloads/voldemort
$ cd ~/Downloads/voldemort
~/Downloads/voldemort$ cvsanaly2
More options, and more detailed information about them, can be found by running cvsanaly2 --help.
If you are running CVSAnalY from the directory you downloaded it to, check out your repository (from Git/SVN/CVS) to obtain a local copy, and then run cvsanaly2, pointing it to where you checked the repository out.
Here's an example using Voldemort:
$ git clone git://github.com/voldemort/voldemort.git ~/Downloads/voldemort
$ cd [where you downloaded CVSAnalY to]
[CVSAnalY directory]$ ./cvsanaly2 ~/Downloads/voldemort
More options, and more detailed information about them, can be found by running ./cvsanaly2 --help.
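When several checkouts need to be mined, the invocation shown above can be scripted. This is a minimal sketch, assuming cvsanaly2 takes the checkout path as its only argument exactly as in the example. The helper names are hypothetical, and no other command-line options are assumed.

```python
import os
import subprocess


def build_command(repo_path, executable="./cvsanaly2"):
    """Build the argument list for one run, mirroring `./cvsanaly2 <path>`."""
    # Expand a leading ~ ourselves, since no shell is involved.
    return [executable, os.path.expanduser(repo_path)]


def analyze_all(repo_paths, executable="./cvsanaly2"):
    """Run cvsanaly2 once per checkout and return {path: exit status}."""
    results = {}
    for path in repo_paths:
        results[path] = subprocess.call(build_command(path, executable))
    return results
```

Calling analyze_all(["~/Downloads/voldemort"]) from the CVSAnalY directory would run the same command as the example. Pass executable="cvsanaly2" if it is on your PATH.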
At this point in time, only Git is fully tested and supported across all of CVSAnalY and its extensions. SVN is supported on a "best effort" basis: things shouldn't break using SVN, but if they do, you're unlikely to have anyone respond to a bug tracker issue without a pull request patch.
CVSAnalY was originally created to support CVS and SVN. Git support appeared later, and Bazaar support was started but abandoned. As development has continued, it has become clear that Git offers the best possibilities for mining source code repositories. Because Git allows all the source history to be downloaded to local storage, CVSAnalY actions are orders of magnitude faster. For example, the Content extension can get every revision of a file. With CVS and SVN, this requires sending the request to the central server, having the server (slowly) process it, and then getting the content back. We've found that operations which take hours on Git can take weeks with SVN.
If you have an SVN repository that you want to mine, but you can't find a Git mirror for it, we've had good success with svn2git.
Sometimes, a lot of data can pass between CVSAnalY and MySQL, and packet limits are set too small.
Follow the instructions here.
Unicode encoding errors on output happen because Python is trying to print a Unicode string to a terminal that has told Python it only supports ASCII. You can coerce Python into printing Unicode by setting up your sitecustomize.py.
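The failure described above can be reproduced without a terminal. This is a minimal sketch of the underlying encoding step only, not of the sitecustomize.py fix. The helper name can_encode is hypothetical.

```python
def can_encode(text, encoding):
    """Return True if text can be written using the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


name = u"M\u00f3stoles"  # contains one non-ASCII character

print(can_encode(name, "ascii"))  # False: ASCII cannot represent the accented o
print(can_encode(name, "utf-8"))  # True

# Replacing unencodable characters avoids the exception, at the cost of
# losing the original character.
print(name.encode("ascii", "replace"))
```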
CVSAnalY is developed by the GSyC/LibreSoft group at the Universidad Rey Juan Carlos in Móstoles, near Madrid (Spain). It is part of wider research on libre software engineering, aimed at gaining knowledge of how libre software is developed and maintained.
The Software Introspection Lab at the University of California, Santa Cruz, actively contributes to CVSAnalY and hosts Git mirrors at https://github.com/SoftwareIntrospectionLab . UCSC can review pull requests and bug reports using GitHub's systems. This is currently more active than the official LibreSoft repository ecosystem, so your issue may be more likely to be reviewed there.
- Carlos Garcia Campos
- Gregorio Robles
- Alvaro Navarro
- Jesus M. Gonzalez-Barahona
- Israel Herraiz
- Juan Jose Amor
- Martin Michlmayr
- Alvaro del Castillo
- Santiago Duenas
- Chris Lewis (Lewisham on GitHub)
- Zhongpeng Lin (linzhp on GitHub)
- Alexander Pepper (apepper on GitHub)