oriolpich/nucleosome-periodicity

This folder contains the code used for the analyses published in doi: 10.1016/j.cell.2018.10.004. If you use this code in a publication, please cite:

Citation

Oriol Pich, Ferran Muiños, Radhakrishnan Sabarinathan, Iker Reyes-Salazar, Abel Gonzalez-Perez, Nuria Lopez-Bigas, Somatic and germline mutation periodicity follow the orientation of the DNA minor groove around nucleosomes, Cell (2018) doi: 10.1016/j.cell.2018.10.004

The exact version for reproducing the results is under the Tag Paper. Further improvements in the code can be found in the master branch.

A brief description of the structure of this repo:

Each folder contains a notebook with a brief description and its requirements (other notebooks that need to be executed first).

Running this software

These analyses have been performed using software written in Python, R and GNU bash.

We have created a set of Jupyter notebooks that you can run if you are interested in re-running all or part of our analysis. Each notebook contains further details on how to run it.

Requirements

To be able to run those notebooks you need to have the following software installed (we also indicate the version so you can reproduce the exact same results):

Python (3.5.6) Packages:

Python (2.7.15) Packages:

R (3.4.3) packages:

Other software:

In addition, we have created a Python package named nucperiod that contains a set of Python scripts that we have used during our analysis. It can be installed with pip:

cd nucperiod
pip install .

For some of the analyses (those involving CrossMap, deconstructSigs and SigFit) we have prepared three conda environments:

  • env_crossmap environment for CrossMap, as it is a Python 2.7 tool (you can create it with env_crossmap.yml)
  • env_deconstructsigs environment for the deconstructSigs R package (use env_deconstructsigs.yml to replicate it)
  • env_nucperiod_sigfit environment for the SigFit R package. Please note that the package is not installed in that environment and you need to install it manually

Notes

Most of the intermediate files generated while running a notebook are probably not used in further analyses. However, we have decided not to remove them so you can check them if needed.

Compressing most of the files is not necessary, but we have decided to do so in order to save disk space.

The scripts in the scripts directories are documented. To check which parameters a script accepts, use the --help flag (python <script> --help).

Fixing datasets versions

This project makes use of datasets available through the bgdata package. By default, this package will try to download the latest version of each dataset, but you can easily fix the versions. After installing the package, update the file ~/.bbglab/bgdata.conf to add the following lines:

[datasets/genomereference/hg19]
build = 20150724
[datasets/genomereference/tair10]
build = 20180810
[datasets/genomereference/saccer3]
build = 20180720
[datasets/genomereference/dm3]
build = 20180904
[datasets/genomereference/mm9]
build = 20171103
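
If you prefer to apply this change programmatically, the following is a minimal sketch rather than part of the repository. It assumes that appending the sections above as plain text to the configuration file is enough for bgdata to pick them up, and the function name pin_builds is hypothetical. Sections whose header is already present in the file are left untouched, so the function can be run more than once.

```python
from pathlib import Path

# Genome reference builds as listed in this README.
PINNED_BUILDS = (
    ("hg19", "20150724"),
    ("tair10", "20180810"),
    ("saccer3", "20180720"),
    ("dm3", "20180904"),
    ("mm9", "20171103"),
)


def pin_builds(conf_path, builds=PINNED_BUILDS):
    """Append missing [datasets/genomereference/<name>] sections to conf_path.

    Hypothetical helper: the file is edited as plain text and is not parsed.
    Returns the list of genome names whose section was added.
    """
    conf_path = Path(conf_path).expanduser()
    text = conf_path.read_text() if conf_path.exists() else ""

    added = []
    blocks = []
    for name, build in builds:
        header = "[datasets/genomereference/{}]".format(name)
        if header in text:
            continue  # section already present; do not overwrite it
        blocks.append("{}\nbuild = {}\n".format(header, build))
        added.append(name)

    if blocks:
        # Start on a fresh line if the existing file lacks a trailing newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        conf_path.parent.mkdir(parents=True, exist_ok=True)
        with conf_path.open("a") as handle:
            handle.write(prefix + "".join(blocks))
    return added
```

Calling pin_builds("~/.bbglab/bgdata.conf") would then add any of the five sections that are missing and return their names. Because the file is only appended to, any existing settings stay as they are.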

[1] This software was not installed within a conda environment.
[2] This package has been installed in a separate environment named as env_crossmap
[3] This package has been installed in a separate environment named as env_deconstructsigs
[4] This package has been installed in a separate environment named as env_sigfit