This folder contains the code used for 10.1016/j.cell.2018.10.004. If you use this code in a publication, please cite:
Citation
Oriol Pich, Ferran Muiños, Radhakrishnan Sabarinathan, Iker Reyes-Salazar, Abel Gonzalez-Perez, Nuria Lopez-Bigas, Somatic and germline mutation periodicity follow the orientation of the DNA minor groove around nucleosomes, Cell (2018) doi: 10.1016/j.cell.2018.10.004
The exact version for reproducing the results is available under the tag Paper. Further improvements to the code can be found in the master branch.
A brief description of the structure of this repo:
- accessibility: accessibility data analysis
- ancestral: ancestral states analysis
- damage: damage data (UV and NMP) analysis
- figures: code to generate the figures and tables for the paper
- germline: germline data analysis
- increase: code for the increase of mutation rate analysis
- mutations: mutational data analysis
- nucleosomes: computation of the dyad positions
- periodicity: WW periodicity analysis
- rotational: rotational classification of the nucleosomes
- signatures: analysis of the signatures of the mutational data
- simulation: simulation
Each folder contains a notebook with a brief description and the requirements (notebooks that need to be executed).
These analyses have been performed using software written in Python, R and GNU bash.
We have created a set of Jupyter notebooks that you can run if you are interested in re-running our analysis, either partially or in full. Each notebook contains further details on how to run it.
To run these notebooks you need to have the following software installed (we also indicate the versions so that you can reproduce the exact same results):
Python (3.5.6) Packages:
- ipykernel (4.8.2)
- numpy (1.15.1)
- pandas (0.23.4)
- scipy (1.1.0)
- matplotlib (2.2.3)
- statsmodels (0.9.0)
- click (6.7)
- tqdm (4.25.0)
- intervaltree (2.1.0)
- lmfit (0.9.11)
- rpy2 (2.7.8)
- bgreference (0.5)
- xlrd (1.1.0)
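Because exact versions matter for reproducing the results, it can help to compare the active environment against the list above before running any notebook. The following is a minimal sketch, not part of the repository: the helper name `find_mismatches` is hypothetical, and it assumes each package is installed under the distribution name shown in the list.

```python
# Sketch: report packages whose installed version differs from the pinned one.
# The pinned versions are copied from the Python (3.5.6) list above.
try:
    from importlib.metadata import version, PackageNotFoundError  # Python >= 3.8
except ImportError:  # older interpreters, e.g. Python 3.5
    from pkg_resources import get_distribution
    from pkg_resources import DistributionNotFound as PackageNotFoundError

    def version(name):
        return get_distribution(name).version

PINNED = {
    "ipykernel": "4.8.2", "numpy": "1.15.1", "pandas": "0.23.4",
    "scipy": "1.1.0", "matplotlib": "2.2.3", "statsmodels": "0.9.0",
    "click": "6.7", "tqdm": "4.25.0", "intervaltree": "2.1.0",
    "lmfit": "0.9.11", "rpy2": "2.7.8", "bgreference": "0.5",
    "xlrd": "1.1.0",
}


def find_mismatches(pinned, lookup=version):
    """Return {name: (expected, found)} for packages that do not match.

    `found` is None when the package is not installed. `lookup` is
    injectable (hypothetical parameter) so the function can be tested.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in sorted(find_mismatches(PINNED).items()):
        print("{}: expected {}, found {}".format(name, expected, found))
```

Running the script prints one line per package that is missing or installed at a different version; an empty output means the environment matches the list.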
Python (2.7.15) Packages:
R (3.4.3) packages:
- deconstructSigs (1.8.0) [3]
- sigfit (1.0.0) [4]
Other software:
In addition, we have created a Python package named nucperiod that contains a set of Python scripts that we have used during our analysis.
It can be installed with pip:
cd nucperiod
pip install .
For some of the analyses (those where CrossMap, DeconstructSigs and SigFit are involved) we already prepared three conda environments:
- env_crossmap: environment for CrossMap, as it is a Python 2.7 tool (you can create it with env_crossmap.yml)
- env_deconstructsigs: environment for the deconstructSigs R package (use env_deconstructsigs.yml to replicate it)
- env_nucperiod_sigfit: environment for the SigFit R package. Please note that the package is not installed in that environment and you need to install it manually
Most of the intermediate files generated while running any notebook are most likely not used for further analysis. However, we have decided not to remove them so you can check them if needed.
Compressing most of the files is not necessary; we have done so only to save disk space.
The scripts that you can find in the scripts directories are documented for further info.
If you want to check which parameters each script accepts, use the --help flag (python <script> --help).
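To review the options of several scripts at once, the `--help` call can be wrapped in a small helper. This is a sketch only: the function names `script_help` and `all_help` are hypothetical, and it assumes the scripts are `.py` files sitting directly inside the directory you pass in.

```python
# Sketch: collect the --help text of every Python script in a directory.
import subprocess
import sys
from pathlib import Path


def script_help(script):
    """Run `python <script> --help` and return whatever it prints."""
    result = subprocess.run(
        [sys.executable, str(script), "--help"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # some tools write usage to stderr
        universal_newlines=True,    # decode output as text
    )
    return result.stdout


def all_help(scripts_dir):
    """Return {script file name: help text} for each *.py file in scripts_dir."""
    return {
        path.name: script_help(path)
        for path in sorted(Path(scripts_dir).glob("*.py"))
    }
```

For example, `print(all_help("scripts")["<script>"])` would show the same text as running the command by hand, with `<script>` standing for a file name in that directory.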
This project makes use of datasets available through the bgdata package.
This package will try to download the latest version; however, you can easily fix the version of these datasets.
After installing the package, update the file ~/.bbglab/bgdata.conf to add the following lines:
[datasets/genomereference/hg19]
build = 20150724
[datasets/genomereference/tair10]
build = 20180810
[datasets/genomereference/saccer3]
build = 20180720
[datasets/genomereference/dm3]
build = 20180904
[datasets/genomereference/mm9]
build = 20171103
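The pinning step can also be scripted rather than edited by hand. The sketch below is an illustration, not part of bgdata: the helper names `render_pins` and `append_pins` are hypothetical, and it assumes that each section header and its `build` entry go on separate lines and that appending them to the end of the existing file is acceptable.

```python
# Sketch: append the pinned genome reference builds to a bgdata config file.
from pathlib import Path

# Genome builds as listed above.
BUILDS = [
    ("hg19", "20150724"),
    ("tair10", "20180810"),
    ("saccer3", "20180720"),
    ("dm3", "20180904"),
    ("mm9", "20171103"),
]


def render_pins(builds):
    """Render one section header plus one `build = ...` line per genome."""
    lines = []
    for genome, build in builds:
        lines.append("[datasets/genomereference/{}]".format(genome))
        lines.append("build = {}".format(build))
    return "\n".join(lines) + "\n"


def append_pins(conf_path, builds):
    """Append the rendered pins to conf_path, creating the file if needed."""
    conf_path = Path(conf_path).expanduser()
    conf_path.parent.mkdir(parents=True, exist_ok=True)
    with open(str(conf_path), "a") as handle:
        handle.write(render_pins(builds))
    return conf_path
```

Calling `append_pins("~/.bbglab/bgdata.conf", BUILDS)` would add the lines to the file named above; back up the file first, since the function does not check whether the sections already exist.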
[1] This software was not installed within a conda environment.
[2] This package has been installed in a separate environment named env_crossmap.
[3] This package has been installed in a separate environment named env_deconstructsigs.
[4] This package has been installed in a separate environment named env_sigfit.