Setting up
This page describes how to set up all necessary software so that all the analyses described in this wiki can be run. We will install everything in $PREFIX, which you will have to set, e.g. export PREFIX=$HOME.
Set these environment variables.
export PATH=$PREFIX/bin:$PATH
export LD_LIBRARY_PATH=$PREFIX/lib:$LD_LIBRARY_PATH
export R_LIBS_USER=$PREFIX/Rlibrary
export PYTHONPATH=$PREFIX/lib/python2.7
export PKG_CONFIG_PATH=$PREFIX/lib/pkgconfig:$PREFIX/share/pkgconfig:$PKG_CONFIG_PATH
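These exports are needed in every new shell, so one option is to keep them in a function (or a file) that you source. The following is a minimal sketch: the name setup_env is hypothetical, and the ${VAR:+...} expansions are an addition that avoids leaving an empty entry in a search path when the variable was previously unset (some tools treat an empty entry as the current directory).

```shell
# Hypothetical helper: export the variables listed above for a given prefix.
setup_env() {
    PREFIX=$1
    PATH=$PREFIX/bin:$PATH
    # ${VAR:+:$VAR} appends the old value only if it was non-empty,
    # so no trailing ":" (empty entry) is left behind.
    LD_LIBRARY_PATH=$PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
    R_LIBS_USER=$PREFIX/Rlibrary
    PYTHONPATH=$PREFIX/lib/python2.7
    PKG_CONFIG_PATH=$PREFIX/lib/pkgconfig:$PREFIX/share/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}
    export PREFIX PATH LD_LIBRARY_PATH R_LIBS_USER PYTHONPATH PKG_CONFIG_PATH
}

# Example: setup_env "$HOME"
```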
You must have a fairly recent version of gcc in your $PATH, and the shared libraries for mpc, mpfr, and gmp in your $LD_LIBRARY_PATH. Installing these is outside the scope of this document. If you are on genesis, they are already installed; you just need to do the following.
export PATH=/gsc/software/linux-x86_64-centos5/gcc-4.9.1/bin:$PATH
export LD_LIBRARY_PATH=/gsc/software/linux-x86_64-centos5/mpc-1.0.2/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/gsc/software/linux-x86_64-centos5/mpfr-3.1.2/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/gsc/software/linux-x86_64-centos5/gmp-6.0.0a/lib:$LD_LIBRARY_PATH
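The text does not say which gcc version counts as "fairly recent"; genesis provides 4.9.1. A quick check might look like the following sketch, where version_ge is a hypothetical helper and the 4.9 threshold is an assumption taken from the genesis path above, not a documented requirement.

```shell
# Hypothetical helper: succeed when dotted numeric version $1 is >= $2.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        len = (n > m) ? n : m
        for (i = 1; i <= len; i++) {
            # Missing fields count as 0, so "4.9" compares like "4.9.0".
            xi = (i <= n) ? x[i] + 0 : 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Report on the gcc found in $PATH, if there is one.
if command -v gcc >/dev/null 2>&1; then
    if version_ge "$(gcc -dumpversion)" 4.9; then
        echo "gcc looks recent enough"
    else
        echo "gcc may be too old: $(gcc -dumpversion)"
    fi
fi
```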
Add this line to your $HOME/.Rprofile (create this file if it doesn't already exist).
options(bitmapType="cairo")
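If you script this step, it helps to avoid adding the line twice when the setup is re-run. A minimal sketch, with append_once as a hypothetical helper name:

```shell
# Hypothetical helper: append a line to a file unless an identical line exists.
append_once() {
    line=$1
    file=$2
    touch "$file"    # create the file if it doesn't already exist
    # -x: match whole lines only, -F: treat the line as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example: append_once 'options(bitmapType="cairo")' "$HOME/.Rprofile"
```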
-
Download and install sqlite3.
wget http://www.sqlite.org/2014/sqlite-autoconf-3080703.tar.gz
tar xf sqlite-autoconf-3080703.tar.gz
cd sqlite-autoconf-3080703
./configure --prefix=$PREFIX
make
make install
cd ..
-
Download and install OpenSSL.
wget https://www.openssl.org/source/openssl-1.0.1j.tar.gz
tar xf openssl-1.0.1j.tar.gz
cd openssl-1.0.1j
CFLAGS='-fPIC' ./config shared --prefix=$PREFIX
make
make install
cd ..
-
Download Python 2.
wget https://www.python.org/ftp/python/2.7.8/Python-2.7.8.tgz
tar xf Python-2.7.8.tgz
cd Python-2.7.8
-
Edit Modules/Setup.dist. Uncomment the lines defining SSL, and change the location to $PREFIX. For example, if your $PREFIX is /home/whoami, it would be as follows.
SSL=/home/whoami
_ssl _ssl.c \
	-DUSE_SSL -I$(SSL)/include -I$(SSL)/include/openssl \
	-L$(SSL)/lib -lssl -lcrypto
Then copy the file to Modules/Setup.
cp Modules/Setup.dist Modules/Setup
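The edit and the copy can also be done in one scripted step. The sketch below assumes the commented-out block in Modules/Setup.dist starts with a #SSL= line, followed by #_ssl _ssl.c and ending at the line containing -lssl -lcrypto; check your copy of the file before relying on it. The function name enable_ssl is hypothetical, and the prefix must not contain the characters | or &, which are special in the sed replacement.

```shell
# Hypothetical helper: print Setup.dist with the SSL block uncommented.
#   $1: path to Modules/Setup.dist
#   $2: prefix where OpenSSL was installed
enable_ssl() {
    sed -e "s|^#SSL=.*|SSL=$2|" \
        -e '/^#_ssl _ssl\.c/,/-lssl -lcrypto/ s/^#//' \
        "$1"
}

# Example, replacing both the manual edit and the cp:
#   enable_ssl Modules/Setup.dist "$PREFIX" > Modules/Setup
```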
-
Compile Python. You must use these exact options to configure to get all the dependencies to work.
./configure --prefix=$PREFIX --enable-shared --enable-unicode=ucs4
make
make install
cd ..
-
Install pip.
wget --no-check-certificate https://bootstrap.pypa.io/get-pip.py
python get-pip.py
-
Install ruffus.
wget -O ruffus-2.4.1.tar.gz https://github.com/bunbun/ruffus/archive/v2.4.1.tar.gz
tar xf ruffus-2.4.1.tar.gz
cd ruffus-2.4.1
python setup.py install --prefix=$PREFIX
cd ..
-
Install lockfile and pysam with pip.
pip install lockfile
pip install pysam
-
Download bamUtils.py and mafUtils.py from here and add their location to your $PYTHONPATH.
-
Install the BioPython library. You will get a warning about not having NumPy, which can be ignored, since we're only using the IO features.
wget http://biopython.org/DIST/biopython-1.64.tar.gz
tar xf biopython-1.64.tar.gz
cd biopython-1.64
python setup.py install --prefix=$PREFIX
cd ..
-
Download and install tar (yes, your system probably has tar already, but you need a newer version to deal with xz-compressed archives).
wget http://ftpmirror.gnu.org/tar/tar-latest.tar.gz
tar xf tar-latest.tar.gz
cd tar-1.28 # your folder may be a more recent version
./configure --prefix=$PREFIX && make && make install
cd ..
-
Download and install XZ Utils.
wget http://tukaani.org/xz/xz-5.0.7.tar.gz
tar xf xz-5.0.7.tar.gz
cd xz-5.0.7
./configure --prefix=$PREFIX && make && make install
cd ..
-
Download and install pixman.
wget http://cairographics.org/releases/pixman-0.32.6.tar.gz
tar xf pixman-0.32.6.tar.gz
cd pixman-0.32.6
./configure --prefix=$PREFIX && make && make install
cd ..
-
Download and install xproto.
wget ftp://mirror.csclub.uwaterloo.ca/x.org/current/src/everything/xproto-7.0.23.tar.gz
tar xf xproto-7.0.23.tar.gz
cd xproto-7.0.23
./configure --prefix=$PREFIX && make && make install
cd ..
-
Download and install xcb-proto.
wget http://xcb.freedesktop.org/dist/xcb-proto-1.11.tar.gz
tar xf xcb-proto-1.11.tar.gz
cd xcb-proto-1.11
./configure --prefix=$PREFIX && make && make install
cd ..
-
Download and install libpthread_stubs.
wget http://xcb.freedesktop.org/dist/libpthread-stubs-0.3.tar.gz
tar xf libpthread-stubs-0.3.tar.gz
cd libpthread-stubs-0.3
./configure --prefix=$PREFIX && make && make install
cd ..
-
Download and install libxcb.
wget http://xcb.freedesktop.org/dist/libxcb-1.11.tar.gz
tar xf libxcb-1.11.tar.gz
cd libxcb-1.11
./configure --prefix=$PREFIX && make && make install
cd ..
-
Download and install xtrans.
wget ftp://mirror.csclub.uwaterloo.ca/x.org/current/src/everything/xtrans-1.2.7.tar.gz
tar xf xtrans-1.2.7.tar.gz
cd xtrans-1.2.7
./configure --prefix=$PREFIX && make && make install
cd ..
-
Download and install libX11.
wget ftp://mirror.csclub.uwaterloo.ca/x.org/current/src/everything/libX11-1.5.0.tar.gz
tar xf libX11-1.5.0.tar.gz
cd libX11-1.5.0
./configure --prefix=$PREFIX && make && make install
cd ..
-
Download and install xextproto.
wget ftp://mirror.csclub.uwaterloo.ca/x.org/current/src/everything/xextproto-7.2.1.tar.gz
tar xf xextproto-7.2.1.tar.gz
cd xextproto-7.2.1
./configure --prefix=$PREFIX && make && make install
cd ..
-
Download and install libXext.
wget http://www.x.org/releases/X11R7.7/src/everything/libXext-1.3.1.tar.gz
tar xf libXext-1.3.1.tar.gz
cd libXext-1.3.1
./configure --prefix=$PREFIX && make && make install
cd ..
-
Download and install cairo.
wget http://cairographics.org/releases/cairo-1.12.18.tar.xz
tar xf cairo-1.12.18.tar.xz
cd cairo-1.12.18
./configure --prefix=$PREFIX && make && make install
cd ..
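The last several packages all follow the same unpack, configure, make, make install pattern, so the repetition could be folded into a small helper. This is only a sketch: tarball_top_dir and build_autotools are hypothetical names, and it assumes each tarball unpacks into a single top-level directory whose name is the first path component of the first archive entry. Reading the directory name from the archive also covers cases like tar-latest.tar.gz, where the folder name differs from the file name.

```shell
# Hypothetical helper: name of the top-level directory inside a tarball.
tarball_top_dir() {
    tar tf "$1" | head -n 1 | cut -d / -f 1
}

# Hypothetical helper: unpack an already-downloaded tarball and install it
# under $PREFIX using the configure/make/make install sequence.
build_autotools() {
    tarball=$1
    dir=$(tarball_top_dir "$tarball")
    [ -n "$dir" ] || return 1
    tar xf "$tarball" || return 1
    # The subshell keeps the caller's working directory unchanged,
    # so no "cd .." is needed afterwards.
    ( cd "$dir" && ./configure --prefix="$PREFIX" && make && make install )
}

# Example (after the corresponding wget):
#   build_autotools pixman-0.32.6.tar.gz
```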
-
Download and install R. For some reason, it doesn't compile properly with the newer GCC, so you should reset your $PATH so that which gcc returns /usr/bin/gcc. Put the $PATH back to the way it was after this step is finished.
wget http://cran.rstudio.com/src/base/R-3/R-3.1.2.tar.gz
tar xf R-3.1.2.tar.gz
cd R-3.1.2
./configure --with-x=no --prefix=$PREFIX && make && make install
cd ..
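One way to avoid having to restore $PATH by hand is to make the change inside a subshell, so it disappears when the build finishes. A minimal sketch, assuming the system compiler is /usr/bin/gcc as described above (with_system_gcc is a hypothetical name):

```shell
# Hypothetical helper: run a command with /usr/bin first in $PATH.
# The parentheses start a subshell, so the caller's $PATH is untouched.
with_system_gcc() {
    ( PATH=/usr/bin:$PATH; export PATH; "$@" )
}

# Example, run from inside the R-3.1.2 directory:
#   with_system_gcc sh -c './configure --with-x=no --prefix=$PREFIX && make && make install'
```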
-
Install these R packages. Do this from the R console.
install.packages("foreach")
install.packages("doMC")
install.packages("parallel")
install.packages("getopt")
install.packages("numDeriv")
source("http://bioconductor.org/biocLite.R")
biocLite("HMMcopy")
biocLite("IRanges")
biocLite("GenomeInfoDb")
biocLite("Rsamtools")
biocLite("SNPchip")
biocLite("TitanCNA")
biocLite("DNAcopy")
-
Download and install ABSOLUTE.
wget http://www.broadinstitute.org/cancer/cga/sites/default/files/data/tools/absolute/ABSOLUTE_1.0.6.tar.gz
R CMD INSTALL ABSOLUTE_1.0.6.tar.gz
-
Download and install ExomeCNV.
wget http://cran.r-project.org/src/contrib/Archive/ExomeCNV/ExomeCNV_1.4.tar.gz
R CMD INSTALL ExomeCNV_1.4.tar.gz
-
Download and install mafft.
wget http://mafft.cbrc.jp/alignment/software/mafft-7.213-without-extensions-src.tgz
tar xf mafft-7.213-without-extensions-src.tgz
cd mafft-7.213-without-extensions/core
sed -i s_"PREFIX = /usr/local"_"PREFIX = $PREFIX"_ Makefile
make
make install
cd ../..
-
Download and install samtools/bcftools. You must use a samtools version whose number starts with 0. Versions 1.0 and above will not work with TITAN.
wget -O samtools-0.1.20.tar.gz https://github.com/samtools/samtools/archive/0.1.20.tar.gz
tar xf samtools-0.1.20.tar.gz
cd samtools-0.1.20
sed -i -e 's/-D_CURSES_LIB=1/-D_CURSES_LIB=0/' -e 's/LIBCURSES=/#LIBCURSES=/' Makefile # disable ncurses
make
mv samtools $PREFIX/bin
cd bcftools
make
mv bcftools $PREFIX/bin
cd ../..
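Because a newer samtools elsewhere in $PATH would silently take precedence, it can be worth confirming which version is picked up. The sketch below assumes that running samtools with no arguments prints a usage message containing a line of the form "Version: x.y.z"; is_legacy_samtools is a hypothetical helper name.

```shell
# Hypothetical helper: succeed when a version string starts with "0.".
is_legacy_samtools() {
    case $1 in
        0.*) return 0 ;;
        *) return 1 ;;
    esac
}

if command -v samtools >/dev/null 2>&1; then
    # Assumption: the usage message includes a "Version: x.y.z" line.
    ver=$(samtools 2>&1 | awk '/^Version:/ { print $2; exit }')
    if is_legacy_samtools "$ver"; then
        echo "samtools $ver: OK"
    else
        echo "samtools $ver: a 0.x release is needed for TITAN"
    fi
fi
```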
-
Download and install bowtie. You must use version 1; bowtie2 will not work.
wget -O bowtie-1.1.1.zip http://sourceforge.net/projects/bowtie-bio/files/bowtie/1.1.1/bowtie-1.1.1-linux-x86_64.zip/download
unzip bowtie-1.1.1.zip
mv bowtie-1.1.1/bowtie* $PREFIX/bin
-
Download GATK. First go to this page and sign up for an account. Once you have an account, go to the download page and click "GATK" (you don't need Queue). Accept the license agreement.
Once you have downloaded the archive, extract it and move the jar file to $PREFIX/bin.
tar xf GenomeAnalysisTK-3.3-0.tar.bz2
rm -rf resources
mv GenomeAnalysisTK.jar $PREFIX/bin
-
Download and install tabix.
wget -O htslib-1.1.tar.gz https://github.com/samtools/htslib/archive/1.1.tar.gz
tar xf htslib-1.1.tar.gz
cd htslib-1.1
make
mv tabix $PREFIX/bin
cd ..
-
Install Picard Tools.
wget -O picard-tools-1.127.zip https://github.com/broadinstitute/picard/releases/download/1.127/picard-tools-1.127.zip
unzip picard-tools-1.127.zip
mv picard-tools-1.127/picard.jar $PREFIX/bin
-
Download and install RAxML.
wget -O standard-RAxML-8.1.13.tar.gz https://github.com/stamatak/standard-RAxML/archive/v8.1.13.tar.gz
tar xf standard-RAxML-8.1.13.tar.gz
cd standard-RAxML-8.1.13
make -f Makefile.gcc
mv raxmlHPC $PREFIX/bin
cd ..
-
Download the TITANRunner pipeline.
wget http://compbio.bccrc.ca/files/2013/07/TITANRunner-0.1.1.zip
unzip TITANRunner-0.1.1.zip
-
Copy config_default.cfg to config.cfg, and edit it to match your local environment.
-
Change TITANRunner/scripts/filter_chromosomes.py to discard the mitochondrial chromosome, by changing line 8 from
if 'GL' in line or 'gl' in line or 'hap' in line:
to
if 'GL' in line or 'gl' in line or 'hap' in line or 'M' in line:
-
Change line 55 in TITANRunner/scripts/count.py from
result += "\t" + ref_base + "\t" + str(ref_count) + "\tX\t" + str(alt_count) + '\t' + position_data[1]
to
result += "\t" + ref_base + "\t" + str(ref_count) + "\tX\t" + str(alt_count)
-
Optional: to discard the Y chromosome (which can sometimes cause problems, and won't work with samples from female patients), change line 56 of TITANRunner/scripts/titan.R from
data <- filterData(data,c(1:22,"X","Y"),minDepth=10,maxDepth=200,map=mScore,mapThres=0.8)
to
data <- filterData(data,c(1:22,"X"),minDepth=10,maxDepth=200,map=mScore,mapThres=0.8)
-
Optional: if running TITAN on WES data (rather than WGS), follow step 7 of the official documentation, namely editing scripts/correctReads.R to use your exon co-ordinates. In this example, we'll use the file SureSelect_regions.list.
$ head -n 3 SureSelect_regions.list
1:30275-30395
1:69069-70029
1:367647-368607
Add some code to correctReads.R to parse these exon co-ordinates.
exons <- read.table("/extscratch/morinlab/shared/common/SureSelect_regions.list", stringsAsFactors=F)
split <- strsplit(exons[,1], "[:-]")
chr <- factor(sapply(split, "[[", 1), levels=c(1:22, "X", "Y"))
start <- as.integer(sapply(split, "[[", 2))
stop <- as.integer(sapply(split, "[[", 3))
exons <- data.frame(chr=chr, start=start, stop=stop)
If your exon co-ordinates are in a different format (a BED file, for example), the code to parse them will be different. In any case, you must end up with a data.frame with three columns: chr, start, and end (the last is named stop in the code above). Then replace the line cnData <- ... with
cnData <- correctReadDepth(tumWig, normWig, gc, map, targetedSequence = exons)
We'll be putting all these files in $PREFIX/reference, so before starting, cd there.
-
Obtain a copy of the human genome. This should be the same reference genome that your BAM files were aligned to. Create an index for it with samtools.
wget http://www.bcgsc.ca/downloads/genomes/9606/hg19/1000genomes/bwa_ind/genome/GRCh37-lite.fa
samtools faidx GRCh37-lite.fa
-
Select only the 24 ordinary chromosomes from the FASTA file, and reindex it. This will help eliminate errors caused by mismatching chromosome names.
samtools faidx GRCh37-lite.fa 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y > tmp.fa
mv tmp.fa GRCh37-lite.fa
rm GRCh37-lite.fa.fai
samtools faidx GRCh37-lite.fa
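After reindexing, the first column of the .fai file lists the sequence names, so it can be used to confirm that only the 24 ordinary chromosomes remain. A minimal sketch, with hypothetical function names:

```shell
# Print the expected chromosome names, one per line: 1..22, X, Y.
expected_chroms() {
    i=1
    while [ "$i" -le 22 ]; do
        echo "$i"
        i=$((i + 1))
    done
    echo X
    echo Y
}

# Hypothetical helper: succeed when the .fai index given as $1 lists
# exactly the expected chromosomes, in the same order.
fai_has_only_ordinary_chroms() {
    [ "$(cut -f 1 "$1")" = "$(expected_chroms)" ]
}

# Example:
#   fai_has_only_ordinary_chroms GRCh37-lite.fa.fai || echo "unexpected sequences"
```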
-
Create a sequence dictionary for the reference genome.
java -jar $PREFIX/bin/picard.jar CreateSequenceDictionary R=$PREFIX/reference/GRCh37-lite.fa O=$PREFIX/reference/GRCh37-lite.dict
-
Generate GC and mappability wig files for the reference genome. If you are not using the GRCh37-lite.fa assembly, you will need to follow the instructions here to obtain a BigWig mappability file, and then create the wig files from there.
If you are using GRCh37-lite.fa, then just crop out the MT chromosome from the wig files that ship with TITANRunner (see https://github.com/rmcclosk/morin-rotation/wiki/TITAN).
cd TITANRunner/data
sed `grep -n MT GRCh37-lite.map.ws_1000.wig | cut -d ':' -f 1`,$((`grep -n fixedStep GRCh37-lite.map.ws_1000.wig | grep -A 1 MT | tail -n 1 | cut -d ':' -f 1`-1))d GRCh37-lite.map.ws_1000.wig > $PREFIX/reference/GRCh37-lite.map.wig
sed `grep -n MT GRCh37-lite.gc.ws_1000.wig | cut -d ':' -f 1`,$((`grep -n fixedStep GRCh37-lite.gc.ws_1000.wig | grep -A 1 MT | tail -n 1 | cut -d ':' -f 1`-1))d GRCh37-lite.gc.ws_1000.wig > $PREFIX/reference/GRCh37-lite.gc.wig
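The sed commands above work by computing the line range of the MT block. As an alternative that may be easier to read (not the method used above), the same cropping can be sketched in awk, assuming each block of the wig file begins with a declaration line of the form "fixedStep chrom=NAME ...". The function name drop_wig_chrom is hypothetical. Because it does not rely on a following block, this version should also cope with the dropped chromosome being the last one in the file.

```shell
# Hypothetical helper: print a fixedStep wig file without one chromosome.
#   $1: chromosome name to drop (e.g. MT)
#   $2: wig file
drop_wig_chrom() {
    awk -v chrom="$1" '
        # On each declaration line, decide whether the block that follows
        # belongs to the chromosome being dropped.
        /^fixedStep/ { skip = ($0 ~ ("chrom=" chrom "( |$)")) }
        # Print every line that is not inside a skipped block.
        !skip
    ' "$2"
}

# Example:
#   drop_wig_chrom MT GRCh37-lite.gc.ws_1000.wig > $PREFIX/reference/GRCh37-lite.gc.wig
```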
-
Obtain the dbSNP common_all dataset, and create an index for it.
wget ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b141_GRCh37p13/VCF/common_all.vcf.gz
tabix common_all.vcf.gz
-
Obtain a list of exon regions (e.g. a BED file). These regions should be the ones provided by the manufacturer of the kit you used to sequence your library. Hopefully you already have such a file. If not, there are some helpful pointers in this thread. The file should have the following format.
$ head -n 3 SureSelect_regions.list
1:30275-30395
1:69069-70029
1:367647-368607
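If the manufacturer's file is a BED file, it can be converted to the chr:start-end format shown above. The sketch below makes two assumptions that you should verify against your own data: that the region list uses 1-based inclusive co-ordinates (BED starts are 0-based, hence the + 1), and that chromosome names should carry no "chr" prefix, to match GRCh37-lite. The function name bed_to_regions is hypothetical.

```shell
# Hypothetical helper: convert BED lines to chr:start-end region strings.
# Reads the files given as arguments, or standard input if there are none.
bed_to_regions() {
    awk '
        # Skip BED header and comment lines.
        /^(track|browser|#)/ { next }
        NF >= 3 {
            sub(/^chr/, "", $1)                      # chr1 -> 1
            printf "%s:%d-%d\n", $1, $2 + 1, $3      # 0-based start -> 1-based
        }
    ' "$@"
}

# Example:
#   bed_to_regions SureSelect.bed > SureSelect_regions.list
```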