rmcclosk edited this page Dec 16, 2014 · 2 revisions

This page describes how to set up the software needed to run the analyses described in this wiki. Everything is installed under $PREFIX, which you must set yourself, e.g. export PREFIX=$HOME.

Environment

Set these environment variables.

export PATH=$PREFIX/bin:$PATH
export LD_LIBRARY_PATH=$PREFIX/lib:$LD_LIBRARY_PATH
export R_LIBS_USER=$PREFIX/Rlibrary
export PYTHONPATH=$PREFIX/lib/python2.7
export PKG_CONFIG_PATH=$PREFIX/lib/pkgconfig:$PREFIX/share/pkgconfig:$PKG_CONFIG_PATH
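
One caveat with the exports above: when a variable such as LD_LIBRARY_PATH is unset, appending :$LD_LIBRARY_PATH leaves a trailing colon, and the dynamic loader treats the resulting empty entry as the current directory. The helper below is a minimal sketch of one way to avoid that. The name prepend_path is hypothetical; it is not provided by this wiki or by the shell.

```shell
# prepend_path VAR DIR (hypothetical helper)
# Put DIR at the front of the colon-separated variable VAR without
# leaving a trailing colon when VAR is unset or empty.
prepend_path() {
    eval "_pp_current=\${$1:-}"                       # read the current value of VAR
    eval "$1=\$2\${_pp_current:+:\$_pp_current}"      # add ":old" only if old is non-empty
    export "$1"
    unset _pp_current
}

# Possible usage, mirroring the exports above:
#   prepend_path PATH "$PREFIX/bin"
#   prepend_path LD_LIBRARY_PATH "$PREFIX/lib"
```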

You must have a fairly recent version of gcc in your $PATH, and the shared libraries for mpc, mpfr, and gmp in your $LD_LIBRARY_PATH. Installing these is outside the scope of this document. If you are on genesis, they are already installed; you just need to do the following.

export PATH=/gsc/software/linux-x86_64-centos5/gcc-4.9.1/bin:$PATH
export LD_LIBRARY_PATH=/gsc/software/linux-x86_64-centos5/mpc-1.0.2/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/gsc/software/linux-x86_64-centos5/mpfr-3.1.2/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/gsc/software/linux-x86_64-centos5/gmp-6.0.0a/lib:$LD_LIBRARY_PATH
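
To confirm that the mpc, mpfr, and gmp shared libraries are actually visible, something like the following could be used. This is only a sketch: find_in_ld_path is a hypothetical helper, and it assumes the libraries follow the usual libNAME.so* naming.

```shell
# find_in_ld_path NAME (hypothetical helper)
# Print the first directory on LD_LIBRARY_PATH that contains libNAME.so*.
# The body runs in a subshell so the IFS change does not leak out.
find_in_ld_path() (
    IFS=:
    for dir in $LD_LIBRARY_PATH; do
        [ -n "$dir" ] || continue            # skip empty entries
        for file in "$dir"/lib"$1".so*; do
            if [ -e "$file" ]; then
                printf '%s\n' "$dir"
                exit 0
            fi
        done
    done
    exit 1
)

# Possible usage:
#   for lib in mpc mpfr gmp; do
#       find_in_ld_path "$lib" > /dev/null || echo "lib$lib not found" >&2
#   done
```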

Add this line to your $HOME/.Rprofile (create this file if it doesn't already exist).

options(bitmapType="cairo")
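
If you script this step, it helps to add the line only when it is missing, so that re-running the setup does not duplicate it. A minimal sketch, where ensure_line is a hypothetical helper name:

```shell
# ensure_line FILE LINE (hypothetical helper)
# Create FILE if needed and append LINE only when no identical line exists.
ensure_line() {
    touch "$1"
    grep -qxF -e "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Possible usage:
#   ensure_line "$HOME/.Rprofile" 'options(bitmapType="cairo")'
```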

Python

  1. Download and install sqlite3.

    wget http://www.sqlite.org/2014/sqlite-autoconf-3080703.tar.gz
    tar xf sqlite-autoconf-3080703.tar.gz
    cd sqlite-autoconf-3080703
    ./configure --prefix=$PREFIX
    make
    make install
    cd ..
    
  2. Download and install OpenSSL.

    wget https://www.openssl.org/source/openssl-1.0.1j.tar.gz
    tar xf openssl-1.0.1j.tar.gz
    cd openssl-1.0.1j
    CFLAGS='-fPIC' ./config shared --prefix=$PREFIX
    make
    make install
    cd ..
    
  3. Download Python 2.

    wget https://www.python.org/ftp/python/2.7.8/Python-2.7.8.tgz
    tar xf Python-2.7.8.tgz
    cd Python-2.7.8
    
  4. Edit Modules/Setup.dist. Uncomment the lines defining SSL, and change the location to $PREFIX. For example, if your $PREFIX is /home/whoami, it would be as follows.

    SSL=/home/whoami
    _ssl _ssl.c \
        -DUSE_SSL -I$(SSL)/include -I$(SSL)/include/openssl \
        -L$(SSL)/lib -lssl -lcrypto
    

    Then copy the file to Modules/Setup.

    cp Modules/Setup.dist Modules/Setup
    
  5. Compile Python. You must use these exact options to configure to get all the dependencies to work.

    ./configure --prefix=$PREFIX --enable-shared --enable-unicode=ucs4
    make
    make install
    cd ..
    
  6. Install pip.

    wget --no-check-certificate https://bootstrap.pypa.io/get-pip.py
    python get-pip.py
    
  7. Install ruffus.

    wget -O ruffus-2.4.1.tar.gz https://github.com/bunbun/ruffus/archive/v2.4.1.tar.gz
    tar xf ruffus-2.4.1.tar.gz
    cd ruffus-2.4.1
    python setup.py install --prefix=$PREFIX
    cd ..
    
  8. Install lockfile and pysam with pip.

    pip install lockfile
    pip install pysam
    
  9. Download bamUtils.py and mafUtils.py from here and add their location to your $PYTHONPATH.

  10. Install the BioPython library. You will get a warning about not having NumPy, which can be ignored, since we're only using the IO features.

    wget http://biopython.org/DIST/biopython-1.64.tar.gz
    tar xf biopython-1.64.tar.gz
    cd biopython-1.64
    python setup.py install --prefix=$PREFIX
    cd ..
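
The manual edit of Modules/Setup.dist in step 4 could also be scripted. The sketch below assumes the file contains the commented-out SSL block in the shape shown in step 4, with each line prefixed by #; check the result by eye before compiling. enable_ssl_in_setup is a hypothetical name, and the SSL directory must not contain the characters | or &, since it is passed through sed.

```shell
# enable_ssl_in_setup FILE SSL_DIR (hypothetical helper)
# Point the SSL variable at SSL_DIR and uncomment the _ssl module block,
# which is assumed to run from the "#_ssl _ssl.c" line to the "-lcrypto" line.
enable_ssl_in_setup() {
    sed -e "s|^#SSL=.*|SSL=$2|" \
        -e '/^#_ssl _ssl\.c/,/-lcrypto/s/^#//' \
        "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Possible usage, from inside the Python-2.7.8 directory:
#   enable_ssl_in_setup Modules/Setup.dist "$PREFIX"
#   cp Modules/Setup.dist Modules/Setup
```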
    

R

  1. Download and install tar. Your system probably has tar already, but you need a newer version to handle xz-compressed archives.

    wget http://ftpmirror.gnu.org/tar/tar-latest.tar.gz
    tar xf tar-latest.tar.gz
    cd tar-1.28 # your folder may be a more recent version
    ./configure --prefix=$PREFIX && make && make install
    cd ..
    
  2. Download and install XZ Utils.

    wget http://tukaani.org/xz/xz-5.0.7.tar.gz
    tar xf xz-5.0.7.tar.gz
    cd xz-5.0.7
    ./configure --prefix=$PREFIX && make && make install
    cd ..
    
  3. Download and install pixman.

    wget http://cairographics.org/releases/pixman-0.32.6.tar.gz
    tar xf pixman-0.32.6.tar.gz
    cd pixman-0.32.6
    ./configure --prefix=$PREFIX && make && make install
    cd ..
    
  4. Download and install xproto.

    wget ftp://mirror.csclub.uwaterloo.ca/x.org/current/src/everything/xproto-7.0.23.tar.gz
    tar xf xproto-7.0.23.tar.gz
    cd xproto-7.0.23
    ./configure --prefix=$PREFIX && make && make install
    cd ..
    
  5. Download and install xcb-proto.

    wget http://xcb.freedesktop.org/dist/xcb-proto-1.11.tar.gz
    tar xf xcb-proto-1.11.tar.gz
    cd xcb-proto-1.11
    ./configure --prefix=$PREFIX && make && make install
    cd ..
    
  6. Download and install libpthread-stubs.

    wget http://xcb.freedesktop.org/dist/libpthread-stubs-0.3.tar.gz
    tar xf libpthread-stubs-0.3.tar.gz
    cd libpthread-stubs-0.3
    ./configure --prefix=$PREFIX && make && make install
    cd ..
    
  7. Download and install libxcb.

    wget http://xcb.freedesktop.org/dist/libxcb-1.11.tar.gz
    tar xf libxcb-1.11.tar.gz
    cd libxcb-1.11
    ./configure --prefix=$PREFIX && make && make install
    cd ..
    
  8. Download and install xtrans.

    wget ftp://mirror.csclub.uwaterloo.ca/x.org/current/src/everything/xtrans-1.2.7.tar.gz
    tar xf xtrans-1.2.7.tar.gz
    cd xtrans-1.2.7
    ./configure --prefix=$PREFIX && make && make install
    cd ..
    
  9. Download and install libX11.

    wget ftp://mirror.csclub.uwaterloo.ca/x.org/current/src/everything/libX11-1.5.0.tar.gz
    tar xf libX11-1.5.0.tar.gz
    cd libX11-1.5.0
    ./configure --prefix=$PREFIX && make && make install
    cd ..
    
  10. Download and install xextproto.

    wget ftp://mirror.csclub.uwaterloo.ca/x.org/current/src/everything/xextproto-7.2.1.tar.gz
    tar xf xextproto-7.2.1.tar.gz
    cd xextproto-7.2.1
    ./configure --prefix=$PREFIX && make && make install
    cd ..
    
  11. Download and install libXext.

    wget http://www.x.org/releases/X11R7.7/src/everything/libXext-1.3.1.tar.gz
    tar xf libXext-1.3.1.tar.gz
    cd libXext-1.3.1
    ./configure --prefix=$PREFIX && make && make install
    cd ..
    
  12. Download and install cairo.

    wget http://cairographics.org/releases/cairo-1.12.18.tar.xz
    tar xf cairo-1.12.18.tar.xz
    cd cairo-1.12.18
    ./configure --prefix=$PREFIX && make && make install
    cd ..
    
  13. Download and install R. For some reason, it doesn't compile properly with the newer GCC, so temporarily reset your $PATH so that which gcc returns /usr/bin/gcc, and restore $PATH once this step is finished.

    wget http://cran.rstudio.com/src/base/R-3/R-3.1.2.tar.gz
    tar xf R-3.1.2.tar.gz
    cd R-3.1.2
    ./configure --with-x=no --prefix=$PREFIX && make && make install
    cd ..
    
  14. Install these R packages. Do this from the R console.

    install.packages("foreach")
    install.packages("doMC")
    install.packages("parallel")
    install.packages("getopt")
    install.packages("numDeriv")
    source("http://bioconductor.org/biocLite.R")
    biocLite("HMMcopy")
    biocLite("IRanges")
    biocLite("GenomeInfoDb")
    biocLite("Rsamtools")
    biocLite("SNPchip")
    biocLite("TitanCNA")
    biocLite("DNAcopy")
    
  15. Download and install ABSOLUTE.

    wget http://www.broadinstitute.org/cancer/cga/sites/default/files/data/tools/absolute/ABSOLUTE_1.0.6.tar.gz
    R CMD INSTALL ABSOLUTE_1.0.6.tar.gz
    
  16. Download and install ExomeCNV.

    wget http://cran.r-project.org/src/contrib/Archive/ExomeCNV/ExomeCNV_1.4.tar.gz
    R CMD INSTALL ExomeCNV_1.4.tar.gz
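
Steps 2 to 12 of this section all follow the same download, unpack, configure, make, install pattern, so they could be wrapped in a small helper. This is a sketch rather than a tested installer: tarball_dir and install_from_url are hypothetical names, and the helper assumes the archive unpacks into a directory named after the file. That assumption does not hold for tar-latest.tar.gz in step 1, and R in step 13 needs its own configure options.

```shell
# tarball_dir URL (hypothetical helper)
# Derive the expected source directory name from a tarball URL.
tarball_dir() {
    _td_name=${1##*/}               # drop everything up to the last slash
    _td_name=${_td_name%.tar.*}     # drop .tar.gz, .tar.xz, .tar.bz2
    _td_name=${_td_name%.tgz}       # drop .tgz
    printf '%s\n' "$_td_name"
}

# install_from_url URL (hypothetical helper)
# Download, unpack, and install into $PREFIX; stops at the first failure.
install_from_url() {
    _ifu_archive=${1##*/}
    _ifu_dir=$(tarball_dir "$1")
    wget "$1" &&
    tar xf "$_ifu_archive" &&
    ( cd "$_ifu_dir" && ./configure --prefix="$PREFIX" && make && make install )
}

# Possible usage:
#   install_from_url http://tukaani.org/xz/xz-5.0.7.tar.gz
```

The build runs in a subshell, so the working directory is unchanged afterwards and no cd .. is needed.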
    

Other utilities

  1. Download and install mafft.

    wget http://mafft.cbrc.jp/alignment/software/mafft-7.213-without-extensions-src.tgz
    tar xf mafft-7.213-without-extensions-src.tgz
    cd mafft-7.213-without-extensions/core
    sed -i s_"PREFIX = /usr/local"_"PREFIX = $PREFIX"_ Makefile
    make
    make install
    cd ../..
    
  2. Download and install samtools/bcftools. You must use a samtools version whose number starts with 0. Versions 1.0 and above will not work with TITAN.

    wget -O samtools-0.1.20.tar.gz https://github.com/samtools/samtools/archive/0.1.20.tar.gz
    tar xf samtools-0.1.20.tar.gz
    cd samtools-0.1.20
    sed -i -e 's/-D_CURSES_LIB=1/-D_CURSES_LIB=0/' -e 's/LIBCURSES=/#LIBCURSES=/' Makefile # disable ncurses
    make
    mv samtools $PREFIX/bin
    cd bcftools
    make
    mv bcftools $PREFIX/bin
    cd ../..
    
  3. Download and install bowtie. You must use version 1; bowtie2 will not work.

    wget -O bowtie-1.1.1.zip http://sourceforge.net/projects/bowtie-bio/files/bowtie/1.1.1/bowtie-1.1.1-linux-x86_64.zip/download
    unzip bowtie-1.1.1.zip
    mv bowtie-1.1.1/bowtie* $PREFIX/bin
    
  4. Download GATK. First go to this page and sign up for an account. Once you have an account, go to the download page and click "GATK" (you don't need Queue). Accept the license agreement.

    Once you have downloaded the archive, extract it and move the jar file to $PREFIX/bin.

    tar xf GenomeAnalysisTK-3.3-0.tar.bz2
    rm -rf resources
    mv GenomeAnalysisTK.jar $PREFIX/bin
    
  5. Download and install tabix.

    wget -O htslib-1.1.tar.gz https://github.com/samtools/htslib/archive/1.1.tar.gz
    tar xf htslib-1.1.tar.gz
    cd htslib-1.1
    make
    mv tabix $PREFIX/bin
    cd ..
    
  6. Install Picard Tools.

    wget -O picard-tools-1.127.zip https://github.com/broadinstitute/picard/releases/download/1.127/picard-tools-1.127.zip
    unzip picard-tools-1.127.zip
    mv picard-tools-1.127/picard.jar $PREFIX/bin
    
  7. Download and install RAxML.

    wget -O standard-RAxML-8.1.13.tar.gz https://github.com/stamatak/standard-RAxML/archive/v8.1.13.tar.gz
    tar xf standard-RAxML-8.1.13.tar.gz
    cd standard-RAxML-8.1.13
    make -f Makefile.gcc
    mv raxmlHPC $PREFIX/bin
    cd ..
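
Because TITAN needs a 0.x samtools (step 2), it may be worth checking which version ends up first on your $PATH. The sketch below assumes that running samtools with no arguments prints a line beginning with "Version:" among its usage text; extract_version and is_legacy_samtools are hypothetical helper names.

```shell
# extract_version (hypothetical helper)
# Read samtools usage text on stdin and print the token after "Version:".
extract_version() {
    sed -n 's/^Version:[[:space:]]*\([^[:space:]]*\).*/\1/p'
}

# is_legacy_samtools VERSION (hypothetical helper)
# Succeed only for version strings that start with "0.".
is_legacy_samtools() {
    case $1 in
        0.*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Possible usage:
#   version=$(samtools 2>&1 | extract_version)
#   is_legacy_samtools "$version" || echo "samtools $version is too new for TITAN" >&2
```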
    

TITAN

  1. Download the TITANRunner pipeline.

    wget http://compbio.bccrc.ca/files/2013/07/TITANRunner-0.1.1.zip
    unzip TITANRunner-0.1.1.zip
    
  2. Copy config_default.cfg to config.cfg, and edit it to match your local environment.

  3. Change TITANRunner/scripts/filter_chromosomes.py to discard the mitochondrial chromosome, by changing line 8 from

    if 'GL' in line or 'gl' in line or 'hap' in line:

    to

    if 'GL' in line or 'gl' in line or 'hap' in line or 'M' in line:

  4. Change line 55 in TITANRunner/scripts/count.py from

    result += "\t" + ref_base + "\t" + str(ref_count) + "\tX\t" + str(alt_count) + '\t' + position_data[1]
    

    to

    result += "\t" + ref_base + "\t" + str(ref_count) + "\tX\t" + str(alt_count)
    
  5. Optional: To discard the Y chromosome (which can sometimes cause problems, and won't work with samples from female patients), change line 56 of TITANRunner/scripts/titan.R from

    data <- filterData(data,c(1:22,"X","Y"),minDepth=10,maxDepth=200,map=mScore,mapThres=0.8)
    

    to

    data <- filterData(data,c(1:22,"X"),minDepth=10,maxDepth=200,map=mScore,mapThres=0.8)
    
  6. Optional: If running TITAN on WES data (rather than WGS), follow step 7 of the official documentation, namely editing scripts/correctReads.R to use your exon co-ordinates. In this example, we'll use the file SureSelect_regions.list.

    $ head -n 3 SureSelect_regions.list
    1:30275-30395
    1:69069-70029
    1:367647-368607
    

    Add some code to correctReads.R to parse these exon co-ordinates.

    exons <- read.table("/extscratch/morinlab/shared/common/SureSelect_regions.list", stringsAsFactors=F)
    split <- strsplit(exons[,1], "[:-]")
    chr <- factor(sapply(split, "[[", 1), levels=c(1:22, "X", "Y"))
    start <- as.integer(sapply(split, "[[", 2))
    stop <- as.integer(sapply(split, "[[", 3))
    exons <- data.frame(chr=chr, start=start, stop=stop)

    If your exon co-ordinates are in a different format (a BED file, for example), the code to parse them will be different. In any case, you must end up with a data.frame with three columns: chr, start, and stop, as in the code above. Then replace the line cnData <- ... with

    cnData <- correctReadDepth(tumWig, normWig, gc, map, targetedSequence = exons)
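
If your exon co-ordinates are in a BED file, one option is to convert it to the chr:start-stop list format first, so the R code above can be used unchanged. The sketch below rests on assumptions the text does not state: that the list format is 1-based and inclusive (BED itself is 0-based and half-open, hence the +1 on the start), and that chromosome names should lose any chr prefix to match GRCh37-lite. bed_to_regions is a hypothetical name.

```shell
# bed_to_regions FILE (hypothetical helper)
# Convert tab-separated BED lines to chr:start-stop, skipping header lines.
bed_to_regions() {
    awk 'BEGIN { FS = "\t" }
         /^(track|browser|#)/ { next }      # skip BED header and comment lines
         {
             chr = $1
             sub(/^chr/, "", chr)           # "chr1" becomes "1"
             printf "%s:%d-%d\n", chr, $2 + 1, $3
         }' "$1"
}

# Possible usage (my_exons.bed is a placeholder file name):
#   bed_to_regions my_exons.bed > SureSelect_regions.list
```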

Obtaining reference files

We'll be putting all these files in $PREFIX/reference, so before starting, cd there.

  1. Obtain a copy of the human genome. This should be the same reference genome that your BAM files were aligned to. Create an index for it with samtools.

    wget http://www.bcgsc.ca/downloads/genomes/9606/hg19/1000genomes/bwa_ind/genome/GRCh37-lite.fa
    samtools faidx GRCh37-lite.fa
    
  2. Select only the 24 ordinary chromosomes from the FASTA file, and reindex it. This will help eliminate errors caused by mismatching chromosome names.

    samtools faidx GRCh37-lite.fa 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y > tmp.fa
    mv tmp.fa GRCh37-lite.fa
    rm GRCh37-lite.fa.fai
    samtools faidx GRCh37-lite.fa
    
  3. Create a sequence dictionary for the reference genome.

    java -jar $PREFIX/bin/picard.jar CreateSequenceDictionary R=$PREFIX/reference/GRCh37-lite.fa O=$PREFIX/reference/GRCh37-lite.dict
    
  4. Generate GC and mappability wig files for the reference genome. If you are not using the GRCh37-lite.fa assembly, you will need to follow the instructions here to obtain a BigWig mappability file, and then create the wig files from there.

    If you are using GRCh37-lite.fa, then just crop out the MT chromosome from the wig files that ship with TITANRunner (see https://github.com/rmcclosk/morin-rotation/wiki/TITAN).

    cd TITANRunner/data
    sed `grep -n MT GRCh37-lite.map.ws_1000.wig | cut -d ':' -f 1`,$((`grep -n fixedStep GRCh37-lite.map.ws_1000.wig | grep -A 1 MT | tail -n 1 | cut -d ':' -f 1`-1))d GRCh37-lite.map.ws_1000.wig > $PREFIX/reference/GRCh37-lite.map.wig
    sed `grep -n MT GRCh37-lite.gc.ws_1000.wig | cut -d ':' -f 1`,$((`grep -n fixedStep GRCh37-lite.gc.ws_1000.wig | grep -A 1 MT | tail -n 1 | cut -d ':' -f 1`-1))d GRCh37-lite.gc.ws_1000.wig > $PREFIX/reference/GRCh37-lite.gc.wig
    
  5. Obtain the dbSNP common_all dataset, and create an index for it.

    wget ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b141_GRCh37p13/VCF/common_all.vcf.gz
    tabix common_all.vcf.gz
    
  6. Obtain a list of exon regions (e.g. a BED file). These regions should be the ones provided by the manufacturer of the kit you used to sequence your library. Hopefully you already have such a file. If not, there are some helpful pointers in this thread. The file should have the following format.

    $ head -n 3 SureSelect_regions.list
    1:30275-30395
    1:69069-70029
    1:367647-368607
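
The sed one-liners in step 4 find the MT block by computing line numbers with grep. A more readable alternative is sketched here, assuming each chromosome block in the wig file starts with a header of the form fixedStep chrom=NAME followed by further fields. It drops every line from the matching header up to the next header, and also handles the case where that chromosome is the last block in the file. drop_wig_chrom is a hypothetical name.

```shell
# drop_wig_chrom CHROM FILE (hypothetical helper)
# Print FILE without the fixedStep block belonging to CHROM.
drop_wig_chrom() {
    awk -v chrom="$1" '
        # On each block header, decide whether this block is to be skipped.
        /^fixedStep/ { skip = ($0 ~ ("chrom=" chrom "( |$)")) }
        # Print every line that is not inside a skipped block.
        !skip
    ' "$2"
}

# Possible usage, from TITANRunner/data:
#   drop_wig_chrom MT GRCh37-lite.map.ws_1000.wig > $PREFIX/reference/GRCh37-lite.map.wig
#   drop_wig_chrom MT GRCh37-lite.gc.ws_1000.wig > $PREFIX/reference/GRCh37-lite.gc.wig
```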
    