Skip to content
rmcclosk edited this page Dec 17, 2014 · 4 revisions

Set up

Set these environment variables.

export PATH=/home/rmccloskey/bin:/gsc/software/linux-x86_64-centos5/java-1.7.0-u13/bin:$PATH
export LD_LIBRARY_PATH=/home/rmccloskey/lib:$LD_LIBRARY_PATH
export R_LIBS_USER=/home/rmccloskey/Rlibrary
export PYTHONPATH=/home/rmccloskey/lib/python2.7:/home/rmccloskey/morin-rotation/modules:$PYTHONPATH

Add this line to your $HOME/.Rprofile (create this file if it doesn't already exist).

options(bitmapType="cairo")

Running ABSOLUTE

  1. Run ABSOLUTE.R on each of your segments. There are a number of options to ABSOLUTE.R, which are documented here, but the only mandatory ones are the segments file and the output directory.

    If you have a MAF file:

    ./ABSOLUTE.R -s my.seg -o results -m my.maf
    

    If you don't have a MAF file:

    ./ABSOLUTE.R -s my.seg -o results
    

    Do this for ALL your samples, and put the results in the same directory (ie. results in this case).

  2. Run summarize.R on the directory where all the results have been placed.

    /summarize.R -p results -n summary
    

ExomeCNV

  1. Run GATK DepthOfCoverage on both the tumour and normal BAM files.

    java -Xmx2048m -jar $HOME/bin/GenomeAnalysisTK.jar \
            -T DepthOfCoverage \
            -omitBaseOutput \
            -omitLocusTable \
            -R GRCh37-lite.fa \
            -I normal.bam \
            -L SureSelect_regions.list \
            -o normal
    
    java -Xmx2048m -jar $HOME/bin/GenomeAnalysisTK.jar \
            -T DepthOfCoverage \
            -omitBaseOutput \
            -omitLocusTable \
            -R GRCh37-lite.fa \
            -I tumour.bam \
            -L SureSelect_regions.list \
            -o tumour
    
  2. Run ExomeCNV.R on the sample_interval_summary files output by the previous step.

    ./ExomeCNV.R -a tumour.sample_interval_summary \
                 -b normal.sample_interval_summary \
                 -c 100 \
                 -d 0.3 \
                 -p result.segment.copynumber.txt
    

    Here, -c 100 means that the average read length is 100, and -d 0.3 means that the tumour sample has an estimated 30% contamination of normal cells. The -p option tells the script where to save the segment.copynumber.txt file; we are discarding all other outputs. There are several more options to ExomeCNV.R - check the file as well as the official documentation for more details.

TITAN

There is a config file in /home/rmccloskey/TITANRunner/config.cfg which has sensible defaults. Consult the official documentation for more information about the rest of the settings in the config file. If you want to change any settings, copy and edit /home/rmccloskey/TITANRunner/config.cfg, and use the updated copy as the -c argument to the pipeline.

The official instructions are thorough and well-written, and should be consulted before running the pipeline and/or if any errors are encountered. What's provided here is a brief overview, which should work in the genesis environment.

Preparing your BAM files

Make sure the names of the chromosomes in your BAM files match the names in your reference genome. For instance, if the reference has chromosomes 1, 2, etc., make sure the chromosomes in your BAM files are not called chr1, chr2, etc. You can fix this error with the sed utility. You will need to reindex your BAM files with samtools index after changing the header.

samtools view -H my.bam | sed s/chr//g > header.sam
samtools reheader header.sam my.bam > tmp.bam
mv tmp.bam my.bam
rm header.sam
samtools index my.bam

Running the pipeline

  1. Make a 6-column, tab-delimited file, with the columns (tumourID, tumour_libID, tumour_path, normalID, normal_libID, normal_path). If your samples don't have separate library ID's, you can specify the same identifier for both columns.

    echo -e "tumourID\ttumourLibID\t/path/to/tumour.bam\tnormalID\tnormalLibID\t/path/to/normal.bam" > titan_input.txt
    
  2. Run the pipeline.

    python TITANRunner/TitanRunner.py -i titan_input.txt --project-name my_project --project-path results -c TITANRunner/config.cfg
    

Interpreting the results

After running TITAN, there will be a number of files in a directory like results/my_project_R0023/titan. These will all have a cluster number (eg. cluster_3) in their names. Examine each of the files ending with params.txt, and look for the line starting with S_Dbw validity index:. The cluster with the lowest value is likely the best one, but see question 2 of the official FAQ.

Troubleshooting

If you see "Code 1" in your output, such as below, it means that something went wrong.

Your job 8104502 ("titan_tumourID_tumourLibID_normalID_normalLibID_titan_correctreads.sh") has been submitted
Job 8104502 exited with exit code 1.
Your job 8104503 ("titan_tumourID_tumourLibID_normalID_normalLibID_titan_correctreads.sh") has been submitted
Job 8104503 exited with exit code 1.

Examine the failing script's stdout and stderr. If the example command above was run, these would be in the following directory (0003 is just a database ID, yours may be different depending on how many times you have run TITAN).

results/my_project_R0003/titan/intermediate_R0003/tumourID_tumourLibID_normalID_normalLibID/logs

You can debug the script by running it individually on the command line. It is located in a sibling directory of the above path.

results/my_project_R0003/titan/intermediate_R0003/tumourID_tumourLibID_normalID_normalLibID/scripts

You will want to run the script from the same directory that you ran the pipeline (ie. $PREFIX). Because of the long path names, it may be easier to copy the script into this directory and run it from there.

Getting variant allele fractions from MAF files

Use the script /home/rmccloskey/MAF/augment_maf.py. Check inside for the description of the parameters. Here is an example which writes output to test.maf.

./augment_maf.py -m indels.maf -m snvs.maf -t tumour.bam -n normal.bam GRCh37-lite.fa test.maf