-
Notifications
You must be signed in to change notification settings - Fork 1
Analyses
Set these environment variables.
export PATH=/home/rmccloskey/bin:/gsc/software/linux-x86_64-centos5/java-1.7.0-u13/bin:$PATH
export LD_LIBRARY_PATH=/home/rmccloskey/lib:$LD_LIBRARY_PATH
export R_LIBS_USER=/home/rmccloskey/Rlibrary
export PYTHONPATH=/home/rmccloskey/lib/python2.7:/home/rmccloskey/morin-rotation/modules:$PYTHONPATH
Add this line to your $HOME/.Rprofile
(create this file if it doesn't already exist).
options(bitmapType="cairo")
-
Run ABSOLUTE.R on each of your segments. There are a number of options to ABSOLUTE.R, which are documented here, but the only mandatory ones are the segments file and the output directory.
If you have a MAF file:
./ABSOLUTE.R -s my.seg -o results -m my.maf
If you don't have a MAF file:
./ABSOLUTE.R -s my.seg -o results
Do this for ALL your samples, and put the results in the same directory (ie.
results
in this case). -
Run summarize.R on the directory where all the results have been placed.
/summarize.R -p results -n summary
-
Run GATK DepthOfCoverage on both the tumour and normal BAM files.
java -Xmx2048m -jar $HOME/bin/GenomeAnalysisTK.jar \ -T DepthOfCoverage \ -omitBaseOutput \ -omitLocusTable \ -R GRCh37-lite.fa \ -I normal.bam \ -L SureSelect_regions.list \ -o normal java -Xmx2048m -jar $HOME/bin/GenomeAnalysisTK.jar \ -T DepthOfCoverage \ -omitBaseOutput \ -omitLocusTable \ -R GRCh37-lite.fa \ -I tumour.bam \ -L SureSelect_regions.list \ -o tumour
-
Run ExomeCNV.R on the
sample_interval_summary
files output by the previous step../ExomeCNV.R -a tumour.sample_interval_summary \ -b normal.sample_interval_summary \ -c 100 \ -d 0.3 \ -p result.segment.copynumber.txt
Here,
-c 100
means that the average read length is 100, and-d 0.3
means that the tumour sample has an estimated 30% contamination of normal cells. The-p
option tells the script where to save thesegment.copynumber.txt
file; we are discarding all other outputs. There are several more options to ExomeCNV.R - check the file as well as the official documentation for more details.
There is a config file in /home/rmccloskey/TITANRunner/config.cfg
which has sensible defaults. Consult the official documentation for more information about the rest of the settings in the config file. If you want to change any settings, copy and edit /home/rmccloskey/TITANRunner/config.cfg
, and use the updated copy as the -c
argument to the pipeline.
The official instructions are thorough and well-written, and should be consulted before running the pipeline and/or if any errors are encountered. What's provided here is a brief overview, which should work in the genesis environment.
Make sure the names of the chromosomes in your BAM files match the names in your reference genome. For instance, if the reference has chromosomes 1, 2, etc., make sure the chromosomes in your BAM files are not called chr1, chr2, etc. You can fix this error with the sed
utility. You will need to reindex your BAM files with samtools index
after changing the header.
samtools view -H my.bam | sed s/chr//g > header.sam
samtools reheader header.sam my.bam > tmp.bam
mv tmp.bam my.bam
rm header.sam
samtools index my.bam
-
Make a 6-column, tab-delimited file, with the columns (tumourID, tumour_libID, tumour_path, normalID, normal_libID, normal_path). If your samples don't have separate library ID's, you can specify the same identifier for both columns.
echo -e "tumourID\ttumourLibID\t/path/to/tumour.bam\tnormalID\tnormalLibID\t/path/to/normal.bam" > titan_input.txt
-
Run the pipeline.
python TITANRunner/TitanRunner.py -i titan_input.txt --project-name my_project --project-path results -c TITANRunner/config.cfg
After running TITAN, there will be a number of files in a directory like results/my_project_R0023/titan
. These will all have a cluster number (eg. cluster_3
) in their names. Examine each of the files ending with params.txt
, and look for the line starting with S_Dbw validity index:
. The cluster with the lowest value is likely the best one, but see question 2 of the official FAQ.
If you see "Code 1" in your output, such as below, it means that something went wrong.
Your job 8104502 ("titan_tumourID_tumourLibID_normalID_normalLibID_titan_correctreads.sh") has been submitted
Job 8104502 exited with exit code 1.
Your job 8104503 ("titan_tumourID_tumourLibID_normalID_normalLibID_titan_correctreads.sh") has been submitted
Job 8104503 exited with exit code 1.
Examine the failing script's stdout and stderr. If the example command above was run, these would be in the following directory (0003 is just a database ID, yours may be different depending on how many times you have run TITAN).
results/my_project_R0003/titan/intermediate_R0003/tumourID_tumourLibID_normalID_normalLibID/logs
You can debug the script by running it individually on the command line. It is located in a sibling directory of the above path.
results/my_project_R0003/titan/intermediate_R0003/tumourID_tumourLibID_normalID_normalLibID/scripts
You will want to run the script from the same directory that you ran the pipeline (ie. $PREFIX
). Because of the long path names, it may be easier to copy the script into this directory and run it from there.
Use the script /home/rmccloskey/MAF/augment_maf.py
. Check inside for the description of the parameters. Here is an example which writes output to test.maf
.
./augment_maf.py -m indels.maf -m snvs.maf -t tumour.bam -n normal.bam GRCh37-lite.fa test.maf