You will need the following software tools to be installed on the system:
- A job scheduler - the current code base includes a wrapper for 'slurm'
- 'R'
- The Bioconductor package 'QDNASeq'
- 'FastQC'
- A quality and adapter trimming tool - the current code base includes a wrapper for
Trimgalore
- An alignment tool - the current code base includes a wrapper for 'bwa' version 0.7.15-r1140
- 'Samtools' - the current code base includes a wrapper for version 0.1.19-44428cd
Obtain a local copy of the pipeline by running the following command from the directory in which you would like to store the source code:
git clone https://github.com/crickbabs/LowPassKaryo
Next, edit the following scripts to customise to your own system:
- ./LowPassKaryo.R - Set the variable "SITE.CONFIG.FILE" to point at your copy of config.R (which is to be found in the SRC directory)
- ./SRC/config.R - This file should be edited to represent your own system. In particular the following variables should be set explicitly CONTACT.ADDRESS : the email address of the first point of contact for people receiving the report files. SOURCE.DIR: the location of the pipeline source files (almost certainly the directory that config.R is stored in) GENOMES.LOOKUP.FILE: the location of the genomes look up file -- see below for the appropriate format for this file BINARY.CALLS: the system specific call for each binary used. If you use a module system you should include the module load statements here.
If you choose to use an aligner or trimmer other than bwa/Trimmomatic respectively, then you will also need to provide an appropriate WRAPPED_do script and update the appropriate WRAPPED.FILES and VERSION.COMMANDS entries in config.R
If you use a scheduler other than slurm, you will also need to define appropriate build.submisison, get.id & check.for.id functions in the scheduler_submisison_functions.R script & update the assignment of build.scheduler.submisison, get.scheduler.id, check.queue.for.id, DEFAULT.SCHEDULER.OPTIONS and THREADS.REGEXP.
##GENOMES.LOOKUP.FILE
This should be a tab delimited text file containing five columns with the following names
- Tag: This column should contain the "name" corresponding to this genome which will be included in the design file (e.g. Homo sapiens or mm10)
- Description: The formal name of the genome including build and release if appropriate (e.g. GRCh38-r39 or hg19)
- Species: The generic name for the species (e.g. Homo sapiens, Mus musculus)
- Reference_Path: path to the reference sequence file to be passed to your aligner of choice.
- QDNAseq_Annotated_Bins_Path: path to an RDat object for use by QDNASeq. The script build_qdnaseq_binfile.R can be used to generate objects in the appropriate format - see the script for more details
See genome_lookup_table_template.txt for an example of the format.