This directory contains the workflow for the MetaRiboSeq method. It uses Nextflow as the workflow manager and a combination of Docker and Singularity to build containers.
The Singularity images for this workflow are built using both Docker and Singularity. Docker is used because it is much more performant and flexible than Singularity, especially when used with BuildKit. Two Docker images are used: a base image intended for broad use and a second that is specific to MetaRiboSeq. The Singularity image packages the metariboseq Docker image for use on a local cluster such as Stanford's SCG. However, there is nothing specific to SCG in this workflow, and it can run on a single node or even a laptop, albeit slowly. See the `./docker` and `./singularity` directories for build scripts etc.
Nextflow version 20.07.1.5412 is used to implement the actual workflow. Note that this is installed in `/labs/asbhatt/tools/swtools/bin`.
Some common conventions:
- All Nextflow scripts have the suffix `.nf`.
- A shell script, `run-nextflow.sh`, is used to run `nextflow` with appropriate parameters. The samples are specified via a parameter file, as are common workflow options.
- Nextflow configuration files are specified by these scripts as appropriate for running locally or using Slurm. These are in the `nextflow-configs` directory.
The parameters file is in YAML format and is structured as follows:
```yaml
option1: some options
option2: some options
sampleSpecs:
  - name: <name>
    metagenomic: <metagenomic-data>
    metariboseq: <metariboseq-data>
```
The options are used to configure the command-line arguments passed to the various commands used, e.g. the number of threads for `spades` or the arguments to `trim_galore`. The `sampleSpecs` simply enumerate all of the samples to be processed by giving each a name and the file name component (without `.fq.gz`) for each paired set of sequencing files.
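Each option is a single string, so it has to be split into separate arguments before it reaches the tool. The sketch below assumes shell-style splitting; `build_command` and the read file names are hypothetical, and in the real workflow the strings are consumed by the Nextflow scripts, which are not reproduced here.

```python
import shlex


def build_command(tool: str, options: str, *inputs: str) -> list[str]:
    """Hypothetical helper: return [tool, *options, *inputs].

    The options string is split with POSIX shell rules, so a quoted value
    containing spaces stays together as one argument.
    """
    return [tool, *shlex.split(options), *inputs]


if __name__ == "__main__":
    trim_galore_options = "--cores 4 -q 30 --illumina"  # from the example parameters
    # Hypothetical input names; --paired is a standard trim_galore flag.
    cmd = build_command(
        "trim_galore", trim_galore_options, "--paired", "reads_1.fq.gz", "reads_2.fq.gz"
    )
    print(shlex.join(cmd))
```

Running it prints `trim_galore --cores 4 -q 30 --illumina --paired reads_1.fq.gz reads_2.fq.gz`. Splitting with `shlex` rather than `str.split` keeps quoted values intact if an option ever needs one.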
Here is a complete example:

```yaml
trimGaloreOptions: "--cores 4 -q 30 --illumina"
spadesOptions: "--threads=4 --memory=96"
alignmentMemory: "96 GB"
assemblyMemory: "96 GB"
bowtieIndexOptions: "--threads 4"
bowtieAlignmentOptions: "--threads 4"
sampleSpecs:
  - name: sampleA
    metagenomic: SampleADNA
    metariboseq: BrayonRibo_1_S7_R1_001
```
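Once the YAML has been parsed (for example with a YAML library, not shown), `sampleSpecs` is just a list of mappings. The sketch below, with hypothetical names `SampleSpec`, `parse_sample_specs` and `fastq_name`, checks that each entry has the three expected keys and turns a file name component back into a file name. How the two files of each pair are named is not described in this document, so the sketch only restores the `.fq.gz` suffix.

```python
from dataclasses import dataclass

REQUIRED_KEYS = ("name", "metagenomic", "metariboseq")


@dataclass(frozen=True)
class SampleSpec:
    name: str
    metagenomic: str  # file name component, without .fq.gz
    metariboseq: str  # file name component, without .fq.gz


def parse_sample_specs(params: dict) -> list[SampleSpec]:
    """Check the sampleSpecs list of an already-parsed parameter file."""
    specs = []
    for i, entry in enumerate(params.get("sampleSpecs", [])):
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(f"sampleSpecs[{i}] is missing: {', '.join(missing)}")
        specs.append(SampleSpec(*(entry[key] for key in REQUIRED_KEYS)))
    return specs


def fastq_name(component: str) -> str:
    """Re-attach the suffix that the parameter file leaves off."""
    return f"{component}.fq.gz"


if __name__ == "__main__":
    # The complete example above, as a YAML parser would return it.
    params = {
        "trimGaloreOptions": "--cores 4 -q 30 --illumina",
        "sampleSpecs": [
            {
                "name": "sampleA",
                "metagenomic": "SampleADNA",
                "metariboseq": "BrayonRibo_1_S7_R1_001",
            }
        ],
    }
    for spec in parse_sample_specs(params):
        print(spec.name, fastq_name(spec.metagenomic), fastq_name(spec.metariboseq))
```

For the example parameters this prints `sampleA SampleADNA.fq.gz BrayonRibo_1_S7_R1_001.fq.gz`.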
All workflows are in the `workflows` directory:
- `subsample`: generates sub-sampled input data for testing/development.
- `metariboseq`: the actual MetaRiboSeq workflow.
A Nextflow script is provided for subsampling a given set of samples to create a small 'test' sample for development purposes. The full samples take on the order of 24-48 hours to assemble and hence are cumbersome to work with. See the `subsample` directory for details. The `run-nextflow.sh` script runs `nextflow` with appropriate parameter files and configuration to generate the subsampled data.
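This README does not say how the `subsample` workflow selects reads. One common approach for paired data, sketched below purely as an illustration and not as that workflow's implementation, is to keep each read pair with a fixed probability while making a single random draw per pair, so the R1 and R2 outputs stay in sync. The function names are hypothetical.

```python
import random
from itertools import islice, zip_longest
from typing import Iterable, Iterator


def fastq_records(lines: Iterable[str]) -> Iterator[tuple[str, ...]]:
    """Group FASTQ lines into 4-line records (header, sequence, '+', quality)."""
    it = iter(lines)
    while True:
        record = tuple(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_pairs(
    r1_lines: Iterable[str], r2_lines: Iterable[str], fraction: float, seed: int = 0
) -> Iterator[tuple[tuple[str, ...], tuple[str, ...]]]:
    """Keep each read pair with probability `fraction`.

    One random draw is made per pair, so R1 and R2 are kept or dropped together.
    A fixed seed makes the subsample reproducible.
    """
    rng = random.Random(seed)
    for rec1, rec2 in zip_longest(fastq_records(r1_lines), fastq_records(r2_lines)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        if rng.random() < fraction:
            yield rec1, rec2
```

With real data the line iterables would come from `gzip.open(path, "rt")` on each file of a pair.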
The `run-nextflow.sh` script accepts one of the following parameters:
- `test`: runs with the subsampled test data.
- `scg-sampleA`: runs on SCG for sampleA.
- `scg-all`: runs all samples on SCG.
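Conceptually, each mode selects a parameter file and a Nextflow configuration and then launches `nextflow`. The sketch below shows that dispatch in Python. The file names in `MODES`, the script name `metariboseq.nf`, and the choice of a local config for `test` and a Slurm config for the SCG modes are all assumptions for illustration; only the `-c` and `-params-file` options are standard Nextflow.

```python
import shlex

# Hypothetical mapping from run mode to (parameter file, config file); the real
# names used by run-nextflow.sh are not listed in this README.
MODES = {
    "test": ("params/test.yaml", "nextflow-configs/local.config"),
    "scg-sampleA": ("params/sampleA.yaml", "nextflow-configs/slurm.config"),
    "scg-all": ("params/all.yaml", "nextflow-configs/slurm.config"),
}


def nextflow_command(mode: str, script: str = "metariboseq.nf") -> list[str]:
    """Build the nextflow invocation for one of the supported modes."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of: {', '.join(MODES)}")
    params_file, config = MODES[mode]
    # `-c` (a global option) adds a configuration file; `-params-file` (an
    # option of `run`) loads workflow parameters from a YAML or JSON file.
    return ["nextflow", "-c", config, "run", script, "-params-file", params_file]


if __name__ == "__main__":
    print(shlex.join(nextflow_command("test")))
```

Rejecting unknown modes up front gives a clear error instead of a Nextflow run with missing parameters.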
Making changes to the parameter files should be fairly self-evident.
The Nextflow execution graph is complicated by the need to create assemblies for the metagenomic files, but not for the metariboseq files, and then to build indices and alignments using the metagenomic assembly.
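The dependency structure described above can be made explicit as a small graph. In the sketch below the step names (`trim_metagenomic`, `assemble`, `bowtie_index`, and so on) are hypothetical. Trimming before assembly is an assumption, and so is aligning the metariboseq reads (rather than, or in addition to, the metagenomic reads) against the assembly. It uses Python's `graphlib` (3.9+) to compute one valid execution order.

```python
from graphlib import TopologicalSorter


def sample_steps(name: str) -> dict[str, set[str]]:
    """Hypothetical steps for one sample, each mapped to its prerequisites."""
    return {
        f"{name}/trim_metagenomic": set(),
        f"{name}/trim_metariboseq": set(),
        # Only the metagenomic reads are assembled.
        f"{name}/assemble": {f"{name}/trim_metagenomic"},
        f"{name}/bowtie_index": {f"{name}/assemble"},
        # Assumption: the metariboseq reads are aligned to the metagenomic assembly.
        f"{name}/align_metariboseq": {
            f"{name}/bowtie_index",
            f"{name}/trim_metariboseq",
        },
    }


def execution_order(sample_names: list[str]) -> list[str]:
    """Return one valid order in which all steps for all samples could run."""
    graph: dict[str, set[str]] = {}
    for name in sample_names:
        graph.update(sample_steps(name))
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step in execution_order(["sampleA"]):
        print(step)
```

Nextflow itself needs no explicit sort like this: a process runs as soon as the channels feeding it produce data, so the ordering follows from how the channels connect the processes. The sketch only makes that structure visible.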