martinghunt edited this page May 26, 2023 · 24 revisions

Installation

The recommended method is to use a pre-built Docker or Singularity container.

Both the Docker and Singularity containers have the main script viridian_workflow installed.

Docker

Get a Docker image of the latest release:

docker pull ghcr.io/iqbal-lab-org/viridian_workflow:latest

All Docker images are listed in the packages page.
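Once the image is pulled, the workflow can be run inside the container by bind-mounting a host directory that holds the reads. The sketch below only prints the command (a dry run) so it can be checked before use. It assumes that viridian_workflow is on the container's PATH, that the image does not set an entrypoint that conflicts with it, and that paths contain no spaces. The mount point /data and the helper name viridian_docker_cmd are arbitrary choices for this example, not part of the project.

```shell
#!/bin/sh
# Sketch: build a `docker run` command for one paired Illumina sample.
# Assumptions (not confirmed by the project docs): viridian_workflow is on
# the container's PATH; /data is an arbitrary mount point chosen here.
IMAGE="ghcr.io/iqbal-lab-org/viridian_workflow:latest"

# $1 = host directory holding the reads
# $2 = file name of the first reads file, $3 = file name of the second
viridian_docker_cmd() {
  printf '%s\n' "docker run --rm -v $1:/data $IMAGE viridian_workflow run_one_sample --tech illumina --reads1 /data/$2 --reads2 /data/$3 --outdir /data/OUT"
}

# Dry run: print the command for reads in the current directory.
viridian_docker_cmd "$PWD" reads_1.fastq.gz reads_2.fastq.gz
```

Because the output directory is written under the mount point, the results should appear in OUT inside the host directory after the container exits.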

To build a Docker image yourself, clone this repository and then from its root run:

docker build --network=host .

Without --network=host, pip install is likely to time out and the build will fail.

Singularity

Each release includes a downloadable Singularity image file called viridian_workflow_vX.Y.Z.img, where X.Y.Z is the release version.

To build a Singularity container yourself, clone this repository and then from its root run:

singularity build viridian_workflow.img Singularity.def

Basic Usage

The examples below will run the default pipeline, using the built-in SARS-CoV-2 amplicon schemes ampliseq, ARTIC V3-5, and Midnight-1200. The pipeline automatically detects the scheme that best matches the input reads. To use your own amplicon scheme and/or force the choice of scheme, please read the amplicon schemes page. For a more detailed description of the pipeline options, please read the workflow usage page.

To run on paired Illumina reads:

viridian_workflow run_one_sample \
  --tech illumina \
  --reads1 reads_1.fastq.gz \
  --reads2 reads_2.fastq.gz \
  --outdir OUT

To run on unpaired nanopore reads:

viridian_workflow run_one_sample \
  --tech ont \
  --reads reads.fastq.gz \
  --outdir OUT

The allowed values of --tech are illumina, iontorrent, and ont. Nanopore reads must be unpaired (i.e. use the --reads option). Illumina and Ion Torrent reads can be paired or unpaired.
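The pairing rules above can be checked in a wrapper script before launching a run. This is a minimal sketch only: reads_layout_ok is a hypothetical helper written for this example, not part of viridian_workflow, and it encodes just the rules stated on this page.

```shell
#!/bin/sh
# Sketch: validate a tech / read-layout combination before running.
# reads_layout_ok is a hypothetical helper, not a viridian_workflow command.
# $1 = tech (illumina, iontorrent, ont), $2 = layout (paired or unpaired)
reads_layout_ok() {
  case "$1" in
    illumina|iontorrent)
      # Either layout is allowed for these technologies.
      case "$2" in
        paired|unpaired) return 0 ;;
      esac
      ;;
    ont)
      # Nanopore reads must be unpaired.
      if [ "$2" = "unpaired" ]; then
        return 0
      fi
      ;;
  esac
  return 1   # unknown tech, unknown layout, or disallowed combination
}

if reads_layout_ok ont paired; then
  echo "combination allowed"
else
  echo "combination not allowed: ont reads must be unpaired"
fi
```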

Other options:

  • --sample_name MY_NAME: sets the sample name written to the final FASTA, BAM, and VCF files (default is "sample").
  • --keep_bam: use this option to keep the BAM file of original input reads mapped to the reference genome.
  • --force: use with caution, as it overwrites the output directory if it already exists.
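When processing several samples, --sample_name can be combined with one output directory per sample. The sketch below prints one run_one_sample command per read pair rather than running it. The file naming convention (<sample>_1.fastq.gz and <sample>_2.fastq.gz), the OUT_<sample> directory names, and the helper batch_cmds are assumptions made for this example. Paths are assumed to contain no spaces.

```shell
#!/bin/sh
# Sketch: print one run_one_sample command per paired Illumina sample.
# Assumed (hypothetical) naming: <sample>_1.fastq.gz / <sample>_2.fastq.gz.
# $1 = directory containing the reads
batch_cmds() {
  for r1 in "$1"/*_1.fastq.gz; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$r1" ] || continue
    sample=$(basename "$r1" _1.fastq.gz)
    r2="$1/${sample}_2.fastq.gz"
    # Skip samples whose second reads file is missing.
    [ -e "$r2" ] || { echo "missing mate for $sample" >&2; continue; }
    echo "viridian_workflow run_one_sample --tech illumina --reads1 $r1 --reads2 $r2 --sample_name $sample --outdir OUT_$sample"
  done
}

# Dry run over the current directory; review the output before executing it.
batch_cmds .
```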

Output files

The default files in the output directory are:

  • consensus.fa.gz: a gzipped FASTA file of the consensus sequence.
  • variants.vcf: a VCF file of the identified variants between the consensus sequence and the reference genome.
  • log.json.gz: a gzipped JSON file of logging information for the viridian workflow run. This is described in detail in the JSON output file page.
  • scheme_id.depth_across_genome.pdf: a plot of the read depth across the genome, with amplicons coloured in the background.
  • scheme_id.score_plot.pdf: a plot of the scoring for amplicon scheme identification.

If the option --keep_bam is used, then a sorted BAM file of the reads mapped to the reference will also be present, called reference_mapped.bam (and its index file reference_mapped.bam.bai).
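After a run, a quick check that the default files are present can catch a failed or partial run early. The following is a sketch: check_outputs is a hypothetical helper, and it tests only for the file names listed above, not their contents.

```shell
#!/bin/sh
# Sketch: report which default output files are missing from a directory.
# check_outputs is a hypothetical helper, not a viridian_workflow command.
# $1 = output directory; returns 0 if all files are present, 1 otherwise.
check_outputs() {
  missing=0
  for f in consensus.fa.gz variants.vcf log.json.gz \
           scheme_id.depth_across_genome.pdf scheme_id.score_plot.pdf; do
    if [ ! -e "$1/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

# If a run directory called OUT exists, check it and show the FASTA header.
if [ -d OUT ]; then
  if check_outputs OUT; then
    gzip -dc OUT/consensus.fa.gz | head -n 1
  fi
fi
```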
