From 70b2d9c0c643213f0fe08bf604c7c9c08949ecff Mon Sep 17 00:00:00 2001
From: Daniel Park
Date: Tue, 31 Oct 2017 18:35:50 -0400
Subject: [PATCH] slight updates to RTD docs (#709)

* begin some updates to docs
* fix code block
* rst fixes
* rst fixes
* more docs updates
* more clean up of install.rst
* update image tag in example
---
 docs/description.rst | 25 +++++++++++------------
 docs/install.rst     | 46 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+), 14 deletions(-)

diff --git a/docs/description.rst b/docs/description.rst
index ea936787d..9d46b2481 100644
--- a/docs/description.rst
+++ b/docs/description.rst
@@ -37,9 +37,10 @@ Viral genome assembly
 ~~~~~~~~~~~~~~~~~~~~~
 
 The filtered and trimmed reads are subsampled to at most 100,000 pairs.
-*de novo* assemby is performed using Trinity_.
+*de novo* assemby is performed using Trinity_. SPAdes_ is also offered as
+an alternative *de novo* assembler.
 Reference-assisted assembly improvements follow (contig scaffolding, orienting, etc.)
-with MUMMER_ and MAFFT_.
+with MUMMER_ and MUSCLE_ or MAFFT_. Gap2Seq_ is used to seal gaps between scaffolded *de novo* contigs with sequencing reads.
 
 Each sample's reads are aligned to its *de novo* assembly using Novoalign_
 and any remaining duplicates were removed using Picard_ MarkDuplicates.
@@ -51,8 +52,11 @@ reads were changed to N.
 This align-call-refine cycle is iterated twice, to minimize reference bias in the assembly.
 
 .. _Trinity: http://trinityrnaseq.github.io/
+.. _SPAdes: http://bioinf.spbau.ru/en/spades
 .. _MUMMER: http://mummer.sourceforge.net/
+.. _MUSCLE: https://www.drive5.com/muscle/
 .. _MAFFT: http://mafft.cbrc.jp/alignment/software/
+.. _Gap2Seq: https://www.cs.helsinki.fi/u/lmsalmel/Gap2Seq/
 .. _Novoalign: http://www.novocraft.com/products/novoalign/
 .. _Picard: http://broadinstitute.github.io/picard
 .. _GATK: https://www.broadinstitute.org/gatk/
@@ -82,16 +86,9 @@ assembly. Annotations are computed with snpEff_.
 Taxonomic read identification
 -----------------------------
 
-Nothing here at the moment. That comes later, but we will later
-integrate it when it's ready.
+Metagenomic classifiers include Kraken_ and Diamond_. In each case, results are
+visualized with Krona_.
 
-
-Cloud compute implementation
-----------------------------
-
-This assembly pipeline is also available via the DNAnexus cloud
-platform. RNA paired-end reads from either HiSeq or MiSeq instruments
-can be securely uploaded in FASTQ or BAM format and processed through
-the pipeline using graphical and command-line interfaces. Instructions
-for the cloud analysis pipeline are available at
-https://github.com/dnanexus/viral-ngs/wiki
+.. _Kraken: https://ccb.jhu.edu/software/kraken/
+.. _Diamond: https://ab.inf.uni-tuebingen.de/software/diamond
+.. _Krona: https://github.com/marbl/Krona/wiki
diff --git a/docs/install.rst b/docs/install.rst
index 9fa7c4cc6..7bfcb914f 100644
--- a/docs/install.rst
+++ b/docs/install.rst
@@ -2,6 +2,52 @@ Installation
 ============
 
 
+Cloud compute implementations
+-----------------------------
+
+Docker Images
+~~~~~~~~~~~~~
+
+To facilitate cloud compute deployments, we have published a complete Docker
+image with associated dependencies at
+`DockerHub `_.
+Simply ``docker pull broadinstitute/viral-ngs:1.18.2`` (or some other tagged version).
+
+
+DNAnexus
+~~~~~~~~
+
+This assembly pipeline is also available via the DNAnexus cloud
+platform. RNA paired-end reads from either HiSeq or MiSeq instruments
+can be securely uploaded in FASTQ or BAM format and processed through
+the pipeline using graphical and command-line interfaces. Instructions
+for the cloud analysis pipeline are available at
+https://github.com/dnanexus/viral-ngs/wiki
+
+
+Google Cloud Platform: dsub
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+All of the command line functions in viral-ngs are accessible from the docker image_ and can be invoked directly using dsub_.
+
+.. _dsub: https://cloud.google.com/genomics/v1alpha2/dsub
+.. _image: https://hub.docker.com/r/broadinstitute/viral-ngs/
+
+Here is an example invocation of ``illumina.py illumina_demux`` (replace the project with your GCP project, and the input, output-recursive, and logging parameters with URIs within your GCS buckets)::
+
+  dsub --project broad-sabeti-lab --zones "us-east1-*" \
+    --image broadinstitute/viral-ngs:1.18.2 \
+    --name illumina_demux-test \
+    --logging gs://sabeti-temp-30d/dpark/test-demux/logs \
+    --input FC_TGZ=gs://sabeti-sequencing/flowcells/broad-walkup/160907_M04004_0066_000000000-AJH8U.tar.gz \
+    --output-recursive OUTDIR=gs://sabeti-temp-30d/dpark/test-demux \
+    --command 'illumina.py illumina_demux ${FC_TGZ} 1 ${OUTDIR}' \
+    --min-ram 30 \
+    --min-cores 8 \
+    --disk-size 100
+
+
+
 Manual Installation
 -------------------