From 70b2d9c0c643213f0fe08bf604c7c9c08949ecff Mon Sep 17 00:00:00 2001
From: Daniel Park <dpark@broadinstitute.org>
Date: Tue, 31 Oct 2017 18:35:50 -0400
Subject: [PATCH] slight updates to RTD docs (#709)

* begin some updates to docs

* fix code block

* rst fixes

* rst fixes

* more docs updates

* more clean up of install.rst

* update image tag in example
---
 docs/description.rst | 25 +++++++++++-------------
 docs/install.rst     | 46 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+), 14 deletions(-)

diff --git a/docs/description.rst b/docs/description.rst
index ea936787d..9d46b2481 100644
--- a/docs/description.rst
+++ b/docs/description.rst
@@ -37,9 +37,10 @@ Viral genome assembly
 ~~~~~~~~~~~~~~~~~~~~~
 
 The filtered and trimmed reads are subsampled to at most 100,000 pairs.
-*de novo* assemby is performed using Trinity_.
+*de novo* assembly is performed using Trinity_. SPAdes_ is also offered as
+an alternative *de novo* assembler.
 Reference-assisted assembly improvements follow (contig scaffolding, orienting, etc.)
-with MUMMER_ and MAFFT_.
+with MUMMER_ and either MUSCLE_ or MAFFT_. Gap2Seq_ then uses sequencing reads to fill gaps between scaffolded *de novo* contigs.
 
 Each sample's reads are aligned to its *de novo* assembly using Novoalign_
 and any remaining duplicates were removed using Picard_ MarkDuplicates.
@@ -51,8 +52,11 @@ reads were changed to N.
 This align-call-refine cycle is iterated twice, to minimize reference bias in the assembly.
  
 .. _Trinity: http://trinityrnaseq.github.io/
+.. _SPAdes: http://bioinf.spbau.ru/en/spades
 .. _MUMMER: http://mummer.sourceforge.net/
+.. _MUSCLE: https://www.drive5.com/muscle/
 .. _MAFFT: http://mafft.cbrc.jp/alignment/software/
+.. _Gap2Seq: https://www.cs.helsinki.fi/u/lmsalmel/Gap2Seq/
 .. _Novoalign: http://www.novocraft.com/products/novoalign/
 .. _Picard: http://broadinstitute.github.io/picard
 .. _GATK: https://www.broadinstitute.org/gatk/
@@ -82,16 +86,9 @@ assembly. Annotations are computed with snpEff_.
 Taxonomic read identification
 -----------------------------
 
-Nothing here at the moment. That comes later, but we will later
-integrate it when it's ready.
+Metagenomic classifiers include Kraken_ and Diamond_. In each case, results are
+visualized with Krona_.
 
-
-Cloud compute implementation
-----------------------------
-
-This assembly pipeline is also available via the DNAnexus cloud
-platform. RNA paired-end reads from either HiSeq or MiSeq instruments
-can be securely uploaded in FASTQ or BAM format and processed through
-the pipeline using graphical and command-line interfaces. Instructions
-for the cloud analysis pipeline are available at
-https://github.com/dnanexus/viral-ngs/wiki
+.. _Kraken: https://ccb.jhu.edu/software/kraken/
+.. _Diamond: https://ab.inf.uni-tuebingen.de/software/diamond
+.. _Krona: https://github.com/marbl/Krona/wiki
diff --git a/docs/install.rst b/docs/install.rst
index 9fa7c4cc6..7bfcb914f 100644
--- a/docs/install.rst
+++ b/docs/install.rst
@@ -2,6 +2,52 @@ Installation
 ============
 
 
+Cloud compute implementations
+-----------------------------
+
+Docker Images
+~~~~~~~~~~~~~
+
+To facilitate cloud compute deployments, we have published a complete Docker
+image with associated dependencies at
+`DockerHub <https://hub.docker.com/r/broadinstitute/viral-ngs/>`_.
+To fetch it, run ``docker pull broadinstitute/viral-ngs:1.18.2`` (or substitute another tagged version).
+
+
+DNAnexus
+~~~~~~~~
+
+This assembly pipeline is also available via the DNAnexus cloud
+platform. RNA paired-end reads from either HiSeq or MiSeq instruments
+can be securely uploaded in FASTQ or BAM format and processed through
+the pipeline using graphical and command-line interfaces. Instructions
+for the cloud analysis pipeline are available at
+https://github.com/dnanexus/viral-ngs/wiki
+
+
+Google Cloud Platform: dsub
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+All of the command-line functions in viral-ngs are accessible from the Docker image_ and can be invoked directly using dsub_.
+
+.. _dsub: https://cloud.google.com/genomics/v1alpha2/dsub
+.. _image: https://hub.docker.com/r/broadinstitute/viral-ngs/
+
+Here is an example invocation of ``illumina.py illumina_demux`` (replace the ``--project`` value with your GCP project, and the ``--input``, ``--output-recursive``, and ``--logging`` values with URIs within your GCS buckets)::
+
+  dsub --project broad-sabeti-lab --zones "us-east1-*" \
+    --image broadinstitute/viral-ngs:1.18.2 \
+    --name illumina_demux-test \
+    --logging gs://sabeti-temp-30d/dpark/test-demux/logs \
+    --input FC_TGZ=gs://sabeti-sequencing/flowcells/broad-walkup/160907_M04004_0066_000000000-AJH8U.tar.gz \
+    --output-recursive OUTDIR=gs://sabeti-temp-30d/dpark/test-demux \
+    --command 'illumina.py illumina_demux ${FC_TGZ} 1 ${OUTDIR}' \
+    --min-ram 30 \
+    --min-cores 8 \
+    --disk-size 100
+
+
+
 Manual Installation
 -------------------