# UnCoVar

<picture>
  <source media="(prefers-color-scheme: dark)" srcset="https://github.com/IKIM-Essen/uncovar/assets/77535027/8e17c6fc-ff7a-4c25-afc9-7888036d693e" width="40%">
  <source media="(prefers-color-scheme: light)" srcset="https://github.com/IKIM-Essen/uncovar/assets/77535027/c99f5a94-749b-422e-b319-1e3700d40a8e" width="40%">
  <img alt="UnCoVar Logo dark/light">
</picture>

<h1>
Workflow for Transparent and Robust Virus Variant Calling, Genome Reconstruction
and Lineage Assignment
</h1>

## SARS-CoV-2 Variant Calling and Lineage Assignment

[![Snakemake](https://img.shields.io/badge/snakemake-≥7.0-brightgreen.svg)](https://snakemake.bitbucket.io)
[![GitHub actions status](https://github.com/koesterlab/snakemake-workflow-sars-cov2/workflows/Tests/badge.svg?branch=master)](https://github.com/koesterlab/snakemake-workflow-sars-cov2/actions?query=branch%3Amaster+workflow%3ATests)
[![Docker Repository on Quay](https://quay.io/repository/uncovar/uncovar/status)](https://quay.io/repository/uncovar/uncovar)

A reproducible and scalable workflow for transparent and robust SARS-CoV-2
variant calling and lineage assignment with comprehensive reporting.

## Usage

<details>
<summary><b>Step 1: Install Snakemake and Snakedeploy</b></summary>

Snakemake and Snakedeploy are best installed via the [Mamba package manager](https://github.com/mamba-org/mamba)
(a drop-in replacement for Conda). If you have neither Conda nor Mamba, both can
be installed via [Mambaforge](https://github.com/conda-forge/miniforge#mambaforge).
For other options, see [here](https://github.com/mamba-org/mamba).

Given that Mamba is installed, run

```bash
mamba create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy
```

to install both Snakemake and Snakedeploy in an isolated environment. For all
following commands, ensure that this environment is activated via

```bash
conda activate snakemake
```

</details>

<details>
<summary><b>Step 2: Clone or Deploy Workflow</b></summary>

First, create an appropriate project working directory on your system and enter it:

```bash
mkdir -p path/to/project-workdir
cd path/to/project-workdir
```

In all following steps, we will assume that you are inside of that directory.
You can then either clone the full workflow or deploy it via Snakedeploy.

To clone the full workflow, run:

```bash
git clone https://github.com/IKIM-Essen/uncovar
```

Alternatively, given that Snakedeploy is installed and available (see Step 1),
the workflow can be deployed as follows:

```bash
snakedeploy deploy-workflow https://github.com/IKIM-Essen/uncovar . --tag v0.16.0
```

Snakedeploy will create two folders, `workflow` and `config`. The former contains
the deployment of the UnCoVar workflow as a
[Snakemake module](https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#using-and-combining-pre-exising-workflows),
while the latter contains the configuration files that will be modified in the
next step in order to configure the workflow to your needs. Later, when executing
the workflow, Snakemake will automatically find the main Snakefile in the
`workflow` subfolder.

</details>

<details>
<summary><b>Step 3: Configure Workflow</b></summary>

### General settings

#### Config file

To configure this workflow, modify `config/config.yaml` according to your needs,
following the explanations provided in the file. When the analyzed reads derive
from amplicon-tiled sequencing, it is especially recommended to provide a `BED`
file with the primer coordinates, so that the primers are trimmed appropriately.

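For illustration, such a `BED` file is tab-separated and lists one primer per
line with reference name, start, end, and primer name; the coordinates and
primer names below are hypothetical, not part of the workflow:

```
MN908947.3	30	54	primer_1_LEFT
MN908947.3	385	410	primer_1_RIGHT
```
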
#### Sample sheet

The sample sheet contains all samples to be analyzed by UnCoVar.

#### Auto filling

UnCoVar offers the possibility to automatically append paired-end sequenced
samples to the sample sheet. To load your data into the workflow, execute

```bash
snakemake --cores all --use-conda update_sample
```

with the root of the UnCoVar directory as the working directory. It is
recommended to use the following structure when adding data automatically:

```
├── archive
├── incoming
└── snakemake-workflow-sars-cov2
    └── data
        └── 2023-12-24
```

However, this structure is not set in stone and can be adjusted via the
`config/config.yaml` file under `data-handling`. Only the paths to the
corresponding folders, relative to the UnCoVar directory, are needed:

- **incoming**: path of the incoming data, which is moved to the data directory
  by the preprocessing script. Defaults to `../incoming/`.
- **data**: path to store data within the workflow. Defaults to `data/`. It is
  recommended to use properly named subfolders (e.g. by date).
- **archive**: path to which data and analysis results are archived.
  Defaults to `../archive/`.

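Sketched in `config/config.yaml`, the defaults above would look roughly as
follows (the exact key names and layout may differ; follow the comments in the
shipped config file):

```yaml
data-handling:
  # moved into the data directory by the preprocessing script
  incoming: ../incoming/
  # use properly named subfolders, e.g. data/2023-12-24
  data: data/
  # data and analysis results are archived here
  archive: ../archive/
```
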
The incoming directory should contain paired-end reads in (compressed) FASTQ
format. UnCoVar automatically copies your data into the data directory and moves
all files from the incoming directory to the archive. After the analysis, all
results are compressed and saved alongside the reads.

Moreover, the sample sheet is automatically updated with the new files. Please
note that only the part of the filename before the first `_` character is used
as the sample name within the workflow. The technology and the amplicon flag
(**is_amplicon_data**) have to be revisited manually.

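The naming rule can be illustrated with a short sketch (the helper and the
filename are illustrative only, not part of the workflow):

```python
from pathlib import Path

def sample_name(fastq_path: str) -> str:
    # Everything before the first "_" of the filename becomes the sample name.
    return Path(fastq_path).name.split("_")[0]

print(sample_name("data/2023-12-24/sampleA_S1_L001_R1_001.fastq.gz"))  # sampleA
```
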
#### Manual filling

Of course, samples to be analyzed can also be added to the sample sheet manually.
For each sample, a new line with the following content has to be added to
`config/pep/samples.csv`:

- **sample_name**: name or identifier of the sample
- **fq1**: path to read 1 in FASTQ format
- **fq2**: path to read 2 in FASTQ format (for paired-end sequencing)
- **date**: sampling date of the sample
- **is_amplicon_data**: indicates whether the data was generated with shotgun (0)
  or amplicon (1) sequencing
- **technology**: indicates the sequencing technology used to generate the
  samples (illumina, ont, ion)
- **include_in_high_genome_summary**: indicates whether the sample should be
  included in the submission files (1) or not (0)

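For example, a paired-end Illumina amplicon sample could be entered as follows
(filenames and values are hypothetical; check the shipped
`config/pep/samples.csv` for the exact header):

```
sample_name,fq1,fq2,date,is_amplicon_data,technology,include_in_high_genome_summary
sampleA,data/2023-12-24/sampleA_R1.fastq.gz,data/2023-12-24/sampleA_R2.fastq.gz,2023-12-24,1,illumina,1
```
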
</details>

<details>
<summary><b>Step 4: Run Workflow</b></summary>

Given that the workflow has been properly deployed and configured, it can be
executed as follows.

For running the workflow while deploying any necessary software via conda (using
the Mamba package manager by default), run Snakemake with

```bash
snakemake --cores all --use-conda
```

Snakemake will automatically detect the main Snakefile in the `workflow` subfolder
and execute the workflow module that has been defined by the deployment in Step 2.

For further options, e.g. for cluster and cloud execution, see the
[Snakemake documentation](https://snakemake.readthedocs.io).

</details>

This workflow is written with Snakemake; details and tools are described in the
[Snakemake Workflow Catalog](https://snakemake.github.io/snakemake-workflow-catalog?usage=IKIM-Essen/uncovar).

If you use this workflow in a paper, don't forget to give credit to the
authors by citing the URL of this repository and its DOI (see above).

## Tools, Frameworks and Packages used

This project wouldn't be possible without several open-source libraries:

| Tool           | Link                                              |
| -------------- | ------------------------------------------------- |
| ABySS          | www.doi.org/10.1101/gr.214346.116                 |
| Altair         | www.doi.org/10.21105/joss.01057                   |
| BAMClipper     | www.doi.org/10.1038/s41598-017-01703-6            |
| BCFtools       | www.doi.org/10.1093/gigascience/giab008           |
| BEDTools       | www.doi.org/10.1093/bioinformatics/btq033         |
| Biopython      | www.doi.org/10.1093/bioinformatics/btp163         |
| bwa            | www.doi.org/10.1093/bioinformatics/btp324         |
| Covariants     | www.github.com/hodcroftlab/covariants             |
| delly          | www.doi.org/10.1093/bioinformatics/bts378         |
| ensembl-vep    | www.doi.org/10.1186/s13059-016-0974-4             |
| entrez-direct  | www.ncbi.nlm.nih.gov/books/NBK179288              |
| fastp          | www.doi.org/10.1093/bioinformatics/bty560         |
| FastQC         | www.bioinformatics.babraham.ac.uk/projects/fastqc |
| fgbio          | www.github.com/fulcrum-genomics/fgbio             |
| FreeBayes      | www.arxiv.org/abs/1207.3907                       |
| intervaltree   | www.github.com/chaimleib/intervaltree             |
| Jupyter        | www.jupyter.org                                   |
| kallisto       | www.doi.org/10.1038/nbt.3519                      |
| Kraken2        | www.doi.org/10.1186/s13059-019-1891-0             |
| Krona          | www.doi.org/10.1186/1471-2105-12-385              |
| mason          | publications.imp.fu-berlin.de/962                 |
| MEGAHIT        | www.doi.org/10.1093/bioinformatics/btv033         |
| Minimap2       | www.doi.org/10.1093/bioinformatics/bty191         |
| MultiQC        | www.doi.org/10.1093/bioinformatics/btw354         |
| pandas         | pandas.pydata.org                                 |
| Picard         | broadinstitute.github.io/picard                   |
| PySAM          | www.doi.org/10.11578/dc.20190903.1                |
| QUAST          | www.doi.org/10.1093/bioinformatics/btt086         |
| RaGOO          | www.doi.org/10.1186/s13059-019-1829-6             |
| ruamel.yaml    | www.sourceforge.net/projects/ruamel-yaml          |
| Rust-Bio-Tools | www.github.com/rust-bio/rust-bio-tools            |
| SAMtools       | www.doi.org/10.1093/bioinformatics/btp352         |
| Snakemake      | www.doi.org/10.12688/f1000research.29032.1        |
| sourmash       | www.doi.org/10.21105/joss.00027                   |
| SPAdes         | www.doi.org/10.1089/cmb.2012.0021                 |
| SVN            | www.doi.org/10.1142/s0219720005001028             |
| Tabix          | www.doi.org/10.1093/bioinformatics/btq671         |
| Trinity        | www.doi.org/10.1038/nprot.2013.084                |
| Varlociraptor  | www.doi.org/10.1186/s13059-020-01993-6            |
| Vega-Lite      | www.doi.org/10.1109/TVCG.2016.2599030             |
| Velvet         | www.doi.org/10.1101/gr.074492.107                 |
| vembrane       | www.github.com/vembrane/vembrane                  |