refactor: update README.md (#620)
* update README.md

* change

* change

* fix

* fix

* fix
alethomas authored Dec 21, 2023
1 parent 77f7960 commit b521e42
Showing 1 changed file with 141 additions and 59 deletions.
200 changes: 141 additions & 59 deletions README.md
@@ -1,75 +1,157 @@
# UnCoVar


<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://github.com/IKIM-Essen/uncovar/assets/77535027/8e17c6fc-ff7a-4c25-afc9-7888036d693e">
<source media="(prefers-color-scheme: light)" srcset="https://github.com/IKIM-Essen/uncovar/assets/77535027/c99f5a94-749b-422e-b319-1e3700d40a8e">
<source media="(prefers-color-scheme: dark)" srcset="https://github.com/IKIM-Essen/uncovar/assets/77535027/8e17c6fc-ff7a-4c25-afc9-7888036d693e" width="40%">
<source media="(prefers-color-scheme: light)" srcset="https://github.com/IKIM-Essen/uncovar/assets/77535027/c99f5a94-749b-422e-b319-1e3700d40a8e" width="40%">
<img alt="UnCoVar Logo dark/light">
</picture>

<h1>
Workflow for Transparent and Robust Virus Variant Calling, Genome Reconstruction
and Lineage Assignment
</h1>

## SARS-CoV-2 Variant Calling and Lineage Assignment

[![Snakemake](https://img.shields.io/badge/snakemake-≥6.3.0-brightgreen.svg)](https://snakemake.bitbucket.io)
[![Snakemake](https://img.shields.io/badge/snakemake-≥7.0-brightgreen.svg)](https://snakemake.bitbucket.io)
[![GitHub actions status](https://github.com/koesterlab/snakemake-workflow-sars-cov2/workflows/Tests/badge.svg?branch=master)](https://github.com/koesterlab/snakemake-workflow-sars-cov2/actions?query=branch%3Amaster+workflow%3ATests)
[![Docker Repository on Quay](https://quay.io/repository/uncovar/uncovar/status)](https://quay.io/repository/uncovar/uncovar)

A reproducible and scalable workflow for transparent and robust SARS-CoV-2
variant calling and lineage assignment with comprehensive reporting.

## Usage

This workflow is written with snakemake and its usage is described in the
<details>
<Summary><b>Step 1: Install Snakemake and Snakedeploy</b></Summary>

Snakemake and Snakedeploy are best installed via the [Mamba package manager](https://github.com/mamba-org/mamba)
(a drop-in replacement for conda). If you have neither Conda nor Mamba, Mamba can
be installed via [Mambaforge](https://github.com/conda-forge/miniforge#mambaforge).
For other options see [here](https://github.com/mamba-org/mamba).

Given that Mamba is installed, run

    mamba create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy

to install both Snakemake and Snakedeploy in an isolated environment. For all
following commands ensure that this environment is activated via

    conda activate snakemake
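
Before moving on, it can help to confirm that both tools are visible in the activated environment. The following is a minimal sketch, not part of UnCoVar; the helper name `missing_tools` is hypothetical and only relies on the POSIX `command -v` builtin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every command from the argument list
# that cannot be found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Report which of the two tools still need to be installed or activated.
for tool in $(missing_tools snakemake snakedeploy); do
    echo "not on PATH: $tool"
done
```

If nothing is printed, both commands were found. Otherwise, re-check that the `snakemake` environment is activated.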
</details>

<details>
<Summary><b>Step 2: Clone or Deploy workflow</b></Summary>

First, create an appropriate project working directory on your system and enter it:

    mkdir -p path/to/project-workdir
    cd path/to/project-workdir

In all following steps, we will assume that you are inside of that directory.

Second, either clone or deploy the workflow. Given that Snakemake is installed
and you want to clone the full workflow, you can do so as follows:

    git clone https://github.com/IKIM-Essen/uncovar

Given that Snakemake and Snakedeploy are installed and available (see Step 1),
the workflow can be deployed as follows:

    snakedeploy deploy-workflow https://github.com/IKIM-Essen/uncovar . --tag v0.16.0

Snakedeploy will create two folders `workflow` and `config`. The former contains
the deployment of the UnCoVar workflow as a
[Snakemake module](https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#using-and-combining-pre-exising-workflows),
the latter contains configuration files which will be modified in the next step
in order to configure the workflow to your needs. Later, when executing the workflow,
Snakemake will automatically find the main Snakefile in the workflow subfolder.

</details>

<details>
<Summary><b>Step 3: Configure workflow</b></Summary>

### General settings

#### Config file

To configure this workflow, modify `config/config.yaml` according to your
needs, following the explanations provided in the file. It is especially recommended
to provide a `BED` file with primer coordinates, when the analyzed reads derive
from amplicon-tiled sequencing, so the primers are trimmed appropriately.

#### Sample sheet

The sample sheet contains all samples to be analyzed by UnCoVar.

#### Auto filling

UnCoVar offers the possibility to automatically append paired-end sequenced
samples to the sample sheet. To load your data into the workflow execute

    snakemake --cores all --use-conda update_sample

with the root of the UnCoVar workflow as the working directory. It is recommended
to use the following structure when adding data automatically:

    ├── archive
    ├── incoming
    └── snakemake-workflow-sars-cov2
        ├── data
            └── 2023-12-24

However, this structure is not set in stone and can be adjusted via the
`config/config.yaml` file under `data-handling`. Only the following paths,
relative to the UnCoVar directory, are needed:

- **incoming**: path of incoming data, which is moved to the data directory by
  the preprocessing script. Defaults to `../incoming/`.
- **data**: path to store data within the workflow. Defaults to `data/`. Using
  descriptively named subfolders (e.g. by date) is recommended.
- **archive**: path to which the data and the analysis results are archived.
  Defaults to `../archive/`.

The incoming directory should contain paired-end reads in (compressed) FASTQ
format. UnCoVar automatically copies your data into the data directory and moves
all files from the incoming directory to the archive. After the analysis, all
results are compressed and saved alongside the reads.
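
The default layout described above could be prepared with a few `mkdir` calls. This is only a sketch under assumptions: the helper name `make_layout` is hypothetical, the workflow folder name is taken from the tree shown earlier, and placing the dated folder inside `data` is an interpretation of the subfolder recommendation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the recommended folder layout below a root directory.
# Assumption: the dated subfolder lives inside the workflow's data directory.
make_layout() {
    local root=$1 run_date=$2
    mkdir -p "$root/archive" "$root/incoming" \
             "$root/snakemake-workflow-sars-cov2/data/$run_date"
}

# Example: build the layout in a throwaway directory and list the result.
root=$(mktemp -d)
make_layout "$root" 2023-12-24
find "$root" -type d | sort
```

With this layout in place, the defaults `../incoming/`, `data/` and `../archive/` resolve correctly when Snakemake is started from inside the workflow folder.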

Moreover, the sample sheet is automatically updated with the new files. Please
note that only the part of the filename before the first '\_' character is used
as the sample name within the workflow. The technology and amplicon flag
(**is_amplicon_data**) have to be revisited manually.
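
To preview which sample names would result from a set of files, the naming rule can be imitated in the shell. This is an illustrative sketch, not the code UnCoVar runs; the function name `sample_name` and the file names are hypothetical. How the workflow treats a filename without any underscore is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the part of the file name before the first '_'.
sample_name() {
    local base
    base=$(basename "$1")
    # "%%_*" strips the longest suffix that starts with an underscore.
    printf '%s\n' "${base%%_*}"
}

# Example with made-up file names: both reads of a pair map to the same sample.
for f in incoming/S01_L001_R1_001.fastq.gz incoming/S01_L001_R2_001.fastq.gz; do
    printf '%s -> %s\n' "$f" "$(sample_name "$f")"
done
```

Because everything after the first underscore is dropped, files such as `S01_a_R1.fastq.gz` and `S01_b_R1.fastq.gz` would end up with the same sample name, so underscores are best avoided inside sample identifiers.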

#### Manual filling

Of course, samples to be analyzed can also be added manually to the sample sheet.
For each sample, a new line in `config/pep/samples.csv` with the following
content has to be defined:

- **sample_name**: name or identifier of sample
- **fq1**: path to read 1 in FASTQ format
- **fq2**: path to read 2 in FASTQ format (if paired-end sequencing)
- **date**: sampling date of the sample
- **is_amplicon_data**: indicates whether the data was generated with
  shotgun (0) or amplicon (1) sequencing
- **technology**: indicates the sequencing technology used to generate
the samples (illumina, ont, ion)
- **include_in_high_genome_summary**: indicates if sample should be included in the submission files (1) or not (0)
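
A row with these fields could be appended from the shell as sketched below. Several details are assumptions: the helper name `add_sample` is hypothetical, the column order simply follows the list above, the date format and file paths are made up, and the example writes to a temporary file rather than to `config/pep/samples.csv`. Check the header of your actual sample sheet before adapting it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append one sample row to a sample sheet.
# Assumption: the CSV columns appear in the order of the field list above.
add_sample() {
    local sheet=$1 name=$2 fq1=$3 fq2=$4 date=$5 amplicon=$6 tech=$7 include=$8
    # Write the header first if the sheet is missing or empty.
    if [ ! -s "$sheet" ]; then
        printf '%s\n' 'sample_name,fq1,fq2,date,is_amplicon_data,technology,include_in_high_genome_summary' > "$sheet"
    fi
    printf '%s,%s,%s,%s,%s,%s,%s\n' \
        "$name" "$fq1" "$fq2" "$date" "$amplicon" "$tech" "$include" >> "$sheet"
}

# Example with made-up values, written to a temporary sheet.
sheet=$(mktemp)
add_sample "$sheet" S01 \
    data/2023-12-24/S01_R1.fastq.gz data/2023-12-24/S01_R2.fastq.gz \
    2023-12-24 1 illumina 0
cat "$sheet"
```

The sketch does no quoting or escaping, so it only suits values without commas.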

</details>

<details>
<Summary><b>Step 4: Run workflow</b></Summary>
Given that the workflow has been properly deployed and configured, it can be executed as follows.

To run the workflow while deploying any necessary software via conda (using
the Mamba package manager by default), run Snakemake with

    snakemake --cores all --use-conda

Snakemake will automatically detect the main Snakefile in the workflow subfolder
and execute the workflow module that has been defined by the deployment in step 2.

For further options, e.g. for cluster and cloud execution, see the docs.
</details>

This workflow is written with Snakemake and details and tools are described in the
[Snakemake Workflow Catalog](https://snakemake.github.io/snakemake-workflow-catalog?usage=IKIM-Essen/uncovar).

If you use this workflow in a paper, don't forget to give credit to the
authors by citing the URL of this repository and its DOI (see above).

## Tools, Frameworks and Packages used

This project wouldn't be possible without several open source libraries:

| Tool | Link |
| -------------- | ------------------------------------------------- |
| ABySS | www.doi.org/10.1101/gr.214346.116 |
| Altair | www.doi.org/10.21105/joss.01057 |
| BAMClipper | www.doi.org/10.1038/s41598-017-01703-6 |
| BCFtools | www.doi.org/10.1093/gigascience/giab008 |
| BEDTools | www.doi.org/10.1093/bioinformatics/btq033 |
| Biopython | www.doi.org/10.1093/bioinformatics/btp163 |
| bwa | www.doi.org/10.1093/bioinformatics/btp324 |
| Covariants | www.github.com/hodcroftlab/covariants |
| delly | www.doi.org/10.1093/bioinformatics/bts378 |
| ensembl-vep | www.doi.org/10.1186/s13059-016-0974-4 |
| entrez-direct | www.ncbi.nlm.nih.gov/books/NBK179288 |
| fastp | www.doi.org/10.1093/bioinformatics/bty560 |
| FastQC | www.bioinformatics.babraham.ac.uk/projects/fastqc |
| fgbio | www.github.com/fulcrum-genomics/fgbio |
| FreeBayes | www.arxiv.org/abs/1207.3907 |
| intervaltree | www.github.com/chaimleib/intervaltree |
| Jupyter | www.jupyter.org |
| kallisto | www.doi.org/10.1038/nbt.3519 |
| Kraken2 | www.doi.org/10.1186/s13059-019-1891-0 |
| Krona | www.doi.org/10.1186/1471-2105-12-385 |
| mason | publications.imp.fu-berlin.de/962 |
| MEGAHIT | www.doi.org/10.1093/bioinformatics/btv033 |
| Minimap2 | www.doi.org/10.1093/bioinformatics/bty191 |
| MultiQC | www.doi.org/10.1093/bioinformatics/btw354 |
| pandas | pandas.pydata.org |
| Picard | broadinstitute.github.io/picard |
| PySAM | www.doi.org/10.11578/dc.20190903.1 |
| QUAST | www.doi.org/10.1093/bioinformatics/btt086 |
| RaGOO | www.doi.org/10.1186/s13059-019-1829-6 |
| ruamel.yaml | www.sourceforge.net/projects/ruamel-yaml |
| Rust-Bio-Tools | www.github.com/rust-bio/rust-bio-tools |
| SAMtools | www.doi.org/10.1093/bioinformatics/btp352 |
| Snakemake | www.doi.org/10.12688/f1000research.29032.1 |
| sourmash | www.doi.org/10.21105/joss.00027 |
| SPAdes | www.doi.org/10.1089/cmb.2012.0021 |
| SVN | www.doi.org/10.1142/s0219720005001028 |
| Tabix | www.doi.org/10.1093/bioinformatics/btq671 |
| Trinity | www.doi.org/10.1038/nprot.2013.084 |
| Varlociraptor | www.doi.org/10.1186/s13059-020-01993-6 |
| Vega-Lite | www.doi.org/10.1109/TVCG.2016.2599030 |
| Velvet | www.doi.org/10.1101/gr.074492.107 |
| vembrane | www.github.com/vembrane/vembrane |
