# UnCoVar

<picture>
  <source media="(prefers-color-scheme: dark)" srcset="https://github.com/IKIM-Essen/uncovar/assets/77535027/8e17c6fc-ff7a-4c25-afc9-7888036d693e" width="40%">
  <source media="(prefers-color-scheme: light)" srcset="https://github.com/IKIM-Essen/uncovar/assets/77535027/c99f5a94-749b-422e-b319-1e3700d40a8e" width="40%">
  <img alt="UnCoVar Logo dark/light">
</picture>

<h1>
Workflow for Transparent and Robust Virus Variant Calling, Genome Reconstruction
and Lineage Assignment
</h1>

## SARS-CoV-2 Variant Calling and Lineage Assignment

[![Snakemake](https://img.shields.io/badge/snakemake-≥7.0-brightgreen.svg)](https://snakemake.bitbucket.io)
[![GitHub actions status](https://github.com/koesterlab/snakemake-workflow-sars-cov2/workflows/Tests/badge.svg?branch=master)](https://github.com/koesterlab/snakemake-workflow-sars-cov2/actions?query=branch%3Amaster+workflow%3ATests)
[![Docker Repository on Quay](https://quay.io/repository/uncovar/uncovar/status)](https://quay.io/repository/uncovar/uncovar)

A reproducible and scalable workflow for transparent and robust SARS-CoV-2
variant calling and lineage assignment with comprehensive reporting.

## Usage

<details>
<summary><b>Step 1: Install Snakemake and Snakedeploy</b></summary>

Snakemake and Snakedeploy are best installed via the [Mamba package manager](https://github.com/mamba-org/mamba)
(a drop-in replacement for Conda). If you have neither Conda nor Mamba, both can
be installed via [Mambaforge](https://github.com/conda-forge/miniforge#mambaforge).
For other options, see [here](https://github.com/mamba-org/mamba).

Given that Mamba is installed, run

```bash
mamba create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy
```

to install both Snakemake and Snakedeploy in an isolated environment. For all
following commands, ensure that this environment is activated via

```bash
conda activate snakemake
```

</details>

<details>
<summary><b>Step 2: Clone or Deploy Workflow</b></summary>

First, create an appropriate project working directory on your system and enter it:

```bash
mkdir -p path/to/project-workdir
cd path/to/project-workdir
```

In all following steps, we will assume that you are inside of that directory.
You can then either clone the full workflow or deploy it via Snakedeploy.

To clone the full workflow, run:

```bash
git clone https://github.com/IKIM-Essen/uncovar
```

Alternatively, given that Snakedeploy is installed and available (see Step 1),
the workflow can be deployed as follows:

```bash
snakedeploy deploy-workflow https://github.com/IKIM-Essen/uncovar . --tag v0.16.0
```

Snakedeploy will create two folders, `workflow` and `config`. The former contains
the deployment of the UnCoVar workflow as a
[Snakemake module](https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#using-and-combining-pre-exising-workflows),
while the latter contains the configuration files that will be modified in the
next step in order to configure the workflow to your needs. Later, when executing
the workflow, Snakemake will automatically find the main Snakefile in the
`workflow` subfolder.

</details>

<details>
<summary><b>Step 3: Configure Workflow</b></summary>

### General settings

#### Config file

To configure this workflow, modify `config/config.yaml` according to your needs,
following the explanations provided in the file. When the analyzed reads derive
from amplicon-tiled sequencing, it is especially recommended to provide a `BED`
file with the primer coordinates, so that the primers are trimmed appropriately.

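For illustration, such a `BED` file is tab-separated and lists one primer per
line with reference name, start, end, and primer name; the coordinates and
primer names below are hypothetical, not part of the workflow:

```
MN908947.3	30	54	primer_1_LEFT
MN908947.3	385	410	primer_1_RIGHT
```
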
#### Sample sheet

The sample sheet contains all samples to be analyzed by UnCoVar.

#### Auto filling

UnCoVar offers the possibility to automatically append paired-end sequenced
samples to the sample sheet. To load your data into the workflow, execute

```bash
snakemake --cores all --use-conda update_sample
```

with the root of the UnCoVar directory as the working directory. It is
recommended to use the following structure when adding data automatically:

```
├── archive
├── incoming
└── snakemake-workflow-sars-cov2
    └── data
        └── 2023-12-24
```

However, this structure is not set in stone and can be adjusted via the
`config/config.yaml` file under `data-handling`. Only the paths to the
corresponding folders, relative to the UnCoVar directory, are needed:

- **incoming**: path of the incoming data, which is moved to the data directory
  by the preprocessing script. Defaults to `../incoming/`.
- **data**: path to store data within the workflow. Defaults to `data/`. It is
  recommended to use properly named subfolders (e.g. by date).
- **archive**: path to which data and analysis results are archived.
  Defaults to `../archive/`.

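Sketched in `config/config.yaml`, the defaults above would look roughly as
follows (the exact key names and layout may differ; follow the comments in the
shipped config file):

```yaml
data-handling:
  # moved into the data directory by the preprocessing script
  incoming: ../incoming/
  # use properly named subfolders, e.g. data/2023-12-24
  data: data/
  # data and analysis results are archived here
  archive: ../archive/
```
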
The incoming directory should contain paired-end reads in (compressed) FASTQ
format. UnCoVar automatically copies your data into the data directory and moves
all files from the incoming directory to the archive. After the analysis, all
results are compressed and saved alongside the reads.

Moreover, the sample sheet is automatically updated with the new files. Please
note that only the part of the filename before the first `_` character is used
as the sample name within the workflow. The technology and the amplicon flag
(**is_amplicon_data**) have to be revisited manually.

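The naming rule can be illustrated with a short sketch (the helper and the
filename are illustrative only, not part of the workflow):

```python
from pathlib import Path

def sample_name(fastq_path: str) -> str:
    # Everything before the first "_" of the filename becomes the sample name.
    return Path(fastq_path).name.split("_")[0]

print(sample_name("data/2023-12-24/sampleA_S1_L001_R1_001.fastq.gz"))  # sampleA
```
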
#### Manual filling

Of course, samples to be analyzed can also be added to the sample sheet manually.
For each sample, a new line with the following content has to be added to
`config/pep/samples.csv`:

- **sample_name**: name or identifier of the sample
- **fq1**: path to read 1 in FASTQ format
- **fq2**: path to read 2 in FASTQ format (for paired-end sequencing)
- **date**: sampling date of the sample
- **is_amplicon_data**: indicates whether the data was generated with shotgun (0)
  or amplicon (1) sequencing
- **technology**: indicates the sequencing technology used to generate the
  samples (illumina, ont, ion)
- **include_in_high_genome_summary**: indicates whether the sample should be
  included in the submission files (1) or not (0)

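For example, a paired-end Illumina amplicon sample could be entered as follows
(filenames and values are hypothetical; check the shipped
`config/pep/samples.csv` for the exact header):

```
sample_name,fq1,fq2,date,is_amplicon_data,technology,include_in_high_genome_summary
sampleA,data/2023-12-24/sampleA_R1.fastq.gz,data/2023-12-24/sampleA_R2.fastq.gz,2023-12-24,1,illumina,1
```
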
</details>

<details>
<summary><b>Step 4: Run Workflow</b></summary>

Given that the workflow has been properly deployed and configured, it can be
executed as follows.

For running the workflow while deploying any necessary software via conda (using
the Mamba package manager by default), run Snakemake with

```bash
snakemake --cores all --use-conda
```

Snakemake will automatically detect the main Snakefile in the `workflow` subfolder
and execute the workflow module that has been defined by the deployment in Step 2.

For further options, e.g. for cluster and cloud execution, see the
[Snakemake documentation](https://snakemake.readthedocs.io).

</details>

This workflow is written with Snakemake; details and tools are described in the
[Snakemake Workflow Catalog](https://snakemake.github.io/snakemake-workflow-catalog?usage=IKIM-Essen/uncovar).

If you use this workflow in a paper, don't forget to give credit to the
authors by citing the URL of this repository and its DOI (see above).

## Tools, Frameworks and Packages used

This project wouldn't be possible without several open-source libraries:

| Tool           | Link                                              |
| -------------- | ------------------------------------------------- |
| ABySS          | www.doi.org/10.1101/gr.214346.116                 |
| Altair         | www.doi.org/10.21105/joss.01057                   |
| BAMClipper     | www.doi.org/10.1038/s41598-017-01703-6            |
| BCFtools       | www.doi.org/10.1093/gigascience/giab008           |
| BEDTools       | www.doi.org/10.1093/bioinformatics/btq033         |
| Biopython      | www.doi.org/10.1093/bioinformatics/btp163         |
| bwa            | www.doi.org/10.1093/bioinformatics/btp324         |
| Covariants     | www.github.com/hodcroftlab/covariants             |
| delly          | www.doi.org/10.1093/bioinformatics/bts378         |
| ensembl-vep    | www.doi.org/10.1186/s13059-016-0974-4             |
| entrez-direct  | www.ncbi.nlm.nih.gov/books/NBK179288              |
| fastp          | www.doi.org/10.1093/bioinformatics/bty560         |
| FastQC         | www.bioinformatics.babraham.ac.uk/projects/fastqc |
| fgbio          | www.github.com/fulcrum-genomics/fgbio             |
| FreeBayes      | www.arxiv.org/abs/1207.3907                       |
| intervaltree   | www.github.com/chaimleib/intervaltree             |
| Jupyter        | www.jupyter.org                                   |
| kallisto       | www.doi.org/10.1038/nbt.3519                      |
| Kraken2        | www.doi.org/10.1186/s13059-019-1891-0             |
| Krona          | www.doi.org/10.1186/1471-2105-12-385              |
| mason          | publications.imp.fu-berlin.de/962                 |
| MEGAHIT        | www.doi.org/10.1093/bioinformatics/btv033         |
| Minimap2       | www.doi.org/10.1093/bioinformatics/bty191         |
| MultiQC        | www.doi.org/10.1093/bioinformatics/btw354         |
| pandas         | pandas.pydata.org                                 |
| Picard         | broadinstitute.github.io/picard                   |
| PySAM          | www.doi.org/10.11578/dc.20190903.1                |
| QUAST          | www.doi.org/10.1093/bioinformatics/btt086         |
| RaGOO          | www.doi.org/10.1186/s13059-019-1829-6             |
| ruamel.yaml    | www.sourceforge.net/projects/ruamel-yaml          |
| Rust-Bio-Tools | www.github.com/rust-bio/rust-bio-tools            |
| SAMtools       | www.doi.org/10.1093/bioinformatics/btp352         |
| Snakemake      | www.doi.org/10.12688/f1000research.29032.1        |
| sourmash       | www.doi.org/10.21105/joss.00027                   |
| SPAdes         | www.doi.org/10.1089/cmb.2012.0021                 |
| SVN            | www.doi.org/10.1142/s0219720005001028             |
| Tabix          | www.doi.org/10.1093/bioinformatics/btq671         |
| Trinity        | www.doi.org/10.1038/nprot.2013.084                |
| Varlociraptor  | www.doi.org/10.1186/s13059-020-01993-6            |
| Vega-Lite      | www.doi.org/10.1109/TVCG.2016.2599030             |
| Velvet         | www.doi.org/10.1101/gr.074492.107                 |
| vembrane       | www.github.com/vembrane/vembrane                  |