This pipeline is used to generate MAGs from different individual assemblies and their respective reads for the VanishingGlaciers project. The pipeline is based on the Snakemake workflow management system and is designed to be run on a high-performance computing cluster.
- The pipeline starts from individual assemblies (`fasta` files) and their respective reads (`mg.r{1,2}.preprocessed.fq` files).
- To reduce computational time, the reads are subsampled to 10% of the reads per sample, and contigs shorter than 1.5 kbp are removed.
- The subsampled reads are then mapped against the assemblies using BWA.
- The mapped reads are then used to bin the contigs using MetaBAT2, CONCOCT and MetaBinner.
- The bins are then optimized using DAS_Tool.
- CheckM2 is used to estimate bin quality, and only bins that are at least 50% complete are kept.
- MDMCleaner reduces contamination from those bins.
- Next, bins are dereplicated with dRep to form MAGs; only bins with >70% completeness and <10% contamination are kept.
- Read mapping against all the MAGs is done using BWA.
- GTDB-Tk is used for taxonomic classification.
- MGThermometer is used to measure the optimal growth temperature based on the relative abundance of FIVYWREL amino acids (a rough sketch of this calculation is shown after this list).
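As an illustration only (not the exact MGThermometer implementation), the FIVYWREL fraction of a MAG can be computed from its predicted proteins; the file name `proteins.faa` below is a placeholder:

```bash
# Illustration: fraction of F, I, V, Y, W, R, E and L residues in a protein FASTA.
# "proteins.faa" is a placeholder for a MAG's predicted proteins.
awk '!/^>/ {
    total += length($0)                      # count all residues on the line
    fivywrel += gsub(/[FIVYWREL]/, "", $0)   # count the FIVYWREL residues
}
END { printf "FIVYWREL fraction: %.4f\n", fivywrel / total }' proteins.faa
```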
Install miniconda3:

```bash
# install miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod u+x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh # follow the instructions
```
Get the repository, including its sub-modules, and switch to the `busi` branch:

```bash
git clone --recurse-submodules [email protected]:michoug/SnakemakeBinning.git
cd SnakemakeBinning
git checkout busi
```
Create the main `snakemake` environment:

```bash
# create the conda environment
conda env create -f requirements.yaml -n "snakemake"
```
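Before running the pipeline, activate the environment (assuming it keeps the name `snakemake` given above):

```bash
conda activate snakemake
```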
Prepare the inputs and configuration:

- Place your preprocessed/trimmed reads (e.g. `sample_r1.fastq.gz` and `sample_r2.fastq.gz` files) in a `reads` folder.
- Place the individual assemblies (e.g. `sample.fa`) into an `assembly` folder.
- Modify the `config/config.yaml` file to change the different paths and, if needed, the different options.
- Modify the `config/all_samples.txt` file to include your samples (see the example layout below).
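For illustration, the expected layout could look as follows; the sample names are placeholders, and the one-sample-per-line format of `all_samples.txt` is an assumption to check against the repository's example file:

```bash
mkdir -p reads assembly
# reads/sampleA_r1.fastq.gz   reads/sampleA_r2.fastq.gz   (placeholder names)
# assembly/sampleA.fa
# config/all_samples.txt: one sample name per line (assumed format)
printf 'sampleA\nsampleB\n' > config/all_samples.txt
```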
Run the pipeline:

```bash
snakemake -s workflow/Snakefile --configfile config/config.yaml --cores 28 --use-conda -rp
```
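Optionally, a dry run (Snakemake's `-n` flag) shows which jobs would be executed without running anything:

```bash
# dry run: print the planned jobs and their shell commands, execute nothing
snakemake -s workflow/Snakefile --configfile config/config.yaml --cores 28 --use-conda -rpn
```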
To run the pipeline on a SLURM cluster, adapt the submission files below. This part was mainly taken from the `nomis_pipeline` by @susheelbhanu.
- Modify the `slurm.yaml` file by checking the `partition`, `qos` and `account` settings, which heavily depend on your system.
- Modify the `sbatch.sh` file by checking the `#SBATCH -p`, `#SBATCH --qos=` and `#SBATCH -A` options, which heavily depend on your system (see the example below).
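For example, the relevant header lines of `config/sbatch.sh` have the following shape; the values are placeholders that must be replaced with your cluster's partition, QOS and account:

```bash
#SBATCH -p batch        # partition (placeholder value)
#SBATCH --qos=normal    # quality of service (placeholder value)
#SBATCH -A myproject    # account (placeholder value)
```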
Then submit the job:

```bash
sbatch config/sbatch.sh
```