Home
- add link to MURI website
- add link to I'm using new primers
- add picture!
- add cutadapt link
- link to how to name files
- turn flag descriptions into table
- link to dada2 page
- add link to run the pipeline
- update Table of Contents
Welcome to the metabarcoding_QAQC_pipeline wiki!
This pipeline was created for the MURI project to process, and run preliminary analysis on, eDNA metabarcoding data produced on an Illumina MiSeq. The pipeline can also be used to process metabarcoding data not affiliated with the project, though it will need some tweaking. Check out the guide to using this pipeline with primers other than MiFish, Dloop, and C16/C18.
Dependencies
Preparing Your Data
Docker/Singularity Image
We start by trimming off primers, barcodes, and illumina adapters using the Cutadapt software. The primer sequences are read in from the primer_data.csv
file. For each file, we detect which primer was used from the file name (see further documentation here), then trim the primers with the following Cutadapt command:
${CUTADAPT} -g "${MFU_F}" \
-G "${MFU_R}" \
-o ../for_dada2/${R1} \
-p ../for_dada2/${R2} \
--discard-untrimmed \
-j 0 \
"${R1}" "${R2}" 1> "../cutadapt_reports/${FILE_NAME}_trim_report.txt"
| Flag | Description |
| --- | --- |
| `-g` | sequence of the forward primer |
| `-G` | sequence of the reverse primer |
| `-o` | output location of the forward (R1) fastq file |
| `-p` | output location of the reverse (R2) fastq file |
| `--discard-untrimmed` | discard read pairs in which the primer is not found |
| `-j 0` | run in parallel, auto-detecting the number of available cores |
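The file-name-based primer detection mentioned above can be sketched roughly as follows. The actual naming convention is documented elsewhere in this wiki; the patterns below (the primer name appearing somewhere in the file name) and the `detect_primer` helper are assumptions for illustration only.

```shell
# Hypothetical helper: infer the primer set from a fastq file name.
# Patterns are assumptions; consult the file-naming documentation for the
# real convention used by the pipeline.
detect_primer() {
  case "$1" in
    *[Mm]i[Ff]ish*) echo "MiFish" ;;
    *[Dd]loop*)     echo "Dloop" ;;
    *C16*)          echo "C16" ;;
    *C18*)          echo "C18" ;;
    *)              echo "unknown" ;;
  esac
}

detect_primer "MURI304_MiFish_S1_R1_001.fastq.gz"   # prints "MiFish"
```

A wrapper script could call a helper like this once per file pair and export the matching primer sequences (e.g. `MFU_F`/`MFU_R`) before invoking Cutadapt.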
We use the DADA2 software to assign taxonomy to amplicon sequences. However, some processing is necessary before taxonomy can be assigned. A brief overview of the process is as follows:
- Determine quality trimming length
- Filter and trim for quality and length
- Dereplicate
- Learn error rates
- Sample inference
- Merge paired reads
- Remove chimeras
- Filter merged reads by size
- Assign Taxonomy
- Create and Save Output Files
For a more detailed explanation of these steps, check out this page dedicated to this script.
This is a Quarto file that takes the output files from DADA2 and creates plots and statistics on read retention, read lengths, quality, and more.
Before running this metabarcoding pipeline, it's important to make sure you have the most up-to-date versions of the scripts and (more importantly) the databases of known ASVs. To do this, pull the most recent version of the GitHub repo*:
# run from the cloned github repository
git pull
*Remember that Git/GitHub performs version control. What this means for you is that if you change these files on your local machine and then run git pull
again, you will receive a warning or error, because your files have changes that have not been committed to Git. The easiest way around this is to not modify the files you pull from this repo.
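If you have already edited a pulled file and simply want to throw the local change away so the pull can proceed, `git restore` does the job (Git 2.23+). The sketch below demonstrates it in a throwaway repository; in practice you would run only the `git restore` line from your clone.

```shell
# Sketch: discarding an uncommitted local edit so `git pull` can proceed.
# The throwaway repo, file name, and commit identity below are for
# illustration only.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "original" > primer_data.csv
git add primer_data.csv
git -c user.email=you@example.com -c user.name=you commit -qm "add file"
echo "local edit" >> primer_data.csv    # an uncommitted change that would block a pull
git restore primer_data.csv             # discard the local edit
cat primer_data.csv                     # back to "original"
```

If you want to keep the local edit instead of discarding it, `git stash` before pulling and `git stash pop` afterwards is the usual alternative.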
Below is the file system that this pipeline assumes:
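The diagram itself is not reproduced here, but the directory names that appear in the commands on this page suggest a layout along the following lines. Treat the exact set of subdirectories as an assumption; `config.sh` is the authoritative source.

```shell
# Sketch of the assumed muri_metabarcoding layout, built in a temp
# directory purely for illustration. Directory names are inferred from
# the commands on this page (scripts, metadata/known_hashes, raw_fastq,
# for_dada2, cutadapt_reports).
base=$(mktemp -d)/muri_metabarcoding
mkdir -p "$base"/scripts \
         "$base"/metadata/known_hashes \
         "$base"/raw_fastq \
         "$base"/for_dada2 \
         "$base"/cutadapt_reports
ls "$base"
```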
To create this filesystem on your computer, run the following line from the repo directory:
sh config.sh {path where to create muri_metabarcoding}
This will create a directory called muri_metabarcoding in the given directory, and the filesystem will be created inside that directory. For example, if I wanted to create these files on my desktop, I would run:
sh config.sh ~/Desktop/
The filesystem is now found at ~/Desktop/muri_metabarcoding.
If you already have the filesystem on your system, the following commands will copy the updated files from the GitHub repo into the filesystem:
cp ./bin/* {pathway to muri_metabarcoding}/scripts
cp -r ./metadata/* {pathway to muri_metabarcoding}/metadata
For example:
cp ./bin/* ~/Desktop/muri_metabarcoding/scripts
cp -r ./metadata/* ~/Desktop/muri_metabarcoding/metadata
Copy your raw fastqs into the raw_fastq directory. For example:
cp /path/to/fastqs/* /path/to/raw_fastqs
Then, copy your MiSeq Sample Sheet (which is produced after Illumina sequencing) into the metadata directory.
cp /path/to/SampleSheetUsed.csv /path/to/muri_metabarcoding/metadata
The metabarcoding wrapper takes 2 inputs:
- path to your file system (named muri_metabarcoding)
- run name (can be any name)
To run this script, use the following command:
bash metabarcoding_wrapper.sh {pathway to muri_metabarcoding} {run name}
Example:
bash metabarcoding_wrapper.sh ~/Desktop/muri_metabarcoding/ MURI304
To keep the ASV databases updated, push your updated databases to the github repository.
If you're not an admin for the repo, skip this step!
# RUN FROM CLONED REPOSITORY
# copy the updated ASV databases into the repository
cp {pathway to muri_metabarcoding}/metadata/known_hashes/* ./data/known_hashes/
# commit and push updated databases
git add ./data/known_hashes/
git commit -m "update known_hashes $(date +"%T")"
git push