Home
- add link to MURI website
- add link to I'm using new primers
- add picture!
- add cutadapt link
- link to how to name files
- turn flag descriptions into table
- link to dada2 page
- add link to run the pipeline
- update Table of Contents
Welcome to the metabarcoding_QAQC_pipeline wiki!
This pipeline was created for the MURI project to process, and run preliminary analysis on, eDNA metabarcoding data produced on an Illumina MiSeq. The pipeline can also be used to process metabarcoding data not affiliated with the project, though it will need some tweaking. Check out the guide to using this pipeline with primers other than MiFish, Dloop, and C16/C18.
Dependencies
Preparing Your Data
Docker/Singularity Image
We start by trimming off primers, barcodes, and illumina adapters using the Cutadapt software. The primer sequences are read in from the primer_data.csv
file. For each file, we detect which primer was used from the file name (see further documentation here), then trim the primers with the following Cutadapt command:
${CUTADAPT} -g "${MFU_F}" \
-G "${MFU_R}" \
-o ../for_dada2/${R1} \
-p ../for_dada2/${R2} \
--discard-untrimmed \
-j 0 \
"${R1}" "${R2}" 1> "../cutadapt_reports/${FILE_NAME}_trim_report.txt"
| Flag | Description |
| --- | --- |
| `-g` | sequence of the forward primer |
| `-G` | sequence of the reverse primer |
| `-o` | output location of the forward (R1) fastq file |
| `-p` | output location of the reverse (R2) fastq file |
| `--discard-untrimmed` | discard read pairs in which the primer is not found |
| `-j 0` | run in parallel, auto-detecting the number of available cores |
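The file-name-based primer detection mentioned above can be sketched roughly as follows. The actual naming convention is documented elsewhere in this wiki; the patterns below (the primer name appearing somewhere in the file name) and the `detect_primer` helper are assumptions for illustration only.

```shell
# Hypothetical helper: infer the primer set from a fastq file name.
# Patterns are assumptions; consult the file-naming documentation for the
# real convention used by the pipeline.
detect_primer() {
  case "$1" in
    *[Mm]i[Ff]ish*) echo "MiFish" ;;
    *[Dd]loop*)     echo "Dloop" ;;
    *C16*)          echo "C16" ;;
    *C18*)          echo "C18" ;;
    *)              echo "unknown" ;;
  esac
}

detect_primer "MURI304_MiFish_S1_R1_001.fastq.gz"   # prints "MiFish"
```

A wrapper script could call a helper like this once per file pair and export the matching primer sequences (e.g. `MFU_F`/`MFU_R`) before invoking Cutadapt.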
We use the DADA2 software to assign taxonomy to amplicon sequences. However, some processing is necessary before taxonomy can be assigned. A brief overview of the process is as follows:
- Determine quality trimming length
- Filter and trim for quality and length
- Dereplicate
- Learn error rates
- Sample inference
- Merge paired reads
- Remove chimeras
- Filter merged reads by size
- Assign Taxonomy
- Create and Save Output Files
For a more detailed explanation of these steps, check out this page dedicated to this script.
This is a Quarto file that takes the output files from DADA2 and creates plots and statistics on read retention, read lengths, quality, and more.
Before running this metabarcoding pipeline, it's important to make sure you have the most up-to-date versions of the scripts and (more importantly) the databases of known ASVs. To do this, pull the most recent version of the GitHub repo*:
# run from the cloned github repository
git pull
*Remember that Git/GitHub performs version control. What this means for you is that if you change these files on your local machine and then run git pull
again, you will receive a warning or error, because your files have changes that have not been committed to Git. The easiest way around this is to not modify the files you pull from this repo.
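If you have already edited a pulled file and simply want to throw the local change away so the pull can proceed, `git restore` does the job (Git 2.23+). The sketch below demonstrates it in a throwaway repository; in practice you would run only the `git restore` line from your clone.

```shell
# Sketch: discarding an uncommitted local edit so `git pull` can proceed.
# The throwaway repo, file name, and commit identity below are for
# illustration only.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "original" > primer_data.csv
git add primer_data.csv
git -c user.email=you@example.com -c user.name=you commit -qm "add file"
echo "local edit" >> primer_data.csv    # an uncommitted change that would block a pull
git restore primer_data.csv             # discard the local edit
cat primer_data.csv                     # back to "original"
```

If you want to keep the local edit instead of discarding it, `git stash` before pulling and `git stash pop` afterwards is the usual alternative.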
Below is the file system that this pipeline assumes:
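The diagram itself is not reproduced here, but the directory names that appear in the commands on this page suggest a layout along the following lines. Treat the exact set of subdirectories as an assumption; `config.sh` is the authoritative source.

```shell
# Sketch of the assumed muri_metabarcoding layout, built in a temp
# directory purely for illustration. Directory names are inferred from
# the commands on this page (scripts, metadata/known_hashes, raw_fastq,
# for_dada2, cutadapt_reports).
base=$(mktemp -d)/muri_metabarcoding
mkdir -p "$base"/scripts \
         "$base"/metadata/known_hashes \
         "$base"/raw_fastq \
         "$base"/for_dada2 \
         "$base"/cutadapt_reports
ls "$base"
```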
To create this filesystem on your computer, run the following line from the repo directory:
sh config.sh {path where to create muri_metabarcoding}
This will create a directory called muri_metabarcoding in the given directory, and the filesystem will be created inside that directory. For example, if I wanted to create these files on my desktop, I would run:
sh config.sh ~/Desktop/
The filesystem is now found at ~/Desktop/muri_metabarcoding.
If you already have the filesystem on your system, the following commands will copy the updated files from the GitHub repo into the filesystem:
cp ./bin/* {pathway to muri_metabarcoding}/scripts
cp -r ./metadata/* {pathway to muri_metabarcoding}/metadata
For example:
cp ./bin/* ~/Desktop/muri_metabarcoding/scripts
cp -r ./metadata/* ~/Desktop/muri_metabarcoding/metadata
Copy your raw fastqs into the raw_fastq directory. For example:
cp /path/to/fastqs/* /path/to/raw_fastqs
Then, copy your MiSeq Sample Sheet (which is produced after Illumina sequencing) into the metadata directory.
cp /path/to/SampleSheetUsed.csv /path/to/muri_metabarcoding/metadata
The metabarcoding wrapper takes 2 inputs:
- path to your file system (named muri_metabarcoding)
- run name (can be any name)
To run this script, use the following command:
bash metabarcoding_wrapper.sh {pathway to muri_metabarcoding} {run name}
Example:
bash metabarcoding_wrapper.sh ~/Desktop/muri_metabarcoding/ MURI304
To keep the ASV databases updated, push your updated databases to the github repository.
If you're not an admin for the repo, skip this step!
# RUN FROM CLONED REPOSITORY
# copy the updated ASV databases into the repository
cp {pathway to muri_metabarcoding}/metadata/known_hashes/* ./data/known_hashes/
# commit and push updated databases
git add ./data/known_hashes/
git commit -m "update known_hashes $(date +"%T")"
git push