ReefPipe is a command-line workflow tool for analyzing large amounts of metabarcoding data. It focuses on COI and ITS metabarcoding data and automates parallel Amplicon Sequence Variant (ASV) inference and multi-reference taxonomic classification.
To use ReefPipe, follow the setup steps below. If you encounter installation issues, consider using the Docker image, available at ReefPipe-Docker. It bundles all required software with dependencies preconfigured, which avoids individual installations and potential compatibility problems.
Make sure you have R version 4.1.3 or higher installed on your system. If you don't have it already, you can download and install it from the official R project website.
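The R requirement can also be checked from a script. The following is a minimal sketch, assuming that R is on the PATH and that `R --version` prints a banner containing `version X.Y.Z`; the function names are illustrative and not part of ReefPipe.

```python
import re
import subprocess

MIN_R_VERSION = (4, 1, 3)  # minimum stated in the setup instructions


def parse_r_version(text):
    """Extract (major, minor, patch) from text such as 'R version 4.1.3 (...)'."""
    match = re.search(r"version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def installed_r_version():
    """Return the installed R version, or None if R cannot be run or parsed."""
    try:
        result = subprocess.run(
            ["R", "--version"], capture_output=True, text=True, check=False
        )
    except OSError:  # R is not on the PATH
        return None
    # Search both streams in case the banner is not written to stdout.
    return parse_r_version(result.stdout + result.stderr)


if __name__ == "__main__":
    version = installed_r_version()
    if version is None:
        print("R was not found on the PATH, or its version could not be read.")
    elif version < MIN_R_VERSION:
        print("R {}.{}.{} is older than the required 4.1.3.".format(*version))
    else:
        print("R {}.{}.{} meets the requirement.".format(*version))
```

Versions are compared as integer tuples rather than strings, so that 4.10.0 correctly ranks above 4.2.0.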
Check that Python version 3.7 or higher is installed on your system by running
python --version
If you don't have Python installed already, you can download and install it from the official Python website.
If you are on a UNIX system and get an error similar to
Command ‘python’ not found, did you mean:
command ‘python3’ from python3
command ‘python’ from python-is-python3
you can add 'python' as an alias for 'python3' – on GNU/Linux by running
echo "alias python='python3'" >> ~/.bashrc
or if you're on Mac by running
echo "alias python='python3'" >> ~/.bash_profile
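and then restart your terminal. On macOS 10.15 (Catalina) and later the default shell is zsh, so if that is the shell you use, append the alias to ~/.zshrc instead.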
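Once `python` resolves to an interpreter, the version requirement can be confirmed from inside Python itself. This is a small sketch rather than part of ReefPipe; the function name `python_is_supported` is hypothetical.

```python
import sys

MIN_PYTHON = (3, 7)  # minimum stated in the setup instructions


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when (major, minor) is at least the required minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    # str.format is used instead of f-strings so that the script still runs,
    # and reports the problem, on interpreters older than 3.6.
    if python_is_supported():
        print("Python {} meets the 3.7+ requirement.".format(found))
    else:
        print("Python {} is too old; 3.7 or higher is needed.".format(found))
```

Saving this as, for example, `check_python.py` and running `python check_python.py` shows which interpreter the `python` command actually points to.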
R packages are installed automatically by the ReefPipe scripts, so no manual installation is needed. However, downloading the DADA2 [1] package may fail on some systems, in which case you may need to install additional software specific to your operating system. We cannot provide troubleshooting support for every operating system, so you will need to resolve such issues yourself.
After installing Python, open your terminal and run the following command to install the necessary modules:
pip install boldigger-cline cutadapt argparse biopython tqdm pandas
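To confirm that the modules are visible to the interpreter you intend to use, a check along these lines may help. It is a sketch under assumptions: the import names are inferred from the PyPI names (`Bio` for biopython; `boldigger_cline` for boldigger-cline is an assumption), and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# PyPI distribution name -> module name used in an import statement.
# "boldigger_cline" is an assumed import name, not confirmed by the README.
REQUIRED = {
    "boldigger-cline": "boldigger_cline",
    "cutadapt": "cutadapt",
    "argparse": "argparse",  # part of the standard library since Python 3.2
    "biopython": "Bio",
    "tqdm": "tqdm",
    "pandas": "pandas",
}


def missing_modules(modules):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED.values())
    if missing:
        print("Not importable: " + ", ".join(missing))
    else:
        print("All required modules can be imported.")
```

`find_spec` only locates a module without importing it, so the check is fast and has no side effects. If a module is reported missing right after a successful `pip install`, `pip` and `python` probably belong to different installations; running `python -m pip install ...` avoids that mismatch.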
On Linux distributions that use the dnf package manager, such as Fedora, you can install Clustal Omega [2] by running the following command in your terminal:
sudo dnf install clustal-omega
On Linux distributions that use the apt package manager, such as Ubuntu, you can install Clustal Omega by running the following command in your terminal:
sudo apt install clustalo
These examples cover two popular Linux distributions. If you use a different distribution, you will need to find the appropriate command yourself, since the package manager and installation method vary between distributions.
For Windows and macOS users, a Clustal Omega binary has been incorporated into the ReefPipe script [1].
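On Linux, where Clustal Omega comes from the package manager, you can check that the executable is reachable before running the pipeline. The sketch below assumes the executable is named `clustalo`, as Clustal Omega packages commonly install it; how ReefPipe itself locates the program is not covered here.

```python
import shutil


def find_executable(name="clustalo"):
    """Return the full path of an executable on the PATH, or None if absent."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("clustalo was not found on the PATH.")
    else:
        print("clustalo found at " + path)
```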
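If you have Git installed on your system, you can clone the ReefPipe code from the GitHub repository. The command below uses SSH, which requires an SSH key registered with your GitHub account; without one, use the HTTPS URL https://github.com/hjarnek/ReefPipe.git instead: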
git clone git@github.com:hjarnek/ReefPipe.git
If you don't have Git installed on your system, you can download the ReefPipe code as a ZIP file by following these steps:
- Visit the ReefPipe GitHub repository at https://github.com/hjarnek/ReefPipe.
- Click on the green "Code" button.
- Select "Download ZIP" from the dropdown menu.
- Once the ZIP file is downloaded, extract its contents to a location of your choice on your system.
For detailed instructions on how to use ReefPipe, refer to the documentation.
The following references are relevant to ReefPipe:
- Buchner D, Leese F (2020) BOLDigger – a Python package to identify and organise sequences with the Barcode of Life Data systems. Metabarcoding and Metagenomics 4: e53535. https://doi.org/10.3897/mbmg.4.53535
- [1] Callahan, B. (n.d.). DADA2 Pipeline Tutorial (1.16). https://benjjneb.github.io/dada2/tutorial.html
- Cock PA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B and de Hoon MJL (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 25, 1422-1423. https://doi.org/10.1093/bioinformatics/btp163
- Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17(1), 10-12. https://doi.org/10.14806/ej.17.1.200
- [2] Sievers, F., Wilm, A., Dineen, D., Gibson, T. J., Karplus, K., Li, W., ... Söding, J. (2011). Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular Systems Biology, 7(1), 539. https://doi.org/10.1038/msb.2011.75
These references provide further reading on the concepts and methods underlying ReefPipe.