Edited code documentation
Adam R. Rivers committed Jul 6, 2018
1 parent 1c3f280 commit 23a4dba
Showing 5 changed files with 181 additions and 171 deletions.
91 changes: 48 additions & 43 deletions README.rst
@@ -1,4 +1,4 @@
ITSxpress: Software to rapidly trim the Internally transcribed spacer (ITS) region of FASTQ files
ITSxpress: Software to rapidly trim the Internally transcribed spacer (ITS) region of FASTQ files
==================================================================================================
.. image:: https://travis-ci.org/USDA-ARS-GBRU/itsxpress.svg?branch=master
:target: https://travis-ci.org/USDA-ARS-GBRU/itsxpress
@@ -9,6 +9,9 @@ ITSxpress: Software to rapidly trim the Internally transcribed spacer (ITS) reg
.. image:: https://api.codacy.com/project/badge/Grade/7e2a4c97cde74bccb3e84b706d7a2aa5
:target: https://www.codacy.com/app/GBRU/itsxpress?utm_source=github.com&utm_medium=referral&utm_content=USDA-ARS-GBRU/itsxpress&utm_campaign=Badge_Grade

.. image:: https://zenodo.org/badge/DOI/10.5281/zenodo.1304349.svg
:target: https://doi.org/10.5281/zenodo.1304349

Author
-------
* Adam R. Rivers, US Department of Agriculture, Agricultural Research Service
@@ -19,7 +22,7 @@ Introduction

The internally transcribed spacer region is a region between highly conserved the small
subunit (SSU) of rRNA and the large subunit (LSU) of the rRNA. In Eukaryotes it contains
the 5.8s genes and two variable length spacer regions. In amplicon sequening studies it is
the 5.8s genes and two variable length spacer regions. In amplicon sequencing studies it is
common practice to trim off the conserved (SSU, 5,8S or LSU) regions. `Bengtsson-Palme
et al. (2013)`_ published software the software package ITSx_ to do this.

@@ -29,25 +32,27 @@ sequence, so each input sequence must be trimmed. ITSXpress makes this possible
taking FASTQ data, de-replicating the sequences then identifying the start and stop
sites using HMMSearch. Results are parsed and the trimmed files are returned. The ITS 1,
ITS2 or the entire ITS region including the 5.8s rRNA gene can be selected. ITSxpress
uses the hmm model from ITSx so results are comprable.
uses the hmm model from ITSx so results are comparable.

ITSxpress is also available as a `QIIME2 Plugin`_

.. _`Bengtsson-Palme et al. (2013)`: https://doi.org/10.1111/2041-210X.12073
.. _`Bengtsson-Palme et al. (2013)`: https://doi.org/10.1111/2041-210X.12073
.. _ITSx: http://microbiology.se/software/itsx/
.. _OTUs: https://doi.org/10.1038/ismej.2017.119
.. _`QIIME2 Plugin`: https://github.com/USDA-ARS-GBRU/q2_itsxpress


Installation
-------------
ITSxpress can be installed from:

1. Preferred method - Bioconda (to be done):
1. Bioconda: (preferred method because it handles dependencies):

.. code-block:: bash
conda install itsxpress
2. Pip:
2. Pip: https://pypi.org/project/itsxpress/:

.. code-block:: bash
@@ -63,91 +68,91 @@ ITSxpress can be installed from:
Dependencies
-------------
The software requires Vsearch, BBtools, Hmmer and Biopython. Bioconda takes care of this
for you so it is the preferred installation method.
The software requires Vsearch, BBtools, Hmmer >= 3.1b and Biopython. Bioconda
takes care of this for you so it is the preferred installation method.


Usage
Usage
---------

-h, --help Show this help message and exit.

--fastq A ``.fastq``, ``.fq``, ``.fastq.gz`` or ``.fq.gz`` file. Interleaved
or not.
--single_end A flag to specify if the fastq file is inteleaved.
or not. Required.

--single_end A flag to specify that the fastq file is single-ended (not paired).
single-ended (not paired). Default is false.

--fastq2 A ``.fastq``, ``.fq``, ``.fastq.gz`` or ``.fq.gz`` file representing read 2, optional.

--outfile The trimmed Fastq file, if it ends in ``gz`` it will be gzipped.
--fastq2 A ``.fastq``, ``.fq``, ``.fastq.gz`` or ``.fq.gz`` file representing read 2 if present, optional.

--tempdir Specify the temp file directory.
--outfile The trimmed FASTQ file, if it ends in ``gz`` it will be gzipped.

--keeptemp Should intermediate files be kept?
--tempdir Specify the temp file directory. Default is None.

--keeptemp Should intermediate files be kept? Default is false.

--region Options : {ITS2, ITS1, ALL}

--taxa Select the taxonomic group sequenced: {Alveolata, Bryophyta,
Bacillariophyta, Amoebozoa, Euglenozoa, Fungi, Chlorophyta,
Rhodophyta, Phaeophyceae, Marchantiophyta, Metazoa, Microsporidia,
Rhodophyta, Phaeophyceae, Marchantiophyta, Metazoa,
Oomycota, Haptophyceae, Raphidophyceae, Rhizaria, Synurophyceae,
Tracheophyta, Eustigmatophyceae, Apusozoa, Parabasalia}
--log Log file
Tracheophyta, Eustigmatophyceae, All}. Default Fungi.

--log Log file. Default is ITSxpress.log.

--threads Number of processor threads to use
--threads Number of processor threads to use. Default is 1.


Examples
---------

Use case 1: Trimming the ITS2 region from a fungal amplicon sequencing dataset with
forward and reverse gzipped fastq files using two cpu threads.
Use case 1: Trimming the ITS2 region from a fungal amplicon sequencing dataset with
forward and reverse gzipped FASTQ files using two cpu threads.

.. code-block:: bash
itsxpress --fastq r1.fastq.gz --fastq2 r2.fastq.gz --region ITS2 --taxa Fungi \
--log logfile.txt --outfile trimmed_reads.fastq.gz --threads 2
itsxpress --fastq r1.fastq.gz --fastq2 r2.fastq.gz --region ITS2 \
--taxa Fungi --log logfile.txt --outfile trimmed_reads.fastq.gz --threads 2
ITSxpress can take gzipped or un-gzipped FASTQ files and it can write gzipped or
un-gzipped FASTQ files. It expects FASTQ files to end in: .fq, .fastq, .fq.gz or fastq.gz.

ITSxpress can take gzipped or ungzipped fastq files and it can write gzipped or
ungzipped fastq files. It expects fastq files to end in : .fq, .fastq, .fq.gz or fastq.gz

Use case 2: Trimming the ITS2 region from a fungal amplicon sequencing dataset with
an interleaved gzipped FASTQ files using two cpu threads.

Use case 2: Trimming the ITS2 region from a fungal amplicon sequencing dataset with
an interleaved gzipped fastq files using two cpu threads.

.. code-block:: bash
itsxpress --fastq interleaved.fastq.gz --region ITS2 --taxa Fungi \
--log logfile.txt --outfile trimmed_reads.fastq.gz --threads 2
Use case 3: Trimming the ITS2 region from a fungal amplicon sequencing dataset with
an interleaved gzipped fastq files using two cpu threads.
Use case 3: Trimming the ITS2 region from a fungal amplicon sequencing dataset with
an single-ended gzipped FASTQ files using two cpu threads.

.. code-block:: bash
itsxpress --fastq single-end.fastq.gz --single_end --region ITS2 --taxa Fungi \
--log logfile.txt --outfile trimmed_reads.fastq.gz --threads 2
Single ended data is less common and may come from a dataset where the reads have already
Single ended data is less common and may come from a dataset where the reads have already
been merged.

Use case 4: Trimming the ITS1 region from a Microsporidia amplicon sequencing dataset with
an interleaved gzipped fastq files using 40 cpu threads.
Use case 4: Trimming the ITS1 region from a Alveolata amplicon sequencing dataset with
an interleaved gzipped FASTQ files using 40 cpu threads.

.. code-block:: bash
itsxpress --fastq interleaved.fastq.gz --region ITS1 --taxa Microsporidia \
itsxpress --fastq interleaved.fastq.gz --region ITS1 --taxa Alveolata \
--log logfile.txt --outfile trimmed_reads.fastq.gz --threads 40
License information
--------------------

This software is a work of the United States Department of Agriculture, Agricultural
Research Service. 17 U.S.C. Section 105 states that "Copyright protection under this
title is not available for any work of the United States Government". While I anticipate
that this work will be released under a CC0 public domain attribution, only the USDA ARS
Office of Technology transfer has the authority to make that determination.
This software is a work of the United States Department of Agriculture, Agricultural
Research Service. 17 U.S.C. Section 105 states that "Copyright protection under this
title is not available for any work of the United States Government". While I anticipate
that this work will be released under a CC0 public domain attribution, only the USDA ARS
Office of Technology transfer has the authority to make that determination.
5 changes: 4 additions & 1 deletion itsxpress/definitions.py
@@ -1,3 +1,6 @@
"""Definitions.py: variables shared across the package.
"""
import os

# This is the project Root
@@ -36,4 +39,4 @@
"Rhodophyta":"H.hmm","Phaeophyceae":"I.hmm","Marchantiophyta":"L.hmm","Metazoa":"M.hmm",
"Oomycota":"O.hmm","Haptophyceae":"P.hmm",
"Raphidophyceae":"Q.hmm"," Rhizaria":"R.hmm","Synurophyceae":"S.hmm",
"Tracheophyta":"T.hmm","Eustigmatophyceae":"U.hmm","All":"all.hmm"}
"Tracheophyta":"T.hmm","Eustigmatophyceae":"U.hmm","All":"all.hmm"}
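
In the mapping shown in this hunk, the Rhizaria key is written with a leading space (``" Rhizaria"``), so a plain dictionary lookup of ``"Rhizaria"`` would not find it. The following is a minimal sketch of a check for such keys. It assumes only the entries visible in this hunk; the names ``taxa_excerpt``, ``keys_with_stray_whitespace`` and ``normalized`` are hypothetical and are not part of itsxpress.

```python
# Hypothetical excerpt: only the entries visible in the diff hunk are
# reproduced here; the rest of the real dictionary is not shown in the diff.
taxa_excerpt = {
    "Rhodophyta": "H.hmm", "Phaeophyceae": "I.hmm",
    "Marchantiophyta": "L.hmm", "Metazoa": "M.hmm",
    "Oomycota": "O.hmm", "Haptophyceae": "P.hmm",
    "Raphidophyceae": "Q.hmm", " Rhizaria": "R.hmm",
    "Synurophyceae": "S.hmm", "Tracheophyta": "T.hmm",
    "Eustigmatophyceae": "U.hmm", "All": "all.hmm",
}


def keys_with_stray_whitespace(mapping):
    """Return the keys that differ from their whitespace-stripped form."""
    return [key for key in mapping if key != key.strip()]


def normalized(mapping):
    """Return a copy of the mapping with whitespace stripped from each key."""
    return {key.strip(): value for key, value in mapping.items()}


if __name__ == "__main__":
    # Report any key that would not match a plain taxon name.
    print(keys_with_stray_whitespace(taxa_excerpt))
```

A check like this could run in a unit test so that a taxon offered on the command line always resolves to an HMM file name.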