ITSoneDB population pipeline

ITSoneDB is a database that collects eukaryotic ITS1 sequences and consistent taxonomic annotations. It is available at http://itsonedb.cloud.ba.infn.it/.

RATIONALE

The pipeline designed for ITSoneDB population integrates ad hoc Python and Bash scripts with third-party tools (see the figure below).

In the initial step, the ENA entries are downloaded locally and the eukaryotic entries are extracted.
From each entry, specific information (i.e. accession number, version, description line and annotations under specific keys) is pulled out and stored in a TSV file, and the corresponding sequence data are written to FASTA files. The TSV and FASTA files are analyzed by two parallel procedures that extract or infer the ITS1 location. The TSV files are parsed to extract the annotations of the ITS1 boundaries by means of a dictionary of commonly used ITS1 synonyms.
In parallel, HMM profiles for the 18S and 5.8S rRNA genes are mapped onto the FASTA files by means of hmmsearch (HMMER 3.1) (right part of the diagram). The ITS1 boundary information obtained by both procedures is merged to produce the files needed to populate the database.
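
The inference step can be pictured with a minimal Python sketch. It is not the pipeline's actual code: the names `Span`, `infer_its1` and `merge_locations` are hypothetical, coordinates are assumed to be 1-based and inclusive, and both the handling of a missing flanking gene (extend to the sequence end) and the merge policy (keep both locations and flag agreement) are assumptions made for illustration only.

```python
from typing import NamedTuple, Optional


class Span(NamedTuple):
    """A region on a sequence; coordinates are 1-based and inclusive."""
    start: int
    end: int


def infer_its1(ssu_hit: Optional[Span], r58s_hit: Optional[Span],
               seq_len: int) -> Optional[Span]:
    """Infer ITS1 as the region between the 18S end and the 5.8S start.

    Assumption: if only one flanking gene is found, the ITS1 is taken to
    run up to the corresponding end of the sequence (a partial ITS1).
    """
    if ssu_hit is None and r58s_hit is None:
        return None  # no flanking gene found, nothing can be inferred
    start = ssu_hit.end + 1 if ssu_hit else 1
    end = r58s_hit.start - 1 if r58s_hit else seq_len
    if start > end:
        return None  # overlapping or inverted hits leave no room for ITS1
    return Span(start, end)


def merge_locations(annotated: Optional[Span], inferred: Optional[Span]) -> dict:
    """Keep both locations side by side and flag whether they agree."""
    return {
        "annotated": annotated,
        "inferred": inferred,
        "concordant": annotated is not None and annotated == inferred,
    }
```

For example, an 18S hit ending at position 100 and a 5.8S hit starting at position 351 give an inferred ITS1 spanning positions 101 to 350.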

REQUIREMENTS

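HMMER is required (for conda installation info see https://anaconda.org/bioconda/hmmer).
The Species Representative Entry Identification procedure requires VSEARCH (for installation info see https://github.com/torognes/vsearch).
Python 3 dependencies are argparse, argcomplete, getopt, multiprocessing, numpy, biopython and datetime. Of these, argparse, getopt, multiprocessing and datetime are part of the Python standard library, so only argcomplete, numpy and biopython need to be installed separately.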
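
Before a run, it may help to verify that the requirements are in place. The snippet below is a small sketch, not part of the repository: `missing_requirements` is a hypothetical helper that looks for the `hmmsearch` and `vsearch` executables on the PATH and for the third-party Python modules (`Bio` is the import name of biopython).

```python
import importlib.util
import shutil

# Executables provided by HMMER and VSEARCH.
REQUIRED_TOOLS = ("hmmsearch", "vsearch")
# Third-party modules; "Bio" is the import name of biopython.
REQUIRED_MODULES = ("argcomplete", "numpy", "Bio")


def missing_requirements(tools=REQUIRED_TOOLS, modules=REQUIRED_MODULES):
    """Return the names of tools and modules that cannot be found."""
    missing = [tool for tool in tools if shutil.which(tool) is None]
    missing += [mod for mod in modules
                if importlib.util.find_spec(mod) is None]
    return missing
```

Calling `missing_requirements()` returns an empty list when everything is available, and otherwise the names that still need to be installed.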

USAGE

Run the scripts as follows:

$ ./ITSoneDB_upgrade_pipeline.sh
        -s full path of the directory containing the managed scripts
        -x full path of the auxiliary files directory
        -r full path of the previous releases directory
        -c number of CPUs

$ ./ETL_population_pipeline.sh
        -s full path of the directory containing the managed scripts
        -n release number
        -p previous release number
        -r full path of the releases directory

Auxiliary files are the 18S and 5.8S HMM models and a text file containing the ITS1 synonyms dictionary.
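
As an illustration of how a synonyms dictionary can be used to recognize ITS1 annotations, here is a minimal Python sketch. The layout of the actual dictionary file is not described here, so one synonym per line is an assumption, as are the helper names `normalize`, `load_synonyms` and `is_its1`, the exact-match rule, and the example synonyms in the usage note.

```python
import re


def normalize(text):
    """Lowercase and collapse every run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def load_synonyms(lines):
    """Build a set of normalized synonyms from an iterable of text lines.

    Assumption: one synonym per line; blank lines are ignored.
    """
    return {normalize(line) for line in lines if line.strip()}


def is_its1(annotation, synonyms):
    """Return True if the annotation text matches a known ITS1 synonym."""
    return normalize(annotation) in synonyms
```

With a dictionary holding the lines `ITS1` and `internal transcribed spacer 1`, the call `is_its1("Internal Transcribed Spacer-1", synonyms)` is true, while `is_its1("ITS2", synonyms)` is false. An open file handle can be passed directly to `load_synonyms`.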
