
Workflow examples

SikolenkoMaxim edited this page Feb 21, 2020 · 7 revisions


1. You can run the whole "pipeline" with default settings using a single command, which processes all FASTA and FASTQ files in the working directory:

prober.py && barapost.py && fastQA5-sorter.py
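In the shell, `&&` starts the next command only if the previous one exited with status 0, so a failure in "prober.py" stops the chain before "barapost.py" starts. Below is a minimal Python sketch of that short-circuit behaviour. The function name run_chain is hypothetical. It only illustrates the shell semantics and is not part of Barapost:

```python
import subprocess
from typing import List, Sequence


def run_chain(commands: Sequence[List[str]]) -> int:
    """Run commands one after another, like `cmd1 && cmd2 && cmd3`.

    Returns the exit status of the first command that fails (non-zero),
    or 0 if every command succeeds.
    """
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode  # later commands are never started
    return 0
```

For example, `run_chain([["prober.py"], ["barapost.py"], ["fastQA5-sorter.py"]])` mirrors the one-liner above, assuming the three scripts are executable and on PATH.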

2. You can try Barapost on the test dataset test_reads.fastq in the examples directory (this file contains 10 reads):

Classify 4 reads (-b 4) with "prober.py". Two requests will be sent to the NCBI BLAST server, each containing 2 reads (-p 2). Search only among reference sequences of Bacteria (-g 2; 2 is the NCBI Taxonomy ID of Bacteria):

prober.py test_reads.fastq -b 4 -p 2 -g 2 -o classif_dir
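The numbers in this example follow simple arithmetic. The 4 probing reads are split into packets of 2, which gives 4 / 2 = 2 BLAST requests. That leaves 10 - 4 = 6 reads for "barapost.py". The sketch below generalises this. It assumes, without confirmation from Barapost's documentation, that a final, smaller packet is still sent as its own request when the batch size is not a multiple of the packet size. The helper name blast_plan is hypothetical:

```python
import math
from typing import Tuple


def blast_plan(total_reads: int, batch_size: int, packet_size: int) -> Tuple[int, int]:
    """Return (number of BLAST requests, reads left for barapost.py).

    batch_size  ~ prober.py -b (reads classified via the BLAST server)
    packet_size ~ prober.py -p (reads per request)
    Assumption: a last, partially filled packet counts as one request,
    and the batch cannot exceed the number of reads in the file.
    """
    probed = min(batch_size, total_reads)
    n_requests = math.ceil(probed / packet_size)
    remaining = total_reads - probed
    return n_requests, remaining


# test_reads.fastq: 10 reads, -b 4, -p 2
print(blast_plan(10, 4, 2))  # (2, 6)
```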

Download the reference genome sequences "discovered" by "prober.py", create a database on the local machine, and classify the remaining reads using this newly created database:

barapost.py test_reads.fastq -r classif_dir
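Conceptually, "the remaining reads" are the reads whose IDs have no classification yet. The following is a minimal sketch of that selection. It assumes a plain FASTQ file with 4 lines per record and a set of read IDs that have already been classified. The function names are hypothetical, and the sketch does not describe how "barapost.py" itself is implemented:

```python
import io
from typing import Iterator, List, Set, TextIO, Tuple

Record = Tuple[str, ...]  # (header, sequence, '+' line, qualities)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes sequences are not line-wrapped)."""
    while True:
        lines = [handle.readline().rstrip("\n") for _ in range(4)]
        if not lines[0]:
            return  # end of file
        yield tuple(lines)


def read_id(header: str) -> str:
    """'@abc123 runid=...' -> 'abc123' (first token without the '@')."""
    return header[1:].split()[0]


def remaining_reads(handle: TextIO, classified: Set[str]) -> List[Record]:
    """Keep only the records whose ID has not been classified yet."""
    return [rec for rec in read_fastq(handle) if read_id(rec[0]) not in classified]


sample = io.StringIO("@r1 runid=x\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")
print([read_id(r[0]) for r in remaining_reads(sample, {"r1"})])  # ['r2']
```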

Sort the classified reads and place the sorted files in the directory some_sorted_reads:

fastQA5-sorter.py test_reads.fastq -r classif_dir -o some_sorted_reads
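Sorting amounts to grouping reads by the label each one received during classification, with one output file per label. The sketch below assumes a simplified two-column TSV (read ID, taxonomic label). The annotation files written by "prober.py" and "barapost.py" are likely to contain more columns, so the column layout here is hypothetical, as are the function names:

```python
import csv
import io
from typing import Dict, List, TextIO


def load_labels(tsv: TextIO) -> Dict[str, str]:
    """Map read ID -> label from a simplified 'read_id<TAB>label' table."""
    return {row[0]: row[1] for row in csv.reader(tsv, delimiter="\t") if row}


def group_by_label(read_ids: List[str], labels: Dict[str, str]) -> Dict[str, List[str]]:
    """Group read IDs by label; each group would become one sorted file."""
    groups: Dict[str, List[str]] = {}
    for rid in read_ids:
        groups.setdefault(labels[rid], []).append(rid)
    return groups


tsv = io.StringIO("r1\tEscherichia coli\nr2\tBacillus subtilis\nr3\tEscherichia coli\n")
print(group_by_label(["r1", "r2", "r3"], load_labels(tsv)))
# {'Escherichia coli': ['r1', 'r3'], 'Bacillus subtilis': ['r2']}
```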

3. Sorting a FAST5 file raw_signal.fast5.

Once the FAST5 file raw_signal.fast5 has been basecalled and the resulting file reads.fastq has been generated, the latter can be classified with "prober.py" and/or "barapost.py":

prober.py reads.fastq -o fastq_classification

barapost.py reads.fastq -r fastq_classification

Then the source FAST5 file can be sorted according to the classification of the FASTQ file:

fastQA5-sorter.py raw_signal.fast5 -r fastq_classification -o fast5_sorted
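This works because basecalling keeps the read ID: the first token of a FASTQ header names the same read that is stored in the FAST5 file, so a classification made for FASTQ reads can be transferred to the raw signal. In multi-read FAST5 files, each read is usually stored in an HDF5 group named read_<read_ID>. Below is a minimal sketch of the transfer. It assumes the group names have already been listed (for example, with h5py) and that a read-ID-to-label mapping is available. Both function names are hypothetical:

```python
from typing import Dict, Iterable, List


def group_name_to_read_id(group: str) -> str:
    """'read_<read_ID>' (multi-read FAST5 group name) -> '<read_ID>'."""
    prefix = "read_"
    return group[len(prefix):] if group.startswith(prefix) else group


def sort_fast5_groups(groups: Iterable[str], labels: Dict[str, str]) -> Dict[str, List[str]]:
    """Give every FAST5 read group the label of the FASTQ read with the same ID."""
    sorted_groups: Dict[str, List[str]] = {}
    for group in groups:
        rid = group_name_to_read_id(group)
        if rid not in labels:
            # An unmatched read cannot be sorted, so fail loudly
            raise LookupError(f"Read {rid} not found in annotation")
        sorted_groups.setdefault(labels[rid], []).append(group)
    return sorted_groups


labels = {"a1": "Escherichia coli", "b2": "Bacillus subtilis"}
print(sort_fast5_groups(["read_a1", "read_b2"], labels))
# {'Escherichia coli': ['read_a1'], 'Bacillus subtilis': ['read_b2']}
```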

4. Sorting "twisted" FAST5 files.

Once the FAST5 files raw_signal<1...N>.fast5 have been basecalled and the resulting files reads<1...M>.fastq have been generated (the reads of one FAST5 file do not necessarily end up in one FASTQ file), the latter can be classified with "prober.py" and/or "barapost.py":

prober.py reads*.fastq -o fastq_classification

barapost.py reads*.fastq -r fastq_classification

Then we try to sort the source FAST5 data:

fastQA5-sorter.py raw_signal*.fast5 -r fastq_classification -o fast5_sorted

The process ends with an error message like this:

Read <read_ID> not found in TSV file containing taxonomic annotation.

Try running sorter with '-u' (--untwist-fast5') flag.

We will follow this suggestion and run:

fastQA5-sorter.py raw_signal*.fast5 -r fastq_classification -o fast5_sorted -u

Now the sorting should complete successfully. :)
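One plausible reading of the error is the following; it is an assumption, not a description of the internals of "fastQA5-sorter.py". With "twisted" data, a read stored in raw_signal1.fast5 may have been written to, say, reads2.fastq. Looking it up only in the annotation of one FASTQ file then fails. "Untwisting" can be pictured as merging the annotations of all FASTQ files into one read-ID index before sorting. The sketch below illustrates that idea with hypothetical per-file dictionaries and function names:

```python
from typing import Dict, Iterable, List, Mapping


def merge_annotations(per_file: Mapping[str, Dict[str, str]]) -> Dict[str, str]:
    """Combine {fastq_file: {read_id: label}} into one {read_id: label} index."""
    merged: Dict[str, str] = {}
    for annotation in per_file.values():
        merged.update(annotation)
    return merged


def unmatched(read_ids: Iterable[str], index: Dict[str, str]) -> List[str]:
    """Read IDs that the given index cannot classify."""
    return [rid for rid in read_ids if rid not in index]


# Reads a1 and a2 are both stored in raw_signal1.fast5, but the basecaller
# wrote a2 to reads2.fastq instead of reads1.fastq ("twisted" files).
per_fastq = {
    "reads1.fastq": {"a1": "Escherichia coli"},
    "reads2.fastq": {"a2": "Bacillus subtilis"},
}
fast5_reads = ["a1", "a2"]

print(unmatched(fast5_reads, per_fastq["reads1.fastq"]))     # ['a2']
print(unmatched(fast5_reads, merge_annotations(per_fastq)))  # []
```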