
Workflow examples

masikol edited this page May 26, 2023 · 7 revisions


1. You can run the whole "pipeline" with default settings using a single command, which processes all FASTA and FASTQ files in the working directory:

barapost-prober.py && barapost-local.py && barapost-binning.py
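The "&&" operator runs each script only if the previous one exited successfully. The same behaviour can be written as a small bash function that also reports which step failed. This is a minimal sketch: the function name run_default_pipeline is hypothetical (it is not part of Barapost), and the three scripts are assumed to be on your PATH.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of Barapost.
# Runs the three default steps in order and, like the "&&" chain,
# stops at the first step that exits with a non-zero status.
run_default_pipeline() {
    local step
    for step in barapost-prober.py barapost-local.py barapost-binning.py; do
        echo "Running ${step}"
        if ! "${step}"; then
            echo "${step} failed; skipping the remaining steps" >&2
            return 1
        fi
    done
    echo "Pipeline finished"
}
```

Call run_default_pipeline from the directory that contains your FASTA and FASTQ files.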

2. You can try Barapost on the test dataset test_reads.fastq.gz located in the examples directory (the file contains 100 reads):

Classify 10 reads (-b 10) with "barapost-prober.py". Two requests will be sent to the NCBI BLAST server, each containing 5 reads (-p 5). Search among Pseudomonas, Rhodococcus and Escherichia reference sequences (-g 286,1827,561, respectively; these are the NCBI Taxonomy IDs of the three genera), and write the results to classif_dir (-o classif_dir):

barapost-prober.py test_reads.fastq.gz -b 10 -p 5 -g 286,1827,561 -o classif_dir
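The number of requests follows from -b and -p: 10 reads in packets of 5 reads give 2 requests. The helper below generalizes this arithmetic as a ceiling division. It assumes that barapost-prober.py sends a final, smaller packet when the number of reads is not a multiple of the packet size; n_requests is a hypothetical name used only for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of Barapost.
# Estimates how many requests are sent for N reads (-b) split into packets
# of P reads (-p), assuming a final, smaller packet when N is not a multiple of P.
n_requests() {
    local n_reads=$1 packet_size=$2
    # Integer ceiling division: ceil(n_reads / packet_size)
    echo $(( (n_reads + packet_size - 1) / packet_size ))
}

n_requests 10 5   # prints 2, matching the example above
```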

Download the reference genome sequences "discovered" by "barapost-prober.py", create a database on the local machine, and classify the remaining reads against this database (-r classif_dir points to the directory written by "barapost-prober.py"):

barapost-local.py test_reads.fastq.gz -r classif_dir

Sort the classified reads and place the binned files in the directory some_binned_reads:

barapost-binning.py test_reads.fastq.gz -r classif_dir -o some_binned_reads

3. Binning a FAST5 file raw_signal.fast5.

Once the FAST5 file raw_signal.fast5 has been basecalled and the resulting file reads.fastq has been generated, the latter can be classified with "barapost-prober.py" and/or "barapost-local.py":

barapost-prober.py reads.fastq -o fastq_classification

barapost-local.py reads.fastq -r fastq_classification

Then the source FAST5 file can be binned according to the classification of the FASTQ file:

barapost-binning.py raw_signal.fast5 -r fastq_classification -o fast5_binned
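The key point of this workflow is that the directory written by "barapost-prober.py" (-o) is the one read by the later steps (-r), while binning is applied to the FAST5 file rather than to the FASTQ file. A minimal sketch wrapping the three commands in a bash function; bin_fast5_via_fastq is a hypothetical name, and the scripts are assumed to be on your PATH.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of Barapost.
# Classifies a basecalled FASTQ file, then bins the source FAST5 file,
# passing the same classification directory to every step.
bin_fast5_via_fastq() {
    local fastq=$1 fast5=$2 classif_dir=$3 binned_dir=$4
    barapost-prober.py "$fastq" -o "$classif_dir" &&
        barapost-local.py "$fastq" -r "$classif_dir" &&
        barapost-binning.py "$fast5" -r "$classif_dir" -o "$binned_dir"
}
```

For this example: bin_fast5_via_fastq reads.fastq raw_signal.fast5 fastq_classification fast5_binned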

4. Binning "twisted" FAST5 files.

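Once the FAST5 files raw_signal<1...N>.fast5 have been basecalled and the resulting files reads<1...M>.fastq have been generated (the number of FASTQ files, M, may differ from the number of FAST5 files, N), the latter can be classified with "barapost-prober.py" and/or "barapost-local.py":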

barapost-prober.py reads*.fastq -o fastq_classification

barapost-local.py reads*.fastq -r fastq_classification

Then we try to bin the source FAST5 data:

barapost-binning.py raw_signal*.fast5 -r fastq_classification -o fast5_binned

The process terminates with an error message like this:

Read <read_ID> not found in TSV file containing taxonomic annotation.

Try running barapost-binning with '-u' (--untwist-fast5') flag.

We will follow this suggestion and run:

barapost-binning.py raw_signal*.fast5 -r fastq_classification -o fast5_binned -u

This time, binning should complete without errors. :)
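If you don't know in advance whether your FAST5 files are "twisted", you could try binning without -u first and retry with it only on failure. This sketch relies on an assumption not confirmed here: that barapost-binning.py exits with a non-zero status when it stops with the "Read <read_ID> not found" error. The function name bin_fast5_with_fallback is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of Barapost.
# Bins FAST5 files; if the first attempt fails, retries with -u (--untwist-fast5).
# Assumption: barapost-binning.py exits non-zero when it stops with the
# "Read <read_ID> not found in TSV file..." error.
bin_fast5_with_fallback() {
    local classif_dir=$1 binned_dir=$2
    shift 2   # the remaining arguments are the FAST5 files
    if barapost-binning.py "$@" -r "$classif_dir" -o "$binned_dir"; then
        return 0
    fi
    echo "Binning failed; retrying with -u (--untwist-fast5)" >&2
    barapost-binning.py "$@" -r "$classif_dir" -o "$binned_dir" -u
}
```

For this example: bin_fast5_with_fallback fastq_classification fast5_binned raw_signal*.fast5. Note that the sketch does not clean up any partial output the failed first attempt may have left in the output directory.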