
0 Bracken added reads with GTDB database for all taxonomic units #38

Open
alimayy opened this issue Dec 21, 2022 · 4 comments

Comments

@alimayy

alimayy commented Dec 21, 2022

Hi there,

Thanks for developing Struo2 and for releasing the GTDB Kraken2 databases built with it.

I'm using Kraken2 v2.1.1 and Bracken v2.8 on paired-end animal microbiome data with the Struo2 GTDB_release207 Kraken2 database. The files in my Kraken2/Bracken database directory are listed below; their md5sums match those in the database repository.

```
-rw-rw-r--  1 ali  ali  287G dec 21 18:53 hash.k2d
-rw-rw-r--  1 ali  ali    64 dec 21 18:53 opts.k2d
-rw-rw-r--  1 ali  ali  9,8M dec 21 18:53 taxo.k2d
-rw-rw-r--  1 ali  ali    59 dec 21 18:53 database150mers.kmer_distrib
-rw-r--r--  1 ali  ali  578M dec 21 18:53 database150mers.kraken
```

When I run Kraken2 and then Bracken with this database, Bracken does not add any reads to any taxon. Any idea why? For example, could I be missing some required files or folders, or could this be a Kraken2 or Bracken version issue?
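The "missing files" question can be checked directly. Below is a minimal sketch of a hypothetical Bash helper, `check_db` (not part of Kraken2 or Bracken), that confirms the three `.k2d` files Kraken2 loads are present and non-empty, and that the `database<read length>mers.kmer_distrib` file Bracken reads for `-r 150` exists and has more than one line. The "more than one line" rule is an assumption: a usable distribution file should contain a header plus data rows. It may be worth running here because the listing above shows `database150mers.kmer_distrib` at only 59 bytes, which seems small for a file covering thousands of species.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kraken2 or Bracken): check that a database
# directory holds the files Kraken2 and Bracken read at run time.
# Usage: check_db <db_dir> [read_length]   (read_length defaults to 150)
check_db() {
  local db="$1" read_len="${2:-150}" f status=0
  # Kraken2 loads these three files to classify reads.
  for f in hash.k2d opts.k2d taxo.k2d; do
    if [ -s "$db/$f" ]; then
      echo "ok      $f"
    else
      echo "MISSING $f"
      status=1
    fi
  done
  # Bracken reads database<read_len>mers.kmer_distrib (read length set by -r).
  f="database${read_len}mers.kmer_distrib"
  if [ ! -s "$db/$f" ]; then
    echo "MISSING $f"
    status=1
  elif [ "$(wc -l < "$db/$f")" -lt 2 ]; then
    # Assumption: a usable distribution has a header line plus data rows.
    echo "SUSPECT $f (header only?)"
    status=1
  else
    echo "ok      $f"
  fi
  return "$status"
}
```

For this setup it would be called as `check_db /ramdisk 150`; a non-zero exit status means at least one file was missing or looked suspect.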


@nick-youngblut
Contributor

Can you please share the log for the run?

@alimayy alimayy changed the title 0 added reads with GTDB database for all taxonomic units 0 Bracken added reads with GTDB database for all taxonomic units Dec 22, 2022
@alimayy
Author

alimayy commented Dec 22, 2022

@nick-youngblut, sure:

```
~/packages/kraken2/build/kraken2 --threads 12 \
  --confidence 0 \
  --unclassified-out HAV1804-11-8-18_39-A_S8_unclassified_R#.fastq.gz \
  --output HAV1804-11-8-18_39-A_S8_kraken_output \
  --report HAV1804-11-8-18_39-A_S8_kraken_report.txt \
  --memory-mapping \
  --use-names \
  --db /ramdisk/ \
  --paired \
  --gzip-compressed \
  HAV1804-11-8-18_39-A_S8_L001_R1_001.fastq.gz HAV1804-11-8-18_39-A_S8_L001_R2_001.fastq.gz
Loading database information... done.
9599389 sequences (2882.08 Mbp) processed in 84.199s (6840.5 Kseq/m, 2053.77 Mbp/m).
  6580152 sequences classified (68.55%)
  3019237 sequences unclassified (31.45%)
```

```
~/packages/Bracken-2.8/bracken \
  -i HAV1804-11-8-18_39-A_S8_kraken_report.txt \
  -d /ramdisk/ \
  -l S \
  -o HAV1804-11-8-18_39-A_S8_bracken_output_S-lvl.txt \
  -w HAV1804-11-8-18_39-A_S8_bracken_report_S-lvl \
  -r 150 \
  -t 100
 >> Checking for Valid Options...
 >> Running Bracken
      >> python src/est_abundance.py -i HAV1804-11-8-18_39-A_S8_kraken_report.txt -o HAV1804-11-8-18_39-A_S8_bracken_output_S-lvl.txt -k /ramdisk/database150mers.kmer_distrib -l S -t 100
PROGRAM START TIME: 12-22-2022 09:56:34
>> Checking report file: HAV1804-11-8-18_39-A_S8_kraken_report.txt
BRACKEN SUMMARY (Kraken report: HAV1804-11-8-18_39-A_S8_kraken_report.txt)
    >>> Threshold: 100
    >>> Number of species in sample: 51679
          >> Number of species with reads > threshold: 2155
          >> Number of species with reads < threshold: 49524
    >>> Total reads in sample: 9599389
          >> Total reads kept at species level (reads > threshold): 5371999
          >> Total reads discarded (species reads < threshold): 435116
          >> Reads distributed: 0
          >> Reads not distributed (eg. no species above threshold): 773037
          >> Unclassified reads: 3019237
BRACKEN OUTPUT PRODUCED: HAV1804-11-8-18_39-A_S8_bracken_output_S-lvl.txt
PROGRAM END TIME: 12-22-2022 09:56:36
  Bracken complete.
```
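One way to read this summary is to check that its counts add up. Assuming the four classified-read categories (kept, discarded, distributed, not distributed) are disjoint, they should sum to the 6580152 reads Kraken2 reported as classified, and adding the unclassified reads should give the sample total. The sketch below is a hypothetical Bash/awk helper, `bracken_accounting`, that does this arithmetic on a saved copy of the summary (for example, the `bracken` run's output redirected with `> bracken.log 2>&1`; the file name is up to you).

```shell
# Hypothetical helper: extract the read counts from a saved Bracken summary
# and check that the classified and unclassified buckets add up to the total.
# Usage: bracken_accounting <bracken_log>
bracken_accounting() {
  awk -F': ' '
    /Total reads in sample/       { total = $2 }
    /Total reads kept at species/ { kept = $2 }
    /Total reads discarded/       { discarded = $2 }
    /Reads distributed:/          { distributed = $2 }
    /Reads not distributed/       { not_dist = $2 }
    /Unclassified reads/          { unclassified = $2 }
    END {
      # Assumption: the four classified-read buckets do not overlap.
      classified = kept + discarded + distributed + not_dist
      printf "classified=%d unclassified=%d sum=%d total=%d\n",
             classified, unclassified, classified + unclassified, total
      printf "distributed=%d not_distributed=%d\n", distributed, not_dist
    }' "$1"
}
```

On the summary above it prints `classified=6580152 unclassified=3019237 sum=9599389 total=9599389` and `distributed=0 not_distributed=773037`. The counts are internally consistent (5371999 + 435116 + 0 + 773037 = 6580152), so the anomaly is isolated to the `Reads distributed: 0` line.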

@nick-youngblut
Contributor

I don't see anything obvious in the logs that could explain your outcome. Do you get a different result when using the default kraken2 & bracken databases?

@alimayy
Author

alimayy commented Dec 22, 2022

@nick-youngblut Yes! I also get added reads from Bracken when I use the Struo1 database (with a lower percentage of Kraken2-classified reads, of course).

Can you confirm that you get added reads from Bracken when you use the above-mentioned database?
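To compare runs against different databases (e.g. the Struo2 GTDB database vs. the Struo1 one) on the same footing, a hypothetical helper like `total_added_reads` below sums the `added_reads` column of a Bracken output table, such as the file written with `-o`. It locates the column by its header name rather than a fixed position, assuming the tab-separated layout Bracken uses for that table.

```shell
# Hypothetical helper: sum the added_reads column of a Bracken -o table.
# The column is found by header name, so column order does not matter.
# Usage: total_added_reads <bracken_output_table>
total_added_reads() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "added_reads") col = i; next }
    col     { sum += $col }
    END {
      if (!col) { print "no added_reads column" > "/dev/stderr"; exit 1 }
      printf "%d\n", sum
    }' "$1"
}
```

Running it once per database, e.g. `total_added_reads HAV1804-11-8-18_39-A_S8_bracken_output_S-lvl.txt`, gives a single number per run that should be 0 for the GTDB run described here and non-zero for the runs that do add reads.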


2 participants