No GFF file #31

Open
vupadhyay-code opened this issue Sep 3, 2024 · 9 comments

Comments

@vupadhyay-code

Hello Biologger - your prior suggestion for Issue 15 was great and solved the problem I was running into. However, now when I run the version of the pipeline you suggested, I get the following error:

Run: quality_control(rRNA)
Starting QC with rRNA
found 0 gff files
Error: No .gff files found for QualityControl rRNA
Error report:
for target Anaerostipes_hadrus
Error 1:
Error: No .gff files found for QualityControl rRNA

The gff_files and ffn_files folders are both empty, so it looks to me like prokka is not running. These are the lines printed just before the error (the genomes themselves are downloaded):

GCF_000210695v1 annotation required
Run prokka --kingdom Bacteria --outdir GCF_000210695v1_20240903 --genus Anaerostipes --locustag GCF_000210695v1 --prefix GCF_000210695v1_20240903 --cpus 0 genomic_fna/GCF_000210695.1_ASM21069v1_genomic.fna

Thanks in advance!
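
A quick way to confirm the "found 0 gff files" message is to count the annotation files on disk. The sketch below is only an illustration: the helper names `find_annotation_files` and `report` are hypothetical, and it assumes that any Prokka output (`.gff`, `.ffn`) would sit somewhere below the target directory, either in `gff_files`/`ffn_files` or in a per-genome Prokka output folder.

```python
from pathlib import Path


def find_annotation_files(target_dir, suffixes=(".gff", ".ffn")):
    """Map each suffix to the sorted list of matching files below target_dir."""
    root = Path(target_dir)
    return {suffix: sorted(root.rglob(f"*{suffix}")) for suffix in suffixes}


def report(target_dir):
    """Print one count per suffix and return the mapping for further checks."""
    found = find_annotation_files(target_dir)
    for suffix, files in found.items():
        print(f"{suffix}: {len(files)} file(s) under {target_dir}")
    return found
```

Calling `report("Anaerostipes_hadrus")` from the directory where the pipeline was started would show whether Prokka wrote anything at all, or whether files exist but were never copied into `gff_files`.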

@biologger
Owner

Could you post the command you used to run the pipeline so I can try to reproduce the issue?

@vupadhyay-code
Author

I ran:
speciesprimer.py

Then I just went through the prompts in the shell script.

I heard from a lab mate, who is struggling with a new install of prokka, that there is some issue with prokka support.

@biologger
Owner

Can you share the content of the ./Anaerostipes_hadrus/config/config.json file?
"." is probably the directory where you started speciesprimer.py.

If you are using the Docker container, the prokka installation should not be an issue in this case.

@vupadhyay-code
Author

Here you go:

{"exception": ["Eubacterium hadrum", "Anaerostipes sp. 5/1/63FAA", "Clostridiales bacterium SSC/2", "butyrate-producing bacterium SS2/1", "butyrate-producing bacterium SSC/2"], "mfold": -3.0, "intermediate": false, "path": "/turnbaugh/qb3share/shared_resources/apptainer_containers/new_speciesprimer", "ignore_qc": false, "assemblylevel": ["all"], "skip_tree": false, "nolist": false, "offline": false, "target": "Anaerostipes_hadrus", "maxsize": 200, "mfethreshold": 90, "blastseqs": 1000, "customdb": null, "skip_download": false, "probe": false, "minsize": 70, "mpprimer": -3.5, "blastdbv5": false, "qc_gene": ["rRNA"]}

Thanks for being so responsive.
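
For anyone comparing their own run against this one, the settings can be read straight from the JSON file rather than copied by hand. This is a minimal sketch, not part of speciesprimer: `load_config`, `describe`, and `KEYS_OF_INTEREST` are hypothetical names, and the choice of which keys are worth printing is an assumption based on the config posted above.

```python
import json
from pathlib import Path

# Keys taken from the config posted in this thread; which of them matter
# for the missing-GFF error is an assumption.
KEYS_OF_INTEREST = ("target", "path", "qc_gene", "ignore_qc", "offline", "skip_download")


def load_config(config_path):
    """Parse a speciesprimer-style config.json into a dict."""
    with Path(config_path).open(encoding="utf-8") as handle:
        return json.load(handle)


def describe(config, keys=KEYS_OF_INTEREST):
    """Return one 'key = value' line per requested key, flagging absent keys."""
    return [f"{key} = {config.get(key, '<missing>')!r}" for key in keys]
```

For example, `print("\n".join(describe(load_config("./Anaerostipes_hadrus/config/config.json"))))` lists the working path and QC settings in a form that is easy to paste into an issue.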

@biologger
Owner

Thanks for sharing!
I will look into this issue and answer as soon as I have news...

@biologger
Owner

Hi,

I was not able to reproduce this error. With the latest Docker container I was able to annotate all the Anaerostipes_hadrus genomes, and QC worked.

Does restarting the run always lead to the same result?
Did you check if you have enough disk space left for the annotated files?
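
The disk-space question can be answered with the Python standard library. The sketch below is an assumption-laden illustration: `free_gib` and `has_room` are hypothetical helpers, and the `needed_gib` threshold is a placeholder, since the thread does not say how much space the annotated files need.

```python
import shutil


def free_gib(path="."):
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def has_room(path=".", needed_gib=1.0):
    """True if at least `needed_gib` GiB are free (threshold is a placeholder)."""
    return free_gib(path) >= needed_gib
```

Running `print(f"{free_gib('.'):.1f} GiB free")` in the primerdesign directory reports the space available on the filesystem that actually receives the Prokka output, which may differ from the home directory on a shared cluster.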

@vupadhyay-code
Author

Yeah I got the same error again. I've tried it twice. It downloads the genomes just fine and fails at this gff level. I have a lot of disk space. You want me to try to clear some out and run it again? Are there any other requirements that I might need to change at the system level?

@biologger
Owner

Hm, strange... Could you try to run the following command inside the /primerdesign/Anaerostipes_hadrus directory and post the output from the terminal?

prokka --kingdom Bacteria --outdir GCF_000210695v1_20240903 --genus Anaerostipes --locustag GCF_000210695v1 --prefix GCF_000210695v1_20240903 --cpus 0 genomic_fna/GCF_000210695.1_ASM21069v1_genomic.fna
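
If copying terminal output by hand is inconvenient, the command can also be run through a small wrapper that saves everything it prints. This is only a sketch: `run_and_capture` is a hypothetical helper, and it assumes the command is passed as a list of arguments and that stderr should be merged into stdout.

```python
import subprocess


def run_and_capture(command, cwd=None, log_path=None):
    """Run a command, merge stderr into stdout, optionally save the text,
    and return (exit code, combined output)."""
    result = subprocess.run(
        command,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep messages in their original order
        text=True,
        check=False,  # a non-zero exit is reported, not raised
    )
    if log_path is not None:
        with open(log_path, "w", encoding="utf-8") as handle:
            handle.write(result.stdout)
    return result.returncode, result.stdout
```

The Prokka call above would be passed as `["prokka", "--kingdom", "Bacteria", ...]` with `cwd` set to the Anaerostipes_hadrus directory. A non-zero exit code together with the saved log is usually enough to attach to an issue.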

@vupadhyay-code
Author

[09:04:53] This is prokka 1.14.5
[09:04:53] Written by Torsten Seemann [email protected]
[09:04:53] Homepage is https://github.com/tseemann/prokka
[09:04:53] Local time is Mon Sep 9 09:04:53 2024
[09:04:53] You are vupadhyay
[09:04:53] Operating system is linux
[09:04:53] You have BioPerl 1.006924
[09:04:53] System has 48 cores.
[09:04:53] Will use maximum of 48 cores.
[09:04:53] Annotating as >>> Bacteria <<<
[09:04:53] Creating new output folder: GCF_000210695v1_20240903
[09:04:53] Running: mkdir -p GCF_000210695v1_20240903
[09:04:53] Using filename prefix: GCF_000210695v1_20240903.XXX
[09:04:53] Setting HMMER_NCPU=1
[09:04:53] Writing log to: GCF_000210695v1_20240903/GCF_000210695v1_20240903.log
[09:04:53] Command: /programs/prokka/bin/prokka --kingdom Bacteria --outdir GCF_000210695v1_20240903 --genus Anaerostipes --locustag GCF_000210695v1 --prefix GCF_000210695v1_20240903 --cpus 0 genomic_fna/GCF_000210695.1_ASM21069v1_genomic.fna
[09:04:53] Appending to PATH: /programs/prokka/bin/../binaries/linux
[09:04:53] Appending to PATH: /programs/prokka/bin/../binaries/linux/../common
[09:04:53] Appending to PATH: /programs/prokka/bin
[09:04:53] Looking for 'aragorn' - found /usr/bin/aragorn
[09:04:53] Determined aragorn version is 001002 from 'ARAGORN v1.2.36 Dean Laslett'
[09:04:53] Looking for 'barrnap' - found /usr/bin/barrnap
[09:04:53] Determined barrnap version is 000007 from 'barrnap 0.7'
[09:04:53] Looking for 'blastp' - found /programs/ncbi-blast/bin/blastp
[09:04:54] Determined blastp version is 002009 from 'blastp: 2.9.0+'
[09:04:54] Looking for 'cmpress' - found /usr/bin/cmpress
[09:04:54] Determined cmpress version is 001001 from '# INFERNAL 1.1.1 (July 2014)'
[09:04:54] Looking for 'cmscan' - found /usr/bin/cmscan
[09:04:54] Determined cmscan version is 001001 from '# INFERNAL 1.1.1 (July 2014)'
[09:04:54] Looking for 'egrep' - found /bin/egrep
[09:04:54] Looking for 'find' - found /usr/bin/find
[09:04:54] Looking for 'grep' - found /bin/grep
[09:04:54] Looking for 'hmmpress' - found /usr/bin/hmmpress
[09:04:54] Determined hmmpress version is 003001 from '# HMMER 3.1b2 (February 2015); http://hmmer.org/'
[09:04:54] Looking for 'hmmscan' - found /usr/bin/hmmscan
[09:04:54] Determined hmmscan version is 003001 from '# HMMER 3.1b2 (February 2015); http://hmmer.org/'
[09:04:54] Looking for 'java' - found /usr/bin/java
[09:04:54] Looking for 'makeblastdb' - found /programs/ncbi-blast/bin/makeblastdb
[09:04:54] Determined makeblastdb version is 002009 from 'makeblastdb: 2.9.0+'
[09:04:54] Looking for 'minced' - found /programs/prokka/bin/../binaries/linux/../common/minced
[09:04:54] Determined minced version is 002000 from 'minced 0.2.0'
[09:04:54] Looking for 'parallel' - found /usr/bin/parallel
[09:04:55] Determined parallel version is 20161222 from 'GNU parallel 20161222'
[09:04:55] Looking for 'prodigal' - found /usr/bin/prodigal
[09:04:55] Determined prodigal version is 002006 from 'Prodigal V2.6.2: February, 2015'
[09:04:55] Looking for 'prokka-genbank_to_fasta_db' - found /programs/prokka/bin/prokka-genbank_to_fasta_db
[09:04:55] Looking for 'sed' - found /bin/sed
[09:04:55] Looking for 'tbl2asn' - found /programs/prokka/bin/../binaries/linux/tbl2asn
[tbl2asn] This copy of tbl2asn is more than a year old. Please download the current version.
[09:04:55] Determined tbl2asn version is 025007 from 'tbl2asn 25.7 arguments:'
[09:04:55] Using genetic code table 11.
[09:04:55] Loading and checking input file: genomic_fna/GCF_000210695.1_ASM21069v1_genomic.fna
[09:04:55] Wrote 1 contigs totalling 3114788 bp.
[09:04:55] Predicting tRNAs and tmRNAs
[09:04:55] Running: aragorn -l -gc11 -w GCF_000210695v1_20240903/GCF_000210695v1_20240903.fna
[09:04:56] 1 tRNA-Ser c[61020,61108] 35 (gct)
[09:04:56] 2 tRNA-Glu [118882,118953] 34 (ttc)
[09:04:56] 3 tRNA-Cys [118986,119056] 33 (gca)
[09:04:56] 4 tRNA-Met [119099,119173] 35 (cat)
[09:04:56] 5 tRNA-Glu [218867,218938] 34 (ttc)
[09:04:56] 6 tRNA-Thr [218973,219045] 34 (tgt)
[09:04:56] 7 tRNA-Met [219052,219125] 35 (cat)
[09:04:56] 8 tRNA-Asp [219150,219224] 35 (gtc)
[09:04:56] 9 tRNA-Val [219250,219323] 34 (tac)
[09:04:56] 10 tRNA-Leu [219353,219440] 36 (taa)
[09:04:56] 11 tRNA-Arg [219487,219561] 35 (acg)
[09:04:56] 12 tRNA-Tyr [559549,559631] 35 (gta)
[09:04:56] 13 tRNA-Leu [559635,559720] 35 (taa)
[09:04:56] 14 tRNA-Ser [685506,685593] 37 (tga)
[09:04:56] 15 tRNA-Ser [685621,685711] 36 (gga)
[09:04:56] 16 tRNA-Arg c[1373568,1373642] 35 (ccg)
[09:04:56] 17 tRNA-Met [1474024,1474097] 35 (cat)
[09:04:56] 18 tRNA-Gln [1696504,1696574] 33 (ctg)
[09:04:56] 19 tRNA-Lys [1696580,1696653] 34 (ttt)
[09:04:56] 20 tRNA-Gln [1696743,1696813] 33 (ctg)
[09:04:56] 21 tRNA-Lys [1696818,1696890] 34 (ttt)
[09:04:56] 22 tRNA-Thr c[1703992,1704064] 34 (ggt)
[09:04:56] 23 tRNA-Glu [1713841,1713914] 35 (ctc)
[09:04:56] 24 tRNA-Arg [1736603,1736678] 36 (tct)
[09:04:56] 25 tRNA-His [1736720,1736794] 35 (gtg)
[09:04:56] 26 tRNA-Gln [1736818,1736889] 33 (ttg)
[09:04:56] 27 tRNA-Lys [1736908,1736980] 34 (ttt)
[09:04:56] 28 tRNA-Leu [1737319,1737401] 36 (tag)
[09:04:56] 29 tRNA-Arg [1737956,1738031] 36 (tct)
[09:04:56] 30 tRNA-His [1738079,1738153] 35 (gtg)
[09:04:56] 31 tRNA-Lys [1738193,1738266] 34 (ttt)
[09:04:56] 32 tRNA-Asp [1813557,1813631] 35 (gtc)
[09:04:56] 33 tRNA-Val [1813650,1813723] 34 (tac)
[09:04:56] 34 tRNA-Thr [1813743,1813815] 34 (tgt)
[09:04:56] 35 tRNA-Tyr [1813865,1813947] 35 (gta)
[09:04:56] 36 tRNA-Met [1813958,1814033] 36 (cat)
[09:04:56] 37 tRNA-Phe [1814055,1814130] 35 (gaa)
[09:04:56] 38 tRNA-Gly [1838421,1838491] 33 (tcc)
[09:04:56] 39 tRNA-Gly [1958568,1958640] 34 (gcc)
[09:04:56] 40 tRNA-Thr [2067260,2067332] 34 (cgt)
[09:04:56] 41 tRNA-Lys c[2247340,2247413] 34 (ctt)
[09:04:56] 42 tRNA-Phe [2307923,2307997] 35 (gaa)
[09:04:56] 43 tRNA-Gly [2308003,2308074] 33 (tcc)
[09:04:56] 44 tRNA-Met [2599944,2600019] 36 (cat)
[09:04:56] 45 tRNA-Val [2674822,2674895] 34 (tac)
[09:04:56] 46 tRNA-Met [2674932,2675006] 35 (cat)
[09:04:56] 47 tRNA-Ser c[2787732,2787815] 35 (cag)
[09:04:56] 48 tRNA-Leu c[2787820,2787905] 35 (aag)
[09:04:56] 49 tRNA-Arg c[2863143,2863217] 35 (cct)
[09:04:56] 50 tRNA-Trp [2930556,2930627] 34 (cca)
[09:04:56] 51 tmRNA [2957248,2957587] 86,115 ADNKLAYAA*
[09:04:56] 52 tRNA-Ile [2979859,2979934] 34 (tat)
[09:04:56] 53 tRNA-Leu c[3075729,3075812] 35 (caa)
[09:04:56] Found 53 tRNAs
[09:04:56] Predicting Ribosomal RNAs
[09:04:56] Running Barrnap with 48 threads
[09:04:56] 1 NC_021016.1 44193 16S ribosomal RNA
[09:04:56] Found 1 rRNAs
[09:04:56] Skipping ncRNA search, enable with --rfam if desired.
[09:04:56] Total of 53 tRNA + rRNA features
[09:04:56] Searching for CRISPR repeats
[09:04:57] Found 0 CRISPRs
[09:04:57] Predicting coding sequences
[09:04:57] Contigs total 3114788 bp, so using single mode
[09:04:57] Running: prodigal -i GCF_000210695v1_20240903/GCF_000210695v1_20240903.fna -c -m -g 11 -p single -f sco -q
[09:05:03] Found 3014 CDS
[09:05:03] Connecting features back to sequences
[09:05:03] Not using genus-specific database. Try --usegenus to enable it.
[09:05:03] Annotating CDS, please be patient.
[09:05:03] Will use 48 CPUs for similarity searching.
[09:05:05] There are still 3014 unannotated CDS left (started with 3014)
[09:05:05] Will use blast to search against /programs/prokka/db/kingdom/Bacteria/IS with 48 CPUs
[09:05:05] Running: cat GCF_000210695v1_20240903/GCF_000210695v1_20240903.IS.tmp.3673189.faa | parallel --gnu --plain -j 48 --block 9615 --recstart '>' --pipe blastp -query - -db /programs/prokka/db/kingdom/Bacteria/IS -evalue 1e-30 -qcov_hsp_perc 90 -num_threads 1 -num_descriptions 1 -num_alignments 1 -seg no > GCF_000210695v1_20240903/GCF_000210695v1_20240903.IS.tmp.3673189.blast 2> /dev/null
[09:05:06] Could not run command: cat GCF_000210695v1_20240903/GCF_000210695v1_20240903.IS.tmp.3673189.faa | parallel --gnu --plain -j 48 --block 9615 --recstart '>' --pipe blastp -query - -db /programs/prokka/db/kingdom/Bacteria/IS -evalue 1e-30 -qcov_hsp_perc 90 -num_threads 1 -num_descriptions 1 -num_alignments 1 -seg no > GCF_000210695v1_20240903/GCF_000210695v1_20240903.IS.tmp.3673189.blast 2> /dev/null
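
The failing command sends stderr to /dev/null, so the underlying blastp or parallel message is hidden. Two checks may help narrow it down, sketched below under stated assumptions: `missing_index_files` and `unmask_stderr` are hypothetical helpers, and the idea that the IS database lacks its BLAST index files is only one possible cause, not something confirmed in this thread. The extensions `.phr`, `.pin`, and `.psq` are the index files makeblastdb writes for a protein database.

```python
from pathlib import Path

# Index extensions written by makeblastdb for a protein database.
PROTEIN_INDEX_EXTENSIONS = (".phr", ".pin", ".psq")


def missing_index_files(db_prefix, extensions=PROTEIN_INDEX_EXTENSIONS):
    """Return the index files that are absent for the database at db_prefix."""
    return [db_prefix + ext for ext in extensions if not Path(db_prefix + ext).is_file()]


def unmask_stderr(command, redirect="2> /dev/null"):
    """Drop a trailing stderr redirect so the failing tool's message is visible."""
    stripped = command.rstrip()
    if stripped.endswith(redirect):
        return stripped[: -len(redirect)].rstrip()
    return stripped
```

`missing_index_files("/programs/prokka/db/kingdom/Bacteria/IS")` returning a non-empty list would suggest the Prokka databases were never indexed, in which case Prokka's `--setupdb` option is the usual way to rebuild them. If the list is empty, rerunning the output of `unmask_stderr(...)` on the logged command in a shell should show the real error.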
