Single results file not created #24

Open
JensPee opened this issue Jul 3, 2023 · 1 comment

Comments

@JensPee

JensPee commented Jul 3, 2023

Hi,

When I run or rerun speciesprimer (in a Docker container with 15.51 GB RAM allocated to it), a single results file is not created successfully. I cannot determine from the logs what the problem is. Any help would be appreciated. (I noticed that the BLAST DB is 250 GB, not 60 GB, and I don't know why. Could this be part of the problem?)
Settings are as follows:
{'blastseqs': 500, 'skip_tree': False, 'minsize': 75, 'path': '/primerdesign', 'mfethreshold': 90, 'nolist': False, 'ignore_qc': False, 'maxsize': 150, 'probe': False, 'offline': False, 'nontargetlist': [...], 'assemblylevel': ['complete'], 'skip_download': False, 'target': 'Azotobacter_chroococcum', 'intermediate': False, 'qc_gene': ['rRNA'], 'exception': [], 'mpprimer': -3.5, 'blastdbv5': False, 'customdb': None, 'mfold': -3.0}

The following problem shows up in the logs (attached: speciesprimer_2023_06_25.log):
Run: run_blast - Start BLAST
27 Jun 2023 05:18:42: Run blastn -task blastn-short -num_threads 4 -query primer.part-0 -evalue 500 -out primer_0_results.xml -outfmt 5 -db nt
27 Jun 2023 14:41:00: Run blastn -task blastn-short -num_threads 4 -query primer.part-1 -evalue 500 -out primer_1_results.xml -outfmt 5 -db nt
27 Jun 2023 23:47:50: Run blastn -task blastn-short -num_threads 4 -query primer.part-2 -evalue 500 -out primer_2_results.xml -outfmt 5 -db nt
28 Jun 2023 09:20:30: Run blastn -task blastn-short -num_threads 4 -query primer.part-3 -evalue 500 -out primer_3_results.xml -outfmt 5 -db nt
28 Jun 2023 18:47:13: Run blastn -task blastn-short -num_threads 4 -query primer.part-4 -evalue 500 -out primer_4_results.xml -outfmt 5 -db nt
29 Jun 2023 03:47:32: Run blastn -task blastn-short -num_threads 4 -query primer.part-5 -evalue 500 -out primer_5_results.xml -outfmt 5 -db nt
29 Jun 2023 13:16:20: Run blastn -task blastn-short -num_threads 4 -query primer.part-6 -evalue 500 -out primer_6_results.xml -outfmt 5 -db nt
29 Jun 2023 22:27:04: Run blastn -task blastn-short -num_threads 4 -query primer.part-7 -evalue 500 -out primer_7_results.xml -outfmt 5 -db nt
30 Jun 2023 07:32:29: > Blast duration: 3 days, 2:13:47
30 Jun 2023 07:32:29: Run: run_blastparser(Azotobacter_chroococcum), primer
30 Jun 2023 07:32:29: Run: blast_parser
30 Jun 2023 07:32:29: Run: blastresults_files(Azotobacter_chroococcum)
30 Jun 2023 07:32:46: > A problem with the BLAST results file /primerdesign/Azotobacter_chroococcum/Pangenome/results/primer/primerblast/primer_4_results.xml was detected. Please check if the file was removed and start the run again
30 Jun 2023 07:32:46: ['fatal error while working on', 'Azotobacter_chroococcum', 'check logfile', '/primerdesign/speciesprimer_2023_06_25.log']
fatal error while working on Azotobacter_chroococcum
Traceback (most recent call last):
  File "/pipeline/speciesprimer.py", line 4168, in main
    run_pipeline_for_target(target, config)
  File "/pipeline/speciesprimer.py", line 4082, in run_pipeline_for_target
    config, primer_dict).run_primer_qc()
  File "/pipeline/speciesprimer.py", line 3537, in run_primer_qc
    self.call_blastparser.run_blastparser("primer")
  File "/pipeline/speciesprimer.py", line 2588, in run_blastparser
    align_dict = self.blast_parser(self.primerblast_dir)
  File "/pipeline/speciesprimer.py", line 2518, in blast_parser
    align_dict = self.bp_parse_xml_files(blast_dir)
  File "/pipeline/speciesprimer.py", line 2485, in bp_parse_xml_files
    blastrecords = self.parse_BLASTfile(filename)
  File "/pipeline/speciesprimer.py", line 2155, in parse_BLASTfile
    record_list = list(blast_records)
  File "/usr/local/lib/python3.5/dist-packages/Bio/Blast/NCBIXML.py", line 824, in parse
    expat_parser.Parse(NULL, True) # End of XML record
xml.parsers.expat.ExpatError: no element found: line 3874641, column 0
30 Jun 2023 07:32:46: > Error report:
30 Jun 2023 07:32:46: > for target Azotobacter_chroococcum
30 Jun 2023 07:32:46: > Error 1:
30 Jun 2023 07:32:46: > A problem with the BLAST results file /primerdesign/Azotobacter_chroococcum/Pangenome/results/primer/primerblast/primer_4_results.xml was detected. Please check if the file was removed and start the run again
30 Jun 2023 07:32:46: > for target Azotobacter_chroococcum
30 Jun 2023 07:32:46: > Error 2:
30 Jun 2023 07:32:46: > fatal error while working on Azotobacter_chroococcum check logfile /primerdesign/speciesprimer_2023_06_25.log

I attached the broken file (part 4) and a working file (part 3) for comparison. Both are renamed to .txt so that GitHub will let me upload them.
primer.part-4.txt
primer.part-3.txt
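
The `ExpatError: no element found` at the end of the traceback is what the XML parser reports when a file ends before its closing tags, which is consistent with a BLAST output file that was cut off. A minimal sketch for checking which result files in a directory are well-formed is shown below. The function name `find_incomplete_xml` and the `*_results.xml` pattern are illustrative assumptions based on the file names in the log, and this is not part of speciesprimer itself. It avoids f-strings because the traceback shows Python 3.5 in the container.

```python
import glob
import os
import xml.etree.ElementTree as ET


def find_incomplete_xml(result_dir, pattern="*_results.xml"):
    """Return (path, error message) pairs for XML files that fail to parse.

    Hypothetical helper: the default pattern is an assumption taken from
    the file names that appear in the log (primer_0_results.xml, ...).
    """
    broken = []
    for path in sorted(glob.glob(os.path.join(result_dir, pattern))):
        try:
            # iterparse reads the file incrementally; clearing each element
            # after it is parsed keeps memory use modest for large outputs.
            for _event, elem in ET.iterparse(path):
                elem.clear()
        except ET.ParseError as err:
            # Raised for truncated or otherwise malformed XML.
            broken.append((path, str(err)))
    return broken
```

Called with the directory from the log, for example `find_incomplete_xml("/primerdesign/Azotobacter_chroococcum/Pangenome/results/primer/primerblast")`, it would list every file that the parser cannot read to the end, which helps tell a single failed part from a more general problem.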

@biologger
Owner

Hi,
From the log it looks like the BLAST output file is incomplete. This may be due to a large number of results combined with not enough RAM, although 15 GB should normally be enough.
There is a chance that it will work if you reduce blastseqs to a value below 500.
You could remove the primerblast directory, set blastseqs to a value below 500 in the configuration, and re-run the pipeline.
Another option is to use the ref_prok_rep_genomes database, as it contains far less sequence redundancy.
As for the size of your current nt database: it appears that nt has grown a lot in recent years, so the actual size seems legitimate.
Please tell me whether it works or not. I may need to change the output of the BLAST results from .xml to .csv/.txt, because there I can select the actual data (columns) that are written to the output file, and this may reduce the required RAM.
Cheers
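
To illustrate the idea of switching from XML to a column-based output, here is a rough sketch of what the parsing side could look like. It assumes blastn is run with a tabular format such as `-outfmt "6 qseqid sseqid pident length evalue bitscore"` (format 6 and these specifiers are standard BLAST+ options). The column selection and the function name `iter_tabular_hits` are assumptions for illustration only and do not reflect how speciesprimer is or will be implemented.

```python
import csv

# Assumed column selection; it must match the specifiers passed to -outfmt.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "evalue", "bitscore"]


def iter_tabular_hits(path, columns=COLUMNS):
    """Yield one dict per hit from a tab-separated BLAST output file."""
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue  # skip blank lines and comment lines (as in outfmt 7)
            if len(row) != len(columns):
                # A line with missing fields may indicate a cut-off file.
                raise ValueError(
                    "unexpected number of columns in {}: {}".format(path, row)
                )
            yield dict(zip(columns, row))
```

Because hits are yielded one line at a time, the parser never has to hold the whole result set in memory, in contrast to building a list of all XML records as `parse_BLASTfile` does in the traceback above. All values are returned as strings, so numeric fields such as `evalue` would still need converting.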
