raise FileNotFoundError() while parsing hmmtext #37

Open
ZehanDai opened this issue Apr 13, 2021 · 2 comments

Comments

@ZehanDai

When I tried to run bigslice with my deepBGC predicted output, an error occurred.

Here's the log output. (I tried to figure out the cause by modifying the original scripts and printing intermediate data objects, so you may see some odd messages in the logs.)

(BiGSLICE-py3.6) wolfgang@DESKTOP-647U8AG:/mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script$ bigslice -i `pwd`/test_wd `pwd`/test_oud

pid 14503's current affinity list: 0-11
pid 14503's new affinity list: 11
pid 14504's current affinity list: 0-11
pid 14504's new affinity list: 10
pid 14505's current affinity list: 0-11
pid 14505's new affinity list: 9
pid 14506's current affinity list: 0-11
pid 14506's new affinity list: 8
pid 14507's current affinity list: 0-11
pid 14507's new affinity list: 7
pid 14508's current affinity list: 0-11
pid 14508's new affinity list: 6
pid 14509's current affinity list: 0-11
pid 14509's new affinity list: 5
pid 14510's current affinity list: 0-11
pid 14510's new affinity list: 4
pid 14511's current affinity list: 0-11
pid 14511's new affinity list: 3
pid 14512's current affinity list: 0-11
pid 14512's new affinity list: 2
pid 14513's current affinity list: 0-11
pid 14513's new affinity list: 1
pid 14514's current affinity list: 0-11
pid 14514's new affinity list: 0
pid 14479's current affinity list: 0-11
pid 14479's new affinity list: 0-11
Folder /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_oud exists! continue running program (Y/[N])? Y

output_folder /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_oud

File /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_oud/result/data.db.bak exists! it will get overwritten, continue (Y/[N])?Y
Loading database into memory (this can take a while)...

data_db_path /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_oud/result/data.db

[0.06678962707519531s] loading sqlite3 database
Using HMM database version 'bigslice-models-R01' (built using antiSMASH version 5.1.1)
Loading HMM databases...
[1.1011223793029785s] loading hmm databases

metadata_file /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/datasets.tsv SRR10037259_1
/mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/datasets/SRR10037259_1
/mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/taxonomy/SRR10037259_1.taxonomy.tsv
NA
folder_path /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd

dataset_folder_path /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/datasets/SRR10037259_1

SRR10037265_1
/mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/datasets/SRR10037265_1
/mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/taxonomy/SRR10037265_1.taxonomy.tsv
NA
folder_path /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd

dataset_folder_path /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/datasets/SRR10037265_1

SRR10037270_1
/mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/datasets/SRR10037270_1
/mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/taxonomy/SRR10037270_1.taxonomy.tsv
NA
folder_path /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd

dataset_folder_path /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/datasets/SRR10037270_1

SRR10338929_1
/mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/datasets/SRR10338929_1
/mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/taxonomy/SRR10338929_1.taxonomy.tsv
NA
folder_path /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd

dataset_folder_path /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/datasets/SRR10338929_1

SRR10338933_1
/mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/datasets/SRR10338933_1
/mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/taxonomy/SRR10338933_1.taxonomy.tsv
NA
folder_path /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd

dataset_folder_path /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/datasets/SRR10338933_1

SRR10338934_1
/mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/datasets/SRR10338934_1
/mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/taxonomy/SRR10338934_1.taxonomy.tsv
NA
folder_path /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd

dataset_folder_path /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/datasets/SRR10338934_1

SRR10338936_1
/mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/datasets/SRR10338936_1
/mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/taxonomy/SRR10338936_1.taxonomy.tsv
NA
folder_path /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd

dataset_folder_path /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/datasets/SRR10338936_1

SRR10583077_1
/mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/datasets/SRR10583077_1
/mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/taxonomy/SRR10583077_1.taxonomy.tsv
NA
folder_path /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd

dataset_folder_path /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/datasets/SRR10583077_1

SRR10613871_1
/mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/datasets/SRR10613871_1
/mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/taxonomy/SRR10613871_1.taxonomy.tsv
NA
folder_path /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd

dataset_folder_path /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_wd/datasets/SRR10613871_1


eligible_regexes [re.compile('^BGC[0-9]{7}$'), re.compile('^.+\\.cluster[0-9]+$'), re.compile('^.+\\.region[0-9]+$')]

Found 8 BGCs in the database.
[0.006556510925292969s] processing dataset: SRR10037259_1
Found 0 BGCs in the database.
[0.00015473365783691406s] processing dataset: SRR10037265_1
Found 0 BGCs in the database.
[0.0001461505889892578s] processing dataset: SRR10037270_1
Found 0 BGCs in the database.
[0.0001442432403564453s] processing dataset: SRR10338929_1
Found 0 BGCs in the database.
[0.00014829635620117188s] processing dataset: SRR10338933_1
Found 0 BGCs in the database.
[0.00015497207641601562s] processing dataset: SRR10338934_1
Found 0 BGCs in the database.
[0.0001513957977294922s] processing dataset: SRR10338936_1
Found 0 BGCs in the database.
[0.00014734268188476562s] processing dataset: SRR10583077_1
Found 0 BGCs in the database.
[0.0001461505889892578s] processing dataset: SRR10613871_1
dataset_name, dataset_bgc_ids SRR10613871_1 []
Found 8 BGC(s) from 9 dataset(s)

self <bigslice.modules.data.database.Database object at 0x7f19acc25cc0>

Dumping in-memory database content into /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_oud/result/data.db...
self._db_path /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_oud/result/data.db

0.0887s
Checking run status of 8 BGCs...
[0.00022101402282714844s] checking run status
Doing biosyn_pfam scan on 8 BGCs...
2 BGCs are already scanned in previous run
Preparing fasta files for hmmscans...
Running hmmscans in parallel...
Parsing hmmscans results...

self <bigslice.modules.data.database.Database object at 0x7f19acc25cc0>

Dumping in-memory database content into /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_oud/result/data.db...
self._db_path /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script/test_oud/result/data.db

0.1545s
Traceback (most recent call last):
  File "/mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/bin/bigslice", line 1607, in <module>
    return_code = main()
  File "/mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/bin/bigslice", line 1226, in main
    out_result_path, hmm_ids):
  File "/mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/lib/python3.6/site-packages/bigslice/modules/data/hsp.py", line 75, in parse_hmmtext
    raise FileNotFoundError()
FileNotFoundError

(BiGSLICE-py3.6) wolfgang@DESKTOP-647U8AG:/mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/wd/script$

The hsp.py module and the bigslice script seem a bit complicated to me. Would you please take a look and tell me how to solve this?

I created the input folder with the tree structure shown below, following the input_folder_template. All GBK files were formatted with generate_antismash_gbk.py.


test_wd
├── datasets
│   ├── SRR10037259_1
│   │   ├── SRR10037259_1.region001.gbk
│   │   ├── SRR10037259_1.region002.gbk
│   │   ├── SRR10037259_1.region003.gbk
│   │   ├── SRR10037259_1.region004.gbk
│   │   ├── SRR10037259_1.region005.gbk
│   │   ├── SRR10037259_1.region006.gbk
│   │   ├── SRR10037259_1.region007.gbk
│   │   └── SRR10037259_1.region008.gbk
│   ├── SRR10037265_1
│   │   ├── SRR10037265_1.region001.gbk
│   │   ├── SRR10037265_1.region002.gbk
│   │   ├── SRR10037265_1.region003.gbk
│   │   ├── SRR10037265_1.region004.gbk
│   │   └── SRR10037265_1.region005.gbk
│   ├── SRR10037270_1

...

│   │   ├── SRR10583077_1.region004.gbk
│   │   ├── SRR10583077_1.region005.gbk
│   │   └── SRR10583077_1.region006.gbk
│   └── SRR10613871_1
│       ├── SRR10613871_1.region001.gbk
│       ├── SRR10613871_1.region002.gbk
│       ├── SRR10613871_1.region003.gbk
│       ├── SRR10613871_1.region004.gbk
│       ├── SRR10613871_1.region005.gbk
│       ├── SRR10613871_1.region006.gbk
│       ├── SRR10613871_1.region007.gbk
│       ├── SRR10613871_1.region008.gbk
│       ├── SRR10613871_1.region009.gbk
│       ├── SRR10613871_1.region010.gbk
│       ├── SRR10613871_1.region011.gbk
│       ├── SRR10613871_1.region012.gbk
│       ├── SRR10613871_1.region013.gbk
│       ├── SRR10613871_1.region014.gbk
│       ├── SRR10613871_1.region015.gbk
│       ├── SRR10613871_1.region016.gbk
│       ├── SRR10613871_1.region017.gbk
│       └── SRR10613871_1.region018.gbk
├── datasets.tsv
└── taxonomy
    ├── SRR10037259_1.taxonomy.tsv
    ├── SRR10037265_1.taxonomy.tsv
    ├── SRR10037270_1.taxonomy.tsv
    ├── SRR10338929_1.taxonomy.tsv
    ├── SRR10338933_1.taxonomy.tsv
    ├── SRR10338934_1.taxonomy.tsv
    ├── SRR10338936_1.taxonomy.tsv
    ├── SRR10583077_1.taxonomy.tsv
    └── SRR10613871_1.taxonomy.tsv

11 directories, 96 files
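
The `eligible_regexes` line in the log lists the name patterns printed during this run. As a rough sanity check on a tree like the one above, the following sketch counts, per dataset folder, how many `.gbk` files have a name matching one of those patterns. It is not part of BiG-SLiCE: the helper names `is_eligible` and `count_eligible_gbks` are hypothetical, and the assumption that the patterns apply to the file name without its `.gbk` extension is not confirmed by the source.

```python
import re
from pathlib import Path

# Patterns copied from the `eligible_regexes` line printed in the log.
ELIGIBLE_REGEXES = [
    re.compile(r"^BGC[0-9]{7}$"),
    re.compile(r"^.+\.cluster[0-9]+$"),
    re.compile(r"^.+\.region[0-9]+$"),
]


def is_eligible(stem):
    """True if a file name (without '.gbk') matches any logged pattern."""
    # Assumption: the patterns are applied to the name without extension.
    return any(rx.match(stem) for rx in ELIGIBLE_REGEXES)


def count_eligible_gbks(datasets_dir):
    """Return {dataset_name: (eligible_count, total_gbk_count)}."""
    counts = {}
    for dataset in sorted(Path(datasets_dir).iterdir()):
        if not dataset.is_dir():
            continue
        gbks = list(dataset.glob("*.gbk"))
        eligible = sum(is_eligible(p.stem) for p in gbks)
        counts[dataset.name] = (eligible, len(gbks))
    return counts
```

Calling `count_eligible_gbks("test_wd/datasets")` would return one `(eligible, total)` pair per dataset folder. A dataset whose two numbers differ contains GBK files whose names match none of the logged patterns.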
@ZehanDai
Author

In addition, I tried running with input_folder_template, and everything looked fine.

(BiGSLICE-py3.6) wolfgang@DESKTOP-647U8AG:/mnt/d/SoftwareInstallation/01linux/06BGC/bigslice-master/misc/input_folder_template$ bigslice -i `pwd` test_oud
pid 12218's current affinity list: 0-11
pid 12218's new affinity list: 11
pid 12219's current affinity list: 0-11
pid 12219's new affinity list: 10
pid 12220's current affinity list: 0-11
pid 12220's new affinity list: 9
pid 12221's current affinity list: 0-11
pid 12221's new affinity list: 8
pid 12222's current affinity list: 0-11
pid 12222's new affinity list: 7
pid 12223's current affinity list: 0-11
pid 12223's new affinity list: 6
pid 12224's current affinity list: 0-11
pid 12224's new affinity list: 5
pid 12225's current affinity list: 0-11
pid 12225's new affinity list: 4
pid 12226's current affinity list: 0-11
pid 12226's new affinity list: 3
pid 12227's current affinity list: 0-11
pid 12227's new affinity list: 2
pid 12228's current affinity list: 0-11
pid 12228's new affinity list: 1
pid 12229's current affinity list: 0-11
pid 12229's new affinity list: 0
pid 12194's current affinity list: 0-11
pid 12194's new affinity list: 0-11
creating output folder...
template_dir /mnt/d/Programming/miniconda3/envs/BiGSLICE-py3.6/lib/python3.6/site-packages/bigslice/modules/output/flask_app
output_folder /mnt/d/SoftwareInstallation/01linux/06BGC/bigslice-master/misc/input_folder_template/test_oud

output_folder /mnt/d/SoftwareInstallation/01linux/06BGC/bigslice-master/misc/input_folder_template/test_oud

Loading database into memory (this can take a while)...

data_db_path /mnt/d/SoftwareInstallation/01linux/06BGC/bigslice-master/misc/input_folder_template/test_oud/result/data.db

[0.008028745651245117s] loading sqlite3 database
Using HMM database version 'bigslice-models-R01' (built using antiSMASH version 5.1.1)
Loading HMM databases...
[2.8430631160736084s] loading hmm databases

metadata_file /mnt/d/SoftwareInstallation/01linux/06BGC/bigslice-master/misc/input_folder_template/datasets.tsv
dataset_1
dataset_1/
taxonomy/dataset_1_taxonomy.tsv
Dummy dataset #1, please replace.
folder_path /mnt/d/SoftwareInstallation/01linux/06BGC/bigslice-master/misc/input_folder_template

dataset_folder_path /mnt/d/SoftwareInstallation/01linux/06BGC/bigslice-master/misc/input_folder_template/dataset_1/


eligible_regexes [re.compile('^BGC[0-9]{7}$'), re.compile('^.+\\.cluster[0-9]+$'), re.compile('^.+\\.region[0-9]+$')]

processing dataset: dataset_1...
Found 0 BGCs from 0 GBKs, another 2 to be parsed.
Parsing and inserting 2 GBKs...

Inserted 2 new BGCs.
Parsing and inserting taxonomy information...
Added taxonomy info for 2 BGCs...
[0.0585322380065918s] processing dataset: dataset_1
dataset_name, dataset_bgc_ids dataset_1 {1, 2}
Found 2 BGC(s) from 1 dataset(s)

self <bigslice.modules.data.database.Database object at 0x7f08489189b0>

Dumping in-memory database content into /mnt/d/SoftwareInstallation/01linux/06BGC/bigslice-master/misc/input_folder_template/test_oud/result/data.db...
self._db_path /mnt/d/SoftwareInstallation/01linux/06BGC/bigslice-master/misc/input_folder_template/test_oud/result/data.db

0.1296s
Checking run status of 2 BGCs...
[0.0001609325408935547s] checking run status
Doing biosyn_pfam scan on 2 BGCs...
0 BGCs are already scanned in previous run
Preparing fasta files for hmmscans...
Running hmmscans in parallel...
Parsing hmmscans results...

self <bigslice.modules.data.database.Database object at 0x7f08489189b0>

Dumping in-memory database content into /mnt/d/SoftwareInstallation/01linux/06BGC/bigslice-master/misc/input_folder_template/test_oud/result/data.db...
self._db_path /mnt/d/SoftwareInstallation/01linux/06BGC/bigslice-master/misc/input_folder_template/test_oud/result/data.db

0.0895s
[7.026905298233032s] biosyn_pfam scan
run_status is now BIOSYN_SCANNED
Doing sub_pfam scan on 2 BGCs...
0 BGCs are already scanned in previous run
Preparing fasta files for subpfam_scans...
Running subpfam_scans in parallel...
Parsing subpfam_scans results...
[4.56568455696106s] sub_pfam scan

self <bigslice.modules.data.database.Database object at 0x7f08489189b0>

Dumping in-memory database content into /mnt/d/SoftwareInstallation/01linux/06BGC/bigslice-master/misc/input_folder_template/test_oud/result/data.db...
self._db_path /mnt/d/SoftwareInstallation/01linux/06BGC/bigslice-master/misc/input_folder_template/test_oud/result/data.db

0.1135s
run_status is now SUBPFAM_SCANNED
Extracting features from 2 BGCs...
0 BGCs are already extracted in previous run
Extracting features...
[0.08884334564208984s] features extraction

self <bigslice.modules.data.database.Database object at 0x7f08489189b0>

Dumping in-memory database content into /mnt/d/SoftwareInstallation/01linux/06BGC/bigslice-master/misc/input_folder_template/test_oud/result/data.db...
self._db_path /mnt/d/SoftwareInstallation/01linux/06BGC/bigslice-master/misc/input_folder_template/test_oud/result/data.db

0.1023s
run_status is now FEATURES_EXTRACTED
Building GCF models...

self <bigslice.modules.data.database.Database object at 0x7f08489189b0>

Dumping in-memory database content into /mnt/d/SoftwareInstallation/01linux/06BGC/bigslice-master/misc/input_folder_template/test_oud/result/data.db...
self._db_path /mnt/d/SoftwareInstallation/01linux/06BGC/bigslice-master/misc/input_folder_template/test_oud/result/data.db

0.1098s
[0.18785667419433594s] clustering
run_status is now CLUSTERING_FINISHED
Assigning GCF membership...
[0.04875326156616211s] membership_assignment
run_status is now MEMBERSHIPS_ASSIGNED
[1.9073486328125e-06s] preparing_output
run_status is now RUN_FINISHED

self <bigslice.modules.data.database.Database object at 0x7f08489189b0>

Dumping in-memory database content into /mnt/d/SoftwareInstallation/01linux/06BGC/bigslice-master/misc/input_folder_template/test_oud/result/data.db...
self._db_path /mnt/d/SoftwareInstallation/01linux/06BGC/bigslice-master/misc/input_folder_template/test_oud/result/data.db

0.1062s
BiG-SLiCE run complete!
(BiGSLICE-py3.6) wolfgang@DESKTOP-647U8AG:/mnt/d/SoftwareInstallation/01linux/06BGC/bigslice-master/misc/input_folder_template$

@satriaphd
Member

Sorry for the (very) late reply. Have you solved this in the meantime? The error in your first log suggests that this was a second or later run, and that the first run was somehow corrupted. Have you tried rerunning from scratch?
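
One way to rerun from scratch without losing the earlier results is to move the existing output folder aside, so that the next run creates a new one. The sketch below does only that. The function name `set_aside_output` and the `.bak-<timestamp>` naming are hypothetical choices, not something BiG-SLiCE provides.

```python
import shutil
import time
from pathlib import Path


def set_aside_output(output_dir):
    """Rename an existing output folder so the next run starts clean.

    Returns the backup path, or None if there was nothing to move.
    """
    out = Path(output_dir)
    if not out.exists():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = out.with_name(out.name + ".bak-" + stamp)
    # Avoid colliding with a backup made within the same second.
    n = 1
    while backup.exists():
        backup = out.with_name(out.name + ".bak-" + stamp + "-" + str(n))
        n += 1
    shutil.move(str(out), str(backup))
    return backup
```

After calling `set_aside_output("test_oud")` in the working directory, rerun the same command as in the report (``bigslice -i `pwd`/test_wd `pwd`/test_oud``). It should then no longer prompt about an existing folder or an existing `data.db.bak`.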
