some conflicts and still can not run #1

Closed
marieBvr opened this issue Jul 5, 2021 · 3 comments
@marieBvr

marieBvr commented Jul 5, 2021

Link to issue from @poursalavati

marieBvr added the bug and help wanted labels Jul 5, 2021
marieBvr self-assigned this Jul 5, 2021
@poursalavati

Thank you, dear Marie, for your help and the update.

I tried the Slurm version, and unfortunately it still has some problems.

For example, in loadTaxonomy.pl, line 272, there is unnecessary whitespace before EOF. After removing it, the script runs, but it again creates only an 80 kb SQL database, even though we have enough free space on our HPC.
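
A note on the EOF point: in Perl, a plain `<<EOF` heredoc ends only at a line containing exactly `EOF`, so whitespace around the terminator keeps the heredoc open. A minimal sketch for locating such lines follows. The function name `find_padded_eof` is hypothetical, and it assumes the terminator is literally named EOF, as in the report above.

```shell
# Sketch: list lines where a heredoc terminator named EOF is padded with
# whitespace. The terminator name "EOF" and this helper are assumptions,
# not part of loadTaxonomy.pl.
find_padded_eof() {
    # -n prints line numbers. The pattern matches EOF alone on a line with
    # leading whitespace, or EOF followed only by trailing whitespace.
    grep -nE '^[[:space:]]+EOF[[:space:]]*$|^EOF[[:space:]]+$' "$1"
}
```

Calling `find_padded_eof loadTaxonomy.pl` would print each offending line with its line number, and print nothing if every terminator is flush left.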

This is the output of the run:

>virAnnot/slurm/db$ ./loadTaxonomy.pl -struct taxonomyStructure.sql -index taxonomyIndex.sql -acc_prot acc2taxid.prot -acc_nucl acc2taxid.nucl -names names.dmp -nodes nodes.dmp -gi_prot gi_taxid_prot.dmp -acc_wgs acc2taxid.nucl -dead_prot dead_prot.accession2taxid -dead_nucl dead_nucl.accession2taxid
2021/07/12 21:36:03  INFO> loadTaxonomy.pl:122 main::_create_sqlite_db - Creating database.
2021/07/12 21:36:05  INFO> loadTaxonomy.pl:78 main::_insertingCSVDataInDatabase - Inserting tables into database...
2021/07/12 21:36:05  INFO> loadTaxonomy.pl:80 main::_insertingCSVDataInDatabase - prot_accession2taxid
2021/07/12 21:36:05  INFO> loadTaxonomy.pl:80 main::_insertingCSVDataInDatabase - nucl_accession2taxid
2021/07/12 21:36:05  INFO> loadTaxonomy.pl:80 main::_insertingCSVDataInDatabase - gi_prot
2021/07/12 21:36:05  INFO> loadTaxonomy.pl:80 main::_insertingCSVDataInDatabase - names
2021/07/12 21:36:05  INFO> loadTaxonomy.pl:80 main::_insertingCSVDataInDatabase - nodes

What do you suggest? How can we handle this, import the data into the SQL database, and move on to the next step?
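
In the log above, all five table inserts are stamped within the same second, which would be consistent with (though not proof of) the loader reading little or no data. One possible first check, sketched below, is to confirm that every input file passed on the command line exists and is non-empty. The helper name `check_inputs` is hypothetical, and the cause of the small database is not confirmed anywhere in this thread.

```shell
# Sketch: report byte size and line count for each input file and flag
# files that are missing or empty. check_inputs is a hypothetical helper.
check_inputs() {
    local rc=0 f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            # -s is false for both missing and zero-length files
            echo "MISSING_OR_EMPTY $f"
            rc=1
        else
            # wc reads from stdin so that only the number is printed
            echo "OK $f $(wc -c < "$f" | tr -d ' ') bytes $(wc -l < "$f" | tr -d ' ') lines"
        fi
    done
    return "$rc"
}
```

For the run above this could be called as `check_inputs acc2taxid.prot acc2taxid.nucl names.dmp nodes.dmp gi_taxid_prot.dmp`; a non-zero return status means at least one file was missing or empty.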

@marieBvr

Hi Naser,

Sorry for the late reply. I have been trying to reproduce your issue, but without success.
I found out that the ftp://ftp.ncbi.nih.gov/pub/taxonomy/obsolete/gi_taxid_prot.dmp.gz link is no longer available because the file has been deprecated.

I will try to update the ./loadTaxonomy.pl script to use the new NCBI files. This may take some time; I apologize for the inconvenience.

Sincerely yours,
Marie

@poursalavati

Thank you very much for redeveloping this code.

Yes, as you mentioned, NCBI has changed the structure of the database and some of the files.

Recently, for another tool, I needed the gi_taxid_nucl.dmp.gz and gi_taxid_prot.dmp.gz files, which are no longer available.

This is how I rebuilt these files from the existing NCBI accession2taxid files (it may need to be added to the loadTaxonomy.pl script, or run as a separate script before loadTaxonomy.pl).

To extract gi_taxid_nucl.dmp.gz from acc2taxid.nucl (or from the other accession2taxid files at https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/), written uncompressed as gi_taxid_nucl_new.dmp:

awk '{ print $4 " " $3}' acc2taxid.nucl > gi_taxid_nucl_temp1.dmp

tail -n +2 gi_taxid_nucl_temp1.dmp > gi_taxid_nucl_temp2.dmp

rm gi_taxid_nucl_temp1.dmp

tr ' ' \\t < gi_taxid_nucl_temp2.dmp > gi_taxid_nucl_new.dmp

rm gi_taxid_nucl_temp2.dmp

And to extract gi_taxid_prot.dmp.gz from acc2taxid.prot (or from the other accession2taxid files at https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/), written uncompressed as gi_taxid_prot_new.dmp:

awk '{ print $4 " " $3}' acc2taxid.prot > gi_taxid_prot_temp1.dmp

tail -n +2 gi_taxid_prot_temp1.dmp > gi_taxid_prot_temp2.dmp

rm gi_taxid_prot_temp1.dmp

tr ' ' \\t < gi_taxid_prot_temp2.dmp > gi_taxid_prot_new.dmp

rm gi_taxid_prot_temp2.dmp
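
The two recipes above differ only in their file names, and each could probably be collapsed into a single awk pass with no temporary files. The sketch below keeps the same column order (field 4 as GI, field 3 as taxid) and skips the header line. The helper name `acc2gi_taxid` is hypothetical, and the extra filter that drops rows whose GI field is not numeric is an assumption about the input, not something the original commands do.

```shell
# Sketch: build a two-column "gi<TAB>taxid" file from an accession2taxid
# file. acc2gi_taxid is a hypothetical helper; the column order follows the
# commands above (field 4 = GI, field 3 = taxid).
acc2gi_taxid() {
    # NR > 1 skips the header line. The regex keeps only rows with a numeric
    # GI; this filter is an added assumption and can be removed to match the
    # original commands exactly.
    awk 'NR > 1 && $4 ~ /^[0-9]+$/ { print $4 "\t" $3 }' "$1" > "$2"
}
```

Usage would be `acc2gi_taxid acc2taxid.nucl gi_taxid_nucl_new.dmp` and `acc2gi_taxid acc2taxid.prot gi_taxid_prot_new.dmp`.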

Sincerely yours,

Naser
