loadTaxonomy.pl output a 80K taxonomy.tmp.sqlite, I'm not sure that's right. #2

Open
NailouZhang opened this issue Jan 3, 2022 · 1 comment

Comments

@NailouZhang

Hi @marieBvr ,

I ran into the same issue as #1. I continued with the installation pipeline anyway, but the size of taxonomy.tmp.sqlite did not change. Can you help me resolve it?

Sincerely yours,

Nailou

PS: The installation information is as follows:

Activate the conda environment:

cd ~/20T/DataBase/SoftwaresEnsembel/MAG/virAnnot
source ~/20T/DataBase/SoftwaresEnsembel/MiniConda/Source.sh
conda activate VirAnnot
export PERL5LIB=/home/stone/20T/DataBase/SoftwaresEnsembel/MAG/virAnnot/lib:$PERL5LIB
export PATH=$PATH:/home/stone/20T/DataBase/SoftwaresEnsembel/MAG/virAnnot/Tools
export PATH=/home/stone/20T/DataBase/SoftwaresEnsembel/MAG/virAnnot/tools:/home/stone/20T/DataBase/SoftwaresEnsembel/MAG/virAnnot/launchers:$PATH
export PATH=$PATH:/home/stone/20T/DataBase/SoftwaresEnsembel/MAG/virAnnot/db
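
The exports above mention both `virAnnot/Tools` and `virAnnot/tools`; on a case-sensitive filesystem these are different directories, so it may be worth confirming that the scripts actually resolve on `PATH`. A minimal sketch follows. The helper name `check_on_path` is hypothetical, and the script names are simply the ones invoked later in this post.

```shell
#!/usr/bin/env bash
# Report where each required script resolves on PATH, or flag it as missing.
# check_on_path is a hypothetical helper, not part of virAnnot.
check_on_path() {
  local missing=0 name
  for name in "$@"; do
    if command -v "$name" >/dev/null 2>&1; then
      printf 'ok       %s -> %s\n' "$name" "$(command -v "$name")"
    else
      printf 'MISSING  %s\n' "$name"
      missing=1
    fi
  done
  return "$missing"
}

# Script names taken from the commands used in this post.
check_on_path loadTaxonomy.pl listPath.pl taxo_profile_to_sql.pl || true
```

A `MISSING` line would suggest that the matching `export PATH=...` entry points at a directory that does not exist or does not contain the script.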

# Download & install the datasets
####################################################################################################
cd /home/stone/20T/DataBase/SoftwaresEnsembel/BigDataBase
#NCBI Taxonomy
#Download and extract NCBI taxonomy files.
#wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz ;
tar -xf taxdump.tar.gz;
#wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz ;
gunzip prot.accession2taxid.gz;
#wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/nucl_gb.accession2taxid.gz ;
gunzip nucl_gb.accession2taxid.gz;

#wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/dead_prot.accession2taxid.gz ;
gunzip dead_prot.accession2taxid.gz;
cat prot.accession2taxid dead_prot.accession2taxid > acc2taxid.prot

#wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/nucl_wgs.accession2taxid.gz ;
gunzip nucl_wgs.accession2taxid.gz;
#wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/dead_wgs.accession2taxid.gz ;
gunzip dead_wgs.accession2taxid.gz
cat nucl_wgs.accession2taxid nucl_gb.accession2taxid dead_wgs.accession2taxid > acc2taxid.nucl
#wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/dead_nucl.accession2taxid.gz;
gunzip dead_nucl.accession2taxid.gz;

#create gi_taxid_prot_new.dmp
awk '{ print $4 " " $3}' acc2taxid.prot > gi_taxid_prot_temp1.dmp
tail -n +2 gi_taxid_prot_temp1.dmp > gi_taxid_prot_temp2.dmp
rm gi_taxid_prot_temp1.dmp
tr ' ' '\t' < gi_taxid_prot_temp2.dmp > gi_taxid_prot_new.dmp
rm gi_taxid_prot_temp2.dmp
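
The awk/tail/tr steps above can probably be collapsed into a single awk pass, which also avoids a quoting pitfall with `tr`. This is only a sketch: it assumes the accession2taxid columns are accession, accession.version, taxid, gi, and that every header line starts with the literal word `accession` (after `cat`, a second header can end up in the middle of the file). The function name `make_gi_taxid` and the sample rows are made up for illustration.

```shell
#!/usr/bin/env bash
# Build a tab-separated "gi<TAB>taxid" table in one awk pass.
# Assumption: columns are accession, accession.version, taxid, gi, and
# header lines start with the literal word "accession".
make_gi_taxid() {
  awk 'BEGIN { OFS = "\t" } $1 != "accession" { print $4, $3 }' "$1"
}

# Tiny made-up sample with a header at the top and one in the middle,
# as happens when two accession2taxid files are concatenated.
demo=$(mktemp)
printf 'accession\taccession.version\ttaxid\tgi\n'  > "$demo"
printf 'P00001\tP00001.1\t9606\t111\n'             >> "$demo"
printf 'accession\taccession.version\ttaxid\tgi\n' >> "$demo"
printf 'P00002\tP00002.1\t10090\t222\n'            >> "$demo"
make_gi_taxid "$demo"
rm -f "$demo"

# Quoting matters for tr: the shell strips an unquoted backslash, so tr
# receives the letter "t" instead of a tab escape.
printf 'a b\n' | tr ' ' '\t'   # prints a<TAB>b
printf 'a b\n' | tr ' ' \t     # prints atb
```

On the real data this would be used as `make_gi_taxid acc2taxid.prot > gi_taxid_prot_new.dmp`.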

ln -s /home/stone/20T/DataBase/SoftwaresEnsembel/BigDataBase/* /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/virAnnot/db/

cd /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/virAnnot/db/
mv gi_taxid_prot_new.dmp gi_taxid_prot.dmp

sed -i 's/#!/#!//g' loadTaxonomy.pl
sed -i 's/exit(1)/exit(1)}/g' loadTaxonomy.pl

loadTaxonomy.pl \
  -struct taxonomyStructure.sql \
  -index taxonomyIndex.sql \
  -acc_prot acc2taxid.prot \
  -acc_nucl acc2taxid.nucl \
  -names names.dmp \
  -nodes nodes.dmp \
  -gi_prot gi_taxid_prot.dmp

2022/01/03 13:53:34 INFO> loadTaxonomy.pl:120 main::_create_sqlite_db - Creating database.
2022/01/03 13:53:35 INFO> loadTaxonomy.pl:76 main::_insertingCSVDataInDatabase - Inserting tables into database...
2022/01/03 13:53:35 INFO> loadTaxonomy.pl:78 main::_insertingCSVDataInDatabase - nodes
2022/01/03 13:53:35 INFO> loadTaxonomy.pl:78 main::_insertingCSVDataInDatabase - gi_prot
2022/01/03 13:53:35 INFO> loadTaxonomy.pl:78 main::_insertingCSVDataInDatabase - names
2022/01/03 13:53:35 INFO> loadTaxonomy.pl:78 main::_insertingCSVDataInDatabase - prot_accession2taxid
2022/01/03 13:53:35 INFO> loadTaxonomy.pl:78 main::_insertingCSVDataInDatabase - nucl_accession2taxid
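
In the log above, all five tables are reported within about one second (13:53:34 to 13:53:35), which looks short for input files of this kind and may mean that the inputs were missing, empty, or reached through broken symlinks. A minimal pre-flight sketch, assuming the file names passed to `loadTaxonomy.pl` above; `check_inputs` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Verify that every input file exists and is non-empty before loading.
# check_inputs is a hypothetical helper, not part of virAnnot.
check_inputs() {
  local bad=0 f
  for f in "$@"; do
    if [ -L "$f" ] && [ ! -e "$f" ]; then
      printf 'BROKEN   %s (dangling symlink)\n' "$f"; bad=1
    elif [ ! -e "$f" ]; then
      printf 'MISSING  %s\n' "$f"; bad=1
    elif [ ! -s "$f" ]; then
      printf 'EMPTY    %s\n' "$f"; bad=1
    else
      # Arithmetic expansion strips the padding some wc versions add.
      printf 'ok       %s (%s bytes)\n' "$f" "$(( $(wc -c < "$f") ))"
    fi
  done
  return "$bad"
}

# File names taken from the loadTaxonomy.pl invocation in this post.
check_inputs taxonomyStructure.sql taxonomyIndex.sql \
  acc2taxid.prot acc2taxid.nucl names.dmp nodes.dmp gi_taxid_prot.dmp || true
```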

ll -h taxonomy.tmp.sqlite

-rw-rw-r-- 1 stone stone 80K 1月 3 13:53 taxonomy.tmp.sqlite
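
File size alone does not show which tables, if any, received rows. A sketch that counts the rows of every table with the `sqlite3` command-line shell is given below. It reads the table names from `sqlite_master` rather than assuming them, since the labels in the log above may not match the real table names; `count_rows` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print "<table><TAB><row count>" for every table in a SQLite database.
# count_rows is a hypothetical helper; table names are read from the
# database itself instead of being assumed.
count_rows() {
  local db=$1 t
  sqlite3 "$db" "SELECT name FROM sqlite_master WHERE type = 'table';" |
  while IFS= read -r t; do
    printf '%s\t%s\n' "$t" \
      "$(sqlite3 "$db" "SELECT COUNT(*) FROM \"$t\";" < /dev/null)"
  done
}

# Run only when the sqlite3 CLI and the database file are both present.
if command -v sqlite3 >/dev/null 2>&1 && [ -f taxonomy.tmp.sqlite ]; then
  count_rows taxonomy.tmp.sqlite
fi
```

If every table reports 0 rows, the schema was created but no data was inserted, which would be consistent with an 80K file.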

tar -xzf fasta.tar.gz;
mkdir pfam
mv pfam*.FASTA pfam/
rm *.FASTA
mv pfam/pfam* .

ls -1 pfam*.FASTA |
sed 's,^\(.*\)\.FASTA,./gi2taxonomy.pl -i & -o \1.tax.txt -db taxonomy.tmp.sqlite -r,' | bash
#Create a file of files listing the *.tax.txt files:
listPath.pl -d . | grep 'tax.txt' > idx
#Compute taxonomy statistic for each domain and create a sql file to load into the database:
taxo_profile_to_sql.pl -i idx > taxo_profile.sql  # the output has only two lines?
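
The `ls | sed | bash` step above generates one `gi2taxonomy.pl` command per FASTA file. Since sed's capture-group syntax is easy to get wrong, a plain loop that prints the same commands for review may be easier to debug. This is a sketch only: the options (`-i`, `-o`, `-db`, `-r`) are copied from the command in this post, and `gen_tax_cmds` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print one gi2taxonomy.pl command per pfam*.FASTA file in the current
# directory (dry run). Options are copied from the command in this post.
gen_tax_cmds() {
  local f
  for f in pfam*.FASTA; do
    [ -e "$f" ] || continue   # no match: the glob pattern stays literal
    printf './gi2taxonomy.pl -i %s -o %s -db taxonomy.tmp.sqlite -r\n' \
      "$f" "${f%.FASTA}.tax.txt"
  done
}

gen_tax_cmds   # review the commands first, then run: gen_tax_cmds | bash
```

If this prints nothing, no `pfam*.FASTA` files are in the current directory, which would also leave `idx` and `taxo_profile.sql` nearly empty.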

Load into the database (nothing seems to be imported, because the size of taxonomy.tmp.sqlite does not change):

sqlite3 taxonomy.tmp.sqlite < taxo_profile.sql
####################################################################################################

@marieBvr
Owner

marieBvr commented Jan 3, 2022

Hi @NailouZhang,
Thank you for notifying me about this error. I am refactoring my scripts so that they no longer use this database.
I plan to push the new version of the code this month if everything goes well.
For now, you can skip the database installation, as it will no longer be needed. The first steps of the pipeline (readsoustraction, demultiplex, assembly, map) don't need the database anyway.

Sincerely yours,
Marie

@marieBvr marieBvr added bug Something isn't working help wanted Extra attention is needed labels Jan 3, 2022
@marieBvr marieBvr self-assigned this Jan 3, 2022