loadTaxonomy.pl outputs an 80K taxonomy.tmp.sqlite; I'm not sure that's right. #2

NailouZhang · 2022-01-03T06:28:59Z

I ran into the same issue as #1. I continued with the installation pipeline anyway, but the size of taxonomy.tmp.sqlite did not change. Can you help me resolve it?

Sincerely yours,

Nailou

PS: The installation information is as follows:

Activate the conda environment:

cd ~/20T/DataBase/SoftwaresEnsembel/MAG/virAnnot
source ~/20T/DataBase/SoftwaresEnsembel/MiniConda/Source.sh
conda activate VirAnnot
export PERL5LIB=/home/stone/20T/DataBase/SoftwaresEnsembel/MAG/virAnnot/lib:$PERL5LIB
export PATH=$PATH:/home/stone/20T/DataBase/SoftwaresEnsembel/MAG/virAnnot/Tools
export PATH=/home/stone/20T/DataBase/SoftwaresEnsembel/MAG/virAnnot/tools:/home/stone/20T/DataBase/SoftwaresEnsembel/MAG/virAnnot/launchers:$PATH
export PATH=$PATH:/home/stone/20T/DataBase/SoftwaresEnsembel/MAG/virAnnot/db
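Since PATH is extended by hand here, it may be worth confirming that the virAnnot scripts used later in this thread actually resolve before running anything. This is a minimal sketch; `check_tools` is a hypothetical helper name, and the script names are simply the ones that appear below.

```shell
# Sketch: report which of the given commands cannot be found on PATH.
# check_tools is a hypothetical helper, not part of virAnnot.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "not on PATH: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage, after the exports above:
# check_tools loadTaxonomy.pl gi2taxonomy.pl listPath.pl taxo_profile_to_sql.pl
```

A non-zero return status means at least one script was not found.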

#Download & install the datasets
####################################################################################################
cd /home/stone/20T/DataBase/SoftwaresEnsembel/BigDataBase
#NCBI Taxonomy
#Download and extract NCBI taxonomy files.
#wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz ;
tar -xf taxdump.tar.gz;
#wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz ;
gunzip prot.accession2taxid.gz;
#wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/nucl_gb.accession2taxid.gz ;
gunzip nucl_gb.accession2taxid.gz;

#wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/dead_prot.accession2taxid.gz ;
gunzip dead_prot.accession2taxid.gz;
cat prot.accession2taxid dead_prot.accession2taxid > acc2taxid.prot

#wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/nucl_wgs.accession2taxid.gz ;
gunzip nucl_wgs.accession2taxid.gz;
#wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/dead_wgs.accession2taxid.gz ;
gunzip dead_wgs.accession2taxid.gz
cat nucl_wgs.accession2taxid nucl_gb.accession2taxid dead_wgs.accession2taxid > acc2taxid.nucl
#wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/dead_nucl.accession2taxid.gz;
gunzip dead_nucl.accession2taxid.gz;

#create gi_taxid_prot_new.dmp
awk '{ print $4 " " $3}' acc2taxid.prot > gi_taxid_prot_temp1.dmp
tail -n +2 gi_taxid_prot_temp1.dmp > gi_taxid_prot_temp2.dmp
rm gi_taxid_prot_temp1.dmp
#quote '\t': unquoted, the shell drops the backslash and tr replaces spaces with the letter 't'
tr ' ' '\t' < gi_taxid_prot_temp2.dmp > gi_taxid_prot_new.dmp
rm gi_taxid_prot_temp2.dmp
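The three temporary-file steps above can probably be collapsed into a single awk pass. This is a minimal sketch, assuming acc2taxid.prot keeps NCBI's four-column layout (accession, accession.version, taxid, gi) and that header lines start with the word `accession`. Because `cat` was used to join two files, more than one header line may be present. `make_gi_taxid` is a hypothetical helper name.

```shell
# Sketch: build a tab-separated "gi<TAB>taxid" table in one pass.
# Assumes columns: accession, accession.version, taxid, gi.
make_gi_taxid() {
  # $1: concatenated accession2taxid file; $2: output .dmp file
  awk -v OFS='\t' '
    $1 == "accession" { next }   # skip every header line, not only the first
    { print $4, $3 }             # gi, then taxid
  ' "$1" > "$2"
}

# Usage:
# make_gi_taxid acc2taxid.prot gi_taxid_prot_new.dmp
```

Setting `OFS` to a tab avoids the separate `tr` step, so the quoting pitfall with `\t` does not arise.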

ln -s /home/stone/20T/DataBase/SoftwaresEnsembel/BigDataBase/* /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/virAnnot/db/

cd /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/virAnnot/db/
mv gi_taxid_prot_new.dmp gi_taxid_prot.dmp

sed -i 's/#!/#!//g' loadTaxonomy.pl
sed -i 's/exit(1)/exit(1)}/g' loadTaxonomy.pl

loadTaxonomy.pl \
  -struct taxonomyStructure.sql \
  -index taxonomyIndex.sql \
  -acc_prot acc2taxid.prot \
  -acc_nucl acc2taxid.nucl \
  -names names.dmp \
  -nodes nodes.dmp \
  -gi_prot gi_taxid_prot.dmp
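A near-empty database can have several causes. One that is cheap to rule out is a missing or empty input file, for example a dangling symlink left by the `ln -s` step. This minimal sketch checks each file passed to loadTaxonomy.pl. `check_inputs` is a hypothetical helper, and it does not prove that loadTaxonomy.pl parsed the files correctly.

```shell
# Sketch: confirm that every input exists and is non-empty.
# [ -s ] follows symlinks, so a dangling link is reported as missing.
check_inputs() {
  local status=0 f
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f"
      status=1
    fi
  done
  return "$status"
}

# Usage, with the file names passed to loadTaxonomy.pl above:
# check_inputs taxonomyStructure.sql taxonomyIndex.sql acc2taxid.prot \
#   acc2taxid.nucl names.dmp nodes.dmp gi_taxid_prot.dmp
```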

2022/01/03 13:53:34 INFO> loadTaxonomy.pl:120 main::_create_sqlite_db - Creating database.
2022/01/03 13:53:35 INFO> loadTaxonomy.pl:76 main::_insertingCSVDataInDatabase - Inserting tables into database...
2022/01/03 13:53:35 INFO> loadTaxonomy.pl:78 main::_insertingCSVDataInDatabase - nodes
2022/01/03 13:53:35 INFO> loadTaxonomy.pl:78 main::_insertingCSVDataInDatabase - gi_prot
2022/01/03 13:53:35 INFO> loadTaxonomy.pl:78 main::_insertingCSVDataInDatabase - names
2022/01/03 13:53:35 INFO> loadTaxonomy.pl:78 main::_insertingCSVDataInDatabase - prot_accession2taxid
2022/01/03 13:53:35 INFO> loadTaxonomy.pl:78 main::_insertingCSVDataInDatabase - nucl_accession2taxid

ll -h taxonomy.tmp.sqlite

-rw-rw-r-- 1 stone stone 80K Jan 3 13:53 taxonomy.tmp.sqlite
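The log shows every table being "inserted" within the same second, which suggests that little or no data was actually loaded. Counting rows per table would confirm this. The following minimal sketch assumes the `sqlite3` command-line shell is installed. It reads table names from `sqlite_master` instead of assuming them, and `count_rows` is a hypothetical helper name.

```shell
# Sketch: print "table<TAB>row count" for every table in an SQLite file.
count_rows() {
  local db=$1 t
  sqlite3 "$db" "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;" |
  while IFS= read -r t; do
    # </dev/null keeps the inner sqlite3 from reading the loop's stdin
    printf '%s\t%s\n' "$t" "$(sqlite3 "$db" "SELECT COUNT(*) FROM \"$t\";" </dev/null)"
  done
}

# Usage:
# count_rows taxonomy.tmp.sqlite
```

If every table reports 0, the problem lies in the loading step, not in the later pfam steps.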

tar -xzf fasta.tar.gz;
mkdir pfam
mv pfam*.FASTA pfam/
rm .FASTA
mv pfam/pfam .

ls -1 pfam*.FASTA |
sed -E 's,^(.*)\.FASTA$,./gi2taxonomy.pl -i & -o \1.tax.txt -db taxonomy.tmp.sqlite -r,' | bash
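Generating commands with `sed` and piping them into `bash` works, but a plain loop is easier to read and quote. This is a minimal sketch using the same `gi2taxonomy.pl` flags as the command above. `run_gi2taxonomy` and the `DRY_RUN` variable are hypothetical additions; setting `DRY_RUN=1` prints the commands instead of running them.

```shell
# Sketch: run gi2taxonomy.pl once per pfam*.FASTA file.
# Flags are copied from the sed-generated command; DRY_RUN is hypothetical.
run_gi2taxonomy() {
  local db=$1 f run=
  if [ -n "${DRY_RUN:-}" ]; then run=echo; fi
  for f in pfam*.FASTA; do
    [ -e "$f" ] || continue   # no match: the glob stays literal, so skip it
    $run ./gi2taxonomy.pl -i "$f" -o "${f%.FASTA}.tax.txt" -db "$db" -r
  done
}

# Usage:
# DRY_RUN=1 run_gi2taxonomy taxonomy.tmp.sqlite   # preview
# run_gi2taxonomy taxonomy.tmp.sqlite
```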
#Create a file of filenames listing the *.tax.txt files:
listPath.pl -d . | grep 'tax.txt' > idx
#Compute taxonomy statistic for each domain and create a sql file to load into the database:
taxo_profile_to_sql.pl -i idx > taxo_profile.sql #the output has only two lines?
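If gi2taxonomy.pl found nothing in a near-empty database, the *.tax.txt files may be empty as well, which would leave taxo_profile_to_sql.pl with little to write. This minimal sketch assumes `idx` holds one path per line, as produced by the `grep` above. `check_idx` is a hypothetical helper.

```shell
# Sketch: count the files listed in idx and flag the empty or missing ones.
check_idx() {
  local idx=$1 p n=0 empty=0
  while IFS= read -r p; do
    n=$((n + 1))
    if [ ! -s "$p" ]; then
      echo "empty or missing: $p"
      empty=$((empty + 1))
    fi
  done < "$idx"
  echo "$n files listed, $empty empty or missing"
}

# Usage:
# check_idx idx
```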

Load into the database (nothing seems to be imported, because the size of taxonomy.tmp.sqlite does not change):

sqlite3 taxonomy.tmp.sqlite < taxo_profile.sql
####################################################################################################

marieBvr · 2022-01-03T14:53:17Z

Hi @NailouZhang,
Thank you for notifying me about this error. I am refactoring my scripts so that they no longer use this database.
I plan to push the new version of the code this month if everything goes well.
For now, you can skip the database installation, as it will no longer be needed. The first steps of the pipeline (readsoustraction, demultiplex, assembly, map) don't need the database anyway.

Sincerely yours,
Marie

marieBvr added the labels bug (Something isn't working) and help wanted (Extra attention is needed) on Jan 3, 2022

marieBvr self-assigned this Jan 3, 2022
