gabrielluishernandez committed Oct 27, 2021
1 parent 63bb6b5, commit 5b2aeb8
Showing 52 changed files with 45,080 additions and 0 deletions.
@@ -0,0 +1,9 @@
# Dated phylogenies

First, we created a dated phylogeny for the species tree. We included the node separating the species _S. fugax_ from the other _Solenopsis_ species. The directory `species_tree_dating` contains all scripts used for this, covering:
* _De novo_ assembly of the genomes of different fire ants
* The use of BUSCO to identify single-copy orthologs between these species
* Multiple-sequence alignment between these single-copy orthologs
* Dated phylogeny with IQ-Tree

We then created a dated tree with all samples in our analysis (for the supergene and the rest of the genome), calibrated with nodes retrieved from the tree above. The scripts used in this analysis are in the directory `supergene_dating`.
15 changes: 15 additions & 0 deletions
Dating divergence times/species_tree_dating/01-assembly.txt
@@ -0,0 +1,15 @@

#SAMPLE="SRR9008173"
#SAMPLE="SRR9008142"
#SAMPLE="SRR9008228"
#SAMPLE="SRR9008253"
#SAMPLE="SRR9008168"
#SAMPLE="SRR9008232"
#SAMPLE="SRR9008217"
#SAMPLE="SRR9008215"
#SAMPLE="SRR9008150"
#SAMPLE="SRR9008158"
#SAMPLE="SRR9008133"
#SAMPLE="SRR9008200"

qsub -pe smp 10 -q large.q,smalle.q,medium.q -v "SAMPLE=$SAMPLE" ~/scripts/2021.soli.masurca.assembly.sh
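Every `SAMPLE` line above is commented out, so the submission presumably ran once per accession with one line uncommented. A minimal sketch of submitting all twelve in a loop follows. The `DRY_RUN` switch and the `submit_all` function are hypothetical names introduced here; with the default `DRY_RUN=1` the commands are only printed, so the sketch runs without a scheduler.

```shell
# Accessions copied from the commented-out SAMPLE lines above.
SAMPLES="SRR9008173 SRR9008142 SRR9008228 SRR9008253 SRR9008168 SRR9008232
SRR9008217 SRR9008215 SRR9008150 SRR9008158 SRR9008133 SRR9008200"

# DRY_RUN is a hypothetical switch for this sketch (not part of the original
# script). It defaults to 1, which only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

submit_all() {
  for sample in $SAMPLES; do
    if [ "$DRY_RUN" = "1" ]; then
      # Print the command that would be submitted for this accession.
      echo "qsub -pe smp 10 -q large.q,smalle.q,medium.q -v SAMPLE=$sample ~/scripts/2021.soli.masurca.assembly.sh"
    else
      # Same options and queue names as the original qsub line.
      qsub -pe smp 10 -q large.q,smalle.q,medium.q -v "SAMPLE=$sample" ~/scripts/2021.soli.masurca.assembly.sh
    fi
  done
}

submit_all
```

Running with `DRY_RUN=0` would call `qsub` for real, which assumes the same queues and assembly script are available.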
@@ -0,0 +1,5 @@
conda activate busco_old_405
CPUs=50
INPUTFASTA="$PWD/NNNNNNNNNNNNNN.fa"
~/scripts/busco405.sh -t "$CPUs" -i "$INPUTFASTA"

54 changes: 54 additions & 0 deletions
Dating divergence times/species_tree_dating/03-busco.gene.lists.common.to.all.samples.txt
@@ -0,0 +1,54 @@

#chr1-15
cat Solenopsis.BUSCO.chr1-15.lst \
gng20170922.fa.busco405/gng20170922.fa.busco405.complete.lst \
GCF_016802725.1_UNIL_Sinv_3.0_genomic.fna.busco405/GCF_016802725.1_UNIL_Sinv_3.0_genomic.fna.busco405.complete.lst \
GCA_018691235.1_QMUL_Sinv_Sequel2_genomic.fna.busco405/GCA_018691235.1_QMUL_Sinv_Sequel2_genomic.fna.busco405.complete.lst \
GCA_010367695.1_ASM1036769v1_genomic.fna.busco405/GCA_010367695.1_ASM1036769v1_genomic.fna.busco405.complete.lst \
GCA_009299975.1_ASM929997v1_genomic.fna.busco405/GCA_009299975.1_ASM929997v1_genomic.fna.busco405.complete.lst \
GCA_009299965.1_ASM929996v1_genomic.fna.busco405/GCA_009299965.1_ASM929996v1_genomic.fna.busco405.complete.lst \
SRR9008133.masurca.3.3.7.fa.busco405/SRR9008133.masurca.3.3.7.fa.busco405.complete.lst \
SRR9008142.masurca.3.3.7.fa.busco405/SRR9008142.masurca.3.3.7.fa.busco405.complete.lst \
SRR9008150.masurca.3.3.7.fa.busco405/SRR9008150.masurca.3.3.7.fa.busco405.complete.lst \
SRR9008158.masurca.3.3.7.fa.busco405/SRR9008158.masurca.3.3.7.fa.busco405.complete.lst \
SRR9008168.masurca.3.3.7.fa.busco405/SRR9008168.masurca.3.3.7.fa.busco405.complete.lst \
SRR9008173.masurca.3.3.7.fa.busco405/SRR9008173.masurca.3.3.7.fa.busco405.complete.lst \
SRR9008200.masurca.3.3.7.fa.busco405/SRR9008200.masurca.3.3.7.fa.busco405.complete.lst \
SRR9008215.masurca.3.3.7.fa.busco405/SRR9008215.masurca.3.3.7.fa.busco405.complete.lst \
SRR9008217.masurca.3.3.7.fa.busco405/SRR9008217.masurca.3.3.7.fa.busco405.complete.lst \
SRR9008228.masurca.3.3.7.fa.busco405/SRR9008228.masurca.3.3.7.fa.busco405.complete.lst \
SRR9008232.masurca.3.3.7.fa.busco405/SRR9008232.masurca.3.3.7.fa.busco405.complete.lst \
SRR9008253.masurca.3.3.7.fa.busco405/SRR9008253.masurca.3.3.7.fa.busco405.complete.lst \
Sgeminata.fa.busco405/Sgeminata.fa.busco405.complete.lst \
Spusillignis.fa.busco405/Spusillignis.fa.busco405.complete.lst \
Ssaevissima2.fa.busco405/Ssaevissima2.fa.busco405.complete.lst \
Sfugax.fa.busco405/Sfugax.fa.busco405.complete.lst | \
sort | uniq -c | sed -e 's/^[ \t]*//' | tr ' ' '\t' | awk '$1 == 23' | cut -f2 > common.buscos.chr1-15.lst
cat common.buscos.chr1-15.lst | wc -l

#chr16nr
cat Solenopsis.BUSCO.chr16nr.lst \
gng20170922.fa.busco405/gng20170922.fa.busco405.complete.lst \
GCF_016802725.1_UNIL_Sinv_3.0_genomic.fna.busco405/GCF_016802725.1_UNIL_Sinv_3.0_genomic.fna.busco405.complete.lst \
GCA_018691235.1_QMUL_Sinv_Sequel2_genomic.fna.busco405/GCA_018691235.1_QMUL_Sinv_Sequel2_genomic.fna.busco405.complete.lst \
GCA_010367695.1_ASM1036769v1_genomic.fna.busco405/GCA_010367695.1_ASM1036769v1_genomic.fna.busco405.complete.lst \
GCA_009299975.1_ASM929997v1_genomic.fna.busco405/GCA_009299975.1_ASM929997v1_genomic.fna.busco405.complete.lst \
GCA_009299965.1_ASM929996v1_genomic.fna.busco405/GCA_009299965.1_ASM929996v1_genomic.fna.busco405.complete.lst \
SRR9008133.masurca.3.3.7.fa.busco405/SRR9008133.masurca.3.3.7.fa.busco405.complete.lst \
SRR9008142.masurca.3.3.7.fa.busco405/SRR9008142.masurca.3.3.7.fa.busco405.complete.lst \
SRR9008150.masurca.3.3.7.fa.busco405/SRR9008150.masurca.3.3.7.fa.busco405.complete.lst \
SRR9008158.masurca.3.3.7.fa.busco405/SRR9008158.masurca.3.3.7.fa.busco405.complete.lst \
SRR9008168.masurca.3.3.7.fa.busco405/SRR9008168.masurca.3.3.7.fa.busco405.complete.lst \
SRR9008173.masurca.3.3.7.fa.busco405/SRR9008173.masurca.3.3.7.fa.busco405.complete.lst \
SRR9008200.masurca.3.3.7.fa.busco405/SRR9008200.masurca.3.3.7.fa.busco405.complete.lst \
SRR9008215.masurca.3.3.7.fa.busco405/SRR9008215.masurca.3.3.7.fa.busco405.complete.lst \
SRR9008217.masurca.3.3.7.fa.busco405/SRR9008217.masurca.3.3.7.fa.busco405.complete.lst \
SRR9008228.masurca.3.3.7.fa.busco405/SRR9008228.masurca.3.3.7.fa.busco405.complete.lst \
SRR9008232.masurca.3.3.7.fa.busco405/SRR9008232.masurca.3.3.7.fa.busco405.complete.lst \
SRR9008253.masurca.3.3.7.fa.busco405/SRR9008253.masurca.3.3.7.fa.busco405.complete.lst \
Sgeminata.fa.busco405/Sgeminata.fa.busco405.complete.lst \
Spusillignis.fa.busco405/Spusillignis.fa.busco405.complete.lst \
Ssaevissima2.fa.busco405/Ssaevissima2.fa.busco405.complete.lst \
Sfugax.fa.busco405/Sfugax.fa.busco405.complete.lst | \
sort | uniq -c | sed -e 's/^[ \t]*//' | tr ' ' '\t' | awk '$1 == 23' | cut -f2 > common.buscos.chr16nr.lst
cat common.buscos.chr16nr.lst | wc -l
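The filter `awk '$1 == 23'` works because 23 lists are concatenated (one reference list plus 22 per-assembly lists), so an ID counted 23 times by `uniq -c` is present in every list. A small sketch of the same idea on toy data follows. The file names, the three-list setup, and the `NLISTS` variable are made up for illustration, and the approach assumes each ID occurs at most once per list.

```shell
# Toy data in a temporary directory; file names are invented for this sketch.
WORKDIR=$(mktemp -d)
printf '%s\n' 531at7399 850at7399 9932at7399  > "$WORKDIR/reference.lst"
printf '%s\n' 531at7399 9932at7399            > "$WORKDIR/sampleA.lst"
printf '%s\n' 9932at7399 531at7399 3303at7399 > "$WORKDIR/sampleB.lst"

# Number of concatenated lists (23 in the commands above, 3 here).
NLISTS=3

# uniq -c prefixes each ID with its occurrence count; awk keeps the IDs whose
# count equals the number of lists, i.e. IDs present in every list.
cat "$WORKDIR/reference.lst" "$WORKDIR/sampleA.lst" "$WORKDIR/sampleB.lst" \
  | sort | uniq -c | awk -v n="$NLISTS" '$1 == n { print $2 }' \
  > "$WORKDIR/common.lst"

cat "$WORKDIR/common.lst"
```

Passing the count through `awk -v` avoids hard-coding it, which may help if an assembly is added or dropped later.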
17 changes: 17 additions & 0 deletions
Dating divergence times/species_tree_dating/04-extract.fasta.sequences.txt
@@ -0,0 +1,17 @@

cd $INPUTFASTA.busco405
cat run_hymenoptera_odb10/full_table.tsv | grep -vP 'Missing|Duplicated|Fragmented|#|Sequence' > $INPUTFASTA.busco405.complete.tsv
cat run_hymenoptera_odb10/full_table.tsv | grep -vP 'Missing|Duplicated|Complete|#|Sequence' > $INPUTFASTA.busco405.fragmented.tsv
cat $INPUTFASTA.busco405.complete.tsv | tabtk cut -r -f 3,4,5,1 | tee $INPUTFASTA.busco405.complete.bed | cut -f4 > $INPUTFASTA.busco405.complete.lst
cat $INPUTFASTA.busco405.fragmented.tsv | tabtk cut -r -f 3,4,5,1 | tee $INPUTFASTA.busco405.fragmented.bed | cut -f4 > $INPUTFASTA.busco405.fragmented.lst

#test if files are there
cat $INPUTFASTA.busco405.complete.lst | head -n 50 | parallel -k "ls -1 run_hymenoptera_odb10/busco_sequences/single_copy_busco_sequences/{}.fna"

#get single gene fasta's and rename header to sample
mkdir $INPUTFASTA.busco405.complete.fasta
for GENE in $(ls -1 run_hymenoptera_odb10/busco_sequences/single_copy_busco_sequences/*.fna | rev | cut -d"/" -f1 | cut -d"." -f2- | rev)
do
  cat run_hymenoptera_odb10/busco_sequences/single_copy_busco_sequences/$GENE.fna | seqtk seq -l 0 -C | sed -r "s,^>.+$,>$SAMPLENAME,g" > $INPUTFASTA.busco405.complete.fasta/$GENE.fa
done
cd ..
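The `sed` step in the loop above replaces each FASTA header with the sample name, so the same gene carries one label per sample across files. A toy illustration of just that renaming step follows. The input header, sequence, and the `Sfugax` label are invented, and the `seqtk` unwrapping from the original is not reproduced here.

```shell
WORKDIR=$(mktemp -d)
SAMPLENAME="Sfugax"   # hypothetical sample label for this sketch

# Toy single-gene FASTA with a coordinate-style header (made up).
printf '>contig_12:100-400\nACGTACGT\nTTGA\n' > "$WORKDIR/531at7399.fna"

# Replace every header line with the sample label; sequence lines pass
# through unchanged because they do not start with ">".
sed -E "s,^>.+$,>$SAMPLENAME," "$WORKDIR/531at7399.fna" > "$WORKDIR/531at7399.fa"

cat "$WORKDIR/531at7399.fa"
```

The sketch uses `sed -E` rather than `-r` only because `-E` is accepted by both GNU and BSD `sed`.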
157 changes: 157 additions & 0 deletions
Dating divergence times/species_tree_dating/05-multiple.sequence.alignment.txt
@@ -0,0 +1,157 @@

## fuse fastas for alignment and relabel
rm -rf busco.fasta.fused.chr1-15
mkdir busco.fasta.fused.chr1-15

cat common.buscos.chr1-15.lst | parallel -j30 "echo {}; cat Sfugax.fa.busco405/Sfugax.fa.busco405.complete.fasta/{}.fa \
gng20170922.fa.busco405/gng20170922.fa.busco405.complete.fasta/{}.fa \
Sgeminata.fa.busco405/Sgeminata.fa.busco405.complete.fasta/{}.fa \
Spusillignis.fa.busco405/Spusillignis.fa.busco405.complete.fasta/{}.fa \
Ssaevissima2.fa.busco405/Ssaevissima2.fa.busco405.complete.fasta/{}.fa \
GCF_016802725.1_UNIL_Sinv_3.0_genomic.fna.busco405/GCF_016802725.1_UNIL_Sinv_3.0_genomic.fna.busco405.complete.fasta/{}.fa \
GCA_018691235.1_QMUL_Sinv_Sequel2_genomic.fna.busco405/GCA_018691235.1_QMUL_Sinv_Sequel2_genomic.fna.busco405.complete.fasta/{}.fa \
GCA_010367695.1_ASM1036769v1_genomic.fna.busco405/GCA_010367695.1_ASM1036769v1_genomic.fna.busco405.complete.fasta/{}.fa \
GCA_009299975.1_ASM929997v1_genomic.fna.busco405/GCA_009299975.1_ASM929997v1_genomic.fna.busco405.complete.fasta/{}.fa \
GCA_009299965.1_ASM929996v1_genomic.fna.busco405/GCA_009299965.1_ASM929996v1_genomic.fna.busco405.complete.fasta/{}.fa \
SRR9008133.masurca.3.3.7.fa.busco405/SRR9008133.masurca.3.3.7.fa.busco405.complete.fasta/{}.fa \
SRR9008142.masurca.3.3.7.fa.busco405/SRR9008142.masurca.3.3.7.fa.busco405.complete.fasta/{}.fa \
SRR9008150.masurca.3.3.7.fa.busco405/SRR9008150.masurca.3.3.7.fa.busco405.complete.fasta/{}.fa \
SRR9008158.masurca.3.3.7.fa.busco405/SRR9008158.masurca.3.3.7.fa.busco405.complete.fasta/{}.fa \
SRR9008168.masurca.3.3.7.fa.busco405/SRR9008168.masurca.3.3.7.fa.busco405.complete.fasta/{}.fa \
SRR9008173.masurca.3.3.7.fa.busco405/SRR9008173.masurca.3.3.7.fa.busco405.complete.fasta/{}.fa \
SRR9008200.masurca.3.3.7.fa.busco405/SRR9008200.masurca.3.3.7.fa.busco405.complete.fasta/{}.fa \
SRR9008215.masurca.3.3.7.fa.busco405/SRR9008215.masurca.3.3.7.fa.busco405.complete.fasta/{}.fa \
SRR9008217.masurca.3.3.7.fa.busco405/SRR9008217.masurca.3.3.7.fa.busco405.complete.fasta/{}.fa \
SRR9008228.masurca.3.3.7.fa.busco405/SRR9008228.masurca.3.3.7.fa.busco405.complete.fasta/{}.fa \
SRR9008232.masurca.3.3.7.fa.busco405/SRR9008232.masurca.3.3.7.fa.busco405.complete.fasta/{}.fa \
SRR9008253.masurca.3.3.7.fa.busco405/SRR9008253.masurca.3.3.7.fa.busco405.complete.fasta/{}.fa |\
seqtk seq -l 0 > busco.fasta.fused.chr1-15/{}.fa"
cat busco.fasta.fused.chr1-15/9932at7399.fa | grep ">"
cat busco.fasta.fused.chr1-15/9932at7399.fa | grep ">" | wc -l
#22

rm -rf busco.fasta.fused.chr16nr
mkdir busco.fasta.fused.chr16nr
cat common.buscos.chr16nr.lst | parallel -j30 "echo {}; cat Sfugax.fa.busco405/Sfugax.fa.busco405.complete.fasta/{}.fa \
gng20170922.fa.busco405/gng20170922.fa.busco405.complete.fasta/{}.fa \
Sgeminata.fa.busco405/Sgeminata.fa.busco405.complete.fasta/{}.fa \
Spusillignis.fa.busco405/Spusillignis.fa.busco405.complete.fasta/{}.fa \
Ssaevissima2.fa.busco405/Ssaevissima2.fa.busco405.complete.fasta/{}.fa \
GCF_016802725.1_UNIL_Sinv_3.0_genomic.fna.busco405/GCF_016802725.1_UNIL_Sinv_3.0_genomic.fna.busco405.complete.fasta/{}.fa \
GCA_018691235.1_QMUL_Sinv_Sequel2_genomic.fna.busco405/GCA_018691235.1_QMUL_Sinv_Sequel2_genomic.fna.busco405.complete.fasta/{}.fa \
GCA_010367695.1_ASM1036769v1_genomic.fna.busco405/GCA_010367695.1_ASM1036769v1_genomic.fna.busco405.complete.fasta/{}.fa \
GCA_009299975.1_ASM929997v1_genomic.fna.busco405/GCA_009299975.1_ASM929997v1_genomic.fna.busco405.complete.fasta/{}.fa \
GCA_009299965.1_ASM929996v1_genomic.fna.busco405/GCA_009299965.1_ASM929996v1_genomic.fna.busco405.complete.fasta/{}.fa \
SRR9008133.masurca.3.3.7.fa.busco405/SRR9008133.masurca.3.3.7.fa.busco405.complete.fasta/{}.fa \
SRR9008142.masurca.3.3.7.fa.busco405/SRR9008142.masurca.3.3.7.fa.busco405.complete.fasta/{}.fa \
SRR9008150.masurca.3.3.7.fa.busco405/SRR9008150.masurca.3.3.7.fa.busco405.complete.fasta/{}.fa \
SRR9008158.masurca.3.3.7.fa.busco405/SRR9008158.masurca.3.3.7.fa.busco405.complete.fasta/{}.fa \
SRR9008168.masurca.3.3.7.fa.busco405/SRR9008168.masurca.3.3.7.fa.busco405.complete.fasta/{}.fa \
SRR9008173.masurca.3.3.7.fa.busco405/SRR9008173.masurca.3.3.7.fa.busco405.complete.fasta/{}.fa \
SRR9008200.masurca.3.3.7.fa.busco405/SRR9008200.masurca.3.3.7.fa.busco405.complete.fasta/{}.fa \
SRR9008215.masurca.3.3.7.fa.busco405/SRR9008215.masurca.3.3.7.fa.busco405.complete.fasta/{}.fa \
SRR9008217.masurca.3.3.7.fa.busco405/SRR9008217.masurca.3.3.7.fa.busco405.complete.fasta/{}.fa \
SRR9008228.masurca.3.3.7.fa.busco405/SRR9008228.masurca.3.3.7.fa.busco405.complete.fasta/{}.fa \
SRR9008232.masurca.3.3.7.fa.busco405/SRR9008232.masurca.3.3.7.fa.busco405.complete.fasta/{}.fa \
SRR9008253.masurca.3.3.7.fa.busco405/SRR9008253.masurca.3.3.7.fa.busco405.complete.fasta/{}.fa |\
seqtk seq -l 0 > busco.fasta.fused.chr16nr/{}.fa"
cat busco.fasta.fused.chr16nr/531at7399.fa | grep ">"
cat busco.fasta.fused.chr16nr/531at7399.fa | grep ">" | wc -l
#22
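The spot checks above count headers in one fused file per set and expect 22. A sketch of extending that check to every fused file follows, on toy data. The `check_counts` function, the `EXPECTED` variable, and the toy file names are assumptions introduced for this example.

```shell
WORKDIR=$(mktemp -d)
EXPECTED=3   # 22 in the run above; 3 here to keep the toy data small

# Two toy fused FASTA files: geneA is complete, geneB lacks one sample.
printf '>s1\nACGT\n>s2\nACGA\n>s3\nACGC\n' > "$WORKDIR/geneA.fa"
printf '>s1\nTTGA\n>s2\nTTGC\n'            > "$WORKDIR/geneB.fa"

# Print the name and header count of every file whose number of FASTA
# headers differs from EXPECTED; print nothing if all files match.
check_counts() {
  for f in "$WORKDIR"/*.fa; do
    n=$(grep -c '^>' "$f")
    if [ "$n" -ne "$EXPECTED" ]; then
      echo "$(basename "$f") $n"
    fi
  done
}

check_counts
```

An empty result would suggest that every gene has one sequence per sample before alignment.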

## run parallel alignments and sort by ID (phyx pxssort)
mkdir -p run1/dating.msa.chr16nr
mkdir -p run1/dating.msa.chr16nr.sorted
cat common.buscos.chr16nr.lst | parallel -j 91 "echo {}; prank -d=busco.fasta.fused.chr16nr/{}.fa -f=fasta -DNA -iterate=10 -o=run1/dating.msa.chr16nr/{} && cat run1/dating.msa.chr16nr/{}.best.fas | pxssort --sortby 1 > run1/dating.msa.chr16nr.sorted/{}.fa"

mkdir -p run1/dating.msa.chr1-15
mkdir -p run1/dating.msa.chr1-15.sorted
cat common.buscos.chr1-15.lst | parallel -j 100 "echo {}; prank -d=busco.fasta.fused.chr1-15/{}.fa -f=fasta -DNA -iterate=10 -o=run1/dating.msa.chr1-15/{} && cat run1/dating.msa.chr1-15/{}.best.fas | pxssort --sortby 1 > run1/dating.msa.chr1-15.sorted/{}.fa"

## remove alignment sites with gaps
INPUTLIST="common.buscos.chr16nr.lst"
mkdir run1/dating.msa.chr16nr.sorted.pxclsq
rm -f run1/dating.msa.chr16nr.sorted.pxclsq/stats.txt

N=$(cat $INPUTLIST | wc -l) && echo $N
for (( i = 1 ; i < $N+1 ; i++))
do
  BUSCOGENE=$(cat $INPUTLIST | sed -n $i'p')
  echo $i". "$BUSCOGENE
  cat run1/dating.msa.chr16nr.sorted/$BUSCOGENE.fa | pxclsq -p 1.0 > run1/dating.msa.chr16nr.sorted.pxclsq/$BUSCOGENE.fa
  cat run1/dating.msa.chr16nr.sorted.pxclsq/$BUSCOGENE.fa | pxlssq > run1/dating.msa.chr16nr.sorted.pxclsq/$BUSCOGENE.fa.stats
  echo $BUSCOGENE >> run1/dating.msa.chr16nr.sorted.pxclsq/stats.txt
  cat run1/dating.msa.chr16nr.sorted.pxclsq/$BUSCOGENE.fa.stats >> run1/dating.msa.chr16nr.sorted.pxclsq/stats.txt
done

cat run1/dating.msa.chr16nr.sorted.pxclsq/stats.txt | grep -B 2 "Number of sequences: " | grep -vP 'Number of|File type|--' > run1/dating.msa.chr16nr.sorted.pxclsq/stats.short.txt

cat run1/dating.msa.chr16nr.sorted.pxclsq/stats.txt | grep " -"
cat run1/dating.msa.chr16nr.sorted.pxclsq/stats.txt | grep " ?"
cat run1/dating.msa.chr16nr.sorted.pxclsq/stats.txt | grep "Number of sequences: " | wc -l
cat run1/dating.msa.chr16nr.sorted.pxclsq/stats.txt | grep -B 2 "Number of sequences: " | grep -vP 'Number of|File type|--' | wc -l
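The step above keeps only alignment columns in which every sequence has data. To make that filtering concrete, here is a rough `awk` sketch of the idea on a toy alignment. It is not the phyx implementation: it assumes unwrapped or wrapped FASTA with equal-length sequences and treats only `-` and `?` as missing, which may differ from how `pxclsq` defines missing data.

```shell
WORKDIR=$(mktemp -d)

# Toy alignment: column 2 has a "?" and column 3 has a "-".
printf '>s1\nAC-TG\n>s2\nACGTG\n>s3\nA?GTG\n' > "$WORKDIR/aln.fa"

awk '
  /^>/ { name[++n] = $0; next }     # header line: start a new record
       { seq[n] = seq[n] $0 }       # sequence line: append to current record
  END {
    len = length(seq[1])
    # Mark every column in which at least one sequence has "-" or "?".
    for (i = 1; i <= n; i++)
      for (j = 1; j <= len; j++) {
        c = substr(seq[i], j, 1)
        if (c == "-" || c == "?") drop[j] = 1
      }
    # Emit each record with the marked columns removed.
    for (i = 1; i <= n; i++) {
      out = ""
      for (j = 1; j <= len; j++)
        if (!(j in drop)) out = out substr(seq[i], j, 1)
      print name[i]
      print out
    }
  }
' "$WORKDIR/aln.fa" > "$WORKDIR/aln.nogaps.fa"

cat "$WORKDIR/aln.nogaps.fa"
```

In the toy input, columns 2 and 3 are dropped and three columns remain in every sequence.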

### concat alignment
FOLDERCONCAT="run1/dating.msa.chr16nr.sorted.pxclsq.concat"
mkdir $FOLDERCONCAT
pxcat -s run1/dating.msa.chr16nr.sorted.pxclsq/*.fa -p $FOLDERCONCAT/busco.dating.chr16nr.partitions -o $FOLDERCONCAT/busco.dating.chr16nr.supermatrix.fa
pxlssq -s $FOLDERCONCAT/busco.dating.chr16nr.supermatrix.fa > $FOLDERCONCAT/busco.dating.chr16nr.supermatrix.fa.stats
cat $FOLDERCONCAT/busco.dating.chr16nr.supermatrix.fa.stats

## remove alignment sites with gaps
#INPUTLIST="common.buscos.chr1-15.lst"
cat common.buscos.chr1-15.lst | grep -vwP '3303at7399|850at7399' > common.buscos.chr1-15.removed.3303.850.lst
INPUTLIST="common.buscos.chr1-15.removed.3303.850.lst"
#2161 genes
mkdir run1/dating.msa.chr1-15.sorted.pxclsq
rm -f run1/dating.msa.chr1-15.sorted.pxclsq/stats.txt

N=$(cat $INPUTLIST | wc -l) && echo $N
for (( i = 1 ; i < $N+1 ; i++))
do
  BUSCOGENE=$(cat $INPUTLIST | sed -n $i'p')
  echo $i". "$BUSCOGENE
  cat run1/dating.msa.chr1-15.sorted/$BUSCOGENE.fa | pxclsq -p 1.0 > run1/dating.msa.chr1-15.sorted.pxclsq/$BUSCOGENE.fa
  cat run1/dating.msa.chr1-15.sorted.pxclsq/$BUSCOGENE.fa | pxlssq > run1/dating.msa.chr1-15.sorted.pxclsq/$BUSCOGENE.fa.stats
  echo $BUSCOGENE >> run1/dating.msa.chr1-15.sorted.pxclsq/stats.txt
  cat run1/dating.msa.chr1-15.sorted.pxclsq/$BUSCOGENE.fa.stats >> run1/dating.msa.chr1-15.sorted.pxclsq/stats.txt
done

cat run1/dating.msa.chr1-15.sorted.pxclsq/stats.txt | grep -B 2 "Number of sequences: " | grep -vP 'Number of|File type|--' > run1/dating.msa.chr1-15.sorted.pxclsq/stats.short.txt

cat run1/dating.msa.chr1-15.sorted.pxclsq/stats.txt | grep " -"
cat run1/dating.msa.chr1-15.sorted.pxclsq/stats.txt | grep " ?"
cat run1/dating.msa.chr1-15.sorted.pxclsq/stats.txt | grep "Number of sequences: " | wc -l
cat run1/dating.msa.chr1-15.sorted.pxclsq/stats.txt | grep -B 2 "Number of sequences: " | grep -vP 'Number of|File type|--' | wc -l
#2161

### concat alignment
FOLDERCONCAT="run1/dating.msa.chr1-15.sorted.pxclsq.concat"
mkdir $FOLDERCONCAT
pxcat -s run1/dating.msa.chr1-15.sorted.pxclsq/*.fa -p $FOLDERCONCAT/busco.dating.chr1-15.partitions -o $FOLDERCONCAT/busco.dating.chr1-15.supermatrix.fa
pxlssq -s $FOLDERCONCAT/busco.dating.chr1-15.supermatrix.fa > $FOLDERCONCAT/busco.dating.chr1-15.supermatrix.fa.stats
cat $FOLDERCONCAT/busco.dating.chr1-15.supermatrix.fa.stats

### Two genes cause a segfault: after filtering for 0% gaps, no sequence is left. Let's remove these from the analysis.
1635. 3303at7399
Segmentation fault (core dumped)
2042. 850at7399
Segmentation fault (core dumped)
run1/dating.msa.chr1-15.sorted.pxclsq/850at7399.fa
run1/dating.msa.chr1-15.sorted.pxclsq/3303at7399.fa

## these genes were removed from further analysis (above)
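The two genes above were identified from the segfault messages in the loop output. One way to find such genes without reading the log might be to scan the filtered directory for alignments with no sequence content left, as sketched below on toy files. The `list_empty` function and the toy file contents are assumptions, and the sketch presumes that a failed filtering run leaves behind an empty file or one with headers only.

```shell
WORKDIR=$(mktemp -d)

# Toy filtered alignments: one with sequence data, one with headers only.
printf '>s1\nATG\n>s2\nATG\n' > "$WORKDIR/531at7399.fa"
printf '>s1\n\n>s2\n\n'       > "$WORKDIR/850at7399.fa"

# Print the gene ID (file name without .fa) of every alignment that has no
# non-blank sequence line. A completely empty file is reported as well.
list_empty() {
  for f in "$WORKDIR"/*.fa; do
    if ! grep -v '^>' "$f" | grep -q '[^[:space:]]'; then
      basename "$f" .fa
    fi
  done
}

list_empty
```

The resulting IDs could then feed the same `grep -vwP` exclusion used to build the reduced gene list.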