These scripts generate Euclidean genetic distance matrices between pairs of individuals in a vcf file, which can then be used to build a phylogenetic tree. Transversions are given a weight of two.
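As a rough illustration of a transversion-weighted Euclidean distance, here is a minimal Python sketch. It is not taken from the Perl scripts. It assumes that genotypes are coded as alternate-allele counts (0, 1, 2), that transitions have weight one, and that the weight multiplies the squared difference at each site; the scripts may apply the weight differently. The names is_transversion and weighted_distance are hypothetical.

```python
import math

PURINES = {"A", "G"}

def is_transversion(ref, alt):
    """True when a SNP swaps a purine for a pyrimidine or vice versa."""
    return (ref in PURINES) != (alt in PURINES)

def weighted_distance(geno_a, geno_b, sites, tv_weight=2.0):
    """Euclidean distance between two individuals over biallelic SNPs.

    geno_a, geno_b: alternate-allele counts per site (None = missing).
    sites: (ref, alt) base pairs, one per site.
    Assumption: the weight scales the squared difference at each site.
    """
    total = 0.0
    for a, b, (ref, alt) in zip(geno_a, geno_b, sites):
        if a is None or b is None:
            continue  # skip sites with a missing genotype
        weight = tv_weight if is_transversion(ref, alt) else 1.0
        total += weight * (a - b) ** 2
    return math.sqrt(total)

# Toy data: two transitions (A>G, C>T) and one transversion (A>C).
sites = [("A", "G"), ("A", "C"), ("C", "T")]
ind1 = [0, 2, 1]
ind2 = [1, 0, 1]
d = weighted_distance(ind1, ind2, sites)
print(d)  # 1*1 + 2*4 + 1*0 = 9, so the distance is 3.0
```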
INSTALLATION: Not required
INPUT FILE: gz-compressed vcf file
HOW TO USE:
<1> Converting the gz-compressed vcf file to a genotype file
$ perl reducedvcf.pl VCF_FILE GENOTYPE_FILE , where VCF_FILE is the name of your gz-compressed vcf file and GENOTYPE_FILE is the name of the output genotype file. When the script finishes, GENOTYPE_FILE.gz will have been created.
<2> Creating distance matrices from the gz-compressed genotype file
$ perl VCF_FILE GENOTYPE_FILE OUTPUT_PREFIX NUMBER_OF_BOOTSTRAPPING_REPLICATES , where VCF_FILE is the name of the gz-compressed vcf file, GENOTYPE_FILE is the gz-compressed genotype file, OUTPUT_PREFIX is the prefix for the output files, and NUMBER_OF_BOOTSTRAPPING_REPLICATES is the number of bootstrap replicates.
When the script finishes, the following two files will have been created:
(a) OUTPUT_PREFIX.bg.tbl => Distance matrix of Euclidean distances between pairs of individuals
(b) OUTPUT_PREFIX.boot.tbl => Bootstrap distance matrices generated by resampling
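To show what bootstrap resampling of a distance matrix involves, here is a minimal Python sketch. It is not the logic of the Perl script. It assumes that sites are resampled with replacement and that one matrix is recomputed per replicate, and it uses an unweighted Euclidean distance for brevity. The function names, the toy genotypes, and the seed parameter are hypothetical.

```python
import math
import random

def distance_matrix(genotypes, site_idx):
    """Pairwise Euclidean distances over the given site indices."""
    n = len(genotypes)
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sq = sum((genotypes[i][s] - genotypes[j][s]) ** 2 for s in site_idx)
            mat[i][j] = mat[j][i] = math.sqrt(sq)
    return mat

def bootstrap_matrices(genotypes, n_replicates, seed=0):
    """One distance matrix per replicate, each from resampled sites."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    n_sites = len(genotypes[0])
    replicates = []
    for _ in range(n_replicates):
        # Draw n_sites site indices with replacement.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        replicates.append(distance_matrix(genotypes, sample))
    return replicates

# Toy data: three individuals, four sites, alternate-allele counts.
genotypes = [
    [0, 1, 2, 0],
    [1, 1, 0, 0],
    [2, 0, 0, 1],
]
reps = bootstrap_matrices(genotypes, n_replicates=5)
print(len(reps))  # one matrix per replicate
```

Each replicate matrix can then be used to build its own tree, and the resulting trees summarized as a consensus.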
<3> Generating a phylogenetic tree
You can use external software to generate a phylogenetic tree. For example, you can use FastME (http://www.atgc-montpellier.fr/fastme/).
<4> Generating a bootstrapping consensus tree
You can use the consense program in the PHYLIP package for this (http://evolution.genetics.washington.edu/phylip/).
Citation
Please cite this paper if you use these scripts: https://link.springer.com/article/10.1186/s12862-020-01715-3