PhyloHeat

Phylogenomic clustering and visualization for species determination. For now, PhyloHeat can be used to visualize the species and subgroup differentiation of bacterial genomes. There are two main clustering approach as comparing ANI values (with recommended thresholds) and creating a distance matrix based on shared protein families. The two approach is supporting each other most of the time.

Requirements

Linux (not tested on Windows and MacOS)
Phyton 3.8 or later

The required packages

Pandas pandas-1.3.3
Seaborn seaborn-0.11.2
Scipy scipy-1.7.1
Matplotlib matplotlib-3.4.3

Installation

I recommend to use conda but not necessary

conda update conda --all
conda create --name PhyloHeat python=3.8
conda activate PhyloHeat

You can use pip to install dependecies

pip install pandas
pip install seaborn

(seaborn come with matplolib and scipy, so you do not need to install them again)

Demo Run

You can try the initially prepared config file and demo folder to see what can be done with PhyloHeat. All you need is just to run PhyloHeat.py file.

conda activate PhyloHeat
git clone https://github.com/recepcanaltinbag/PhyloHeat.git
cd PhyloHeat
python3 PhyloHeat.py

The outputs can be found in 'demo/output' folder. Example output:

Example Run

All you need to do is changing the parameters in the 'config.conf' file.

Input data formats

All inputs must be in the 'csv' format.

genomes_info.csv : UID (Assembly IDs) and Nice Name (You can use desired names) column names must be same. It just needed for final output because using just UIDs or any IDs can be confusing when looking and analyzing the heatmap.
matrix.csv : A square matrix is needed. Genomes must be represented as UID (Assembly IDs) and their shared protein families must be in intersected cells.
matrix_with_names.csv: Same matrix with previous file but UIDs are changed with Nice Names.
Whole_ANI.csv: Whole genome ANI values in a square matrix form. UIDs must be used.
16S_ANI.csv: 16S ANI values in a square matrix form. UIDs must be used.

You can use different IDs and names as long as they are compatible in all input files.

config.conf file format

inputs

Inputs were explained previously, you must enter the path to config file for all inputs.

settings

ref_genome_name: Reference genome name, the name must be same with the name in the genomes_info file.
ref_genome_id: Reference genome id, the id must be same with the id in the other files.
user_min_ANI_Whole: Initially 0.85. It is for the defining the lowest ANI color.
user_min_ANI_16S: Initially 0.95. It is for the defining the lowest ANI color.
cluster_strictness: Initially 0.2, can be modified to retrieve more stringent or relaxed clusters
vmin: Minimum number of shared protein families (ınitially 4000) Just for coloring, you can try different values
vmax: Maximum number of shared protein families (ınitially 6000) Just for coloring, you can try different values
dpi: Output pdf dpi

Outputs

heatmat.pdf: output heatmap path
species.csv: clustering results as csv file. Same numbers represents same classes. Different classes can be sign for different species.

Uninstallation

conda env remove --name PhyloHeat

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
demo		demo
example_outputs		example_outputs
PhyloHeat.py		PhyloHeat.py
README.md		README.md
config.conf		config.conf
heatmap_functions.py		heatmap_functions.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PhyloHeat

Requirements

The required packages

Installation

Demo Run

Example Run

Input data formats

config.conf file format

inputs

settings

Outputs

Uninstallation

About

Releases

Packages

Languages

recepcanaltinbag/PhyloHeat

Folders and files

Latest commit

History

Repository files navigation

PhyloHeat

Requirements

The required packages

Installation

Demo Run

Example Run

Input data formats

config.conf file format

inputs

settings

Outputs

Uninstallation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages