zhongjiew/Scripts4PhD

R & Python

Random forest classifier for predicting whether Staphylococcus aureus strains come from atopic dermatitis patients. The input data are the abundance or presence-absence table of gene families generated by MCL and the corresponding health status (e.g., disease vs. healthy).

RF_staph_partition.r: Determine the ratio for partitioning the data into training and test sets.

RF_staph_partition9-1.r: Build the RF classifier with a fixed training:test partition ratio (in this case 9:1).
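
As a rough illustration of what a 9:1 partition means in practice, the sketch below splits labelled samples into training and test sets in Python. It is not taken from the R script: the function name `stratified_split`, the per-class (stratified) splitting, and the fixed seed are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.9, seed=0):
    """Split sample IDs into (train, test) lists, class by class.

    labels: dict mapping sample ID -> class label (e.g. "disease", "healthy").
    Hypothetical helper; the repository's R script may partition differently.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for sample, label in labels.items():
        by_class[label].append(sample)

    train, test = [], []
    for label in sorted(by_class):
        samples = sorted(by_class[label])
        rng.shuffle(samples)
        # Keep roughly train_fraction of each class for training.
        n_train = round(len(samples) * train_fraction)
        train.extend(samples[:n_train])
        test.extend(samples[n_train:])
    return train, test
```

Splitting within each class keeps the disease/healthy proportions similar in both sets, which matters when one class is much smaller than the other.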

Core_Genome_Cutoff_Calculator.py: Calculate the minimum number of genomes in which a gene must be present to be considered a core gene, based on genome completeness estimated by CheckM.
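
One possible way to derive such a cutoff is sketched below. This is an assumption about the general idea, not the script's actual method: each genome's completeness is treated as the probability that a true core gene is detected in it, and the cutoff is the largest number of genomes in which a true core gene would be detected with at least a chosen confidence. The names `presence_distribution`, `core_cutoff`, and the `confidence` parameter are hypothetical.

```python
def presence_distribution(probs):
    """Return dist where dist[k] = P(gene detected in exactly k genomes).

    probs: per-genome detection probabilities (completeness as a fraction).
    This is the Poisson-binomial distribution, built one genome at a time.
    """
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            nxt[k] += prob * (1.0 - p)  # gene missed in this genome
            nxt[k + 1] += prob * p      # gene detected in this genome
        dist = nxt
    return dist


def core_cutoff(completeness_percent, confidence=0.95):
    """Largest k such that P(detected in >= k genomes) >= confidence.

    completeness_percent: completeness values in percent, one per genome.
    Hypothetical helper; the repository script may use a different rule.
    """
    dist = presence_distribution([c / 100.0 for c in completeness_percent])
    tail = 0.0
    # Accumulate the upper tail P(X >= k), starting from k = number of genomes.
    for k in range(len(dist) - 1, -1, -1):
        tail += dist[k]
        if tail >= confidence:
            return k
    return 0
```

For example, `core_cutoff([100, 100, 100])` returns 3, because fully complete genomes leave no room for a core gene to be missed.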

KEGG_Gene_Category_Mapper.py: Map genes to KEGG annotations and categories, and get the most common function for each gene family generated by MCL.
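
The "most common function per gene family" step can be pictured as a simple majority vote. The sketch below is illustrative only: the function name `most_common_function` and the two input dictionaries are assumptions, not the script's actual interface.

```python
from collections import Counter, defaultdict


def most_common_function(gene_to_family, gene_to_function):
    """Return {family: most frequent function among its annotated genes}.

    gene_to_family:   dict gene ID -> gene family ID (e.g. from MCL clusters).
    gene_to_function: dict gene ID -> annotation (e.g. a KEGG KO identifier).
    Genes without an annotation are ignored; families with no annotated
    genes are left out of the result.
    """
    per_family = defaultdict(Counter)
    for gene, family in gene_to_family.items():
        function = gene_to_function.get(gene)
        if function is not None:
            per_family[family][function] += 1
    return {
        family: counts.most_common(1)[0][0]
        for family, counts in per_family.items()
    }
```

Ties are resolved by whichever function was counted first, so a real pipeline may want an explicit tie-breaking rule.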

ko00001_KEGG_hierarchy.keg: KEGG hierarchy file used by KEGG_Gene_Category_Mapper.py to map genes to higher-level KEGG functional categories.
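
A minimal sketch of how such a hierarchy file could be read is given below. It assumes the usual KEGG htext layout, in which each line starts with a level letter (A, B, C for categories of decreasing breadth, D for individual KO entries); the function name `parse_keg` and the returned structure are hypothetical and may differ from what the script does.

```python
def parse_keg(lines):
    """Map each KO identifier to the (A, B, C) categories it appears under.

    lines: iterable of text lines from a .keg hierarchy file.
    Returns dict: KO ID -> list of (level_a, level_b, level_c) tuples,
    since one KO can be listed under several pathways.
    """
    ko_to_categories = {}
    level_a = level_b = level_c = None
    for raw in lines:
        line = raw.rstrip("\n")
        # Skip header/footer lines that do not start with a level letter.
        if not line or line[0] not in "ABCD":
            continue
        text = line[1:].strip()
        if not text:
            continue  # bare level letter used as a separator
        if line[0] == "A":
            level_a, level_b, level_c = text, None, None
        elif line[0] == "B":
            level_b, level_c = text, None
        elif line[0] == "C":
            level_c = text
        else:
            ko = text.split()[0]  # first token of a D line is the KO ID
            ko_to_categories.setdefault(ko, []).append(
                (level_a, level_b, level_c)
            )
    return ko_to_categories


# Illustrative lines in the assumed format:
SAMPLE = [
    "A09100 Metabolism",
    "B  09101 Carbohydrate metabolism",
    "C    00010 Glycolysis / Gluconeogenesis [PATH:ko00010]",
    "D      K00844  HK; hexokinase [EC:2.7.1.1]",
]
```

With a real file, the lines would typically come from `open("ko00001_KEGG_hierarchy.keg")`, and the resulting dictionary could be combined with per-gene KO annotations to look up higher-level categories.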
