Random forest classifier for predicting whether Staphylococcus aureus strains come from atopic dermatitis patients. The input is an abundance or presence-absence table of gene families generated by MCL, together with health status labels (e.g., disease vs. healthy).
RF_staph_partition.r: Determine the training and test partition ratio.
RF_staph_partition9-1.r: Build the RF classifier with a fixed partition ratio (in this case 9:1).
Core_Genome_Cutoff_Calculator.py: Calculate the threshold number of genomes in which a gene must be present to be considered a core gene, using genome completeness estimated by CheckM.
KEGG_Gene_Category_Mapper.py: Map genes to KEGG annotations and categories, and get the most common function for each gene family generated by MCL.
ko00001_KEGG_hierarchy.keg: KEGG hierarchy file used by KEGG_Gene_Category_Mapper.py to map genes to higher-level KEGG functional categories.
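
As an illustration of the "most common function for each gene family" step, here is a minimal Python sketch of a majority vote over per-gene annotations. It is not the code in KEGG_Gene_Category_Mapper.py: the function name most_common_function, the two input dictionaries, and the toy gene, family, and category values are all hypothetical and chosen only to show the idea.

```python
from collections import Counter, defaultdict


def most_common_function(gene_to_family, gene_to_function):
    """Return {family: most frequent function among its annotated genes}.

    Both arguments are hypothetical inputs for this sketch:
      gene_to_family   maps a gene ID to its MCL gene family ID
      gene_to_function maps a gene ID to a functional category label
    """
    counts = defaultdict(Counter)
    for gene, family in gene_to_family.items():
        function = gene_to_function.get(gene)
        if function is not None:  # skip genes without an annotation
            counts[family][function] += 1
    # most_common(1) returns the top (function, count) pair; ties are
    # resolved in favor of the function encountered first.
    return {family: c.most_common(1)[0][0] for family, c in counts.items()}


# Toy data (made up for illustration only)
gene_to_family = {
    "g1": "fam1", "g2": "fam1", "g3": "fam1",
    "g4": "fam2", "g5": "fam2",
}
gene_to_function = {
    "g1": "Metabolism",
    "g2": "Metabolism",
    "g3": "Cellular Processes",
    "g4": "Genetic Information Processing",
    # g5 has no annotation and is ignored
}

family_function = most_common_function(gene_to_family, gene_to_function)
print(family_function)
```

In this sketch, families with no annotated genes are simply left out of the result; a real pipeline might instead assign them a placeholder such as "Unknown".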