A repository for modified code from the paper "Power k-Means Clustering" for RPI ML and Optimization, Spring 2023.
The code runs on Python 3 and depends on the machine learning libraries scikit-learn (sklearn) and PyTorch.
Run experiments using the command python run_experiments.py. Additional experiments can be run by modifying variables within the run_experiments.py file.
Synthetic datasets are seeded, so generation is consistent across multiple runs. Default dataset settings use a binomial distribution of 99 data points across 3 clusters. Experiments are run for 250 trials for each Power k-Means initial value.
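As a minimal sketch of what seeded generation means in practice, the example below uses only the Python standard library to draw 99 points in 3 clusters from binomial distributions with a fixed seed. The function name make_synthetic, the seed value, the dimensionality, and the binomial parameters are illustrative assumptions, not taken from run_experiments.py, which may generate its data differently.

```python
import random

def make_synthetic(n_points=99, n_clusters=3, dim=2, seed=0):
    """Hypothetical helper: seeded binomial clusters (not the repository's code)."""
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    per_cluster = n_points // n_clusters
    points, labels = [], []
    for c in range(n_clusters):
        # Assumed parameters: each cluster gets a different success probability.
        p = (c + 1) / (n_clusters + 1)
        for _ in range(per_cluster):
            # A Binomial(20, p) draw is the sum of 20 Bernoulli(p) trials.
            point = [sum(rng.random() < p for _ in range(20)) for _ in range(dim)]
            points.append(point)
            labels.append(c)
    return points, labels

X1, y1 = make_synthetic(seed=0)
X2, y2 = make_synthetic(seed=0)
print(len(X1), X1 == X2)  # prints: 99 True
```

Because the generator is created from the seed inside the function, two calls with the same seed return identical data, which is the property that keeps results comparable across runs.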
Default experiment settings use
Optional support is added for cluster plotting in 2-D and result graphing across all three metrics. Metrics have the option of being plotted using either individual clusters for each Power k-Means initialization or color-coded such that vanilla k-Means is plotted in blue and all versions of Power k-Means are plotted in red for better visualization. Default settings use the two-color plotting scheme, but this can be changed by switching the colors parameter in make_plots to 0.
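The two plotting schemes described above can be sketched as a small color-assignment function. This is only an illustration of the idea: the helper name assign_colors, the method names, and the palette are assumptions, and the actual make_plots function in this repository may handle its colors parameter differently.

```python
def assign_colors(method_names, colors=1):
    """Hypothetical helper mirroring the described behavior of the colors switch."""
    if colors:
        # Two-color scheme: the k-Means baseline in blue, every Power k-Means variant in red.
        return ["blue" if name == "k-means" else "red" for name in method_names]
    # Individual scheme: one color per method, cycling through an assumed palette.
    palette = ["blue", "orange", "green", "red", "purple", "brown"]
    return [palette[i % len(palette)] for i in range(len(method_names))]

# Assumed method names, used only for this example.
methods = ["k-means", "power s0=-1", "power s0=-3", "power s0=-9"]
print(assign_colors(methods, colors=1))  # ['blue', 'red', 'red', 'red']
print(assign_colors(methods, colors=0))  # ['blue', 'orange', 'green', 'red']
```

The two-color scheme makes it easy to compare the baseline against Power k-Means as a group, while the individual scheme is more useful when comparing the different initial values against each other.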
For real datasets, use the file "run_experiments_real.py". The high-dimensional data must be downloaded from https://archive.ics.uci.edu/ml/datasets/gene+expression+cancer+RNA-Seq because it is too large for GitHub, but the label set must come from this repository ("gene_labels.csv"), as the labels were preprocessed.
The only modifications that should be made are in the "main" function. CSV files containing numerical results and their respective plots are located in the Results folder.