
A simple C++ tool for clustering (greedy and hierarchical)


Leo-GG/Clustering


Clustering Tools

Tools to cluster elements based on pairwise distance or similarity measurements. The input is a list of pairwise distances or similarities in the following format:

ElementX ElementY Distance/Similarity X-Y
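A minimal sketch of reading this format in C++, assuming whitespace-separated fields with one pair per line. This is not the tool's own parser; the names `readPairs`, `PairKey` and `Measurements` are hypothetical, and each directed measurement is kept separately so that non-reciprocal values can be reconciled afterwards.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical container: directed measurements keyed by (ElementX, ElementY).
using PairKey = std::pair<std::string, std::string>;
using Measurements = std::map<PairKey, double>;

// Read "ElementX ElementY value" lines from any input stream.
// Blank lines are skipped; malformed lines are reported on stderr and ignored.
Measurements readPairs(std::istream& in) {
    Measurements m;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string x, y;
        double value = 0.0;
        if (!(fields >> x)) continue;  // blank or whitespace-only line
        if (!(fields >> y >> value)) {
            std::cerr << "skipping malformed line: " << line << '\n';
            continue;
        }
        m[{x, y}] = value;  // a later duplicate of the same (X, Y) overwrites
    }
    return m;
}
```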

If the distances/similarities are not symmetric (d(X,Y) != d(Y,X)), the program computes their harmonic mean and uses this value for clustering. The program can perform several types of clustering:
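The symmetrization step can be sketched as follows. It assumes non-negative values (the harmonic mean 2ab/(a+b) is not meaningful for negative inputs) and, when only one direction of a pair is present, keeps that value unchanged; both choices and the names `harmonicMean` and `symmetrize` are assumptions, not taken from the tool.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using PairKey = std::pair<std::string, std::string>;
using Measurements = std::map<PairKey, double>;

// Harmonic mean of two non-negative values: 2ab / (a + b).
// Returns 0 when both are 0, where the formula would divide by zero.
double harmonicMean(double a, double b) {
    assert(a >= 0.0 && b >= 0.0);
    return (a + b == 0.0) ? 0.0 : 2.0 * a * b / (a + b);
}

// Collapse directed values d(X,Y) and d(Y,X) into one symmetric value stored
// under the lexicographically ordered key (min, max). If the reverse direction
// is missing, the single value is used as-is (an assumption of this sketch).
Measurements symmetrize(const Measurements& directed) {
    Measurements sym;
    for (const auto& [key, value] : directed) {
        const auto& [x, y] = key;
        PairKey canon = (x < y) ? key : PairKey{y, x};
        if (sym.count(canon)) continue;  // already handled via the reverse pair
        auto rev = directed.find({y, x});
        sym[canon] = (rev == directed.end()) ? value : harmonicMean(value, rev->second);
    }
    return sym;
}
```

The harmonic mean is pulled toward the smaller of the two values, so a pair that is close in one direction but far in the other ends up closer than the arithmetic mean would place it.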

  • Single-linkage hierarchical clustering, where the distance between two clusters is the smallest distance between any pair of their members; merging stops when the closest remaining pair of clusters is farther apart than a given cutoff.
  • UPGMA clustering, a hierarchical clustering in which the distance between two clusters is the average of all pairwise distances between their members.
  • Complete-linkage clustering, a hierarchical clustering in which the distance between two clusters is the largest distance between any pair of elements, one from each cluster.
  • SPICKER clustering, where the element with the most neighbors within the cutoff is iteratively selected as a cluster center, and all of its neighbors become members of that cluster.
  • K-means
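The three hierarchical variants differ only in how the distance between two clusters is computed, while SPICKER is a greedy alternative. The sketch below is a naive illustration over a hypothetical symmetric distance matrix (roughly cubic time, no caching), not the tool's implementation; `agglomerate`, `spicker`, `Linkage` and `Cluster` are illustrative names. It assumes distances, where smaller means closer (similarities would need the comparisons reversed), and treats a value exactly at the cutoff as within it. The SPICKER part recounts neighbors among the still-unassigned elements each round, which is one reading of the description above and may differ from the tool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // symmetric distances, d[i][i] == 0
using Cluster = std::vector<int>;                  // indices of member elements
enum class Linkage { Single, Average, Complete };

// Distance between two disjoint clusters under the chosen linkage, recomputed
// from the member-to-member distances on every call (simple, but no caching).
double clusterDistance(const Matrix& d, const Cluster& a, const Cluster& b, Linkage link) {
    const double inf = std::numeric_limits<double>::infinity();
    double best = (link == Linkage::Single) ? inf : -inf;
    double sum = 0.0;
    for (int i : a)
        for (int j : b) {
            if (link == Linkage::Single) best = std::min(best, d[i][j]);
            else if (link == Linkage::Complete) best = std::max(best, d[i][j]);
            else sum += d[i][j];
        }
    return link == Linkage::Average ? sum / static_cast<double>(a.size() * b.size()) : best;
}

// Agglomerative clustering: repeatedly merge the closest pair of clusters,
// stopping once the closest remaining pair is farther apart than `cutoff`.
std::vector<Cluster> agglomerate(const Matrix& d, double cutoff, Linkage link) {
    std::vector<Cluster> clusters;
    for (std::size_t i = 0; i < d.size(); ++i) {
        assert(d[i].size() == d.size());  // square matrix expected
        clusters.push_back({static_cast<int>(i)});
    }
    while (clusters.size() > 1) {
        std::size_t bi = 0, bj = 1;
        double best = std::numeric_limits<double>::infinity();
        for (std::size_t i = 0; i < clusters.size(); ++i)
            for (std::size_t j = i + 1; j < clusters.size(); ++j) {
                double v = clusterDistance(d, clusters[i], clusters[j], link);
                if (v < best) { best = v; bi = i; bj = j; }
            }
        if (best > cutoff) break;  // no pair is close enough to merge
        clusters[bi].insert(clusters[bi].end(), clusters[bj].begin(), clusters[bj].end());
        clusters.erase(clusters.begin() + static_cast<std::ptrdiff_t>(bj));
    }
    return clusters;
}

// SPICKER-style greedy clustering: the unassigned element with the most
// unassigned neighbors within `cutoff` becomes a center; it and those
// neighbors form a cluster and are removed. Repeat until nothing is left.
std::vector<Cluster> spicker(const Matrix& d, double cutoff) {
    const int n = static_cast<int>(d.size());
    std::vector<bool> taken(d.size(), false);
    std::vector<Cluster> clusters;
    for (int remaining = n; remaining > 0;) {
        int center = -1, bestCount = -1;
        for (int i = 0; i < n; ++i) {
            if (taken[i]) continue;
            int count = 0;
            for (int j = 0; j < n; ++j)
                if (j != i && !taken[j] && d[i][j] <= cutoff) ++count;
            if (count > bestCount) { bestCount = count; center = i; }
        }
        Cluster c{center};
        for (int j = 0; j < n; ++j)
            if (j != center && !taken[j] && d[center][j] <= cutoff) c.push_back(j);
        for (int j : c) taken[j] = true;
        remaining -= static_cast<int>(c.size());
        clusters.push_back(c);
    }
    return clusters;
}
```

Because the single-linkage distance between two clusters is never larger than the average, and the average never larger than the complete-linkage distance, the same cutoff generally yields fewer, larger clusters under single linkage than under complete linkage.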

The output is the list of clusters formed below the cutoff that have not yet been merged into a larger cluster. For each cluster, the program reports the clustroid (representative element), the radius, the maximum distance between cluster members, and the list of members.
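The README does not define these statistics precisely, so the following is one plausible way to compute them, under stated assumptions: the clustroid is taken as the member with the smallest summed distance to the other members, the radius as the largest clustroid-to-member distance, and d[i][i] == 0. `ClusterSummary` and `summarize` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // symmetric distances, d[i][i] == 0

// Hypothetical summary record; the field definitions are assumptions.
struct ClusterSummary {
    int clustroid;       // member with the smallest summed distance to the others
    double radius;       // largest distance from the clustroid to any member
    double maxDistance;  // largest distance between any two members
};

ClusterSummary summarize(const Matrix& d, const std::vector<int>& members) {
    assert(!members.empty());
    ClusterSummary s{members.front(), 0.0, 0.0};
    double bestSum = std::numeric_limits<double>::infinity();
    for (int i : members) {
        double sum = 0.0;
        for (int j : members) {
            sum += d[i][j];  // includes d[i][i], assumed to be 0
            s.maxDistance = std::max(s.maxDistance, d[i][j]);
        }
        if (sum < bestSum) { bestSum = sum; s.clustroid = i; }
    }
    for (int j : members) s.radius = std::max(s.radius, d[s.clustroid][j]);
    return s;
}
```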

TO DO

  • Allow full hierarchical clustering, generating a complete dendrogram
  • Refactor the output generation, which is currently all crammed into main.cpp
  • Refactor the clustering process
  • Optimize the SPICKER code (follow comments on the code)
