KMeansClusterer Examples

US Cities

This example clusters US cities by latitude and longitude, then outputs the clusters to the terminal and to a PNG (requires GNUPlot).

The number of clusters can be configured on the command line:

./examples/cities.rb -k 10

Cities clustering example

Headlines

This example clusters news headlines using a simple bag-of-words extraction of text features. It prints random samples from each cluster to the terminal.

./examples/headlines.rb -k 16

Dataset: Qazvinian and Radev 2011.
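To make the bag-of-words idea concrete, here is a minimal sketch in plain Ruby. It is not the code used in headlines.rb; the `bag_of_words` helper, the tokenizing regex, and the sample headlines are assumptions made for illustration. Each headline becomes a vector of word counts over a shared vocabulary, which is the kind of numeric input k-means needs.

```ruby
# Hypothetical helper (not part of the example script): builds a shared
# vocabulary and one word-count vector per text.
def bag_of_words(texts)
  # Lowercase each text and split it into alphabetic tokens.
  tokenized = texts.map { |text| text.downcase.scan(/[a-z']+/) }

  # A sorted vocabulary gives every word a stable column index.
  vocab = tokenized.flatten.uniq.sort
  index = vocab.each_with_index.to_h

  vectors = tokenized.map do |tokens|
    row = Array.new(vocab.size, 0)
    tokens.each { |word| row[index[word]] += 1 }
    row
  end

  [vocab, vectors]
end

# Made-up sample headlines, for illustration only.
headlines = [
  "Stocks rally as markets rebound",
  "Markets fall as stocks slide"
]

vocab, vectors = bag_of_words(headlines)
puts vocab.inspect
vectors.each { |row| puts row.inspect }
```

Headlines that share many words end up with similar vectors, so they tend to land in the same cluster.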

Pick Best Value for k

This example shows how to pick the best value for k using both the elbow method and the silhouette method.

./examples/pick_k.rb # requires GNUPlot

Initial setup of points, with 4 fairly well-defined clusters:

unclustered points

Elbow method - find the point of diminishing returns:

chart of elbow for k
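The quantity usually plotted for the elbow method is the sum of squared distances from each point to its cluster centroid. The sketch below computes it in plain Ruby for two hand-made groupings; the helper names and the sample points are assumptions for illustration and are not taken from pick_k.rb.

```ruby
# Squared Euclidean distance between two points of equal dimension.
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Mean position of a group of points.
def centroid(points)
  dims = points.first.size
  (0...dims).map { |d| points.sum { |p| p[d] } / points.size.to_f }
end

# Sum of squared distances from every point to its own cluster's centroid.
def sum_of_squared_error(clusters)
  clusters.sum do |points|
    center = centroid(points)
    points.sum { |p| squared_distance(p, center) }
  end
end

# The same four points grouped as one cluster, then as two.
one_cluster  = [[[0, 0], [0, 2], [10, 0], [10, 2]]]
two_clusters = [[[0, 0], [0, 2]], [[10, 0], [10, 2]]]

error_k1 = sum_of_squared_error(one_cluster)
error_k2 = sum_of_squared_error(two_clusters)
puts "k=1 error: #{error_k1}"
puts "k=2 error: #{error_k2}"
```

The error always shrinks as k grows, so the useful signal is where the drop levels off rather than where the error is smallest.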

Silhouette method - pick k with the highest silhouette score:

chart of silhouette for k
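For reference, the silhouette score of a point is (b - a) / max(a, b), where a is its mean distance to the other points in its own cluster and b is its mean distance to the points of the nearest other cluster. The following is a minimal plain-Ruby sketch of the mean score, assuming at least two clusters and distinct points; the helper names and sample data are hypothetical and this is not the library's implementation.

```ruby
# Euclidean distance between two points of equal dimension.
def distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Mean distance from a point to every point in another group.
def mean_distance(point, others)
  others.sum { |o| distance(point, o) } / others.size
end

# Mean silhouette score over all points (assumes two or more clusters).
def silhouette_score(clusters)
  scores = clusters.each_with_index.flat_map do |cluster, i|
    cluster.map do |point|
      # By convention a point alone in its cluster scores 0.
      next 0.0 if cluster.size == 1

      # a: mean distance to the other members of the same cluster
      # (the point's distance to itself contributes 0 to the sum).
      a = cluster.sum { |o| distance(point, o) } / (cluster.size - 1)

      # b: smallest mean distance to the points of any other cluster.
      b = clusters.each_with_index
                  .reject { |_, j| j == i }
                  .map { |other, _| mean_distance(point, other) }
                  .min

      (b - a) / [a, b].max
    end
  end
  scores.sum / scores.size
end

# The same four points, grouped sensibly and then badly.
good_split = [[[0, 0], [0, 2]], [[10, 0], [10, 2]]]
bad_split  = [[[0, 0], [10, 0]], [[0, 2], [10, 2]]]

good_score = silhouette_score(good_split)
bad_score  = silhouette_score(bad_split)
puts "good split: #{good_score.round(3)}"
puts "bad split:  #{bad_score.round(3)}"
```

Scores near 1 indicate tight, well-separated clusters, while scores near or below 0 suggest points sit closer to a neighboring cluster than to their own.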

Points plotted with best k value of 4:

plot of points with best k

MNIST Handwritten Digits

This example clusters handwritten digits from the MNIST database.

To run this example:

  1. Download the MNIST training set images and training set labels and place them in examples/data/mnist/

  2. Run ./examples/mnist.rb -k 10

After running k-means, a test set of digits is classified (by finding the closest cluster for each digit) and written to a PNG, with each cluster represented as a row.
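Classifying by closest cluster amounts to picking the centroid with the smallest distance to the new point. Here is a minimal plain-Ruby sketch of that step, using made-up two-dimensional centroids in place of real digit images; `closest_cluster` is a hypothetical helper, not a method from mnist.rb or the library.

```ruby
# Squared Euclidean distance; the square root is not needed for ranking.
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Returns the index of the centroid nearest to the given point.
def closest_cluster(point, centroids)
  centroids.each_index.min_by { |i| squared_distance(point, centroids[i]) }
end

# Made-up centroids standing in for those learned by k-means.
centroids = [[0.0, 0.0], [10.0, 10.0]]

first_label  = closest_cluster([1, 2], centroids)
second_label = closest_cluster([9, 8], centroids)
puts "first point -> cluster #{first_label}"
puts "second point -> cluster #{second_label}"
```

For MNIST each point would be a flattened vector of pixel values rather than a pair of coordinates, but the nearest-centroid rule is the same.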

Example PNG output with k=20:

MNIST clustering example

Output of the training set instances closest to the cluster centroids:

MNIST clustering example