phash-hierarchical-clustering

An app that clusters a given set of images and displays the results in a simple JavaFX GUI. First, perceptual hashing maps each image to a binary feature vector. Then agglomerative hierarchical clustering, with Hamming distance as the distance measure, groups similar binary vectors.
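The distance step can be illustrated with a minimal sketch. It assumes each perceptual hash is stored as a 64-bit `Long`, which is an assumption made for illustration and not necessarily how this project represents its hashes. The names `HammingSketch`, `hamming` and `distanceMatrix` are hypothetical.

```scala
object HammingSketch {
  // Hamming distance between two 64-bit hashes: the number of bit positions
  // in which they differ. (Assumption: hashes fit in a Long.)
  def hamming(a: Long, b: Long): Int = java.lang.Long.bitCount(a ^ b)

  // Symmetric pairwise distance matrix, the usual input to a
  // hierarchical clustering routine.
  def distanceMatrix(hashes: Array[Long]): Array[Array[Double]] =
    Array.tabulate(hashes.length, hashes.length) { (i, j) =>
      hamming(hashes(i), hashes(j)).toDouble
    }
}
```

Two near-duplicate images should yield hashes that differ in only a few bits, so their Hamming distance is small; unrelated images tend to differ in many bits.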

Note: we use a low, hard-coded cutHeight value of 8.0 in order to cut the dendrogram tree into small clusters with a low number of outliers. You might experiment with different values of cutHeight in the HCluster depending on your dataset size and the required 'quality' of the clustering.
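To show what the cut height controls, here is a naive, self-contained sketch of complete-linkage agglomerative clustering that stops merging once the closest pair of clusters is farther apart than `cutHeight`. It is not the project's implementation (which relies on Smile). The object name `CutHeightSketch`, the 64-bit `Long` hash representation, and the choice to keep merges at a height equal to `cutHeight` are all assumptions made for illustration.

```scala
object CutHeightSketch {
  // Hamming distance between two 64-bit hashes (assumed representation).
  def hamming(a: Long, b: Long): Int = java.lang.Long.bitCount(a ^ b)

  // Complete linkage: the distance between two clusters is the largest
  // pairwise distance between their members.
  def completeLinkage(x: Seq[Int], y: Seq[Int], hashes: IndexedSeq[Long]): Int =
    (for (i <- x; j <- y) yield hamming(hashes(i), hashes(j))).max

  // Repeatedly merge the two closest clusters; stop when the closest pair
  // is farther apart than cutHeight. Clusters are lists of indices into hashes.
  def cluster(hashes: IndexedSeq[Long], cutHeight: Double): List[List[Int]] = {
    @annotation.tailrec
    def loop(clusters: List[List[Int]]): List[List[Int]] = {
      val pairs = for {
        (a, i) <- clusters.zipWithIndex
        (b, j) <- clusters.zipWithIndex
        if i < j
      } yield (completeLinkage(a, b, hashes), a, b)
      if (pairs.isEmpty) clusters
      else {
        val (d, a, b) = pairs.minBy(_._1)
        if (d > cutHeight) clusters
        else loop((a ++ b) :: clusters.filter(c => c != a && c != b))
      }
    }
    // Start with every image in its own cluster.
    loop(hashes.indices.map(List(_)).toList)
  }
}
```

With a low value such as 8.0, only hashes that differ in a handful of bits end up together, which gives many small, tight clusters. Raising the value merges more aggressively and produces fewer, looser clusters. This naive version recomputes every linkage on each pass, so it is only suitable for small inputs.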

Running

Build the project with sbt assembly. This will generate a phash-hierarchical-clustering-assembly-<version>.jar uberjar file in the target/scala-<scalaVersion> subdirectory (where <version> is the current version defined in build.sbt).

Run the application from the .jar with the java -jar command, e.g.:

  • java -jar target/scala-2.12/phash-hierarchical-clustering-assembly-1.0.jar <imageDirectory>

The first run might take a while, since the app needs to compute the pHash value for every image in <imageDirectory>.

<imageDirectory> is the folder where the images are stored (use as many images as possible for better results).

Sample results

  • Sample clusters from a dataset of 5K images containing the Apple logo [images: Cluster 1, Cluster 2, Cluster 3]

  • A dendrogram illustrating the result of hierarchical clustering with the complete agglomeration method (see the Smile docs for more details) [image: Dendrogram]