Clustering and Quantization
Using photographs as visual input
Significant colors in a photograph.
This started as a very simple exploration of the simplest clustering algorithm in use, but I can see that a more comprehensive coverage of algorithms may be very valuable. Kandinsky aims to cover:
I. Basic building blocks
- Similarity/Distance Measures:
  - Euclidean Distance (Cartesian)
  - Manhattan Distance
  - Cosine Distance
  - Mahalanobis Distance
  - Domain-specific Distances
- Data Preprocessing:
  - Feature Scaling and Normalization
  - Dimensionality Reduction (e.g., PCA, t-SNE)
- Cluster Evaluation:
  - Internal Measures (Cohesion, Separation)
    - Silhouette Coefficient
    - Davies-Bouldin Index
  - External Measures (vs. Ground Truth)
    - Purity, Rand Index, Adjusted Rand Index
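The distance measures listed above can be sketched in a few lines; this is a minimal NumPy version, not a production implementation:

```python
import numpy as np

def euclidean(a, b):
    # Straight-line (Cartesian) distance.
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):
    # Sum of absolute coordinate differences ("city block" distance).
    return np.sum(np.abs(a - b))

def cosine_distance(a, b):
    # 1 - cosine similarity; compares direction, ignores magnitude.
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])
print(euclidean(a, b))   # 5.0
print(manhattan(a, b))   # 7.0
```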
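For the Silhouette Coefficient above, a quick sketch of how an internal measure scores a clustering, assuming scikit-learn is available (the two-blob data here is synthetic, purely for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Two well-separated synthetic blobs in 2-D.
X = np.vstack([rng.normal(0, 0.3, (50, 2)),
               rng.normal(5, 0.3, (50, 2))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# Silhouette ranges from -1 to 1; well-separated clusters score near 1.
score = silhouette_score(X, labels)
print(round(score, 3))
```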
II. Clustering Algorithms
- Partitioning-Based
  - K-Means (hard assignments)
  - K-Medoids (more robust to outliers)
  - Fuzzy C-Means (soft assignments)
- Hierarchical
  - Agglomerative (Bottom-up)
    - Various linkage methods (single, complete, average)
  - Divisive (Top-down)
- Density-Based
  - DBSCAN (Density-Based Spatial Clustering of Applications with Noise, discovers clusters of varying shapes)
  - OPTICS (Ordering Points To Identify the Clustering Structure, extension of DBSCAN, provides reachability plot)
  - HDBSCAN (Improved density clustering, handles varying densities)
- Distribution-Based
  - Gaussian Mixture Models (GMM) (assumes data follows a mixture of Gaussian distributions)
- Grid-Based
  - STING (Statistical Information Grid-based Clustering)
  - CLIQUE (Clustering In QUEst)
- Neural Network-Based
  - Autoencoders (Variational, Denoising, etc.)
    - Learn latent representations for clustering
  - Self-Organizing Maps (SOMs)
    - Preserve neighborhood relationships in a grid-like space
  - Deep Embedded Clustering (DEC)
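Tying the K-Means entry back to the project's theme of finding significant colors in a photograph: a minimal sketch of K-Means color quantization, assuming scikit-learn is available. The random pixel array is a stand-in for a real image (e.g. one loaded via PIL with `np.asarray(Image.open("photo.jpg"))` - the filename is illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in for a real photo: a (64, 64, 3) array of random RGB pixels.
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

# Flatten to one row per pixel and cluster in RGB space.
pixels = image.reshape(-1, 3).astype(float)
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(pixels)

# The cluster centers are the 5 "significant" colors (the palette);
# replacing each pixel with its center gives the quantized image.
palette = km.cluster_centers_.astype(np.uint8)
quantized = palette[km.labels_].reshape(image.shape)
print(palette.shape)  # (5, 3)
```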
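DBSCAN's defining behavior - growing clusters from dense regions and labelling isolated points as noise - can be seen in a small synthetic sketch, again assuming scikit-learn; the `eps` and `min_samples` values are chosen for this toy data, not recommendations:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
dense = rng.normal(0, 0.2, (100, 2))            # one dense blob
outliers = np.array([[5.0, 5.0], [-5.0, 5.0]])  # two isolated points
X = np.vstack([dense, outliers])

# Points with >= min_samples neighbors within eps seed a cluster;
# points reachable from no dense region get the noise label -1.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
print(labels[-2:])  # the two isolated points are labelled -1 (noise)
```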
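And for the distribution-based entry: unlike K-Means' hard assignments, a GMM gives each point a probability of belonging to each component. A minimal sketch, assuming scikit-learn and synthetic 1-D data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Two 1-D Gaussian blobs centered at 0 and 4.
X = np.vstack([rng.normal(0, 0.5, (50, 1)),
               rng.normal(4, 0.5, (50, 1))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
probs = gmm.predict_proba(X)  # soft memberships; each row sums to 1
print(probs.shape)  # (100, 2)
```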
III. Additional Stuff to tackle when I get time and braincycles to spare...
- Clustering High-Dimensional Data: Image data often results in high-dimensional feature vectors, so techniques for dimensionality reduction become crucial. It is easy to see that distances like Euclidean lose their meaning as we go into higher-dimensional data. Also think about situations where one dimension may span a much smaller range than another - e.g., considering age and salary, age may only go from 0 to 100, while salary may range from 0 to 1 million, so the salary axis dominates the distance (hint: specifically for this example, prefer Manhattan distance over Euclidean).
- Clustering Large-Scale Data: When you have many images, scalable clustering algorithms (e.g., sampling or mini-batch variations of standard methods) are essential.
- Spectral Clustering (Flexible approach, particularly effective on non-convex cluster shapes)
- Graph-Based Clustering
- Hybrid Approaches (Combining traditional algorithms with neural networks)
- Affinity Propagation (Finds clusters based on message-passing between data points)
...so yeah, there's a bunch of work needed!
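The age/salary point above is easy to demonstrate: without scaling, the salary axis swamps Euclidean distance entirely. A minimal sketch (the spread values used for standardizing are assumed, purely for illustration):

```python
import numpy as np

# Two people: similar salary, very different age.  [age, salary]
a = np.array([20.0, 500_000.0])
b = np.array([70.0, 510_000.0])

# Unscaled: the 50-year age gap barely registers next to the salary gap.
d_raw = np.linalg.norm(a - b)

# Divide each feature by a rough spread (assumed values) before measuring.
scale = np.array([30.0, 250_000.0])
d_scaled = np.linalg.norm((a - b) / scale)

print(round(d_raw), round(d_scaled, 2))
```

After scaling, the age difference dominates - which matches intuition, since a 50-year age gap is far more significant than a 2% salary gap.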
- 00 Prep the Pictures
- 01 K-Means
- 015 Color Models
Kandinsky helped with the cinematography for our feature film Eight Down Toofaan Mail.
- Trailer for Eight Down Toofaan Mail on YouTube
- After a successful awards run and theatrical distribution in India, the film is now on YouTube (with English Subtitles)
- Opening day audience reactions YouTube Shorts
- Press release from Ministry of Information and Broadcasting, Govt. of India at IFFI 2021
- Full press conference at IFFI 2021 (YouTube)
- at PyCascades Seattle 2024: (unedited live-stream) YouTube, schedule
- at Vidyalankar Institute of Technology LinkedIn Post (no video)
- Some presentations use The Inter typeface family