Enhance random initialization for K-means part of K-prototypes #116

regorsmitz · 2019-04-04T12:13:44Z

Thanks @nicodv for your response to my previous question about failed KPrototype initialization, and for building this library, which I have found very helpful!

Now I see that your KMeans implementation initializes with points sampled from a normal distribution; sorry for my previous confusion. That said, I don’t think the current behavior suits every use case. In my case it is important that initialization always succeeds, because I’d like to run this job as part of a production pipeline. I think random initialization of K-means is standard, and with n_init set high enough it should give reasonably good results, depending on the dataset.

I could select a random set of points from my dataset and pass them explicitly as the K-means initialization, but (correct me if I’m wrong) this approach does not seem to allow taking advantage of n_init > 1, which makes a single random initialization much more likely to be suboptimal.

Thanks for reading and sorry to be filling this repo with issues. If you want me to put in a PR for this change, I can give it a shot (adding something like init='all-random' to KPrototypes only, which randomly initializes the K-means component n_init times).

nicodv · 2019-04-04T17:08:16Z

I've followed the papers by Huang (https://github.com/nicodv/kmodes#huang98), which sample from a normal distribution.

Feel free to make a PR for this. It makes sense to open up the initialization of the k-means part of k-prototypes to enhancements. We'd have init_num and init_cat arguments to k-prototypes, I'd imagine.

In the meantime, you can do the sampling yourself and re-run k-prototypes each time with the chosen points as the initialization points. You're right, it's not supported out of the box.
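A minimal sketch of that workaround, kept library-agnostic on purpose. The names `sample_init_points`, `best_of_restarts`, and `fit_once` are hypothetical, not part of kmodes. `fit_once` is assumed to be a callback you write yourself: it runs k-prototypes once (n_init=1) with the given initialization points and returns a `(cost, model)` pair. The exact format the library expects for explicit initialization points is not shown here and should be checked against the docs for your installed version.

```python
import random


def sample_init_points(rows, n_clusters, categorical, rng):
    """Pick n_clusters distinct rows and split each into numeric/categorical parts.

    rows        -- list of records (each a list of mixed values)
    categorical -- indices of the categorical columns
    rng         -- a random.Random instance, so runs are reproducible
    """
    cat_idx = set(categorical)
    chosen = rng.sample(rows, n_clusters)  # sampling without replacement
    num_part = [[v for j, v in enumerate(r) if j not in cat_idx] for r in chosen]
    cat_part = [[v for j, v in enumerate(r) if j in cat_idx] for r in chosen]
    return num_part, cat_part


def best_of_restarts(rows, n_clusters, categorical, fit_once, n_restarts=10, seed=0):
    """Emulate n_init > 1: re-run with fresh random points, keep the lowest cost.

    fit_once -- hypothetical user-supplied callback: takes (num_part, cat_part),
                runs k-prototypes once with those points, returns (cost, model).
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_restarts):
        init = sample_init_points(rows, n_clusters, categorical, rng)
        cost, model = fit_once(init)
        if best is None or cost < best[0]:
            best = (cost, model)
    return best
```

Rows are sampled by position, so a dataset with duplicate records can still yield identical starting points; deduplicating `rows` before sampling would avoid that.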

nicodv changed the title ~~Implement random initialization for Kmeans~~ Enhance random initialization for K-means part of K-prototypes Apr 4, 2019

nicodv added the enhancement label Apr 4, 2019
