Thanks @nicodv for your response to my previous question about failed KPrototype initialization, and for building this library, which I have found very helpful!
I now see that your KMeans implementation initializes with points drawn from a normal distribution; sorry for my earlier confusion. That said, I don't think the current behavior suits every use case. In my case it is important that the initialization always succeeds, because I'd like to run this job as part of a production pipeline. I think random initialization is a standard option for k-means, and with n_init set high enough it should give reasonably good results, depending on the dataset.
I could select a random set of points from my dataset and pass them explicitly as the k-means initialization, but (correct me if I'm wrong) that approach does not seem to allow n_init > 1, and a single random initialization is much more likely to end in a suboptimal result.
Thanks for reading, and sorry for filling this repo with issues. If you'd like a PR for this change, I can give it a shot: something like init='all-random' on KPrototypes only, which would randomly initialize the k-means component n_init times.
Feel free to make a PR for this. It makes sense to open up the initialization of the k-means part of k-prototypes to enhancements. We'd have init_num and init_cat arguments to k-prototypes, I'd imagine.
In the meantime, you can do the sampling yourself and re-run k-prototypes each time with the chosen points as the initialization points. You're right, it's not supported out of the box.
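A minimal sketch of that workaround, for illustration only. The helper names (`sample_initial_points`, `best_of_random_inits`) and the `fit_once` callable are hypothetical and not part of the library; `fit_once` is assumed to run one clustering from the given initial points and return a `(model, cost)` pair, so that the lowest-cost run can be kept by hand in place of `n_init`.

```python
import random


def sample_initial_points(n_rows, n_clusters, n_init, seed=None):
    """Yield n_init lists, each holding n_clusters distinct row indices."""
    rng = random.Random(seed)
    for _ in range(n_init):
        yield rng.sample(range(n_rows), n_clusters)


def best_of_random_inits(rows, categorical, n_clusters, n_init, fit_once, seed=None):
    """Re-run a clustering from several random draws and keep the cheapest.

    rows        -- list of records (each a list of column values)
    categorical -- indices of the categorical columns
    fit_once    -- hypothetical callable: (num_init, cat_init) -> (model, cost)
    """
    n_cols = len(rows[0])
    numerical = [j for j in range(n_cols) if j not in categorical]

    best_model, best_cost = None, float("inf")
    for idx in sample_initial_points(len(rows), n_clusters, n_init, seed):
        # Split each sampled row into its numerical and categorical parts.
        num_init = [[rows[i][j] for j in numerical] for i in idx]
        cat_init = [[rows[i][j] for j in categorical] for i in idx]

        model, cost = fit_once(num_init, cat_init)
        if cost < best_cost:
            best_model, best_cost = model, cost
    return best_model, best_cost
```

With this library, `fit_once` could wrap a single `KPrototypes(..., init=[num_init, cat_init], n_init=1)` fit and return the fitted model together with its `cost_`. That assumes the installed version accepts a two-element list for `init` and exposes `cost_`, so it is worth checking against the docs first. Sampled rows that happen to share identical values may still need to be redrawn.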
nicodv changed the title from "Implement random initialization for Kmeans" to "Enhance random initialization for K-means part of K-prototypes" on Apr 4, 2019