As discussed in this topic on Dask's forum, my colleague and I compared the dask-ml implementation of the KMeans class with our own implementation in a distributed environment. During the comparison, we observed that the dask-ml initialization (k-means||) does not appear to use weights when re-clustering the candidate centroids.
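In k-means||, each oversampled candidate centroid can be weighted by the number of data points that are closer to it than to any other candidate. The weighted re-clustering step could then be expressed by passing those counts as `sample_weight` to scikit-learn's `KMeans.fit`. The sketch below is a hypothetical in-memory NumPy helper (`recluster_candidates` is an illustrative name), not dask-ml's code, and it ignores the distributed computation dask-ml performs.

```python
import numpy as np
from sklearn.cluster import KMeans


def recluster_candidates(X, candidates, n_clusters, random_state=0):
    """Hypothetical helper: reduce oversampled candidates to n_clusters centres.

    Each candidate is weighted by the number of points in X whose nearest
    candidate it is (the size of its Voronoi cell).
    """
    # Squared Euclidean distance from every point to every candidate (n x m).
    d2 = ((X[:, None, :] - candidates[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    weights = np.bincount(nearest, minlength=len(candidates)).astype(float)

    # Drop candidates that attracted no points (e.g. exact duplicates).
    keep = weights > 0
    km = KMeans(n_clusters=n_clusters, n_init=1, random_state=random_state)
    # Weighted fit: unlike a plain km.fit(candidates), heavy candidates pull harder.
    km.fit(candidates[keep], sample_weight=weights[keep])
    return km.cluster_centers_, weights
```

Calling `km.fit(candidates)` without `sample_weight` would treat a candidate representing one point the same as one representing thousands, which is the behaviour the issue describes.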
In the current dask-ml KMeans implementation, the candidate centroids are re-clustered with the standard, unweighted KMeans algorithm. In contrast, we incorporated weights in two places:
KMeans++ initialization.
Weighted average during centroid re-clustering.
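The two weighted pieces listed above can be sketched in plain NumPy as follows. This is an illustrative reconstruction under our own assumptions, not the code in the linked repository: `points` would be the oversampled candidates, `weights` the number of data points closest to each, and the function names are hypothetical. Seeding samples each new centre with probability proportional to `weight * squared distance`, and each Lloyd update moves a centre to the weighted mean of its assigned points.

```python
import numpy as np


def weighted_kmeans_pp(points, weights, n_clusters, rng):
    """k-means++ seeding with P(pick i) proportional to w_i * D(i)^2."""
    weights = np.asarray(weights, dtype=float)
    # First centre: sampled in proportion to weight alone.
    first = rng.choice(len(points), p=weights / weights.sum())
    centers = [points[first]]
    d2 = ((points - centers[0]) ** 2).sum(axis=1)
    for _ in range(1, n_clusters):
        p = weights * d2
        if p.sum() == 0:  # every point coincides with a chosen centre
            p = weights.copy()
        idx = rng.choice(len(points), p=p / p.sum())
        centers.append(points[idx])
        # Keep the squared distance to the nearest chosen centre up to date.
        d2 = np.minimum(d2, ((points - points[idx]) ** 2).sum(axis=1))
    return np.array(centers, dtype=float)


def weighted_lloyd(points, weights, centers, n_iter=100, tol=1e-8):
    """Lloyd iterations where each centre moves to the weighted mean of its points.

    Returns the centres, the last assignment, and the number of iterations run.
    """
    weights = np.asarray(weights, dtype=float)
    centers = np.asarray(centers, dtype=float)
    for it in range(n_iter):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for k in range(len(centers)):
            mask = labels == k
            w = weights[mask]
            if w.sum() > 0:  # empty clusters keep their previous position
                new_centers[k] = (w[:, None] * points[mask]).sum(axis=0) / w.sum()
        shift = ((new_centers - centers) ** 2).sum()
        centers = new_centers
        if shift <= tol:
            break
    return centers, labels, it + 1
```

For example, a single cluster over the points 0 and 1 with weights 3 and 1 converges to 0.25 rather than the unweighted mean 0.5. The returned iteration count is one way to check whether weighting reduces the number of iterations, as observed below.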
Although our implementation is slower than dask-ml's in terms of execution time, we achieved better results when clustering a blob dataset. We attribute this to a reduction in the number of clustering iterations rather than to direct code optimizations.
If you're interested, feel free to review our repository for further details on our approach: GitHub Repository.
Thank you for considering this issue.
Best regards,
Chiara