Based on the results of running kprototypes on the stocks.csv file included in the examples, I have concluded that kprototypes.cluster_centroids_ represents the centroids in the following format:
[array([cluster 0 centroid coordinates in numerical space],
[cluster 1 centroid coordinates in numerical space], ...),
array([cluster 0 centroid coordinates in categorical space],
[cluster 1 centroid coordinates in categorical space], ...)]
where the coordinates of the i-th cluster centroid, in either numerical or categorical space, are of the form
[x_i,0, x_i,1, ...]
where
- x_i,0 is the coordinate for the first (i.e. left-most) column of (categorical or numerical) data,
- x_i,1 is the coordinate for the second column of (categorical or numerical) data,
- ...
and where the centroid coordinate values in categorical space are elements of the set
{0, 1, 2, ...}
where
- a value of 0 represents the category value whose name is first in alphabetical order,
- a value of 1 represents the category value whose name is second in alphabetical order,
- ...
That is, each categorical coordinate is the mode (i.e. most frequently occurring) categorical value for the cluster, encoded by putting the category names in alphabetical order and representing the first name by 0, the second name by 1, etc.
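For illustration, here is a minimal sketch of the two-array layout described above. The numbers are made up (two clusters, two numerical and two categorical columns) and are not actual output from stocks.csv; the variable names are hypothetical.

```python
import numpy as np

# Hypothetical values in the two-array layout described above (not real output).
# Element 0: numerical centroids, one row per cluster, one column per numerical feature.
# Element 1: categorical centroids, one row per cluster, one integer code per categorical feature.
cluster_centroids = [
    np.array([[12.5, 0.8],
              [48.0, 1.3]]),
    np.array([[0, 2],
              [1, 0]]),
]

numerical, categorical = cluster_centroids
n_clusters = numerical.shape[0]

for i in range(n_clusters):
    # Row i holds [x_i,0, x_i,1, ...] in each space; column j corresponds to the
    # j-th numerical (or categorical) column of the input data, left to right.
    print(f"cluster {i}: numerical={numerical[i].tolist()}, "
          f"categorical={categorical[i].tolist()}")
```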
Can you please tell me if my conclusions are correct? If there is documentation that describes all this, I apologize for this long-winded question, and I would greatly appreciate a pointer to that documentation.
Thank you very much.
The mapping between the original categorical values and the {0, 1, 2, ...} codes you see in the cluster centroids is not based on alphabetical order. Instead, you can look at `kprototypes.enc_map_` to see how the mapping is defined.
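A minimal sketch of decoding the integer codes back to category names, assuming `enc_map_` is a list with one dict per categorical column that maps each original value to its integer code. That structure is an assumption to verify against your installed version; the category names and codes below are hypothetical.

```python
import numpy as np

# Hypothetical encoded categorical centroids: one row per cluster,
# one integer code per categorical column.
categorical_centroids = np.array([[0, 2],
                                  [1, 0]])

# Hypothetical enc_map_-style structure: one dict per categorical column,
# mapping original value -> integer code. The codes are deliberately not
# in alphabetical order of the names.
enc_map = [
    {"tech": 0, "finance": 1, "energy": 2},
    {"NYSE": 0, "AMEX": 1, "NASDAQ": 2},
]

def decode_centroids(codes, enc_map):
    """Map integer codes back to the original category values, column by column."""
    # Invert each column's mapping so a value can be looked up by its code.
    inverse = [{code: value for value, code in col_map.items()} for col_map in enc_map]
    return np.array(
        [[inverse[j][int(code)] for j, code in enumerate(row)] for row in codes],
        dtype=object,
    )

decoded = decode_centroids(categorical_centroids, enc_map)
print(decoded)
```

Inverting the per-column dicts rather than sorting names is the point here: the code-to-name relationship comes from the stored mapping, not from alphabetical order.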
This is how it works in version 0.6, but it changes in version 0.7. Instead of showing the encoded values, the cluster centroids will simply contain the original categorical values, so you don't have to concern yourself with the mapping at all.