Centroids output format #33

nathanlgrossman · 2017-03-29T21:03:25Z

Based on the results of running kprototypes on the stocks.csv file included in the examples, I have concluded that kprototypes.cluster_centroids_ represents the centroids in the following format:
[array([cluster 0 centroid coordinates in numerical space],
[cluster 1 centroid coordinates in numerical space], ...),
array([cluster 0 centroid coordinates in categorical space],
[cluster 1 centroid coordinates in categorical space], ...)]
where the i-th cluster centroid's coordinates, in either numerical or categorical space, are of the form
[x_i,0, x_i,1, ...]
where
x_i,0 is the coordinate for the first (i.e. left-most) column of (categorical or numerical) data
x_i,1 is the coordinate for the second (i.e. second left-most) column of (categorical or numerical) data
...
and where the j-th cluster centroid coordinate values in categorical space are elements of the set
{0, 1, 2, ...}
where
a value of 0 represents the category value whose name is first in alphabetical order
a value of 1 represents the category value whose name is second in alphabetical order
...
In other words, each value is the mode (i.e. the most frequently occurring category) of that column within the cluster, encoded as an integer by sorting the category names alphabetically and assigning 0 to the first name, 1 to the second name, etc.
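To make the layout described above concrete, here is a minimal sketch using plain Python lists as a stand-in for kprototypes.cluster_centroids_. The values and the helper name centroid are made up for illustration; only the two-part shape (numerical part first, categorical part second, one row per cluster) comes from the description above.

```python
# Stand-in for kprototypes.cluster_centroids_ as described above.
# All values are invented; only the nesting mirrors the description.
cluster_centroids = [
    [[12.5, 0.8], [40.1, 2.3]],  # numerical part: one row per cluster
    [[0, 2], [1, 0]],            # categorical part: one row of integer codes per cluster
]

numerical, categorical = cluster_centroids


def centroid(i):
    """Return (numerical coordinates, categorical codes) for cluster i (hypothetical helper)."""
    return numerical[i], categorical[i]


for i in range(len(numerical)):
    num, cat = centroid(i)
    print(f"cluster {i}: numerical={num} categorical codes={cat}")
```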

Can you please tell me if my conclusions are correct? If there is documentation that describes all this, I apologize for this long-winded question, and I would greatly appreciate a pointer to that documentation.

Thank you very much.

nicodv · 2017-03-30T02:23:52Z

No documentation yet, sorry about that. (#28)

The mapping between the original categorical values and the {0, 1, 2, ...} values you see in the cluster centroids is not alphabetical. Instead, you can look at kprototypes.enc_map_ to see how the mapping is defined.

This is how it works in version 0.6, but in version 0.7 this has changed. Instead of presenting the categorical mapping, it will simply show the original categorical values in the cluster centroids. That way, you don't have to concern yourself with that mapping at all.
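For version 0.6, turning the integer codes back into category names could look roughly like the sketch below. It assumes enc_map_ is a list with one dict per categorical column, each mapping an original value to its integer code; that shape is an assumption here, so check it against your fitted model. The data and the name decode_centroid are hypothetical.

```python
# Hypothetical stand-in for enc_map_: one dict per categorical column,
# mapping original value -> integer code (assumed shape, not confirmed here).
enc_map = [
    {"tech": 0, "energy": 1, "finance": 2},
    {"small": 0, "large": 1},
]

# Invert each column's mapping so codes can be looked up: code -> original value.
inverse_maps = [{code: value for value, code in col.items()} for col in enc_map]


def decode_centroid(codes):
    """Translate one centroid's integer codes back to category names (hypothetical helper)."""
    return [inverse_maps[j][code] for j, code in enumerate(codes)]


# Made-up categorical part of the centroids: one row of codes per cluster.
categorical_centroids = [[2, 0], [0, 1]]
decoded = [decode_centroid(row) for row in categorical_centroids]
print(decoded)
```

With a real model, the categorical part of the centroids would replace categorical_centroids and the fitted mapping would replace enc_map.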

nicodv added the question label Apr 1, 2017

nicodv closed this as completed Sep 19, 2017
