issues with kprototype predict function parameters when used in sklearn pipeline #60

Open
soufianee opened this issue Dec 14, 2017 · 3 comments

@soufianee

Hi,

I appreciate your work on k-prototypes. I have a dataset containing numerical and categorical variables, and I want to use k-prototypes together with other methods inside a sklearn Pipeline. Fitting works when I pass the categorical parameter as a fit parameter, like this: "Mymodel__categorical". But when I want to use the predict function, the pipeline doesn't allow any parameters other than the inputs.
Accordingly, I think that the k-prototypes class does not persist the categorical parameter during fitting, and that it lacks a predict function taking only the inputs, which it would need to work well with a sklearn Pipeline.

Thank you. Sincerely.

@nicodv
Owner

nicodv commented Dec 14, 2017

The API of the kmodes.predict method indeed needs a categorical argument, but sklearn does not allow for extra arguments to the predict method. This causes kmodes to be incompatible with some higher-level functionality of sklearn, such as Pipelines.

So, this is a known issue, and it cannot be resolved without changes to either API.

The only solution that seems somewhat acceptable to me is to move the categorical argument to the __init__ of KModes/KPrototypes, but I don't like it conceptually.

Suggestions are welcome.

@soufianee
Author

soufianee commented Dec 14, 2017

Thank you, nicodv.

Yes, moving the categorical argument to __init__ and storing it there, so that it is available when fit is called, is the solution.

In the end, I tweaked my program to work with the current configuration. But I advise you to think about it, so that kmodes becomes compatible with pipelines, because they are widely used in the data science community and in Spark programming.

Sincerely.

@aiborra11

Hi,
Just wondering if there is any solution for this? Trying to create a sklearn pipeline for my KPrototypes model but can't see how to pass the categorical index list as an argument when fitting/predicting the model...
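
For reference, the idea discussed earlier in this thread (storing the categorical indices at construction time instead of passing them to fit and predict) could be sketched as a thin adapter around the estimator. This is only a sketch, not part of kmodes: the class name `FixedCategoricalClusterer` is hypothetical, and it assumes the wrapped estimator accepts a `categorical` keyword in both `fit` and `predict`, as described above.

```python
class FixedCategoricalClusterer:
    """Hypothetical adapter (not part of kmodes or sklearn).

    Stores the categorical column indices at construction time so that
    fit and predict only need X, which is what a Pipeline passes along.
    """

    def __init__(self, estimator, categorical):
        # Assumption: `estimator` exposes fit(X, categorical=...) and
        # predict(X, categorical=...), as described in this thread.
        self.estimator = estimator
        self.categorical = categorical

    def fit(self, X, y=None):
        # Forward the stored indices; y is accepted only for API symmetry.
        self.estimator.fit(X, categorical=self.categorical)
        return self

    def predict(self, X):
        # Reuse the same indices, so the caller never has to pass them.
        return self.estimator.predict(X, categorical=self.categorical)

    def fit_predict(self, X, y=None):
        return self.fit(X, y).predict(X)

    def get_params(self, deep=True):
        # Minimal parameter access, mirroring the usual estimator convention.
        return {"estimator": self.estimator, "categorical": self.categorical}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self
```

With kmodes installed, something like `FixedCategoricalClusterer(KPrototypes(n_clusters=3), categorical=[2, 3])` could then presumably be used as the last step of a Pipeline, with `pipe.fit(X)` and `pipe.predict(X)` called without extra arguments. Note that the indices must refer to the column layout the final step actually receives, which may differ from the raw input if earlier steps add, drop, or reorder columns.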
