added a data frame mapper which uses just Pipeline and FeatureUnion #62

chanansh · 2016-08-02T06:49:09Z

I tried to use the DataFrameMapper but had problems setting the parameters of its internal models and keeping it consistent with scikit-learn's other wrapping estimators. I found that one can simply combine a set of pipelines with FeatureUnion, where each pipeline starts with a column-selector transformer, ColumnSelectTransformer, followed by the requested list of transformers. The resulting pipeline also has named steps, so one can trace what each parameter refers to when calling get_params(deep=True).
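A minimal sketch of this approach, assuming pandas and a recent scikit-learn. The thread does not show the original ColumnSelectTransformer code, so the class below is a re-implementation; `mapping_to_union` and its `(columns, [transformers...])` input format are hypothetical. It illustrates the point about naming: every inner transformer's parameters become reachable through `get_params(deep=True)` and `set_params` under `<branch>__<step>__<param>` keys.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler


class ColumnSelectTransformer(BaseEstimator, TransformerMixin):
    """Select DataFrame columns and return them as a 2-D array (sketch)."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        return X[self.columns].values


def mapping_to_union(mapping):
    """Hypothetical helper: [(columns, [transformers...]), ...] -> FeatureUnion."""
    branches = []
    for columns, transformers in mapping:
        # make_pipeline names each step after its lowercased class name.
        pipe = make_pipeline(ColumnSelectTransformer(columns), *transformers)
        branches.append(("_".join(columns), pipe))
    return FeatureUnion(branches)


df = pd.DataFrame({"height": [150.0, 160.0, 170.0, 180.0],
                   "weight": [50.0, 60.0, 70.0, 80.0]})
union = mapping_to_union([
    (["height"], [StandardScaler()]),
    (["weight"], [MinMaxScaler()]),
])

# Nested parameters are addressable as <branch>__<step>__<param>.
union.set_params(height__standardscaler__with_mean=False)
X = union.fit_transform(df)
print(X.shape)  # (4, 2)
```

Because FeatureUnion and Pipeline already implement the nested-parameter protocol, tools such as `GridSearchCV` can tune these keys without any mapper-specific code.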

I hope someone will find this useful.

dukebody · 2016-08-04T16:17:50Z

Hi @chanansh!

I integrated a modified version of your FeatureUnion-based pipeline into the current version of DataFrameMapper in https://github.com/paulgb/sklearn-pandas/tree/feature_union_pipe. Let me know what you think.

vzaretsk · 2016-08-05T06:10:56Z

mapping_to_pipeline doesn't seem to check column names for uniqueness, which leads to strange behavior if the same column is used twice. Alternatively, this functionality could be added to the current implementation of DataFrameMapper if it were modified to accept step names, similar to sklearn.pipeline.Pipeline. Unfortunately, I don't see an easy way to do that. One option: check the length of each mapping tuple and, if no name is provided, automatically generate one such as "col_trans_0". This shouldn't break backwards compatibility.
@chanansh @dukebody
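A sketch of the backwards-compatible naming idea, using a hypothetical helper `name_features` (not part of sklearn-pandas): two-element tuples keep working and receive an automatic `col_trans_<index>` name, while three-element tuples carry their own name.

```python
from sklearn.preprocessing import StandardScaler


def name_features(features):
    """Hypothetical helper: return (name, columns, transformer) triples."""
    named = []
    for i, feature in enumerate(features):
        if len(feature) == 2:
            columns, transformer = feature
            name = "col_trans_%d" % i  # auto-generated for old-style tuples
        elif len(feature) == 3:
            columns, transformer, name = feature  # user-supplied step name
        else:
            raise ValueError("expected a 2- or 3-tuple, got %r" % (feature,))
        named.append((name, columns, transformer))
    return named


steps = name_features([
    (["height"], StandardScaler()),
    (["height"], None),  # same column used twice, still uniquely named
])
print([name for name, _, _ in steps])  # ['col_trans_0', 'col_trans_1']
```

The index alone does not guarantee uniqueness: an explicit name such as `"col_trans_1"` could still clash with a generated one, so a separate duplicate check would still be needed.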

dukebody · 2016-08-25T14:36:12Z

Good idea! I believe we can generate feature names from the selected columns, as the PR does now, and allow the user to provide a custom name as a third element of the feature tuple.

For example:

mapper = DataFrameMapper([
    (['height'], StandardScaler()),
    (['height'], None, 'unmodified_height'),
])

Feature names (both custom and auto-generated) can be checked for uniqueness during init, raising an exception if any name is duplicated.
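A minimal sketch of the proposed rules, using a hypothetical `NamedMapperSketch` class rather than the real DataFrameMapper: names are derived from the selected column(s) unless a third tuple element supplies one, and duplicates raise a `ValueError` at construction time.

```python
from sklearn.preprocessing import StandardScaler


class NamedMapperSketch:
    """Hypothetical stand-in for the proposed DataFrameMapper naming rules."""

    def __init__(self, features):
        self.features = features
        names = [self._feature_name(f) for f in features]
        # Report every name that occurs more than once.
        duplicates = sorted({n for n in names if names.count(n) > 1})
        if duplicates:
            raise ValueError("Duplicate feature names: %s" % ", ".join(duplicates))
        self.feature_names = names

    @staticmethod
    def _feature_name(feature):
        if len(feature) == 3:
            return feature[2]  # custom name given as the third element
        columns = feature[0]
        # Auto-generate the name from the selected column(s).
        return "_".join(columns) if isinstance(columns, list) else columns


mapper = NamedMapperSketch([
    (["height"], StandardScaler()),
    (["height"], None, "unmodified_height"),
])
print(mapper.feature_names)  # ['height', 'unmodified_height']

try:
    NamedMapperSketch([(["height"], StandardScaler()), (["height"], None)])
except ValueError as exc:
    print(exc)  # Duplicate feature names: height
```

One design caveat: scikit-learn's estimator guidelines recommend that `__init__` only store its arguments and defer validation to `fit`, so a real estimator might run this check in `fit` instead.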

Does that sound reasonable?

vzaretsk · 2016-08-27T15:08:00Z

Yeah, that looks like a good approach. I can work on this, but I'm very busy these days, so it might be a few weeks before I have time to start.

dukebody · 2016-08-27T15:20:09Z

Don't worry, I'm already working on this myself. Will upload updated code this evening.

havardl · 2017-08-22T07:49:39Z

Will this be merged into master any time soon?

dukebody · 2017-09-10T08:52:12Z

Hi @havardl. We worked on this about a year ago, but honestly I don't think the features it adds are worth the flexibility it costs, especially since it would break all previously pickled mappers (it's backwards incompatible).

I believe it would be better to implement the parameter-setting feature mentioned in the original post of this PR.
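One possible direction, sketched under assumptions: a hypothetical `ParamMapperSketch` (not sklearn-pandas code) keeps the existing `(columns, transformer)` list format and exposes each transformer's parameters under an auto-generated `col_trans_<index>__<param>` key, the double-underscore convention scikit-learn uses for nested parameters in tools such as `GridSearchCV`.

```python
from sklearn.base import BaseEstimator
from sklearn.preprocessing import StandardScaler


class ParamMapperSketch(BaseEstimator):
    """Hypothetical mapper exposing nested transformer params (sketch only)."""

    def __init__(self, features):
        self.features = features

    def _named_transformers(self):
        # Hypothetical naming scheme: col_trans_<index>; None means passthrough.
        return {"col_trans_%d" % i: f[1]
                for i, f in enumerate(self.features) if f[1] is not None}

    def get_params(self, deep=True):
        params = super().get_params(deep=False)  # {'features': [...]}
        if deep:
            for name, trans in self._named_transformers().items():
                for key, value in trans.get_params(deep=True).items():
                    params["%s__%s" % (name, key)] = value
        return params

    def set_params(self, **params):
        named = self._named_transformers()
        own = {}
        for key, value in params.items():
            name, sep, sub_key = key.partition("__")
            if sep and name in named:
                named[name].set_params(**{sub_key: value})  # delegate nested param
            else:
                own[key] = value
        if own:
            super().set_params(**own)  # e.g. replacing `features` itself
        return self


mapper = ParamMapperSketch([(["height"], StandardScaler()), (["weight"], None)])
mapper.set_params(col_trans_0__with_mean=False)
print(mapper.get_params()["col_trans_0__with_mean"])  # False
```

The index-based names shift if features are reordered; user-supplied names would make the keys stable, at the cost of changing the tuple format.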

Commit 0207e36: added a data frame mapper which uses just Pipeline and FeatureUnion

dukebody mentioned this pull request Aug 27, 2016 in: DataFrameMapper using FeatureUnion #64 (Open)

cstjean mentioned this pull request Feb 8, 2017 in: GridSearch with pipelines of dataframes cstjean/ScikitLearn.jl#24 (Open)

vsriram11 mentioned this pull request Apr 28, 2020 in: Enable None sentinel for columns #207 (Closed)
