importance in auto_test #401

Closed
be-marc opened this issue Nov 12, 2019 · 12 comments

@be-marc
Member

be-marc commented Nov 12, 2019

auto_test fails if the named vector returned by learner$importance() contains a name that is not present in task$feature_names. However, some learners return the variable importance for each level of a factor, so the returned named vector contains names like factor.level. One example is h2o::h2o.deeplearning in mlr.

Do we want to support variable importance for factor levels?
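
To make the failure concrete, here is a minimal sketch with made-up feature and level names (not taken from any actual learner) of the check that auto_test effectively performs:

    # made-up names for illustration only
    feature_names = c("age", "credit_history")             # task$feature_names
    imp = c(credit_history.critical = 0.4,                 # learner$importance() with one
            credit_history.delayed  = 0.2,                 # score per factor level
            age                     = 0.1)
    all(names(imp) %in% feature_names)                      # FALSE, so auto_test fails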

@mllg
Member

mllg commented Nov 12, 2019

I don't think that this is a proper design. Having a single numeric value for each feature makes everything so much easier.

If the model returns multiple importance values for a single feature, we might want to aggregate them (mean/max)?

@berndbischl ?

@berndbischl
Member

i don't think there is a simple solution. naive aggregation, i fear, is really not appropriate.

the question is how much we want to implement here ourselves.

currently i guess we have to mark the learners as "does not support importance"?

and open issues in mlr3learners to repair this?

@berndbischl
Member

OTOH we become super-restrictive if we completely throw these importance scores away whenever they do not adhere to our scheme. very unsure here what is best.
can't we maybe simply "live" with somewhat more unstructured scores?

@berndbischl
Member

with IML you can then calculate model-agnostic scores that really produce one number per feature?

@mllg
Member

mllg commented Nov 27, 2019

  1. For all numeric features, these scores are well-defined and could be returned.
  2. A naive aggregation of these p-values is surely not statistically sound if you want to interpret them as p-values, but isn't this okay for "scoring" the importance if this is well documented?
  3. The aggregation method could be a parameter of $importance() / a hyperparameter of the learner (see the sketch after this list).
  4. We have the same problem for xgboost where we manually preprocess with dummy encoding, right?
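
For point 3, a rough sketch of how such an aggregation parameter could look. The helper name and the name-mapping rule (strip a ".level" suffix) are hypothetical and not existing mlr3 code; the actual mapping would have to come from the learner or the encoding step:

    # hypothetical helper, not part of mlr3: collapse per-level scores to one
    # score per feature, with the aggregation method chosen by the caller
    aggregate_importance = function(imp, feature_names, method = c("mean", "max")) {
      method = match.arg(method)
      fun = switch(method, mean = mean, max = max)
      feat = vapply(names(imp), function(nm) {
        if (nm %in% feature_names) return(nm)                          # already feature-level
        hit = feature_names[startsWith(nm, paste0(feature_names, "."))]
        if (length(hit) != 1L) stop("cannot map '", nm, "' to a feature")
        hit
      }, character(1))
      vapply(split(imp, feat), fun, numeric(1))                         # one score per feature
    }

    imp = c(credit_history.critical = 0.4, credit_history.delayed = 0.2, age = 0.1)
    aggregate_importance(imp, c("age", "credit_history"), method = "max")
    # age = 0.1, credit_history = 0.4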

@berndbischl
Member

2. but isn't this okay for "scoring" the importance if this is well documented?

i have no idea? but a gut feeling it's not ok? do you really want to implement this without being able to cite a single paper where this is explored? or an "obvious" reason why this can make sense?

@berndbischl
Member

4. We have the same problem for xgboost where we manually preprocess with dummy encoding, right?

what do you mean here exactly?

@pat-s
Member

pat-s commented Mar 26, 2020

@berndbischl We should find a solution here, this is kinda blocking.

Also applies to mlr-org/mlr3learners#28 (where glmnet returns scores for every multiclass response instance).

I also do not think that we should do too much automatic aggregation - remember that many people will just "trust" the defaults and might do bad things if the results are "too easy".

@berndbischl
Member

@berndbischl We should find a solution here, this is kinda blocking.

why is this blocking? mark the learner as does-not-support-importance?

@mb706
Collaborator

mb706 commented Apr 12, 2020

If we use importance for feature filtering we need to be able to rely on strict adherence to the interface (sketched below). Solutions that I see:

  • aggregate importance to feature level in some way. If there are multiple ways, use a parameter to choose between them.
  • possibly add a new slot that gives the true importance values and is not bound to a format. We could agree on $importance_raw and make this a part of the Learner interface, but I could also see just choosing any name that fits for each specific learner (and documenting the slot, of course).
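
To illustrate why filtering needs the strict format, a simplified sketch (this is not the actual mlr3filters code): a filter essentially ranks task$feature_names by the returned scores, which breaks as soon as the names do not line up.

    # simplified sketch, not the actual mlr3filters implementation
    filter_top_k = function(imp, feature_names, k) {
      scores = imp[feature_names]                  # NA for any feature without a score
      names(sort(scores, decreasing = TRUE))[seq_len(k)]
    }

Note that $importance_raw above is only a proposal in this thread; nothing like it exists in the Learner interface yet.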

@berndbischl
Member

we agreed that the contract is that every base learner that supports importance needs to be able to return exactly 1 numeric score per feature. and how this is produced is up to that learner. we do not implement any aggregation unless it is clear that this is scientifically valid.
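
As a sketch of what that contract means for a test (the exact auto_test implementation may check more or differ in detail), a learner now has to satisfy a check along these lines:

    # minimal sketch of the agreed contract
    check_importance = function(imp, feature_names) {
      stopifnot(
        is.numeric(imp),
        !is.null(names(imp)),
        !anyDuplicated(names(imp)),              # at most one score per feature
        all(names(imp) %in% feature_names)       # no per-level names like "x.level"
      )
      invisible(TRUE)
    }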
