importance in auto_test #401

Closed
be-marc opened this issue Nov 12, 2019 · 12 comments

@be-marc
Member

be-marc commented Nov 12, 2019

auto_test fails if the named vector returned by learner$importance() contains a name that is not present in task$feature_names. However, some learners return the variable importance for each level of a factor, so the returned named vector contains names like factor.level. One example is h2o::h2o.deeplearning in mlr.

Do we want to support variable importance for factor levels?
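
To make the failure concrete, here is a minimal sketch with made-up feature and level names (not taken from any actual learner) of the check that auto_test effectively performs:

    # made-up names for illustration only
    feature_names = c("age", "credit_history")             # task$feature_names
    imp = c(credit_history.critical = 0.4,                 # learner$importance() with one
            credit_history.delayed  = 0.2,                 # score per factor level
            age                     = 0.1)
    all(names(imp) %in% feature_names)                      # FALSE, so auto_test fails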

@mllg
Member

mllg commented Nov 12, 2019

I don't think that this is a proper design. Having a single numeric value for each feature makes everything so much easier.

If the model returns multiple importance values for a single feature, we might want to aggregate them (mean/max)?

@berndbischl ?

@berndbischl
Member

i don't think there is a simple solution. naive aggregation, i fear, is really not appropriate.

the question is how much we want to implement here ourselves.

currently i guess we have to mark the learners as "does not support importance"?

and open issues in mlr3learners to repair this?

@berndbischl
Member

OTOH we become super-restrictive if we completely throw these importance scores away whenever they do not adhere to our scheme. very unsure here what is best.
can't we maybe simply "live" with somewhat more unstructured scores?

@berndbischl
Member

with IML you can then calculate model-agnostic scores that really produce one number per feature?

@mllg
Member

mllg commented Nov 27, 2019

  1. For all numeric features, these scores are well-defined and could be returned.
  2. A naive aggregation of these p-values is surely not statistically sound if you want to interpret them as p-values, but isn't this okay for "scoring" the importance if this is well documented?
  3. The aggregation method could be a parameter of $importance() / a hyperparameter of the learner (see the sketch after this list).
  4. We have the same problem for xgboost where we manually preprocess with dummy encoding, right?
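
For point 3, a rough sketch of how such an aggregation parameter could look. The helper name and the name-mapping rule (strip a ".level" suffix) are hypothetical and not existing mlr3 code; the actual mapping would have to come from the learner or the encoding step:

    # hypothetical helper, not part of mlr3: collapse per-level scores to one
    # score per feature, with the aggregation method chosen by the caller
    aggregate_importance = function(imp, feature_names, method = c("mean", "max")) {
      method = match.arg(method)
      fun = switch(method, mean = mean, max = max)
      feat = vapply(names(imp), function(nm) {
        if (nm %in% feature_names) return(nm)                          # already feature-level
        hit = feature_names[startsWith(nm, paste0(feature_names, "."))]
        if (length(hit) != 1L) stop("cannot map '", nm, "' to a feature")
        hit
      }, character(1))
      vapply(split(imp, feat), fun, numeric(1))                         # one score per feature
    }

    imp = c(credit_history.critical = 0.4, credit_history.delayed = 0.2, age = 0.1)
    aggregate_importance(imp, c("age", "credit_history"), method = "max")
    # age = 0.1, credit_history = 0.4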

@berndbischl
Member

2. but isn't this okay for "scoring" the importance if this is well documented?

i have no idea? but a gut feeling it's not ok? do you really want to implement this without being able to cite a single paper where this is explored? or an "obvious" reason why this can make sense?

@berndbischl
Member

4. We have the same problem for xgboost where we manually preprocess with dummy encoding, right?

what do you mean here exactly?

@pat-s
Member

pat-s commented Mar 26, 2020

@berndbischl We should find a solution here, this is kinda blocking.

Also applies to mlr-org/mlr3learners#28 (where glmnet returns scores for every multiclass response instance).

I also do not think that we should do too much automatic aggregation - remember that many people will just "trust" the defaults and might do bad things if the results are "too easy".

@berndbischl
Member

@berndbischl We should find a solution here, this is kinda blocking.

why is this blocking? mark the learner as does-not-support-importance?

@mb706
Collaborator

mb706 commented Apr 12, 2020

If we use importance for feature filtering we need to be able to rely on strict adherence to the interface (sketched below). Solutions that I see:

  • aggregate importance to feature level in some way. If there are multiple ways, use a parameter to choose between them.
  • possibly add a new slot that gives the true importance values and is not bound to a format. We could agree on $importance_raw and make this a part of the Learner interface, but I could also see just choosing any name that fits for each specific learner (and documenting the slot, of course).
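
To illustrate why filtering needs the strict format, a simplified sketch (this is not the actual mlr3filters code): a filter essentially ranks task$feature_names by the returned scores, which breaks as soon as the names do not line up.

    # simplified sketch, not the actual mlr3filters implementation
    filter_top_k = function(imp, feature_names, k) {
      scores = imp[feature_names]                  # NA for any feature without a score
      names(sort(scores, decreasing = TRUE))[seq_len(k)]
    }

Note that $importance_raw above is only a proposal in this thread; nothing like it exists in the Learner interface yet.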

@berndbischl
Member

we agreed that the contract is that every base learner that supports importance needs to be able to return exactly 1 numeric score per feature. and how this is produced is up to that learner. we do not implement any aggregation unless it is clear that this is scientifically valid.
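
As a sketch of what that contract means for a test (the exact auto_test implementation may check more or differ in detail), a learner now has to satisfy a check along these lines:

    # minimal sketch of the agreed contract
    check_importance = function(imp, feature_names) {
      stopifnot(
        is.numeric(imp),
        !is.null(names(imp)),
        !anyDuplicated(names(imp)),              # at most one score per feature
        all(names(imp) %in% feature_names)       # no per-level names like "x.level"
      )
      invisible(TRUE)
    }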
