
Implement an ensembler of MetaLearners #53

Open
kklein opened this issue Jul 7, 2024 · 3 comments
Labels: enhancement (New feature or request)

Comments

@kklein
Collaborator

kklein commented Jul 7, 2024

sklearn provides a BaseEnsemble class which can be used to ensemble various Estimators.

Unfortunately, sklearn's BaseEnsemble does not work out of the box with a MetaLearner from metalearners due to differences in predict and fit signatures.

To facilitate ensembling CATE estimates from several MetaLearners, it would be useful to implement dedicated helpers; a rough sketch follows the open questions below.

Some open questions:

  • Should the ensemble be given trained MetaLearners or train the MetaLearners itself?
  • Should the ensemble require all MetaLearners to have been trained on exactly the same data?
  • Should the ensemble work with both in-sample and out-of-sample data?
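
A minimal sketch of what such a helper could look like, assuming already-fitted learners that share metalearners' `predict(X, is_oos=...)` signature; the class name `MetaLearnerEnsemble` and the weighted-averaging strategy are purely illustrative:

```python
import numpy as np


class MetaLearnerEnsemble:
    """Hypothetical helper averaging CATE estimates of several MetaLearners.

    Assumes every learner is already fitted and exposes
    ``predict(X, is_oos)``, as the MetaLearners in metalearners do.
    """

    def __init__(self, learners, weights=None):
        self.learners = learners
        # Default to uniform weights unless the user provides their own.
        if weights is None:
            weights = np.full(len(learners), 1 / len(learners))
        self.weights = np.asarray(weights)

    def predict(self, X, is_oos):
        # Weighted average of the individual CATE estimates.
        predictions = [learner.predict(X, is_oos=is_oos) for learner in self.learners]
        return sum(w * p for w, p in zip(self.weights, predictions))
```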
@kklein kklein added the enhancement New feature or request label Jul 7, 2024
@FrancescMartiEscofetQC
Contributor

FrancescMartiEscofetQC commented Jul 10, 2024

If we want it to work with in-sample data, we obviously need the MetaLearners to have been trained on exactly the same data. I think the best option is for the user to provide already initialized MetaLearners (fitted or unfitted); we then implement a fit method which calls fit on all of the MetaLearners with the same parameters. To avoid issues around copying fitted state, I would suggest implementing a clone method for MetaLearner which initializes a new MetaLearner with the same parameters; see the sketch below.
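
A minimal sketch of such a clone method, assuming the constructor kwargs were stashed on the instance at initialization time; the attribute name `_init_params` is made up for illustration:

```python
import copy


def clone(self):
    """Return a new, unfitted MetaLearner with the same constructor parameters.

    Assumes the constructor kwargs were stored in ``self._init_params``
    when the MetaLearner was initialized (a hypothetical attribute).
    """
    # Deep-copy so the clone shares no mutable state with the original.
    return type(self)(**copy.deepcopy(self._init_params))
```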
I think implementing it for both in-sample and oos data is not much more work than implementing it for in-sample data only, and it lets the user choose which option they want to use.

I think that if the user wants to use it only with oos data and MetaLearners trained on different data, they can easily do so themselves; it does not require a lot of work.

@erikcs

erikcs commented Sep 7, 2024

Cool package. Nie & Wager’s R-loss gives you an approach for ensembling CATE estimators: stack many final-stage CATE estimators and minimize that loss. They discuss this in section 4.2 of the R-learner paper. Here’s a paper trying it out in case it’s helpful: https://arxiv.org/abs/2202.12445.

On a general note, you can take the same ensembling approach to estimate the nuisance components $E[W_i|X_i]$ and $E[Y_i|X_i]$ for metalearners too, but then by minimizing the standard predictive loss (that’s what van der Laan typically refers to as superlearning in TMLE).
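
A minimal sketch of that stacking idea, assuming precomputed nuisance estimates `m_hat` $\approx E[Y_i|X_i]$ and `e_hat` $\approx E[W_i|X_i]$ plus one column of CATE predictions per candidate estimator; solving via non-negative least squares and renormalizing is just one simple heuristic (a simplex-constrained solver would respect the convex-combination constraint exactly):

```python
import numpy as np
from scipy.optimize import nnls


def r_loss_stacking_weights(y, w, m_hat, e_hat, cate_predictions):
    """Stack candidate CATE estimators by (approximately) minimizing the R-loss.

    Finds non-negative weights minimizing
        sum_i ((y_i - m_hat_i) - sum_k alpha_k * tau_k(X_i) * (w_i - e_hat_i))^2
    and normalizes them to sum to one.

    cate_predictions: shape (n_samples, n_candidates), one column per
    candidate estimator's CATE predictions tau_k(X).
    """
    residual_y = y - m_hat
    residual_w = w - e_hat
    # Column k holds tau_k(X_i) * (w_i - e_hat_i).
    design = cate_predictions * residual_w[:, np.newaxis]
    weights, _ = nnls(design, residual_y)
    total = weights.sum()
    # Fall back to uniform weights if NNLS returns all zeros.
    return weights / total if total > 0 else np.full(len(weights), 1 / len(weights))
```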

@kklein
Collaborator Author

kklein commented Oct 14, 2024

Hi @erikcs - apologies for the super late reply.
Thanks a lot for the reference (and the kind words :)). We'll take a look asap!
