
Recommended heuristic for integrating "policy tree" with "honest causal forest" #124

Open · njawadekar opened this issue Nov 18, 2021 · 5 comments
Labels
question Further information is requested

Comments


njawadekar commented Nov 18, 2021

The post below is more of a methodological question than a technical one.

Based on what I've gathered, the honest causal forest and policy tree are two distinct yet related methods. Both can evidently yield actionable insights on the effects of treatment within a heterogeneous population. However,

  • The honest causal forest can estimate conditional average treatment effects across non-prespecified and heterogeneous subgroups
  • Whereas, the policy tree can identify a data-driven optimal treatment rule for a given sample based on the observed data

So, while the honest causal forest is more exhaustive (it estimates quite granular subgroup-specific causal effects), the policy tree takes more of a "broad brush" approach to these heterogeneities by identifying an optimal treatment rule that can be applied to a population when making treatment decisions.
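To make the contrast concrete, here is a minimal sketch using grf and policytree on simulated data (the data-generating process and all object names are purely illustrative):

```r
library(grf)        # honest causal forest
library(policytree) # policy tree

set.seed(1)
n <- 500; p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
Y <- pmax(X[, 1], 0) * W + X[, 2] + rnorm(n)

# Honest causal forest: granular, per-unit CATE estimates
cf <- causal_forest(X, Y, W)
tau.hat <- predict(cf)$predictions

# Policy tree: a broad-brush, interpretable treatment rule
# fit on the forest's doubly robust scores
Gamma <- double_robust_scores(cf)
tree <- policy_tree(X, Gamma, depth = 2)
```

A shallow depth keeps the rule interpretable and the exact tree search tractable; deeper trees get expensive quickly as the number of covariates grows.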

Question:
Given that the honest causal forest and the policy tree address similar research objectives (i.e., understanding heterogeneities in a population so that we can make better decisions), has your research group developed any standard protocols or heuristics for feeding the results of the honest causal forest into the policy tree model? For example, would it be reasonable to input only those covariates into the policy tree that rank among the top 10% most "important" variables for heterogeneity in the honest causal forest?


erikcs commented Nov 18, 2021

Hi @njawadekar

> For example, would it be reasonable to develop a protocol whereby we only input covariates into the policy tree that were listed among the top 10% of the most "Important" variables for heterogeneities within the honest causal forest

Yes, that's a perfectly fine heuristic; it is suggested in #46 as a way to make a setting with many covariates feasible for policy_tree.
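For concreteness, one way to implement that heuristic in base R (the importance vector below is a toy stand-in for what grf's variable_importance(forest) would return):

```r
# Toy importance scores for 10 covariates; in practice:
# imp <- variable_importance(forest)
imp <- c(0.30, 0.02, 0.25, 0.01, 0.05, 0.15, 0.03, 0.10, 0.04, 0.05)

# Keep the top 10% of covariates by importance (at least one)
k <- max(1, ceiling(0.10 * length(imp)))
keep <- order(imp, decreasing = TRUE)[1:k]
keep  # column indices to subset, e.g. policy_tree(X[, keep], Gamma)
```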

Our research group (@halflearned) has been working on an online tutorial for ML-based HTE estimation; you might find the section on policy learning useful: https://bookdown.org/stanfordgsbsilab/tutorial/policy-learning-i-binary-treatment.html

@erikcs added the "question" (Further information is requested) label on Nov 18, 2021

erikcs commented Nov 26, 2021

Also, if you're looking for a real-world empirical application, @hhsievertsen has a paper using causal forest + policy tree here https://github.com/hhsievertsen/hhsievertsen.github.io/raw/master/mat/wp/chx_sep2021.pdf


hanneleer commented Sep 16, 2024

Dear @erikcs ,

First thanks a lot for the package and the guidance!

I have a question related to the one raised by Njawadekar, and I was hoping you could provide some additional intuition.

I'm running the policy tree algorithm, inputting only the covariates that rank among the top 10% most "important" variables in the causal forest from which I took the doubly robust scores. I followed the tutorial you recommended (https://bookdown.org/stanfordgsbsilab/tutorial/policy-learning-i-binary-treatment.html). Before fitting the policy tree, I multiplied the doubly robust scores by -1, since I am aiming to minimize dropout rates. However, a few points are still unclear to me, and I might be misunderstanding some aspects.

When I estimate the value of the learned policy, I observe, for example, a decrease of about 4 percentage points. However, when I examine the tree and test whether treatment effects vary across the leaves, I run into some confusion. If I understand correctly, "leaf node action = 2" indicates the individuals for whom treatment is recommended. Yet I never see a decrease of more than 1 percentage point in any leaf with action = 2, so how can this be an optimal tree with an overall value of 4 percentage points, and how can those be subgroups recommended for treatment? Most of the larger decreases occur in leaves with action = 1, and even then only for relatively small sample sizes. This makes it difficult for me to reconcile the estimated policy value of a 4 percentage point decrease.

Am I misinterpreting the results of the tree in relation to the estimated value of the learned policy, or might there be something else I am doing incorrectly?

Thank you so much!


erikcs commented Sep 22, 2024

Hi @hanneleer, sorry, I'm having some trouble understanding exactly what you are asking. If you post a simple code example along with toy or real data that illustrates your question, that would probably help.


hanneleer commented Sep 25, 2024

Hi @erikcs, thanks a lot for getting back to me!! (I am not sure how to format the tables and code nicely on GitHub; I'm sorry about how the code is presented!)

So I started by fitting my causal forest and retrieved the doubly robust scores. Since I am looking at the impact of the policy on the probability of dropping out (aim: lower dropout is better), I multiplied my Gamma matrix by -1, so that maximizing the rewards corresponds to minimizing dropout.
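As a quick sanity check that the sign flip does what's intended, maximizing the negated rewards is the same as minimizing the originals (toy numbers):

```r
# Two units, two actions; entries are dropout probabilities (lower is better)
rewards <- matrix(c(0.05, 0.03, 0.02, 0.06), nrow = 2, ncol = 2)
neg.rewards <- rewards * -1

# The reward-maximizing action under the flipped sign
# is the dropout-minimizing action under the original sign
stopifnot(all(apply(neg.rewards, 1, which.max) ==
              apply(rewards, 1, which.min)))
```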

Fit a policy tree on forest-based AIPW scores

```r
library(grf)

Forest <- causal_forest(data$X, data$Y, data$W, clusters = data$school_level)
CATE_Forest <- average_treatment_effect(Forest, target.sample = "all")
# Estimate: -0.0106, Std. Error: 0.0013

Gamma.matrix <- double_robust_scores(Forest)
Gamma.matrix <- Gamma.matrix * -1  # flip sign: minimizing dropout = maximizing reward

head(Gamma.matrix)
#          [,1]     [,2]
# [1,] -2.01120 -0.03817
# [2,]  0.01469 -0.01055
# [3,] -0.00710  0.00363
# [4,]  0.05937 -0.04230
# [5,]  0.02668 -0.01810
# [6,]  0.01027 -0.00745
# [ reached getOption("max.print") -- omitted 135000 rows ]
```

Divide the data into train and evaluation sets

```r
Train <- sample(nrow(data), size = nrow(data) * 0.7, replace = FALSE)
Test <- setdiff(seq_len(nrow(data)), Train)  # hold-out row indices
```

Fit the policy tree

```r
library(policytree)

policy_tree <- hybrid_policy_tree(X[Train, ], Gamma.matrix[Train, ], depth = 5)
plot(policy_tree)

# policy_tree object
# Tree depth: 2
# Actions: 1 2
# Variable splits:
# (1) split_variable: X4  split_value: 1
#   (2) split_variable: X3  split_value: 0
#     (4) split_variable: X2  split_value: 0
#       (8) * action: 2
#       (9) * action: 1
#     (5) split_variable: X5  split_value: 0
#       (10) * action: 2
#       (11) * action: 1
#   (3) split_variable: X1  split_value: 0
#     (6) * action: 2
#     (7) split_variable: X5  split_value: 1
#       (12) * action: 2
#       (13) * action: 1
```

[plot of the fitted policy tree]

Predict treatment on the test set

```r
pi.hat <- predict(policy_tree, X[Test, ]) - 1  # convert actions {1, 2} to {0, 1}
```

Predict leaves

```r
Leaf <- predict(policy_tree, X[Test, ], type = "node.id")
Num.leaves <- length(unique(Leaf))
```

Estimate the value of the learned policy

```r
gamma.hat.1 <- Gamma.matrix[Test, 2]
gamma.hat.0 <- Gamma.matrix[Test, 1]
gamma.hat.pi <- pi.hat * gamma.hat.1 + (1 - pi.hat) * gamma.hat.0
value.aipw.estimate <- mean(gamma.hat.pi)
value.aipw.stderr <- sd(gamma.hat.pi) / sqrt(length(gamma.hat.pi))
# Estimate [AIPW]: -0.0574  Std. Error: 0.001
```
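One detail that is easy to miss when reading a number like this (toy numbers below, not the estimates above): mean(gamma.hat.pi) estimates the expected sign-flipped outcome under the policy as a whole, which includes the baseline mean(gamma.hat.0) that everyone would contribute even if no one were treated; the part attributable to treating is mean(pi.hat * (gamma.hat.1 - gamma.hat.0)), which can be much smaller.

```r
# Toy AIPW scores for 4 units (hypothetical numbers)
g0 <- c(0.05, 0.04, 0.06, 0.05)  # reward if not treated
g1 <- c(0.06, 0.05, 0.05, 0.04)  # reward if treated
pi.toy <- c(1, 1, 0, 0)          # learned policy: treat the first two units

value    <- mean(pi.toy * g1 + (1 - pi.toy) * g0)  # 0.055
baseline <- mean(g0)                               # 0.050
gain     <- mean(pi.toy * (g1 - g0))               # 0.005

# The policy value decomposes into baseline plus treatment gain
stopifnot(isTRUE(all.equal(value, baseline + gain)))
```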

Is the treatment effect different across treatment and control groups?

```r
library(lmtest)
library(sandwich)

ols <- lm(gamma.hat.1 - gamma.hat.0 ~ 0 + factor(pi.hat))
coeftest(ols, vcov = vcovHC(ols, "HC2"))[1:2, ]
#                 Estimate  Std. Error
# factor(pi.hat)0   0.0089      0.0048
# factor(pi.hat)1   0.0116      0.0020
```


So when I estimate the value of the learned policy, I observe a decrease of about 5.7 percentage points. However, when I examine the optimal tree and test whether treatment effects vary across the leaves, I run into some confusion. If I understand correctly, "leaf node action = 2" indicates the individuals for whom treatment is recommended. Yet I never see a decrease of more than 1.8 percentage points in any leaf with action = 2, so how can this be an optimal tree, and how can those be subgroups recommended for treatment? This makes it difficult for me to reconcile the estimated policy value of a 5.7 percentage point decrease.

Moreover, when I look at whether the treatment effect differs across the treatment and control groups, I also do not see a negative estimate for the treatment group (0.0116). And when I check whether the tree is optimal with all(apply(Gamma.matrix, 1, which.max) == predict(policy_tree, X)), I get FALSE, and I am confused how this is possible. Am I misinterpreting the results of the tree in relation to the estimated value of the learned policy, or might there be something else I am doing incorrectly?

Thanks a lot!!
