---
title: "Ethics4DS: Coursework 1"
author: "[Candidate number here]"
date: "`r Sys.Date()`"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
theme_set(theme_minimal())
candidate_number <- 1 # change this
set.seed(candidate_number)
```
## Reproducibility
Change the parameters of the simulation as necessary (or as desired, to make your version unique).
```{r}
generate_data <- function(beta0 = 0,
                          betaX = 0,
                          betaA = 0,
                          betaXA = 0,
                          n = 200,
                          proportion = 1/2) {
  X <- rnorm(n)
  A <- rbinom(n, 1, proportion)
  Y <- beta0 + betaX * X + betaA * A + betaXA * X * A + rnorm(n, sd = 1)
  data.frame(Y = Y, X = X, A = factor(A))
}
```
Here is a visualisation to help understand the data:
```{r}
one_sample <- generate_data(beta0 = 2, betaX = -1, betaA = 2, betaXA = 1.5)
one_sample |>
  ggplot(aes(X, Y, color = A)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
```
```{r}
one_fit <- lm(Y ~ X + A + X * A, one_sample)
summary(one_fit)
```
```{r}
summary(one_fit)$coefficients[,4]
```
```{r}
# p-values for all the non-intercept variables
summary(one_fit)$coefficients[-1,4]
```
```{r}
# p-value for second coeff
summary(one_fit)$coefficients[2,4]
```
```{r}
# Every time this function is called, it:
# (1) generates new data
# (2) fits a model using the formula Y ~ X + A + X * A
# (3) returns the p-value for the second coefficient
one_experiment <- function(beta0 = 0, betaX = 0, betaA = 0,
                           betaXA = 0, n = 200, proportion = 1/2) {
  one_sample <- generate_data(beta0, betaX, betaA, betaXA, n, proportion)
  one_fit <- lm(Y ~ X + A + X * A, one_sample)
  p_values <- summary(one_fit)$coefficients[2,4]
  return(p_values)
}
```
```{r}
one_experiment()
```
```{r}
one_experiment(betaX = .3)
```
```{r}
n_experiments <- 5
replicate(n_experiments, one_experiment())
```
```{r}
rejection_level <- 0.05
n_experiments <- 500
# proportion of experiments where p-value
# is below rejection level
mean(replicate(n_experiments, one_experiment()) < rejection_level)
```
Consider: What should this be? Is the above result acceptable or concerning?
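Under the null model (all coefficients equal to zero) the p-value is approximately uniform, so the rejection rate should sit close to the nominal 0.05, up to Monte Carlo error. One possible sanity check, sketched below, is a binomial test of the simulated rejection count against that nominal level:
```{r}
# Sketch: is the simulated null rejection rate consistent with the
# nominal 0.05 level, allowing for Monte Carlo error?
null_rejections <- replicate(n_experiments, one_experiment()) < rejection_level
binom.test(sum(null_rejections), n_experiments, p = rejection_level)
```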
```{r}
mean(replicate(n_experiments, one_experiment(betaX = .2)) < rejection_level)
```
A larger proportion of experiments had significant p-values this time. Is that what we expect?
How would this rejection rate depend on each of the parameters passed to the `one_experiment()` function (e.g. `betaX`, `betaA`, `n`, etc.)?
When do we want the rejection rate to be high, and when do we want it to be low?
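One way to explore this, sketched below with illustrative settings (a grid of `betaX` values and 200 replications per point), is to estimate the rejection rate as a function of `betaX`, i.e. a simple power curve; the same idea applies to `betaA`, `betaXA`, `n` and `proportion`.
```{r}
# Sketch: estimate the rejection rate (power) over a grid of betaX values;
# the other parameters could be varied in the same way
betaX_grid <- seq(0, 0.5, by = 0.1)
power_curve <- data.frame(
  betaX = betaX_grid,
  rejection_rate = sapply(
    betaX_grid,
    function(b) mean(replicate(200, one_experiment(betaX = b)) < 0.05))
)
ggplot(power_curve, aes(betaX, rejection_rate)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = 0.05, linetype = "dashed")
```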
Note: it may be helpful to review basic conditional control flow in R using sources such as the following:
- [https://posit.cloud/learn/primers/6.5](https://posit.cloud/learn/primers/6.5)
- [https://adv-r.hadley.nz/control-flow.html](https://adv-r.hadley.nz/control-flow.html) Section 5.2
**Instructions**: after completing the rest of this notebook, delete this comment and all of the demonstration code above. You should only keep the code necessary for your answers below to work.
### Preregistration vs p-hacking
#### p-hacking
- Modify the example code to create a function that simulates p-hacking
- Show the effect of p-hacking on statistical error rates using any appropriate statistics or visualisations that you prefer
- Explain, in your own words, what problems could result from p-hacking
```{r}
# Simulates p-hacking: fits two model specifications, collects the p-values
# for all non-intercept coefficients, and reports only the smallest one
# as if it were the single test performed
one_phacked_experiment <- function(beta0 = 0, betaX = 0, betaA = 0,
                                   betaXA = 0, n = 200, proportion = 1/2) {
  one_sample <- generate_data(beta0, betaX, betaA, betaXA, n, proportion)
  one_fit <- lm(Y ~ X + A + X * A, one_sample)
  min_pvalue <- min(summary(one_fit)$coefficients[-1,4])
  another_fit <- lm(Y ~ X, one_sample)
  another_pvalue <- summary(another_fit)$coefficients[2,4]
  p_values <- min(min_pvalue, another_pvalue)
  return(p_values)
}
```
```{r}
rejection_level <- 0.05
n_experiments <- 500
# proportion of experiments where p-value
# is below rejection level
all_experiments_pvalues <- replicate(n_experiments, one_phacked_experiment())
mean(all_experiments_pvalues < rejection_level)
```
```{r}
hist(all_experiments_pvalues)
```
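For comparison, here is a sketch overlaying these p-hacked p-values with p-values from the unmodified `one_experiment()` under the same null model; the single-test p-values should look roughly uniform, while the p-hacked minima should pile up near zero.
```{r}
# Sketch: compare p-value distributions under the null,
# with and without p-hacking
comparison <- data.frame(
  p_value = c(all_experiments_pvalues,
              replicate(n_experiments, one_experiment())),
  method = rep(c("p-hacked", "single test"), each = n_experiments)
)
ggplot(comparison, aes(p_value, fill = method)) +
  geom_histogram(bins = 20, position = "identity", alpha = 0.5)
```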
#### Preregistration
- Create another version of the function to simulate the constraints of preregistration
- Compare the error rates with preregistration to those without, and explain whether this could help with any problems you identified above and, if so, how
```{r}
# Simulates preregistration: the same models are fitted, but every
# pre-specified p-value is reported rather than only the smallest one
one_preregistered_experiment <- function(beta0 = 0, betaX = 0, betaA = 0,
                                         betaXA = 0, n = 200, proportion = 1/2) {
  one_sample <- generate_data(beta0, betaX, betaA, betaXA, n, proportion)
  one_fit <- lm(Y ~ X + A + X * A, one_sample)
  several_pvalues <- summary(one_fit)$coefficients[-1,4]
  another_fit <- lm(Y ~ X, one_sample)
  another_pvalue <- summary(another_fit)$coefficients[2,4]
  p_values <- c(several_pvalues, another_pvalue)
  return(p_values)
}
```
```{r}
rejection_level <- 0.05
n_experiments <- 500
# proportion of experiments where p-value
# is below rejection level
all_experiments_pvalues <- replicate(n_experiments, one_preregistered_experiment())
mean(all_experiments_pvalues < rejection_level)
```
```{r}
hist(all_experiments_pvalues)
```
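Another way to frame the comparison, sketched below, is to commit in advance to a single primary hypothesis (here, the coefficient on `X` from the full model, i.e. the first reported p-value) and compare its false rejection rate under the null with that of the p-hacked minimum.
```{r}
# Sketch: false rejection rate for a single preregistered primary test
# (the coefficient on X) versus the p-hacked minimum p-value, under the null
prereg_matrix <- replicate(n_experiments, one_preregistered_experiment())
primary_test_rate <- mean(prereg_matrix[1, ] < rejection_level)
phacked_rate <- mean(replicate(n_experiments, one_phacked_experiment()) < rejection_level)
c(preregistered_primary = primary_test_rate, p_hacked = phacked_rate)
```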
### Reproducibility and causality
#### Reproducible and causal
- Create another simulation, modifying the data-generating code to be consistent with a specific causal model structure, and choosing the structure to satisfy the following requirements
- Show that there is a reproducible effect, i.e. one that can be found fairly consistently (e.g. in more than five percent of experiments) without using p-hacking
- Show, by simulating an intervention, that the above effect is causal
```{r}
# no changes necessary from original experiment
# if true betaX is nonzero then rejections are not false discoveries
# rejection rate:
mean(replicate(n_experiments, one_experiment(betaX = .2)) < rejection_level)
```
Why is this effect causal?
Assuming the data-generating code accurately represents the causal model, the function that generates Y depends directly on X, so intervening on X will also change Y. Here is a simple example showing the effect:
```{r}
# Using betaX = .2, all other coeffs = 0
N <- 1000 # reduce sampling variation
# Before intervention
X <- rnorm(N)
A <- rbinom(N, 1, 1/2) # has no effect on Y here since betaA = 0
Y <- 0.2 * X + rnorm(N, sd = 1)
# After intervention
X_new <- 2
A_new <- rbinom(N, 1, 1/2) # likewise unused
Y_new <- 0.2 * X_new + rnorm(N, sd = 1)
mean(Y_new) - mean(Y)
```
This shows that an intervention setting every value of X to 2 (for example) results in a different average value of Y; in expectation the difference is betaX * (2 - E[X]) = 0.2 * 2 = 0.4.
Conclusion: there is a true causal effect, and the regression experiments find a statistically significant coefficient for X (with a rejection rate that is higher than 0.05).
#### Reproducible but not causal
- Repeat the above section, but in this case choose the causal model generating your data so that the reproducible effect is not causal, i.e. an intervention on that variable does not change the outcome variable
```{r}
generate_common_cause_data <- function(n = 100,
                                       sdx = .2,
                                       sdy = .2,
                                       intervention = NULL) {
  # Original world
  # Y <- U -> X
  U <- rnorm(n) # unobserved
  f_X <- function(U) U + 1
  f_Y <- function(U) (1/5) * (1 + U)^2
  X <- f_X(U) + rnorm(n, sd = sdx)
  Y <- f_Y(U) + rnorm(n, sd = sdy)
  # Optional: intervene on X
  if (!is.null(intervention)) {
    # Intervened world
    # Y <- U, X <- intervention
    # U is generated before X
    U <- rnorm(n)
    # X is set to a specific value
    X <- intervention
    # Y is generated after X, but keeps its original
    # distribution because f_Y does not use X
    Y <- f_Y(U) + rnorm(n, sd = sdy)
  }
  data.frame(Y = Y, X = X)
}
```
Original data
```{r}
one_common_cause_dataset <- generate_common_cause_data(n = 400)
ggplot(one_common_cause_dataset, aes(X, Y)) +
  geom_point(alpha = .5) +
  geom_smooth(method = "lm", se = FALSE)
```
Data with an intervention setting X to 1
```{r}
intervened_common_cause_dataset <-
  generate_common_cause_data(n = 400, intervention = 1)
ggplot(intervened_common_cause_dataset, aes(X, Y)) +
  geom_point(alpha = .5) +
  geom_smooth(method = "lm", se = FALSE)
```
Averages of Y before and after intervention
```{r}
mean(one_common_cause_dataset$Y)
mean(intervened_common_cause_dataset$Y)
```
Simulate one experiment
```{r}
# Every time this function is called, it:
# (1) generates new "common cause" data
# (2) fits a model using the formula Y ~ X
# (3) generates "common cause" data with intervention on X
# (4) computes mean of Y before and after intervention
# returns: X coefficient and p-value from (2), interventional effect (4)
one_common_cause_experiment <- function(n = 100,
                                        sdx = .2,
                                        sdy = .2,
                                        intervention = 1) {
  # without intervention
  one_sample <- generate_common_cause_data(n, sdx, sdy)
  one_fit <- lm(Y ~ X, one_sample)
  beta_hat <- summary(one_fit)$coefficients[2,1]
  p_value <- summary(one_fit)$coefficients[2,4]
  # with intervention
  intervened_sample <- generate_common_cause_data(n, sdx, sdy, intervention)
  avg_Y_diff <- mean(intervened_sample$Y) - mean(one_sample$Y)
  return(list(beta_hat = beta_hat,
              p_value = p_value,
              intervention_effect = avg_Y_diff))
}
```
Simulate multiple experiments and transform the results to make them easy to plot
```{r}
multiple_common_cause_experiments <- function(n_experiments = 200,
                                              n = 100,
                                              sdx = .2,
                                              sdy = .2,
                                              intervention = 1) {
  experiment_summary_list <- replicate(
    n_experiments,
    one_common_cause_experiment(n, sdx, sdy, intervention))
  experiment_summaries <- apply(t(experiment_summary_list), 2, unlist)
  return(data.frame(experiment_summaries))
}
```
```{r}
experiment_summaries <- multiple_common_cause_experiments()
```
Distribution of estimated coefficients for X
```{r}
ggplot(experiment_summaries, aes(beta_hat)) + geom_histogram(bins = 20)
```
Rejection rate using p-values for X
```{r}
rejection_level <- 0.05
# proportion of experiments where p-value
# is below rejection level
mean(experiment_summaries$p_value < rejection_level)
```
Distribution of mean difference in Y when intervening on X
```{r}
ggplot(experiment_summaries, aes(intervention_effect)) +
  geom_histogram(bins = 20)
```
Average interventional effect across multiple experiments (meta-analysis)
```{r}
mean(experiment_summaries$intervention_effect)
```
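As a quick check, sketched below, a one-sample t-test of the per-experiment interventional effects against zero; because the intervention does not enter `f_Y`, the mean of Y is the same with and without the intervention, and the test should typically fail to reject (beyond the nominal error rate).
```{r}
# Sketch: test whether the average interventional effect differs from zero
t.test(experiment_summaries$intervention_effect, mu = 0)
```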
Conclusion: experiments consistently find a statistically significant coefficient for X (**without** using p-hacking to achieve significance), and yet there is no causal effect of X on Y: the significant association arises because X and Y share a common cause U, which in our simulation is unobserved by the experimenter.