---
title: "Experimental design and ANOVA examples"
author: "Alex Sanchez-Pla"
date: "`r Sys.Date()`"
format:
  html:
    embed-resources: true
    theme: cerulean
    toc: true
    toc-depth: 3
execute:
  warning: false
---
```{r echo=FALSE}
library(dplyr)
```
# Example 1: A completely randomized experiment
## Description
Gene therapy experiment comparing four techniques for correcting faulty genes:

- A: Normal gene inserted in a non-specific location
- B: Abnormal gene swapped for a normal gene
- C: Abnormal gene repaired through selective reverse mutation
- D: Regulation of a particular gene altered
## Randomization
Twenty independent individuals (mice) are selected.
```{r}
mice <- paste0("m",1:20)
```
Treatments are assigned at random.
```{r}
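randomized <- sample(mice, 20)          # random permutation of the 20 mice
TREAT <- rep(LETTERS[1:4], each = 5)    # five consecutive positions per treatment
names(randomized) <- TREAT              # label each mouse with its assigned treatment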
```
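Because `sample()` draws a new permutation on every render, the assignment above changes each time the document is compiled. A minimal sketch of a reproducible version follows; the seed value `2024` and the names `mice_ids` and `design_cr` are illustrative assumptions, not part of the original analysis.

```{r}
set.seed(2024)                                  # arbitrary seed, assumed only for reproducibility
mice_ids  <- paste0("m", 1:20)                  # same labels as in the chunk above
design_cr <- data.frame(
  mouse = sample(mice_ids),                     # random permutation of the 20 mice
  treat = factor(rep(LETTERS[1:4], each = 5))   # five mice per treatment
)
table(design_cr$treat)                          # check that the design is balanced
```

Keeping mouse and treatment in one data frame makes it easy to check that every mouse appears exactly once and that each treatment receives five mice.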
## Data collection
Once the randomization is done, the experiment is performed and gene expression is measured on each mouse.
```{r}
RESP <- c(96,99,100,104,84,91,90,75,80,90,70,90,84,76,78,78,87,67,66,76)
names(RESP) <- paste(names(randomized), randomized, sep=".")
```
## Data Analysis
```{r}
dades<-data.frame (TREAT, RESP)
dades$TREAT <- as.factor(TREAT)
kableExtra::kable(dades) %>% kableExtra::kable_styling(full_width=FALSE)
```
```{r}
model <- RESP ~ TREAT
aov1<-aov (model, data=dades)
summary(aov1)
model.tables(aov1)
model.tables(aov1,type="means") # Group means
```
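The F statistic reported by `summary(aov1)` comes from splitting the total variability into a between-treatment part and a within-treatment part. The sketch below recomputes it by hand for the same data; the names `resp_v`, `treat_v`, `ss_between`, `ss_within`, `f_manual` and `p_manual` are introduced here only for illustration.

```{r}
resp_v  <- c(96,99,100,104,84, 91,90,75,80,90, 70,90,84,76,78, 78,87,67,66,76)
treat_v <- factor(rep(LETTERS[1:4], each = 5))

grand_mean  <- mean(resp_v)
group_means <- tapply(resp_v, treat_v, mean)     # one mean per treatment
group_n     <- tapply(resp_v, treat_v, length)   # replicates per treatment

ss_between <- sum(group_n * (group_means - grand_mean)^2)  # treatment sum of squares
ss_within  <- sum((resp_v - group_means[treat_v])^2)       # residual sum of squares
df_between <- nlevels(treat_v) - 1
df_within  <- length(resp_v) - nlevels(treat_v)

f_manual <- (ss_between / df_between) / (ss_within / df_within)
p_manual <- pf(f_manual, df_between, df_within, lower.tail = FALSE)
c(F = f_manual, p = p_manual)
```

The two values should agree with the `TREAT` row of the ANOVA table above.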
```{r}
(hsd <- TukeyHSD(aov1, which="TREAT", conf.level=0.95))
plot(hsd)
```
## Model assumptions verification
ANOVA results are only reliable when the following assumptions hold for the data:

- Homoscedasticity (homogeneity of variance)
- Independence of errors
- Normality of errors

Checking them thoroughly can be a long process, but diagnostic plots give a quick overview.
```{r}
opt<- par(mfrow=c(2,2))
plot(aov1)
par(opt)
```
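The diagnostic plots can be complemented with formal tests. As a hedged sketch, the chunk below applies the Shapiro-Wilk test to the residuals and Bartlett's test to the group variances, both from base R's `stats` package. The object names `resp_chk`, `treat_chk` and `fit_chk` are assumptions made so that the chunk runs on its own.

```{r}
resp_chk  <- c(96,99,100,104,84, 91,90,75,80,90, 70,90,84,76,78, 78,87,67,66,76)
treat_chk <- factor(rep(LETTERS[1:4], each = 5))
fit_chk   <- aov(resp_chk ~ treat_chk)

norm_test <- shapiro.test(residuals(fit_chk))     # H0: residuals are normally distributed
var_test  <- bartlett.test(resp_chk ~ treat_chk)  # H0: equal variances across treatments
norm_test
var_test
```

With only five observations per group these tests have little power, so they supplement the plots rather than replace them.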
## Exercises
A gene is suspected to have some connection with blood cancer. There are four stages of blood cancer: stage I, stage II, stage III, and stage IV.
Identifying blood cancer in the first three stages is crucial for treating a patient.
Three mRNA samples were collected, one from each of stages I, II, and III.
The experiment was repeated six times, as shown in the table.
Test whether there is any difference in mean expression among the three mRNA samples.
![](images/image-1814946446.png)
# Example 2: Randomized Block design
## Randomization
Randomization is not worked through in detail here. Try to find out how treatments can be randomized within each block.
```{r}
dades<-expand.grid(medium=1:4, Tx =1:6)
```
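One possible way to randomize within blocks is to permute the treatment labels separately inside every block. The sketch below assumes that each culture medium is a block containing six experimental units, one per treatment. The seed, the `unit` column and the name `block_layout` are hypothetical.

```{r}
set.seed(101)                                   # arbitrary seed (assumption)
n_media   <- 4                                  # number of blocks (culture media)
tx_levels <- 1:6                                # treatment labels

# For each medium, draw an independent random order of the six treatments
block_layout <- do.call(rbind, lapply(seq_len(n_media), function(b) {
  data.frame(medium = b,
             unit   = seq_along(tx_levels),     # position of the unit inside the block
             Tx     = sample(tx_levels))        # treatment assigned to that unit
}))
table(block_layout$medium, block_layout$Tx)     # every treatment appears once per block
```

Randomizing separately inside each block is what distinguishes this design from the completely randomized one in Example 1, where a single permutation covers all units.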
## Data collection
Once treatments have been randomly assigned within each culture medium, the experiment is performed and the data are collected.
```{r}
regeneration <- c(34.98,41.22,36.94,39.97,40.89,46.69, 46.65, 41.90, 42.07, 49.42, 52.68, 42.91, 37.18, 45.85, 40.23, 39.20, 37.99, 41.99, 37.61, 40.45, 34.89, 50.15, 44.57, 43.29)
dades <-cbind(dades, regeneration)
dades$medium<-as.factor(dades$medium)
dades$Tx<-as.factor(dades$Tx)
kableExtra::kable(dades) %>% kableExtra::kable_styling(full_width=FALSE)
```
## Data analysis
The analysis is based on the following linear model:
```{r}
model.1<- regeneration ~ medium + Tx
model.1
```
The model is used to test whether there are significant differences between treatments.
Differences between media are not of interest: we assume they exist, which is why we block on medium.
```{r}
aov.1<-aov (model.1, data=dades)
summary(aov.1)
# equatiomatic::extract_eq(aov.1)
```
If we had ignored blocking, the result would have been different:
```{r}
model.0<- regeneration ~ Tx
aov.0<-aov (model.0, data=dades)
summary(aov.0)
```
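To see why the two analyses differ, compare their residual terms: the block term moves medium-to-medium variability out of the error sum of squares, at the cost of three residual degrees of freedom. A minimal sketch follows, rebuilding the data under new names (`rbd`, `fit_block`, `fit_noblock`, `mse_block` and `mse_noblock` are illustrative) so that the chunk is self-contained.

```{r}
rbd <- expand.grid(medium = factor(1:4), Tx = factor(1:6))
rbd$regeneration <- c(34.98,41.22,36.94,39.97,40.89,46.69,46.65,41.90,
                      42.07,49.42,52.68,42.91,37.18,45.85,40.23,39.20,
                      37.99,41.99,37.61,40.45,34.89,50.15,44.57,43.29)

fit_block   <- aov(regeneration ~ medium + Tx, data = rbd)  # blocked analysis
fit_noblock <- aov(regeneration ~ Tx, data = rbd)           # blocking ignored

# Residual mean square = residual sum of squares / residual degrees of freedom
mse_block   <- deviance(fit_block)   / df.residual(fit_block)
mse_noblock <- deviance(fit_noblock) / df.residual(fit_noblock)
c(with_blocks = mse_block, without_blocks = mse_noblock)
```

Whether blocking pays off depends on how much of the variability the media actually account for; the comparison of the two residual mean squares makes that visible.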
## Assumptions
Assumptions should always be checked when using ANOVA.
```{r}
opt<- par(mfrow=c(2,2))
plot(aov.1)  # diagnostics for the block-design model fitted above, not Example 1's aov1
par(opt)
```
## Exercise
**Effect of Atorvastatin (Lipitor) on Gene Expression in People with Vascular Disease (National Institutes of Health)**
Atherosclerosis and its consequences (coronary heart disease and stroke) are known to be principal causes of mortality. Gene expression profiling of peripheral white blood cells provides information that may be predictive of vascular risk.
Table 9.2.3 gives gene expression measurements, classified according to age group and dose level of Atorvastatin treatment.
Test whether dose level and age group significantly affect gene expression.
![](images/image-519026392.png)
# Example 3: Factorial designs
- A study was conducted to assess the effect of a drug and a diet on systolic blood pressure.
- 20 people with high blood pressure were randomized to one of four treatment conditions:
    - Control group (neither diet nor drug modification)
    - Diet modification only
    - Drug only
    - Both drug and diet modification
- At the end of the treatment period, systolic blood pressure was assessed.
- This is a factorial design: each combination of the two factors (drug, diet) can be randomly assigned to an individual.
- With 20 individuals, each treatment combination can be replicated five times.
## Randomization
We start by considering the possible combinations between all levels of Diet and Treatment.
```{r}
dades<-expand.grid(replica=1:5, Diet = c("Standard", "Modified"), Treatment=c("Placebo", "Drug"))
kableExtra::kable(dades) %>% kableExtra::kable_styling(full_width=FALSE)
```
Participants in the study (call them i1, i2, ..., i20) are randomly assigned to one of the combinations.
```{r}
participants <- paste0("i", 1:20)
assignments <- cbind(dades, Partic=sample(participants))
kableExtra::kable(assignments) %>% kableExtra::kable_styling(full_width=FALSE)
```
## Data collection
Next, the experiment is carried out and the data are collected.
```{r}
WBC <- c(2, 0.7, 1, 1.2, 1.3, 1.9, 1.9, 3.5, 1.2, 2.3, 2.4, 2.6, 1.9, 1.6, 1.7, 0.4, 0.2, 0.1, 0.4, 0.3)
names(WBC)<- assignments$Partic
dades <- cbind(assignments, WBC)
kableExtra::kable(dades) %>% kableExtra::kable_styling(full_width=FALSE)
```
## Modeling and Analysis
First, consider a model that ignores interaction between treatment and diet.
```{r}
model.1<- WBC ~ Diet + Treatment # this model ignores interaction
aov.1<-aov (model.1, data=dades)
summary(aov.1)
```
Now add an interaction between Diet and Treatment.
```{r}
model.2<- WBC ~ Diet + Treatment + Diet:Treatment
aov.2<-aov (model.2, data=dades)
summary(aov.2) # this shows that interaction had to be considered
```
```{r}
with (dades, interaction.plot(Diet, Treatment, WBC))
```
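The interaction can also be read from the table of cell means: if the effect of the drug were the same under both diets, the difference of differences computed below would be close to zero. A minimal sketch with the same values follows; the names `fd`, `cell_means`, `drug_effect` and `interaction_contrast` are illustrative.

```{r}
fd <- expand.grid(replica   = 1:5,
                  Diet      = c("Standard", "Modified"),
                  Treatment = c("Placebo", "Drug"))
fd$WBC <- c(2, 0.7, 1, 1.2, 1.3, 1.9, 1.9, 3.5, 1.2, 2.3,
            2.4, 2.6, 1.9, 1.6, 1.7, 0.4, 0.2, 0.1, 0.4, 0.3)

# Mean response for each Diet x Treatment combination
cell_means <- with(fd, tapply(WBC, list(Diet, Treatment), mean))
cell_means

# Effect of the drug (Drug minus Placebo) under each diet
drug_effect <- cell_means[, "Drug"] - cell_means[, "Placebo"]

# Difference of differences: how much the drug effect changes with diet
interaction_contrast <- unname(drug_effect["Modified"] - drug_effect["Standard"])
interaction_contrast
```

A drug effect that changes sign or size between diets corresponds to the non-parallel lines in the interaction plot.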
## Check ANOVA model assumptions
```{r}
opt <- par(mfrow=c(2,2)); plot(aov.2); par(opt)  # diagnostics for the interaction model
```