
---
title: "Experimental design and ANOVA examples"
author: "Alex Sanchez-Pla"
date: "`r Sys.Date()`"
format:
  html:
    embed-resources: true
    theme: cerulean
    toc: true
    toc-depth: 3
execute:
  warning: false
---

```{r echo=FALSE}
library(dplyr)
```

# Example 1: A completely randomized experiment

## Description

A gene therapy experiment compares four techniques for correcting faulty genes:

-   A: Normal gene inserted in a non-specific location
-   B: Abnormal gene swapped for a normal gene
-   C: Abnormal gene repaired through selective reverse mutation
-   D: Regulation of a particular gene altered

## Randomization

Twenty independent individuals (mice) are selected.

```{r}
mice <- paste0("m",1:20)
```

Treatments are assigned at random.

```{r}
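set.seed(123)  # fix the random seed so that the assignment is reproducible
randomized <- sample(mice, 20)         # random permutation of the 20 mice
TREAT <- rep(LETTERS[1:4], each = 5)   # A, A, A, A, A, B, ..., D
names(randomized) <- TREAT             # first 5 permuted mice get A, next 5 get B, ...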
```
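The same completely randomized assignment can also be written the other way round: keep the mice in a fixed order and randomly permute the treatment labels. This is a minimal sketch; the seed value and the object name `assignment` are arbitrary choices, not part of the original example.

```{r}
set.seed(2024)  # arbitrary seed, only for reproducibility
mice <- paste0("m", 1:20)
# Randomly permute 5 copies of each treatment label over the 20 mice
assignment <- data.frame(mouse = mice,
                         treatment = sample(rep(LETTERS[1:4], each = 5)))
table(assignment$treatment)  # each treatment is used exactly 5 times
```

Either way, every mouse has the same chance of receiving any of the four treatments, and the design stays balanced with five mice per treatment.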

## Data collection

Once the randomization is done, the experiment is performed and gene expression is measured on each mouse.

```{r}
RESP <- c(96,99,100,104,84,91,90,75,80,90,70,90,84,76,78,78,87,67,66,76)
names(RESP) <- paste(names(randomized), randomized, sep=".")
```

## Data analysis

```{r}
dades<-data.frame (TREAT, RESP)
dades$TREAT <- as.factor(TREAT)

kableExtra::kable(dades) %>%  kableExtra::kable_styling(full_width=FALSE)

```

```{r}
model <- RESP ~ TREAT

aov1<-aov (model, data=dades)
summary(aov1)

model.tables(aov1)                 # estimated treatment effects
model.tables(aov1, type = "means") # group means
```
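The F statistic reported by `summary(aov1)` can be reproduced by hand from the between-group and within-group sums of squares, $F = \frac{SS_B/(k-1)}{SS_W/(N-k)}$, where $k$ is the number of treatments and $N$ the number of mice. The sketch below rebuilds the data under new names (`expr`, `grp`) so that it runs on its own.

```{r}
expr <- c(96,99,100,104,84,91,90,75,80,90,70,90,84,76,78,78,87,67,66,76)
grp  <- factor(rep(LETTERS[1:4], each = 5))

k <- nlevels(grp)        # number of treatments
N <- length(expr)        # total number of mice
grand_mean  <- mean(expr)
group_means <- tapply(expr, grp, mean)
n_i         <- tapply(expr, grp, length)

# Between-group SS: spread of the group means around the grand mean
SS_between <- sum(n_i * (group_means - grand_mean)^2)
# Within-group SS: spread of each observation around its own group mean
SS_within  <- sum((expr - ave(expr, grp))^2)

F_manual <- (SS_between / (k - 1)) / (SS_within / (N - k))
p_manual <- pf(F_manual, k - 1, N - k, lower.tail = FALSE)
c(F = F_manual, p = p_manual)
```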

```{r}
hsd <- TukeyHSD(aov1, which = "TREAT", conf.level = 0.95)  # all pairwise comparisons
hsd
plot(hsd)

```
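To read the Tukey output programmatically, the matrix stored for the `TREAT` term can be filtered on its `p adj` column. A sketch that rebuilds the model under new names so the chunk is self-contained; the 0.05 cut-off is the usual convention, not something fixed by this example.

```{r}
expr <- c(96,99,100,104,84,91,90,75,80,90,70,90,84,76,78,78,87,67,66,76)
grp  <- factor(rep(LETTERS[1:4], each = 5))
tukey <- TukeyHSD(aov(expr ~ grp), which = "grp", conf.level = 0.95)
pair_tab <- tukey$grp   # columns: diff, lwr, upr, p adj (one row per pair)
# Keep only the pairs whose adjusted p-value is below 0.05
signif_pairs <- pair_tab[pair_tab[, "p adj"] < 0.05, , drop = FALSE]
signif_pairs
```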

## Model assumptions verification

ANOVA results are only reliable when some assumptions about the model errors hold:

-   Homoscedasticity (homogeneity of variances across groups)
-   Independence of errors
-   Normality of errors

Checking them thoroughly can take time, but diagnostic plots of the fitted model give a quick overview.

```{r}
opt<- par(mfrow=c(2,2))
plot(aov1)
par(opt)
```
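The visual checks can be complemented by formal tests. A minimal sketch, assuming the Shapiro-Wilk test on the residuals for normality and Bartlett's test for equal variances (both in base R's `stats`); Bartlett's test is itself sensitive to non-normality, and Levene's test (`car::leveneTest`) is a common alternative. Independence cannot be tested from these data; it is ensured by the randomized design.

```{r}
expr <- c(96,99,100,104,84,91,90,75,80,90,70,90,84,76,78,78,87,67,66,76)
grp  <- factor(rep(LETTERS[1:4], each = 5))
fit  <- aov(expr ~ grp)

sw <- shapiro.test(residuals(fit))  # H0: residuals are normally distributed
bt <- bartlett.test(expr ~ grp)     # H0: equal variances in all treatments
sw
bt
```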

## Exercises

A gene is suspected to be associated with blood cancer, which has four stages: stage I, stage II, stage III, and stage IV.

For treating a patient, identification of blood cancer in the first three stages is crucial.

Three mRNA samples were collected, from stage I, stage II, and stage III respectively.

The experiment was repeated six times, as shown in the table.

Determine whether the mean expression values differ among the three mRNA samples.

![](images/image-1814946446.png)

# Example 2: Randomized block design

## Randomization

The code below only sets up the design layout (every treatment in every culture medium); it does not randomize. Try to work out how to randomize the treatments within each block.

```{r}
dades<-expand.grid(medium=1:4, Tx =1:6)
```
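One possible way to randomize per block is to permute the six treatments independently within each culture medium, so that every medium receives every treatment exactly once, in a random order. A minimal sketch; the seed and the `unit` column (position of an experimental unit within a medium) are illustrative assumptions, not part of the original example.

```{r}
set.seed(42)        # arbitrary seed, only for reproducibility
blocks     <- 1:4   # culture media act as blocks
treatments <- 1:6
rcbd_plan <- do.call(rbind, lapply(blocks, function(b) {
  data.frame(medium = b,
             unit   = seq_along(treatments),  # hypothetical unit index within the medium
             Tx     = sample(treatments))     # independent permutation in each block
}))
head(rcbd_plan, 6)
```

Randomizing separately in each block is what distinguishes this design from the completely randomized one in Example 1, where a single permutation covered all units.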

## Data collection

Once treatments have been randomly assigned within each culture medium, the experiment is performed and the data are collected.

```{r}
regeneration <- c(34.98,41.22,36.94,39.97,40.89,46.69, 46.65, 41.90, 42.07, 49.42, 52.68, 42.91, 37.18, 45.85, 40.23, 39.20, 37.99, 41.99, 37.61, 40.45, 34.89, 50.15, 44.57, 43.29)
dades <-cbind(dades, regeneration)
dades$medium<-as.factor(dades$medium)
dades$Tx<-as.factor(dades$Tx)
kableExtra::kable(dades) %>%  kableExtra::kable_styling(full_width=FALSE)
```

## Data analysis

The analysis is based on the following linear model:

```{r}
model.1<- regeneration ~ medium + Tx
model.1
```
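Written out, this formula corresponds to the usual additive model for a randomized complete block design with 6 treatments and 4 culture media (blocks); there is no interaction term because each treatment appears only once in each medium:

$$
y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}, \qquad i = 1,\dots,6,\quad j = 1,\dots,4, \qquad \varepsilon_{ij} \sim N(0, \sigma^2)
$$

Here $\tau_i$ is the effect of treatment $i$, $\beta_j$ the effect of medium $j$, and the hypothesis of interest is $H_0: \tau_1 = \dots = \tau_6 = 0$.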

The model is used to establish whether there are significant differences between treatments.

Differences between media are not of primary interest: the media are assumed to differ, which is why they are used as blocks.

```{r}
aov.1<-aov (model.1, data=dades)
summary(aov.1)
# equatiomatic::extract_eq(aov.1)
```

If blocking is ignored, the variation between media is pooled into the residual term and the test for treatments changes:

```{r}
model.0<- regeneration ~  Tx
aov.0<-aov (model.0, data=dades)
summary(aov.0)
```
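The difference between the two tables can be made explicit. Because the design is balanced, the treatment sum of squares is the same in both models, and the residual sum of squares of the unblocked model equals the residual plus the medium sum of squares of the blocked model; only the residual degrees of freedom change (15 with blocks, 18 without). A sketch that rebuilds the data under the new name `rcbd`:

```{r}
rcbd <- expand.grid(medium = factor(1:4), Tx = factor(1:6))
rcbd$regeneration <- c(34.98, 41.22, 36.94, 39.97, 40.89, 46.69, 46.65, 41.90,
                       42.07, 49.42, 52.68, 42.91, 37.18, 45.85, 40.23, 39.20,
                       37.99, 41.99, 37.61, 40.45, 34.89, 50.15, 44.57, 43.29)
fit_block   <- aov(regeneration ~ medium + Tx, data = rcbd)
fit_noblock <- aov(regeneration ~ Tx, data = rcbd)

ss_block   <- summary(fit_block)[[1]][["Sum Sq"]]    # medium, Tx, Residuals
ss_noblock <- summary(fit_noblock)[[1]][["Sum Sq"]]  # Tx, Residuals

# Residual SS without blocking = medium SS + residual SS with blocking
c(no_block = ss_noblock[2], medium_plus_resid = ss_block[1] + ss_block[3])
anova(fit_noblock, fit_block)  # F test for adding the block factor
```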

## Assumptions

Assumptions should always be checked when using ANOVA.

```{r}
opt <- par(mfrow = c(2, 2))
plot(aov.1)  # diagnostics for the randomized block model
par(opt)
```

## Exercise

**Effect of Atorvastatin (Lipitor) on Gene Expression in People with Vascular Disease (National Institutes of Health)**

It is well known that atherosclerosis and its consequences (coronary heart disease and stroke) are the principal causes of mortality. Gene expression profiling of peripheral white blood cells provides information that may be predictive of vascular risk.

Table 9.2.3 gives gene expression measurements, classified according to age group and dose level of Atorvastatin treatment.

Test whether dose level and age group significantly affect gene expression.

![](images/image-519026392.png)

# Example 3: Factorial designs

-   A study was conducted to investigate the effect of a drug and a diet on systolic blood pressure.

-   20 people with high blood pressure were randomized to one of four treatment conditions:

    -   Control group (neither diet nor drug modification)
    -   Diet modification only
    -   Drug only
    -   Both drug and diet modification

-   At the end of the treatment period, systolic blood pressure was assessed.

-   This is a 2 × 2 factorial design: the four conditions are all combinations of the two factors (drug, diet), and each combination can be randomly assigned to any individual.

-   With 20 individuals, each of the four treatment combinations is replicated five times.

## Randomization

We start by listing all combinations of the levels of Diet and Treatment, each repeated five times (`replica`).

```{r}
dades<-expand.grid(replica=1:5, Diet = c("Standard", "Modified"), Treatment=c("Placebo", "Drug"))
kableExtra::kable(dades) %>% kableExtra::kable_styling(full_width=FALSE)
```

Participants in the study (labelled i1, i2, ..., i20) are randomly assigned to these combinations.

```{r}
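set.seed(123)  # fix the seed so that the assignment is reproducible
participants <- paste0("i", 1:20)
# Random order of participants over the 20 rows of the design
assignments <- cbind(dades, Partic = sample(participants))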
kableExtra::kable(assignments) %>% kableExtra::kable_styling(full_width=FALSE)

```

## Data collection

Next, the experiment is carried out and the data are collected.

```{r}
WBC<-c( 2, .7, 1, 1.2, 1.3, 1.9, 1.9, 3.5, 1.2, 2.3, 2.4, 2.6, 1.9, 1.6, 1.7, 0.4, 0.2, 0.1, 0.4, 0.3)
names(WBC)<- assignments$Partic
dades <- cbind(assignments, WBC)
kableExtra::kable(dades) %>% kableExtra::kable_styling(full_width=FALSE)
```

## Modeling and Analysis

First, consider a model that ignores interaction between treatment and diet.

```{r}
model.1<- WBC ~ Diet + Treatment # this model ignores interaction
aov.1<-aov (model.1, data=dades)
summary(aov.1)
```

Now add an interaction between Diet and Treatment.

```{r}
model.2<- WBC ~ Diet + Treatment + Diet:Treatment
aov.2<-aov (model.2, data=dades)
summary(aov.2) # the significant Diet:Treatment term shows that the interaction must be kept
```
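In R's formula syntax, `Diet * Treatment` is shorthand for `Diet + Treatment + Diet:Treatment`, so the interaction model can be written more compactly. A short check, rebuilding the data under the new name `fact` (participant labels are omitted because they do not enter the model):

```{r}
fact <- expand.grid(replica = 1:5, Diet = c("Standard", "Modified"),
                    Treatment = c("Placebo", "Drug"))
fact$WBC <- c(2, .7, 1, 1.2, 1.3, 1.9, 1.9, 3.5, 1.2, 2.3,
              2.4, 2.6, 1.9, 1.6, 1.7, 0.4, 0.2, 0.1, 0.4, 0.3)
fit_long  <- aov(WBC ~ Diet + Treatment + Diet:Treatment, data = fact)
fit_short <- aov(WBC ~ Diet * Treatment, data = fact)
all.equal(fitted(fit_long), fitted(fit_short))  # same fitted values
summary(fit_short)
```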

```{r}
with (dades, interaction.plot(Diet, Treatment, WBC))
```
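The interaction plot can also be read from the table of cell means. Without interaction, the effect of the drug (Drug minus Placebo) would be the same under both diets and the lines in the plot would be parallel; the difference of these two effects measures the departure from that. A sketch, rebuilding the data under the name `fact`:

```{r}
fact <- expand.grid(replica = 1:5, Diet = c("Standard", "Modified"),
                    Treatment = c("Placebo", "Drug"))
fact$WBC <- c(2, .7, 1, 1.2, 1.3, 1.9, 1.9, 3.5, 1.2, 2.3,
              2.4, 2.6, 1.9, 1.6, 1.7, 0.4, 0.2, 0.1, 0.4, 0.3)
# 2 x 2 table of cell means: rows = Diet, columns = Treatment
cell_means <- with(fact, tapply(WBC, list(Diet, Treatment), mean))
cell_means
# Drug effect (Drug - Placebo) within each diet
drug_effect <- cell_means[, "Drug"] - cell_means[, "Placebo"]
drug_effect
# Interaction contrast: zero if the lines in the interaction plot were parallel
unname(drug_effect["Modified"] - drug_effect["Standard"])
```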


## Check ANOVA model assumptions

```{r}
# Diagnostics for the interaction model aov.2
opt <- par(mfrow = c(2, 2)); plot(aov.2); par(opt)
```