---
title: "Mixed-Effect/Multilevel Modeling in R"
author: "Clay Ford, UVA Library"
output: html_notebook
editor_options:
chunk_output_type: inline
---
This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
Try executing this chunk by clicking the *Run* button within the chunk or by placing your cursor inside it and pressing *Ctrl+Shift+Enter* (Win/Linux) or *Cmd+Shift+Return* (Mac).
```{r}
plot(cars)
```
To hide the output, click the Expand/Collapse output button. To clear results (or an error), click the "x".
You can also press *Ctrl+Enter* (Win/Linux) or *Cmd+Return* (Mac) to run one line of code at a time (instead of the entire chunk).
Add a new R code chunk by clicking the *Insert Chunk* button on the toolbar or by pressing *Ctrl+Alt+I* (Win/Linux) or *Cmd+Option+I* (Mac).
## CODE ALONG 0
Insert a new code chunk and run the code `rnorm(10)` (sample 10 random values from a standard normal distribution).
## Agenda
- Simple linear regression review
- Motivation for mixed-effect/multilevel models
- Checking assumptions of mixed-effect/multilevel models
- Using our model for predictions
- Interpreting coefficients
- Comparing models
- Comparing effects
- Using simulation with mixed-effect/multilevel models
Appendix Topics (topics cut for time)
- ML vs REML
- Approximate p-values
- Intraclass Correlation Coefficient (ICC)
- Correlation of Fixed Effects
- Simulation-based diagnostics
- Hypothesis tests for random effects
- AIC and BIC review
- Reporting mixed-effect/multilevel model output
## Load packages
We're going to use the following packages in this workshop.
```{r}
library(lme4)
library(ggplot2)
library(ggeffects)
library(emmeans)
```
## Simple linear regression review
Instead of using mathematical statistics, we'll try to motivate mixed-effect/multilevel models using simulation. Below we generate data from a straight line model. y is completely determined by x. We might say the intercept (3) and slope (2) are _fixed effects_.
```{r}
x <- 1:10
y <- 3 + 2*x
d <- data.frame(x, y)
plot(y ~ x, data = d)
```
Now let's add some noise to each observation. We'll use 10 random draws from a Normal distribution with mean 0 and standard deviation of 1.5. The `set.seed(1)` function ensures we all draw the same values. Now y looks associated with x, but not completely determined by x. We might say there is a _random effect_ associated with each observation.
```{r}
set.seed(1)
noise <- rnorm(10, mean = 0, sd = 1.5)
d$y <- 3 + 2*x + noise
plot(y ~ x, data = d)
```
Now let's fit a simple linear model, or regression line, using the correct model we used to simulate the data. We use the `lm()` function for this. The formula "y ~ x" means we think the model is "y = intercept + slope*x".
```{r}
m <- lm(y ~ x, data = d)
summary(m)
```
Assuming our data came from a function of the form _y = a + b*x_ with noise sampled from a Normal distribution, the model estimates _a_ as 2.7468 and _b_ as 2.0821. Those are fairly close to the true values of 3 and 2. The noise is estimated to be from a Normal distribution with mean 0 and standard deviation 1.214 (Residual standard error). Again close to the true value of 1.5.
We might say the estimated _fixed effects_ are about 2.75 and 2.08, and the estimated _random effect_ is about 1.21.
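We can pull these estimates directly out of the fitted model object. A minimal check using only base R functions:
```{r}
# Fixed-effect estimates (intercept and slope)
coef(m)
# Estimated residual standard deviation (the "random effect")
sigma(m)
```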
This is one way to think of linear modeling or regression: we try to recover or approximate the process that generated the data.
If we like, we can plot the fitted model against the data using the {ggeffects} package. Below `|>` is the base R pipe operator: it takes the output of one function and passes it to the first argument of the next function.
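As a tiny illustration of the pipe, the two lines below are equivalent ways of computing the mean of `y`:
```{r}
mean(d$y)
d$y |> mean()
```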
```{r}
ggpredict(m, terms = "x") |>
plot(show_data = TRUE)
```
Again, this is one way to think of linear modeling or regression: we work backwards from the data to determine, or at least approximate, the true values of the fixed and random effects.
## Motivation for mixed-effect/multilevel models
Pretend the previous example data was for one subject. What if we had 7 subjects, each with 10 observations? How might we simulate data for that scenario?
One approach: simulate data where we add noise to the intercept that's _specific to each subject_. In other words each subject has their own _random effect_ on the intercept.
First let's generate an id variable to identify subjects. We can generate 7 id numbers, 10 each, using the `gl()` function.
```{r}
id <- gl(n = 7, k = 10)
```
Next we generate an independent predictor variable, which is simply the numbers 1 - 10. We use the `rep()` function to replicate the vector 1:10 seven times.
```{r}
x <- rep(1:10, 7)
```
Now we generate noise specific to _each observation_. We'll draw a sample of size 7*10 = 70 from a Normal distribution with mean 0 and standard deviation 1.5. `set.seed(2)` ensures we all generate the same "random" data.
```{r}
set.seed(2)
obs_noise <- rnorm(7 * 10, mean = 0, sd = 1.5)
```
We also generate noise specific to _each subject_. We'll draw a sample of size 7 from a Normal distribution with mean 0 and standard deviation 2.
```{r}
set.seed(3)
subj_noise <- rnorm(7, mean = 0, sd = 2)
```
Finally we generate `y` as a function of `x`, but this time we add the subject-specific noise to the _intercept_. The syntax `subj_noise[id]` uses id as index numbers to repeat subj_noise values.
```{r}
y <- (3 + subj_noise[id]) + 2*x + obs_noise
d2 <- data.frame(id, y, x)
```
Now we have two fixed effects:
- Intercept = 3
- Slope = 2
And two random effects:
- Subject: N(0, 2), subject-specific variation
- Observation: N(0, 1.5), measurement error or unrecorded variables
Our data is _not independent_. The first six rows are for subject 1.
```{r}
head(d2)
```
Let's visualize the data using {ggplot2}. The argument `method = "lm"` below adds a regression line to each subject's values. The argument `se = FALSE` suppresses confidence bands.
```{r}
ggplot(d2) +
aes(x = x, y = y, color = id) +
geom_point() +
geom_smooth(method = "lm", se = FALSE)
```
We have lines with _different intercepts_ but similar slopes. That's because _we added each subject's random effect to the fixed intercept_. Any differences in slopes are due to random effects associated with each observation.
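One informal way to see this is to fit a separate ordinary regression to each subject. The `lmList()` function (included in {lme4}) does this; a quick sketch:
```{r}
# Fit a separate lm() to each subject; the intercepts vary more than the slopes
fits <- lmList(y ~ x | id, data = d2)
coef(fits)
```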
How do we "work backwards" as we did with the simple linear model and recover the true intercept, slope and two random effect parameters?
**Linear Mixed-Effect Models, or Multilevel Models**
Because we are dealing with models with a mix of fixed and random effects, we sometimes call these **Mixed-Effect Models**. Another name is **Multilevel Models** because we're dealing with different levels of observations. (eg, we observe 10 observations [Level 1] on each subject [Level 2].)
Two popular R packages used to fit mixed-effect models are {nlme} and {lme4}. The {nlme} package comes with R. The {lme4} package needs to be installed. Both were developed by the same author, Doug Bates. The {lme4} package came later. It has an easier modeling syntax and works better for larger data sets. But {nlme} can fit certain models that {lme4} can't fit, and vice versa. We'll use {lme4} today but that doesn't mean {nlme} is obsolete.
Below we model y as a function of x plus an intercept (1) that is conditional on id (the subject) using the `lmer()` function from the {lme4} package. Notice this is the _correct model_! The y value really is a function of x plus an intercept that varies between subjects. The syntax `(1|id)` means "the intercept is conditional on the id." Each subject exerts a random effect on the intercept.
```{r}
me1 <- lmer(y ~ x + (1|id), data = d2)
summary(me1)
```
The _Fixed Effects_ section says that assuming this data came from a straight line model with an intercept and slope, the model estimates the intercept to be 2.65 and the slope to be 1.97. These are close to the true values of 3 and 2.
The Standard Error column estimates the uncertainty in the coefficients. For example, the x coefficient is estimated to be 1.97 with a SE of 0.07. In other words, we estimate the coefficient to be 1.97 give or take 0.07.
The "t value" column is the ratio Estimate/Std. Error. A t value greater than about 3 in absolute value provides evidence the estimated coefficient is different from 0. In other words, it's more than three standard errors away from 0.
_Where are the p-values?_ In mixed-effect models the distribution of t values for the null hypothesis is not known, at least not for unbalanced data. P-values can be approximated (sometimes), but not calculated precisely. The {lme4} developers elected to not output p-values. Probably better to look at confidence intervals anyway. See _Appendix_ for a package that outputs approximate/questionable p-values, {lmerTest}.
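For example, profile confidence intervals for the fixed effects can be obtained with `confint()`; `parm = "beta_"` restricts the output to the fixed effects:
```{r}
# 95% profile confidence intervals for the fixed effects only
confint(me1, parm = "beta_")
```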
The _Random Effects_ section reports two standard deviations: one for id and one called "Residual". The first is the estimate of the standard deviation of the normal distribution from which the subject-specific noise was drawn: 1.157. The second is the estimate of the standard deviation of the normal distribution from which the observation-specific noise was drawn: 1.795. Recall the true values were 2 and 1.5, respectively.
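The `VarCorr()` function (used again in the Appendix) extracts just these variance components without the rest of the summary:
```{r}
# Estimated standard deviations of the subject-specific and observation-specific noise
VarCorr(me1)
```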
The _Correlation of Fixed Effects_ section has nothing to do with collinearity. These are the covariances of the model coefficients converted to correlations. Going forward we will suppress this output by setting `corr = FALSE`. See the Appendix of this notebook for more information on the correlation of fixed effects.
If we look at the coefficients of this model using the `coef()` function, notice everyone got their _own intercept_ but has the same slope. We have essentially fit a straight line model to each subject. This is sometimes called a _random intercept model_. Each subject has a random effect associated with the intercept.
```{r}
coef(me1)
```
This is one way to think of mixed-effect/multilevel modeling: trying to work backward to determine or approximate the data generation process. Our example was simple and we knew the data generating process because we simulated the data. In real life we never know if our model is correct (spoiler: it never is). At best we hope to build a model that approximates the data generation process.
## Checking assumptions
Notice two assumptions that `lmer()` made:
1. The random effects (noise) have constant variance.
2. The random effects (noise) are drawn from Normal distributions.
```
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 1.338 1.157
Residual 3.221 1.795
Number of obs: 70, groups: id, 7
```
We can assess those assumptions with _diagnostic plots_.
Check _constant variance for Residuals_ (aka, observation-specific noise). Points should be evenly scattered around 0. This plot looks great because we generated the data using constant variance to generate observation-specific noise.
```{r}
plot(me1)
```
Check _constant variance for Subjects_ (aka, subject-specific noise). Boxplots should be evenly scattered around 0 for each subject. The syntax `id ~ resid(.)` means plot residuals by subject id. This plot also looks great because we generated the data using constant variance to generate subject-specific noise.
```{r}
plot(me1, id ~ resid(.))
```
The above plot is not useful if you have, say, hundreds of subjects.
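With many subjects, one rough alternative (a sketch, not a formal diagnostic) is to summarize the residuals by subject, for example their per-subject standard deviations, and look for values that stand out:
```{r}
# Per-subject residual standard deviations; similar values suggest constant variance
tapply(resid(me1), d2$id, sd)
```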
To assess that the Residuals (aka, observation-specific noise) are approximately Normal, we need to use a function in the lattice package called `qqmath()`. (The {lattice} package comes with R.) The points should lie close to the diagonal line. The syntax `lattice::` allows us to access the function from the package without loading the package. This plot looks great because we generated the Residual noise from a Normal distribution.
```{r}
lattice::qqmath(me1)
```
To assess that the subject-specific random effects are approximately Normal we can also use the {lattice} `qqmath()` function, but first we have to extract the estimated random effects from the model object using the `ranef()` function.
```{r}
lattice::qqmath(ranef(me1))
```
This is similar to the previous plot but only has 7 points (one for each subject). It also has a +/- 1 standard deviation bar to give some sense of the uncertainty in the estimate of each subject's random effect. We'd like those points to roughly form a diagonal line.
Violations of these assumptions, especially the constant variance assumption, usually mean we have a poor model. Sometimes we can try transforming the dependent variable, including an interaction, or allowing non-linear effects. But sometimes there's nothing we can do. The variation in the dependent variable may not be well explained by the candidate predictor variables.
## Using our model for predictions
In our basic linear model, we can make predictions with the `predict()` function. Below we predict y given x = 3 and request a 95% confidence interval. In other words, what's the expected mean of y given x = 3?
```{r}
predict(m, newdata = data.frame(x = 3),
interval = "confidence")
```
_Predictions with mixed-effect/multilevel models require extra thought_. We need to decide whether or not we want to condition on random effects. In other words, are we making a prediction....
- for any subject, or
- for a particular subject, such as subject 3 (id = 3)?
To make a _prediction for any subject_, perhaps a new subject who wasn't sampled, we specify `re.form=NA`. This says just use fixed effects to make a prediction. This is usually how mixed-effect models are used when making predictions. Predict expected y when x = 3 for _any subject_:
```{r}
predict(me1, newdata = data.frame(x = 3),
re.form=NA)
```
To make a _prediction for a specific subject_, we drop the `re.form=NA` argument and include `id` in our new data frame. Predict expected y when x = 3 for _subject 3 (id = 3)_.
```{r}
predict(me1, newdata = data.frame(x = 3, id = 3))
```
There is no option for confidence intervals. From the {lme4} documentation for predict: "There is no option for computing standard errors of predictions because it is difficult to define an efficient method that incorporates uncertainty in the variance parameters." That means we cannot simply ask for confidence intervals for our predictions. We can however use the bootstrap. (see _Using Simulation with Mixed-Effect Models section_ below)
(However, as of version 1.1-35, there is an experimental argument available to estimate standard errors for predictions: `se.fit = TRUE`.) To see what version of {lme4} you have:
```{r}
packageVersion("lme4")
```
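If your version is at least 1.1-35, the experimental argument can be tried as follows (a sketch; since the feature is experimental, its behavior may change between versions):
```{r}
# Experimental in recent {lme4} versions: population-level prediction with a standard error
predict(me1, newdata = data.frame(x = 3), re.form = NA, se.fit = TRUE)
```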
We can also visualize this model with ggeffects. _By default it assumes predictions do NOT make use of each subject's random effect_. Below we use `ggpredict` to make predictions for various values of x, and pipe the result into `plot`. The `show_data` argument adds the raw data to the plot.
```{r}
ggpredict(me1, terms = "x") |>
plot(show_data = TRUE)
```
To incorporate the additional uncertainty due to each subject's random effect, we set `type = "random"` (for "random effects"). Notice the confidence ribbon is much wider.
```{r}
ggpredict(me1, terms = "x", type = "random") |>
plot(show_data = TRUE)
```
## CODE ALONG 1
Run the following code to load a new data set.
```{r}
URL <- "https://raw.githubusercontent.com/clayford/mixeff_modeling_with_r/main/data/d.csv"
d3 <- read.csv(URL)
d3$id <- factor(d3$id)
str(d3)
```
Add new R code chunks by clicking the *Insert Chunk* button on the toolbar or by pressing *Ctrl+Alt+I* (Win/Linux) or *Cmd+Option+I* (Mac).
1. Visualize the data using ggplot2. Group by id.
2. Model y as a function of x with a random intercept conditional on id. Call the model `me3`.
3. Check the constant variance assumption for the observations. What do you think?
4. Now fit a model that assumes the subject exerts a random effect on both the intercept and slope, using the formula `y ~ x + (x|id)`. Call the model `me4`.
5. Check the assumption of constant variance for the observation random effect for `me4`. What do you think?
6. View the coefficients of `me4`. Notice each subject has their own intercept and slope.
## Mixed-effect/multilevel modeling with real data
Let's look at some realistic data.
The following data consist of 5 weekly measurements of body weight for 27 rats. 10 rats were on a control treatment. 7 rats had thyroxine added to their drinking water. 10 rats had thiouracil added to their water. We're interested in how the treatments affected the weight of the rats. Source: {faraway} package (Faraway, 2006). Notice this data is _unbalanced_.
```{r}
URL2 <- "https://raw.githubusercontent.com/clayford/mixeff_modeling_with_r/main/data/ratdrink.csv"
ratdrink <- read.csv(URL2)
ratdrink$subject <- factor(ratdrink$subject)
ratdrink$treat <- factor(ratdrink$treat)
str(ratdrink)
```
Our data has 135 observations, but these are _not independent_. We have 27 subjects (rats), and 5 observations on each of these subjects.
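A quick tally confirms this structure (135 rows, 5 observations per rat, 10/7/10 rats per treatment):
```{r}
nrow(ratdrink)
# Dividing the observation counts by 5 (weeks per rat) gives the number of rats per treatment
table(ratdrink$treat) / 5
```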
Let's visualize the data grouped by rat. It appears the trajectories of growth change depending on treatment.
```{r}
ggplot(ratdrink) +
aes(x = weeks, y = wt, color = treat, group = subject) +
geom_point() +
geom_line()
```
We may also want to visualize trend lines grouped by treat.
```{r}
ggplot(ratdrink) +
aes(x = weeks, y = wt, color = treat) +
geom_point() +
geom_smooth(se = FALSE)
```
Again, we're interested in how the treatments affect the weight of the rats.
Let's model wt as a function of treat and weeks and let the intercept be conditional on the random effects of subject. This model allows each rat to have their own intercept but the same effects for treat and weeks. This model says the effect of weeks is the same regardless of treatment.
```{r}
lmm1 <- lmer(wt ~ treat + weeks + (1 | subject), data=ratdrink)
summary(lmm1, corr=FALSE) # suppress "Correlation of Fixed Effects"
```
Naive interpretation:
- Rats gain about 23 grams per week (weeks coefficient).
- Mean weight at week 0 for the control group is about 59 (Intercept).
- Mean weight at week 0 for the thiouracil group is about 59 - 13.
- Mean weight at week 0 for the thyroxine group is about 59 + 0.5.
(The sketch below reproduces these week-0 group means from the fixed effects.)
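These week-0 group means can be reproduced in code. A sketch; the coefficient names assume the default treatment coding with control as the reference level, which matches the summary above:
```{r}
fe <- fixef(lmm1)
# Expected week-0 weight for each treatment group
c(control    = unname(fe["(Intercept)"]),
  thiouracil = unname(fe["(Intercept)"] + fe["treatthiouracil"]),
  thyroxine  = unname(fe["(Intercept)"] + fe["treatthyroxine"]))
```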
Is this model any good? One way to find out is to check the constant variance assumption of the residuals. This doesn't look good, at least not for low and high fitted values.
```{r}
plot(lmm1)
```
Check constant variance for rats.
```{r}
plot(lmm1, subject ~ resid(.))
```
Let's plot the fitted model using ggeffects.
```{r}
ggpredict(lmm1, terms = c("weeks", "treat")) |>
plot(show_ci = FALSE, show_data = TRUE)
```
It appears we may need to let the trajectory over weeks vary by treatment. We can do that with an interaction.
## CODE ALONG 2
1. Let's fit a model with an interaction between treat and weeks, and let's leave the intercept conditional on subject. Call the model `lmm2`
2. Is this model "good"? Check the Residual plot.
3. Create an effect plot using ggeffects.
4. What's the expected mean weight of a rat at week 4 for each treatment?
## Interpreting model coefficients
Recall the model summary from CODE ALONG 2:
```{r}
lmm2 <- lmer(wt ~ treat + weeks + treat:weeks + (1 | subject), data=ratdrink)
summary(lmm2, corr = FALSE)
```
Since we have interactions, it takes some work to interpret the coefficients.
- The `Intercept` is the expected weight of a rat at week 0 in the control group: about 52.8 grams
- The `weeks` coefficient is the expected amount of weight in grams a rat in the control group adds each week: about 26.5 grams.
- The `treatthiouracil` coefficient is what we _add_ to the `Intercept` to get the expected weight of a rat at week 0 in the thiouracil group: about 52.8 + 4.8 = 57.6 grams
- The `treatthiouracil:weeks` coefficient is what we _add_ to the `weeks` coefficient to get the expected amount of weight in grams a rat in the thiouracil group adds each week: about 26.5 + -9.4 = 17.1 grams
The same calculations can be done for the thyroxine coefficients.
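As a check, the same arithmetic can be done in code (a sketch; the coefficient names are taken from the summary above):
```{r}
fe2 <- fixef(lmm2)
# Expected week-0 weight for the thiouracil group
unname(fe2["(Intercept)"] + fe2["treatthiouracil"])
# Expected weekly weight gain for the thiouracil group
unname(fe2["weeks"] + fe2["treatthiouracil:weeks"])
```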
We can get confidence intervals for the coefficients by using the `confint()` function. Notice we get confidence intervals for the random effect estimates as well. Setting `oldNames = FALSE` returns more informative names for the random effects.
```{r}
confint(lmm2, oldNames = FALSE)
```
We can also request bootstrap confidence intervals.
```{r}
confint(lmm2, oldNames = FALSE, method = "boot")
```
## Comparing models
We have many options when building a mixed-effect model. In addition to what predictors to include and whether they should interact, we get to decide how many random effects to allow. This leads to multiple candidate models. How can we decide which model is "best"?
When comparing mixed-effect models it's probably best to either...
- compare models that differ _only in fixed effects_
- compare models that differ _only in random effects_
For example, lmm1 and lmm2 differ only in their fixed effects. The lmm2 model includes an interaction. The effect of treat depends on weeks, and vice versa. But both have the same random effect.
```{r}
formula(lmm1)
formula(lmm2)
```
Technically model lmm1 is _nested_ within lmm2. It's a version of lmm2 where the interaction coefficients are 0. We can use the base R `anova()` function to compare the models using a hypothesis test. The null hypothesis is no difference in the models. This implies testing that the coefficients on the interaction are jointly 0.
```{r}
anova(lmm1, lmm2)
```
Notice both models are refit using ML. (See REML vs ML in the Appendix below.) However the test statistic and p-value may be questionable. The Chisq test statistic is an approximation. In this case we would probably prefer to compare AIC/BIC values.
AIC stands for Akaike Information Criterion. BIC stands for Bayesian Information Criterion. (See Appendix of this document for more information.) These are basically measures of _how well the model would fit new data_. Lower values mean a better "fit". An AIC/BIC value by itself doesn't mean much. But if we have multiple AIC/BIC values from different models fit to the same data with the same response, we can use them to help us select a "preferred" model. According to AIC/BIC, the model with the interaction appears to be decisively better.
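If we just want the AIC/BIC values without the hypothesis test, one option (a sketch) is to refit both models with ML ourselves using `refitML()` and compare directly; REML fits should not be compared on AIC/BIC when the fixed effects differ:
```{r}
# Refit with maximum likelihood, then compare information criteria
AIC(refitML(lmm1), refitML(lmm2))
BIC(refitML(lmm1), refitML(lmm2))
```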
Now let's compare models that differ only in their random effects.
The lmm2 model allowed for one random effect on the intercept. Let's fit a new model that allows a random effect on weeks as well. We can do that by specifying `(weeks | subject)`. This says the effect of weeks is conditional on each rat. By default this will also estimate a subject random effect on the intercept.
```{r}
lmm3 <- lmer(wt ~ treat + weeks + treat:weeks + (weeks | subject),
data=ratdrink)
summary(lmm3, corr = FALSE)
```
Under Random Effects we now have _three estimates_ describing the subject-level "noise":
1. the variance of the subject random effect on the intercept
2. the variance of the subject random effect on the weeks coefficient
3. the correlation between the two subject random effects
The correlation estimate is small, only -0.13. Correlated random effects in this case mean that subjects' random effects on the intercept and weeks coefficient are associated. For example, if one subject has a large random effect on the intercept, they may have a small random effect on the weeks coefficient. That would be a negative correlation.
Let's look at the confidence intervals on the three random effects. We can specify we just want random effects by specifying `parm = "theta_"`.
```{r}
confint(lmm3, parm = "theta_", oldNames = FALSE)
```
The confidence interval on the correlation is wide and uncertain. I don't think we need to estimate that parameter.
Let's fit a model with _no correlation between random effects_. To do this we can use two pipes in the random effects formula: `(weeks || subject)`. (This only works for numeric predictors.)
```{r}
lmm4 <- lmer(wt ~ treat + weeks + treat:weeks + (weeks || subject),
data=ratdrink)
```
An alternative way to specify the model is to explicitly state two random effects: one for the intercept and one for weeks. Because the intercept random effect is fit by default, we can use `0 + weeks` to suppress it.
```{r}
lmm4 <- lmer(wt ~ treat + weeks + treat:weeks +
(1 | subject) + (0 + weeks | subject),
data=ratdrink)
summary(lmm4, corr = FALSE)
```
If we look at the coefficients, each rat now has their own intercept and weeks coefficient.
```{r}
coef(lmm4)$subject
```
Is model lmm4 better than model lmm2 that just has a random intercept? In other words, do we need the random effect for weeks?
```{r}
formula(lmm2)
formula(lmm4)
```
The easiest thing to do is again look at the confidence intervals on the random effects.
```{r}
confint(lmm4, parm = "theta_", oldNames = FALSE)
```
The 95% confidence interval on the weeks random effect is about [2.6, 4.8]. I think we can safely conclude the random effect is not 0. "Not 0" does not always mean relevant or important. In this case the subject-specific effect on weeks is about 2 - 5 grams. That seems somewhat important given the range of the weight values (about 45 - 190 grams).
Comparing models that differ only in their random effects using a hypothesis test is tricky because it means testing if _variance equals 0_. This is a weird test since variance cannot be less than 0. We're at the boundary of possible values. This means it's not appropriate to use the hypothesis test implemented in the `anova()` function (though it will still run and provide results.)
One option is to use the {RLRsim} package, which provides simulation-based exact likelihood ratio tests for testing random effects. A demonstration is included in the Appendix of this document.
## CODE ALONG 3
1. Update model lmm4 with a non-linear effect on weeks using `poly(weeks, 2)`. This is a second degree polynomial. Call the model `lmm5`.
2. How does this model compare to `lmm4`, which has a simple linear effect on weeks?
## Making effect comparisons
Let's say we're happy with model `lmm4`. It's good enough. How can we use that model to make comparisons between treatments at certain times? How can we compare trends over time between treatments?
We can use effect plots to visualize the model. They are indispensable for understanding interactions and non-linear effects. The {ggeffects} package makes it easy to create some basic plots. The simplest usage is to call the `ggpredict()` function on your model and specify which terms to visualize in a vector. The first term will form the x-axis, the second will form the groups. Then pipe the result into the {ggeffects} `plot()` method.
```{r}
ggpredict(lmm4, terms = c("weeks", "treat")) |>
plot()
```
Notice the trend over time for the thiouracil group is less pronounced. That's the interaction at work. It appears at week 4, the end of the observation period, rats on thiouracil weigh less than rats on the other two treatments. How much less?
The {emmeans} package can help us answer that. "emmeans" is short for Estimated Marginal Means. {emmeans} is a large and powerful package. We show two basic uses in this example.
The `emmeans()` function takes a fitted model as the first argument. The second argument uses {emmeans} syntax to specify we want `pairwise` comparisons between all three treatments. The `at` argument allows us to specify at what time point we want to make the comparisons. Notice this needs to be a data frame. (The note about results being misleading can be ignored since we set the `at` argument.)
```{r}
emmeans(lmm4, pairwise ~ treat, at = data.frame(weeks = 4))
```
The first section, "emmeans", returns the model-estimated mean weight for each treatment at week 4. The second section, "contrasts", compares those weights. The mean weight for rats on thiouracil is about 126. That's about 32-34 grams less than the control and thyroxine groups. Those differences appear to be significant judging by p-values. NOTE: The p-values are approximate. Recall we can't precisely calculate p-values for mixed-effect models with unbalanced groups. The Kenward-Roger method is one approach for obtaining approximate p-values.
Probably better to look at confidence intervals. Below we pipe the previous result into `confint()`.
```{r}
emmeans(lmm4, pairwise ~ treat, at = data.frame(weeks = 4)) |>
confint()
```
It appears that rats on thiouracil can be expected to weigh at least 14-15 grams less than the other two groups at 4 weeks.
To compare the trends (or slopes) over time, we use the `emtrends()` function. The syntax is almost the same as before, except this time we use the `var` argument to specify the x-axis of the trend line. In this case it's weeks.
```{r}
emtrends(lmm4, pairwise ~ treat, var = "weeks")
```
The trend or slope of thiouracil appears to be about 10 units less than the other two groups: 17 versus 27. That difference appears to be significant according to p-values. Again we can pipe the result into `confint()` to get confidence intervals.
```{r}
emtrends(lmm4, pairwise ~ treat, var = "weeks") |>
confint()
```
The difference in trend looks to be at least 5 between thiouracil and the other two groups. We might say the expected weight gain for rats on thiouracil is about 5 - 15 grams per week less than for the thyroxine and control groups.
## Using Simulation with Mixed-Effect Models
### Posterior predictive checks
If our mixed-effect model is good, it should generate data similar to our observed data. This is sometimes called a "posterior predictive check". The `simulate()` function allows us to rapidly generate new responses from our model. We can then compare the distribution of the model-generated responses to the distribution of our observed response data.
Let's try `lmm5`. Recall the `lmm5` model formula:
```{r}
formula(lmm5)
```
Below we specify 100 simulations and then use a `for` loop to plot a density curve for each simulation over the density curve of our observed data. The `ylim = c(0, 0.011)` argument was determined via trial and error.
```{r}
sim1 <- simulate(lmm5, nsim = 100)
plot(density(ratdrink$wt), ylim = c(0, 0.011))
for(i in 1:100)lines(density(sim1[[i]]), col = "grey80", lty = 2)
```
That looks pretty good!
Let's try a different model. Below we fit a model that doesn't take treat into account and only has a random effect on the intercept.
```{r}
lmmX <- lmer(wt ~ weeks + (1 | subject),
data = ratdrink)
```
Let's simulate wt using the `lmmX` model and overlay the model generated distributions on top of the observed distribution.
```{r}
sim2 <- simulate(lmmX, nsim = 100)
plot(density(ratdrink$wt), ylim = c(0, 0.011))
for(i in 1:100)lines(density(sim2[[i]]), col = "grey80", lty = 2)
```
It doesn't look horrible, but in the 100 - 200 range the observed data does not seem consistent with the model generated data. The `lmm5` model with treat does a better job of capturing the variation in rats _as they get heavier_.
### Confidence intervals on predictions
Another use for simulation is to approximate confidence intervals for predictions. Recall the usual `predict` method for `lmer()` models does not calculate confidence intervals.
To do this we'll use the `bootMer()` function that comes with lme4. This is like `simulate()` but goes a step further. After each simulation of new responses, _a new model is refit_. Once we fit a new model, we can make a prediction. We can then repeat this process many times to get many simulated predictions. We can then use all those predictions to make inference about the uncertainty of our predictions. This is referred to as _model-based bootstrapping_. And it's probably easier to understand with a demonstration.
Let's say we want to calculate a confidence interval on the difference in means between thiouracil and thyroxine at week 4 using our `lmm5` model.
Here's how we can get an estimate of the difference by simply subtracting the result from one `predict()` from another.
```{r}
nd1 <- data.frame(treat = "thiouracil", weeks = 4)
nd2 <- data.frame(treat = "thyroxine", weeks = 4)
predict(lmm5, newdata = nd1, re.form = NA) -
predict(lmm5, newdata = nd2, re.form = NA)
```
Rats on thiouracil are about 40 grams lighter than rats on thyroxine. But how can we get a confidence interval on that difference? Here's one way using `bootMer()`.
First we tell the `bootMer()` function to use the `lmm5` model to simulate data and then refit the model. The `FUN` argument tells `bootMer()` what to do with the newly fit model. In this case, make a prediction using our two datasets and take the difference. The `nsim` argument says do it 200 times. (We would typically do something like 1000, but in the interest of time we do 200.) The `.progress` argument says output a text-based progress bar so we can keep tabs on progress.
```{r}
b.out <- bootMer(x = lmm5,
FUN = function(x){
predict(x, newdata = nd1, re.form=NA) -
predict(x, newdata = nd2, re.form=NA)},
nsim = 200,
.progress = "txt")
```
When this finishes running, the `b.out` object is a list with several elements. The `t` element is the vector of differences, one per bootstrap replicate. Using the `quantile()` function we can extract the 2.5% and 97.5% percentiles to construct a confidence interval.
```{r}
head(b.out$t)
quantile(b.out$t, probs = c(0.025, 0.975))
```
Compare the 95% bootstrap confidence interval to the 95% confidence interval returned by {emmeans} which uses something called the "Kenward-Roger approximation". The bootstrap CI is much tighter compared to the approximated CI [-60.0, -19.7].
```{r}
emm_out <- emmeans(lmm5, pairwise ~ treat, at = data.frame(weeks = 4)) |>
confint()
emm_out$contrasts
```
## CODE ALONG 4
What's the difference in expected weight at week 4 versus week 3 for rats on thiouracil? Use `bootMer()` to calculate a confidence interval on the difference using the `lmm5` model.
Here's how we get one estimate using `predict()`
```{r}
nd <- data.frame(treat = "thiouracil", weeks = 3:4)
# diff(c(x1, x2)) returns x2 - x1
diff(predict(lmm5, newdata = nd, re.form = NA))
```
Now use `bootMer()` to perform this 200 times using newly fitted models from simulated responses. Save the result to `b.out2`. When done, use the quantile function to get an approximate confidence interval.
## We're done!
Thanks for coming. See the Appendix below for more topics.
For help and advice with statistics, contact us to set up an appointment: `[email protected]`
Sign up for more workshops or see past workshops:
http://data.library.virginia.edu/training/
Register for the Research Data Services newsletter: http://data.library.virginia.edu/newsletters/
## References
- Faraway, J. (2016). _Extending the Linear Model with R, 2nd ed_. Chapman and Hall/CRC.
- Galecki, A. & Burzykowski, T. (2013). _Linear Mixed-Effects Models Using R_. Springer.
- Pinheiro, J. & Bates, D. (2000). _Mixed-Effects Models in S and S-PLUS_. Springer.
- West, B., Welch, K., & Galecki, A. (2015) _Linear Mixed Models_. Chapman and Hall/CRC.
- GLMM FAQ: http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html
## Appendix
### REML vs ML
In the model output above it says "Linear mixed model fit by REML". REML is an acronym for _Restricted Maximum Likelihood_. This is a type of estimation procedure. An alternative estimation procedure is ML, or _Maximum Likelihood_. The ML procedure returns biased estimates of the random effects. They're estimated to be _smaller_ than they really are. REML generally returns less biased estimates. If data are _balanced_, that is, we have an equal number of observations on subjects/clusters, the REML and ML estimates are about the same. We can fit a model using ML by setting `REML = FALSE`.
```{r}
me2 <- lmer(y ~ x + (1|id), data = d2, REML = FALSE)
```
Recall we simulated our data using random effect standard deviations of 1.5 (observation) and 2 (subject).
The ML estimates are lower than the REML estimates. We can view the model random effects estimates using the `VarCorr()` function.
```{r}
# ML
VarCorr(me2)
```
Notice the REML estimates are slightly higher and closer to the true values.
```{r}
# REML
VarCorr(me1)
```
The fixed effect estimates are the same.
```{r}
cbind("me1" = fixef(me1), "me2" = fixef(me2))
```
So why is ML even available as an estimator if it returns biased estimates? It turns out that REML is not a suitable estimator when comparing models that differ _only in their fixed effects_. We saw this earlier in the workshop when `anova()` refit both models using ML before comparing them.
If you have a good amount of data and propose a reasonable model, the difference in ML and REML estimation will likely be slight. Doug Bates, the author of the {lme4} package, had this to say about REML on the R mixed models mailing list back in 2015:
"I will admit that the original sin, the REML criterion, was committed by statisticians. In retrospect I wish that we had not incorporated that criterion into the nlme and lme4 packages but, at the time we wrote them, our work would have been dismissed as wrong if our answers did not agree with SAS PROC MIXED, etc. So we opted for bug-for-bug compatibility with existing software."
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2015q3/023743.html
### Approximate p-values with lme4
The {lmerTest} package provides approximate p-values in {lme4} summary output. Simply load the package and fit your model as usual. Remember, they're approximate and not necessarily a good approximation.
```{r}
# install.packages("lmerTest")
library(lmerTest)
lmm3a <- lmer(wt ~ treat * weeks + (weeks | subject), data=ratdrink)
summary(lmm3a, corr = FALSE)
```
### Intraclass Correlation Coefficient (ICC)
The ICC describes the similarity of the dependent variable within a cluster or unit of analysis. In our examples above, the unit of analysis was a rat since we had multiple observations per rat. Examples of clusters include schools, hospitals, cities, plots of land, etc., where we would have multiple observations within each cluster. We might think of the ICC as the correlation among observations on the same unit or within the same cluster.
The ICC is usually calculated as the ratio of random effect variance to total variance. Therefore an ICC close to 1 means much of the variability is due to differences between units/clusters, which implies a great deal of similarity within units/clusters. An ICC close to 0 means very little of the variability is due to differences between units/clusters, which implies there is not much similarity within units/clusters.
Whether you report ICC depends on your field of study. Some fields give it a lot of emphasis, others not so much.
Recall model `lmm1`. Let's calculate the ICC.
```{r}
summary(lmm1, corr = FALSE)
```
Use the variances listed under "Random effects":
```{r}
# subject/(subject + Residual)
60.42/(60.42 + 105.15)
```
We can use the `icc()` function from the {performance} package to carry out this calculation. It is reported as "Adjusted ICC".
```{r}
# install.packages("performance")
performance::icc(lmm1)
```
The "Unadjusted ICC" includes fixed effects variance in the denominator. In other words, the ICC is not adjusted for fixed effects. We can calculate this by taking the variance of the fitted values generated without random effects.
```{r}
# re.form = NA means do not use random effects when making predictions
var(predict(lmm1, re.form = NA))
```
Add to denominator to calculate Unadjusted ICC:
```{r}
60.42/(60.42 + 105.15 + 1130.058)
```
Recall model `lmm3` which had two random effects and covariance between random effects.
```{r}
summary(lmm3, corr = FALSE)
```
The ICC for this model is higher.
```{r}
performance::icc(lmm3)
```
The `lmm3` model allows for random effects on the intercept and the weeks coefficient. The variability of those random effects represents about 85% of the total variation in weight. This says there's a lot of similarity within the measurements of each rat, which makes sense.
For mixed-effect models with more than one random effect, ICC calculations are more complex and not as easy to replicate "by hand".
### Correlation of Fixed Effects
By default the `lmer()` summary output includes a section entitled "Correlation of Fixed Effects" at the bottom.
```{r}
summary(lmm1)
```
These are the coefficient covariances rescaled to correlation. Below we extract the covariances using the `vcov()` function and then convert to correlation using the `cov2cor()` function to replicate the output.
```{r}
vcov(lmm1) |> cov2cor()
```
Imagine we fit our lmm1 model above many times, each time using a new random sample. We will get new estimates of fixed effects every time. The correlation of fixed effects gives us some sense of how those many fixed effect estimates might be associated. For example, it appears the coefficients for thyroxine and thiouracil are somewhat positively correlated (0.4537). If a new model fit based on another random sample produced a higher coefficient for thyroxine, we might expect the coefficient for thiouracil to be slightly higher as well.
NOTE: Examining the correlation of fixed effects is NOT the same as examining your model for collinearity. That involves the correlation of the predictors, not the model’s fixed effects.
For more information on the Correlation of Fixed Effects, see this article:
https://library.virginia.edu/data/articles/correlation-of-fixed-effects-in-lme4
### More diagnostics
The {DHARMa} package uses a simulation-based approach to generating residuals for mixed-effect/multilevel models. This can come in handy when you're fitting generalized mixed-effect models with binary or count responses. It also works with linear mixed-effect models. The basic idea is to simulate new data from the fitted model, say 250 data sets, which means we have 250 simulated values for each observed value. Then for _each data point_, we find the quantile of the observed data point _among the simulated values_ for that data point. So a quantile of 0.50 means half of the simulated values are larger than the observed value. If we have fit a good model, the distribution of the quantiles should be uniformly distributed.
Below is the basic use of the {DHARMa} package. First simulate the residuals using the `simulateResiduals()` function. The default is 250, which we make explicit below. We save the result as `simResids`.
```{r}
library(DHARMa)
simResids <- simulateResiduals(lmm3, n = 250, plot = FALSE)
```
Then plot the results.
```{r}
plot(simResids)
```
The _plot on the left_ is a uniform distribution QQ plot to assess the distribution of the quantile residuals. If they follow a uniform distribution they should lie on a diagonal line. Three hypothesis tests are printed on the plot:
1. Kolmogorov-Smirnov (KS) Test: Null = residuals and theoretical uniform quantiles both drawn from same distribution.
2. Dispersion Test: Null = residuals are not over- or under-dispersed.
3. Outlier Test: Null = no outliers in residuals.
We fail to reject all three tests above, which is a good thing.
The _plot on the right_ shows the quantile residuals versus the model fitted values, which have been rank-ordered and divided by their maximum value to rescale to [0,1]. In addition, three quantile regressions are displayed for 0.25, 0.50, and 0.75. If the quantile residuals are uniformly distributed, these three fitted quantile regression lines should be approximately flat. Any outliers will be highlighted in red.
From the {DHARMa} vignette: "DHARMa only flags a difference between the observed and expected data - the user has to decide whether this difference is actually a problem for the analysis!" The plots neither prove the model is correct nor prove the model is wrong. They are simply one more tool to assess goodness of fit.
### Hypothesis tests for random effects
Comparing models that differ only in their random effects using a hypothesis test is tricky because it means testing if _variance equals 0_. This is a weird test since variance cannot be less than 0. We're at the boundary of possible values. This means it's not appropriate to use the hypothesis test implemented in the `anova()` function (though it will still run and provide results.)
Instead we turn to the {RLRsim} package, which provides simulation-based exact likelihood ratio tests for testing random effects. The function we want to use is `exactRLRT()`.
The `exactRLRT()` function takes three arguments:
1. m: the model with just the random effect we're testing (weeks)
2. mA: the full model with both random effects (intercept and weeks)
3. m0: the model with just the random effect that remains (intercept)
All models should have the same fixed effects.
The null of this test is that the weeks random effect is 0 and unnecessary. A small p-value provides evidence against this hypothesis.
We need to fit a model with just the weeks random effect. We'll call it `lmm4_red`.
```{r}
lmm4_red <- lmer(wt ~ treat * weeks + (0 + weeks | subject), data=ratdrink)
```
Now we're ready to test whether or not we need the random effect on weeks.
```{r}
# install.packages("RLRsim")
library(RLRsim)
exactRLRT(m = lmm4_red, mA = lmm4, m0 = lmm2)
```
We have very strong evidence that the subject random effect on weeks is good to have in our model. We can reject the null that the random effect variance for weeks is 0.
We didn't need to go to all this trouble. The confidence interval quickly answered this question and gave us additional information on the uncertainty and magnitude of the weeks random effect. Nevertheless, the above procedure is appropriate in the event you really need to use a formal hypothesis test to decide whether or not a random effect is significantly different from 0.
### AIC and BIC
AIC and BIC are basically log-likelihood measures penalized for number of model parameters. The formula is `-2*log-likelihood + k*npar`, where `npar` represents the number of parameters in the fitted model and `k` is the penalty per parameter.
The AIC penalty: k = 2
The BIC penalty: k = log(n), where n is number of observations.
Result of AIC function for `lmm3`
```{r}
AIC(lmm3)
```
To calculate AIC "by hand", we use the `logLik()` function to calculate the log likelihood and then extract the `df` and `n` attributes.
```{r}
# get log-likelihood
ll <- logLik(lmm3)
# get npar
npar <- attr(ll, "df")
# get n
n <- attr(ll, "nobs")
```
AIC by hand:
```{r}
-2*ll + 2 * npar
```
To strip the formatting we can wrap in `c()`
```{r}
c(-2*ll + 2 * npar)
```
BIC
```{r}
BIC(lmm3)
```
To get BIC by hand
```{r}
c(-2*ll + log(n) * npar)
```
### Reporting mixed-effect model output
There is no universally agreed-upon method to report mixed-effect model results. Some journals prefer every last detail, while others want to see a few selected estimates.
The {modelsummary} package provides a handy function called `modelsummary()` to report the results for several different models. Below we report the results for three of the models we fit earlier. Notice that we need to wrap multiple models in a list. The output also includes approximate R-squared values. The marginal R2 considers only the variance of the fixed effects (without the random effects), while the conditional R2 takes both the fixed and random effects into account (ie, the total model).
```{r}
library(modelsummary)
modelsummary(list("Model 1" = lmm1,
"Model 2" = lmm2,
"Model 3" = lmm3),
output = "markdown")
```
The {modelsummary} package is very powerful and still evolving. See their web page for more information: https://modelsummary.com/index.html