From ce77ed62176c0cd48f7f8445131dce1dee10cdee Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Facundo=20Mu=C3=B1oz?= Date: Sun, 23 Jun 2024 12:16:04 +0200 Subject: [PATCH] Remove old version copies of vignettes. --- vignettes/p10.Rmd | 291 --------- vignettes/p8.Rmd | 1451 --------------------------------------------- vignettes/p9.Rmd | 464 --------------- 3 files changed, 2206 deletions(-) delete mode 100644 vignettes/p10.Rmd delete mode 100644 vignettes/p8.Rmd delete mode 100644 vignettes/p9.Rmd diff --git a/vignettes/p10.Rmd b/vignettes/p10.Rmd deleted file mode 100644 index 0b9cdec..0000000 --- a/vignettes/p10.Rmd +++ /dev/null @@ -1,291 +0,0 @@ ---- -title: 'Practical 7: Bayesian Hierarchical Modelling' -author: "VIBASS" -output: - html_vignette: - fig_caption: yes - number_sections: yes - toc: yes - fig_width: 6 - fig_height: 4 -vignette: > - %\VignetteIndexEntry{Practical 7: Bayesian Hierarchical Modelling} - %\VignetteEngine{knitr::rmarkdown} - %\VignetteEncoding{UTF-8} ---- - -# Bayesian Hierarchical Modelling - -In this last practical we will consider the analysis of Bayesian hierarchical -models. As explained in the previous lecture, hierarchical models provide a -convenient tool to define models so that the different sources of variation in -the data are clearly identified. Bayesian inference for highly structured -hierarchical models can be difficult and may require the use of Markov chain -Monte Carlo methods. However, packages such as `BayesX` and `INLA`, to mention -just two, provide a very convenient way to fit and make inference about certain -types of Bayesian hierarchical models. - -Regarding software, we will use `INLA` to fit the models in the examples from -this point on. The reason is that it is a popular software for Bayesian -inference and very fast. - -# Linear Mixed Models - -Linear mixed models were defined in the lecture as follows: - -$$ -Y_{ij} = X_{ij}\beta +\phi_i+\epsilon_{ij} -$$ - -Here, Y_{ij} represents observation $j$ in group $i$, X_{ij} are a vector of -covariates with coefficients $\beta$, $\phi_i$ i.i.d. random effects and -$\epsilon_{ij}$ a Gaussian error term. The distribution of the random effects -$\phi_i$ is Gaussian with zero mean and precision $\tau_{\phi}$. - - -# Multilevel Modelling - -Multilevel models are a particular type of mixed-effects models in which -observations are nested within groups, so that group effects are modelled using -random effects. A typical example is that of students nested within classes. - -For the next example, the `nlschools` data set (in package `MASS`) will be used. -This data set records data about students' performance (in particular, about a -language score test) and other variables. The variables in this data set are: - -* `lang`, language score test. - -* `IQ`, verbal IQ. - -* `class`, class ID. - -* `GS`, class size as number of eighth-grade pupils recorded in the class. - -* `SES`, social-economic status of pupil’s family. - -* `COMB`, whether the pupils are taught in the multi-grade class with 7th-grade students. - -The data set can be loaded and summarised as follows: - -```{r} -library("MASS") -data("nlschools") -summary(nlschools) -``` - -The model to fit will take `lang` as the response variable and include -`IQ`, `GS`, `SES` and `COMB` as covariates (i.e., fixed effects). 
This model -can easily be fit with `INLA` as follows: - -```{r message = FALSE, warning = FALSE} -library("INLA") -m1 <- inla(lang ~ IQ + GS + SES + COMB, data = nlschools) - -summary(m1) -``` - -Note that the previous model only includes fixed effects. The data set includes -`class` as the class ID to which each student belongs. Class effects can have an -impact on the performance of the students, with students in the same class -performing similarly in the language test. - -Very conveniently, `INLA` can include random effects in the model by adding a -term in the right hand side of the formula that defined the model. Specifically, -the term to add is `f(class, model = "iid")` (see code below for the -full model). -This will create a random effect indexed over variable `class` and which is of -type `iid`, i.e., the random effects are independent and identically distributed -using a normal distribution with zero mean and precision $\tau$. - -Before fitting the model, the between-class variability can be explored by -means of boxplots: - -```{r fig = TRUE, fig.width = 15, fig.height = 5} -boxplot(lang ~ class, data = nlschools, las = 2) -``` - -The code to fit the model with random effects is: - -```{r} -m2 <- inla( - lang ~ IQ + GS + SES + COMB + f(class, model = "iid"), - data = nlschools -) - -summary(m2) -``` - - -# Generalised Linear Mixed Models - -Mixed effects models can also be considered within the context of generalised -linear models. In this case, the linear predictor of observation $i$, $\eta_i$, -can be defined as - -$$ -\eta_i = X_{ij}\beta +\phi_i -$$ - -Compared to the previous setting of linear mixed effects models, note that now -the distribution of the response could be other than Gaussian and that -observations are not necessarily nested within groups. - -# Poisson regression - - -In this practical we will use the North Carolina Sudden Infant Death Syndrome -(SIDS) data set. It is available in the `spData` package and it can be loaded -using: - -```{r message = FALSE, warning = FALSE} -library(spData) -data(nc.sids) -summary(nc.sids) -``` - -A full description of the data set is provided in the associated manual page -(check with `?nc.sids`) but in this practical we will only consider these -variables: - -* `BIR74`, number of births (1974-78). - -* `SID74`, number of SID deaths (1974-78). - -* `NWBIR74`, number of non-white births (1974-78). - -These variables are measured at the county level in North Carolina, of which -there are 100. - -Because `SID74` records the number of SID deaths, the model is Poisson: - -$$ -O_i \mid \mu_i \sim Po(\mu_i),\ i=1,\ldots, 100 -$$ -Here, $O_i$ represents the number of cases in county $i$ and $\mu_i$ the mean. -In addition, mean $\mu_i$ will be written as $\mu_i = E_i \theta_i$, where $E_i$ -is the *expected* number of cases and $\theta_i$ the relative risk in county $i$. - -The relative risk $\theta_i$ is often modelled, on the log-scale, to be equal -to a linear predictor: - -$$ -\log(\theta_i) = \beta_0 + \ldots -$$ - -The expected number of cases is computed by multiplying the number of births in -county $i$ to the overall mortality rate - -$$ -r = \frac{\sum_{i=1}^{100}O_i}{\sum_{i=1}^{100}B_i} -$$ -where $B_i$ represents the total number of births in country $i$. Hence, -the expected number of cases in county $i$ is $E_i = r B_i$. 
- - -```{r} -# Overall mortality rate -r74 <- sum(nc.sids$SID74) / sum(nc.sids$BIR74) -# Expected cases -nc.sids$EXP74 <- r74 * nc.sids$BIR74 -``` - -A common measure of relative risk is the *standardised mortality ratio* -($O_i / E_i$): - -```{r} -nc.sids$SMR74 <- nc.sids$SID74 / nc.sids$EXP74 -``` - -A summary of the SMR can be obtained: - -```{r fig = TRUE} -hist(nc.sids$SMR, xlab = "SMR") -``` - -Values above 1 indicate that the county has more observed deaths than expected -and that there might be an increased risk in the area. - - -As a covariate, we will compute the proportion of non-white births: - -```{r} -nc.sids$NWPROP74 <- nc.sids$NWBIR74/ nc.sids$BIR74 -``` - -There is a clear relationship between the SMR and the proportion of non-white -births in a county: - -```{r fig = TRUE} -plot(nc.sids$NWPROP74, nc.sids$SMR74) - -# Correlation -cor(nc.sids$NWPROP74, nc.sids$SMR74) -``` - - -A simple Poisson regression can be fit by using the following code: - -```{r} -m1nc <- inla( - SID74 ~ 1 + NWPROP74, - family = "poisson", - E = nc.sids$EXP74, - data = nc.sids -) -summary(m1nc) -``` - -Random effects can also be included to account for intrinsic differences between -the counties: - -```{r} -# Index for random effects -nc.sids$ID <- 1:nrow(nc.sids) - -# Model WITH covariate -m2nc <- inla( - SID74 ~ 1 + NWPROP74 + f(ID, model = "iid"), - family = "poisson", - E = nc.sids$EXP74, - data = as.data.frame(nc.sids) -) - -summary(m2nc) -``` - -The role of the covariate can be explored by fitting a model without it: - - -```{r} -# Model WITHOUT covariate -m3nc <- inla( - SID74 ~ 1 + f(ID, model = "iid"), - family = "poisson", - E = nc.sids$EXP74, - data = as.data.frame(nc.sids) -) - -summary(m3nc) -``` - -Now, notice the decrease in the estimate of the precision of the random effects -(i.e., the variance increases). This means that values of the random effects -are now larger than in the previous case as the random effects pick some of the -effect explained by the covariate. - - -```{r fig = TRUE} -par(mfrow = c(1, 2)) -boxplot(m2nc$summary.random$ID$mean, ylim = c(-1, 1), main = "With NWPROP74") -boxplot(m3nc$summary.random$ID$mean, ylim = c(-1, 1), main = "Without NWPROP74") -``` - -# Further Extensions - -Spatial random effects can be defined not to be independent and identically -distributed. Instead, spatial or temporal correlation can be considered when -defining them. For example, in the North Carolina SIDS data set, it is common -to consider that two counties that are neighbours (i.e., share a boundary) -will have similar relative risks. This can be taken into account in the model -but assuming that the random effects are spatially autocorrelated. This is out -of the scope of this introductory course but feel free to ask about this!! diff --git a/vignettes/p8.Rmd b/vignettes/p8.Rmd deleted file mode 100644 index 3ed85cf..0000000 --- a/vignettes/p8.Rmd +++ /dev/null @@ -1,1451 +0,0 @@ ---- -title: 'Practical 5: Numerical approaches' -author: "VIBASS" -output: - html_vignette: - fig_caption: yes - number_sections: yes - toc: yes - fig_width: 6 - fig_height: 4 -vignette: > - %\VignetteIndexEntry{Practical 5: Numerical approaches} - %\VignetteEngine{knitr::rmarkdown} - %\VignetteEncoding{UTF-8} ---- - - -# Introduction - -In previous practicals you have used Bayesian models with conjugate -priors where the posterior distribution can be easily worked out. In -general, this is seldom the case and other approaches need to be -considered. 
In particular, Importance Sampling and Markov Chain Monte -Carlo (MCMC) methods can be used to draw samples from the -posterior distribution that are in turn used to obtain estimates of the -posterior mean and variance and other quantities of interest. - -# Importance Sampling - -As described in the previous lecture, Importance Sampling (IS) is an -algorithm to estimate some quantities of interest of a target -distribution by sampling from a different (proposal) distribution and -reweighting the samples using importance weights. In Bayesian inference, -IS can be used to sample from the posterior distribution when the -normalizing constant is not known because -$$ -\pi(\theta \mid \mathbf{y}) \propto L(\theta \mid \mathbf{y}) \pi(\theta), -$$ -where $\mathbf{y}$ represents the observed data, -$L(\theta \mid \mathbf{y})$ the likelihood function and -$\pi(\theta)$ the prior distribution on $\theta$. - -If $g(\cdot)$ is a proposal distribution, and $\{\theta^{(m)}\}_{m=1}^M$ -are $M$ samples from that distribution, then the importance weights are -$$ -w(\theta^{(m)}) = \frac{L(\theta^{(m)} \mid \mathbf{y})\,\pi(\theta^{(m)})} - {g(\theta^{(m)})} . -$$ -\noindent -When the normalizing constant in the posterior distribution is not -known, the importance weights are rescaled to sum to one. Note that this -rescaling is done by the denominator in the expression at point 2 on -slide 9/22 of the Numerical Approaches slides you have just seen. In practice, -rescaling removes the need for the denominator and simplifies the -calculations throughout (we do it once, rather than every time). - -Hence, the posterior mean can be computed as -$$ -E\left(\theta \mid \mathbf{y}\right) = \mu = \int \theta\, \pi(\theta \mid \mathbf{y}) \mathop{d\theta} - \simeq \sum_{m=1}^M \theta^{(m)}\, w(\theta^{(m)}) = \hat{\mu}. -$$ -\noindent -Similarly, the posterior variance can be computed as -$$ -\mbox{Var}\left(\theta \mid \mathbf{y}\right) = -\sigma^2 = \int (\theta - \mu)^2\, \pi(\theta \mid \mathbf{y}) \mathop{d\theta} - \simeq \sum_{m=1}^M (\theta^{(m)})^2\, w(\theta^{(m)}) - \hat{\mu}^2 . -$$ - - - -# The Metropolis-Hastings Algorithm - -The Metropolis-Hastings (M-H) algorithm is a popular MCMC method to -obtain samples from the posterior distribution of an ensemble of -parameters. -In the examples below we will only consider models with one parameter, -but the M-H algorithm can be used on models with a large number of -parameters. - -The M-H algorithm works in a very simple way. At every step of the -algorithm a new movement is proposed using a *proposal distribution*. -This movement is accepted with a known probability, which implies that -the movement can be rejected so that the algorithm stays at the same -state in the current iteration. - -Hence, in order to code the M-H algorithm for a set of parameters -$\theta$ we need to define: - -* A function to draw observations from the proposal distribution, given -its current state. This will be denoted by $q(\cdot\mid\cdot)$, so that -the density of a new proposal $\theta^*$ given a current state -$\theta^{(m)}$ is given by $q(\theta^*\mid\theta^{(m)})$. - -From the Bayesian model, we already know: - -* A prior distribution on the parameters of interest, i.e., -$\pi(\theta)$. - -* The likelihood of the parameter $\theta$ given the observed data -$\mathbf{y}$, i.e., $L(\theta\mid\mathbf{y})$. 
- -At step $m$, a new value is drawn from $q(\cdot\mid\theta^{(m)})$ and it -is accepted with probability: - -$$ -\alpha = \min\left\{1, -\frac{L(\theta^*\mid\mathbf{y})\,\pi(\theta^{*})\,q(\theta^{(m)} -\mid\theta^{*})} -{L(\theta^{(m)}\mid\mathbf{y})\,\pi(\theta^{(m)})\,q(\theta^{*} -\mid\theta^{(m)})}\right\} -$$ - - -If the value is accepted, then the current state is set to the proposed -value, i.e., $\theta^{(m+1)} = \theta^{*}$. -Otherwise we keep the previous value, so -$\theta^{(m+1)} = \theta^{(m)}$. - -# Example: Binomial-Beta Model - -The first example that will be considered is based on the data set -collected on the number of red M&M's in tube $i$ with $n_i$ M&M's. -The number of red M&M's obtained depends on the total number of M&M's -extracted and the actual proportion of red ones $\theta$. -Given the proportion $\theta$, the number of red M&M's obtained -follows a Binomial distribution. Because $\theta$ takes values between -0 and 1, a sensible choice for a prior is the Beta distribution. - -The model can be stated as follows: -$$ -\begin{array}{rcl} -Y_i \mid \theta & \sim & \text{Bi}(n_i,\, \theta)\\ -\theta & \sim & \text{Be}(\alpha,\, \beta) -\end{array} -$$ -to allow for multiple tubes. - -In particular, we will consider a vague uniform prior in the $[0,1]$ interval, which corresponds to $\alpha=\beta=1$. - -You can use the following data set for this exercise: - -```{r eval = TRUE} -mmdata <- data.frame(MMs = c(20, 22, 24), red = c(5, 8, 9)) -``` - -These data reproduce different counts of red M&M's in three different -tubes. Variable `MMs` records the total number of M&M's $n_i$ in tube $i$ -and `red` the actual number of red M&M's ($y_i$ in the model). - -## Importance sampling - -Although the posterior distribution is known in closed form, IS can -be used to estimate the posterior mean and variance. Given that -the parameter $\theta$ is bounded, a uniform distribution in the interval $[0,1]$ will -be used. This is probably not very efficient (as it is likely not -to be close to the actual posterior) but it will provide a straightforward simulation strategy. - -```{r} -n_simulations <- 10000 -set.seed(12) -theta_sim <- runif(n_simulations) -``` - -Next, importance weights are computed in two steps. First, the -ratio between the likelihood times the prior and the density of the -proposal distribution is computed. -Secondly, weights are re-scaled to sum to one. - -```{r} -# Log-Likelihood (for each value of theta_sim) -loglik_binom <- sapply(theta_sim, function(theta) { - sum(dbinom(mmdata$red, mmdata$MMs, theta, log = TRUE)) -}) - -# Log-weights: log-lik + log-prior - log-proposal_distribution -log_ww <- loglik_binom + dbeta(theta_sim, 1, 1, log = TRUE) - log(1) - -# Re-scale weights to sum up to one -log_ww <- log_ww - max(log_ww) -ww <- exp(log_ww) -ww <- ww / sum(ww) - -``` - - -The importance weights can be summarized using a histogram (see below). -The distribution of weights shows that most samples are far from the -regions of high posterior density. - - -```{r fig = TRUE} -hist(ww, xlab = "Importance weights") -``` - -The posterior mean and variance can be computed as follows: - -```{r} -# Posterior mean -post_mean <- sum(theta_sim * ww) -post_mean - -# Posterior variance -post_var <- sum(theta_sim^2 * ww)- post_mean^2 -post_var -``` - - -Finally, an estimate of the posterior density -$\pi(\theta \mid \mathbf{y})$ -of the parameter can be obtained by using *weighted* kernel density -estimation. 
- -\hline -**Aside: weighted kernel density estimation** Standard kernel -density estimation is a way of producing a non-parametric estimate of -the distribution of a continuous quantity given a sample. A *kernel* -function is selected (typically a Normal density), and one of these is -placed centred on each sample point. The sum of these functions produces -the kernel density estimate (after scaling - dividing by the number of -sample points). A *weighted* kernel density estimate simply includes -weights in the sum of the kernel functions. In both weighted and -unweighted forms of kernel density estimation, the key parameter -controlling the smoothness of the resulting density estimate is the -*bandwidth* (equivalent to the standard deviation if using a Normal -kernel function); larger values give smoother density estimates, and -smaller values give *noisier* densities. -\hline - -```{r fig = TRUE} -plot(density(theta_sim, weights = ww, bw = 0.01), - main = "Posterior density") -``` - -Note that the value of the bandwidth used (argument `bw`) has been -set manually to provide a sensible, smooth curve. - -Similarly, a sample from the posterior distribution can be obtained by -resampling the original values of theta with their corresponding weights. - -```{r} -post_theta_sim <- sample(theta_sim, prob = ww, replace = TRUE) -hist(post_theta_sim, freq = FALSE) -``` - - -## Metropolis-Hastings - -As stated above, the implementation of the M-H algorithm requires a -proposal distribution to obtain new values of the parameter $\theta$. -Usually, the proposal distribution is defined so that the proposed -movement depends on the current value. -However, in this case we will use a uniform -distribution between 0 and 1 as our proposal distribution. - -First of all, we will define the proposal distribution, prior -and likelihood of the model: - -```{r} -# Proposal distribution: sampling -rq <- function() { - runif(1) -} - -# Proposal distribution: log-density -logdq <- function(x) { - dunif(x, log = TRUE) -} - -# Prior distribution: Beta(1, 1), log-density -logprior <- function(theta) { - dbeta(theta, 1, 1, log = TRUE) -} - -# Log-Likelihood -loglik <- function(y, theta, N) { - res <- sum(dbinom(y, N, theta, log = TRUE)) -} -``` - -Note that all densities and the likelihood are computed on the log-scale. - -Next, an implementation of the M-H algorithm is as follows: - -```{r} -# Number of iterations -n.iter <- 40500 - -# Simulations of the parameter -theta <- rep(NA, n.iter) - -# Initial value -theta[1] <- 0.5 - -# Data -y <- mmdata$red -N <- mmdata$MMs - -for(i in 2:n.iter) { - new.theta <- rq() - - # Log-Acceptance probability - logacc.prob <- loglik(y, new.theta, N) + logprior(new.theta) + logdq(theta[i - 1]) - logacc.prob <- logacc.prob - loglik(y, theta[i - 1], N) - logprior(theta[i - 1]) - - logdq(new.theta) - logacc.prob <- min(0, logacc.prob) # Note that 0 = log(1) - - if(log(runif(1)) < logacc.prob) { - # Accept - theta[i] <- new.theta - } else { - # Reject - theta[i] <- theta[i - 1] - } -} -``` - -The simulations we have generated are not independent of one another; each -is dependent on the previous one. This has two consequences: the chain is -dependent on the initial, starting value of the parameter(s); and the sampling -chain itself will exhibit *autocorrelation*. - -For this reason, we will remove the first 500 iterations to reduce the -dependence of the sampling on the starting value; and we will keep only every -10^th^ simulation to reduce the autocorrelation in the sampled series. 
-The 500 iterations we discard are known as the **burn-in** sample, and -the process of keeping only every 10^th^ value is called **thinning**. - -After that, we will compute summary statistics -and display a density of the simulations. - -```{r} -# Remove burn-in -theta <- theta[-c(1:500)] - -# Thinning -theta <- theta[seq(1, length(theta), by = 10)] - -# Summary statistics -summary(theta) - -par(mfrow = c(1, 2)) -plot(theta, type = "l", main = "MCMC samples", ylab = expression(theta)) -plot(density(theta), main = "Posterior density", xlab = expression(theta)) -``` - -## Exercises - - -### Performance of the proposal distribution - -The proposal distribution plays a crucial role in IS and it should be -as close to the posterior as possible. As a way of measuring how good -a proposal distribution is, it is possible to compute the -*effective* sample size as follows: - -$$ -\text{ESS} = \frac{(\sum_{m=1}^M w(\theta^{(m)}))^2}{\sum_{m=1}^M w(\theta^{(m)})^2}. -$$ - -* Compute the effective sample size for the previous example. -How is this related to the number of IS samples (`n_simulations`)? - -
Solution

```{r}
# Effective sample size computed from the importance weights
ESS <- function(ww){
  (sum(ww)^2) / sum(ww^2)
}
ESS(ww)
n_simulations
```

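As a small aside, because the weights `ww` have already been rescaled to sum
to one, the numerator of the ESS is 1 and the expression simplifies; the
following line should match `ESS(ww)` above:

```{r}
# With normalised weights, the ESS reduces to 1 / sum(ww^2)
1 / sum(ww^2)
```
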
- - -* Use a different proposal distribution and check how sampling weights, -ESS and point estimates differ from those in the current example. For -example, a $\text{Be}(20,\, 10)$ will put more mass on high values of -$\theta$, unlike the actual posterior distribution. -What differences do you find with -the example presented here using a uniform proposal distribution? Why -do you think that these differences appear? - - -
Solution - - ```{r} - n_simulations <- 10000 - set.seed(12) - theta_sim <- rbeta(n_simulations, 20, 10) - loglik_binom <- sapply(theta_sim, function(theta) { - sum(dbinom(mmdata$red, mmdata$MMs, theta, log = TRUE)) - }) - log_ww <- loglik_binom + dbeta(theta_sim, 1, 1, log = TRUE) - dbeta(theta_sim, 20, 10, log = TRUE) - log_ww <- log_ww - max(log_ww) - ww <- exp(log_ww) - ww <- ww / sum(ww) - ``` - - ```{r fig = TRUE} - hist(ww, xlab = "Importance weights") - ``` - - ```{r} - (post_mean <- sum(theta_sim * ww)) - (post_var <- sum(theta_sim^2 * ww)- post_mean^2) - ``` - - ```{r fig = TRUE} - plot(density(theta_sim, weights = ww, bw = 0.025), main = "Posterior density") - ``` - - ```{r} - ESS(ww) - n_simulations - ``` - -
- - - -### Sampling from the actual posterior - -Use the posterior distribution (which for this particular case is known -in a closed form) as the proposal distribution. - -* What is the distribution of the importance weights now? - -
Solution - - ```{r} - n_simulations <- 10000 - set.seed(12) - theta_sim <- rbeta(n_simulations, sum(mmdata$red) + 1, sum(mmdata$MMs) -sum(mmdata$red) + 1) - loglik_binom <- sapply(theta_sim, function(theta) { - sum(dbinom(mmdata$red, mmdata$MMs, theta, log = TRUE)) - }) - log_ww <- loglik_binom + dbeta(theta_sim, 1, 1, log = TRUE) - dbeta(theta_sim, sum(mmdata$red) + 1, sum(mmdata$MMs) -sum(mmdata$red) + 1, log = TRUE) - log_ww <- log_ww - max(log_ww) - ww <- exp(log_ww) - ww <- ww / sum(ww) - ``` - - ```{r fig = TRUE} - hist(ww, xlab = "Importance weights") - ``` - - ```{r} - (post_mean <- sum(theta_sim * ww)) - (post_var <- sum(theta_sim^2 * ww)- post_mean^2) - ``` - - ```{r fig = TRUE} - plot(density(theta_sim, weights = ww, bw = 0.01), main = "Posterior density") - ``` - -
- - -* Compute the effective sample size. How large is it? Why do you think this happens? - -
Solution - - ```{r} - ESS(ww) - n_simulations - ``` - -
- - -### Non-conjugate prior - -IS and M-H are algorithms that can be used to make inference about -$\theta$ when the posterior density of the parameter is not available -in closed form. This is the case with models which have non-conjugate -priors. As an exercise, try to obtain the posterior density of the same -model with the following non-conjugate prior: - -$$ -\pi(\theta) \propto (1-\theta)^2,\quad \theta\in[0,1]. -$$ - -This prior has the following shape: - -```{r} -curve((1-x)^2, xlab = expression(theta), ylab = NA, yaxt = 'n', bty = 'n') -``` - -The interpretation of this prior is that smaller values are favoured over -higher values, i.e., our prior information is that the parameter is more -likely to have values close to 0 than values close to 1. - -Note that the prior is specified up to a normalising constant (which in -this case is not difficult to compute). However, this constant is not -needed to implement both the IS and M-H algorithms. In the case of IS, -the constant will cancel when the weights are re-scaled to sum to one -and in the case of the M-H algorithm the constant will cancel when the -acceptance ratio is computed. - -* Use the IS algorithm to compute and plot the weights using this -non-conjugate distribution, using them to estimate the posterior mean -and variance. Finally, show the posterior density estimate. - -
Solution - - ```{r} - n_simulations <- 10000 - set.seed(12) - theta_sim <- rbeta(n_simulations,20,10) - loglik_binom <- sapply(theta_sim, function(theta) { - sum(dbinom(mmdata$red, mmdata$MMs, theta, log = TRUE)) - }) - log_ww <- loglik_binom + 2*log(1-theta_sim) - dbeta(theta_sim, 20, 10, log = TRUE) - log_ww <- log_ww - max(log_ww) - ww <- exp(log_ww) - ww <- ww / sum(ww) - ``` - - ```{r fig = TRUE} - hist(ww, xlab = "Importance weights") - ``` - - ```{r} - (post_mean <- sum(theta_sim * ww)) - (post_var <- sum(theta_sim^2 * ww)- post_mean^2) - ``` - - ```{r fig = TRUE} - plot(density(theta_sim, weights = ww, bw = 0.025), main = "Posterior density") - ``` - -
- -* Use the M-H algorithm to produce the posterior density estimate. - -
Solution - - ```{r} - # Proposal distribution: sampling - rq <- function() { - runif(1) - } - - # Proposal distribution: log-density - logdq <- function(x) { - dunif(x, log = TRUE) - } - - # Prior distribution: (1-theta)^2, log-density - logprior <- function(theta) { - 2*log(1-theta) - } - - # Log-Likelihood - loglik <- function(y, theta, N) { - res <- sum(dbinom(y, N, theta, log = TRUE)) - } - ``` - - ```{r} - # Number of iterations - n.iter <- 40500 - - # Simulations of the parameter - theta <- rep(NA, n.iter) - - # Initial value - theta[1] <- 0.5 - - # Data - y <- mmdata$red - N <- mmdata$MMs - - for(i in 2:n.iter) { - new.theta <- rq() - - # Log-Acceptance probability - logacc.prob <- loglik(y, new.theta, N) + logprior(new.theta) + logdq(theta[i - 1]) - logacc.prob <- logacc.prob - loglik(y, theta[i - 1], N) - logprior(theta[i - 1]) - - logdq(new.theta) - logacc.prob <- min(0, logacc.prob) # Note that 0 = log(1) - - if(log(runif(1)) < logacc.prob) { - # Accept - theta[i] <- new.theta - } else { - # Reject - theta[i] <- theta[i - 1] - } - } - ``` - - ```{r} - # Remove burn-in - theta <- theta[-c(1:500)] - - # Thinning - theta <- theta[seq(1, length(theta), by = 10)] - - # Summary statistics - summary(theta) - ``` - - ```{r fig = TRUE} - par(mfrow = c(1, 2)) - plot(theta, type = "l", main = "MCMC samples", ylab = expression(theta)) - plot(density(theta), main = "Posterior density", xlab = expression(theta)) - ``` - -
- -# Example: Poisson-Gamma Model - -The second example will be based on the *Game of Thrones* data set. -Remember that this is made of the observed number of u's on -a page of a book of Game of Thrones. The model can be stated as: - -$$ -\begin{array}{rcl} -y_i \mid \lambda & \sim & \text{Po}(\lambda)\\ -\lambda & \sim & \text{Ga}(\alpha,\, \beta) -\end{array} -$$ - -In particular, the prior on $\lambda$ will be a Gamma distribution with -parameters $0.01$ and $0.01$, which is centred at 1 and has a small -precision (i.e., large variance). - -We will denote the observed values by `y` in the `R` code. The data collected can be loaded with: - -```{r eval = TRUE} -GoTdata <- data.frame(Us = c(25, 29, 27, 27, 25, 27, 22, 26, 27, 29, 23, - 28, 25, 24, 22, 25, 23, 29, 23, 28, 21, 29, - 28, 23, 28)) -y <- GoTdata$Us -``` - - -## Importance sampling - -Now the parameter of interest is not bounded, so the proposal -distribution needs to be chosen with care. We will use a log-Normal -distribution with mean 3 and standard deviation equal to 0.5 in the log -scale. This will ensure that all the sampled values are positive (because -$\lambda$ cannot take negative values) and that the sample values are -reasonable (i.e., they are not too small or too large). Note that this -proposal distribution has been chosen having in mind the problem at hand -and that it may not work well with other problems. - - -```{r} -n_simulations <- 10000 -set.seed(1) -lambda_sim <- rlnorm(n_simulations, 3, 0.5) -``` - -Next, importance weights are computed in two steps, as in the previous -example. Note that now the likelihood, prior and proposal distribution -are different from the ones in the Binomial example. - -```{r} -# Log-Likelihood (for each value of lambda_sim) -loglik_pois <- sapply(lambda_sim, function(LAMBDA) { - sum(dpois(GoTdata$Us, LAMBDA, log = TRUE)) -}) - -# Log-weights: log-lik + log-prior - log-proposal_distribution -log_ww <- loglik_pois + dgamma(lambda_sim, 0.01, 0.01, log = TRUE) - dlnorm(lambda_sim, 3, 0.5, log = TRUE) - -# Re-scale weights to sum up to one -log_ww <- log_ww - max(log_ww) -ww <- exp(log_ww) -ww <- ww / sum(ww) - -``` - - -The importance weights can be summarized using a histogram: - - -```{r fig = TRUE} -hist(ww, xlab = "Importance weights") -``` - -The posterior mean and variance can be computed as follows: - -```{r} -# Posterior mean -(post_mean <- sum(lambda_sim * ww)) - -# Posterior variance -(post_var <- sum(lambda_sim^2 * ww)- post_mean^2) -``` - - -Finally, an estimate of the posterior density of the parameter can be -obtained by using weighted kernel density estimation: - - -```{r fig = TRUE} -plot(density(lambda_sim, weights = ww, bw = 0.5) , main = "Posterior density", xlim = c(10,40)) -``` - -Note that the value of the bandwidth used (argument `bw`) has been set -manually to provide a realistically smooth density function. - -Similarly, a sample from the posterior distribution can be obtained by -resampling the original values of $\theta$ with their corresponding -weights. - -```{r} -post_lambda_sim <- sample(lambda_sim, prob = ww, replace = TRUE) -hist(post_lambda_sim, freq = FALSE) -``` - - - - - -## Metropolis-Hastings - -Similarly to the previous example, we will set the proposal -distribution, as the model has been fully defined above. -In this case, the proposal distribution is a log-Normal distribution -centred at the logarithm of the current value with precision $100$. 
- - -```{r} -# Proposal distribution: sampling -rq <- function(lambda) { - rlnorm(1, meanlog = log(lambda), sdlog = sqrt(1 / 100)) -} - -# Proposal distribution: log-density -logdq <- function(new.lambda, lambda) { - dlnorm(new.lambda, meanlog = log(lambda), sdlog = sqrt(1 / 100), log = TRUE) -} - -# Prior distribution: Ga(0.01, 0.01) -logprior <- function(lambda) { - dgamma(lambda, 0.01, 0.01, log = TRUE) -} - -# LogLikelihood -loglik <- function(y, lambda) { - res <- sum(dpois(y, lambda, log = TRUE)) -} -``` - - -With these definitions we can actually use the same implementation of the -M-H that we have used in the previous section. - -```{r} -# Number of iterations -n.iter <- 40500 - -# Simulations of the parameter -lambda <- rep(NA, n.iter) - -# Initial value -lambda[1] <- 30 - -for(i in 2:n.iter) { - new.lambda <- rq(lambda[i - 1]) - - # Log-Acceptance probability - logacc.prob <- loglik(y, new.lambda) + logprior(new.lambda) + - logdq(lambda[i - 1], new.lambda) - logacc.prob <- logacc.prob - loglik(y, lambda[i - 1]) - logprior(lambda[i - 1]) - - logdq(new.lambda, lambda[i - 1]) - logacc.prob <- min(0, logacc.prob)#0 = log(1) - - if(log(runif(1)) < logacc.prob) { - # Accept - lambda[i] <- new.lambda - } else { - # Reject - lambda[i] <- lambda[i - 1] - } -} -``` - - -The same burn-in and thinning as in the Beta-Binomial example will be -used. Furthermore, summary statistics and plots will be computed now: - - -```{r} -# Remove burn-in -lambda <- lambda[-c(1:500)] - -# Thinning -lambda <- lambda[seq(1, length(lambda), by = 10)] - -# Summary statistics -summary(lambda) - -par(mfrow = c(1, 2)) -plot(lambda, type = "l", main = "MCMC samples", ylab = expression(lambda)) -plot(density(lambda), main = "Posterior density", xlab = expression(lambda)) -``` - - -## Exercises - - -### Performance of the proposal distribution - -Similarly to the exercises in the example on Binomial data, you will -now assess the impact of the proposal distribution on the inference -process. - - -* Compute the effective sample size for the previous example. -How is this related to the number of IS samples (`n_simulations`)? - -
Solution - - ```{r} - ESS(ww) - n_simulations - ``` - -
* Use a different proposal distribution and check how sampling weights,
ESS and point estimates differ from those in the current example. For
example, a $\text{Ga}(5,\, 0.1)$ will put more mass on values
around 40, unlike the actual posterior distribution. What differences
do you find from the example presented here, which uses a log-Normal
proposal distribution? Why do you think these differences appear?

Solution - - ```{r} - n_simulations <- 10000 - set.seed(12) - lambda_sim <- rgamma(n_simulations,5,0.1) - loglik_pois <- sapply(lambda_sim, function(LAMBDA) { - sum(dpois(GoTdata$Us, LAMBDA, log = TRUE)) - }) - log_ww <- loglik_pois + dgamma(lambda_sim, 0.01, 0.01, log = TRUE) - dgamma(lambda_sim, 5, 0.1, log=TRUE) - log_ww <- log_ww - max(log_ww) - ww <- exp(log_ww) - ww <- ww / sum(ww) - ``` - - ```{r fig = TRUE} - hist(ww, xlab = "Importance weights") - ``` - - ```{r} - post_mean <- sum(lambda_sim * ww) - post_mean - post_var <- sum(lambda_sim^2 * ww)- post_mean^2 - post_var - ``` - - ```{r fig = TRUE} - plot(density(lambda_sim, weights = ww, bw = 0.5), main = "Posterior density", xlim = c(10,40)) - ``` - - ```{r} - ESS(ww) - n_simulations - ``` - -
- - -### Sampling from the actual posterior - -Use the posterior distribution (which for this particular case is known -in a closed form) as the proposal distribution. - -* What is the distribution of the importance weights now? - -
Solution - - ```{r} - n_simulations <- 10000 - set.seed(12) - lambda_sim <- rgamma(n_simulations, sum(GoTdata$Us) + 0.01, length(GoTdata$Us) + 0.01) - loglik_pois <- sapply(lambda_sim, function(LAMBDA) { - sum(dpois(GoTdata$Us, LAMBDA, log = TRUE)) - }) - log_ww <- loglik_pois + dgamma(lambda_sim, 0.01, 0.01, log = TRUE) - dgamma(lambda_sim, sum(GoTdata$Us) + 0.01, length(GoTdata$Us) + 0.01, log=TRUE) - log_ww <- log_ww - max(log_ww) - ww <- exp(log_ww) - ww <- ww / sum(ww) - ``` - - ```{r fig = TRUE} - hist(ww, xlab = "Importance weights") - ``` - - ```{r} - (post_mean <- sum(lambda_sim * ww)) - (post_var <- sum(lambda_sim^2 * ww)- post_mean^2) - ``` - - ```{r fig = TRUE} - plot(density(lambda_sim, weights = ww, bw = 0.5) , main = "Posterior density") - ``` - -
- - -* Compute the effective sample size. How large is it? Why do you think this happens? - -
Solution - - ```{r} - ESS(ww) - n_simulations - ``` - -
- - - -### Non-conjugate prior - -Just as in the Binomial example, non-conjugate priors can be used -for the $\lambda$ parameter. In this case, given that $\lambda$ is -positive, care must be taken when choosing the prior. For this exercise, -try to use a log-Normal prior with mean 4 and standard deviation 1. -This will provide a prior of $\lambda$ as seen in the next plot: - -```{r} -curve( - dlnorm(x, 4, 1), - xlim = c(0, 600), n = 1e3, - xlab = expression(lambda), ylab = NA, - yaxt = 'n', bty = 'n' -) -``` - -Hence, with this prior small values of the average number of u's are -given a higher prior density. - -Parameter estimates for a model with this prior can be obtained using IS and M-H algorithms. - -* Fit the model with the non-conjugate prior using the IS algorithm. - -
Solution - - ```{r} - n_simulations <- 10000 - set.seed(1) - lambda_sim <- rlnorm(n_simulations, 3, 0.5) - ``` - - ```{r} - # Log-Likelihood (for each value of lambda_sim) - loglik_pois <- sapply(lambda_sim, function(LAMBDA) { - sum(dpois(GoTdata$Us, LAMBDA, log = TRUE)) - }) - - # Log-weights: log-lik + log-prior - log-proposal_distribution - log_ww <- loglik_pois + dlnorm(lambda_sim, 4, 1, log = TRUE) - dlnorm(lambda_sim, 3, 0.5, log = TRUE) - - # Re-scale weights to sum up to one - log_ww <- log_ww - max(log_ww) - ww <- exp(log_ww) - ww <- ww / sum(ww) - ``` - - ```{r fig = TRUE} - hist(ww, xlab = "Importance weights") - ``` - - ```{r} - # Posterior mean - (post_mean <- sum(lambda_sim * ww)) - # Posterior variance - (post_var <- sum(lambda_sim^2 * ww)- post_mean^2) - ``` - - ```{r fig = TRUE} - plot(density(lambda_sim, weights = ww, bw = 0.5) , main = "Posterior density", xlim = c(10,40)) - ``` - - ```{r} - ESS <- function(ww){ - (sum(ww)^2)/sum(ww^2) - } - ESS(ww) - n_simulations - ``` - -
- -* Fit the model with the non-conjugate prior using the M-H algorithm. - -
Solution

```{r}
# Proposal distribution: sampling
rq <- function(lambda) {
  rlnorm(1, meanlog = log(lambda), sdlog = sqrt(1 / 100))
}

# Proposal distribution: log-density
logdq <- function(new.lambda, lambda) {
  dlnorm(new.lambda, meanlog = log(lambda), sdlog = sqrt(1 / 100), log = TRUE)
}

# Prior distribution: log-Normal(4, 1), log-density
logprior <- function(lambda) {
  dlnorm(lambda, 4, 1, log = TRUE)
}

# Log-Likelihood
loglik <- function(y, lambda) {
  res <- sum(dpois(y, lambda, log = TRUE))
}
```

```{r}
# Number of iterations
n.iter <- 40500

# Simulations of the parameter
lambda <- rep(NA, n.iter)

# Initial value
lambda[1] <- 30

for(i in 2:n.iter) {
  new.lambda <- rq(lambda[i - 1])

  # Log-Acceptance probability
  logacc.prob <- loglik(y, new.lambda) + logprior(new.lambda) +
    logdq(lambda[i - 1], new.lambda)
  logacc.prob <- logacc.prob - loglik(y, lambda[i - 1]) - logprior(lambda[i - 1]) -
    logdq(new.lambda, lambda[i - 1])
  logacc.prob <- min(0, logacc.prob) # Note that 0 = log(1)

  if(log(runif(1)) < logacc.prob) {
    # Accept
    lambda[i] <- new.lambda
  } else {
    # Reject
    lambda[i] <- lambda[i - 1]
  }
}
```

```{r}
# Remove burn-in
lambda <- lambda[-c(1:500)]

# Thinning
lambda <- lambda[seq(1, length(lambda), by = 10)]

# Summary statistics
summary(lambda)
```

```{r fig = TRUE}
par(mfrow = c(1, 2))
plot(lambda, type = "l", main = "MCMC samples", ylab = expression(lambda))
plot(density(lambda), main = "Posterior density", xlab = expression(lambda))
```

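As a quick, optional consistency check (assuming the `lambda` samples from the
M-H run above are available), the posterior mean and variance can be compared
with the importance sampling estimates obtained for the same non-conjugate
prior in the previous exercise:

```{r}
# Posterior mean and variance from the M-H samples; these should be
# broadly similar to the importance sampling estimates above
mean(lambda)
var(lambda)
```
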
- -# Gibbs Sampling - -As we have seen in the theory session, Gibbs Sampling is an MCMC method -which allows us to estimate one parameter at a time. This is very useful -for models which have lots of parameters, as in a sense it reduces a -very large multidimensional inference problem to a set of single -dimension problems. - -To recap, in order to generate a random sample from the joint density -$g\left(\mathbf{\theta}\right)= -g\left(\theta_1,\theta_2,\ldots,\theta_D\right)$ for a model with -$d$ parameters, we use the following algorithm: - -1. Start with an initial set -$\mathbf{\theta}^{(0)}= -\left(\theta_1^{(0)},\theta_2^{(0)},\ldots,\theta_D^{(0)}\right)$ -2. Generate $\theta_1^{(1)}$ from the conditional distribution -$\theta_1 \mid \left(\theta_2^{(0)},\ldots,\theta_D^{(0)}\right)$ -3. Generate $\theta_2^{(1)}$ from the conditional distribution -$\theta_2 \mid \left(\theta_1^{(1)},\theta_3^{(0)},\ldots,\theta_D^{(0)}\right)$ -4. $\quad\quad\cdots$ -5. Generate $\theta_D^{(1)}$ from the conditional distribution -$\theta_D \mid \left(\theta_1^{(1)},\theta_2^{(1)},\ldots,\theta_{D-1}^{(1)}\right)$ -6. Iterate from Step 2. - -As with Metropolis-Hastings, in Gibbs Sampling we typically discard the -initial simulations (the burn-in period), reducing the dependence on the -initial set of parameter values. - -As with the other MCMC algorithms, the resulting simulations approximate a -random sample from the posterior distribution. - -# Example: Simple Linear Regression - -We will illustrate the use of Gibbs Sampling on a simple linear -regression model. Recall that we saw yesterday that we can obtain an -analytical solution for a Bayesian linear regression, but that more -complex models require a simulation approach. - -The approach we take here for Gibbs Sampling on a simple linear -regression model is very easily generalised to more complex models, -the key is that whatever your model looks like, you update one -parameter at a time, using the conditional distribution of that -parameter given the current values of all the other parameters. - -The simple linear regression model we will analyse here is a reduced -version of the general linear regression model we saw yesterday: -$$ -Y_i = \beta_0+\beta_1x_i+\epsilon_i -$$ -for response variable $\mathbf{Y}=\left(Y_1,Y_2,\ldots,Y_n\right)$, -explanatory variable $\mathbf{x}=\left(x_1,x_2,\ldots,x_n\right)$ and -residual vector $\mathbf{\epsilon}= -\left(\epsilon_1,\epsilon_2,\ldots,\epsilon_n\right)$ for a sample of -size $n$, where $\beta_0$ is the regression intercept, -$\beta_1$ is the regression slope, and where the -$\epsilon_i$ are independent with -$\epsilon_i\sim \textrm{N}\left(0,\sigma^2\right)$ $\forall i=1,2,\ldots,n$. For -convenience, we refer to the combined set of $\mathbf{Y}$ and -$\mathbf{x}$ data as $\mathcal{D}$. We also define -$\hat{\mathbf{y}}$ to be the fitted response vector (i.e., -$\hat{y}_i = \beta_0 + \beta_1 x_i$ from the regression equation) using the -current values of the parameters $\beta_0$, $\beta_1$ and precision $\tau = 1$ -(remember that $\tau=\frac{1}{\sigma^2}$) from the Gibbs Sampling simulations. - -For Bayesian inference, it is simpler to work with precisions $\tau$ -rather than with variances $\sigma^2$. 
Given priors -$$ -\begin{align*} -\pi(\tau) &= \textrm{Ga}\left(\alpha, \beta\right), \\ -\pi(\beta_0) &= \textrm{N}\left(\mu_{\beta_0}, \tau_{\beta_0}\right), \quad\textrm{and} \\ -\pi(\beta_1) &= \textrm{N}\left(\mu_{\beta_1}, \tau_{\beta_1}\right) -\end{align*} -$$ -then by combining with the likelihood we can derive the full conditional distributions for each parameter. For $\tau$, we have -$$ -\begin{align*} -\pi\left(\tau|\mathcal{D}, \beta_0, \beta_1\right) &\propto -\pi(\tau)\prod_{i=1}^nL\left(y_i|\hat{y_i}\right), \\ -&= \tau^{\alpha-1}e^{-\beta\tau} -\tau^{\frac{n}{2}}\exp\left\{- -\frac{\tau}{2}\sum_{i=1}^n(y_i-\hat{y_i})^2\right\}, \\ -&= \tau^{\alpha-1 + \frac{n}{2}}\exp\left\{-\beta\tau- -\frac{\tau}{2}\sum_{i=1}^n(y_i-\hat{y_i})^2\right\},\\ -\textrm{so} \quad \tau|\mathcal{D}, \beta_0, \beta_1 &\sim -\textrm{Ga}\left(\alpha+\frac{n}{2}, -\beta +\frac{1}{2}\sum_{i=1}^n(y_i-\hat{y_i})^2\right). -\end{align*} -$$ -For $\beta_0$ we will need to expand $\hat{y}_i=\beta_0+\beta_1x_i$, -and then we have -$$ -\begin{align*} -p(\beta_0|\mathcal{D}, \beta_1, \tau) &\propto -\pi(\beta_0)\prod_{i=1}^nL\left(y_i|\hat{y_i}\right), \\ -&= \tau_{\beta_0}^{\frac{1}{2}}\exp\left\{- -\frac{\tau_{\beta_0}}{2}(\beta_0-\mu_{\beta_0})^2\right\}\tau^\frac{n}{2} -\exp\left\{-\frac{\tau}{2}\sum_{i=1}^n(y_i-\hat{y_i})^2\right\}, \\ -&\propto \exp\left\{-\frac{\tau_{\beta_0}}{2}(\beta_0-\mu_{\beta_0})^2- -\frac{\tau}{2}\sum_{i=1}^n(y_i-\beta_0-x_i \beta_1)^2\right\}, \\ -&= \exp \left\{ -\frac{1}{2}\left(\beta_0^2(\tau_{\beta_0} + n\tau) + -\beta_0\left(-2\tau_{\beta_0}\mu_{\beta_0} - 2\tau\sum_{i=1}^n -(y_i - x_i\beta_1)\right) \right) + C \right\}, \\ -\textrm{so} \quad \beta_0|\mathcal{D}, \beta_1, \tau &\sim -\mathcal{N}\left((\tau_{\beta_0} + n\tau)^{-1} -\left(\tau_{\beta_0}\mu_{\beta_0} + \tau\sum_{i=1}^n (y_i - x_i\beta_1)\right), -\tau_{\beta_0} + n\tau\right). -\end{align*} -$$ - -In the above, $C$ represents a quantity that is constant with respect to $\beta_0$ and hence can be ignored. Finally, for $\beta_1$ we have -$$ -\begin{align*} -p(\beta_1|\mathcal{D}, \beta_0, \tau) &\propto -\pi(\beta_1)\prod_{i=1}^nL\left(y_i|\hat{y_i}\right), \\ -&= \tau_{\beta_1}^{\frac{1}{2}}\exp\left\{- -\frac{\tau_{\beta_1}}{2}(\beta_1-\mu_{\beta_1})^2\right\}\tau^\frac{n}{2} -\exp\left\{-\frac{\tau}{2}\sum_{i=1}^n(y_i-\hat{y_i})^2\right\}, \\ -&\propto \exp\left\{-\frac{\tau_{\beta_1}}{2}(\beta_1-\mu_{\beta_1})^2- -\frac{\tau}{2}\sum_{i=1}^n(y_i-\beta_0-x_i \beta_1)^2\right\}, \\ -&= \exp \left\{ -\frac{1}{2}\left(\beta_1^2\left(\tau_{\beta_1} + -\tau\sum_{i=1}^n x_i^2\right) + -\beta_1\left(-2\tau_{\beta_1}\mu_{\beta_1} - 2\tau\sum_{i=1}^n -(y_i - \beta_0)x_i\right) \right) + C \right\}, \\ -\textrm{so} \quad \beta_1|\mathcal{D}, \beta_0, \tau -&\sim \textrm{N}\left((\tau_{\beta_1} + \tau\sum_{i=1}^n x_i^2)^{-1} -\left(\tau_{\beta_1}\mu_{\beta_1} + \tau\sum_{i=1}^n (y_i - \beta_0)x_i\right), -\tau_{\beta_1} + \tau\sum_{i=1}^n x_i^2\right). -\end{align*} -$$ -This gives us an easy way to run Gibbs Sampling for linear regression; we have -standard distributions as full conditionals and there are standard -functions in R to obtain the simulations. 
We shall do this now for an -example in ecology, looking at the relationship between water pollution and -mayfly size - the data come from the book *Statistics for Ecologists Using R and Excel 2nd edition* by Mark Gardener (ISBN 9781784271398), see [the publisher's webpage](https://pelagicpublishing.com/products/statistics-for-ecologists-using-r-and-excel-gardener-2nd-edition). - -The data are as follows: - -- `length` - the length of a mayfly in mm; - -- `BOD` - biological oxygen demand in mg of oxygen per litre, effectively a -measure of organic pollution (since more organic pollution requires more oxygen -to break it down). - -The data can be read into R: -```{r} -# Read in data -BOD <- c(200,180,135,120,110,120,95,168,180,195,158,145,140,145,165,187, - 190,157,90,235,200,55,87,97,95) -mayfly.length <- c(20,21,22,23,21,20,19,16,15,14,21,21,21,20,19,18,17,19,21,13, - 16,25,24,23,22) -# Create data frame for the Gibbs Sampling, needs x and y -Data <- data.frame(x=BOD,y=mayfly.length) -``` - -We provide here some simple functions for running the analysis by Gibbs -Sampling, using the full conditionals derived above. The functions assume -you have a data frame with two columns, the response `y` and the covariate -`x`. - -```{r} -# Function to update tau, the precision -sample_tau <- function(data, beta0, beta1, alphatau, betatau) { - rgamma(1, - shape = alphatau+ nrow(data) / 2, - rate = betatau+ 0.5 * sum((data$y - (beta0 + data$x*beta1))^2) - ) -} - -# Function to update beta0, the regression intercept -sample_beta0 <- function(data, beta1, tau, mu0, tau0) { - prec <- tau0 + tau * nrow(data) - mean <- (tau0*mu0 + tau * sum(data$y - data$x*beta1)) / prec - rnorm(1, mean = mean, sd = 1 / sqrt(prec)) -} - -# Function to update beta1, the regression slope -sample_beta1 <- function(data, beta0, tau, mu1, tau1) { - prec <- tau1 + tau * sum(data$x * data$x) - mean <- (tau1*mu1 + tau * sum((data$y - beta0) * data$x)) / prec - rnorm(1, mean = mean, sd = 1 / sqrt(prec)) -} - -# Function to run the Gibbs Sampling, where you specify the number of -# simulations `m`, the starting values for each of the three regression -# parameters (`tau_start` etc), and the parameters for the prior -# distributions of tau, beta0 and beta1 -gibbs_sample <- function(data, - tau_start, - beta0_start, - beta1_start, - m, - alpha_tau, - beta_tau, - mu_beta0, - tau_beta0, - mu_beta1, - tau_beta1) { - tau <- numeric(m) - beta0 <- numeric(m) - beta1 <- numeric(m) - tau[1] <- tau_start - beta0[1] <- beta0_start - beta1[1] <- beta1_start - - for (i in 2:m) { - tau[i] <- - sample_tau(data, beta0[i-1], beta1[i-1], alpha_tau, beta_tau) - beta0[i] <- - sample_beta0(data, beta1[i-1], tau[i], mu_beta0, tau_beta0) - beta1[i] <- - sample_beta1(data, beta0[i], tau[i], mu_beta1, tau_beta1) - } - - Iteration <- 1:m - data.frame(Iteration,beta0,beta1,tau) -} -``` - -## Exercises - -We will use Gibbs Sampling to fit a Bayesian Linear Regression model to the -mayfly data. We will use the following prior distributions for the regression -parameters: -$$ -\begin{align*} -\pi(\tau) &= \textrm{Ga}\left(1, 1\right), \\ -\pi(\beta_0) &= \textrm{N}\left(0, 0.0001\right), \quad\textrm{and} \\ -\pi(\beta_1) &= \textrm{N}\left(0, 0.0001\right). -\end{align*} -$$ -We will set the initial values of all parameters to 1, i.e. $\beta_0^{(0)}=\beta_1^{(0)}=\tau^{(0)}=1$. - -### Data exploration - -- Investigate the data to see whether a linear regression model would be -sensible. [Hint: a scatterplot and a correlation coefficient could be helpful.] - -
Solution - - ```{r fig = TRUE} - # Scatterplot - plot(BOD,mayfly.length) - # Correlation with hypothesis test - cor.test(BOD,mayfly.length) - ``` - -
- -- Run a frequentist simple linear regression using the `lm` function in R; this -will be useful for comparison with our Bayesian analysis. - -
Solution - - ```{r} - # Linear Regression using lm() - linreg <- lm(mayfly.length ~ BOD, data = Data) - summary(linreg) - ``` - -
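To help with the comparison requested later, the frequentist 95% confidence
intervals can also be extracted; this is an optional addition to the solution
above, using the `linreg` fit just obtained:

```{r}
# Frequentist 95% confidence intervals for the regression coefficients,
# to compare with the Bayesian credible intervals obtained later
confint(linreg)
```
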
- -### Running the Gibbs Sampler - -- Use the code above to fit a Bayesian simple linear regression using `length` -as the response variable. Ensure you have a burn-in period so that the initial -simulations are discarded. Use the output from `gibbs_sample` to estimate the -regression parameters (with uncertainty measures). Also plot the estimates of -the posterior distributions. - -
Solution

```{r fig = TRUE}
# Bayesian Linear Regression using a Gibbs Sampler

# Set the number of iterations of the Gibbs Sampler - including how many to
# discard as the burn-in
m.burnin <- 500
m.keep <- 1000
m <- m.burnin + m.keep

# Obtain the samples
results1 <- gibbs_sample(Data, 1, 1, 1, m, 1, 1, 0, 0.0001, 0, 0.0001)
# Remove the burn-in
results1 <- results1[-(1:m.burnin), ]
# Add variance and standard deviation columns
results1$Variance <- 1 / results1$tau
results1$StdDev <- sqrt(results1$Variance)
# Estimates are the column means
apply(results1[, -1], 2, mean)
# Also look at uncertainty
apply(results1[, -1], 2, sd)
apply(results1[, -1], 2, quantile, probs = c(0.025, 0.975))

# Kernel density plots for beta0, beta1 and the residual standard deviation
plot(density(results1$beta0, bw = 0.5),
     main = "Posterior density for beta0")
plot(density(results1$beta1, bw = 0.25),
     main = "Posterior density for beta1")
plot(density(results1$StdDev, bw = 0.075),
     main = "Posterior density for StdDev")
```

- Use the function `ts.plot()` to view the sampled chains from the Gibbs
sampler and assess the autocorrelation visually. Is there any visual evidence
of autocorrelation, or do the samples look independent? (An optional `acf()`
check is sketched after the solution below.)

Solution - - ```{r fig = TRUE} - # Plot sampled series - ts.plot(results1$beta0,ylab="beta0",xlab="Iteration") - ts.plot(results1$beta1,ylab="beta1",xlab="Iteration") - ts.plot(results1$StdDev,ylab="sigma",xlab="Iteration") - ``` - -
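The visual impression from the trace plots can be backed up with
autocorrelation plots. This is an optional addition (assuming the `results1`
data frame from the solutions above) using the base-R `acf()` function:

```{r fig = TRUE}
# Autocorrelation functions of the sampled chains; bars well above zero
# at low lags indicate strong autocorrelation
par(mfrow = c(1, 2))
acf(results1$beta0, main = "beta0")
acf(results1$beta1, main = "beta1")
```
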
- How do the results compare with the frequentist output?

### Reducing the autocorrelation by mean-centring the covariate

- One method for reducing the autocorrelation in the sampling chains of the
regression parameters is to mean-centre the covariate(s); this works because
it reduces the dependence between the regression intercept and slope(s). Do
this for the current example, noting that you will need to correct the
estimate of the regression intercept afterwards. (A comparison of the chain
autocorrelations before and after mean-centring is sketched after the
solution below.)

Solution - - ```{r fig = TRUE} - # Mean-centre the x covariate - DataC <- Data - meanx <- mean(DataC$x) - DataC$x <- DataC$x - meanx - - # Bayesian Linear Regression using a Gibbs Sampler - - # Set the number of iterations of the Gibbs Sampler - including how many to - # discard as the burn-in - m.burnin <- 500 - m.keep <- 1000 - m <- m.burnin + m.keep - - # Obtain the samples - results2 <- gibbs_sample(DataC, 1, 1, 1, m, 1, 1, 0, 0.0001, 0, 0.0001) - # Remove the burn-in - results2 <- results2[-(1:m.burnin), ] - - # Correct the effect of the mean-centering on the intercept - results2$beta0 <- results2$beta0 - meanx * results2$beta1 - - # Add variance and standard deviation columns - results2$Variance <- 1 / results2$tau - results2$StdDev <- sqrt(results2$Variance) - # Estimates are the column means - apply(results2[, -1], 2, mean) - # Also look at uncertainty - apply(results2[, -1], 2, sd) - apply(results2[, -1], 2, quantile, probs = c(0.025, 0.975)) - - # Kernel density plots for beta0, beta1 and the residual standard - # deviation - plot(density(results2$beta0, bw = 0.5), - main = "Posterior density for beta0") - plot(density(results2$beta1, bw = 0.25), - main = "Posterior density for beta1") - plot(density(results2$StdDev, bw = 0.075), - main = "Posterior density for StdDev") - - # Plot sampled series - ts.plot(results2$beta0, ylab = "beta0", xlab = "Iteration") - ts.plot(results2$beta1, ylab = "beta1", xlab = "Iteration") - ts.plot(results2$StdDev, ylab = "sigma", xlab = "Iteration") - ``` - -
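As a final optional check (assuming the `results2` data frame from the
solution above), the autocorrelation functions of the new chains can be
compared with those shown after the previous exercise; the reduction in
autocorrelation after mean-centring should be clear:

```{r fig = TRUE}
# Autocorrelation of the chains based on the mean-centred covariate;
# compare with the acf plots for the original parameterisation above
par(mfrow = c(1, 2))
acf(results2$beta0, main = "beta0 (mean-centred x)")
acf(results2$beta1, main = "beta1 (mean-centred x)")
```
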
- diff --git a/vignettes/p9.Rmd b/vignettes/p9.Rmd deleted file mode 100644 index b454afb..0000000 --- a/vignettes/p9.Rmd +++ /dev/null @@ -1,464 +0,0 @@ ---- -title: 'Practical 6: Software and GLMs' -author: "VIBASS" -output: - html_vignette: - fig_caption: yes - number_sections: yes - toc: yes - fig_width: 6 - fig_height: 4 -vignette: > - %\VignetteIndexEntry{Practical 6: Software and GLMs} - %\VignetteEngine{knitr::rmarkdown} - %\VignetteEncoding{UTF-8} ---- - -```{r setup, include=FALSE} -knitr::opts_chunk$set( - echo = TRUE, - collapse = TRUE, - comment = "#>" -) -``` - -# Software for Bayesian Statistical Analysis - -So far, simple Bayesian models with conjugate priors have been -considered. As explained in previous practicals, when the posterior -distribution is not available in closed form, MCMC algorithms such as the -Metropolis-Hastings or -Gibbs Sampling can be used to obtain samples from it. - -In general, posterior distributions are seldom available in closed form and -implementing MCMC algorithms for complex models can be technically difficult -and very time-consuming. - -For this reason, in this Practical we start by looking at a number of `R` -packages to fit Bayesian statistical models. These packages will equip us -with tools which can be used to deal with more complex models efficiently, -without us having to do a lot of extra coding ourselves. Fitting Bayesian -models in `R` will then be much like fitting non-Bayesian models, using -model-fitting functions at the command line, and using standard syntax for -model specification. - -## BayesX and INLA - -In particular, the following two software packages will be considered: - -* `BayesX` - -* `INLA` - -`BayesX` (http://www.bayesx.org/) implements MCMC methods to obtain samples -from the joint posterior and is conveniently accessed from R via the package -`R2BayesX`. - -`INLA` (https://www.r-inla.org/) is based on producing (accurate) -approximations to the marginal posterior distributions of the model parameters. -Although this can be enough most of the time, making multivariate inference -with `INLA` can be difficult or impossible. However, in many cases this is -not needed and `INLA` can fit some classes of models in a fraction of the -time it takes with MCMC. - -Both `R2BayesX` and `INLA` have a very simple interface to define models -using a `formula` (in the same way as with `glm()` and `gam()` functions). - -While `R2BayesX` can be installed from CRAN, `INLA` is not on CRAN and -needs to be installed from a specific repository. - -## Other Bayesian Software - -* Package `MCMCpack` in R contains functions such as `MCMClogit()`, `MCMCPoisson()` and -`MCMCprobit()` for fitting specific kinds of models. -* A classic MCMC program is `BUGS`, (Bayesian Analysis using Gibbs Sampling) -described in Lunn et al. (2000): -[http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml](http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml). -BUGS can be used through graphical interfaces `WinBUGS` and `OpenBUGS`. Both of -these packages can be called from within R using packages `R2WinBUGS` and -`R2OpenBUGS`. -* `JAGS`, which stands for “just another Gibbs sampler”. Can also be called from -R using package `r2jags`. -* The `NIMBLE` package extends `BUGS` and implements MCMC and other methods -for Bayesian inference. You can get it from https://r-nimble.org, and is best -run directly from R. -* The `Stan` software implements Hamiltonian Monte Carlo and other methods for -fit hierarchical Bayesian models. 
It is available from https://mc-stan.org.
-
-# Bayesian Logistic Regression
-
-## Model Formulation
-
-To summarise the model formulation presented in the lecture, given a response
-variable $Y_i$ representing the count of a number of successes from a given
-number of trials $n_i$ with success probability $\theta_i$, we have
-
-* $(Y_i \mid \boldsymbol \theta_i) \sim\mbox{Bi}(n_i, \theta_i),\,\, i.i.d.\,\,, i=1, \ldots, m$
-\begin{align*}
-  \mbox{logit}(\theta_i) & =\eta_i \nonumber\\
-\eta_{i} & =\beta_0+\beta_1 x_{i1}+\ldots+\beta_p x_{ip}=\boldsymbol x_i\boldsymbol \beta\nonumber
-\end{align*}
-assuming the logit link function and with linear predictor $\eta_{i}$.
-
-## Example: Fake News
-
-The `fake_news` data set in the `bayesrules` package in `R` contains
-information about 150 news articles, some real news and some fake news.
-
-In this example, we will look at trying to predict whether a news article
-is fake or not given three explanatory variables.
-
-We can use the following code to extract the variables we want from the
-data set:
-
-```{r}
-library(bayesrules)
-fakenews <- fake_news[,c("type","title_has_excl","title_words","negative")]
-```
-
-The response variable `type` takes values `fake` or `real`, which should be
-self-explanatory. The three explanatory variables are:
-
-* `title_has_excl`, whether or not the article title contains an exclamation mark (values `TRUE` or `FALSE`);
-
-* `title_words`, the number of words in the title (a positive integer); and
-
-* `negative`, a sentiment rating, recorded on a continuous scale.
-
-In the exercises to follow, we will examine whether the chance of an article
-being fake news is related to these three covariates.
-
-## Fitting Bayesian Logistic Regression Models
-
-`BayesX` makes inference via MCMC; as noted above, it is accessed from R via
-the `R2BayesX` package, whose syntax for model fitting is very similar to that
-for fitting non-Bayesian models using `glm()`. If you do not yet have it
-installed, you can install it in the usual way from CRAN.
-
-The package must be loaded into R:
-```{r}
-library(R2BayesX)
-```
-
-The syntax for fitting a Bayesian Logistic Regression Model with one response
-variable and three explanatory variables will be like so:
-
-```{r eval=FALSE}
-model1 <- bayesx(formula = y ~ x1 + x2 + x3,
-                 data = data.set,
-                 family = "binomial")
-```
-
-Alternatively, we can obtain a Bayesian model fit without using MCMC with the
-INLA software, implemented in R via the `INLA` package. If you do not have this
-package installed already, note that it is not on CRAN; you will need to
-install it via
-
-```{r eval=FALSE}
-install.packages("INLA",repos=c(getOption("repos"),INLA="https://inla.r-inla-download.org/R/stable"), dep=TRUE)
-```
-
-After this, the package can be loaded into R:
-```{r}
-library(INLA)
-```
-
-
-## Model Fitting
-
-Note that the variable `title_has_excl` will need to be either replaced by or
-converted to a factor, for example
-
-```{r}
-fakenews$titlehasexcl <- as.factor(fakenews$title_has_excl)
-```
-
-Functions `summary` and `confint` produce a summary (including parameter
-estimates etc.) and credible intervals for the parameters, respectively.
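-
-For example, for the illustrative model `model1` defined above (a sketch, not
-run here), the calls would simply be:
-
-```{r eval=FALSE}
-# Posterior summary of the fitted model, and credible intervals for the
-# regression coefficients
-summary(model1)
-confint(model1)
-```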
- -In order to be able to obtain output plots from BayesX, it seems that we need -to create a new version of the response variable of type logical: - -```{r} -fakenews$typeFAKE <- fakenews$type == "fake" -``` - -## Exercises - -* Perform an exploratory assessment of the fake news data set, in particular -looking at the possible relationships between the explanatory variables -and the fake/real response variable `typeFAKE`. You may wish to use the R -function `boxplot()` here. - -
Solution - - ```{r fig = TRUE} - # Is there a link between the fakeness and whether the title has an exclamation mark? - table(fakenews$title_has_excl, fakenews$typeFAKE) - # For the quantitative variables, look at boxplots on fake vs real - boxplot(fakenews$title_words ~ fakenews$typeFAKE) - boxplot(fakenews$negative ~ fakenews$typeFAKE) - - ``` - -
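-
-If helpful, the counts in the table above can also be expressed as proportions
-within each level of `title_has_excl` (an optional extension to the solution):
-
-    ```{r eval = FALSE}
-    # Proportion of real and fake articles among titles with and without an
-    # exclamation mark (each row sums to 1)
-    prop.table(table(fakenews$title_has_excl, fakenews$typeFAKE), margin = 1)
-    ```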
- -* Fit a Bayesian model in BayesX using the fake news `typeFAKE` variable as -response and the others as covariates. Examine the output; does the model fit -well, and is there any evidence that any of the explanatory variables are -associated with changes in probability of an article being fake or not? - -
Solution - - ```{r fig = TRUE} - # Produce the BayesX output - bayesx.output <- bayesx(formula = typeFAKE ~ titlehasexcl + title_words + negative, - data = fakenews, - family = "binomial", - method = "MCMC", - iter = 15000, - burnin = 5000) - summary(bayesx.output) - confint(bayesx.output) - ``` - -
- -* Produce plots of the MCMC sample traces and the estimated posterior -distributions for the model parameters. Does it seem like convergence has been -achieved? - -
Solution
-
-    ```{r fig = TRUE, fig.width = 5, fig.height = 10}
-    # Traces can be obtained separately
-    plot(bayesx.output, which = "coef-samples")
-    ```
-
-    ```{r fig = TRUE}
-    # And the density plots one-by-one
-    par(mfrow = c(2, 2))
-    plot(density(samples(bayesx.output)[, "titlehasexclTRUE"]), main = "Title Has Excl")
-    plot(density(samples(bayesx.output)[, "title_words"]), main = "Title Words")
-    plot(density(samples(bayesx.output)[, "negative"]), main = "Negative")
-    ```
-
- -* Fit the Bayesian model without MCMC using `INLA`; note that the summary output -provides credible intervals for each parameter to help us make inference. -Also, in INLA a Binomial response needs to be entered as type integer, so we -need another conversion: - ```{r} - fakenews$typeFAKE.int <- as.integer(fakenews$typeFAKE) - ``` - -
Solution - - ```{r fig = TRUE} - # Fit model - note similarity with bayesx syntax - inla.output <- inla(formula = typeFAKE.int ~ titlehasexcl + title_words + negative, - data = fakenews, - family = "binomial") - # Summarise output - summary(inla.output) - ``` -
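-
-Because the estimates are on the logit scale, it can also help to look at them
-on the odds-ratio scale. This is an optional sketch; it assumes the usual
-column names of the `summary.fixed` element of the `inla` output:
-
-    ```{r eval = FALSE}
-    # Posterior means and 95% credible interval limits, exponentiated to the
-    # odds-ratio scale
-    exp(inla.output$summary.fixed[, c("mean", "0.025quant", "0.975quant")])
-    ```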
- -* Fit a non-Bayesian model using `glm()` for comparison. How do the model fits -compare? -
Solution
-
-    ```{r fig = TRUE}
-    # Fit model - note similarity with bayesx syntax
-    glm.output <- glm(formula = typeFAKE ~ titlehasexcl + title_words + negative,
-                      data = fakenews,
-                      family = "binomial")
-    # Summarise output
-    summary(glm.output)
-    # Perform ANOVA on each variable in turn
-    drop1(glm.output, test = "Chisq")
-    ```
-
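-
-As an optional sketch (assuming the `inla.output` object from the previous
-exercise is still available), the maximum-likelihood estimates can be lined up
-against the posterior means:
-
-    ```{r eval = FALSE}
-    # MLEs from glm() next to posterior means from INLA; with INLA's vague
-    # default priors the two sets of estimates should be very similar
-    round(cbind(glm = coef(glm.output),
-                INLA = inla.output$summary.fixed[, "mean"]), 3)
-    ```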
- -# Bayesian Poisson Regression - -## Model Formulation - -To summarise the model formulation presented in the lecture, given a response -variable $Y_i$ representing the counts occurring from a process with mean -parameter $\lambda_i$: - -* $(Y_i \mid \boldsymbol \lambda_i) \sim\mbox{Po}(\lambda_i),\,\, i.i.d.\,\,, i=1, \ldots, n$ -\begin{align*} - \mbox{log}(\lambda_i) & =\eta_i \nonumber\\ -\eta_{i} & =\beta_0+\beta_1 x_{i1}+\ldots+\beta_p x_{ip}=\boldsymbol x_i\boldsymbol \beta\nonumber -\end{align*} -assuming the log link function and with linear predictor $\eta_{i}$. - -## Example: Emergency Room Complaints - -For this example we will use the `esdcomp` data set, which is available in the -`faraway` package. This data set records complaints about emergency room -doctors. In particular, data was recorded on 44 doctors working in an -emergency service at a hospital to study the factors affecting the number of -complaints received. - -The response variable that we will use is `complaints`, an integer count of the -number of complaints received. It is expected that the number of complaints will -scale by the number of visits (contained in the `visits` column), so we are -modelling the rate of complaints per visit - thus we will need to include a new -variable `log.visits` as an offset. - -The three explanatory variables we will use in the analysis are: - -* `residency`, whether or not the doctor is still in residency training (values -`N` or `Y`); - -* `gender`, the gender of the doctor (values `F` or `M`); and - -* `revenue`, dollars per hour earned by the doctor, recorded as an integer. - -Our simple aim here is to assess whether the seniority, gender or income of the -doctor is linked with the rate of complaints against that doctor. - -We can use the following code to extract the data we want without having to load -the whole package: - -```{r} -esdcomp <- faraway::esdcomp -``` - -## Fitting Bayesian Poisson Regression Models - -Again we can use `BayesX` and `INLA` to fit this form of Bayesian generalised -linear model. - -If not loaded already, the packages must be loaded into R: -```{r echo=FALSE} -library(R2BayesX) -library(INLA) -``` - -In BayesX, the syntax for fitting a Bayesian Poisson Regression Model with one -response variable, three explanatory variables and an offset will be like so: - -```{r eval=FALSE} -model1 <- bayesx(formula = y~x1+x2+x3+offset(w), - data = data.set, - family="poisson") -``` - -As noted above we need to include an offset in this analysis; since -for a Poisson GLM we will be using a log() link function by default, we must -compute the log of the number of visits and include that in the data set -`esdcomp`: - -```{r} -esdcomp$log.visits <- log(esdcomp$visits) -``` - -The offset term in the model is then written - -`offset(log.visits)` - -in the call to `bayesx`. - -## Exercises - -* Perform an exploratory assessment of the emergency room complaints data set, -particularly how the response variable `complaints` varies with the proposed -explanatory variables relative to the number of visits. To do this, create -another variable which is the ratio of `complaints` to `visits`. - -
Solution
-
-    ```{r fig = TRUE}
-    # Compute the ratio
-    esdcomp$ratio <- esdcomp$complaints / esdcomp$visits
-    # Plot the link with revenue
-    plot(esdcomp$revenue, esdcomp$ratio)
-    # Use boxplots against residency and gender
-    boxplot(esdcomp$ratio ~ esdcomp$residency)
-    boxplot(esdcomp$ratio ~ esdcomp$gender)
-    ```
-
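-
-The group-wise complaint rates can also be summarised numerically (an optional
-extension to the plots above):
-
-    ```{r eval = FALSE}
-    # Mean complaints-per-visit ratio by residency status and gender
-    aggregate(ratio ~ residency + gender, data = esdcomp, FUN = mean)
-    ```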
- -* Fit a Bayesian model in BayesX using the `complaints` variable as Poisson -response and the others as covariates. Examine the output; does the model fit -well, and is there any evidence that any of the explanatory variables are -associated with the rate of complaints? - -
Solution
-
-    ```{r fig = TRUE}
-    # Produce the BayesX output
-    esdcomp$log.visits <- log(esdcomp$visits)
-    bayesx.output <- bayesx(formula = complaints ~ offset(log.visits) + residency + gender + revenue,
-                            data = esdcomp,
-                            family = "poisson")
-    # Summarise output
-    summary(bayesx.output)
-    ```
-
- - -* Produce plots of the MCMC sample traces and the estimated posterior -distributions for the model parameters. Does it seem like convergence has been -achieved? - -
Solution
-
-    ```{r fig = TRUE, fig.width = 5, fig.height = 10}
-    # An overall plot of sample traces and density estimates
-    # plot(samples(bayesx.output))
-    # Traces can be obtained separately
-    plot(bayesx.output, which = "coef-samples")
-    ```
-
-    ```{r fig = TRUE}
-    # And the density plots one-by-one
-    par(mfrow = c(2, 2))
-    plot(density(samples(bayesx.output)[, "residencyY"]), main = "Residency")
-    plot(density(samples(bayesx.output)[, "genderM"]), main = "Gender")
-    plot(density(samples(bayesx.output)[, "revenue"]), main = "Revenue")
-    ```
-
- -* Fit the Bayesian model without MCMC using `INLA`; note that the summary output -provides credible intervals for each parameter to help us make inference. - -
Solution - - ```{r} - # Fit model - note similarity with bayesx syntax - inla.output <- inla(formula = complaints ~ offset(log.visits) + residency + gender + revenue, - data = esdcomp, - family = "poisson") - # Summarise output - summary(inla.output) - ``` - -
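-
-As an aside (an optional sketch, not part of the original solution), the same
-model can be specified in `INLA` by passing the exposure through the `E`
-argument instead of including the log-offset in the formula:
-
-    ```{r eval = FALSE}
-    # Alternative specification: supply the number of visits as the exposure E,
-    # rather than using offset(log.visits) in the formula
-    inla.output.E <- inla(complaints ~ residency + gender + revenue,
-                          data = esdcomp,
-                          family = "poisson",
-                          E = esdcomp$visits)
-    summary(inla.output.E)
-    ```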
- -* Fit a non-Bayesian model using `glm()` for comparison. How do the model fits -compare? -
Solution - - ```{r fig = TRUE} - # Fit model - note similarity with bayesx syntax - esdcomp$log.visits <- log(esdcomp$visits) - glm.output <- glm(formula = complaints ~ offset(log.visits) + residency + gender + revenue, - data = esdcomp, - family = "poisson") - # Summarise output - summary(glm.output) - # Perform ANOVA on each variable in turn - drop1(glm.output, test = "Chisq") - ``` - -
-