diff --git a/02_about.md b/02_about.md
index 87021aa..88d00f0 100644
--- a/02_about.md
+++ b/02_about.md
@@ -2,24 +2,23 @@
These lecture notes cover the course which will be taught during three weeks from 25 March to 12 April 2024 to a MSc ["AI for Science"](https://ai.aims.ac.za/) cohort at the [African Institute for Mathematical Sciences (AIMS)](https://aims.ac.za/), South Africa. After the course, I plan to keep improving the materials since they will be helpful for future stundents and collaborators.
-If you notice any typos, mistakes or inconsistencies in these course notes, please email them to `elizaveta [dot] p [dot] [insert my surname] [at] gmail [dot] com`.
+If you notice any typos, mistakes or inconsistencies, please email them to `elizaveta [dot] p [dot] [insert my surname] [at] gmail [dot] com`.
-Tentative outline of the course is presented below but might be adjusted during the course.
+A tentative outline of the course is presented below; it might be adjusted at a later point.
* Week 1 - Probabilistic programming.
* Day 1
* Introduction to modelling in epidemiology
- * Probability distributions refresher
+ * Probability distributions and random variables
* Bayesian inference
* Focus on priors
* Day 2
- * numerical methods to obtain posterior
- * MCMC by hand
- * convergence diagnostics
- * PPLs
- * Intro to Numpyro: model, inference, check convergence
- * Bayesian workflow: prior predictive and posterior predictive
+ * Monte Carlo methods and MCMC
+ * Convergence diagnostics
+ * Probabilistic programming
+ * Introduction to Numpyro
+ * Bayesian workflow
* Day 3
* logistic regression with Numpyro
* Poisson and NegativeBinomial regression with Numpyro
diff --git a/03_intro_epi.md b/03_intro_epi.md
index 9b4a470..192cfa0 100644
--- a/03_intro_epi.md
+++ b/03_intro_epi.md
@@ -1,15 +1,15 @@
# Introduction to Modelling in epidemiology
-In this course we will consider a range of models used in epidemiology - from spatial statistics to disease transmission modelling - and their probabilistic formulation. In order to perform Bayesian inference we will use the probabilistic programing language (PPL) Numpyro.
+In this course we will consider a range of models used in epidemiology - from hierarchical modelling and spatial statistics to disease transmission modelling - and their probabilistic formulation. In order to perform Bayesian inference we will use the probabilistic programming language (PPL) Numpyro.
-Let's uncover each of the three key terms of the course - **epidemioligy**, **probabilistic modelling** and **probablistic programming**. You can think of them as the 'What?', 'Why?' and 'How?' of the course, correspondingly.
+Let's uncover each of the three key terms of the course - **epidemiology**, **Bayesian modelling** and **probabilistic programming**. You can think of them as the 'What?', 'Why?' and 'How?' of the course, respectively.
(epidemiology)=
## Epidemiology
-Epidemiology is the 'What?' of this course, i.e. 'What real-life phenomena do we want to study?.
+Epidemiology is the 'What?' of this course, i.e. 'What real-life phenomena do we want to study?'
-The range of computational models which we will cover is motivated by questios in epidemiology and public health.
+The range of computational models which we will cover is motivated by questions in epidemiology and public health.
Epidemiology is the study of how diseases and health-related events are distributed within populations and the factors that influence these distributions. It is a branch of public health that focuses on understanding the patterns, causes, and effects of diseases and health conditions on a large scale. Epidemiologists collect and analyze *data* to investigate the occurrence of health outcomes, their risk factors, and the impact of various interventions or preventive measures.
@@ -17,29 +17,35 @@ Epidemiological studies are essential for understanding the health of population
Key aspects of epidemiology include:
-- **Disease Surveillance:** Epidemiologists monitor the occurrence of diseases and health-related events over time and across different geographic areas. This involves tracking the number of cases, identifying outbreaks, and assessing trends in disease incidence and prevalence.
+- **Disease Surveillance:** Epidemiologists monitor the occurrence of diseases and health-related events over time and across different geographic areas. This involves tracking the number of cases, identifying outbreaks, and assessing trends in disease incidence and prevalence.
-- **Outbreak Investigation:** Epidemiologists are often involved in investigating disease outbreaks, such as foodborne illnesses, infectious disease outbreaks, or clusters of chronic diseases. They work to identify the source of the outbreak and implement measures to contain and prevent further spread.
+- **Outbreak Investigation:** Epidemiologists are often involved in investigating disease outbreaks, such as foodborne illnesses, infectious disease outbreaks, or clusters of chronic diseases. They work to identify the source of the outbreak and implement measures to contain and prevent further spread.
-- **Identifying Risk Factors:** Epidemiological studies aim to identify the factors that are associated with increases likelihood of developing a particular disease. These risk factors can include genetic predisposition, environmental exposures, lifestyle choices, and social determinants of health.
+```{margin}
+It is important to distinguish associative studies from those in which researchers try to uncover causal relationships between risk factors and outcomes.
+```
+- **Identifying Risk Factors:** Epidemiological studies aim to identify the factors that are associated with an increased likelihood of developing a particular disease. These risk factors can include genetic predisposition, environmental exposures, lifestyle choices, and social determinants of health.
-- **Disease Prevention and Control:** The insights gained from epidemiological research are crucial for designing and implementing public health interventions and policies aimed at preventing and controlling diseases. This may involve vaccination campaigns, health education programs, quarantine measures, and more.
+- **Disease Prevention and Control:** The insights gained from epidemiological research are crucial for designing and implementing public health interventions and policies aimed at preventing and controlling diseases. This may involve vaccination campaigns, health education programs, quarantine measures, and more.
-- **Public Health Planning:** Epidemiological data and findings play a vital role in informing public health planning and resource allocation. This includes assessing healthcare needs, identifying at-risk populations, and developing strategies to improve overall health outcomes.
+- **Public Health Planning:** Epidemiological data and findings play a vital role in informing public health planning and resource allocation. This includes assessing healthcare needs, identifying at-risk populations, and developing strategies to improve overall health outcomes.
-- **Causality Assessment:** Epidemiologists use various study designs, including cohort studies, case-control studies, and randomized controlled trials, to determine if a specific factor or intervention causes a particular disease.
+- **Causality Assessment:** Epidemiologists use various study designs, including cohort studies, case-control studies, and randomized controlled trials, to determine if a specific factor or intervention causes a particular disease.
-- **Epidemiological Models:** Mathematical and statistical models are frequently used in epidemiology to simulate disease spread and predict future trends. These models help in making informed decisions and planning interventions.
+- **Epidemiological Models:** Mathematical and statistical models are frequently used in epidemiology to simulate disease spread and estimate disease distribution. These models help in making informed decisions and planning interventions.
-Some models that we will build in this course are more relevant to **infectious**, and some to **chronic** diseases. The scope of applicbility will be clarified for each model once it is introduced.
+Some models that we will build in this course are more relevant to **infectious**, and some to **chronic** diseases. The scope of applicability will be clarified for each model when it is introduced.
-## Probabilistic modelling
+## Bayesian modelling
-Probabilistic modelling is the 'How?' of this course, i.e. 'How can we describe the generative process leading to the data we observe?'.
+```{margin}
+You must have heard a lot recently about generative AI and deep generative modelling (DGM). It is indeed the same 'generative' idea that we are talking about here. The difference is that DGM uses deep learning and neural networks for the generative mechanism, whereas in traditional epidemiology it is more common to use statistical and mechanistic models for such generation. Having said that, we will use DGMs in this course too.
+```
+Bayesian modelling is the 'How?' of this course, i.e. 'How can we describe the generative process leading to the data we observe?'. We will use the terms 'Bayesian' and 'probabilistic' interchangeably.
Probabilistic modeling is a mathematical and statistical framework used to incorporate **uncertainty** and **randomness** into models to account for variability and its sources in real-world phenomena. It involves using probability theory to describe and quantify the uncertainty associated with different events, outcomes, or variables. The primary goal of probabilistic modeling is to make predictions, infer information, or make decisions in situations where there is inherent uncertainty. Probabilistic modeling is a powerful tool for dealing with real-world complexities in a quantitative manner. It plays a crucial role in data analysis, machine learning, and decision-making processes where probabilistic reasoning is necessary.
-Probabilistic modelling in epidemiology helps epidemiologists and public health officials make informed decisions by quantifying uncertainty, simulating realistic disease dynamics, and assessing the potential impact of various interventions. It is a powerful tool for improving our understanding of health outcomes and guiding effective public health responses.
+Probabilistic modelling in epidemiology helps epidemiologists and public health officials make informed decisions by quantifying uncertainty, simulating realistic disease dynamics, and assessing the potential impact of various interventions. It is a powerful tool for improving our understanding of health outcomes and guiding effective public health responses.
%Here's why probabilistic modelling is important for epidemiology:
@@ -74,7 +80,7 @@ Some key concepts and components of probabilistic modeling are as follows:
- **Monte Carlo Methods:** Monte Carlo methods are a class of computational techniques used to estimate complex probabilistic models through random sampling. They involve generating random samples from probability distributions to approximate quantities of interest.
-## Probabilistics programming
+## Probabilistic programming
Probabilistic programming is a specialized approach to building and analyzing probabilistic models that offers several advantages for epidemiology and the study of infectious disease dynamics:
diff --git a/04_probability_distributions.ipynb b/04_probability_distributions.ipynb
index 61bf032..9bce5fc 100644
--- a/04_probability_distributions.ipynb
+++ b/04_probability_distributions.ipynb
@@ -11,13 +11,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "To embark on an exciting journey into the realm of probabilistic thinking and programming, it's essential to establish a solid foundation. This foundation entails gaining a comprehensive understanding of probability distributions, mastering fundamental probability principles, and acquiring the skills to manipulate probabilities within code.\n",
+ "To embark on an exciting journey into the realm of probabilistic thinking and programming, it is essential to establish a solid foundation. This foundation entails gaining a comprehensive understanding of probability distributions, mastering fundamental probability principles, and acquiring the skills to manipulate probabilities within code.\n",
"\n",
"Probability distributions and random variables serve as tools for describing and performing calculations related to random events, specifically those whose outcomes are uncertain. An illustrative instance of such an uncertain event would be the act of flipping a coin or rolling a dice. In the former case, the potential outcomes are heads or tails.\n",
"\n",
- "*In the context of epidemiological modelling, we will encounter data of different type and origin. It is crucial to grasp the suitability of different probability distributions for modeling specific types of data.*\n",
+ "In the context of epidemiological modelling, we will encounter data of different type and origin. It is crucial to grasp the suitability of different probability distributions for modeling specific types of data.\n",
"\n",
- "Since the PPL we will be using for this course is **Numpyro**, also in this section we will use the implementations of distribution from this library `import numpyro.distributions as dist`"
+ "Since the probabilistic programming language that we will be using for this course is **Numpyro**, in this section we will also use the implementations of distributions from this library, available via `import numpyro.distributions as dist`"
]
},
{
@@ -62,7 +62,7 @@
"source": [
"### The Bernoulli distribution\n",
"\n",
- "A Bernoulli distribution is used to describe random events with two possible outcomes e.g. when we have a random variable $X$ that takes on one of the two values $x \\in \\{0, 1\\}$ with probabilities $1-p$ and $p, 0 \\le p \\le 1$ respectively:\n",
+ "A Bernoulli distribution is used to describe random events with two possible outcomes e.g. when we have a random variable $X$ that takes on one of the two values $x \\in \\{0, 1\\}$ with probabilities $1-p$ and $p, 0 \\le p \\le 1$ respectively:\n",
"\n",
"\\begin{align*}\n",
"p(X = 1) &= p, \\\\\n",
@@ -75,7 +75,7 @@
"A *discrete* probability distribution can be uniquely defined by its *probability mass function (PMF)*.\n",
"\n",
"```{margin}\n",
- "The term 'mass' is used to underline that the support of the distribution is discrete, and each possible values carries a certain `mass` (probability).\n",
+ "The term 'mass' is used to underline that the support of the distribution is discrete, and each possible value carries a certain `mass` (probability).\n",
"For continuous distributions, the analogous is *probability density function (PDF)*, we will see those later.\n",
"```\n",
"For the Bernoulli distribution, we write the PMF as\n",
@@ -92,7 +92,7 @@
"\n",
"Now let's construct a Bernoulli distribution in code so that we can play around with it and get some intuition.\n",
"\n",
- "**Note:** In this practical, we are going to use `numpyro` to construct our distributions. However, there are several other `jax` packages that work similarly (e.g., `distrax`) as well as several options for `tensorflow` (e.g., `tensorflow_probability`) and `pytorch` (e.g., `torch.distribution`). Don't worry too much about the specifics of how `numpyro` works, e.g., the names of the distributions and their arguments. Instead try to understand what the code is doing."
+ "**Note:** In this practical, we are going to use `numpyro` to construct our distributions. However, there are several other `jax` packages that work similarly (e.g., `distrax`) as well as several options for `tensorflow` (e.g., `tensorflow_probability`) and `pytorch` (e.g., `torch.distributions`)."
]
},
{
@@ -364,6 +364,15 @@
"**Exercise:** plot a panel of histograms where you vary probability $p$ horizontally and numher of samples $n$ vertically. "
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Common usage\n",
+ "\n",
+ "The Bernoulli distribution is commonly used as a likelihood in models with binary outcomes, for example, to model disease prevalence."
+ ]
+ },
{
"cell_type": "markdown",
"metadata": {},
@@ -641,7 +650,11 @@
{
"cell_type": "markdown",
"metadata": {},
- "source": []
+ "source": [
+ "#### Common usage\n",
+ "\n",
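+ "The Binomial distribution is commonly used as a likelihood in models with aggregated binary outcomes, i.e. the number of successes out of $n$ trials. For example, it can be used to model disease prevalence from the number of positive cases among the individuals tested."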
+ ]
},
{
"cell_type": "markdown",
diff --git a/05_Bayesian_inference.ipynb b/05_Bayesian_inference.ipynb
index c297213..a3739d8 100644
--- a/05_Bayesian_inference.ipynb
+++ b/05_Bayesian_inference.ipynb
@@ -99,46 +99,6 @@
"metadata": {},
"source": []
},
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Chosing the prior distribution\n",
- "\n",
- "In the doctor example, if the doctor we go to has access to history, but only from when the patient was a child and not for their recent years as an adult, they might make the wrong inferences about the current cause of a headache. For example, if they don't know that the patient was in a car accident last month and banged their head, they could get the cause of the headache very wrong! 🥴\n",
- "\n",
- "The choice of the `prior` 💭 is really important! It can depend on a few things:\n",
- "\n",
- "- Type of distribution (we will see this in a second)\n",
- "- Hyperparameters/hyperpriors\n",
- "- Often there is a 'natural' candidate for prior choice\n",
- "- Whether it creates a posterior that is mathematically solvable or not\n",
- "- Some do (conjugate `prior`)\n",
- "- Most do not (non-conjugate)...\n",
- "\n",
- "## The influence of prior\n",
- "\n",
- "Let us explore how much `priors` can actually influence the posterior. Since tha marginal distribution $p(y)$ does not depend on the parameters, we will only explore the posterior up the to proportionality term.\n",
- "\n",
- "$$p(\\theta |y ) ∝ p(y| \\theta) p(\\theta).$$\n",
- "\n",
- "If we have access to point-wise evaluations of the `likelihood` $p(y | \\theta)$ and prior $p(\\theta)$, we can compute their product to obtain this posterior.\n",
- "\n",
- "Consider the coin tossing problem, which we describe using the Bernoulli distribution for a single trial, and the product of Bernoullis for multiple trials. When we compute a `likelihood` by multiplying independent Bernoulli trials, this is like a *permutation* in so far as the *order* of the tosses matters.\n",
- "\n",
- "Another formulation for a repeated Bernoulli random variable is to consider the _proportion_ of correct trials without considering order. We can normalise for this using the formula for combinations, which you may know of as \"$n$ choose $k$.\" This lets us define a random variable on the number of succeses in $n$ trials called a **Binomial random variable**.\n",
- "\n",
- "Let's say that out of\n",
- "$$n=10$$\n",
- "tosses we obtained\n",
- "$$h=6$$\n",
- "successes.\n",
- "\n",
- "Let's consider: what is the probability of \"success\" for this coin? We'll simulate some examples using a binomial random variable.\n",
- "\n",
- "**[Optional]:** *Show that the `likelihood` for coin tosses calculated using independent Bernoulli random variables (a Bernoulli process) is proportional (up to a constant) to the likelihood for coin tosses calculated using a Binomial random variable.*"
- ]
- },
{
"cell_type": "markdown",
"metadata": {},
@@ -157,204 +117,6 @@
"import matplotlib.pyplot as plt"
]
},
- {
- "cell_type": "code",
- "execution_count": 9,
- "metadata": {},
- "outputs": [],
- "source": [
- "##############################################\n",
- "# prior x likelihood = posterior\n",
- "##############################################\n",
- "\n",
- "h=6\n",
- "n=9\n",
- "p=h/n\n",
- "\n",
- "# define grid\n",
- "grid_points=100\n",
- "\n",
- "# define regular grid in the (0,1) interval\n",
- "p_grid = jnp.linspace(0, 1, grid_points)\n",
- "\n",
- "# compute likelihood at each point in the grid\n",
- "log_prob_likelihood = dist.Binomial(n, probs=p_grid).log_prob(h)\n",
- "\n",
- "# normalize likelihood to get the likelihood PMF\n",
- "likelihood_pmf = jnp.exp(log_prob_likelihood - jnp.max(log_prob_likelihood)) / jnp.sum(jnp.exp(log_prob_likelihood - jnp.max(log_prob_likelihood)))"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 10,
- "metadata": {},
- "outputs": [],
- "source": [
- "def computePosterior(likelihood, prior):\n",
- " # this functionm computes posterior\n",
- " # and plots the result\n",
- "\n",
- " # compute product of likelihood and prior\n",
- " unstd_posterior = likelihood * prior\n",
- "\n",
- " # standardize posterior\n",
- " posterior = unstd_posterior / unstd_posterior.sum()\n",
- "\n",
- " plt.figure(figsize=(17, 3))\n",
- " ax1 = plt.subplot(131)\n",
- " ax1.set_title(\"Prior\")\n",
- " ax1.grid(0.3)\n",
- " plt.plot(p_grid, prior,color='purple')\n",
- "\n",
- " ax2 = plt.subplot(132)\n",
- " ax2.set_title(\"Likelihood\")\n",
- " ax2.grid(0.3)\n",
- " plt.plot(p_grid, likelihood,color='teal')\n",
- "\n",
- " ax3 = plt.subplot(133)\n",
- " ax3.set_title(\"Posterior\")\n",
- " plt.plot(p_grid, posterior,color='gray')\n",
- " ax3.grid(0.3)\n",
- " plt.show()\n",
- "\n",
- " return"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Prior 1 - Uniform\n",
- "\n",
- "Our first `prior` will be a Uniform distribution:\n",
- "\n",
- "$$p(\\theta) = 1.$$\n",
- "\n",
- "This means we don't think the coin is likely to be weighted or not: the probability of heads could take any value between 0 and 1 equally.\n",
- "\n",
- "This is the same as not having a prior at all! So we should expect the likelihood and posterior distributions to look the same (if that isn't intuitive to you, speak to a tutor).\n",
- "\n",
- "Run the code cell below to confirm your intuitions."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 11,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "image/png": "",
- "text/plain": [
- "