edits
elizavetasemenova committed Feb 17, 2024
1 parent c1a5573 commit a8e282e
Showing 6 changed files with 57 additions and 508 deletions.
17 changes: 8 additions & 9 deletions 02_about.md
@@ -2,24 +2,23 @@

These lecture notes cover the course which will be taught over three weeks, from 25 March to 12 April 2024, to an MSc ["AI for Science"](https://ai.aims.ac.za/) cohort at the [African Institute for Mathematical Sciences (AIMS)](https://aims.ac.za/), South Africa. After the course, I plan to keep improving the materials since they will be helpful for future students and collaborators.

If you notice any typos, mistakes or inconsistencies in these course notes, please email them to `elizaveta [dot] p [dot] [insert my surname] [at] gmail [dot] com`.
If you notice any typos, mistakes or inconsistencies, please email them to `elizaveta [dot] p [dot] [insert my surname] [at] gmail [dot] com`.

A tentative outline of the course is presented below but might be adjusted during the course.
A tentative outline of the course is presented below but might be adjusted at a later point.


* <span style="color:orange">Week 1 - Probabilistic programming</span>.
* Day 1
* Introduction to modelling in epidemiology
* Probability distributions refresher
* Probability distributions and random variables
* Bayesian inference
* Focus on priors
* Day 2
* numerical methods to obtain posterior
* MCMC by hand
* convergence diagnostics
* PPLs
* Intro to Numpyro: model, inference, check convergence
* Bayesian workflow: prior predictive and posterior predictive
* Monte Carlo methods and MCMC
* Convergence diagnostics
* Probabilistic programming
* Introduction to Numpyro
* Bayesian workflow
* Day 3
* logistic regression with Numpyro
* Poisson and NegativeBinomial regression with Numpyro
38 changes: 22 additions & 16 deletions 03_intro_epi.md
@@ -1,45 +1,51 @@
# Introduction to Modelling in epidemiology

In this course we will consider a range of models used in epidemiology - from spatial statistics to disease transmission modelling - and their probabilistic formulation. In order to perform Bayesian inference we will use the probabilistic programming language (PPL) Numpyro.
In this course we will consider a range of models used in epidemiology - from <font color='orange'>hierarchical modelling</font> and <font color='orange'>spatial statistics</font> to <font color='orange'>disease transmission modelling</font> - and their probabilistic formulation. In order to perform Bayesian inference we will use the probabilistic programming language (PPL) Numpyro.

Let's uncover each of the three key terms of the course - **epidemiology**, **probabilistic modelling** and **probabilistic programming**. You can think of them as the 'What?', 'Why?' and 'How?' of the course, respectively.
Let's uncover each of the three key terms of the course - **epidemiology**, **Bayesian modelling** and **probabilistic programming**. You can think of them as the 'What?', 'Why?' and 'How?' of the course, respectively.

(epidemiology)=
## Epidemiology

Epidemiology is the 'What?' of this course, i.e. 'What real-life phenomena do we want to study?.
Epidemiology is the 'What?' of this course, i.e. 'What real-life phenomena do we want to study?'

The range of computational models which we will cover is motivated by questios in epidemiology and public health.
The range of computational models which we will cover is motivated by questions in epidemiology and public health.

Epidemiology is the study of how diseases and health-related events are distributed within populations and the factors that influence these distributions. It is a branch of public health that focuses on understanding the patterns, causes, and effects of diseases and health conditions on a large scale. Epidemiologists collect and analyze *data* to investigate the occurrence of health outcomes, their risk factors, and the impact of various interventions or preventive measures.

Epidemiological studies are essential for understanding the health of populations, identifying health disparities, and guiding public health efforts to improve the well-being of communities and societies.

Key aspects of epidemiology include:

- **Disease Surveillance:** Epidemiologists monitor the occurrence of diseases and health-related events over time and across different geographic areas. This involves tracking the number of cases, identifying outbreaks, and assessing trends in disease incidence and prevalence.
- **Disease Surveillance:** Epidemiologists <font color='orange'>monitor</font> the occurrence of diseases and health-related events over time and across different geographic areas. This involves tracking the number of cases, identifying outbreaks, and assessing trends in disease incidence and prevalence.

- **Outbreak Investigation:** Epidemiologists are often involved in investigating disease outbreaks, such as foodborne illnesses, infectious disease outbreaks, or clusters of chronic diseases. They work to identify the source of the outbreak and implement measures to contain and prevent further spread.
- **Outbreak Investigation:** Epidemiologists are often involved in <font color='orange'>investigating</font> disease outbreaks, such as foodborne illnesses, infectious disease outbreaks, or clusters of chronic diseases. They work to identify the source of the outbreak and implement measures to contain and prevent further spread.

- **Identifying Risk Factors:** Epidemiological studies aim to identify the factors that are associated with an increased likelihood of developing a particular disease. These risk factors can include genetic predisposition, environmental exposures, lifestyle choices, and social determinants of health.
```{margin}
It is important to distinguish <font color='orange'>associative</font> studies from those where researchers try to uncover <font color='orange'>causal</font> relationships between risk factors and outcomes.
```
- **Identifying Risk Factors:** Epidemiological studies aim to identify the <font color='orange'>factors</font> that are <font color='orange'>associated</font> with an increased likelihood of developing a particular disease. These risk factors can include genetic predisposition, environmental exposures, lifestyle choices, and social determinants of health.

- **Disease Prevention and Control:** The insights gained from epidemiological research are crucial for designing and implementing public health interventions and policies aimed at preventing and controlling diseases. This may involve vaccination campaigns, health education programs, quarantine measures, and more.
- **Disease Prevention and Control:** The insights gained from epidemiological research are crucial for designing and implementing public health <font color='orange'>interventions</font> and <font color='orange'>policies</font> aimed at preventing and controlling diseases. This may involve vaccination campaigns, health education programs, quarantine measures, and more.

- **Public Health Planning:** Epidemiological data and findings play a vital role in informing public health planning and resource allocation. This includes assessing healthcare needs, identifying at-risk populations, and developing strategies to improve overall health outcomes.
- **Public Health Planning:** Epidemiological data and findings play a vital role in informing public health planning and <font color='orange'>resource allocation</font>. This includes assessing healthcare needs, identifying at-risk populations, and developing strategies to improve overall health outcomes.

- **Causality Assessment:** Epidemiologists use various study designs, including cohort studies, case-control studies, and randomized controlled trials, to determine if a specific factor or intervention causes a particular disease.
- **Causality Assessment:** Epidemiologists use various study designs, including cohort studies, case-control studies, and randomized controlled trials, to determine if a specific factor or intervention <font color='orange'>causes</font> a particular disease.

- **Epidemiological Models:** Mathematical and statistical models are frequently used in epidemiology to simulate disease spread and predict future trends. These models help in making informed decisions and planning interventions.
- **Epidemiological Models:** Mathematical and statistical models are frequently used in epidemiology to simulate <font color='orange'>disease spread</font> and estimate <font color='orange'>disease distribution</font>. These models help in making informed decisions and planning interventions.

Some models that we will build in this course are more relevant to **infectious**, and some to **chronic** diseases. The scope of applicbility will be clarified for each model once it is introduced.
Some models that we will build in this course are more relevant to **infectious**, and some to **chronic** diseases. The scope of applicability will be clarified for each model when it is introduced.

## Probabilistic modelling
## Bayesian modelling

Probabilistic modelling is the 'How?' of this course, i.e. 'How can we describe the generative process leading to the data we observe?'.
```{margin}
You must have heard a lot recently about <font color='orange'>generative AI</font> and <font color='orange'>deep generative modelling (DGM)</font>. It is indeed the same 'generative' idea that we are talking about here. The difference is that DGM uses deep learning and neural networks for the generative mechanism, whereas in traditional epidemiology it is more common to use statistical and mechanistic models for such generation. Having said that, we will use DGMs in this course too.
```
Bayesian modelling is the 'How?' of this course, i.e. 'How can we describe the <font color='orange'>generative process</font> leading to the data we observe?'. We will use the terms 'Bayesian' and 'probabilistic' interchangeably.

Probabilistic modeling is a mathematical and statistical framework used to incorporate **uncertainty** and **randomness** into models to account for variability and its sources in real-world phenomena. It involves using probability theory to describe and quantify the uncertainty associated with different events, outcomes, or variables. The primary goal of probabilistic modeling is to make predictions, infer information, or make decisions in situations where there is inherent uncertainty. Probabilistic modeling is a powerful tool for dealing with real-world complexities in a quantitative manner. It plays a crucial role in data analysis, machine learning, and decision-making processes where probabilistic reasoning is necessary.

Probabilistic modelling in epidemiology helps epidemiologists and public health officials make informed decisions by quantifying uncertainty, simulating realistic disease dynamics, and assessing the potential impact of various interventions. It is a powerful tool for improving our understanding of health outcomes and guiding effective public health responses.
Probabilistic modelling in epidemiology helps epidemiologists and public health officials make informed decisions by <font color='orange'>quantifying uncertainty</font>, simulating realistic disease dynamics, and assessing the potential impact of various interventions. It is a powerful tool for improving our understanding of health outcomes and guiding effective public health responses.
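
To make the idea of a generative process concrete, here is a minimal sketch in plain Python (standard library only, so it runs without Numpyro). The model is a hypothetical illustration and not taken from the course notes: an unknown prevalence is drawn from a Beta prior, and each person's binary disease status is then drawn from a Bernoulli distribution with that prevalence. The function name `simulate_survey` and the prior parameters `a=2.0`, `b=8.0` are assumptions made for this example.

```python
import random


def simulate_survey(n_people, a=2.0, b=8.0, seed=0):
    """Generate synthetic binary outcomes from a simple generative model.

    Hypothetical model (an assumption for illustration):
        prevalence ~ Beta(a, b)
        outcome_i  ~ Bernoulli(prevalence), i = 1..n_people
    """
    rng = random.Random(seed)
    # Step 1: draw the unknown parameter from its prior.
    prevalence = rng.betavariate(a, b)
    # Step 2: draw each observation given the parameter.
    outcomes = [1 if rng.random() < prevalence else 0 for _ in range(n_people)]
    return prevalence, outcomes


prevalence, outcomes = simulate_survey(n_people=1000)
observed = sum(outcomes) / len(outcomes)
print(f"simulated prevalence: {prevalence:.3f}, observed fraction: {observed:.3f}")
```

Running the model forward like this produces data from parameters. Bayesian inference goes in the opposite direction: given the observed outcomes, it recovers a distribution over the prevalence. In Numpyro the same two steps would typically be expressed with `dist.Beta` and `dist.Bernoulli`.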

%Here's why probabilistic modelling is important for epidemiology:

@@ -74,7 +80,7 @@ Some key concepts and components of probabilistic modeling are as follows:
- **Monte Carlo Methods:** Monte Carlo methods are a class of computational techniques used to estimate complex probabilistic models through random sampling. They involve generating random samples from probability distributions to approximate quantities of interest.
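
As a rough illustration of the Monte Carlo idea described above, the following sketch estimates a probability by random sampling and compares it with the exact value. The scenario is hypothetical (at least 3 cases among 10 people, each infected independently with probability 0.2), and the function names `mc_prob_at_least` and `exact_prob_at_least` are introduced here only for the example.

```python
import math
import random


def mc_prob_at_least(k, n, p, n_samples=20_000, seed=0):
    """Monte Carlo estimate of P(at least k successes in n Bernoulli(p) trials)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Simulate one group of n people and count the cases.
        cases = sum(rng.random() < p for _ in range(n))
        if cases >= k:
            hits += 1
    # The fraction of simulations satisfying the event approximates its probability.
    return hits / n_samples


def exact_prob_at_least(k, n, p):
    """Exact value of the same probability from the Binomial PMF."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


estimate = mc_prob_at_least(k=3, n=10, p=0.2)
exact = exact_prob_at_least(k=3, n=10, p=0.2)
print(f"Monte Carlo estimate: {estimate:.4f}, exact: {exact:.4f}")
```

Here the exact answer is available, so the estimate can be checked directly. The value of Monte Carlo methods lies in models where no closed-form answer exists and sampling is the only practical route.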


## Probabilistics programming
## Probabilistic programming


Probabilistic programming is a specialized approach to building and analyzing probabilistic models that offers several advantages for epidemiology and the study of infectious disease dynamics:
27 changes: 20 additions & 7 deletions 04_probability_distributions.ipynb
@@ -11,13 +11,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"To embark on an exciting journey into the realm of probabilistic thinking and programming, it's essential to establish a solid foundation. This foundation entails gaining a comprehensive understanding of probability distributions, mastering fundamental probability principles, and acquiring the skills to manipulate probabilities within code.\n",
"To embark on an exciting journey into the realm of probabilistic thinking and programming, it is essential to establish a solid foundation. This foundation entails gaining a comprehensive understanding of probability distributions, mastering fundamental probability principles, and acquiring the skills to manipulate probabilities within code.\n",
"\n",
"Probability distributions and random variables serve as tools for describing and performing calculations related to random events, specifically those whose outcomes are uncertain. An illustrative instance of such an uncertain event would be the act of flipping a coin or rolling a die. In the former case, the potential outcomes are heads or tails.\n",
"\n",
"*In the context of epidemiological modelling, we will encounter data of different type and origin. It is crucial to grasp the suitability of different probability distributions for modeling specific types of data.*\n",
"In the context of epidemiological modelling, we will encounter data of different type and origin. It is crucial to grasp the suitability of different probability distributions for modeling specific types of data.\n",
"\n",
"Since the PPL we will be using for this course is **Numpyro**, also in this section we will use the implementations of distribution from this library `import numpyro.distributions as dist`"
"Since the probabilistic programming language that we will be using for this course is **Numpyro**, in this section we will also use the implementations of distributions from this library, available via `import numpyro.distributions as dist`"
]
},
{
@@ -62,7 +62,7 @@
"source": [
"### The Bernoulli distribution\n",
"\n",
"A Bernoulli distribution is used to describe random events with two possible outcomes e.g. when we have a random variable $X$ that takes on one of the two values $x \\in \\{0, 1\\}$ with probabilities $1-p$ and $p, 0 \\le p \\le 1$ respectively:\n",
"A Bernoulli distribution is used to describe random events with <span style=\"color:orange\">two possible outcomes</span> e.g. when we have a random variable $X$ that takes on one of the two values $x \\in \\{0, 1\\}$ with probabilities $1-p$ and $p, 0 \\le p \\le 1$ respectively:\n",
"\n",
"\\begin{align*}\n",
"p(X = 1) &= p, \\\\\n",
@@ -75,7 +75,7 @@
"A *discrete* probability distribution can be uniquely defined by its *probability mass function (PMF)*.\n",
"\n",
"```{margin}\n",
"The term 'mass' is used to underline that the support of the distribution is discrete, and each possible values carries a certain `mass` (probability).\n",
"The term 'mass' is used to underline that the support of the distribution is discrete, and each possible value carries a certain `mass` (probability).\n",
"For continuous distributions, the analogue is the *probability density function (PDF)*, which we will see later.\n",
"```\n",
"For the Bernoulli distribution, we write the PMF as\n",
@@ -92,7 +92,7 @@
"\n",
"Now let's construct a Bernoulli distribution in code so that we can play around with it and get some intuition.\n",
"\n",
"**Note:** In this practical, we are going to use `numpyro` to construct our distributions. However, there are several other `jax` packages that work similarly (e.g., `distrax`) as well as several options for `tensorflow` (e.g., `tensorflow_probability`) and `pytorch` (e.g., `torch.distributions`). Don't worry too much about the specifics of how `numpyro` works, e.g., the names of the distributions and their arguments. Instead try to understand what the code is doing."
"**Note:** In this practical, we are going to use `numpyro` to construct our distributions. However, there are several other `jax` packages that work similarly (e.g., `distrax`) as well as several options for `tensorflow` (e.g., `tensorflow_probability`) and `pytorch` (e.g., `torch.distributions`)."
]
},
{
@@ -364,6 +364,15 @@
"**Exercise:** plot a panel of histograms where you vary the probability $p$ horizontally and the number of samples $n$ vertically. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Common usage\n",
"\n",
"The Bernoulli distribution is commonly used as a likelihood in models with binary outcomes, for example, to model disease prevalence."
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -641,7 +650,11 @@
{
"cell_type": "markdown",
"metadata": {},
"source": []
"source": [
"#### Common usage\n",
"\n",
"The Binomial distribution is commonly used as a likelihood in models where binary outcomes are aggregated into counts, for example, the number of positive cases among the individuals tested when modelling disease prevalence."
]
},
{
"cell_type": "markdown",