-
Notifications
You must be signed in to change notification settings - Fork 21
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
817dd7a
commit e963313
Showing
3 changed files
with
110 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,107 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Log-Gaussian Cox Process\n", | ||
"\n", | ||
"We consider a random realization of a point process $S=\\{s_1, \\dots ,s_n\\}$ - a <font color='orange'>point pattern</font> in a study region $D$. The aim of point patterns analysis is to discover the underlying process based on the generated events $S$. Both the number of events $n$ and their locations $S$ are random. Point processes are characterized by their first- and second- order properties. First-order properties, such as <font color='orange'>intensity function</font>, measure the distribution of events in a study region. Second-order properties measure the tendency of events to appear clustered, independently, or regularly-spaced. The intensity function $\\lambda(s)$ represents the non-normalized location probability density and sums up to the expectation of the total number of points within the study region $D$: \n", | ||
"\n", | ||
"$$\n", | ||
"\\text{E}[N(D)] = \\lambda(D) = \\int_D \\lambda(s)ds,\n", | ||
"$$\n", | ||
"\n", | ||
"where $N(.)$ is a counting measure. To construct a probabilistic model for a point pattern $S$ one needs to specify distributions for the total number of points $N(D)$ and the locations of the points:\n", | ||
"\n", | ||
"$$\n", | ||
"L(S) = \\text{P} \\left[ N(D)=n \\right] n! f(s_1,...,s_n \\mid n),\n", | ||
"$$\n", | ||
"\n", | ||
"where the factorial n! comes from the exchangeability of the events within $S$ and $f(s_1,...,s_n \\mid n)$ stands for the location density, given $N(D)=n$. Due to the independence of disjoint sets one obtains\n", | ||
"\n", | ||
"$$\n", | ||
"f(s_1,...,s_n \\mid n) = \\prod\\limits_{i=1}^{n}f(s_i) = \\prod\\limits_{i=1}^{n} \\frac{\\lambda(s_i)}{\\lambda(D)} = \\frac{1}{(\\lambda(D))^n}\\prod\\limits_{i=1}^{n}\\lambda(s_i),\n", | ||
"$$\n", | ||
"\n", | ||
"i.e. the likelihood of a point pattern can be rewritten as\n", | ||
"\n", | ||
"$$\n", | ||
"L(S) = \\text{P}\\left[ N(D)=n \\right] n! \\frac{1}{(\\lambda(D))^n}\\prod\\limits_{i=1}^{n}\\lambda(s_i).\n", | ||
"$$\n", | ||
"\n", | ||
"Log-Gaussian Cox Process is a doubly-stochastic process with Gaussian log-intensity. Further we consider two possible choices for the probability distribution of the number of points: Poisson and Negative Binomial.\n", | ||
"\n", | ||
"## Poisson count distribution\n", | ||
"A customary assumption is that the number of points within a point pattern follows the Poisson distribution:\n", | ||
"\n", | ||
"$$\n", | ||
"P_{po}^{\\lambda} \\left[ N(D)=n \\right] = \\frac{\\exp(\\lambda(D)) (\\lambda(D))^n}{n!}\n", | ||
"$$\n", | ||
"\n", | ||
"and the data is distributed as a Poisson process, conditional on the intensity function $\\lambda(s)$ : \n", | ||
"\n", | ||
"$$\n", | ||
"S \\mid \\lambda(s) \\sim \\text{PP}(\\lambda(s)).\n", | ||
"$$\n", | ||
"\n", | ||
"The likelihood and log-likelihood for point patterns in this case depend on the entire intensity surface and are expressed as\n", | ||
"\n", | ||
"$$\n", | ||
"L_{\\text{po}}(s_1,\\dots,s_n, s_i \\in D; \\lambda(s)) = \\exp(-\\lambda(D)) \\prod\\limits_{i=1}^{n}\\lambda(s_i),\n", | ||
"$$\n", | ||
"\n", | ||
"$$\n", | ||
"\\log L_{\\text{po}}(s_1,\\dots,s_n, s_i \\in D; \\lambda(s)) = -\\lambda(D) + \\sum\\limits_{i=1}^{n}\\log \\lambda(s_i).\n", | ||
"$$\n", | ||
"\n", | ||
"As before, $\\lambda(D)$ should be computed as an integral over the domain $D$.\n", | ||
"\n", | ||
"The fitting of the model can be done either via this likelihood or, alternatively, via discretization of the study region $D$ into cells and the <font color='orange'>grid approach</font>: despite the assumption of the continuity of the process, for computational purposes the study region is discretized and is being covered with a set of disjoint cells. In the framework of the grid approach all observed points are being framed by a discretized observation window. For each cell of this grid $c_i$ the observed number of cases falling within a grid cell $y_i$ are being collected and attributed to the cell:\n", | ||
"\n", | ||
"$$\n", | ||
"y_{i} \\mid \\lambda(c_i) \\sim \\text{Pois}(\\lambda(c_i)),\n", | ||
"$$\n", | ||
"\n", | ||
"$$\n", | ||
"\\lambda(c_i) \\approx \\frac{\\vert D \\vert}{K} \\lambda(g_i).\n", | ||
"$$\n", | ||
"\n", | ||
"where $K$ is the overall number of cells and $g_i$ denotes a cell centroid. I.e. the observed counts follow the Poisson distribution, conditioned on the random field. Further we show that the higher the number of cells, the closer is the discretized representation of the LGCP to its true continuous version. By independence of the counts within disjoint cells we compute\n", | ||
"\n", | ||
"$$\n", | ||
"L_{\\text{po}}^{\\text{grid}} = \\prod\\limits_{i=1}^{K}\\exp(\\lambda(c_i))(\\lambda(c_i))^{y_i}/{y_i!} \\propto \\prod\\limits_{i=1}^{K} \\exp(\\lambda(c_i)) (\\lambda(c_i))^{y_i}\n", | ||
"$$\n", | ||
"\n", | ||
"$$\n", | ||
"= \\exp \\left(-\\sum\\limits_{i=1}^{K}\\lambda(c_i) \\right) \\prod (\\lambda(c_i))^{y_i} = \\exp \\left(-\\lambda(D) \\right) \\prod (\\lambda(c_i))^{y_i},\n", | ||
"$$\n", | ||
"\n", | ||
"the second product is taken over the cells with non-zero counts. As the volume of each cell becomes smaller $\\vert c_i \\vert \\to 0$, the counts $y_i$ within cells become 0 or 1 and we obtain the same likelihood as $L_{\\text{po}}.$ Thus, this fitting approach represents an alternative to the one through direct likelihood: instead of using only the points of the observed events and having to evaluate the integral over the observation region, we create artificial counts on a fine grid and use them to discover the intensity. \n" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"colab": { | ||
"provenance": [] | ||
}, | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.9.7" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 0 | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters