This library implements information geometry, a framework for quantifying the geometry of probability distributions.
See statistical_manifolds.ipynb for examples and information_geometry.py for the primary code.
For external references on this subject from different backgrounds, see statistical manifold learning (machine learning & neuroscience), curved exponential families (pure math), emergent general relativity (theoretical physics), and sensor network localization (engineering & optimization).
The objects of interest are statistical manifolds, which are spaces of smoothly connected probability distributions that differ by changes in parameters (such as varying the mean and standard deviation). The distance on the manifold between nearby distributions is interpreted as their difference in information content with respect to a pair of parameters. As a result, the shape ensuing from all possible variations about a point in parameter space exactly characterizes the "latent" information content of the distribution -- that is, its curvature. This is analogous to how the paths taken by light in general relativity characterize a gravitational field -- except that the curvature of the space arises from a parameterized distribution of probabilities, not mass, and the paths are in parameter space, not spacetime.
We consider normalized probability density functions $p(x|a)$, with $\int_{X} p(x|a)\,dx = 1$, where:
$x$ : D-dimensional random variable;
$a$ : M-dimensional distribution parameter;
$X=\mathbb{R}^{D \times N}$ : sample domain of observations of $x$, taken as $N\rightarrow \infty$.
The metric
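The Fisher information quantifies the difference between infinitesimally distinct probability distributions. Each component gives the distance (in bits base $e$, i.e. nats) between distributions that differ by infinitesimal changes in the parameters $a^{i}$ and $a^{j}$:
$g_{ij}(a) = \left\langle \partial_{i} \log p(x|a) \, \partial_{j} \log p(x|a) \right\rangle$
where $\partial_{i} = \partial/\partial a^{i}$ and $\langle \cdot \rangle = \int_{X} (\cdot)\, p(x|a)\, dx$ is the expectation over the sample domain.
It can be shown that, under the usual regularity conditions, this is equivalent to
$g_{ij}(a) = -\left\langle \partial_{i} \partial_{j} \log p(x|a) \right\rangle$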
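As an illustration only (this is not the code in information_geometry.py), the sketch below evaluates the Fisher metric as the expected product of score components, $\langle \partial_{i} \log p \, \partial_{j} \log p \rangle$, for a univariate Gaussian with $a = (\mu, \sigma)$. The names `log_p`, `score`, and `fisher_metric` are hypothetical, natural logarithms are assumed, and the expectation is approximated by trapezoidal quadrature over $x$ rather than by sampling.

```python
import math

def log_p(x, a):
    """Log-density of a univariate Gaussian with parameters a = (mu, sigma)."""
    mu, sigma = a
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def score(x, a, h=1e-5):
    """Central-difference gradient of log p(x|a) with respect to a."""
    grad = []
    for i in range(len(a)):
        up = list(a)
        dn = list(a)
        up[i] += h
        dn[i] -= h
        grad.append((log_p(x, up) - log_p(x, dn)) / (2 * h))
    return grad

def fisher_metric(a, n=4001, width=10.0):
    """g_ij(a) = E[d_i log p * d_j log p], by trapezoidal quadrature over x."""
    mu, sigma = a
    lo, hi = mu - width * sigma, mu + width * sigma
    dx = (hi - lo) / (n - 1)
    m = len(a)
    g = [[0.0] * m for _ in range(m)]
    for k in range(n):
        x = lo + k * dx
        w = dx * (0.5 if k in (0, n - 1) else 1.0)  # trapezoid end-point weights
        p = math.exp(log_p(x, a))
        s = score(x, a)
        for i in range(m):
            for j in range(m):
                g[i][j] += w * p * s[i] * s[j]
    return g

g = fisher_metric((0.0, 2.0))
```

For this family the exact metric is $\mathrm{diag}(1/\sigma^{2},\, 2/\sigma^{2})$, so with $\sigma = 2$ the result should be close to `[[0.25, 0], [0, 0.5]]`.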
The partial derivatives of the metric give the geodesic equations and define the curvature of the space. This curvature can be interpreted as the "latent" information stored in the derivatives of the distribution with respect to its parameter vectors. Geodesics extremize (typically, minimize) the distance between distributions and can be used to smoothly interpolate between points on a manifold.
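Analytically it can be shown that for the Fisher information metric the metric derivatives are:
$\partial_{k} g_{ij} = \left\langle \partial_{i}\ell \, \partial_{j}\ell \, \partial_{k}\ell \right\rangle + \left\langle \partial_{i}\partial_{k}\ell \, \partial_{j}\ell \right\rangle + \left\langle \partial_{i}\ell \, \partial_{j}\partial_{k}\ell \right\rangle$
where $\ell = \log p(x|a)$, $\partial_{k} = \partial/\partial a^{k}$, and $\langle \cdot \rangle$ is the expectation under $p(x|a)$.
The connection coefficients are linear combinations of the metric derivatives and determine the geodesic acceleration. These Christoffel symbols are defined by:
$\Gamma^{l}_{ij} = \frac{1}{2}\sum_{k} g^{lk}\left(\partial_{i} g_{jk} + \partial_{j} g_{ik} - \partial_{k} g_{ij}\right) = \sum_{k} g^{lk} \left\langle \left(\partial_{i}\partial_{j}\ell + \frac{1}{2}\,\partial_{i}\ell \, \partial_{j}\ell\right) \partial_{k}\ell \right\rangle$
where the second equality is the analytic solution obtained for the Fisher information metric specifically, and $g^{lk}$ is the inverse of the metric, $\sum_{k} g^{lk} g_{kj} = \delta^{l}_{j}$.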
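The general definition of the Christoffel symbols (one half of the inverse metric contracted with the symmetrized metric derivatives) can be sketched as follows. This is a minimal example under stated assumptions: `metric` hard-codes the known closed-form Fisher metric of a univariate Gaussian with $a = (\mu, \sigma)$, the inverse is written out for the 2×2 case only, and all function names are hypothetical rather than taken from information_geometry.py.

```python
def metric(a):
    """Closed-form Fisher metric of a univariate Gaussian, a = (mu, sigma)."""
    _, sigma = a
    return [[1.0 / sigma**2, 0.0], [0.0, 2.0 / sigma**2]]

def inverse_2x2(g):
    """Inverse of a 2x2 matrix (enough for this two-parameter example)."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]

def metric_derivatives(a, h=1e-5):
    """dg[k][i][j] = d g_ij / d a^k, by central differences."""
    m = len(a)
    dg = []
    for k in range(m):
        up = list(a)
        dn = list(a)
        up[k] += h
        dn[k] -= h
        g_up, g_dn = metric(up), metric(dn)
        dg.append([[(g_up[i][j] - g_dn[i][j]) / (2 * h) for j in range(m)]
                   for i in range(m)])
    return dg

def christoffel(a):
    """gamma[l][i][j] = 1/2 sum_k g^{lk} (d_i g_jk + d_j g_ik - d_k g_ij)."""
    m = len(a)
    g_inv = inverse_2x2(metric(a))
    dg = metric_derivatives(a)
    return [[[0.5 * sum(g_inv[l][k] * (dg[i][j][k] + dg[j][i][k] - dg[k][i][j])
                        for k in range(m))
              for j in range(m)]
             for i in range(m)]
            for l in range(m)]

gamma = christoffel((0.0, 2.0))
```

For this metric the non-zero symbols are $\Gamma^{\mu}_{\mu\sigma} = \Gamma^{\mu}_{\sigma\mu} = -1/\sigma$, $\Gamma^{\sigma}_{\mu\mu} = 1/(2\sigma)$, and $\Gamma^{\sigma}_{\sigma\sigma} = -1/\sigma$, which gives a quick check on the numerical output.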
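The geodesic equations are finally:
$\ddot{a}^{l}(t) + \sum_{i,j}\Gamma^{l}_{ij}(a(t)) \cdot \dot{a}^{i}(t)\dot{a}^{j}(t) = 0$
where $\dot{a} = da/dt$ and $t$ is the affine parameter along the path.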
The solution is obtained numerically using a 1st-order finite-difference method (Euler). The geodesic path is parametrized in terms of an affine, unit interval of $t \in [0, 1]$, and each step of size $dt$ updates the position and velocity as:
$a^{l}(t+dt) = a^{l}(t) + dt \cdot \dot{a}^{l}(t)$
$\dot{a}^{l}(t+dt) = \dot{a}^{l}(t) + dt \cdot F^{l}(a(t),\dot{a}(t))$
$F^{l}(a(t),\dot{a}(t))= -\sum_{i,j}\Gamma^{l}_{ij}(a(t)) \cdot \dot{a}^{i}(t)\dot{a}^{j}(t)$
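The Euler updates above can be sketched as a short loop. This is a minimal example, not the library's integrator: `integrate_geodesic` and `geodesic_force` are hypothetical names, and `gaussian_christoffel` substitutes the closed-form Christoffel symbols of a univariate Gaussian with $a = (\mu, \sigma)$ for the numerically computed ones.

```python
def gaussian_christoffel(a):
    """Closed-form gamma[l][i][j] for a univariate Gaussian, a = (mu, sigma)."""
    _, sigma = a
    gamma = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    gamma[0][0][1] = gamma[0][1][0] = -1.0 / sigma
    gamma[1][0][0] = 1.0 / (2.0 * sigma)
    gamma[1][1][1] = -1.0 / sigma
    return gamma

def geodesic_force(a, v, christoffel):
    """F^l = -sum_{i,j} Gamma^l_ij(a) * v^i * v^j."""
    m = len(a)
    gamma = christoffel(a)
    return [-sum(gamma[l][i][j] * v[i] * v[j]
                 for i in range(m) for j in range(m))
            for l in range(m)]

def integrate_geodesic(a0, v0, christoffel, n_steps=1000):
    """Explicit Euler integration of the geodesic over the unit interval t in [0, 1]."""
    dt = 1.0 / n_steps
    a, v = list(a0), list(v0)
    path = [tuple(a)]
    for _ in range(n_steps):
        f = geodesic_force(a, v, christoffel)            # uses a(t) and v(t)
        a = [a[l] + dt * v[l] for l in range(len(a))]    # position update
        v = [v[l] + dt * f[l] for l in range(len(v))]    # velocity update
        path.append(tuple(a))
    return path

path = integrate_geodesic((0.0, 1.0), (0.0, 1.0), gaussian_christoffel)
```

Starting from $(\mu, \sigma) = (0, 1)$ with velocity $(0, 1)$, the exact geodesic is $\sigma(t) = e^{t}$ at fixed $\mu$, so the final $\sigma$ should be close to $e$. Because the scheme is first-order, the error shrinks roughly in proportion to $dt$.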
All other derivatives are computed numerically using 2nd-order finite-differences (perturbations above and below the given vector).
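A 2nd-order central difference of this kind can be sketched as below. The function name `central_gradient`, the step size, and the test function `f` are illustrative assumptions, not the library's own choices. Halving the step should reduce the error by about a factor of four, which is what "2nd-order" means here.

```python
import math

def central_gradient(f, a, h=1e-5):
    """Gradient of f at a, perturbing each component above and below by h."""
    grad = []
    for i in range(len(a)):
        up = list(a)
        dn = list(a)
        up[i] += h
        dn[i] -= h
        grad.append((f(up) - f(dn)) / (2.0 * h))
    return grad

def f(a):
    """Illustrative test function with a known gradient."""
    return math.exp(a[0]) * a[1]

point = [1.0, 2.0]
exact = 2.0 * math.exp(1.0)  # d f / d a[0] evaluated at (1, 2)

err_h = abs(central_gradient(f, point, h=1e-2)[0] - exact)
err_half = abs(central_gradient(f, point, h=5e-3)[0] - exact)
ratio = err_h / err_half  # close to 4 for a second-order scheme
```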