Pangolin is an early-stage probabilistic inference research project. The focus is to make probabilistic inference fun.
See INSTALL.md
See CHANGELOG.md
All user-facing functions are concisely documented here, with examples, in a single 250-ish line file.
Alternatively, auto-generated docs are available at justindomke.github.io/pangolin. (Currently quite chaotic.)
Simple "probabilistic calculator":
```python
import pangolin as pg
x = pg.normal(0,2)    # x ~ normal(0,2)
y = pg.normal(x,6)    # y ~ normal(x,6)
print(pg.E(x,y,-2.0)) # E[x|y=-2] (close to -0.2)
```
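As a sanity check on the value quoted above, the conditional mean for this two-variable Gaussian model has a closed form. The sketch below uses plain Python rather than Pangolin, and assumes the second argument of `normal` is a standard deviation (an assumption, though it is consistent with the stated result of about -0.2). The helper name `posterior_mean` is hypothetical and not part of Pangolin.

```python
def posterior_mean(y_obs, prior_mean, prior_sd, noise_sd):
    """E[x | y = y_obs] for x ~ normal(prior_mean, prior_sd), y ~ normal(x, noise_sd).

    Assumes the scale parameters are standard deviations.
    """
    prior_precision = 1.0 / prior_sd**2
    noise_precision = 1.0 / noise_sd**2
    # Precision-weighted average of the prior mean and the observation.
    weighted_sum = prior_precision * prior_mean + noise_precision * y_obs
    return weighted_sum / (prior_precision + noise_precision)

# Same numbers as the example: x ~ normal(0,2), y ~ normal(x,6), y = -2.
print(posterior_mean(-2.0, 0.0, 2.0, 6.0))
```

The exact answer is -2 · 4/(4 + 36) = -0.2, so a sampling-based estimate should land near that value.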
Bayesian inference on the 8-schools model:
```python
import pangolin as pg

# data for 8 schools model
num_schools = 8
observed_effects = [28, 8, -3, 7, -1, 1, 18, 12]
stddevs = [15, 10, 16, 11, 9, 11, 10, 18]

# define model
mu = pg.normal(0,10)                                              # μ ~ normal(0,10)
tau = pg.exp(pg.normal(5,1))                                      # τ ~ lognormal(5,1)
theta = [pg.normal(mu,tau) for i in range(num_schools)]           # θ[i] ~ normal(μ,τ)
y = [pg.normal(theta[i],stddevs[i]) for i in range(num_schools)]  # y[i] ~ normal(θ[i],stddevs[i])

# do inference / sample from p(theta | y=observed_effects)
theta_samps = pg.inference.numpyro.sample_flat(theta, y, observed_effects)

# plot results (no pangolin here!)
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
sns.swarmplot(np.array(theta_samps)[:,::50].T,s=2,zorder=0)
plt.xlabel('school')
plt.ylabel('treatment effect')
```
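If a numeric summary is preferred to a plot, something like the following sketch may help. It assumes the samples convert to an array of shape `(num_schools, num_samples)`, which is what the indexing in the plotting code suggests but is not confirmed here. The helper `summarize` is hypothetical, and the random array is only a stand-in for `theta_samps`, not real inference output.

```python
import numpy as np

def summarize(samples, lower=5, upper=95):
    """Per-variable mean and percentile interval.

    Assumes `samples` converts to an array shaped (num_vars, num_samples).
    """
    arr = np.asarray(samples, dtype=float)
    means = arr.mean(axis=1)
    lo, hi = np.percentile(arr, [lower, upper], axis=1)
    return means, lo, hi

# Stand-in draws for illustration only; replace with theta_samps.
rng = np.random.default_rng(0)
stand_in_samps = rng.normal(loc=0.0, scale=1.0, size=(8, 1000))

means, lo, hi = summarize(stand_in_samps)
for i, (m, l, h) in enumerate(zip(means, lo, hi)):
    print(f"school {i}: mean={m:.2f}, 90% interval=({l:.2f}, {h:.2f})")
```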
If you're in the market for a PPL, you might want to compare the above to the same (or close) model implemented in other PPLs:
PPL | Comment |
---|---|
Pyro | Requires "sample" statements, passing variable names as strings, and uses a slightly mysterious plate construct. |
NumPyro | Requires "sample" statements, passing variable names as strings, and uses a slightly mysterious plate construct. |
PyMC | Pretty good, though it requires creating a "model" function and passing variable names as strings. |
JAGS | Pretty good, both simple and explicit. We had this in 1991! Requires using a separate language. |
Stan | Looks very simple, but uses somewhat subtle batching semantics. Could be written similarly to the JAGS model, just with mandatory declarations of all types/shapes. Requires a separate language. |
TensorFlow Probability | Legend has it that some find this a wee bit complicated. |
For more examples, take a look at the demos.
(For the current Python interface)
- Gradual enhancement. Easy things should be really easy. More complex features should be easily discoverable. Steep learning curves should be avoided.
- Small API surface. The set of abstractions the user needs to learn should be as small as possible.
- Graceful interop. As much as possible, the system should feel like a natural part of the broader Python/NumPy ecosystem, rather than a "new language".
- Look like math. As much as possible, calculations should resemble mathematical notation. Exceptions are allowed when algorithmic limitations make this impossible or where standard mathematical notation is ambiguous or bad.
Long-term, Pangolin has the following goals:
- To "decouple" probabilistic models from inference algorithms. It should be possible to write a model once, and then perform inference using many inference "backends". (Among other things, this should facilitate benchmarks.)
- To make it easier to experiment with novel inference algorithms that inspect the target distribution.
- To support different possible interfaces, in different languages.
- To be "unopinionated" about how users might specify models, and how inference might be done.
An earlier version of Pangolin is also available. It is based on much the same ideas, but supports only JAGS as a backend. It can be found, along with documentation, in the pangolin-jags directory.