
Distribution Libraries

Mateo Velásquez-Giraldo edited this page Jun 1, 2022 · 5 revisions

This wiki page analyzes Python libraries for representing and manipulating probability distributions and compares them with anticipated HARK requirements.

Libraries

This is a list of Python libraries that represent probability distributions.

Features

This is a list of features we want for our representation of distributions in HARK.

  • Discretization. A flexible way of discretizing a continuous distribution. The resulting discretized object should keep a reference to its original continuous distribution, as well as to its discretization method and parameters, so that the discretization can be reproduced at will. See #1091
  • Exact match sampling. An alternative way of drawing from a distribution. Rather than sampling all N points IID from the distribution, enforce that the histogram of the N drawn points exactly matches the probability mass function of the discretized distribution. This 'exact match sample' will have the same mean as the original distribution, but a (very slightly) lower variance. See #937
  • Building Distributions. We would like to be able to build distributions from combinations of other distributions, as can be done with the bijectors in TensorFlow Probability or with similar capabilities in SymPy and Mathematica.
  • Expectation calculation. HARK already has some code for computing expectations, but it would be even better if this were provided by a library. If x and y are HARKDistribution objects and expr is an expression that is a function of x and y, we would like to write something like:
    • Expectation(expr, [x,y])
    • This will require the HARKDistribution class to carry the information needed to compute the expectation, such as whether x and y are multivariate distributions and whether each is an analytical continuous distribution or a discrete distribution.
    • Alternative syntax choices are possible; as much as possible, we want to use existing tools.
  • Marginal distributions from multivariate distributions. For a multivariate distribution, the ability to extract the marginal distribution of a subset of its variables, with the remaining variables integrated out. See #1114
  • Markov processes. Some models depend on representations of Markov processes, which are essentially conditional probability distributions. Ideally, this would be an extension of the Distribution representation. See #928
    • We already have some code that calculates an approximating Markov transition matrix for problems whose stochastic shocks are discrete distributions and whose state variables are continuous.
    • Ideally, this could be accomplished with syntax like the following:
      • matrix_object = Matricize(next_state_expr, [x, y], next_state_gridpoints)
      • next_state_expr is an expression describing how the distributions x and y map into the next state.
      • the next state need not be x and y (though it will be for a Markov process).
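A minimal sketch of the Discretization item, assuming a continuous distribution represented by the standard library's `statistics.NormalDist` as a stand-in for a HARK distribution. The names `discretize`, `DiscretizedDistribution`, and `rediscretize` are hypothetical, not part of HARK, and the midpoint-quantile rule is just one simple method chosen for brevity. The point is that the discrete object stores its source distribution, method name, and parameters, so the same approximation can be rebuilt on demand.

```python
from dataclasses import dataclass
from statistics import NormalDist


@dataclass(frozen=True)
class DiscretizedDistribution:
    """Hypothetical discretized distribution that remembers how it was built."""

    source: NormalDist  # the original continuous distribution
    method: str         # name of the discretization method
    params: dict        # parameters that were passed to that method
    atoms: tuple        # support points of the discrete approximation
    pmv: tuple          # probability mass on each atom

    def rediscretize(self):
        """Rebuild the discretization from the stored recipe."""
        return discretize(self.source, self.method, **self.params)


def discretize(dist, method="equiprobable", n=5):
    """Discretize `dist` into n atoms; only an equiprobable quantile rule is sketched."""
    if method != "equiprobable":
        raise ValueError(f"unknown method: {method}")
    # Place each atom at the midpoint quantile of n equal-probability bins.
    atoms = tuple(dist.inv_cdf((i + 0.5) / n) for i in range(n))
    pmv = tuple(1.0 / n for _ in range(n))
    return DiscretizedDistribution(dist, method, {"n": n}, atoms, pmv)
```

Because the recipe travels with the object, `d.rediscretize()` returns an object equal to `d`; a fuller version would presumably dispatch to several methods through the same `method` field.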
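The Exact match sampling item can be sketched as follows, assuming every n * p_i is an integer (a real implementation would need a rounding rule when it is not). `exact_match_sample` is a hypothetical helper: it repeats each atom exactly n * p_i times and then shuffles, so the histogram carries no sampling noise and the sample mean equals the mean of the discretized distribution exactly.

```python
import random


def exact_match_sample(atoms, pmv, n, seed=None):
    """Draw n points whose histogram matches pmv exactly (requires n * p to be integer)."""
    counts = []
    for p in pmv:
        c = round(n * p)
        if abs(c - n * p) > 1e-9:
            raise ValueError("n * p must be an integer for every atom")
        counts.append(c)
    # Repeat each atom exactly as many times as its probability mass demands.
    draws = [a for a, c in zip(atoms, counts) for _ in range(c)]
    # Shuffle so the order looks random while the histogram stays exact.
    random.Random(seed).shuffle(draws)
    return draws
```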
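For Building Distributions, the sketch below loosely follows the bijector idea from TensorFlow Probability: a new distribution is defined by pushing a base distribution through an invertible map g, and its density follows from the change-of-variables formula p_Y(y) = p_X(g^-1(y)) * |d g^-1(y) / dy|. `TransformedDistribution` is a hypothetical class, not TensorFlow Probability's API, and `statistics.NormalDist` stands in for the base distribution. The example builds a lognormal as exp of a standard normal.

```python
import math
from statistics import NormalDist


class TransformedDistribution:
    """Hypothetical bijector-style distribution: Y = forward(X) for an invertible map."""

    def __init__(self, base, forward, inverse, inverse_log_det_jacobian):
        self.base = base
        self.forward = forward
        self.inverse = inverse
        # log |d inverse(y) / dy|, needed for the change-of-variables formula
        self.inverse_log_det_jacobian = inverse_log_det_jacobian

    def pdf(self, y):
        # Change of variables: p_Y(y) = p_X(inverse(y)) * |d inverse(y) / dy|
        x = self.inverse(y)
        return self.base.pdf(x) * math.exp(self.inverse_log_det_jacobian(y))

    def samples(self, n, seed=None):
        # Push draws from the base distribution through the forward map.
        return [self.forward(x) for x in self.base.samples(n, seed=seed)]


# A lognormal built as exp(Normal): the inverse is log, and d log(y) / dy = 1 / y.
lognormal = TransformedDistribution(
    NormalDist(0.0, 1.0),
    forward=math.exp,
    inverse=math.log,
    inverse_log_det_jacobian=lambda y: -math.log(y),
)
```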
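One way the `Expectation(expr, [x, y])` idea could work in the simplest case is sketched below, assuming x and y are independent discrete distributions given as atoms and probabilities. `DiscreteDistribution` and `expectation` are hypothetical stand-ins, not HARK's existing expectation code. The function enumerates every combination of atoms, weights `expr` at each combination by the product of the probabilities, and sums. Correlated, multivariate, or continuous inputs would need the extra metadata described in the Expectation calculation item.

```python
import itertools
import math


class DiscreteDistribution:
    """Hypothetical stand-in for a discrete distribution (atoms plus probabilities)."""

    def __init__(self, atoms, pmv):
        self.atoms = list(atoms)
        self.pmv = list(pmv)


def expectation(expr, dists):
    """E[expr(x, y, ...)] for independent discrete distributions, by full enumeration."""
    total = 0.0
    # Every combination of atoms; independence means the joint mass is a product.
    for combo in itertools.product(*(zip(d.atoms, d.pmv) for d in dists)):
        values = [a for a, _ in combo]
        prob = math.prod(p for _, p in combo)
        total += prob * expr(*values)
    return total
```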
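For the Marginal distributions item, the discrete case reduces to summing probability mass over the variables being dropped. The sketch assumes a joint distribution stored as a dict mapping outcome tuples to probabilities; `marginal` is an illustrative helper, not a HARK function. For a multivariate normal the marginal is analytic instead: keep the corresponding entries of the mean vector and the corresponding sub-block of the covariance matrix.

```python
from collections import defaultdict


def marginal(joint, keep):
    """Marginalize a discrete joint distribution {(x0, x1, ...): p} onto the axes in `keep`."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        # Outcomes that agree on the kept axes pool their probability mass.
        key = tuple(outcome[i] for i in keep)
        out[key] += p
    return dict(out)
```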
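A minimal sketch of the `Matricize` idea, assuming independent discrete shocks and a single continuous state. Unlike the pseudocode signature in the Markov processes item, this hypothetical `matricize` takes one `grid` that serves as both the current-state and next-state grid. It assigns each off-grid next state to its two neighbouring grid points with linear-interpolation weights (a common "lottery" approach, not necessarily what HARK's existing code does) and clips next states that fall outside the grid to its endpoints.

```python
import itertools
import math
from bisect import bisect_right


def matricize(next_state_expr, shocks, grid):
    """Transition matrix T[i][j] ~ P(s' = grid[j] | s = grid[i]).

    shocks: list of (atoms, pmv) pairs for independent discrete shocks.
    grid: sorted state grid with at least two points.
    """
    n = len(grid)
    T = [[0.0] * n for _ in range(n)]
    joint = list(itertools.product(*(list(zip(a, p)) for a, p in shocks)))
    for i, s in enumerate(grid):
        for combo in joint:
            values = [v for v, _ in combo]
            prob = math.prod(p for _, p in combo)
            s_next = next_state_expr(s, *values)
            # Clip to the grid, then find the bracketing interval [grid[j], grid[j + 1]].
            s_next = min(max(s_next, grid[0]), grid[-1])
            j = min(bisect_right(grid, s_next) - 1, n - 2)
            w = (s_next - grid[j]) / (grid[j + 1] - grid[j])
            # Split the mass between the two neighbours in proportion to distance.
            T[i][j] += prob * (1.0 - w)
            T[i][j + 1] += prob * w
    return T
```

Each row sums to one, and because the weights are linear, the expected next state implied by each row matches the expected value of `next_state_expr` whenever no clipping occurs.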

Notes on the review of the libraries

@Mv77: A quick look over the libraries that have been put forward suggests that the statistics community does not use the term "discretization", or at least not with the meaning we give it here. I think the analogue to what we are looking for is "quadrature".
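To illustrate the analogy, an n-point Gauss-Hermite rule is itself a discrete approximation of a normal distribution: its nodes and normalized weights act as atoms and probabilities that reproduce every moment up to order 2n - 1 exactly. The sketch assumes NumPy is available and uses `numpy.polynomial.hermite_e.hermegauss`, which returns nodes and weights for the weight function exp(-x^2/2); `gauss_hermite_normal` is a hypothetical helper name.

```python
from numpy.polynomial.hermite_e import hermegauss


def gauss_hermite_normal(mu, sigma, n):
    """Discretize N(mu, sigma^2) into n atoms using Gauss-Hermite quadrature."""
    # hermegauss integrates against exp(-x^2 / 2), the standard normal kernel
    # up to a constant; its weights sum to sqrt(2 * pi), so normalize them.
    z, w = hermegauss(n)
    atoms = mu + sigma * z
    pmv = w / w.sum()
    return atoms, pmv
```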