
Add validation data support to flow fitting #3

Merged
11 commits merged on Nov 9, 2023
8 changes: 0 additions & 8 deletions .idea/.gitignore

This file was deleted.

95 changes: 48 additions & 47 deletions README.md
@@ -1,18 +1,18 @@
# Normalizing flows in PyTorch

This package implements normalizing flows and their building blocks.
The package is meant for researchers, enabling:
It allows:

* easy use of normalizing flows as generative models or density estimators in various applications;
* systematic comparisons of normalizing flows or their building blocks;
* simple implementation of new normalizing flows which belong to either the autoregressive, residual, or continuous
families;
* easy use of normalizing flows as trainable distributions;
* easy implementation of new normalizing flows.

Example use:

```python
import torch
from normalizing_flows import RealNVP, Flow
from normalizing_flows import Flow
from normalizing_flows.architectures import RealNVP


torch.manual_seed(0)

@@ -53,54 +53,55 @@ We support Python versions 3.7 and upwards.

## Brief background

A normalizing flow (NF) is a flexible distribution, defined as a bijective transformation of a simple statistical
distribution.
The simple distribution is typically a standard Gaussian.
The transformation is typically an invertible neural network that can make the NF arbitrarily complex.
Training a NF using a dataset means optimizing the parameters of the transformation to make the dataset likely under the NF.
A normalizing flow (NF) is a flexible trainable distribution.
It is defined as a bijective transformation of a simple distribution, such as a standard Gaussian.
The bijection is typically an invertible neural network.
Training a NF using a dataset means optimizing the bijection's parameters to make the dataset likely under the NF.
We can use a NF to compute the probability of a data point or to independently sample data from the process that
generated our dataset.

A NF $q(x)$ with the bijection $f(z) = x$ and base distribution $p(z)$ is defined as:
$$\log q(x) = \log p(f^{-1}(x)) + \log\left|\det J_{f^{-1}}(x)\right|$$

## Implemented architectures

We implement the following NF transformations:

| Bijection | Inverse | Log determinant | Inverse implemented |
|---------------------------------------------------------------------|:-----------:|:-----------------------:|:-------------------:|
| [NICE](http://arxiv.org/abs/1410.8516) | Exact | Exact | Yes |
| [Real NVP](http://arxiv.org/abs/1605.08803) | Exact | Exact | Yes |
| [MAF](http://arxiv.org/abs/1705.07057) | Exact | Exact | Yes |
| [IAF](http://arxiv.org/abs/1606.04934) | Exact | Exact | Yes |
| [Rational quadratic NSF](http://arxiv.org/abs/1906.04032) | Exact | Exact | Yes |
| [Linear rational NSF](http://arxiv.org/abs/2001.05168) | Exact | Exact | Yes |
| [NAF](http://arxiv.org/abs/1804.00779) | | | |
| [Block NAF](http://arxiv.org/abs/1904.04676) | | | |
| [UMNN](http://arxiv.org/abs/1908.05164) | Approximate | Exact | No |
| [Planar](https://onlinelibrary.wiley.com/doi/abs/10.1002/cpa.21423) | Approximate | Exact | No |
| [Radial](https://proceedings.mlr.press/v37/rezende15.html) | Approximate | Exact | No |
| [Sylvester](http://arxiv.org/abs/1803.05649) | Approximate | Exact | No |
| [Invertible ResNet](http://arxiv.org/abs/1811.00995) | Approximate | Biased approximation | Yes |
| [ResFlow](http://arxiv.org/abs/1906.02735) | Approximate | Unbiased approximation | Yes |
| [Proximal ResFlow](http://arxiv.org/abs/2211.17158) | Approximate | Exact (if single layer) | Yes |
| [FFJORD](http://arxiv.org/abs/1810.01367) | Approximate | Approximate | Yes |
| [RNODE](http://arxiv.org/abs/2002.02798) | Approximate | Approximate | Yes |
| [DDNF](http://arxiv.org/abs/1810.03256) | Approximate | Approximate | Yes |
| [OT flow](http://arxiv.org/abs/2006.00104) | Approximate | Exact | Yes |

Note: inverse approximations can be made arbitrarily accurate with stricter convergence conditions.
Architectures without an implemented inverse support either sampling or density estimation, but not both at once.
Such architectures are unsuitable for downstream tasks which require both functionalities.

We also implement simple bijections that can be used in the same manner:
The density of a NF $q(x)$ with the bijection $f(z) = x$ and base distribution $p(z)$ is defined as:
$$\log q(x) = \log p(f^{-1}(x)) + \log\left|\det J_{f^{-1}}(x)\right|.$$
Sampling from a NF means sampling from the simple distribution and transforming the sample using the bijection.
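
As an illustration of the formula above, the following standalone sketch (plain PyTorch, not this package's API) computes the log density and draws samples for a one-dimensional affine bijection $f(z) = az + b$ with a standard Gaussian base distribution:

```python
import math
import torch

# Toy illustration of the change-of-variables formula above:
# f(z) = a * z + b with a standard Gaussian base distribution p(z).
a, b = torch.tensor(2.0), torch.tensor(1.0)

def log_q(x):
    z = (x - b) / a                                  # f^{-1}(x)
    log_p = -0.5 * (z ** 2 + math.log(2 * math.pi))  # log p(f^{-1}(x))
    log_det = -torch.log(torch.abs(a))               # log|det J_{f^{-1}}(x)| = -log|a|
    return log_p + log_det

# Sampling: draw z ~ p(z), then push it through f.
z = torch.randn(5)
x = a * z + b
print(log_q(x))  # matches torch.distributions.Normal(b, a).log_prob(x)
```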

## Supported architectures

We list supported NF architectures below.
We classify architectures as either autoregressive, residual, or continuous, as defined
by [Papamakarios et al. (2021)](https://arxiv.org/abs/1912.02762).
Exact architectures do not use numerical approximations to generate data or compute the log density.

| Architecture | Bijection type | Exact | Two-way |
|--------------------------------------------------------------------------|:--------------------------:|:-------:|:-------:|
| [NICE](http://arxiv.org/abs/1410.8516) | Autoregressive | ✔ | ✔ |
| [Real NVP](http://arxiv.org/abs/1605.08803) | Autoregressive | ✔ | ✔ |
| [MAF](http://arxiv.org/abs/1705.07057) | Autoregressive | ✔ | ✔ |
| [IAF](http://arxiv.org/abs/1606.04934) | Autoregressive | ✔ | ✔ |
| [Rational quadratic NSF](http://arxiv.org/abs/1906.04032) | Autoregressive | ✔ | ✔ |
| [Linear rational NSF](http://arxiv.org/abs/2001.05168) | Autoregressive | ✔ | ✔ |
| [NAF](http://arxiv.org/abs/1804.00779) | Autoregressive | ✗ | ✔ |
| [UMNN](http://arxiv.org/abs/1908.05164) | Autoregressive | ✗ | ✔ |
| [Planar](https://onlinelibrary.wiley.com/doi/abs/10.1002/cpa.21423) | Residual | ✗ | ✗ |
| [Radial](https://proceedings.mlr.press/v37/rezende15.html) | Residual | ✗ | ✗ |
| [Sylvester](http://arxiv.org/abs/1803.05649) | Residual | ✗ | ✗ |
| [Invertible ResNet](http://arxiv.org/abs/1811.00995) | Residual | ✗ | ✔* |
| [ResFlow](http://arxiv.org/abs/1906.02735) | Residual | ✗ | ✔* |
| [Proximal ResFlow](http://arxiv.org/abs/2211.17158) | Residual | ✗ | ✔* |
| [FFJORD](http://arxiv.org/abs/1810.01367) | Continuous | ✗ | ✔* |
| [RNODE](http://arxiv.org/abs/2002.02798) | Continuous | ✗ | ✔* |
| [DDNF](http://arxiv.org/abs/1810.03256) | Continuous | ✗ | ✔* |
| [OT flow](http://arxiv.org/abs/2006.00104) | Continuous | ✗ | ✔ |

Two-way architectures support both sampling and density estimation.
Those marked with an asterisk (*) rely on a numerical approximation for sampling or density estimation.
One-way architectures support either sampling or density estimation, but not both at once.
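
For example, a two-way architecture such as Real NVP can be used in both directions. The sketch below assumes the `Flow` interface shown in the usage example above (`fit`, `log_prob`, `sample`); the exact method names and signatures are assumptions and may differ:

```python
import torch
from normalizing_flows import Flow
from normalizing_flows.architectures import RealNVP

torch.manual_seed(0)

event_shape = (10,)
x_train = torch.randn(1000, *event_shape)

# Wrap a two-way bijection in a Flow object, then use both directions.
# Method names (fit, log_prob, sample) are assumed from the example files.
flow = Flow(RealNVP(event_shape))
flow.fit(x_train)

log_density = flow.log_prob(x_train)  # density estimation
samples = flow.sample(100)            # sampling
```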

We also support simple bijections (all exact and two-way):

* Permutation
* Elementwise translation (shift vector)
* Elementwise scaling (diagonal matrix)
* Rotation (orthogonal matrix)
* Triangular matrix
* Dense matrix (using the QR or LU decomposition)

All of these have exact inverses and log determinants.
2 changes: 1 addition & 1 deletion examples/Computing log determinants.md
@@ -7,7 +7,7 @@ The code is as follows:
```python
import torch
from normalizing_flows import Flow
from normalizing_flows.bijections import RealNVP
from normalizing_flows.architectures import RealNVP

torch.manual_seed(0)

2 changes: 1 addition & 1 deletion examples/Modifying architectures.md
@@ -4,7 +4,7 @@ We give an example of how to modify a bijection's architecture.
We use the Masked Autoregressive Flow (MAF) as an example.
We can manually set the number of invertible layers as follows:
```python
from normalizing_flows.bijections import MAF
from normalizing_flows.architectures import MAF

event_shape = (10,)
flow = MAF(event_shape=event_shape, n_layers=5)
2 changes: 1 addition & 1 deletion examples/Training a normalizing flow.md
@@ -7,7 +7,7 @@ The code is as follows:
```python
import torch
from normalizing_flows import Flow
from normalizing_flows.bijections import RealNVP
from normalizing_flows.architectures import RealNVP

torch.manual_seed(0)

20 changes: 20 additions & 0 deletions normalizing_flows/architectures.py
@@ -0,0 +1,20 @@
from normalizing_flows.bijections.finite.autoregressive.architectures import (
NICE,
RealNVP,
MAF,
IAF,
CouplingRQNSF,
MaskedAutoregressiveRQNSF,
InverseAutoregressiveRQNSF,
CouplingLRS,
MaskedAutoregressiveLRS,
CouplingDSF,
UMNNMAF
)

from normalizing_flows.bijections.continuous.ddnf import DeepDiffeomorphicBijection
from normalizing_flows.bijections.continuous.rnode import RNODE
from normalizing_flows.bijections.continuous.ffjord import FFJORD
from normalizing_flows.bijections.continuous.otflow import OTFlow

from normalizing_flows.bijections.finite.residual.architectures import ResFlow, ProximalResFlow, InvertibleResNet
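
With this module in place, the architecture classes listed in the README table can be imported from a single location, for example:

```python
# Classes re-exported by normalizing_flows/architectures.py above.
from normalizing_flows.architectures import RealNVP, MAF, ResFlow, FFJORD
```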
4 changes: 2 additions & 2 deletions normalizing_flows/bijections/continuous/base.py
@@ -159,7 +159,7 @@ def forward(self, t, states):
y = states[0]
self._n_evals += 1

t = torch.tensor(t).type_as(y)
t = torch.as_tensor(t).type_as(y)

with torch.enable_grad():
y.requires_grad_(True)
@@ -198,7 +198,7 @@ def forward(self, t, states):
y = states[0]
self._n_evals += 1

t = torch.tensor(t).type_as(y)
t = torch.as_tensor(t).type_as(y)

if self.hutch_noise is None:
self.hutch_noise = torch.randn_like(y)
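
The switch from `torch.tensor(t)` to `torch.as_tensor(t)` avoids constructing a new tensor (and the associated copy warning) when `t` is already a tensor, as it typically is when the ODE solver calls `forward`. A small standalone check of this behaviour:

```python
import torch

t = torch.tensor(0.5)

# torch.as_tensor reuses an existing tensor when dtype and device already match,
# whereas torch.tensor always builds a new one (and warns when given a tensor).
print(torch.as_tensor(t) is t)     # True: no copy made
print(torch.as_tensor(0.5).dtype)  # plain Python floats are still converted
```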