From a489c4624512e5480f167211f4cf419c517bb7fb Mon Sep 17 00:00:00 2001
From: David Nabergoj
Date: Wed, 14 Aug 2024 13:35:38 +0200
Subject: [PATCH] Update docs

---
 README.md                                     | 66 ++-----------------
 .../source/guides/mathematical_background.rst | 38 ++++++++++++++
 docs/source/guides/usage.rst                  |  3 +-
 docs/source/index.rst                         |  3 +
 4 files changed, 50 insertions(+), 60 deletions(-)
 create mode 100644 docs/source/guides/mathematical_background.rst

diff --git a/README.md b/README.md
index 3fe5c48..016d1bd 100644
--- a/README.md
+++ b/README.md
@@ -30,16 +30,20 @@
 print(log_prob.shape)  # (100,)
 print(x_new.shape)  # (50, 3)
 ```
 
-We provide more examples [here](examples/).
+Check the [documentation](https://torchflows.readthedocs.io/en/latest/) for tutorials and the list of supported architectures.
+We also provide examples [here](examples/).
 
 ## Installing
 
-Install via pip:
+We support Python versions 3.7 and upwards.
+
+Install Torchflows via pip:
+
 ```
 pip install torchflows
 ```
 
-Install the package directly from Github:
+Install Torchflows directly from Github:
 ```
 pip install git+https://github.com/davidnabergoj/torchflows.git
@@ -53,59 +57,3 @@
 cd torchflows
 pip install -r requirements.txt
 ```
-We support Python versions 3.7 and upwards.
-
-## Brief background
-
-A normalizing flow (NF) is a flexible trainable distribution.
-It is defined as a bijective transformation of a simple distribution, such as a standard Gaussian.
-The bijection is typically an invertible neural network.
-Training a NF using a dataset means optimizing the bijection's parameters to make the dataset likely under the NF.
-We can use a NF to compute the probability of a data point or to independently sample data from the process that
-generated our dataset.
-
-The density of a NF $q(x)$ with the bijection $f(z) = x$ and base distribution $p(z)$ is defined as:
-$$\log q(x) = \log p(f^{-1}(x)) + \log\left|\det J_{f^{-1}}(x)\right|.$$
-Sampling from a NF means sampling from the simple distribution and transforming the sample using the bijection.
-
-## Supported architectures
-
-We list supported NF architectures below.
-We classify architectures as either autoregressive, residual, or continuous; as defined
-by [Papamakarios et al. (2021)](https://arxiv.org/abs/1912.02762).
-We specify whether the forward and inverse passes are exact; otherwise they are numerical or not implemented (Planar,
-Radial, and Sylvester flows).
-An exact forward pass guarantees exact density estimation, whereas an exact inverse pass guarantees exact sampling.
-Note that the directions can always be reversed, which enables exact computation for the opposite task.
-We also specify whether the logarithm of the Jacobian determinant of the transformation is exact or computed numerically.
-
-| Architecture | Bijection type | Exact forward | Exact inverse | Exact log determinant |
-|--------------------------------------------------------------------------|:--------------------------:|:---------------:|:-------------:|:---------------------:|
-| [NICE](http://arxiv.org/abs/1410.8516) | Autoregressive | ✔ | ✔ | ✔ |
-| [Real NVP](http://arxiv.org/abs/1605.08803) | Autoregressive | ✔ | ✔ | ✔ |
-| [MAF](http://arxiv.org/abs/1705.07057) | Autoregressive | ✔ | ✔ | ✔ |
-| [IAF](http://arxiv.org/abs/1606.04934) | Autoregressive | ✔ | ✔ | ✔ |
-| [Rational quadratic NSF](http://arxiv.org/abs/1906.04032) | Autoregressive | ✔ | ✔ | ✔ |
-| [Linear rational NSF](http://arxiv.org/abs/2001.05168) | Autoregressive | ✔ | ✔ | ✔ |
-| [NAF](http://arxiv.org/abs/1804.00779) | Autoregressive | ✔ | ✗ | ✔ |
-| [UMNN](http://arxiv.org/abs/1908.05164) | Autoregressive | ✗ | ✗ | ✔ |
-| [Planar](https://onlinelibrary.wiley.com/doi/abs/10.1002/cpa.21423) | Residual | ✔ | ✗ | ✔ |
-| [Radial](https://proceedings.mlr.press/v37/rezende15.html) | Residual | ✔ | ✗ | ✔ |
-| [Sylvester](http://arxiv.org/abs/1803.05649) | Residual | ✔ | ✗ | ✔ |
-| [Invertible ResNet](http://arxiv.org/abs/1811.00995) | Residual | ✔ | ✗ | ✗ |
-| [ResFlow](http://arxiv.org/abs/1906.02735) | Residual | ✔ | ✗ | ✗ |
-| [Proximal ResFlow](http://arxiv.org/abs/2211.17158) | Residual | ✔ | ✗ | ✗ |
-| [FFJORD](http://arxiv.org/abs/1810.01367) | Continuous | ✗ | ✗ | ✗ |
-| [RNODE](http://arxiv.org/abs/2002.02798) | Continuous | ✗ | ✗ | ✗ |
-| [DDNF](http://arxiv.org/abs/1810.03256) | Continuous | ✗ | ✗ | ✗ |
-| [OT flow](http://arxiv.org/abs/2006.00104) | Continuous | ✗ | ✗ | ✗ |
-
-
-We also support simple bijections (all with exact forward passes, inverse passes, and log determinants):
-
-* Permutation
-* Elementwise translation (shift vector)
-* Elementwise scaling (diagonal matrix)
-* Rotation (orthogonal matrix)
-* Triangular matrix
-* Dense matrix (using the QR or LU decomposition)
diff --git a/docs/source/guides/mathematical_background.rst b/docs/source/guides/mathematical_background.rst
new file mode 100644
index 0000000..cb4dd94
--- /dev/null
+++ b/docs/source/guides/mathematical_background.rst
@@ -0,0 +1,38 @@
+What is a normalizing flow
+==========================
+
+A normalizing flow (NF) is a flexible trainable distribution.
+It is defined as a bijective transformation of a simple distribution, such as a standard Gaussian.
+The bijection is typically an invertible neural network.
+Training a NF using a dataset means optimizing the bijection's parameters to make the dataset likely under the NF.
+We can use a NF to compute the probability of a data point or to independently sample data from the process that
+generated our dataset.
+
+The density of a NF :math:`q(x)` with the bijection :math:`f(z) = x` and base distribution :math:`p(z)` is defined as:
+
+.. math::
+    \log q(x) = \log p(f^{-1}(x)) + \log\left|\det J_{f^{-1}}(x)\right|.
+
+Sampling from a NF means sampling from the simple distribution and transforming the sample using the bijection.
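+
+The following sketch shows both operations in code, mirroring the quickstart example from the README.
+The import paths and names below (``Flow``, ``RealNVP``) follow that example and are illustrative rather than a definitive API reference; see :doc:`basic_usage` for details.
+
+.. code-block:: python
+
+    import torch
+    from torchflows import Flow
+    from torchflows.architectures import RealNVP
+
+    # 100 training points from a 3-dimensional data-generating process.
+    x = torch.randn(100, 3)
+
+    # A flow pairs a base distribution with a trainable bijection (Real NVP here).
+    flow = Flow(RealNVP(event_shape=(3,)))
+    flow.fit(x)
+
+    # Density estimation: log q(x) via the change-of-variables formula above.
+    log_prob = flow.log_prob(x)  # shape: (100,)
+
+    # Sampling: draw z from the base distribution and compute x = f(z).
+    x_new = flow.sample(50)  # shape: (50, 3)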
diff --git a/docs/source/guides/usage.rst b/docs/source/guides/usage.rst
index fd8cd2c..9b91ff6 100644
--- a/docs/source/guides/usage.rst
+++ b/docs/source/guides/usage.rst
@@ -5,7 +5,8 @@
 We provide tutorials and notebooks for typical Torchflows use cases.
 
 .. toctree::
+    mathematical_background
     basic_usage
     event_shapes
     image_modeling
-    choosing_base_distributions
\ No newline at end of file
+    choosing_base_distributions
diff --git a/docs/source/index.rst b/docs/source/index.rst
index bb5696f..606b626 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -12,6 +12,9 @@ It implements many normalizing flow architectures and their building blocks for:
 * easy use of normalizing flows as trainable distributions;
 * easy implementation of new normalizing flows.
 
+Torchflows is structured according to the review paper `Normalizing Flows for Probabilistic Modeling and Inference <https://arxiv.org/abs/1912.02762>`_ by Papamakarios et al. (2021), which classifies flow architectures as autoregressive, residual, or continuous.
+Visit the `Github page <https://github.com/davidnabergoj/torchflows>`_ to keep up to date, and post questions or issues `here <https://github.com/davidnabergoj/torchflows/issues>`_.
+
 Installing
 ---------------
 Torchflows can be installed easily using pip: