diff --git a/docs/source/guide/guide.md b/docs/source/guide/guide.md
index 40528efa7..d75a3571c 100644
--- a/docs/source/guide/guide.md
+++ b/docs/source/guide/guide.md
@@ -8,11 +8,12 @@ core-concepts.md
 zne.md
 pec.md
 cdr.md
-shadows.md
 ddd.md
+lre.md
 rem.md
 qse.md
 pt.md
+shadows.md
 error-mitigation.md
 glossary.md
 ```
diff --git a/docs/source/guide/lre-5-theory.md b/docs/source/guide/lre-5-theory.md
new file mode 100644
index 000000000..0bf40c810
--- /dev/null
+++ b/docs/source/guide/lre-5-theory.md
@@ -0,0 +1,87 @@
+---
+jupytext:
+  text_representation:
+    extension: .md
+    format_name: myst
+    format_version: 0.13
+    jupytext_version: 1.11.4
+kernelspec:
+  display_name: Python 3
+  language: python
+  name: python3
+---
+
+
+```{warning}
+The user guide for LRE in Mitiq is currently under construction.
+```
+
+# What is the theory behind LRE?
+
+Layerwise Richardson Extrapolation (LRE), an error mitigation technique introduced in
+{cite}`Russo_2024_LRE`, extends the ideas found in ZNE by allowing users to create multiple noise-scaled variations of the input
+circuit such that the noiseless expectation value is extrapolated from the results of executing each
+noisy circuit.
+
+Similar to [ZNE](zne.md), this process works in two steps:
+
+- **Step 1:** Intentionally create multiple noise-scaled but logically equivalent circuits by scaling each layer or chunk of the input circuit through unitary folding.
+
+- **Step 2:** Extrapolate to the noiseless limit using multivariate Richardson extrapolation.
+
+LRE leverages the flexible configuration space of layerwise unitary folding,
+allowing for a more nuanced mitigation of errors by treating the noise level of each layer of
+the quantum circuit as an independent variable.
+
+## Step 1: Intentionally create multiple noise-scaled but logically equivalent circuits
+
+The goal is to create noise-scaled circuits of different depths where the layers in each circuit are scaled in
+a specific pattern as a result of unitary folding. This pattern is often described by the vector of scale factor vectors
+generated by the fold multiplier and the chosen degree for the multivariate Richardson extrapolation polynomial. For more information
+on unitary folding, go to [What is the theory behind ZNE?](zne-5-theory.md).
+
+Suppose we're interested in the value of some observable in an $n$-qubit circuit with $l$ layers.
+
+Each layer can have a different scale factor and we can create $M$ such variations of the scaled circuit. Let $\{λ_1, λ_2, λ_3, \ldots, λ_M\}$ be the scale factor vectors used to create the noise-scaled circuits $\{C_{λ_1}, C_{λ_2}, C_{λ_3}, \ldots, C_{λ_M}\}$, where each vector $λ_i = ({λ^1}_i, {λ^2}_i, {λ^3}_i, \ldots, {λ^l}_i)^T$ defines the scale factors for the $l$ layers of the input circuit.
+
+If $d$ is the chosen degree of our multivariate polynomial, $M_j(λ_i, d)$ denotes the $j$-th monomial term of the polynomial, with the terms arranged in increasing order of degree, evaluated at $λ_i$. In general, the number of monomial terms in $l$ variables up to degree $d$ can be determined through the [stars and bars method](https://en.wikipedia.org/wiki/Stars_and_bars_%28combinatorics%29).
+
+$$
+\text{total number of terms in the monomial basis with max degree } d = \binom{d + l}{d}
+$$
+
+$$
+\text{number of terms in the monomial basis with total degree } d = \binom{d + l - 1}{d}
+$$
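+
+As a small worked example of these counts (the values $l = 2$ and $d = 2$ are chosen here purely for illustration), write $x$ and $y$ for the scale factors of the two layers. One possible ordering of the basis by increasing degree is shown below; the ordering of terms within the same degree is an assumption of this example.
+
+$$
+\binom{2 + 2}{2} = 6: \quad \{1,\; x,\; y,\; x^2,\; xy,\; y^2\}, \qquad \binom{2 + 2 - 1}{2} = 3: \quad \{x^2,\; xy,\; y^2\}
+$$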
+
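+Evaluating the $N = \binom{d + l}{d}$ monomial terms at each scale factor vector in the set $\Lambda = \{λ_1, \ldots, λ_N\}$ defines the rows of the square sample matrix, which requires $M = N$ scaled circuits, as shown below: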
+
+$$
+\mathbf{A}(\Lambda, d) = 
+\begin{bmatrix}
+    M_1(λ_1, d) & M_2(λ_1, d) & \cdots & M_N(λ_1, d) \\
+    M_1(λ_2, d) & M_2(λ_2, d) & \cdots & M_N(λ_2, d) \\
+    \vdots & \vdots & \ddots & \vdots \\
+    M_1(λ_N, d) & M_2(λ_N, d) & \cdots & M_N(λ_N, d)
+\end{bmatrix}
+$$
+
+Each monomial term in the sample matrix $\mathbf{A}$ is evaluated using the values in the scale factor vectors. In Step 2, we aim to define $O_{\mathrm{LRE}}$ as a linear combination of the noisy expectation values.
+
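+Finding the coefficients in the linear combination becomes a problem solvable through a system of linear equations $\mathbf{A} c = z$ where $c = (c_1, c_2, \ldots, c_N)^T$ is the vector of polynomial coefficients, $z$ is the vector of the noisy expectation values and $\mathbf{A}$ is the sample matrix evaluated using the values in the scale factor vectors. Since every monomial other than the constant $M_1 = 1$ vanishes at zero noise, the extrapolated value is $c_1 = \sum_i \eta_i z_i$, where the linear combination coefficients $\eta_i = (\mathbf{A}^{-1})_{1i}$ form the first row of $\mathbf{A}^{-1}$.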
+
+## Step 2: Extrapolate to the noiseless limit
+
+Each noise-scaled circuit $C_{λ_i}$ has an expectation value $\langle O(λ_i) \rangle$ associated with it such that we can define a vector of the noisy expectation values $z = (\langle O(λ_1) \rangle, \langle O(λ_2) \rangle, \ldots, \langle O(λ_M)\rangle)^T$. The mitigated expectation value is a linear combination of these noisy expectation values with coefficients $\eta_i$:
+
+$$
+O_{\mathrm{LRE}} = \sum_{i=1}^{M} \eta_i \langle O(λ_i) \rangle.
+$$
+
+Solving the system of linear equations yields the coefficients $\eta_i$. As we only need the noiseless expectation value, we can skip solving the full system if we use the [Lagrange interpolation formula](https://files.eric.ed.gov/fulltext/EJ1231189.pdf) evaluated at $λ = 0$:
+
+$$
+O_{\mathrm{LRE}} = \sum_{i=1}^M \langle O(\boldsymbol{\lambda}_i) \rangle \frac{\det \left(\mathbf{M}_i (\boldsymbol{0}) \right)}{\det \left(\mathbf{A}\right)}.
+$$
+
+To get the matrix $\mathbf{M}_i(\mathbf{0})$, replace the $i$-th row of the sample matrix $\mathbf{A}$ by $\mathbf{e}_1^T = (1, 0, \ldots, 0)$. This is the row of monomial terms evaluated at $λ = 0$, where $M_1(0, d) = 1$ and all the other monomial terms are zero.
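+
+As a minimal worked example of this formula (a single layer, $l = 1$, with degree $d = 1$ and scale factors $λ_1 = 1$ and $λ_2 = 3$, all chosen here purely for illustration), the monomial basis is $\{1, λ\}$ and the matrices are:
+
+$$
+\mathbf{A} = \begin{bmatrix} 1 & 1 \\ 1 & 3 \end{bmatrix}, \quad
+\mathbf{M}_1(\mathbf{0}) = \begin{bmatrix} 1 & 0 \\ 1 & 3 \end{bmatrix}, \quad
+\mathbf{M}_2(\mathbf{0}) = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}
+$$
+
+With $\det(\mathbf{A}) = 2$, $\det(\mathbf{M}_1(\mathbf{0})) = 3$ and $\det(\mathbf{M}_2(\mathbf{0})) = -1$, the coefficients sum to one and the result matches ordinary linear Richardson extrapolation through the two points:
+
+$$
+O_{\mathrm{LRE}} = \frac{3}{2} \langle O(1) \rangle - \frac{1}{2} \langle O(3) \rangle
+$$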
diff --git a/docs/source/guide/lre.md b/docs/source/guide/lre.md
new file mode 100644
index 000000000..410bef69c
--- /dev/null
+++ b/docs/source/guide/lre.md
@@ -0,0 +1,23 @@
+
+```{warning}
+The user guide for LRE in Mitiq is currently under construction.
+```
+
+# Layerwise Richardson Extrapolation
+
+Layerwise Richardson Extrapolation (LRE), an error mitigation technique introduced in
+{cite}`Russo_2024_LRE`, works by creating multiple noise-scaled variations of the input
+circuit such that the noiseless expectation value is extrapolated from the execution of each
+noisy circuit (see the section [What is the theory behind LRE?](lre-5-theory.md)). Compared to
+Zero-Noise Extrapolation, this technique treats the noise in each layer of the circuit
+as an independent variable to be scaled and then extrapolated independently.
+
+
+You can get started with LRE in Mitiq with the following sections of the user guide:
+
+```{toctree}
+---
+maxdepth: 1
+---
+lre-5-theory.md
+```
\ No newline at end of file