From d2d1496d582fc14dbd277cc9da1b23a1781b78d5 Mon Sep 17 00:00:00 2001
From: H
Date: Thu, 10 Mar 2022 21:48:38 -0800
Subject: [PATCH] Sampling of VPG should be over D*T

---
 docs/spinningup/rl_intro3.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/spinningup/rl_intro3.rst b/docs/spinningup/rl_intro3.rst
index 34e4d5d57..43f3cad36 100644
--- a/docs/spinningup/rl_intro3.rst
+++ b/docs/spinningup/rl_intro3.rst
@@ -91,7 +91,7 @@ This is an expectation, which means that we can estimate it with a sample mean.
 
 .. math::
 
-    \hat{g} = \frac{1}{|\mathcal{D}|} \sum_{\tau \in \mathcal{D}} \sum_{t=0}^{T} \nabla_{\theta} \log \pi_{\theta}(a_t |s_t) R(\tau),
+    \hat{g} = \frac{1}{|\mathcal{D}| T} \sum_{\tau \in \mathcal{D}} \sum_{t=0}^{T} \nabla_{\theta} \log \pi_{\theta}(a_t |s_t) R(\tau),
 
 where :math:`|\mathcal{D}|` is the number of trajectories in :math:`\mathcal{D}` (here, :math:`N`).
 
@@ -474,4 +474,4 @@ In this chapter, we described the basic theory of policy gradient methods and co
 .. _`advantage of an action`: ../spinningup/rl_intro.html#advantage-functions
 .. _`this page`: ../spinningup/extra_pg_proof2.html
 .. _`Generalized Advantage Estimation`: https://arxiv.org/abs/1506.02438
-.. _`Vanilla Policy Gradient`: ../algorithms/vpg.html
\ No newline at end of file
+.. _`Vanilla Policy Gradient`: ../algorithms/vpg.html
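
Not part of the patch itself: below is a minimal, hypothetical NumPy sketch of the two normalizations the subject line refers to, averaging the per-timestep policy gradient terms over trajectories only (divide by |D|, as the docs currently state) versus over every timestep in the batch (divide by |D|*T, as proposed here). All names (grad_log_probs, returns, num_traj, horizon) are illustrative stand-ins, not Spinning Up code.

import numpy as np

# Toy stand-ins: grad_log_probs[i, t] plays the role of
# grad_theta log pi_theta(a_t | s_t) along trajectory i (a scalar per
# step for simplicity), and returns[i] plays the role of R(tau_i).
num_traj, horizon = 4, 5                 # |D| trajectories, T steps each
rng = np.random.default_rng(0)
grad_log_probs = rng.normal(size=(num_traj, horizon))
returns = rng.normal(size=(num_traj, 1))

per_step_terms = grad_log_probs * returns            # one term per (tau, t)

# Normalization in the current docs: average over trajectories only.
g_hat_per_traj = per_step_terms.sum(axis=1).mean()   # divide by |D|

# Normalization this patch proposes: average over all |D|*T samples,
# i.e. a plain .mean() over a flat batch of timesteps.
g_hat_per_step = per_step_terms.mean()               # divide by |D|*T

print(g_hat_per_traj, g_hat_per_step)

With a fixed horizon the two estimates differ only by the constant factor T, so the choice mainly changes how the effective learning rate scales with episode length.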