google-deepmind · erez-tom · Mar 3, 2024 · Mar 3, 2024 · Mar 3, 2024 · Mar 3, 2024
diff --git a/README.md b/README.md
@@ -18,7 +18,7 @@ Google DeepMind.
 MJPC allows the user to easily author and solve complex robotics tasks, and
 currently supports multiple shooting-based planners. Derivative-based methods include iLQG and
 Gradient Descent, while derivative-free methods include a simple yet very competitive planner
-called Predictive Sampling and the Cross Entropy Method (with diagonal covariance).
+called Predictive Sampling.
 
 - [Overview](#overview)
 - [Graphical User Interface](#graphical-user-interface)
@@ -96,38 +96,39 @@ sudo apt-get install libgl1-mesa-dev libxinerama-dev libxcursor-dev libxrandr-de
 
 ### Build Issues
 If you encounter build issues, please see the
-[Github Actions configuration](https://github.com/google-deepmind/mujoco_mpc/blob/main/.github/workflows/build.yml).
-Note, we are using `clang-14`.
+[Github Actions configuration](https://github.com/google-deepmind/mujoco_mpc/blob/main/.github/workflows/build.yml). 
+This provides the exact setup we use for building MJPC for testing. 
+
+We recommend building with `clang` and not `gcc`.
 
 # Python API
 
 We provide a simple Python API for MJPC. This API is still experimental and expects some more experience from its users. For example, the correct usage requires that the model (defined in Python) and the MJPC task (i.e., the residual and transition functions defined in C++) are compatible with each other. Currently, the Python API does not provide any particular error handling for verifying this compatibility and may be difficult to debug without more in-depth knowledge about mujoco and MJPC.
 
-- [agent.py](python/mujoco_mpc/agent.py) for available methods for planning.
-
-- [filter.py](python/mujoco_mpc/filter.py) for available methods for state estimation.
+## Installing via Pip
+First, build MJPC (see above).
 
-- [direct.py](python/mujoco_mpc/direct.py) for available methods for direct optimization.
+Next, change to the python directory:
+```sh
+cd python
+```
 
-## Installing via Pip
-First, build MJPC (see above), then the MJPC Python module can be installed with:
+Install the Python module:
 ```sh
-python "${MUJOCO_MPC_ROOT}/python/${API}.py" install
+python setup.py install
 ```
 
 Test that installation was successful:
 ```sh
-python "${MUJOCO_MPC_ROOT}/python/mujoco_mpc/${API_TEST}.py"
+python "mujoco_mpc/agent_test.py"
 ```
 
-where API(_TEST) can be: agent(_test), filter(_test), or direct(_test). Additionally, the [Python version of MuJoCo](https://pypi.org/project/mujoco/#history) must match the MJPC version (this information can be found in the terminal while the application is running).
+Additionally, the [Python version of MuJoCo](https://pypi.org/project/mujoco/#history) should match the MJPC version (this information can be found in the terminal while the application is running).
 
 
 ## Example Usage
 See [cartpole.py](python/mujoco_mpc/demos/agent/cartpole.py) for example usage for planning.
 
-See [cartpole_trajopt.py](python/mujoco_mpc/demos/direct/cartpole_trajopt.py) for usage for direct optimization.
-
 ## Predictive Control
 
 See the [Predictive Control](docs/OVERVIEW.md) documentation for more

diff --git a/docs/DIRECT.md b/docs/DIRECT.md
@@ -0,0 +1,103 @@
+# Direct optimizer
+
+**Table of Contents**
+
+- [Direct optimizer](#direct-optimizer)
+  - [Cost Function](#cost-function)
+  - [Cost Derivatives](#cost-derivatives)
+  - [Reference](#reference)
+  - [API](#api)
+
+## Cost Function
+```math
+\begin{aligned}
+    \underset{q_{0:T}}{\text{minimize }} & \quad \sum_{t = 1}^{T - 1} \Big(\sum_{i = 1}^{S} w_s^{(i)} \textbf{n}^{(i)}(s^{(i)}(q_t, v_t, a_t, u_t) - y_t^{(i)}) + \frac{1}{2} \lVert g(q_t, v_t, a_t, u_t) - \tau_t \rVert_{\textbf{diag}(w_{g})}^2 \Big) \\
+    \text{subject to} & \quad v_t = (q_t - q_{t - 1}) / h\\
+    & \quad a_t = (q_{t + 1} - 2 q_t + q_{t - 1}) / h^2 = (v_{t+1} - v_t) / h,
+\end{aligned}
+```
+
+The constraints are handled implicitly. Velocities and accelerations are computed using finite-difference approximations from the configuration decision variables.
+
+**Variables**
+- $q \in \mathbf{R}^{n_q}$: configuration ```[qpos]```
+- $v \in \mathbf{R}^{n_v}$: velocity ```[qvel]```
+- $a \in \mathbf{R}^{n_v}$: accelerations ```[qacc]```
+- $u \in \mathbf{R}^{n_u}$: action ```[ctrl]```
+- $y \in \mathbf{R}^{n_s}$: sensor measurement ```[sensordata]```
+- $\tau \in \mathbf{R}^{n_v}$: inverse dynamics force ```[qfrc_actuator]```
+- $h \in \mathbf{R}_{+}\,$: time step ```[timestep]```
+- $T$: estimation horizon
+- $t$: discrete time step
+
+**Models**
+- $s : \mathbf{R}^{n_q} \times \mathbf{R}^{n_v} \times \mathbf{R}^{n_v} \times \mathbf{R}^{n_u} \rightarrow \mathbf{R}^{n_s}$: sensor model
+- $g : \mathbf{R}^{n_q} \times \mathbf{R}^{n_v} \times \mathbf{R}^{n_v} \times \mathbf{R}^{n_u} \rightarrow \mathbf{R}^{n_v}$: inverse dynamics model
+- $\textbf{n}: \mathbf{R}^{n} \rightarrow \mathbf{R}_{+}$: user-specified [convex norm](../mjpc/norm.h)
+
+**Weights**
+
+Sensors:
+- $w_s \in \mathbf{R}_{+}^{S}$:
+
+Forces:
+- $w_g \in \mathbf{R}_{+}^{n_v}$:
+
+Computed with user inputs:
+- $\sigma_s \in \mathbf{R}_{+}^S$: sensor noise
+- $\sigma_g \in \mathbf{R}_{+}^{n_v}$: process noise
+
+Rescaled:
+- $w_s^{(i)} = p / (\sigma_s^{(i)} \cdot n_s^{(i)} \cdot (T - 1))$
+```math
+p = \begin{cases} h^2 & {\texttt{settings.time\_scaling \& acceleration sensor}} \\ h & {\texttt{settings.time\_scaling \& velocity sensor}} \\ 1 & {\texttt{else}}\end{cases}
+```
+- $w_g^{(i)} = p / (\sigma_g^{(i)} \cdot n_v \cdot (T - 1))$
+```math
+p = \begin{cases} h^2 & {\texttt{settings.time\_scaling}} \\ 1 & {\texttt{else}} \end{cases}
+```
+
+## Cost Derivatives
+
+**Residuals**
+
+Sensor residual
+
+$r_s = \Big(s^{(0)}(q_1, v_1, a_1, u_1) - y_1^{(0)}, \dots, s^{(i)}(q_t, v_t, a_t, u_t) - y_t^{(i)}, \dots, s^{(S)}(q_{T - 1}, v_{T - 1}, a_{T - 1}, u_{T - 1}) - y_{T -1 }^{(S)}\Big) \in \mathbf{R}^{n_s (T - 2)}$
+
+Force residual
+
+$r_g = \Big(g(q_1, v_1, a_1, u_1) - \tau_1, \dots, g(q_t, v_t, a_t, u_t) - \tau_t, \dots, g(q_{T - 1}, v_{T - 1}, a_{T - 1}, u_{T - 1}) - \tau_{T - 1}\Big) \in \mathbf{R}^{n_v (T - 2)}$
+
+**Gradient**
+
+The gradient of the cost $c$ with respect to the configuration trajectory $q_{0:T}$:
+
+$d c/ d q_{0:T} = J_s^T N_s + J_g^T N_g$
+
+where
+
+- $J_s = dr_s / d q_{0:T} \in \mathbf{R}^{n_s (T - 2) \times n_v T}$: Jacobian of sensor residual with respect to configuration trajectory
+- $J_g = dr_g / d q_{0:T} \in \mathbf{R}^{n_v (T - 2) \times n_v T}$: Jacobian of force residual with respect to configuration trajectory
+- $N_s = \Big(w_s^{(0)} d \textbf{n}^{(0)} / d r_{s_1}^{(0)}, \dots, w_s^{(i)} d \textbf{n}^{(i)} / d r_{s_t}^{(i)}, \dots, w_s^{(S)}  d \textbf{n}^{(S)} / d r_{s_{T-1}}^{(S)} \Big)$
+- $N_g = \Big(\textbf{diag}(w_g) r_{g_{1}}, \dots, \textbf{diag}(w_g) r_{g_{t}} \dots, \textbf{diag}(w_g) r_{g_{T-1}} \Big)$
+
+**Hessian**
+
+The [Gauss-Newton](https://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm) approximation of the cost Hessian:
+
+$d^2 c / d q_{0:T}^2 \approx J_s^T N_{ss} J_s + J_g^T N_{gg} J_g$
+
+where
+
+- $N_{ss} = \textbf{diag}\Big(w_s^{(0)} d^2 \textbf{n}^{(0)} / d (r_{s_1}^{(0)})^2, \dots, w_s^{(i)} d^2 \textbf{n}^{(i)} / d (r_{s_t}^{(i)})^2, \dots, w_s^{(S)} d^2 \textbf{n}^{(S)} / d (r_{s_{T-1}}^{(S)})^2\Big)$
+- $N_{gg} = \textbf{diag}\Big(\textbf{diag}(w_g) r_{g_1}, \dots, \textbf{diag}(w_g) r_{g_t} \dots, \textbf{diag}(w_g) r_{g_{T-1}} \Big)$
+
+This approximation: 1) is computationally less expensive compared to computing the exact Hessian 2) ensures the matrix is [positive semidefinite](https://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm).
+
+## Reference
+[Physically-Consistent Sensor Fusion in Contact-Rich Behaviors](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=31c0842aa1d4808541a64014c24928123e1d949e).
+Kendall Lowrey, Svetoslav Kolev, Yuval Tassa, Tom Erez, Emo Todorov. 2014.
+
+## API
+[Direct API](../mjpc/direct/direct.h)
diff --git a/docs/ESTIMATORS.md b/docs/ESTIMATORS.md
@@ -2,139 +2,22 @@
 
 **Table of Contents**
 
-- [Batch Estimator](#batch-estimator)
-  - [Cost Function](#cost-function)
-  - [Cost Derivatives](#cost-derivatives)
-  - [Filter](#batch-filter)
-  - [Settings](#batch-settings)
-  - [Reference](#batch-reference)
-- [Extended Kalman Filter](#kalman-filter)
+- [Extended Kalman Filter](#extended-kalman-filter)
   - [Algorithm](#kalman-algorithm)
   - [Reference](#kalman-reference)
-- [Unscented Kalman Filter](#unscented-filter)
+  - [API](#kalman-api)
+- [Unscented Kalman Filter](#unscented-kalman-filter)
   - [Algorithm](#unscented-algorithm)
   - [Reference](#unscented-reference)
+  - [API](#unscented-api)
+- [Batch Estimator](#batch-estimator)
+  - [Prior](#batch-prior)
+  - [Reference](#reference)
 
-# Batch Estimator
-
-## Cost Function
-```math
-\begin{aligned}
-    \underset{q_{0:T}}{\text{minimize }} & \quad \sum_{t = 1}^{T - 1} \Big(\sum_{i = 1}^{S} w_s^{(i)} \textbf{n}^{(i)}(s^{(i)}(q_t, v_t, a_t, u_t) - y_t^{(i)}) + \frac{1}{2} \lVert g(q_t, v_t, a_t, u_t) - \tau_t \rVert_{\textbf{diag}(w_{g})}^2 \Big) \\
-    \text{subject to} & \quad v_t = (q_t - q_{t - 1}) / h\\
-    & \quad a_t = (q_{t + 1} - 2 q_t + q_{t - 1}) / h^2 = (v_{t+1} - v_t) / h,
-\end{aligned}
-```
-
-The constraints are handled implicitly. Velocities and accelerations are computed using finite-difference approximations from the configuration decision variables.
-
-**Variables**
-- $q \in \mathbf{R}^{n_q}$: configuration ```[qpos]```
-- $v \in \mathbf{R}^{n_v}$: velocity ```[qvel]```
-- $a \in \mathbf{R}^{n_v}$: accelerations ```[qacc]```
-- $u \in \mathbf{R}^{n_u}$: action ```[ctrl]```
-- $y \in \mathbf{R}^{n_s}$: sensor measurement ```[sensordata]```
-- $\tau \in \mathbf{R}^{n_v}$: inverse dynamics force ```[qfrc_actuator]```
-- $h \in \mathbf{R}_{+}\,$: time step ```[timestep]```
-- $T$: estimation horizon
-- $t$: discrete time step
-
-**Models**
-- $s : \mathbf{R}^{n_q} \times \mathbf{R}^{n_v} \times \mathbf{R}^{n_v} \times \mathbf{R}^{n_u} \rightarrow \mathbf{R}^{n_s}$: sensor model
-- $g : \mathbf{R}^{n_q} \times \mathbf{R}^{n_v} \times \mathbf{R}^{n_v} \times \mathbf{R}^{n_u} \rightarrow \mathbf{R}^{n_v}$: inverse dynamics model
-- $\textbf{n}: \mathbf{R}^{n} \rightarrow \mathbf{R}_{+}$: user-specified [convex norm](../mjpc/norm.h)
-
-**Weights**
-
-Sensors:
-- $w_s \in \mathbf{R}_{+}^{S}$:
-
-Forces:
-- $w_g \in \mathbf{R}_{+}^{n_v}$:
-
-Computed with user inputs:
-- $\sigma_s \in \mathbf{R}_{+}^S$: sensor noise
-- $\sigma_g \in \mathbf{R}_{+}^{n_v}$: process noise
-
-Rescaled:
-- $w_s^{(i)} = p / (\sigma_s^{(i)} \cdot n_s^{(i)} \cdot (T - 1))$
-```math
-p = \begin{cases} h^2 & {\texttt{settings.time\_scaling \& acceleration sensor}} \\ h & {\texttt{settings.time\_scaling \& velocity sensor}} \\ 1 & {\texttt{else}}\end{cases}
-```
-- $w_g^{(i)} = p / (\sigma_g^{(i)} \cdot n_v \cdot (T - 1))$
-```math
-p = \begin{cases} h^2 & {\texttt{settings.time\_scaling}} \\ 1 & {\texttt{else}} \end{cases}
-```
-
-## Cost Derivatives
-
-**Residuals**
-
-Sensor residual
-
-$r_s = \Big(s^{(0)}(q_1, v_1, a_1, u_1) - y_1^{(0)}, \dots, s^{(i)}(q_t, v_t, a_t, u_t) - y_t^{(i)}, \dots, s^{(S)}(q_{T - 1}, v_{T - 1}, a_{T - 1}, u_{T - 1}) - y_{T -1 }^{(S)}\Big) \in \mathbf{R}^{n_s (T - 2)}$
-
-Force residual
-
-$r_g = \Big(g(q_1, v_1, a_1, u_1) - \tau_1, \dots, g(q_t, v_t, a_t, u_t) - \tau_t, \dots, g(q_{T - 1}, v_{T - 1}, a_{T - 1}, u_{T - 1}) - \tau_{T - 1}\Big) \in \mathbf{R}^{n_v (T - 2)}$
-
-**Gradient**
-
-The gradient of the cost $c$ with respect to the configuration trajectory $q_{0:T}$:
-
-$d c/ d q_{0:T} = J_s^T N_s + J_g^T N_g$
-
-where
-
-- $J_s = dr_s / d q_{0:T} \in \mathbf{R}^{n_s (T - 2) \times n_v T}$: Jacobian of sensor residual with respect to configuration trajectory
-- $J_g = dr_g / d q_{0:T} \in \mathbf{R}^{n_v (T - 2) \times n_v T}$: Jacobian of force residual with respect to configuration trajectory
-- $N_s = \Big(w_s^{(0)} d \textbf{n}^{(0)} / d r_{s_1}^{(0)}, \dots, w_s^{(i)} d \textbf{n}^{(i)} / d r_{s_t}^{(i)}, \dots, w_s^{(S)}  d \textbf{n}^{(S)} / d r_{s_{T-1}}^{(S)} \Big)$
-- $N_g = \Big(\textbf{diag}(w_g) r_{g_{1}}, \dots, \textbf{diag}(w_g) r_{g_{t}} \dots, \textbf{diag}(w_g) r_{g_{T-1}} \Big)$
-
-**Hessian**
-
-The [Gauss-Newton](https://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm) approximation of the cost Hessian:
-
-$d^2 c / d q_{0:T}^2 \approx J_s^T N_{ss} J_s + J_g^T N_{gg} J_g$
-
-where
-
-- $N_{ss} = \textbf{diag}\Big(w_s^{(0)} d^2 \textbf{n}^{(0)} / d (r_{s_1}^{(0)})^2, \dots, w_s^{(i)} d^2 \textbf{n}^{(i)} / d (r_{s_t}^{(i)})^2, \dots, w_s^{(S)} d^2 \textbf{n}^{(S)} / d (r_{s_{T-1}}^{(S)})^2\Big)$
-- $N_{gg} = \textbf{diag}\Big(\textbf{diag}(w_g) r_{g_1}, \dots, \textbf{diag}(w_g) r_{g_t} \dots, \textbf{diag}(w_g) r_{g_{T-1}} \Big)$
-
-This approximation: 1) is computationally less expensive compared to computing the exact Hessian 2) ensures the matrix is [positive semidefinite](https://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm).
-
-## Filter
-
-An additional *prior* cost
-
-```math
-\frac{1}{2} (q_{0:T} - \bar{q}_{0:T})^T P (q_{0:T} - \bar{q}_{0:T})
-```
-
-with $P \in \mathbf{S}_{++}^{n_v  T}$ and overbar denoting a reference configuration is added to the cost when filtering.
-
-The prior weights
-```math
-P_{0:T} = \begin{bmatrix} P_{[0:t, 0:t]} & P_{[0:t, t+1:T]} \\ P_{[t+1:T, 0:t]} & P_{[t+1:T, t+1:T]} \end{bmatrix}
-```
-
-are recursively updated by conditioning on a subset of the weights
-```math
-P_{t+1:T | 0:t} = P_{[t+1:T, t+1:T]} - P_{[t+1:T, 0:t]} P_{[0:t, 0:t]}^{-1} P_{[0:t, t+1:T]}
-```
-
-## Settings
-
-## Reference
-[Physically-Consistent Sensor Fusion in Contact-Rich Behaviors](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=31c0842aa1d4808541a64014c24928123e1d949e).
-Kendall Lowrey, Svetoslav Kolev, Yuval Tassa, Tom Erez, Emo Todorov. 2014.
-
-[Batch API](../mjpc/estimators/batch.h)
 
 # Extended Kalman Filter
 
-## Algorithm
+## Kalman Algorithm
 
 ### Prediction Update
 
@@ -168,18 +51,19 @@ P_t &\mathrel{+}= P_t C_t^T (C_t P_t C_t^T + R)^{-1} (C_t * P_t)
 - $f: \mathbf{R}^{n_q + n_v + n_a} \times \mathbf{R}^{n_u} \rightarrow \mathbf{R}^{n_q + n_v + n_a}$: forward dynamics
 - $s: \mathbf{R}^{n_q + n_v + n_a} \times \mathbf{R}^{n_u} \rightarrow \mathbf{R}^{n_s}$: sensor model
 
-## Reference
+## Kalman Reference
 [A New Approach to Linear Filtering and Prediction Problems](https://www.cs.unc.edu/~welch/kalman/media/pdf/Kalman1960.pdf).
 Rudolph Kalman. 1960.
 
 [Application Of Statistical Filter Theory To The Optimal Estimation Of Position And Velocity On Board A Circumlunar Vehicle](https://archive.org/details/nasa_techdoc_19620006857/page/n31/mode/2up).
 Gerald Smith, Stanley Schmidt, Leonard McGee. 1962.
 
+## Kalman API
 [Kalman API](../mjpc/estimators/kalman.h)
 
 # Unscented Kalman Filter
 
-## Algorithm
+## Unscented Algorithm
 
 **Sigma points**
 
@@ -254,12 +138,43 @@ P_{t+1} &= \Sigma_{xx} - \Sigma_{xs} \Sigma_{ss}^{-1} \Sigma_{xs}^T
 - $w_c^{(0)} = w_m^{(0)} + 1 - \alpha^2 + \beta$: covariance weight
 - $w_m^{(i)} = w_c^{(i)} = 1 / (2 (n_{dx} + \lambda))$: weights
 
-### Settings
+### Unscented Settings
 - $\alpha$: controls spread of sigma points around mean
 - $\beta$: encodes prior information (Gaussian: $\beta = 2$)
 
-## Reference
+## Unscented Reference
 [Unscented Filtering and Nonlinear Estimation](https://www.cs.ubc.ca/~murphyk/Papers/Julier_Uhlmann_mar04.pdf).
 Simon Julier, Jeffrey Uhulmann. 2004.
 
+## Unscented API
 [Unscented API](../mjpc/estimators/unscented.h)
+
+# Batch Estimator
+
+This estimator utilizes the [direct optimizer](DIRECT.md) to formulate a [fixed-lag smoother](https://en.wikipedia.org/wiki/Kalman_filter#Fixed-lag_smoother).
+
+## Batch Prior
+
+An additional *prior* cost
+
+```math
+\frac{1}{2} (q_{0:T} - \bar{q}_{0:T})^T P (q_{0:T} - \bar{q}_{0:T})
+```
+
+with $P \in \mathbf{S}_{++}^{n_v  T}$ and overbar denoting a reference configuration is added to the cost.
+
+The prior weights
+```math
+P_{0:T} = \begin{bmatrix} P_{[0:t, 0:t]} & P_{[0:t, t+1:T]} \\ P_{[t+1:T, 0:t]} & P_{[t+1:T, t+1:T]} \end{bmatrix}
+```
+
+are recursively updated by conditioning on a subset of the weights
+```math
+P_{t+1:T | 0:t} = P_{[t+1:T, t+1:T]} - P_{[t+1:T, 0:t]} P_{[0:t, 0:t]}^{-1} P_{[0:t, t+1:T]}
+```
+
+## Batch Reference
+[Physically-Consistent Sensor Fusion in Contact-Rich Behaviors](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=31c0842aa1d4808541a64014c24928123e1d949e).
+Kendall Lowrey, Svetoslav Kolev, Yuval Tassa, Tom Erez, Emo Todorov. 2014.
+
+[Batch API](../mjpc/estimators/batch.h)