Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update docs #304

Merged
merged 3 commits into from
Mar 3, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 15 additions & 14 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ Google DeepMind.
MJPC allows the user to easily author and solve complex robotics tasks, and
currently supports multiple shooting-based planners. Derivative-based methods include iLQG and
Gradient Descent, while derivative-free methods include a simple yet very competitive planner
called Predictive Sampling and the Cross Entropy Method (with diagonal covariance).
called Predictive Sampling.

- [Overview](#overview)
- [Graphical User Interface](#graphical-user-interface)
Expand Down Expand Up @@ -96,38 +96,39 @@ sudo apt-get install libgl1-mesa-dev libxinerama-dev libxcursor-dev libxrandr-de

### Build Issues
If you encounter build issues, please see the
[Github Actions configuration](https://github.com/google-deepmind/mujoco_mpc/blob/main/.github/workflows/build.yml).
Note, we are using `clang-14`.
[Github Actions configuration](https://github.com/google-deepmind/mujoco_mpc/blob/main/.github/workflows/build.yml).
This provides the exact setup we use for building MJPC for testing.

We recommend building with `clang` and not `gcc`.

# Python API

We provide a simple Python API for MJPC. This API is still experimental and expects some more experience from its users. For example, the correct usage requires that the model (defined in Python) and the MJPC task (i.e., the residual and transition functions defined in C++) are compatible with each other. Currently, the Python API does not provide any particular error handling for verifying this compatibility and may be difficult to debug without more in-depth knowledge about mujoco and MJPC.

- [agent.py](python/mujoco_mpc/agent.py) for available methods for planning.

- [filter.py](python/mujoco_mpc/filter.py) for available methods for state estimation.
## Installing via Pip
First, build MJPC (see above).

- [direct.py](python/mujoco_mpc/direct.py) for available methods for direct optimization.
Next, change to the python directory:
```sh
cd python
```

## Installing via Pip
First, build MJPC (see above), then the MJPC Python module can be installed with:
Install the Python module:
```sh
python "${MUJOCO_MPC_ROOT}/python/${API}.py" install
python setup.py install
```

Test that installation was successful:
```sh
python "${MUJOCO_MPC_ROOT}/python/mujoco_mpc/${API_TEST}.py"
python "mujoco_mpc/agent_test.py"
```

where API(_TEST) can be: agent(_test), filter(_test), or direct(_test). Additionally, the [Python version of MuJoCo](https://pypi.org/project/mujoco/#history) must match the MJPC version (this information can be found in the terminal while the application is running).
Additionally, the [Python version of MuJoCo](https://pypi.org/project/mujoco/#history) should match the MJPC version (this information can be found in the terminal while the application is running).


## Example Usage
See [cartpole.py](python/mujoco_mpc/demos/agent/cartpole.py) for example usage for planning.

See [cartpole_trajopt.py](python/mujoco_mpc/demos/direct/cartpole_trajopt.py) for usage for direct optimization.

## Predictive Control

See the [Predictive Control](docs/OVERVIEW.md) documentation for more
Expand Down
103 changes: 103 additions & 0 deletions docs/DIRECT.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
# Direct optimizer

**Table of Contents**

- [Direct optimizer](#direct-optimizer)
- [Cost Function](#cost-function)
- [Cost Derivatives](#cost-derivatives)
- [Reference](#reference)
- [API](#api)

## Cost Function
```math
\begin{aligned}
\underset{q_{0:T}}{\text{minimize }} & \quad \sum_{t = 1}^{T - 1} \Big(\sum_{i = 1}^{S} w_s^{(i)} \textbf{n}^{(i)}(s^{(i)}(q_t, v_t, a_t, u_t) - y_t^{(i)}) + \frac{1}{2} \lVert g(q_t, v_t, a_t, u_t) - \tau_t \rVert_{\textbf{diag}(w_{g})}^2 \Big) \\
\text{subject to} & \quad v_t = (q_t - q_{t - 1}) / h\\
& \quad a_t = (q_{t + 1} - 2 q_t + q_{t - 1}) / h^2 = (v_{t+1} - v_t) / h,
\end{aligned}
```

The constraints are handled implicitly. Velocities and accelerations are computed using finite-difference approximations from the configuration decision variables.

**Variables**
- $q \in \mathbf{R}^{n_q}$: configuration ```[qpos]```
- $v \in \mathbf{R}^{n_v}$: velocity ```[qvel]```
- $a \in \mathbf{R}^{n_v}$: accelerations ```[qacc]```
- $u \in \mathbf{R}^{n_u}$: action ```[ctrl]```
- $y \in \mathbf{R}^{n_s}$: sensor measurement ```[sensordata]```
- $\tau \in \mathbf{R}^{n_v}$: inverse dynamics force ```[qfrc_actuator]```
- $h \in \mathbf{R}_{+}\,$: time step ```[timestep]```
- $T$: estimation horizon
- $t$: discrete time step

**Models**
- $s : \mathbf{R}^{n_q} \times \mathbf{R}^{n_v} \times \mathbf{R}^{n_v} \times \mathbf{R}^{n_u} \rightarrow \mathbf{R}^{n_s}$: sensor model
- $g : \mathbf{R}^{n_q} \times \mathbf{R}^{n_v} \times \mathbf{R}^{n_v} \times \mathbf{R}^{n_u} \rightarrow \mathbf{R}^{n_v}$: inverse dynamics model
- $\textbf{n}: \mathbf{R}^{n} \rightarrow \mathbf{R}_{+}$: user-specified [convex norm](../mjpc/norm.h)

**Weights**

Sensors:
- $w_s \in \mathbf{R}_{+}^{S}$:

Forces:
- $w_g \in \mathbf{R}_{+}^{n_v}$:

Computed with user inputs:
- $\sigma_s \in \mathbf{R}_{+}^S$: sensor noise
- $\sigma_g \in \mathbf{R}_{+}^{n_v}$: process noise

Rescaled:
- $w_s^{(i)} = p / (\sigma_s^{(i)} \cdot n_s^{(i)} \cdot (T - 1))$
```math
p = \begin{cases} h^2 & {\texttt{settings.time\_scaling \& acceleration sensor}} \\ h & {\texttt{settings.time\_scaling \& velocity sensor}} \\ 1 & {\texttt{else}}\end{cases}
```
- $w_g^{(i)} = p / (\sigma_g^{(i)} \cdot n_v \cdot (T - 1))$
```math
p = \begin{cases} h^2 & {\texttt{settings.time\_scaling}} \\ 1 & {\texttt{else}} \end{cases}
```

## Cost Derivatives

**Residuals**

Sensor residual

$r_s = \Big(s^{(0)}(q_1, v_1, a_1, u_1) - y_1^{(0)}, \dots, s^{(i)}(q_t, v_t, a_t, u_t) - y_t^{(i)}, \dots, s^{(S)}(q_{T - 1}, v_{T - 1}, a_{T - 1}, u_{T - 1}) - y_{T -1 }^{(S)}\Big) \in \mathbf{R}^{n_s (T - 2)}$

Force residual

$r_g = \Big(g(q_1, v_1, a_1, u_1) - \tau_1, \dots, g(q_t, v_t, a_t, u_t) - \tau_t, \dots, g(q_{T - 1}, v_{T - 1}, a_{T - 1}, u_{T - 1}) - \tau_{T - 1}\Big) \in \mathbf{R}^{n_v (T - 2)}$

**Gradient**

The gradient of the cost $c$ with respect to the configuration trajectory $q_{0:T}$:

$d c/ d q_{0:T} = J_s^T N_s + J_g^T N_g$

where

- $J_s = dr_s / d q_{0:T} \in \mathbf{R}^{n_s (T - 2) \times n_v T}$: Jacobian of sensor residual with respect to configuration trajectory
- $J_g = dr_g / d q_{0:T} \in \mathbf{R}^{n_v (T - 2) \times n_v T}$: Jacobian of force residual with respect to configuration trajectory
- $N_s = \Big(w_s^{(0)} d \textbf{n}^{(0)} / d r_{s_1}^{(0)}, \dots, w_s^{(i)} d \textbf{n}^{(i)} / d r_{s_t}^{(i)}, \dots, w_s^{(S)} d \textbf{n}^{(S)} / d r_{s_{T-1}}^{(S)} \Big)$
- $N_g = \Big(\textbf{diag}(w_g) r_{g_{1}}, \dots, \textbf{diag}(w_g) r_{g_{t}} \dots, \textbf{diag}(w_g) r_{g_{T-1}} \Big)$

**Hessian**

The [Gauss-Newton](https://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm) approximation of the cost Hessian:

$d^2 c / d q_{0:T}^2 \approx J_s^T N_{ss} J_s + J_g^T N_{gg} J_g$

where

- $N_{ss} = \textbf{diag}\Big(w_s^{(0)} d^2 \textbf{n}^{(0)} / d (r_{s_1}^{(0)})^2, \dots, w_s^{(i)} d^2 \textbf{n}^{(i)} / d (r_{s_t}^{(i)})^2, \dots, w_s^{(S)} d^2 \textbf{n}^{(S)} / d (r_{s_{T-1}}^{(S)})^2\Big)$
- $N_{gg} = \textbf{diag}\Big(\textbf{diag}(w_g) r_{g_1}, \dots, \textbf{diag}(w_g) r_{g_t} \dots, \textbf{diag}(w_g) r_{g_{T-1}} \Big)$

This approximation: 1) is computationally less expensive compared to computing the exact Hessian 2) ensures the matrix is [positive semidefinite](https://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm).

## Reference
[Physically-Consistent Sensor Fusion in Contact-Rich Behaviors](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=31c0842aa1d4808541a64014c24928123e1d949e).
Kendall Lowrey, Svetoslav Kolev, Yuval Tassa, Tom Erez, Emo Todorov. 2014.

## API
[Direct API](../mjpc/direct/direct.h)
173 changes: 44 additions & 129 deletions docs/ESTIMATORS.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,139 +2,22 @@

**Table of Contents**

- [Batch Estimator](#batch-estimator)
- [Cost Function](#cost-function)
- [Cost Derivatives](#cost-derivatives)
- [Filter](#batch-filter)
- [Settings](#batch-settings)
- [Reference](#batch-reference)
- [Extended Kalman Filter](#kalman-filter)
- [Extended Kalman Filter](#extended-kalman-filter)
- [Algorithm](#kalman-algorithm)
- [Reference](#kalman-reference)
- [Unscented Kalman Filter](#unscented-filter)
- [API](#kalman-api)
- [Unscented Kalman Filter](#unscented-kalman-filter)
- [Algorithm](#unscented-algorithm)
- [Reference](#unscented-reference)
- [API](#unscented-api)
- [Batch Estimator](#batch-estimator)
- [Prior](#batch-prior)
- [Reference](#reference)

# Batch Estimator

## Cost Function
```math
\begin{aligned}
\underset{q_{0:T}}{\text{minimize }} & \quad \sum_{t = 1}^{T - 1} \Big(\sum_{i = 1}^{S} w_s^{(i)} \textbf{n}^{(i)}(s^{(i)}(q_t, v_t, a_t, u_t) - y_t^{(i)}) + \frac{1}{2} \lVert g(q_t, v_t, a_t, u_t) - \tau_t \rVert_{\textbf{diag}(w_{g})}^2 \Big) \\
\text{subject to} & \quad v_t = (q_t - q_{t - 1}) / h\\
& \quad a_t = (q_{t + 1} - 2 q_t + q_{t - 1}) / h^2 = (v_{t+1} - v_t) / h,
\end{aligned}
```

The constraints are handled implicitly. Velocities and accelerations are computed using finite-difference approximations from the configuration decision variables.

**Variables**
- $q \in \mathbf{R}^{n_q}$: configuration ```[qpos]```
- $v \in \mathbf{R}^{n_v}$: velocity ```[qvel]```
- $a \in \mathbf{R}^{n_v}$: accelerations ```[qacc]```
- $u \in \mathbf{R}^{n_u}$: action ```[ctrl]```
- $y \in \mathbf{R}^{n_s}$: sensor measurement ```[sensordata]```
- $\tau \in \mathbf{R}^{n_v}$: inverse dynamics force ```[qfrc_actuator]```
- $h \in \mathbf{R}_{+}\,$: time step ```[timestep]```
- $T$: estimation horizon
- $t$: discrete time step

**Models**
- $s : \mathbf{R}^{n_q} \times \mathbf{R}^{n_v} \times \mathbf{R}^{n_v} \times \mathbf{R}^{n_u} \rightarrow \mathbf{R}^{n_s}$: sensor model
- $g : \mathbf{R}^{n_q} \times \mathbf{R}^{n_v} \times \mathbf{R}^{n_v} \times \mathbf{R}^{n_u} \rightarrow \mathbf{R}^{n_v}$: inverse dynamics model
- $\textbf{n}: \mathbf{R}^{n} \rightarrow \mathbf{R}_{+}$: user-specified [convex norm](../mjpc/norm.h)

**Weights**

Sensors:
- $w_s \in \mathbf{R}_{+}^{S}$:

Forces:
- $w_g \in \mathbf{R}_{+}^{n_v}$:

Computed with user inputs:
- $\sigma_s \in \mathbf{R}_{+}^S$: sensor noise
- $\sigma_g \in \mathbf{R}_{+}^{n_v}$: process noise

Rescaled:
- $w_s^{(i)} = p / (\sigma_s^{(i)} \cdot n_s^{(i)} \cdot (T - 1))$
```math
p = \begin{cases} h^2 & {\texttt{settings.time\_scaling \& acceleration sensor}} \\ h & {\texttt{settings.time\_scaling \& velocity sensor}} \\ 1 & {\texttt{else}}\end{cases}
```
- $w_g^{(i)} = p / (\sigma_g^{(i)} \cdot n_v \cdot (T - 1))$
```math
p = \begin{cases} h^2 & {\texttt{settings.time\_scaling}} \\ 1 & {\texttt{else}} \end{cases}
```

## Cost Derivatives

**Residuals**

Sensor residual

$r_s = \Big(s^{(0)}(q_1, v_1, a_1, u_1) - y_1^{(0)}, \dots, s^{(i)}(q_t, v_t, a_t, u_t) - y_t^{(i)}, \dots, s^{(S)}(q_{T - 1}, v_{T - 1}, a_{T - 1}, u_{T - 1}) - y_{T -1 }^{(S)}\Big) \in \mathbf{R}^{n_s (T - 2)}$

Force residual

$r_g = \Big(g(q_1, v_1, a_1, u_1) - \tau_1, \dots, g(q_t, v_t, a_t, u_t) - \tau_t, \dots, g(q_{T - 1}, v_{T - 1}, a_{T - 1}, u_{T - 1}) - \tau_{T - 1}\Big) \in \mathbf{R}^{n_v (T - 2)}$

**Gradient**

The gradient of the cost $c$ with respect to the configuration trajectory $q_{0:T}$:

$d c/ d q_{0:T} = J_s^T N_s + J_g^T N_g$

where

- $J_s = dr_s / d q_{0:T} \in \mathbf{R}^{n_s (T - 2) \times n_v T}$: Jacobian of sensor residual with respect to configuration trajectory
- $J_g = dr_g / d q_{0:T} \in \mathbf{R}^{n_v (T - 2) \times n_v T}$: Jacobian of force residual with respect to configuration trajectory
- $N_s = \Big(w_s^{(0)} d \textbf{n}^{(0)} / d r_{s_1}^{(0)}, \dots, w_s^{(i)} d \textbf{n}^{(i)} / d r_{s_t}^{(i)}, \dots, w_s^{(S)} d \textbf{n}^{(S)} / d r_{s_{T-1}}^{(S)} \Big)$
- $N_g = \Big(\textbf{diag}(w_g) r_{g_{1}}, \dots, \textbf{diag}(w_g) r_{g_{t}} \dots, \textbf{diag}(w_g) r_{g_{T-1}} \Big)$

**Hessian**

The [Gauss-Newton](https://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm) approximation of the cost Hessian:

$d^2 c / d q_{0:T}^2 \approx J_s^T N_{ss} J_s + J_g^T N_{gg} J_g$

where

- $N_{ss} = \textbf{diag}\Big(w_s^{(0)} d^2 \textbf{n}^{(0)} / d (r_{s_1}^{(0)})^2, \dots, w_s^{(i)} d^2 \textbf{n}^{(i)} / d (r_{s_t}^{(i)})^2, \dots, w_s^{(S)} d^2 \textbf{n}^{(S)} / d (r_{s_{T-1}}^{(S)})^2\Big)$
- $N_{gg} = \textbf{diag}\Big(\textbf{diag}(w_g) r_{g_1}, \dots, \textbf{diag}(w_g) r_{g_t} \dots, \textbf{diag}(w_g) r_{g_{T-1}} \Big)$

This approximation: 1) is computationally less expensive compared to computing the exact Hessian 2) ensures the matrix is [positive semidefinite](https://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm).

## Filter

An additional *prior* cost

```math
\frac{1}{2} (q_{0:T} - \bar{q}_{0:T})^T P (q_{0:T} - \bar{q}_{0:T})
```

with $P \in \mathbf{S}_{++}^{n_v T}$ and overbar denoting a reference configuration is added to the cost when filtering.

The prior weights
```math
P_{0:T} = \begin{bmatrix} P_{[0:t, 0:t]} & P_{[0:t, t+1:T]} \\ P_{[t+1:T, 0:t]} & P_{[t+1:T, t+1:T]} \end{bmatrix}
```

are recursively updated by conditioning on a subset of the weights
```math
P_{t+1:T | 0:t} = P_{[t+1:T, t+1:T]} - P_{[t+1:T, 0:t]} P_{[0:t, 0:t]}^{-1} P_{[0:t, t+1:T]}
```

## Settings

## Reference
[Physically-Consistent Sensor Fusion in Contact-Rich Behaviors](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=31c0842aa1d4808541a64014c24928123e1d949e).
Kendall Lowrey, Svetoslav Kolev, Yuval Tassa, Tom Erez, Emo Todorov. 2014.

[Batch API](../mjpc/estimators/batch.h)

# Extended Kalman Filter

## Algorithm
## Kalman Algorithm

### Prediction Update

Expand Down Expand Up @@ -168,18 +51,19 @@ P_t &\mathrel{+}= P_t C_t^T (C_t P_t C_t^T + R)^{-1} (C_t * P_t)
- $f: \mathbf{R}^{n_q + n_v + n_a} \times \mathbf{R}^{n_u} \rightarrow \mathbf{R}^{n_q + n_v + n_a}$: forward dynamics
- $s: \mathbf{R}^{n_q + n_v + n_a} \times \mathbf{R}^{n_u} \rightarrow \mathbf{R}^{n_s}$: sensor model

## Reference
## Kalman Reference
[A New Approach to Linear Filtering and Prediction Problems](https://www.cs.unc.edu/~welch/kalman/media/pdf/Kalman1960.pdf).
Rudolph Kalman. 1960.

[Application Of Statistical Filter Theory To The Optimal Estimation Of Position And Velocity On Board A Circumlunar Vehicle](https://archive.org/details/nasa_techdoc_19620006857/page/n31/mode/2up).
Gerald Smith, Stanley Schmidt, Leonard McGee. 1962.

## Kalman API
[Kalman API](../mjpc/estimators/kalman.h)

# Unscented Kalman Filter

## Algorithm
## Unscented Algorithm

**Sigma points**

Expand Down Expand Up @@ -254,12 +138,43 @@ P_{t+1} &= \Sigma_{xx} - \Sigma_{xs} \Sigma_{ss}^{-1} \Sigma_{xs}^T
- $w_c^{(0)} = w_m^{(0)} + 1 - \alpha^2 + \beta$: covariance weight
- $w_m^{(i)} = w_c^{(i)} = 1 / (2 (n_{dx} + \lambda))$: weights

### Settings
### Unscented Settings
- $\alpha$: controls spread of sigma points around mean
- $\beta$: encodes prior information (Gaussian: $\beta = 2$)

## Reference
## Unscented Reference
[Unscented Filtering and Nonlinear Estimation](https://www.cs.ubc.ca/~murphyk/Papers/Julier_Uhlmann_mar04.pdf).
Simon Julier, Jeffrey Uhulmann. 2004.

## Unscented API
[Unscented API](../mjpc/estimators/unscented.h)

# Batch Estimator

This estimator utilizes the [direct optimizer](DIRECT.md) to formulate a [fixed-lag smoother](https://en.wikipedia.org/wiki/Kalman_filter#Fixed-lag_smoother).

## Batch Prior

An additional *prior* cost

```math
\frac{1}{2} (q_{0:T} - \bar{q}_{0:T})^T P (q_{0:T} - \bar{q}_{0:T})
```

with $P \in \mathbf{S}_{++}^{n_v T}$ and overbar denoting a reference configuration is added to the cost.

The prior weights
```math
P_{0:T} = \begin{bmatrix} P_{[0:t, 0:t]} & P_{[0:t, t+1:T]} \\ P_{[t+1:T, 0:t]} & P_{[t+1:T, t+1:T]} \end{bmatrix}
```

are recursively updated by conditioning on a subset of the weights
```math
P_{t+1:T | 0:t} = P_{[t+1:T, t+1:T]} - P_{[t+1:T, 0:t]} P_{[0:t, 0:t]}^{-1} P_{[0:t, t+1:T]}
```

## Batch Reference
[Physically-Consistent Sensor Fusion in Contact-Rich Behaviors](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=31c0842aa1d4808541a64014c24928123e1d949e).
Kendall Lowrey, Svetoslav Kolev, Yuval Tassa, Tom Erez, Emo Todorov. 2014.

[Batch API](../mjpc/estimators/batch.h)
Loading