
MPPI #257 (Draft)

Wants to merge 7 commits into main.
Conversation

@alberthli (Contributor) commented Jan 22, 2024

Implements a vanilla version of MPPI: only the mean of the policy distribution is updated, while the variance is fixed and diagonal. There are no bells and whistles attached to the policy update - the weights are computed strictly from the return of each noisy rollout. There are a number of possible improvements/variations in the literature.

Currently the planner seems extremely sensitive to the temperature parameter lambda. If lambda is too high, bad trajectories receive too much weight and the planner does nothing; if it is too low, the planner essentially reduces to predictive sampling.
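
For reference, a minimal sketch of the exponentiated-return weighting that lambda controls (assuming total_return stores a cost to be minimized; the function and variable names below are illustrative, not mjpc API):

#include <algorithm>
#include <cmath>
#include <vector>

// Sketch: MPPI-style weights from rollout returns (costs) with temperature lambda.
// Subtracting the best (lowest) return keeps the exponentials numerically stable.
std::vector<double> MppiWeights(const std::vector<double>& returns, double lambda) {
  double best = *std::min_element(returns.begin(), returns.end());
  std::vector<double> weights(returns.size());
  double total = 0.0;
  for (size_t i = 0; i < returns.size(); i++) {
    weights[i] = std::exp(-(returns[i] - best) / lambda);
    total += weights[i];
  }
  // Normalize; the mean update is then the weighted sum of the sampled noise added
  // to the nominal actions. Large lambda -> near-uniform weights (update averages
  // out); small lambda -> the best rollout dominates (predictive sampling).
  for (double& w : weights) w /= total;
  return weights;
}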

Features to consider:

  • adding terminal costs
  • including "exploration variance" terms (e.g., the parameter nu [here])
  • updating the variance in any way
  • using a non-diagonal covariance (e.g., the MPOPI paper [here])

@thowell (Collaborator) commented Jan 22, 2024

Take a look at the action plots for the OP3 and Walker examples - they look incorrect. Probably something related to nominal_trajectory.

@thowell (Collaborator) commented Jan 23, 2024

Can you add the additional cost term from Algorithm 1, the control-noise cross term $\gamma \sum_t u_t^\top \varepsilon_t / \nu$ (with $\nu$ the exploration noise variance)?

Should be something like

// control-noise cross term for rollout i
double c = 0.0;
double* u = trajectory[i].actions.data();
double* eps = noise.data() + i * (model->nu * kMaxTrajectoryHorizon);
for (int t = 0; t < horizon - 1; t++) {
  for (int j = 0; j < model->nu; j++) {
    c += u[t * model->nu + j] * eps[t * model->nu + j] / noise_exploration;
  }
}
trajectory[i].total_return += gamma * c;

added after each rollout to total_return here. We probably need to add gamma as a planner parameter and expose it in the GUI. To make it thread safe, make a copy of gamma before the rollouts and pass this copy to the threadpool.
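
A rough sketch of the thread-safety point, assuming the Schedule/lambda rollout pattern the existing sampling planner uses (pool.Schedule, the gamma_ member, and the surrounding loop are illustrative assumptions, not exact mjpc code):

// Copy the GUI-controlled parameter once, before dispatching rollouts, so a
// concurrent GUI update cannot change it mid-iteration.
double gamma_copy = gamma_;
for (int i = 0; i < num_trajectory; i++) {
  pool.Schedule([&, i, gamma_copy]() {
    // ... roll out trajectory[i] with the sampled noise ...
    // then add the cross term using the copied parameter:
    // trajectory[i].total_return += gamma_copy * c;
  });
}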

@alberthli (Contributor, Author) commented:

Can you add the additional cost term

One wrinkle here is that total_return generally includes penalties against actuation as well. In path integral control theory, the cost q should only be a function of state. The mjpc codebase decouples the residuals from the planners, so this is hard to enforce - any thoughts?
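
For context, a sketch of the cost structure the path-integral derivation assumes (following, e.g., the MPPI papers; notation here is illustrative):

$S(\tau) = \phi(x_T) + \sum_{t=0}^{T-1} \left( q(x_t) + \tfrac{\lambda}{2}\, u_t^\top \Sigma^{-1} u_t \right)$

i.e. a state-dependent running cost $q$ plus a quadratic control penalty whose weight is fixed by the sampling covariance $\Sigma$ and temperature $\lambda$, so arbitrary task-defined actuation penalties inside total_return fall outside this form.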

@thowell (Collaborator) commented Jan 24, 2024

Let's add the cost term from the algorithm and default $\gamma$ to zero.

@alberthli (Contributor, Author) commented:

Let's add the cost term from the algorithm and default γ to zero.

But then when γ is nonzero and we have actuation costs in total_return the result still won't match Algorithm 1 for the reason above, no?

@thowell (Collaborator) commented Jan 24, 2024

Let's add the cost term. The user can still set the task control cost terms to zero.

@thowell (Collaborator) commented Mar 3, 2024

@alberthli can you update this branch so we can merge?

@alberthli (Contributor, Author) commented:

I'll put some time into it next week - currently don't have access to my workstation.
