
MPPI #257 (Draft)

Wants to merge 7 commits into main.
Conversation

@alberthli (Contributor) commented Jan 22, 2024

Implements a vanilla version of MPPI: only the mean of the policy distribution is updated, while the variance is fixed and diagonal. There are no bells and whistles attached to the policy update - the weights are computed strictly from the return of each noisy rollout. There are a number of possible improvements/variations in the literature.

Currently the planner seems extremely sensitive to the temperature parameter lambda. If lambda is too high, bad trajectories receive too much weight and the planner does nothing; if it is too low, the planner essentially reduces to predictive sampling.
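
For reference, a minimal sketch of the exponentiated-return weighting that lambda controls (assuming total_return stores a cost to be minimized; the function and variable names below are illustrative, not mjpc API):

#include <algorithm>
#include <cmath>
#include <vector>

// Sketch: MPPI-style weights from rollout returns (costs) with temperature lambda.
// Subtracting the best (lowest) return keeps the exponentials numerically stable.
std::vector<double> MppiWeights(const std::vector<double>& returns, double lambda) {
  double best = *std::min_element(returns.begin(), returns.end());
  std::vector<double> weights(returns.size());
  double total = 0.0;
  for (size_t i = 0; i < returns.size(); i++) {
    weights[i] = std::exp(-(returns[i] - best) / lambda);
    total += weights[i];
  }
  // Normalize; the mean update is then the weighted sum of the sampled noise added
  // to the nominal actions. Large lambda -> near-uniform weights (update averages
  // out); small lambda -> the best rollout dominates (predictive sampling).
  for (double& w : weights) w /= total;
  return weights;
}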

Features to consider:

  • adding terminal costs
  • including "exploration variance" terms (e.g., the parameter nu [here])
  • updating the variance in any way
  • using a non-diagonal covariance (e.g., the MPOPI paper [here])

@thowell (Collaborator) commented Jan 22, 2024

Take a look at the action plots for the OP3 and Walker examples - they look incorrect. Probably something related to nominal_trajectory.

@thowell (Collaborator) commented Jan 23, 2024

Can you add the additional cost term from Algorithm 1, the control-noise cross term $\gamma \sum_t u_t^\top \varepsilon_t / \nu$ (with $\nu$ the exploration noise variance)?

Should be something like

// control-noise cross term for rollout i
double c = 0.0;
double* u = trajectory[i].actions.data();
double* eps = noise.data() + i * (model->nu * kMaxTrajectoryHorizon);
for (int t = 0; t < horizon - 1; t++) {
  for (int j = 0; j < model->nu; j++) {
    c += u[t * model->nu + j] * eps[t * model->nu + j] / noise_exploration;
  }
}
trajectory[i].total_return += gamma * c;

added after each rollout to total_return here. We probably need to add gamma as a planner parameter and expose it in the GUI. To make it thread safe, make a copy of gamma before the rollouts and pass this copy to the threadpool.
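
A rough sketch of the thread-safety point, assuming the Schedule/lambda rollout pattern the existing sampling planner uses (pool.Schedule, the gamma_ member, and the surrounding loop are illustrative assumptions, not exact mjpc code):

// Copy the GUI-controlled parameter once, before dispatching rollouts, so a
// concurrent GUI update cannot change it mid-iteration.
double gamma_copy = gamma_;
for (int i = 0; i < num_trajectory; i++) {
  pool.Schedule([&, i, gamma_copy]() {
    // ... roll out trajectory[i] with the sampled noise ...
    // then add the cross term using the copied parameter:
    // trajectory[i].total_return += gamma_copy * c;
  });
}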

@alberthli (Contributor, Author) commented:

Can you add the additional cost term

One wrinkle here is that total_return generally includes penalties against actuation as well. In path integral control theory, the cost q should only be a function of state. The mjpc codebase decouples the residuals from the planners, so this is hard to enforce - any thoughts?
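
For context, a sketch of the cost structure the path-integral derivation assumes (following, e.g., the MPPI papers; notation here is illustrative):

$S(\tau) = \phi(x_T) + \sum_{t=0}^{T-1} \left( q(x_t) + \tfrac{\lambda}{2}\, u_t^\top \Sigma^{-1} u_t \right)$

i.e. a state-dependent running cost $q$ plus a quadratic control penalty whose weight is fixed by the sampling covariance $\Sigma$ and temperature $\lambda$, so arbitrary task-defined actuation penalties inside total_return fall outside this form.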

@thowell (Collaborator) commented Jan 24, 2024

Let's add the cost term from the algorithm and default $\gamma$ to zero.

@alberthli (Contributor, Author) commented:

Let's add the cost term from the algorithm and default γ to zero.

But then when γ is nonzero and we have actuation costs in total_return the result still won't match Algorithm 1 for the reason above, no?

@thowell (Collaborator) commented Jan 24, 2024

Let's add the cost term. The user can still set the task control cost terms to zero.

@thowell (Collaborator) commented Mar 3, 2024

@alberthli can you update this branch so we can merge?

@alberthli (Contributor, Author) commented:

I'll put some time into it next week - currently don't have access to my workstation.
